I’ve created a minimal example using an HDP sandbox installation. I am trying to download a CSV file, which I can successfully list using the “List Remote Files” node. After the timeout period I get this message:
“ERROR Download 2:45 Execute failed: Could not obtain block: BP-32082187-172.17.0.2-1517480669419:blk_1073742653_1830 file=/user/maria_dev/data/geolocation.csv”
What am I doing wrong? Using Ambari, I can download the file from the VM to the host, so both the connection and HDFS are working. File listing works in KNIME as well (including navigation through the HDFS directory tree).
EDIT: I can connect and read data using Python, so the problem is not in the connection itself. Note that in the Python code I am connecting directly to the NameNode:
from hdfs import InsecureClient
import pandas as pd
import io
hostname = '127.0.0.1'
port = 8020
hdfs_path = '/user/maria_dev/data/trucks.csv'
local_path = 'C:/tmp'
client = InsecureClient('http://localhost:50070', user='maria_dev')
# Loading a file in memory.
with client.read(hdfs_path) as reader:
    features = reader.read()
data = pd.read_csv(io.BytesIO(features), encoding='utf8', sep=",", lineterminator='\r')
print(data.describe())
Thanks for your help,
Ingo