Hadoop files

I have seen nodes that can upload data/files to a Hadoop cluster or download data/files from a Hadoop cluster, but:

1. Is there a node to write a file after data processing on Hadoop?

2. Or read a particular file from HDFS?

3. Or read and move files from one HDFS location to another?

Please advise.

Thanks,

Rahul G

Hi,

> 1. Is there a node to write a file after data processing on Hadoop?

For the case of Spark, yes. When you have Spark nodes in your workflow and want to save the resulting data from Spark directly into HDFS (without the data touching the client), you can use the Spark to XYZ nodes, where XYZ stands for one of the following file formats: Parquet, ORC, Avro, JSON, or CSV.
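
Conceptually, these nodes do the equivalent of a plain Spark write. A minimal PySpark sketch, assuming a reachable HDFS and using a placeholder output path:

```python
from pyspark.sql import SparkSession

# Minimal sketch; the HDFS path below is a placeholder.
spark = SparkSession.builder.appName("spark-to-parquet-sketch").getOrCreate()

df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

# Write the DataFrame straight to HDFS as Parquet, without the data
# passing through the client machine.
df.write.mode("overwrite").parquet("hdfs:///tmp/example_output")
```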

> 2. Or read a particular file from HDFS?

There are analogous XYZ to Spark nodes for reading HDFS files into Spark.
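
Conceptually that is just the reverse direction. A sketch, reusing the placeholder path from above:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("parquet-to-spark-sketch").getOrCreate()

# Read the Parquet files from HDFS back into a Spark DataFrame
# (the path is a placeholder).
df = spark.read.parquet("hdfs:///tmp/example_output")
df.show()
```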

> 3. Or read and move files from one HDFS location to another?

We don't currently have a dedicated node for moving files within HDFS.
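
As a workaround you could do the move outside of KNIME with the standard HDFS shell command `hdfs dfs -mv`. A sketch invoking it from Python; both paths are placeholders:

```python
import subprocess

# Move a file from one HDFS location to another via the HDFS shell.
# Source and destination paths below are placeholders.
subprocess.run(
    ["hdfs", "dfs", "-mv", "/data/incoming/part-0000.parquet", "/data/archive/"],
    check=True,
)
```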


Best,

Björn


Hi Björn,

What I want to do is read data from a Hive external table and store it in an HDFS path that will act as a partition of another Hive external table. How do I achieve that?

For example: read data from a Hive table, do some processing using database nodes, and write that data to some HDFS location. How do I write that data? I tried the "Upload" node, but it only uploads existing files into HDFS, whereas I want to write data into HDFS to refresh a partition of a Hive external table. Please advise.
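
In plain Spark terms, what I am trying to do is roughly the following sketch (all database, table, and path names are made up):

```python
from pyspark.sql import SparkSession

# Sketch of the intent only; database, table, and path names are made up.
spark = (
    SparkSession.builder.appName("refresh-hive-partition")
    .enableHiveSupport()
    .getOrCreate()
)

# 1. Read from the source Hive external table and do some processing.
df = spark.sql("SELECT * FROM source_db.events WHERE status = 'ok'")

# 2. Write the result into the HDFS directory that backs one partition
#    of the target external table.
df.write.mode("overwrite").parquet(
    "hdfs:///warehouse/target_db/events/event_date=2019-01-01"
)

# 3. Make Hive aware of the new/refreshed partition.
spark.sql("MSCK REPAIR TABLE target_db.events")
```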


Thanks,

Rahul
