Hadoop files

I have seen nodes that can upload data/files to a Hadoop cluster or download data/files from a Hadoop cluster, but:

1. Is there a node to write a file after data processing on Hadoop?

2. Or read a particular file from HDFS?

3. Or read and move files from one HDFS location to another?

Please advise.

Thanks,

Rahul G

Hi,

> 1. Is there a node to write a file after data processing on Hadoop?

For the case of Spark, yes. When you have Spark nodes in your workflow and want to save the resulting data from Spark directly into HDFS (without the data touching the client), you can use the Spark to XYZ nodes, where XYZ stands for one of the following file formats: Parquet, ORC, Avro, JSON, or CSV.
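
Conceptually, these nodes do the equivalent of a plain Spark write. A minimal PySpark sketch, assuming a reachable HDFS and using a placeholder output path:

```python
from pyspark.sql import SparkSession

# Minimal sketch; the HDFS path below is a placeholder.
spark = SparkSession.builder.appName("spark-to-parquet-sketch").getOrCreate()

df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

# Write the DataFrame straight to HDFS as Parquet, without the data
# passing through the client machine.
df.write.mode("overwrite").parquet("hdfs:///tmp/example_output")
```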

> 2. Or read a particular file from HDFS?

There are analogous XYZ to Spark nodes for reading HDFS files into Spark.
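
Conceptually that is just the reverse direction. A sketch, reusing the placeholder path from above:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("parquet-to-spark-sketch").getOrCreate()

# Read the Parquet files from HDFS back into a Spark DataFrame
# (the path is a placeholder).
df = spark.read.parquet("hdfs:///tmp/example_output")
df.show()
```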

> 3. Or read and move files from one HDFS location to another?

We don't currently have a dedicated node for moving files within HDFS.
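
As a workaround you could do the move outside of KNIME with the standard HDFS shell command `hdfs dfs -mv`. A sketch invoking it from Python; both paths are placeholders:

```python
import subprocess

# Move a file from one HDFS location to another via the HDFS shell.
# Source and destination paths below are placeholders.
subprocess.run(
    ["hdfs", "dfs", "-mv", "/data/incoming/part-0000.parquet", "/data/archive/"],
    check=True,
)
```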


Best,

Björn


Hi Björn,

What I want to do is read data from a Hive external table and store it in an HDFS path that will act as a partition of another Hive external table. How do I achieve that?

For example: read data from a Hive table, do some processing using database nodes, and write that data to some HDFS location. How do I write that data? I tried the "Upload" node, but it only uploads existing files into HDFS, whereas I want to write data into HDFS to refresh a partition of a Hive external table. Please advise.
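
In plain Spark terms, what I am trying to do is roughly the following sketch (all database, table, and path names are made up):

```python
from pyspark.sql import SparkSession

# Sketch of the intent only; database, table, and path names are made up.
spark = (
    SparkSession.builder.appName("refresh-hive-partition")
    .enableHiveSupport()
    .getOrCreate()
)

# 1. Read from the source Hive external table and do some processing.
df = spark.sql("SELECT * FROM source_db.events WHERE status = 'ok'")

# 2. Write the result into the HDFS directory that backs one partition
#    of the target external table.
df.write.mode("overwrite").parquet(
    "hdfs:///warehouse/target_db/events/event_date=2019-01-01"
)

# 3. Make Hive aware of the new/refreshed partition.
spark.sql("MSCK REPAIR TABLE target_db.events")
```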


Thanks,

Rahul
