Export data from a database to the corresponding Linux file system instead of a KNIME table

Hi - I have a requirement to export a query output from our big data system, using Impala or Hive, to the corresponding Linux server (not as a KNIME table) and then download it as a CSV. Is there a way to do that?

Hi,
Sure, this is possible. First connect to your big data system, for example with the Hive Connector, then use a DB Table Selector followed by a DB Reader to read the data into KNIME. Alternatively, you can filter the data before reading it with a DB Row Filter. Once the data is in KNIME, use a CSV Writer to write it to disk. If the table is too large to read completely before writing, you can use streaming: simply wrap the DB Reader and the CSV Writer into a component and configure it for streaming.
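
As a rough illustration (all database, table and column names here are made up), the DB Table Selector plus DB Row Filter in the end just send a simple query like this to Hive before the DB Reader fetches the result:

```sql
-- Hypothetical query built by DB Table Selector + DB Row Filter
SELECT *
FROM my_db.my_table                 -- made-up database and table
WHERE load_date = '2023-01-01';     -- made-up filter column
```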
Kind regards,
Alexander

3 Likes

Thank you. I realize a DB Reader reads the data into KNIME, but is there a way to directly download from the database to the file system on the server using any DB nodes? I don't think that is possible, right? Also, I had to use a DB SQL Executor and not the DB Table Selector, because I had to write a series of statements like truncate, then create and select. But I could not send this output to a DB Reader directly, so it looks like I have to use a DB Query Reader again?
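
For illustration, the kind of statement series meant here might look like this inside the DB SQL Executor (all table and column names are made up). Since the SQL Executor only passes a connection on, the final read would have to happen in a separate DB Query Reader:

```sql
-- Hypothetical statements placed in one DB SQL Executor
TRUNCATE TABLE stage_db.tmp_sales;           -- made-up temp table

INSERT INTO stage_db.tmp_sales
SELECT order_id, customer_id, amount         -- made-up columns
FROM raw_db.sales
WHERE load_date = '2023-01-01';

-- A DB Query Reader placed after the SQL Executor would then run e.g.:
-- SELECT * FROM stage_db.tmp_sales;
```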

I think you would have to investigate using Spark and HDFS.

https://www.knime.com/nodeguide/big-data/spark-executor

This workflow writes data back from Spark to HDFS. I have not tried it myself.

If you then want to reuse the (Parquet) data, you could use external tables.
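
A minimal sketch of such an external table, assuming Spark has written Parquet files to a made-up HDFS directory (table, columns and path are all illustrative):

```sql
-- Hypothetical Hive external table over Parquet files written by Spark
CREATE EXTERNAL TABLE IF NOT EXISTS my_db.sales_export (
  order_id    BIGINT,
  customer_id STRING,
  amount      DOUBLE
)
STORED AS PARQUET
LOCATION '/user/knime/exports/sales';   -- made-up HDFS path
```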

Thanks, but we do not have a Spark Jobserver implementation in our ecosystem, so I am not sure how to create a Spark context. If I use Create Local Big Data Environment, how do I get it running on my server and access the Hive tables in the database? I want to try Spark but had issues, so I am going with HiveQL alone for now.

My simple requirement: truncate various temp tables in Hive and insert into them one by one (there are about 5 tables). After this I join these tables, build a CSV report with a header, and email it out. So I tried a DB Connector -> 5 SQL Executors, or one SQL Executor with all 5 truncates and inserts.

I used the SQL Executor because I can write the truncate and insert queries within the same node, but its output port is only a connection. I was thinking of using a DB Query Reader after that, but it would select from the same temp table, so wouldn't that be redundant? Is there an easy way to achieve this? I'm sure there is; I'm just a newbie to this tool and still exploring. Thanks. I have done this via a Unix shell script and am trying to convert the process to KNIME to evaluate the feasibility and benefits.
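
To make this concrete, a hedged sketch of the per-table statements, repeated for each of the five temp tables (all table and column names are made up):

```sql
-- Hypothetical truncate-and-insert pair for one of the five temp tables
TRUNCATE TABLE stage_db.tmp_orders;

INSERT INTO stage_db.tmp_orders
SELECT order_id, customer_id, amount
FROM raw_db.orders
WHERE load_date = '2023-01-01';
```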

I fear I do not fully understand your scenario. If you want to use SQL functions you could think about using a ‘local’ SQL database like SQLite or H2.

I have several workflows on the KNIME Hub demonstrating big data functions, some of them addressing HDFS. There you could easily combine several CSV or Parquet files into one external table and then work with that, if you do not want to use a loop and if you already have the data on HDFS (see the sketch at the end of this post).

Maybe you could try to explain your scenario or even build a dummy workflow representing your problem. Often with KNIME it is easier to work on ‘real’ data.

And with the local Big Data environment you could simulate most of the big data functions that would then also work on, e.g., a Cloudera cluster.

You should keep exploring and asking questions. KNIME is suited to solve most data problems.
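
As a sketch of the external-table idea mentioned above (directory path, table and column names are made up): every CSV file that lands in the LOCATION directory is then read as part of one table.

```sql
-- Hypothetical Hive external table over a directory of CSV files on HDFS
CREATE EXTERNAL TABLE IF NOT EXISTS my_db.combined_csv (
  order_id    BIGINT,
  customer_id STRING,
  amount      DOUBLE
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/knime/landing/csv';   -- made-up HDFS directory
```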

Hi there @gm02,

I don't fully understand what you are trying to achieve, but I don't think it is that simple :smiley:

In order to move data from one place (the source database) to another (the destination) with KNIME, you have to read the data into KNIME. (Actually, it would be very cool if you did not need to, but usually you will do it anyway because you need/want to perform transformations on it.) If you perform all your operations (joining, truncating and inserting) in one DB SQL Executor and save the result to a temp table, you can use a DB Query Reader after that, for example as sketched below.
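
A minimal sketch of that pattern, assuming made-up table names: the joining and inserting happens in the DB SQL Executor, and the DB Query Reader afterwards only selects from the result table.

```sql
-- Hypothetical statements in the DB SQL Executor
TRUNCATE TABLE stage_db.tmp_report;

INSERT INTO stage_db.tmp_report
SELECT o.order_id, c.customer_name, o.amount
FROM stage_db.tmp_orders o
JOIN stage_db.tmp_customers c
  ON o.customer_id = c.customer_id;

-- Query configured in the DB Query Reader that follows
SELECT * FROM stage_db.tmp_report;
```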

Hope this helps!

Br,
Ivan

Makes sense. I read into KNIME, perform my transformations, and then write back to the DB; let me try it that way. Also, I have multiple joins to implement, which is why I chose the SQL Executor. I would like to try the Joiner node as well, maybe splitting the work into separate joins. Thanks for your input.

1 Like

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.