Export data from a database to the corresponding Linux file system instead of a KNIME table

Hi - I have a requirement to export a query output from our big data system, using Impala or Hive, to the corresponding Linux server (not as a KNIME table) and then download it as a CSV. Is there a way to do that?

Hi,
Sure, this is possible. First connect to your big data system, for example with the Hive Connector, then use a DB Table Selector followed by a DB Reader to read the data into KNIME. Alternatively, you can filter the data before reading it with a DB Row Filter. Once the data is in KNIME, use a CSV Writer to write it to disk. If the table is too large to read completely before writing, you can use streaming: simply wrap the DB Reader and the CSV Writer into a component and configure it for streaming.
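
As a rough illustration (all database, table and column names here are made up), the DB Table Selector plus DB Row Filter in the end just send a simple query like this to Hive before the DB Reader fetches the result:

```sql
-- Hypothetical query built by DB Table Selector + DB Row Filter
SELECT *
FROM my_db.my_table                 -- made-up database and table
WHERE load_date = '2023-01-01';     -- made-up filter column
```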
Kind regards,
Alexander

3 Likes

Thank you. I realize a DB Reader reads the data into KNIME, but is there a way to directly download from the database to the file system on the server using any DB nodes? I don't think that is possible, right? Also, I had to use a DB SQL Executor and not the DB Table Selector, because I had to write a series of statements like truncate, then create and select. But I could not send this output to a DB Reader directly, so it looks like I have to use a DB Query Reader again?
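
For illustration, the kind of statement series meant here might look like this inside the DB SQL Executor (all table and column names are made up). Since the SQL Executor only passes a connection on, the final read would have to happen in a separate DB Query Reader:

```sql
-- Hypothetical statements placed in one DB SQL Executor
TRUNCATE TABLE stage_db.tmp_sales;           -- made-up temp table

INSERT INTO stage_db.tmp_sales
SELECT order_id, customer_id, amount         -- made-up columns
FROM raw_db.sales
WHERE load_date = '2023-01-01';

-- A DB Query Reader placed after the SQL Executor would then run e.g.:
-- SELECT * FROM stage_db.tmp_sales;
```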

I think you would have to investigate using Spark and HDFS.

https://www.knime.com/nodeguide/big-data/spark-executor

This workflow writes data back from Spark to HDFS. I have not tried it myself.

If you then want to reuse the (Parquet) data, you could use external tables.
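
A minimal sketch of such an external table, assuming Spark has written Parquet files to a made-up HDFS directory (table, columns and path are all illustrative):

```sql
-- Hypothetical Hive external table over Parquet files written by Spark
CREATE EXTERNAL TABLE IF NOT EXISTS my_db.sales_export (
  order_id    BIGINT,
  customer_id STRING,
  amount      DOUBLE
)
STORED AS PARQUET
LOCATION '/user/knime/exports/sales';   -- made-up HDFS path
```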

Thanks, but we do not have a Spark Jobserver implementation in our ecosystem, so I am not sure how to create a Spark context. If I use Create Local Big Data Environment, how do I get it running on my server and access the Hive tables in the database? I want to try Spark but had issues, so I am going with HiveQL alone for now.

My simple requirement: truncate various temp tables in Hive and insert into them one by one (there are about 5 tables). After this I join these tables, build a CSV report with a header, and email it out. So I tried a DB Connector -> 5 SQL Executors, or one SQL Executor with all 5 truncates and inserts.

I used the SQL Executor because I can write the truncate and insert queries within the same node, but its output port is only a connection. I was thinking of using a DB Query Reader after that, but it would select from the same temp table, so wouldn't that be redundant? Is there an easy way to achieve this? I'm sure there is; I'm just a newbie to this tool and still exploring. Thanks. I have done this via a Unix shell script and am trying to convert the process to KNIME to evaluate the feasibility and benefits.
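
To make this concrete, a hedged sketch of the per-table statements, repeated for each of the five temp tables (all table and column names are made up):

```sql
-- Hypothetical truncate-and-insert pair for one of the five temp tables
TRUNCATE TABLE stage_db.tmp_orders;

INSERT INTO stage_db.tmp_orders
SELECT order_id, customer_id, amount
FROM raw_db.orders
WHERE load_date = '2023-01-01';
```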

I fear I do not fully understand your scenario. If you want to use SQL functions you could think about using a ‘local’ SQL database like SQLite or H2.

I have several workflows on the KNIME Hub demonstrating big data functions, some of them addressing HDFS. There you could easily combine several CSV or Parquet files into one external table and then work with that, if you do not want to use a loop and if you already have the data on HDFS (see the sketch at the end of this post).

Maybe you could try to explain your scenario or even build a dummy workflow representing your problem. Often with KNIME it is easier to work on ‘real’ data.

And with the local Big Data environment you could simulate most of the big data functions that would then also work on, e.g., a Cloudera cluster.

You should keep exploring and asking questions. KNIME is suited to solve most data problems.
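
As a sketch of the external-table idea mentioned above (directory path, table and column names are made up): every CSV file that lands in the LOCATION directory is then read as part of one table.

```sql
-- Hypothetical Hive external table over a directory of CSV files on HDFS
CREATE EXTERNAL TABLE IF NOT EXISTS my_db.combined_csv (
  order_id    BIGINT,
  customer_id STRING,
  amount      DOUBLE
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/knime/landing/csv';   -- made-up HDFS directory
```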

Hi there @gm02,

I don't fully understand what you are trying to achieve, but I don't think it is that simple :smiley:

In order to move data from one place (the source database) to another (the destination) with KNIME, you have to read the data into KNIME. (Actually, it would be very cool if you did not need to, but usually you will do it anyway because you need/want to perform transformations on it.) If you perform all your operations (joining, truncating and inserting) in one DB SQL Executor and save the result to a temp table, you can use a DB Query Reader after that, for example as sketched below.
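
A minimal sketch of that pattern, assuming made-up table names: the joining and inserting happens in the DB SQL Executor, and the DB Query Reader afterwards only selects from the result table.

```sql
-- Hypothetical statements in the DB SQL Executor
TRUNCATE TABLE stage_db.tmp_report;

INSERT INTO stage_db.tmp_report
SELECT o.order_id, c.customer_name, o.amount
FROM stage_db.tmp_orders o
JOIN stage_db.tmp_customers c
  ON o.customer_id = c.customer_id;

-- Query configured in the DB Query Reader that follows
SELECT * FROM stage_db.tmp_report;
```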

Hope this helps!

Br,
Ivan

Makes sense. I read into KNIME, perform my transformations, and then write back to the DB; let me try it that way. Also, I have multiple joins to implement, which is why I chose the SQL Executor. I would like to try the Joiner node as well, maybe splitting the work into separate joins. Thanks for your input.

1 Like

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.