Databricks (DBFS) file download / read into KNIME

Hi Team,

I connected to Databricks using the DBFS node and got the list of files available under the DBFS location.

I need to read the CSV files from DBFS into KNIME. When I tried to use the CSV Reader node, I got the error below.

WF

Any help with reading the CSV file from the DBFS location into KNIME would be appreciated.

Hello mathi,

Currently you cannot read a CSV file directly from DBFS. You first need to download it locally and then read the downloaded file into KNIME. Please have a look at this workflow for a related example.
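For readers who want to see what the download-then-read step amounts to outside KNIME, here is a minimal Python sketch. It is not how the KNIME nodes are implemented. It assumes the Databricks DBFS REST endpoint `GET /api/2.0/dbfs/read` (base64 data, at most 1 MB per call) and a personal access token; the function names `rest_reader`, `download` and `read_csv` are hypothetical, and treating a short read as end of file is an assumption.

```python
import base64
import csv
import json
import urllib.parse
import urllib.request

CHUNK = 1024 * 1024  # the DBFS read endpoint caps a single read at 1 MB


def rest_reader(host, token):
    """Return a read_chunk(path, offset, length) function backed by the DBFS REST API.

    host (e.g. the workspace URL) and token are supplied by the caller.
    """
    def read_chunk(path, offset, length):
        query = urllib.parse.urlencode(
            {"path": path, "offset": offset, "length": length}
        )
        request = urllib.request.Request(
            f"{host}/api/2.0/dbfs/read?{query}",
            headers={"Authorization": f"Bearer {token}"},
        )
        with urllib.request.urlopen(request) as response:
            # Expected shape: {"bytes_read": <int>, "data": "<base64 string>"}
            return json.load(response)
    return read_chunk


def download(read_chunk, dbfs_path, local_path, chunk=CHUNK):
    """Copy one DBFS file to local disk chunk by chunk; return the bytes written."""
    offset = 0
    with open(local_path, "wb") as out:
        while True:
            reply = read_chunk(dbfs_path, offset, chunk)
            n = reply["bytes_read"]
            if n:
                out.write(base64.b64decode(reply["data"]))
                offset += n
            if n < chunk:  # assumption: a short or empty read means end of file
                return offset


def read_csv(local_path):
    """Read the downloaded CSV into a list of dicts keyed by the header row."""
    with open(local_path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

Usage would look like `download(rest_reader(host, token), "/some/file.csv", "file.csv")` followed by `read_csv("file.csv")`, where the host, token and paths are placeholders.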

We are currently implementing a completely new file handling framework for KNIME, which will allow you to read/write directly from/to various file systems including DBFS. However, we do not have the corresponding Databricks connector as of today, but it is planned for the winter release. So please stay tuned.

Bye
Tobias

Thanks for the response @tobias.koetter.

  1. I need to make the file downloadable to the user using the File Download Widget.
    In that case, is there any way to make the file downloadable to the user from the DBFS location?

  2. When I try to pass the path of the downloaded CSV file, I get an error. Is there an efficient way to make the file downloadable to the user from the KNIME workspace location?

Hi @tobias.koetter,

Is the remote file handling feature available in the new version of KNIME?

Hi mathi,
Yes, the new file handling framework is available in KNIME Analytics Platform 4.3. However, if you want to provide the CSV file to the user via the File Download Widget, you still need to download it first using the Transfer Files node and the Databricks File System Connector.
I will create a feature request to support direct loading of a file from a remote file system as well.
Bye
Tobias

Dear @tobias.koetter ,
My situation is a little different from @mathi's: I have already downloaded all the desired DBF files to a specific folder on my local system, and I wish to read them in a loop.
I tried the following sequence of KNIME nodes:
Possible_Loop_for_reading_DBF_Files.knwf (19.3 KB)
image
It all went very well on the first two nodes, but I could go no further (most likely due to a mistake while configuring the “File Reader” node). I tried a bunch of individual settings on it, including checking the option “Support short data rows”, but it didn’t read the files. BTW, I set the “Path” from the output of the previous node (the “Table Row to Variable Loop Start” node) as a flow variable.
Would you mind helping me read these several (80) DBF files using KNIME nodes instead of coding in a programming language such as R or Python? I’m not an IT professional or student (i.e., a programmer).
Thanks in advance.
B.R.,
Rogério.

If you hover over the error node, you should get an explanation of what the issue is.
br

Dear @Daniel_Weikert,
I’ve already hovered the mouse over the node with the error message (the “File Reader” node), and got this:
image
I tried several different settings (the encodings, the different options to be checked/unchecked, etc.), but none of the changes produced the expected reading of the files (one per iteration of the loop).
Besides, I tried another workflow (suggested on the KNIME Hub), from:

and inserted the “Java Edit Variable” KNIME node between the nodes “Table Row to Variable Loop Start” and “File Reader”. But this did not improve the File Reader node’s ability to read the files.
Could you help me with this configuration? What should be the code for each setting on this latter node?
Otherwise, do you know another node that could read several (DBF) files in a loop like the ones above? Especially one node with no coding (or “low coding”)?
Thank you for any help.
ATB,
Rogério.
P.S.: The DBF files I downloaded are public, but are large enough to be converted into CSV format. Would it be useful if I posted here the URL for accessing the original files?

@rogerius1st I fear that reviving old threads will not give you a different answer (see below). If we are still talking about the same files (DBC/DBF), KNIME simply does not have a node to just import them.

If we are talking about DBF files, LibreOffice still supports the format, although that would mean importing the files individually.

Please note: there seem to be several different formats with similar extensions, but they are not the same. DBFS in this case is the Databricks File System.
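If opening 80 files one by one is not practical, one option outside KNIME is a short script that converts every DBF file in a folder to CSV, so that the CSV Reader can then loop over the results. The following is only a minimal sketch under stated assumptions: the files are plain dBASE III-style `.dbf` tables without memo fields, all values are kept as text, the `latin-1` encoding is a guess, and compressed DBC files are not handled. The function names `read_dbf` and `convert_folder` are made up for this example.

```python
import csv
import struct
from pathlib import Path


def read_dbf(path, encoding="latin-1"):
    """Return (field_names, rows) for a plain dBASE III-style .dbf file."""
    data = Path(path).read_bytes()
    # Header bytes 4-11: record count (uint32), header size (uint16),
    # record size (uint16), all little-endian.
    n_records, header_size, record_size = struct.unpack_from("<IHH", data, 4)

    fields = []  # list of (name, width)
    pos = 32     # field descriptors start right after the 32-byte header
    while data[pos] != 0x0D:  # 0x0D terminates the descriptor array
        name = data[pos:pos + 11].split(b"\x00")[0].decode("ascii")
        width = data[pos + 16]  # field length in bytes
        fields.append((name, width))
        pos += 32

    rows = []
    for i in range(n_records):
        start = header_size + i * record_size
        record = data[start:start + record_size]
        if record[:1] == b"*":  # first byte is the deletion flag
            continue
        values, col = [], 1
        for _, width in fields:
            values.append(record[col:col + width].decode(encoding).strip())
            col += width
        rows.append(values)
    return [name for name, _ in fields], rows


def convert_folder(src_dir, dst_dir):
    """Write one CSV per .dbf/.DBF file found in src_dir; return the CSV paths."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for entry in sorted(Path(src_dir).iterdir()):
        if not entry.is_file() or entry.suffix.lower() != ".dbf":
            continue
        names, rows = read_dbf(entry)
        target = dst / (entry.stem + ".csv")
        with open(target, "w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            writer.writerow(names)
            writer.writerows(rows)
        written.append(target)
    return written
```

Calling `convert_folder("dbf_files", "csv_files")` (both folder names are placeholders) would leave one CSV per DBF file in the second folder. A dedicated DBF library would be a more robust choice for files with memo fields, typed columns or unusual encodings.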

Doesn’t the File Reader support the option “support short data lines” or something equivalent?