Local File Handling

Hi there,

This might be a super simple solution, but I can’t seem to find it. The new file handling framework is quite powerful, however I am unsure how to use this with just basic local file connections. It seems to a be a lot of back and forth between string-to-path and path-to-string as the readers need a path for reading files, however any sort of manipulation of the path requires the conversion to a string to use a String Manipulation (Variable)/Column Expressions node. This is making the local file processing much more complicated than it was before, when the readers just needed a simple string input. Now there is the path variable type, with which doesn’t play ball with the other variable manipulators - or maybe I’m missing something basic here…

Any ideas are appreciated!

image

Hi @supersharp , the file readers (File Reader, CSV Reader, Excel Reader, etc) can take variable type Path to point to a file, and the List Files/Folders node generates that path.

Also, the new readers can themselves read multiple files within the same folder. It all depends on what you are trying to do after reading the files to know if you need to loop through each file one by one (in which case you can use a List Files/Folders + a Table Row to Variable Loop Start), or if you can just read all files at once directly through the reader.

2 Likes

Thanks @bruno29a. I’m still stuck on how to pass this path, is there a starting point to connect a local path to these readers, they way that it can be done with Microsoft for example?

image

Thanks!

@supersharp you might want to take a look at this example how to work with paths and local Excel files.

This is a collection of more general handlings of path variables.

And then there is the lengthy but very useful:

https://docs.knime.com/latest/analytics_platform_file_handling_guide/index.html

Then if you work with remote connections you might have to combine remote connections with this node to create a path variable:

3 Likes

Hi @supersharp , it is as I said before: You can either use a List Files/Folders + a Table Row to Variable Loop Start to go through each file individually, or use the Excel Reader alone to read all the files at once. It will depend on what you are trying to do after.

I’ve put something together to show both options. The workflow looks like this:
image

Details as follows.
Let’s look at the first method, opening each file individually:
Here, I’m pointing to the data folder of the Workflow, so I’m choosing the “Relative to” path (so I can include the 4 sample files in the export), but it works the same way with Local path. Knime will build the relevant path automatically.

And that’s what the List Files/Folders node returns:
image

After plugging the Table Row To Variable Loop Start, this is what it generates after the first iteration:
image

It created a variable called “Path” that holds the path of the first file. Because I chose the “Relative to”, it generates a path in the format of (RELATIVE, knime.workflow.data, ./test_multi_1.xlsx). If you use Local File System, it will generate a path in the format of (LOCAL, C:\your_path_to_your_file\your_file.xlsx)

In the Excel Reader, you just need to assign the variable “Path” as the File. Just click on the Variable button:

You’ll get a popup to choose the variable:
image

Once you set that up, you will the see options in the Input Location will be greyed out / disabled:

This is because these options are controlled by the variable “Path”. You should also see this message t the bottom which confirms it:

And you should see the Preview of the data of the first file:

You can then do whatever process you want to do - string manipulation, writing to a file, etc, and then close the loop with a Loop End node. Note that, depending on the operations that you do, for example writing to a file, you will not have any data output, in which case you would want to use a Variable Loop End instead, like this:
image

For the second method, opening all files at once:
This an be done directly via the Excel Reader node itself:

What’s important here is that you use the “Files in folder” option for the Mode, and it will look at all the files in the folder that you point to.

And in the Preview, you will see the data of all the files:

If you don’t care where the data comes from, then that’s enough. However, if you do want to know where the data is coming from, you can add the “Append path column” option from the Advanced Settings tab. This will append the path of the file from which the data is coming from:

Here is the demo workflow: File handling.knwf (40.9 KB)

5 Likes

Thanks for the detailed explanations @bruno29a @mlauber71.

Your example highlighted a potential difference in how the variables are being handled by different nodes.

  • In the Table Row to Variable node, the Path column remains a Path variable.
  • However, with Group Loop Start, the Path column gets converted to a String variable.

My application is a simple - read through all Excel sheets in all Excel workbooks within a folder, so need to use the Group Loop start as there are a variable number of sheets per Excel file (the single Excel reader does not allow to read multiple sheets in a single file, and the Table Row to Variable is cannot cycle through folder structures).

Unless there is something else I’m missing, I have 2 suggestions:

  1. All handling of paths to be consistent for different nodes (e.g. a Group Loop Start node should auto convert a Path data type into a Path variable type, like it does for Table Row to Variable node)
  2. The local file handling should perform the same way it is built for the other connectors e.g. AWS, Microsoft, etc. i.e. the ability to use the blue input port, rather than needing flow variables. There is a dramatic difference in build for a local OneDrive vs. a Sharepoint connection to Excel files - yet they are both Microsoft. The local file management is more complicated now because of path variables, and the need to convert variables back-and-forth between string and path variables. Having it work with the blue input ports will prevent these workarounds.

Would you agree?

Thanks!

1 Like

Hi @supersharp ,

From your design:
image

This would give you the path for ONE file only. How are you reading all the files in your design?

And why are you using Group Loop, especially if you are going to group by both Path and Sheet? We know that the Sheet is unique, and you are reading only 1 file, so Path + Sheet will be unique, might as well use the Table Row to Variable Loop Start instead.

The Path variable that you see in the Group Loop Start is a different variable than the original one. The original one is still being carried over. The one you are referring to is a different variable generated by the Group Loop Start - though I don’t know why it changes it to String type:
image

It could be it’s because you are grouping by the Path column, which converts it to string. But while we can see both in the Flow variable list, you are indeed correct that the one as the Path type is not being seen by the Excel Reader. I think that’s a bug.

In your case, you do not need to group by Path. If you remove Path, then it will not create a new variable Path as String, and the original one gets carried over as a Path variable:

But design-wise, there is nothing to group by here, and as you can see, the Group Loop Start outputs a data port, which you don’t need. The best loop to use here is still the Table Row To Variable Loop Start.

You actually need 2 of them - 1 for looping through the sheets, and one for looping through the files. Something like this:

For your Excel configuration, you can choose the Path variable for the file as before. And do these additional steps:
Make sure you choose “Select sheet with name” option. No need to choose any sheet from the drop down:

Then assign the variable for the sheet name in the Flow Variables tab:

If your file will have different structure, check the option “Support changing file schemas” in the Advance Settings tab:

Here’s the updated demo workflow: File handling.knwf (49.1 KB)

2 Likes

Hello there @supersharp!

if reading and understanding it correctly Local File System Connector node that is coming with version 4.5. maybe can help you out. It is already part of nightly build and you can try it out. Download nightly here.

Br,
Ivan

2 Likes

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.