Reading flat files into strings

I have a folder that contains a set of text files and need to read each file in that folder to create a table with one row per file and two columns: 1) filename and 2) all the text in the file as a string, preserving all whitespace and newline characters and not trying to interpret or parse the contents of the files in any way. Surprisingly, it seems that there’s no standard node to accomplish this simple task. The closest one I could find is the Flat File Document Parser, but it produces parsed documents, not strings. I have been able to emulate the desired behavior by combining List Files/Folder, Table Row to Variable Loop Start, Line Reader, Group By and String Manipulation nodes, but this seems like overkill for such a trivial task. Perhaps a simple file-to-string node exists somewhere and I just could not find it?

Hi @drassokh , welcome to the KNIME community forum.

You are right that there isn’t an immediately obvious solution to what you might think would be a relatively straightforward task (or if there is, I never found it either!).

However, I think there is a way of doing what you require, although you would be unlikely to stumble on it by accident because it is counter-intuitive.

Because you don’t want the files to be interpreted in any way, they need to be handled as “binary” files (even though they are text). For this there is a node “Files to Binary Objects”. However, for some reason, this node by itself doesn’t make things easy for you right away, because it requires the full path names of the files, but not as PATH or STRING but as URI (obviously! :sweat_smile: )

Luckily there is a Path to URI node which can help us with this, and after the files have been read as Binary, the created “binary objects” can be converted to Strings using “Binary Objects to Strings”.

In summary, you can chain the nodes like this: Path to URI, then Files to Binary Objects, then Binary Objects to Strings.

So still not a “one node” solution, but I hope that moves you forward.
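For anyone who prefers to see the idea in code, here is a minimal Python sketch of the same "read as binary, then convert to string" approach. It is not what the KNIME nodes do internally, just an illustration of the principle. The function name `read_files_as_strings`, the `*.txt` pattern and the UTF-8 encoding are all my own assumptions, so adjust them to your files.

```python
from pathlib import Path


def read_files_as_strings(folder, pattern="*.txt", encoding="utf-8"):
    """Return one (filename, text) row per matching file in `folder`.

    Hypothetical helper for illustration; pattern and encoding are assumptions.
    """
    rows = []
    for path in sorted(Path(folder).glob(pattern)):
        if path.is_file():
            # Read the raw bytes first so no newline translation happens,
            # then decode the whole file into a single string.
            text = path.read_bytes().decode(encoding)
            rows.append((path.name, text))
    return rows
```

Reading the bytes and decoding them in one go is what keeps `\r\n` line endings and leading or trailing whitespace exactly as they are on disk, which a line-by-line reader would not guarantee.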

Thanks!! It’s a bit strange that there’s no standard ‘Flat Text Files to Strings’ node (one that would accept both URLs and local file paths as sources) for such a basic task as reading a set of flat text files into strings, but your approach works like a charm :slight_smile:

Please mark @takbb’s reply as the solution.

I’m a little late to this thread, but I think what you need is just the Tika Parser, if I’m not mistaken.

Good point @scottF!

I’ve not used Tika Parser other than when helping somebody on the forum once, so I didn’t think of it. I’ve always associated it more with parsing PDFs and the like than with plain text files, but that’s down to my lack of experience with it. :wink:

Now on the subject of extensions, I did a quick hub search and here are a couple of other nodes that I’ve not personally used, but that look right for the job too. These are from Vernalis:
