Parquet Writer: Add support for Path type

mwiegand · March 27, 2025, 7:47am

Hi,

I noticed that the Parquet Writer does not support the Path type. Could you consider this when working on the general path support in KNIME nodes, please?

ERROR Parquet Writer       3:1958:0:1989 Execute failed: Output type mapping is missing for column:
Output Location (Path)

Best
Mike

tobias.koetter · March 27, 2025, 8:38am

Hi Mike,

do I understand you correctly that you want to write a path column to a Parquet file? If so, what is the use case? Would you also want to work with the Parquet file outside of KNIME?

The problem is how we would represent the proprietary KNIME path type in Parquet so that other tools can process it. Internally, the path consists of the following three parts:

Category e.g. local or connected
(Optional) specifier e.g. amazon-s3:eu-west-1
Path e.g. /home/user/file 1.csv
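
To illustrate the three parts listed above, here is a minimal Python sketch of how such a value could be flattened into a single string column and restored afterwards. The class name `PathValue`, the helpers `encode_path` and `decode_path`, and the JSON-array encoding are all assumptions made for illustration; this is not KNIME's internal format, nor what the Path to String node produces.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PathValue:
    """Hypothetical stand-in for the three-part path described above."""
    category: str             # e.g. "local" or "connected"
    specifier: Optional[str]  # optional, e.g. "amazon-s3:eu-west-1"
    path: str                 # e.g. "/home/user/file 1.csv"


def encode_path(value: PathValue) -> str:
    """Flatten the three parts into one string (assumed JSON-array encoding)."""
    return json.dumps([value.category, value.specifier, value.path])


def decode_path(text: str) -> PathValue:
    """Restore the three parts from a string produced by encode_path."""
    category, specifier, path = json.loads(text)
    return PathValue(category, specifier, path)
```

A string produced this way fits into an ordinary string column, which any Parquet reader can handle, but other tools would still need to know the encoding to interpret it. That is the interoperability concern in a nutshell.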

I would suggest using either the Path to String node or, for more flexibility, the Path to URI node (which allows you to select the URI format) followed by the URI to String node.

Bye
Tobias

mwiegand · March 27, 2025, 9:23am

Hi @tobias.koetter,

my primary intent is less about transferring data, as I too see little value in making app-specific data types available to other apps, and more about saving values temporarily in the most efficient way. By efficient I mean:

Less storage consumption using compression
No pre- and post processing required (Path > String > Write > Read > String > Path)

My current situation is that I process more than 1.5 million JSON files, which I am distilling down to a usable structured data set, and the processing overhead of handling paths adds quite some time to an already lengthy process. The JSON files contain a varying set of information, so extracting the data into a structured format / DB is not directly possible.

There is one more aspect, appending data to an existing file similar to CSV, but that is a different subject.

Best
Mike

mlauber71 · March 27, 2025, 9:27am

@mwiegand I would assume that Parquet will not support a KNIME-specific format like Path variable …

mwiegand · March 27, 2025, 9:58am

@mlauber71 you are right, I totally forgot this file format is not developed by KNIME. Sorry for the commotion …

hotzm · March 27, 2025, 10:33am

What’s wrong with the KNIME table format for temporary storage? The Table Writer does not support append, sure, but its storage format is the KNIME table format, which is compressed as well.

mwiegand · March 27, 2025, 11:19am

Missing compression, and some odd situations I reported, presumably related to JSON, that resulted in file corruption or the inability to read the file in that workflow (I know, odd and unspecific, but real).

I suppose it was that topic (just for reference):

Though, you mentioned the table file type supports compression. I checked and it didn’t seem to be the case. Maybe because I created a collection (set) of a path column type?

Best
Mike