Parquet Writer: Add support for Path type

Hi,

I noticed that the Parquet Writer does not support the Path type. Could you please consider this when working on general path support in KNIME nodes?

ERROR Parquet Writer       3:1958:0:1989 Execute failed: Output type mapping is missing for column:
Output Location (Path)

Best
Mike

Hi Mike,

do I understand you correctly that you want to write a path column to a Parquet file? If that is the case, what is the use case for this? Would you also want to work with the Parquet file outside of KNIME?

The problem with this is how we would represent the proprietary KNIME path type in Parquet so that it can be processed with other tools. Internally the path consists of the following three parts:

  1. Category e.g. local or connected
  2. (Optional) specifier e.g. amazon-s3:eu-west-1
  3. Path e.g. /home/user/file 1.csv
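To illustrate the three parts above, here is a minimal sketch of how they could be flattened into plain string fields that any Parquet writer can store, and then restored. The names `PathParts`, `to_columns` and `from_columns` are hypothetical and only serve the illustration; this is not how KNIME represents or serializes paths internally.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PathParts:
    """Hypothetical stand-in for the three parts listed above (not a KNIME class)."""
    category: str             # e.g. "local" or "connected"
    specifier: Optional[str]  # e.g. "amazon-s3:eu-west-1"; None when absent
    path: str                 # e.g. "/home/user/file 1.csv"


def to_columns(p: PathParts) -> dict:
    """Flatten the parts into plain fields that other tools can read."""
    return {"category": p.category, "specifier": p.specifier, "path": p.path}


def from_columns(row: dict) -> PathParts:
    """Rebuild the parts from the flattened fields; a missing specifier becomes None."""
    return PathParts(row["category"], row.get("specifier"), row["path"])


# Round trip using the example values from the list above.
example = PathParts("connected", "amazon-s3:eu-west-1", "/home/user/file 1.csv")
row = to_columns(example)
restored = from_columns(row)
```

Keeping the parts in separate fields avoids inventing a delimiter that could collide with characters in the path itself, such as the space in `file 1.csv`.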

I would suggest using either the Path to String node or, for more flexibility, the Path to URI node (which allows you to select the URI format) followed by the URI to String node.

Bye
Tobias

Hi @tobias.koetter,

my primary intent is less about transferring data, as I too see little value in making app-specific data types available to other apps, and more about saving values temporarily in the most efficient way. By efficient I mean:

  1. Less storage consumption using compression
  2. No pre- and post-processing required (Path > String > Write > Read > String > Path)

My current situation is that I process >1.500.000 JSON files, which I am distilling down to a usable structured data set, and the processing overhead of handling paths adds quite some time to an already lengthy process. The JSON files contain a varying set of information, so extracting the data directly into a structured format / DB is not possible.

There is one more aspect, appending data to an existing file similar to CSV, but that is a different subject.

Best
Mike

@mwiegand I would assume that Parquet will not support a KNIME-specific format like the Path type …

@mlauber71 you are right, I totally forgot that this file format is not developed by KNIME. Sorry for the commotion …

What’s wrong with the KNIME table format for temporary storage? The Table Writer does not support append, sure, but its storage format is the KNIME table format, which is compressed as well.

Missing compression, and some odd situations I reported, presumably related to JSON, that resulted in file corruption or in the file being unreadable in that workflow (I know, odd and unspecific, but real).

I suppose it was that topic (just for reference):

Though, you mentioned that the table file format supports compression. I checked, and that didn’t seem to be the case. Maybe because I created a collection (set) from a path column?

Best
Mike