I noticed that the Parquet Writer does not support the path type. Could you consider that when working on the general path support in KNIME nodes, please?
ERROR Parquet Writer 3:1958:0:1989 Execute failed: Output type mapping is missing for column:
Output Location (Path)
Do I understand you correctly that you want to write a path column to a Parquet file? If that is the case, what is the use case for this? Would you also want to work with the Parquet file outside of KNIME?
The problem with this is how we would represent the proprietary KNIME path type in Parquet so that it can be processed with other tools. Internally, a path consists of the following three parts (a sketch follows the list):
- Category, e.g. local or connected
- (Optional) specifier, e.g. amazon-s3:eu-west-1
- Path, e.g. /home/user/file 1.csv
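To make that concrete, here is a minimal pyarrow sketch of one possible mapping of these three parts onto a Parquet struct column. This is purely an assumption for illustration, not an official KNIME mapping; the column name and sample values are made up.

```python
import pyarrow as pa
import pyarrow.parquet as pq

# Hypothetical mapping of the KNIME path type onto a Parquet struct
# with the three parts described above.
knime_path = pa.struct([
    ("category", pa.string()),    # e.g. "local" or "connected"
    ("specifier", pa.string()),   # optional, e.g. "amazon-s3:eu-west-1"
    ("path", pa.string()),        # e.g. "/home/user/file 1.csv"
])

table = pa.table({
    "Output Location": pa.array(
        [
            {"category": "connected",
             "specifier": "amazon-s3:eu-west-1",
             "path": "/bucket/file 1.csv"},
            {"category": "local",
             "specifier": None,  # the specifier part is optional
             "path": "/home/user/file 1.csv"},
        ],
        type=knime_path,
    )
})
pq.write_table(table, "paths.parquet")
```

Any other tool reading such a file would see a plain struct column, so nothing in the file itself would mark it as a KNIME path, which is exactly the open question about interoperability.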
I would suggest either using the Path to String node or, for more flexibility, the Path to URI node (which allows you to select the URI format) followed by the URI to String node.
My primary intent is less about transferring data, as I too see little value in app-specific data types being made available to other apps, and more about saving values temporarily in the most efficient way. By efficient I mean (a sketch follows the list):
- Less storage consumption using compression
- No pre- and post-processing required (Path > String > Write > Read > String > Path)
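On the first point, compression is a per-file writer option in Parquet itself. A small pyarrow sketch of what a compressed write looks like; the file name, column name, and codec choice are arbitrary assumptions here:

```python
import pyarrow as pa
import pyarrow.parquet as pq

# Paths stored as plain strings today still need the
# Path > String > Write > Read > String > Path round trip;
# the compression part, at least, is a single writer option.
table = pa.table({"Output Location": ["/home/user/file 1.csv"] * 1_000_000})
pq.write_table(table, "paths.parquet", compression="zstd")  # or "snappy", "gzip"
```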
My current situation is that I am processing more than 1,500,000 JSON files, which I am distilling down to a usable structured data set, and the processing overhead of handling paths adds quite some time to an already lengthy process. The JSON files contain a varying set of information, so extracting the data into a structured format / database is not directly possible.
There is one more aspect: appending data to an existing file, similar to CSV, but that is a different subject (a rough sketch of what Parquet itself offers here follows below).
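For reference, a closed Parquet file cannot be appended to, but a writer that is kept open can add row groups incrementally, which covers much of the CSV-append use case. A hedged pyarrow sketch; the schema, file name, and chunks are made up:

```python
import pyarrow as pa
import pyarrow.parquet as pq

schema = pa.schema([("Output Location", pa.string())])

# A ParquetWriter kept open can write one row group per chunk;
# once the file is closed, no further appending is possible.
with pq.ParquetWriter("combined.parquet", schema) as writer:
    for chunk in (["/a/file1.json"], ["/a/file2.json", "/a/file3.json"]):
        writer.write_table(pa.table({"Output Location": chunk}, schema=schema))
```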
What’s wrong with the KNIME table format for temporary storage? The Table Writer does not support append, sure, but it writes the KNIME table format, which is compressed as well.
Missing compression, and some odd situations I reported, presumably related to JSON, that resulted in file corruption or in the inability to read the file back in that workflow (I know, odd and unspecific, but real).
I suppose it was that topic (just for reference):
Though, you mentioned the table format supports compression. I checked, and that didn’t seem to be the case. Maybe because I created a collection (set) of a path column type?