I have a KNIME flow with a loop that reads 100 JSON files (each has between 7’000’000 and 12’000’000 rows) from a folder. At the end of the flow, a series of Rule-based Row Splitter nodes split the data into a number of streams based on an ID value in one of the fields.
My problem is that the CSV Writer at the end of each stream generates a new file for each of the 100 original files, so instead of ending up with a single file per stream, I end up with 100 files for each stream.
The columns/fields for each stream are identical, so there is no need to worry about header mismatches, short rows or anything like that in the output from a single stream.
Uploading to an online service was considered, but I don’t think any online service would appreciate me dumping 350 GB into their systems on a weekly basis.
I could write the whole lot to a DB with a table for each stream and then fetch each table back out to write to a CSV, but that is a lot of rework, overhead and time.
Is there a way for me to append/merge/concatenate the end result of each stream at the final CSV Writer node into a single CSV file, so that I end up with one file for each ID?
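For clarity, this is the kind of manual post-processing step I’m hoping to avoid: a shell sketch that stitches the 100 per-file outputs of one stream back into a single CSV, keeping only the first header line. The stream_&lt;ID&gt;_part*.csv naming is made up for illustration, not my actual file names.

```shell
#!/bin/sh
# Merge the per-file parts of one stream into a single CSV.
# Assumes every part has an identical header row (true in my case).
concat_stream() {
  id="$1"
  out="stream_${id}.csv"
  first=1
  for part in "stream_${id}_part"*.csv; do
    [ -f "$part" ] || continue
    if [ "$first" -eq 1 ]; then
      cat "$part" > "$out"          # keep the header from the first part
      first=0
    else
      tail -n +2 "$part" >> "$out"  # drop the header of every later part
    fi
  done
}
```

This works, but doing it outside KNIME for every stream each week is exactly the extra step I’d like the workflow itself to handle.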