Resizing / concatenating files to a specific number of rows

Hi Everyone,

Thanks in advance for the help. I have a list of 10K files which have varying numbers of rows. I want to write new files which each have a specific number of rows, say 100. I’m pretty sure I need to use a recursive loop here. Unfortunately, the size of the files doesn’t allow for combining all rows before I use the recursive loop.

Is there a way I can loop through my files, adding the contents until the number of rows gets to 100, and write that file? Then continue the loop for the next 100 rows, write that file, and so on?

I think the concept is that I need to create some sort of index with the cumulative number of rows added.

I also need to be able to account for a situation where I have 90 rows and the incoming file is larger than 10 rows. For example, if it’s 50 rows, I don’t want to end up with 140 rows. Ideally I could figure out a way to add 10 rows to that and then put the rest into a new file. This may be too much to ask though…

Thanks,
Jason
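
The cumulative-row index described above can be sketched outside KNIME in a few lines of Python. This is only an illustration under assumptions: the row count of each input file is already known, and the function name `plan_output_files` and its parameters are hypothetical, not part of any KNIME node. For each input file it lists which output file(s) its rows land in, so a 50-row file arriving after 90 rows is split into 10 rows for the current output and 40 for the next.

```python
from itertools import accumulate


def plan_output_files(row_counts, rows_per_file=100):
    """For each input file, return a list of (output_index, n_rows) pieces.

    row_counts: number of rows in each input file, in processing order.
    rows_per_file: target number of rows per output file (assumed 100 here).
    """
    # Cumulative number of rows that precede each input file.
    starts = [0] + list(accumulate(row_counts))
    plan = []
    for start, count in zip(starts, row_counts):
        pieces = []
        pos, remaining = start, count
        while remaining > 0:
            out_index = pos // rows_per_file          # which output file we are in
            room = rows_per_file - pos % rows_per_file  # rows still free in it
            take = min(room, remaining)
            pieces.append((out_index, take))
            pos += take
            remaining -= take
        plan.append(pieces)
    return plan


# 90 rows, then 50 rows, then 60 rows, with 100 rows per output file.
for pieces in plan_output_files([90, 50, 60]):
    print(pieces)
```

With these counts the second file contributes 10 rows to output 0 and 40 rows to output 1, which is the split asked about above.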

Certain Reader nodes (e.g. CSV Reader, Excel Reader) let you limit the number of rows you want to read. Besides that, there is a Row Sampling node. Does that help?
bR

Hi @j_ochoada

I created this workflow resizing.knar (49.8 KB). It takes 6 txt files as input and starts with a Chunk Loop. The chunk size (number of files to read in every run) depends on the size of your files and what KNIME can handle, so expect a little trial and error. After the files are grouped together, the flow creates batches (in this example, batch size = 5). For every batch an output file is created. At the end of the workflow, the records that did not fit into a batch are collected. These may be the input for another run. Notice that the CSV Writer generates an empty file at the last run of every Group Loop.

I’m aware it is not the final solution to your question, but maybe it gives you some inspiration and helps you move forward.

gr. Hans
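
For readers who want to see the batching-with-leftovers idea as code, here is a minimal Python sketch. It is not the attached workflow and not a KNIME API. It assumes plain text files with one row per line and no header row, and the names `rechunk_files`, `rows_per_file` and `part_00000.txt` are made up for illustration. Only one partial batch is held in memory at a time, so the inputs never have to be combined first.

```python
from pathlib import Path


def rechunk_files(input_paths, output_dir, rows_per_file=100):
    """Stream rows from input_paths into files of exactly rows_per_file rows.

    Leftover rows that do not fill a batch go into one final, shorter file.
    Returns the list of output paths that were written.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []   # paths of output files written so far
    buffer = []    # rows collected for the current output file

    def flush():
        # Hypothetical naming scheme: part_00000.txt, part_00001.txt, ...
        out_path = output_dir / f"part_{len(written):05d}.txt"
        out_path.write_text("".join(buffer), encoding="utf-8")
        written.append(out_path)
        buffer.clear()

    for path in input_paths:
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                # Guard against a last line without a trailing newline.
                if not line.endswith("\n"):
                    line += "\n"
                buffer.append(line)
                if len(buffer) == rows_per_file:
                    flush()  # batch is full: write it and start a new one

    if buffer:
        flush()  # remaining rows that did not fill a complete batch
    return written
```

Because a batch is written as soon as it is full and an empty buffer is never flushed, this version does not produce the empty file mentioned above. A call such as `rechunk_files(sorted(Path("input").glob("*.txt")), "output")` would process a folder of files in name order, where `input` and `output` are placeholder folder names.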

Hi,

Thank you both, @Daniel_Weikert and @HansS, for the suggestions. Looking into your ideas, I think I can put together something that will work, especially if I can swap incoming file size for row count.

Thanks to you both for the help!
Jason
