How to archive rows in a csv file?

Hello, has anyone already set up a workflow to automatically archive rows contained in a csv file?
Thanks

Hi @etoughet, can you give more detail about what you envisage happening when you say “automatically archive”, as I suspect you have a very specific use case in mind.

Hello Takbb, I’m working on a project to migrate from Tableau Prep to KNIME Server.
I have CSV files in a workflow that contain millions of rows, covering several years’ worth of data. I would like to know how I can, for example, keep only the data from the last 3 years and archive the rest.

Does each file contain data that spans many years (so that each record has some kind of datestamp), or do individual files represent defined date ranges?

If each file contains date-stamped records, you could have a workflow that reads in each file, then uses a Row Splitter to divide the data into “recent” and “to be archived”. Each output would go to its own CSV Writer: one writes to a “current” folder, the other to an “archive” folder. This would be performed in a loop over the files.
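For anyone who wants to see the same split logic outside KNIME, here is a minimal Python sketch of the idea above. It is not the KNIME workflow itself, and it makes assumptions that may not match your files: the date column is called `date`, dates are in `YYYY-MM-DD` format, and the output folders are different from the input folder. The function names are hypothetical.

```python
import csv
from datetime import date, datetime
from pathlib import Path


def split_csv_by_date(src, current_dir, archive_dir, cutoff: date, date_column="date"):
    """Copy rows dated on/after `cutoff` to current_dir and older rows to archive_dir.

    Assumes `date_column` holds YYYY-MM-DD strings and that the output
    folders are not the folder containing `src`.
    """
    src = Path(src)
    current_dir, archive_dir = Path(current_dir), Path(archive_dir)
    current_dir.mkdir(parents=True, exist_ok=True)
    archive_dir.mkdir(parents=True, exist_ok=True)

    counts = {"current": 0, "archive": 0}
    with src.open(newline="") as fin, \
         (current_dir / src.name).open("w", newline="") as fcur, \
         (archive_dir / src.name).open("w", newline="") as farc:
        reader = csv.DictReader(fin)
        writers = {
            "current": csv.DictWriter(fcur, fieldnames=reader.fieldnames),
            "archive": csv.DictWriter(farc, fieldnames=reader.fieldnames),
        }
        for writer in writers.values():
            writer.writeheader()
        # Stream row by row so files with millions of rows never sit in memory.
        for row in reader:
            row_date = datetime.strptime(row[date_column], "%Y-%m-%d").date()
            target = "current" if row_date >= cutoff else "archive"
            writers[target].writerow(row)
            counts[target] += 1
    return counts


def archive_folder(input_dir, current_dir, archive_dir, cutoff: date):
    """Loop over every CSV file in input_dir and split each one."""
    return {
        path.name: split_csv_by_date(path, current_dir, archive_dir, cutoff)
        for path in sorted(Path(input_dir).glob("*.csv"))
    }
```

Each input file produces two output files with the same name, one per folder, which mirrors the two CSV Writer branches described above.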

btw, welcome to the KNIME community @etoughet !

I have files containing date-stamped records, so I’d like to try your solution.
Do you have an example workflow please?

Hi @etoughet , give this demo workflow a try. I don’t know if you’re on Windows. If not, you’ll need to modify the output folder locations defined in the Variable Creator node. Currently it points at c:\temp

Archive old records from CSV files.knwf (164.0 KB)

In this demo, there are two sample CSV files in the workflow data folder, created using the nodes in the yellow box at the top. That yellow box contains a couple of components from the KNIME Hub. One, “Extract Data Table from delimited text”, is used for ease of pasting in demo data, which I had ChatGPT generate for me. The other, “Open File or Folder”, provides a convenient way of viewing the workflow data folder.

The main workflow reads that data folder and processes each file found. It splits the records in each file based on a cutoff date, defined as a string in the Variable Creator node.

I make use of a home-grown component, “Path to Extracted File Name Parts”, which wraps up the mechanism for getting the file name out of a path. The workflow uses that file name to create the new output files in the “current” and “archive” folders. To do it without the component, you would use the Path to URI and URL to File Path nodes, but I just find this more convenient :slight_smile:
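As a rough illustration of what that file-name step does (this is a Python sketch, not the component’s actual implementation), the snippet below pulls the name out of a path and reuses it under “current” and “archive” subfolders. The `c:\temp\current` and `c:\temp\archive` layout and the function names are assumptions for the example.

```python
from pathlib import PureWindowsPath


def file_name_parts(path_str):
    """Break a Windows-style path into the pieces needed to name output files."""
    p = PureWindowsPath(path_str)
    return {"name": p.name, "stem": p.stem, "extension": p.suffix}


def output_paths(source_path, base_dir=r"c:\temp"):
    """Build 'current' and 'archive' targets that reuse the source file name.

    The base_dir default and the subfolder layout are assumptions.
    """
    name = PureWindowsPath(source_path).name
    base = PureWindowsPath(base_dir)
    return str(base / "current" / name), str(base / "archive" / name)
```

`PureWindowsPath` only manipulates path strings, so this runs on any operating system without touching the disk.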

The records written to each folder depend on the cutoff date, which is used within the Rule-based Row Splitter node.
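The demo takes the cutoff as a fixed string. If you want it to track “the last 3 years” from the original question, the cutoff could be derived from today’s date instead. Here is a small Python sketch of that arithmetic; the function name is hypothetical, and mapping 29 February to 28 February is just one possible convention.

```python
from datetime import date


def cutoff_years_ago(today: date, years: int = 3) -> date:
    """Return the same calendar day `years` years before `today`."""
    try:
        return today.replace(year=today.year - years)
    except ValueError:
        # 29 Feb has no counterpart in a non-leap year; fall back to 28 Feb.
        return today.replace(year=today.year - years, day=28)


# Records dated on or after this value would stay "current".
cutoff = cutoff_years_ago(date.today())
```

Formatting the result with `cutoff.isoformat()` gives a `YYYY-MM-DD` string that could be pasted into a string variable like the one in the Variable Creator node.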

I hope that helps

Thanks so much Takbb

I’m on Windows, I’ll try it now!
Thanks
