Handling Large CSV Files (approx. 1GB and 4.5M rows)

I am new to KNIME and have been trying different options.

I have been able to read a 0.75GB file with 4.26M rows using the CSV Reader node and have displayed it in tabular format with the Interactive Table node. Is it then possible to run SQL queries on the Interactive Table and produce reports? If so, could you please point me to the documentation that shows how this may be done?

I have tried the SQLite Connector with a small sample data set. Although one is led to believe that the iris.csv format is CSV, it's actually tab-delimited, which presumably explains the SQLite error I received when I tried to read data from the sample file. Is a true comma-delimited format available for this node? Running a replacement script on the target file is unworkable at this size. In any event, I think I read somewhere that SQLite only handles some 65K rows, so perhaps this is a non-starter.

Please let me know whether KNIME has the functionality to perform this sort of task on a flat CSV file. The platform being used is only a desktop Linux PC running MySQL, so importing files of that size into the database would take many hours and is not really practical, especially since each file represents a monthly dataset and I ultimately need to produce reports on at least a 12-month basis.

Many thanks for your help.

Hello wishbone42,

Unfortunately I cannot help you with SQLite, but regarding the CSV files:

You can read a tab-separated file using the CSV Reader. Then simply use a CSV Writer to write it back out comma-separated, and you are done.


Hello Ferry,

Thank you. I am able to read the CSV file (which is comma-separated) and display it in the Interactive Table node. I can also see how one might use the CSV Writer to output to a file. However, my question was: how do I run an SQL query on the data in the Interactive Table node? Or is this essentially read-only? What I wish to do with the CSV file is run a query without having to load it into a regular database. (This would appear to be possible with relatively small files in SQLite, since it uses a flat-file format for its data structure.)
Is there another method in the workbench that I could use to provide this functionality?
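For reference, outside KNIME this kind of "query a CSV without a regular database server" task can be sketched with Python's standard library, using an in-memory SQLite database. The file contents, table name, and column names below are made up for illustration:

```python
import csv
import io
import sqlite3

# Hypothetical monthly CSV data; in practice this would come from a file,
# e.g. with open("monthly.csv", newline="") as f: reader = csv.reader(f)
sample = io.StringIO(
    "account,amount\n"
    "A001,100.50\n"
    "A002,250.00\n"
    "A001,75.25\n"
)
reader = csv.reader(sample)
header = next(reader)  # skip the header row

# Build an in-memory SQLite database -- no database server required.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly (account TEXT, amount REAL)")
conn.executemany("INSERT INTO monthly VALUES (?, ?)", reader)

# Run an ordinary SQL query against the CSV contents.
rows = conn.execute(
    "SELECT account, SUM(amount) FROM monthly GROUP BY account ORDER BY account"
).fetchall()
print(rows)  # [('A001', 175.75), ('A02'... -> [('A001', 175.75), ('A002', 250.0)]
conn.close()
```

For a one-off query this avoids any import into MySQL; for repeated monthly reports, writing the database to a file instead of `:memory:` would let the load cost be paid once.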


Hello wishbone,

As far as I know there is no way to do an SQL query on a KNIME table.

But you should be able to do this with the SQLite Connector. There is no practical limit on the number of rows (theoretically 2^64 = 18,446,744,073,709,551,616, but I doubt you have that many rows, and if you did, KNIME would collapse anyway ;-)) and it is pretty fast. I just tested it with one million rows and it worked like a charm.

Creating a database with 5 million records and reading them all in again takes about 2 minutes on my machine, so I don't think you'll find a better solution that allows you to do what you want to do.

In case you haven't found it already, this post gives a little clue on how to start: Apply sql query to a table output by a node


PS: You might want to convert your SQL queries into a series of KNIME nodes. Tell me if you want a little guidance on how and why you could do that.

Hello Ferry,

Apologies for the delay in replying. For some strange reason I had the notion that the flat-file structure used by SQLite was going to be better than MySQL. However, after major difficulties getting the file into SQLite (running out of swap space, etc.), I loaded the 750MB file into MySQL from the command line using LOAD DATA INFILE in a few minutes. Not that that's particularly relevant to the KNIME Workbench, but it might be of some help to others in their implementations.

Thanks for the reference above. I’ll take a look and let you know if I need any more assistance.