I am having challenges opening a 5 GB CSV file

I have a file that I will be using for my academic work (dissertation). I have tried the CSV Reader node and the File Reader node to read this file, to no avail.
Any help would be appreciated

Thank you

Hi @odartey, and welcome to the KNIME community. When reading a CSV file I prefer to use the File Reader node. Are you having trouble loading the file with it?

Please see the image

image

Best Regards

Hi @odartey and welcome to the KNIME forum!

I agree with @mauuuuu5, the File Reader node is much more efficient than the CSV Reader node.

Having said this, 5 GB is quite a big CSV. Could you please provide some minimal information about what you are trying to read and under what conditions? For instance, to begin with:

  1. Operating System (Windows / Linux / Mac OS):
  2. Current Maximum Amount of RAM Memory in your computer:
  3. Disk Space left in your working disk (KNIME uses it to do Memory Swapping) for big files:
  4. Number of rows in your CSV file (even approximate):
  5. Number of Columns (even approx.):
  6. If possible, a copy of the few first lines of your CSV file to understand what it looks like:

All this may help people here to guess whether the task is feasible :slight_smile: (or not :cry:) and maybe suggest plausible solutions.

Let's hope it is possible :wink: :+1:!

Best

Ael

You could try the KNIME local big data environment with Hive and import the file as an external table. You might have to provide the structure by hand or derive it by importing the first x lines as a sample.
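
As a rough illustration of the "first x lines as a sample" idea, the sketch below reads only the first rows of the file and guesses a type for each column named in the header. The function names `guess_schema` and `_guess_type`, the 1000-row default and the mapping to the type names BIGINT, DOUBLE and STRING are assumptions made for this example, not something KNIME or Hive provides. It also assumes a comma-separated file with a header row, and the guesses would need checking by hand before they go into a table definition.

```python
import csv
from itertools import islice


def _guess_type(values):
    """Return 'BIGINT', 'DOUBLE' or 'STRING' for a list of sample strings."""
    non_empty = [v for v in values if v.strip() != ""]
    if not non_empty:
        return "STRING"
    # Try the narrowest type first; fall back to STRING if nothing fits.
    for type_name, cast in (("BIGINT", int), ("DOUBLE", float)):
        try:
            for v in non_empty:
                cast(v)
            return type_name
        except ValueError:
            continue
    return "STRING"


def guess_schema(path, sample_rows=1000, encoding="utf-8"):
    """Guess (column name, type) pairs from the first sample_rows data rows."""
    with open(path, newline="", encoding=encoding) as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if header is None:  # empty file
            return []
        sample = list(islice(reader, sample_rows))
    columns = []
    for i, name in enumerate(header):
        values = [row[i] for row in sample if i < len(row)]
        columns.append((name, _guess_type(values)))
    return columns
```

Because only the header and the sample rows are read, this stays fast and light on memory even for a 5 GB file. A column that looks numeric in the sample may still contain text further down, so treat the result as a starting point.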

No guarantee but it might be worth a try.

Also you could see if you find an idea in this collection

Hello, can it run on the server?

1. Windows
2. 8 GB
3. 300 GB
4. Exceeds the number of rows available in MS Excel
5. Can't tell for that
6. That also I can't tell.

I hope the above will help

Thanks

No, I have not tried that

Yes, I have a problem using it to read the file. My laptop hangs after a while and then freezes until I restart it.

Hi @odartey

Thanks for this information. Using PowerShell, you can extract the first few lines of your CSV file with the following command:

PS C:\Your_File_Directory> Get-Content your_file_name.csv -TotalCount 10

This should return only the first 10 rows of your CSV file. (A plain type your_file_name.csv would print the entire file instead.)

Could you please execute this command and copy and paste the results in your next answer?

To find out how many lines your CSV file has, type the following command in a Windows Command Prompt (cmd.exe rather than PowerShell, where the empty "" argument is not passed on to find correctly):

type your_file_name.csv | find /c /v ""

The type command prints all the lines and the find command counts them and just returns the total number of rows in your CSV file.

Could you please execute this command too and report the number of rows here?

Together, these results will tell us the number of rows, the columns and the header of your CSV file.
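
If Python is available, the first-lines preview can also be sketched there, which avoids printing the whole file to the console. The helper name `head` and its default of 10 lines are just choices for this example, and UTF-8 is an assumed encoding (undecodable bytes are replaced rather than raising an error).

```python
from itertools import islice


def head(path, n=10, encoding="utf-8"):
    """Return the first n lines of a text file without reading the rest."""
    with open(path, "r", encoding=encoding, errors="replace", newline="") as f:
        # islice stops after n lines, so the remaining gigabytes are never read.
        return [line.rstrip("\r\n") for line in islice(f, n)]
```

Calling `print("\n".join(head(r"C:\Your_File_Directory\your_file_name.csv")))` would then show the header and the first few data rows.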

Hope this helps.

Best

Ael

Hi @odartey, besides @aworker's approach you can use another program called UltraEdit, which can display and edit large CSV files, so you can check whether there is a page break or anything else unusual. For instance, in my case it helped me remove some garbage at the end of the file that was causing trouble with the File Reader node.

Let us know, how it goes.

Mau

Use a Python Source node and read the data in chunks.
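
A minimal sketch of what reading in chunks could look like, using only the Python standard library. The generator name `read_in_chunks` and the chunk size of 100,000 rows are assumptions for illustration, and the file is assumed to be comma-separated with a header row. How the result is handed back to KNIME depends on the Python node version and is not shown here.

```python
import csv
from itertools import islice


def read_in_chunks(path, chunk_size=100_000, encoding="utf-8"):
    """Yield (header, rows) pairs, each holding at most chunk_size rows."""
    with open(path, newline="", encoding=encoding) as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if header is None:  # empty file: nothing to yield
            return
        while True:
            # Only chunk_size rows are held in memory at any one time.
            rows = list(islice(reader, chunk_size))
            if not rows:
                break
            yield header, rows
```

Each chunk can be filtered or summarised and then discarded before the next one is read, which is what keeps memory use bounded on an 8 GB machine. If pandas is installed, `pandas.read_csv` with its `chunksize` parameter follows the same pattern.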

Hi @odartey, let's say you eventually are able to open the file, what's next? What kind of operations do you need to do to the data?

It's probably possible to read the data in chunks, but whether this is feasible depends on what you need to do with the data. For example, if you need to deduplicate the data, then you'll need to be able to read the whole data.
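
To illustrate the distinction with a sketch: an aggregation such as counting how often each value occurs in one column only needs a small running summary, so it can be done while streaming through the file. Deduplication is harder because every distinct row seen so far has to be remembered somewhere. The function name `count_values` is hypothetical, and a comma-separated file with a header row is assumed.

```python
import csv
from collections import Counter


def count_values(path, column, encoding="utf-8"):
    """Count occurrences of each value in one column, one row at a time."""
    counts = Counter()
    with open(path, newline="", encoding=encoding) as f:
        # DictReader streams the file; only the current row and the
        # running counts are kept in memory.
        for row in csv.DictReader(f):
            counts[row[column]] += 1
    return counts
```

Memory use here grows with the number of distinct values in the column, not with the number of rows in the file.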

You can read the data in chunks via Python like @Daniel_Weikert suggested. You can even read line by line via the Line Reader node (available on the KNIME Hub).

I would not try to open this CSV via Excel. You can use UltraEdit like @mauuuuu5 suggested. Notepad++ or TextPad would also do.

In the plain Windows command line, you can also use the type command (from my days using DOS, that's pre-1995) to view the content of the file, but you probably do not want to just execute type (there will be plenty of lines scrolling).

You can use it to count the number of lines:
type <full_path_of_your_csv_file.csv> | find /c /v ""

This should tell you the number of lines the file has - note, this can take a while to run, but it will run.
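
For comparison, a rough Python equivalent of this line count. It reads the file in fixed-size binary blocks, so memory use stays small regardless of file size. The function name `count_lines` and the 1 MB block size are arbitrary choices for this sketch. Note that it counts physical lines, so a quoted field containing a line break would make the result larger than the number of CSV records.

```python
def count_lines(path, block_size=1024 * 1024):
    """Count lines by reading the file in fixed-size binary blocks."""
    count = 0
    last = b""
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            count += block.count(b"\n")
            last = block[-1:]  # remember the final byte seen so far
    # A final line without a trailing newline still counts as a line.
    if last and last != b"\n":
        count += 1
    return count
```

The result includes the header line, so the number of data rows is one less.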

You can also do:
type <full_path_of_your_csv_file.csv> | more

This will show you the content of the file one screen at a time, pausing at each page break, where you can press any key to show the next page. You can press CTRL + C to stop the command at any time, so you can use this to preview the data.

So, what operations do you need to do on the data?
