I have this file that I will be using for my academic work (dissertation). I have tried using the CSV Reader node and the File Reader node to read this file, to no avail.
Any help would be appreciated
Thank you
Hi @odartey, and welcome to the KNIME community. When reading a CSV file I prefer to use the File Reader node, but are you having trouble loading it?
Please see the image
Best Regards
Hi @odartey and welcome to the KNIME forum!
I agree with @mauuuuu5, the File Reader node is much more efficient than the CSV Reader node.
Having said this, 5 GB is quite a big CSV. Could you please provide a minimum of information about what you are trying to read and under what conditions? For instance, to begin with:
All this may help people here judge whether the task is feasible (or not) and maybe suggest plausible solutions.
Let's hope it is possible!
Best
Ael
You could try using the KNIME local big data environment and Hive, and import the file as an external table. You might have to provide the structure by hand or by importing the first x lines as a sample.
No guarantee but it might be worth a try.
Also you could see if you find an idea in this collection
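To get the "first x lines as a sample" mentioned above without opening the whole file, something like the following could work. It is only a minimal sketch in plain Python: the function name write_sample, the file names and the UTF-8 encoding are assumptions for illustration, not anything taken from the actual file. Note that it cuts on physical lines, so a quoted field containing a line break could be split at the end of the sample.

```python
from itertools import islice


def write_sample(src_path, dst_path, n_lines=1000, encoding="utf-8"):
    """Copy the first n_lines lines of src_path into dst_path.

    Returns the number of lines actually written. The file is streamed
    line by line, so memory use stays small whatever the file size.
    """
    written = 0
    # errors="replace" keeps going if the assumed encoding is wrong for some bytes.
    # newline="" preserves the original line endings on both sides.
    with open(src_path, "r", encoding=encoding, errors="replace", newline="") as src, \
            open(dst_path, "w", encoding=encoding, newline="") as dst:
        for line in islice(src, n_lines):
            dst.write(line)
            written += 1
    return written


# Hypothetical usage:
# write_sample("your_file_name.csv", "sample.csv", n_lines=1000)
```

The small sample file can then be opened in any node or editor to work out the column structure.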
Hello, can it run on the server?
1. Windows
2. 8 GB
3. 300 GB
4. Exceeds the number of rows available in MS Excel
5. Can't tell for that
6. That I also can't tell.
I hope the above will help
Thanks
No, I have not tried that
Yes, I have a problem using it to read the file. My laptop hangs after a while and then freezes until I restart it.
Hi @odartey
Thanks for this information. Using Windows PowerShell, you can extract the first few lines of your CSV file with the following command:
PS C:\Your_File_Directory> Get-Content your_file_name.csv -TotalCount 10
This returns the first 10 rows of your CSV file (a plain type command would print the whole file).
Could you please execute this command and copy and paste the results in your next answer?
To know how many lines your CSV file has, you would need to type the following command in a Windows Command Prompt (cmd.exe):
type your_file_name.csv | find /c /v ""
The type command prints all the lines and the find command counts them and just returns the total number of rows in your CSV file.
Could you please execute this command too and report the number of rows here?
Together, these results will tell us the number of rows, the columns and the header of your CSV file.
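If the shell commands are awkward to use, the same information (header, number of columns, number of rows) can be gathered with a short Python script that streams the file one record at a time. This is just a sketch under assumptions: the name describe_csv, the comma delimiter and the UTF-8 encoding are guesses on my side and may need adjusting for your file.

```python
import csv


def describe_csv(path, delimiter=",", encoding="utf-8"):
    """Return (header, n_data_rows) without loading the file into memory."""
    # newline="" is what the csv module expects for correct line handling.
    with open(path, "r", encoding=encoding, errors="replace", newline="") as f:
        reader = csv.reader(f, delimiter=delimiter)
        header = next(reader, [])          # first record, or [] for an empty file
        n_rows = sum(1 for _ in reader)    # remaining records, counted one by one
    return header, n_rows


# Hypothetical usage:
# header, n_rows = describe_csv("your_file_name.csv")
# print(len(header), "columns:", header)
# print(n_rows, "data rows")
```

On a file of several gigabytes this will take a while, but it should not exhaust the memory of the machine.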
Hope this helps.
Best
Ael
Hi @odartey, besides @aworker's approach you can use another piece of software called UltraEdit, which can display and edit large CSV files, so you can check whether there is a page break or anything else unexpected. For instance, in my case, it helped me remove some garbage at the end of the file which was causing me trouble with the File Reader node.
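If installing an editor is not an option, the end of the file can also be inspected with a few lines of Python. This is a rough sketch, assuming nothing about the file except that it exists; the helper name tail_bytes and the 2000-byte default are made up for illustration. It jumps straight to the end, so the rest of the file is never read.

```python
import os


def tail_bytes(path, n_bytes=2000):
    """Return the last n_bytes bytes of the file (the whole file if it is smaller)."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        # Seek directly to the start of the tail instead of reading from the top.
        f.seek(max(0, size - n_bytes))
        return f.read()


# Hypothetical usage: repr() makes NUL bytes, stray control characters
# and unusual line endings visible.
# print(repr(tail_bytes("your_file_name.csv")))
```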
Let us know, how it goes.
Mau
Use a python source node and read the data in chunks.
Hi @odartey, let's say you eventually are able to open the file, what's next? What kind of operations do you need to do to the data?
It's probably possible to read the data in chunks, but whether this is feasible depends on what you need to do with the data. For example, if you need to deduplicate the data, then you'll need to be able to read the whole data.
You can read the data in chunks via Python like @Daniel_Weikert suggested. You can even read line by line via the Line Reader node (KNIME Hub).
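As an illustration of the chunked approach, here is a minimal sketch using only the Python standard library. The function name iter_chunks, the chunk size, the comma delimiter and the UTF-8 encoding are all assumptions and not something known about this particular file. Each chunk is processed and then dropped, so only chunk_size rows are held in memory at once.

```python
import csv
from itertools import islice


def iter_chunks(path, chunk_size=100_000, delimiter=",", encoding="utf-8"):
    """Yield (header, rows) pairs, where rows holds at most chunk_size records."""
    with open(path, "r", encoding=encoding, errors="replace", newline="") as f:
        reader = csv.reader(f, delimiter=delimiter)
        header = next(reader, None)
        if header is None:      # empty file: nothing to yield
            return
        while True:
            rows = list(islice(reader, chunk_size))
            if not rows:
                break
            yield header, rows


# Hypothetical usage: count rows chunk by chunk.
# total = sum(len(rows) for _, rows in iter_chunks("your_file_name.csv"))
```

If pandas is available in the Python environment, pandas.read_csv with its chunksize argument gives a similar iterator of DataFrames. Either way, this only suits operations that can be done piece by piece, such as filtering or running totals.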
I would not try to open this CSV via Excel. You can use UltraEdit like @mauuuuu5 suggested. Notepad++ or Textpad would also do.
In the plain Windows command line, you can also use the type command (my days using DOS, that's pre-1995) to view the content of the file, but you probably do not want to just execute type (there will be plenty of lines scrolling).
You can use it to count the number of lines:
type <full_path_of_your_csv_file.csv> | find /c /v ""
This should tell you the number of lines the file has - note, this can take a while to run, but it will run.
You can also do:
type <full_path_of_your_csv_file.csv> | more
This will show you the content of the file, pausing at each page break where you can press any key to show the next page, and you can press CTRL + C to stop the command at any time, meaning you can use this to preview the data.
So, what operations do you need to do on the data?