I work with huge row data, around 2 million records. Can KNIME handle this much data? Will it be smooth?
Hey and welcome to the forum!
I think the general answer to that is yes. How well it performs will depend on factors like:
- your hardware
- KNIME config (e.g. allocating enough RAM in knime.ini, ideally more than the default 2 GB; see the snippet after this list)
- what you are doing with the data and what optimizations are possible (e.g. parallel processing)
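For reference, the RAM setting lives in knime.ini (in the KNIME installation folder) as a Java -Xmx option below the -vmargs line. Assuming a machine with, say, 16 GB of RAM, a value like this leaves headroom for the OS and other applications:

```
-Xmx8g
```

Just edit the existing default -Xmx2048m line and restart KNIME.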
If you can share some more context, I'm sure more specific guidance can be given!
@Yas_Yas KNIME can handle large files. One question is how large the whole dataset is in compressed form (like ZIP or Parquet), and how many columns it has.
I have collected some hints about very large files, though you might just be fine with standard procedures if you have enough RAM.
When it comes to handling analytical workloads against large datasets locally in KNIME, DuckDB (via the DB nodes) is by far the best and easiest option. You can register the JDBC driver found in the Maven repository and then use the DB nodes to process the data. If you convert your dataset to Parquet before loading it (e.g. via the "CSV/Excel Reader to Parquet Writer node" method), it'll be even faster. All of the KNIME-native optimizations mentioned above are good; nevertheless, DuckDB is a state-of-the-art OLAP engine, and KNIME-native tweaks will only get you so far (i.e. a small fraction of the performance benefit you can get via DuckDB). Happy to elaborate on any of the above if you're interested in this particular solution.
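If it helps, here is a minimal sketch of that pattern using DuckDB's Python API rather than the KNIME DB nodes (the file and column names are placeholders I made up):

```python
import duckdb  # pip install duckdb

con = duckdb.connect()  # in-memory DuckDB database

# One-time conversion: scan the CSV and write a Parquet copy
# (columnar + compressed, so later queries scan far less data)
con.execute("""
    COPY (SELECT * FROM read_csv_auto('data.csv'))
    TO 'data.parquet' (FORMAT PARQUET)
""")

# Run an analytical query directly against the Parquet file;
# DuckDB streams it rather than loading all rows into RAM
df = con.execute("""
    SELECT some_column, COUNT(*) AS n
    FROM 'data.parquet'
    GROUP BY some_column
    ORDER BY n DESC
""").fetchdf()  # result comes back as a pandas DataFrame

print(df.head())
```

Inside KNIME the same idea applies: register the DuckDB JDBC driver once under Preferences, point a DB Connector node at a local database file, and push the heavy aggregation into the DB nodes so only the (small) result comes back as a KNIME table.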
Hi @Yas_Yas and welcome to the forum,
I think KNIME has no problem handling this amount of data. It just takes its time.
I have recently started to work intensively with daily snapshot data from databases, and these are datasets with 3 to 5 million rows and around 130 columns (so roughly half a billion data points).
Again, KNIME can handle this and still works like a charm. It just takes longer, because it needs time to process the data.
I did some data transformation (like date to string) and then uploaded this to an AWS S3 bucket as an 18 GB CSV file. Our technical partner who needed this data said it was the biggest he had seen and was not sure whether his "tool" could handle it.
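For what it's worth, a rough sketch of what that kind of step looks like in Python (in my case it was done with KNIME nodes; the paths, bucket, and date column name here are hypothetical):

```python
import pandas as pd
import boto3  # AWS SDK for Python

IN_CSV, OUT_CSV = "snapshot_raw.csv", "snapshot_export.csv"    # hypothetical paths
BUCKET, KEY = "my-data-bucket", "exports/snapshot_export.csv"  # hypothetical S3 target

# Stream the file in chunks so an 18 GB CSV never has to fit into RAM at once
with open(OUT_CSV, "w", newline="") as out:
    for i, chunk in enumerate(pd.read_csv(IN_CSV, chunksize=500_000,
                                          parse_dates=["snapshot_date"])):
        # the date-to-string transformation
        chunk["snapshot_date"] = chunk["snapshot_date"].dt.strftime("%Y-%m-%d")
        chunk.to_csv(out, index=False, header=(i == 0))  # header only once

# Upload the result to S3 (boto3 handles multipart upload for large files)
boto3.client("s3").upload_file(OUT_CSV, BUCKET, KEY)
```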
I think one of the advantages of KNIME is the visual representation of progress: you see a progress bar below each node as it executes, and if you hover over the node you also see how many rows out of the total it has already processed.
Try this with Excel!
If you want, I can look up my hardware specs, but in general it is a somewhat decent three-year-old laptop (so nothing super fancy).
Hope this helps.