KNIME handling of many columns

Hi, when I have a table purely of text and numbers with 20,000 columns, KNIME is very slow and struggles to handle it when I view the table, either with the Interactive Table Viewer or just by right-clicking the node and viewing the output.
This happens whether I have 1000 rows or just 1 row; it is just as slow. Certain nodes also take a long time to launch.
Any tips or solutions to this?
I am on KNIME 4.6.1

Hello @richards99,

Could you share a screenshot of the problematic part of your workflow (without any sensitive data)? If we can have a look, we may be able to suggest a workaround or a less memory-hungry logic for your workflow.

In the meantime you can investigate the “usual things” when memory issues happen:

  1. Try increasing the heap space of your KNIME installation (set the Xmx value higher in knime.ini). KNIME is more sensitive to the number of columns than to the number of rows, which is probably why you see the same issue regardless of the row count as long as the column count stays the same. The KNIME Workbench Guide explains how to set the Xmx value and other settings; it is worth a look.
  2. Try to write data to disk rather than keeping it in memory.
  3. If possible, try to transpose your data (limitation: this is only useful if your columns are all of the same type and you have fewer rows than columns).
  4. This quite old but still valid blog post, and this and this thread, contain many other tips and tricks for handling memory issues.
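
To illustrate what tip 3 does to the shape of a table, here is a minimal Python sketch that works outside KNIME. The sample data and the function name `transpose_csv` are made up for illustration, and the sketch assumes every row has the same number of cells.

```python
import csv
import io


def transpose_csv(text: str) -> str:
    """Swap rows and columns of a small, rectangular CSV held in memory."""
    rows = list(csv.reader(io.StringIO(text)))
    # zip(*rows) groups the i-th cell of every row, i.e. it yields the columns.
    transposed = zip(*rows)
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(transposed)
    return out.getvalue()


# Hypothetical wide table: one row, several gene columns.
wide = "id,GENE_A,GENE_B,GENE_C\ncell_1,0.1,-0.5,0.3\n"
print(transpose_csv(wide))
```

After transposing, the gene names become row identifiers, so a table with one row and thousands of columns turns into a table with thousands of rows and two columns.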

Please let us know if it helped you!
Regards,
Dora

Hi Dora,
Many thanks for the reply. I do not believe the issue relates to memory. I already have the heap space set to 12 GB, and the heap monitor at the bottom of the screen gets nowhere near 12 GB. I am using a Mac on the latest macOS, if that helps.
The actual workflow runs with no problem at all; it is only when displaying the output tables or configuring certain nodes that there is a significant slowdown and KNIME freezes.
I am not really convinced showing you the workflow will help, there is nothing special in there. It is simply loading in a CSV file from DepMap which has all the human genes as Column Names, so there are a lot of them, then doing a couple of data transformations, that is it.

Simon.

Hi @richards99

Could you please specify the name of the DepMap file you are having trouble with? There are several available on the DepMap web page (DepMap Data Downloads), and I would be glad to download the one you are using and have a go at it.

Thanks in advance @richards99

See you soon,
Ael

Hi Ael,
Thanks for offering to take a look. There are a few which are problematic. The one I have been analysing most, and which is causing me problems, is "CRISPR_gene_effect.csv".

Simon.

Note you will need to change the advanced settings of the CSV reader to increase the maximum number of columns, otherwise it will not read the file in.
Simon.
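
To see how many columns a file has before adjusting the reader's column limit, a short Python sketch like the following can count the header fields. The function name `count_columns` and the sample data are illustrative only; for the real file you would pass an open file handle instead of the in-memory sample.

```python
import csv
import io


def count_columns(handle) -> int:
    """Return the number of fields in the first (header) line of a CSV."""
    reader = csv.reader(handle)
    header = next(reader)
    return len(header)


# Made-up sample standing in for a very wide file.
sample = io.StringIO("DepMap_ID,GENE_A (1),GENE_B (2)\nACH-000001,0.1,-0.2\n")
print(count_columns(sample))

# With a file on disk (the path is an assumption):
# with open("CRISPR_gene_effect.csv", newline="") as f:
#     print(count_columns(f))
```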

My pleasure, Simon!
Thanks for your quick reply too. I’ll have a quick look at it and let you know soon.

See you very soon,
Ael

@richards99 these two things come to mind:

More things about performance here.

One step could also be to investigate the special settings mentioned here (edit: or maybe not, see "FILE READER freezes - Part (2) - #9 by marc-bux").

Hi Simon,

Complementary to @mlauber71's information and hints, I just tried on my side with a Windows 10 PC (128 GB of RAM) and had no problem loading the data and viewing or scrolling it with the -Interactive Table (local)- and the -RDKit Interactive Table- nodes.

I also tried the JavaScript-based -Table View- node. It took much longer to get ready, but eventually the data was displayed and I could scroll it too. The reason for the lag in the latter is most probably the need to convert data between Java and JavaScript tables. Scrolling across columns is a bit slower too, but reasonable.

Hope it helps.

Best wishes,
Ael

Thanks for all the suggestions @mlauber71, such as the Cache node, the Columnar Table Backend, and changing the compression type. Sadly, none of these worked; the table remains super slow to move around.
I already have the heap space shown at the bottom of KNIME, and it remains at 3 GB, with the maximum set to 12 GB in the knime.ini file.
I am not sure what the problem is, as @aworker has kindly tried it out and has no problems with it. The main differences are that aworker is using a Windows PC whilst I am using a MacBook Pro (Intel), and that aworker's PC has 128 GB of RAM versus 16 GB in the MacBook.
Seems a bit of a mystery where the problem is.

Simon.

@richards99 ,

I have tried the “CRISPR_gene_effect.csv” file with both File Reader and CSV Reader nodes to load the file, plus with Columnar and Standard table back-ends without any problems opening and viewing the file contents.

You may want to check the settings in your knime.ini file and make sure that you have the -XX:+UseG1GC option. This was introduced when KNIME switched to the G1GC garbage collector, which I thought was standard now, but may not be. Before it was introduced I had a lot of problems viewing large data sets. If you look at the post that mlauber71 referenced on heap space exhaustion, there are some changes to garbage collection and information logging that Marc Bux suggested. The memory doesn’t need to be full for garbage collection to become an issue; sometimes, if objects are big enough, it struggles to use memory efficiently.
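
As a quick way to check these two settings without reading the file by eye, here is a minimal Python sketch. It assumes knime.ini lists one JVM option per line; the function name `inspect_knime_ini` and the sample contents are illustrative, not taken from a real installation.

```python
import re


def inspect_knime_ini(text: str) -> dict:
    """Report the -Xmx value and whether -XX:+UseG1GC is present."""
    lines = [line.strip() for line in text.splitlines()]
    max_heap = None
    for line in lines:
        # -Xmx is followed by a number and an optional unit (k, m or g).
        match = re.fullmatch(r"-Xmx(\d+)([kKmMgG]?)", line)
        if match:
            max_heap = match.group(1) + match.group(2).lower()
    return {"max_heap": max_heap, "uses_g1gc": "-XX:+UseG1GC" in lines}


# Illustrative file contents only.
sample_ini = """-vmargs
-Xmx12g
-XX:+UseG1GC
"""
print(inspect_knime_ini(sample_ini))
```

For a real installation, read the text of your own knime.ini and pass it to the function.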

My only other thought: can you confirm that the data is loaded as decimal columns and not strings? We are all assuming that KNIME is reading the file the same way. It’s a long shot, but it is not impossible for two different platforms to interpret CSV files differently.
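
Independently of KNIME, one rough way to check whether the values look numeric is to scan the first rows of the file in Python. This is only a sketch: the function name `non_numeric_columns`, the assumption that the first column holds identifiers, and the sample data are all illustrative, and KNIME's own type guessing may follow different rules.

```python
import csv
import io


def non_numeric_columns(handle, id_columns=1, max_rows=100):
    """List columns containing a value that float() cannot parse.

    The first `id_columns` columns are skipped, and empty cells are
    treated as missing values rather than as text.
    """
    reader = csv.reader(handle)
    header = next(reader)
    bad = set()
    for row_number, row in enumerate(reader):
        if row_number >= max_rows:
            break
        for name, cell in zip(header[id_columns:], row[id_columns:]):
            if cell == "":
                continue
            try:
                float(cell)
            except ValueError:
                bad.add(name)
    return sorted(bad)


# Made-up sample: GENE_A contains a non-numeric token.
sample = io.StringIO("DepMap_ID,GENE_A,GENE_B\nACH-1,0.1,-0.2\nACH-2,n/a,0.3\n")
print(non_numeric_columns(sample))
```

An empty result means every sampled value parsed as a number; any listed column is a candidate for being read as a string.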

DiaAzul.

Really appreciate everyone’s help.
Thanks @DiaAzul , I can confirm the UseG1GC option is in the knime.ini file.
I have rebooted the Mac, loaded KNIME right away, and tried out the CSV file; when the table is viewed, the problem is there straight away. The Garbage Collector shows around 1 GB in use.
I have tried disconnecting all external monitors, so I am just on the laptop screen, and the problem persists. I have also resized the view window to make it really small, and still the problem persists.
I wonder if this is specific to Mac.
Simon.

Hi Simon,

I was wondering this too. Unfortunately I cannot test it on my side. I’m uploading here the simple workflow I set up so that other people can try it on different configurations if they wish. It has part of the data already inside the workflow, just to allow people to check:

Maybe worthwhile also to mention that I tested it on two different KNIME versions (4.5.2 & 4.6.1) on Windows 10 and it works fine on both too.

What KNIME version are you using @richards99 ?

Hope it helps.

Have a nice day!
Ael

Using the latest version, KNIME 4.6.1

:frowning:

@richards99

I tried to load your file on another computer, with the same setup as the first, and ran into the same slow performance that you are reporting. After a lot of digging I finally found the difference in configuration between the two: the “Wrap Column Header in Table Views” setting. Once I deselected it, the display performance of the tables improved dramatically.

Can you look and see if this is the cause of your problem? It would be useful feedback for others if it is.

DiaAzul

That is it, because I always have that turned on !!!
As soon as I turned this off, the table scrolls around smoothly!!
Cannot believe such a minor feature has been giving me such a headache.

Thanks so much for all the digging around on this; it has been driving me crazy. I hope there is a fix the KNIME team can make to allow both quick table movement and multiline column headers.

Simon.

This is some excellent detective work, @DiaAzul. Thanks to you and everybody else on the thread who has put time into troubleshooting this issue.

I will raise this with the developers and see if there’s something we can do to improve performance here.

EDIT: Ticket opened, AP-19421.
