Performance when handling large data

Hi all,

Right now I’m working on the analysis of very large images with KNIME and I have some performance issues. After some image processing and labeling I do segmentation, and I easily get more than 300,000 cells (rows).

Plotting in particular (2D/3D Scatterplot, Scatter Plot, etc.) gets very slow, and hiliting cells in the plot becomes difficult because it is very laggy (yes, I have to plot them all :) ).

My workstation: 

Win7 64bit

12 GB RAM

Intel Xeon X5650 @ 2.66 GHz

 

Is my workstation just not fast enough?

 

Kind Regards,

Flo

You might want to check how much memory is assigned to the JVM in the knime.ini file in the KNIME installation folder. We also recommend having a look at the Eclipse memory monitor, which needs to be enabled under File > Preferences > General > Show Heap Status. You could also try putting a Cache node directly in front of the viewer.
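If it helps, here is a minimal sketch of how the heap setting could be read out of knime.ini. The JVM's maximum heap is set by the -Xmx line after -vmargs. The sample file content and the helper name max_heap_mb are only made up for illustration; your actual knime.ini will contain more lines and possibly a different value.

```python
import re

# Hypothetical, shortened knime.ini content (assumption for illustration only).
SAMPLE_INI = """\
-vmargs
-Xmx2048m
"""

# Conversion factors from the JVM size suffixes to megabytes.
_TO_MB = {"k": 1 / 1024, "m": 1, "g": 1024}


def max_heap_mb(ini_text):
    """Return the -Xmx value in megabytes, or None if there is no -Xmx line.

    If several -Xmx lines are present, the last one is reported.
    """
    result = None
    for line in ini_text.splitlines():
        match = re.fullmatch(r"-Xmx(\d+)([kKmMgG]?)", line.strip())
        if not match:
            continue
        value, unit = int(match.group(1)), match.group(2).lower()
        if unit == "":
            # No suffix means the value is given in bytes.
            result = value / (1024 * 1024)
        else:
            result = value * _TO_MB[unit]
    return result


print(max_heap_mb(SAMPLE_INI))
```

On a 12 GB machine a noticeably larger value than the default (for example -Xmx8g) might be worth trying, as long as enough memory is left for the operating system.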

Hi Florian,

We are aware of this topic, but until now we have not had many requests concerning such big data. In the last weeks/months, however, we have received more and more requests concerning very large image files. Can you explain your data and what you want to do with it in a little more detail?

  • What is the dimensionality of such data?
  • What do you want to do? Segmentation? Classification? Tracking?
  • Which data format do you use (HDF5)?

We want to support this kind of data in the future, so it would be great if you could share some of your experiences with us.

Regarding the hiliting: could you create a small workflow using the Data Generator plus some hiliting nodes in which the hiliting gets slow? Maybe we can then reproduce the issue and fix it.

 

Thank you!

Christian

 

Hi,

I went through the Heap Size settings (at least from what’s in the FAQ) and the Heap Status shows that there is plenty of room left ;)

The Cache node helps a little, thanks :-)

I have to admit that the “official” data views from KNIME work better than the ones from the community nodes. For example, the 2D/3D Scatterplot from Erl Wood gets laggy very quickly.

One other thing that is a little baffling is what happens when you load big files with the Image Reader. If you watch the percentage in the Image Reader status, it jumps very quickly to 99% and then stays there for a long time. For example, I loaded an 800 MB file; it jumped to 99% within seconds and stayed there for about 2 hours until the reader was done.

The reason I mention this is that in (bio)microscopy we have very high resolution images, and we have to handle not only a lot of images but also very large image files, e.g. over 1 GB per file. How to handle big data is a general problem in the microscopy world nowadays :)

 

Kind regards,

Flo 

Hi,

First of all, we use the CZI file format (ZEISS microscopes).

I have made a tile region with about 7600 tiles. This would be an example of one big image (resulting from many tiles).

https://www.dropbox.com/s/2ytbil61kby5nst/image%20dimension.tiff

https://www.dropbox.com/s/wcxsq0jri9xszpi/Experiment-09.zip

 

The second example would be a lot of small images (resulting from many positions).

Here I have about 1700 files containing about 35,000 images.

https://www.dropbox.com/s/c4ep6t7ypbogz4v/Test_Data.zip

 

What do I want to do? As you said, e.g. segmentation, classification and tracking. Here is an example workflow:

https://www.dropbox.com/s/4gzfc64xuxad9h7/test_welldata_readout.zip

I found about 1,300,000 cells and I want to plot them all, let’s say Num Pix vs. Sum or so.

 

I hope this helps for now.

Of course I can provide more data.

 

Kind regards,

Flo

 

Hi Flo,

Fantastic, thanks a lot! I will take a look! This helps us a lot!

Christian