I would like to set up an image processing pipeline to analyse HCS datasets in KNIME. The task is relatively simple: segment in one channel and calculate features in two other channels. This was straightforward to set up and works quite nicely in KNIME! :-)
However, I have a performance issue with the Image Reader node, which is currently blocking me from analysing larger numbers of images.
The data originates from 384-well plates, and for each well 3 stacks of 9 slices have been acquired. The images are 2048x2028 16-bit unsigned (8 MB). For the analysis I loop through the dataset and want to load the 27 images from one well in each iteration for further processing. However, it currently takes approx. 40 s for the Image Reader node to load them ("check file format disabled"). This massively increases processing time, especially once many 384-well plates need to be processed. Once the images are loaded I have no performance issues at all.
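To put these numbers in perspective, here is a rough back-of-the-envelope calculation in Python. It uses only the figures quoted above; the variable names are made up for illustration.

```python
# Figures quoted in the post above; the names are illustrative only.
WELLS_PER_PLATE = 384
STACKS_PER_WELL = 3
SLICES_PER_STACK = 9
WIDTH, HEIGHT = 2048, 2028      # pixels, as stated in the post
BYTES_PER_PIXEL = 2             # 16-bit unsigned
SECONDS_PER_WELL = 40           # observed Image Reader time per well

images_per_well = STACKS_PER_WELL * SLICES_PER_STACK
bytes_per_image = WIDTH * HEIGHT * BYTES_PER_PIXEL
mib_per_well = images_per_well * bytes_per_image / 2**20
hours_per_plate = WELLS_PER_PLATE * SECONDS_PER_WELL / 3600

print(f"{images_per_well} images per well")
print(f"{bytes_per_image / 2**20:.1f} MiB per image, {mib_per_well:.0f} MiB per well")
print(f"{hours_per_plate:.1f} h of pure loading time per plate")
```

So at 40 s per well, reading alone comes to roughly 4.3 hours per plate, before any actual processing.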
Are there any "tricks" to speed up image loading with this node? Or alternative ways of reading the images into KNIME (would be great to have an input port for a file path table though).
Thanks in advance,
You are right, 40 seconds for a single image is very long (actually too long ;-)); there must be something weird going on. We will try to solve your problem, so I have some questions:
- Which format do your images have (for example tiff, czi, lsm, etc.)?
- Internally our Image Reader node uses SCIFIO (http://scif.io/) and Bio-Formats (http://loci.wisc.edu/software/bio-formats). So it would be interesting to know whether loading is faster with the latest FIJI (fiji.sc) release, which uses the same libraries.
- Which version of KNIME Image Processing are you using (nightly build vs. stable)?
- Could you send us some example data (firstname.lastname@example.org)? Then we can try to reproduce your problem locally. We will treat the data as confidential of course.
Concerning the InPort for File-Paths: We actually have this inport. You just have to configure the Image Reader (same tab as "check file format disabled") to use the corresponding column of the input table.
Thanks for the quick response!
Actually the 40 seconds is for loading the 27 images ;-) ... but that is still substantial. The file format we use is just tiff. In Fiji the 27 images load fast (1-2 s).
I am running KNIME Image Processing 126.96.36.199402102021 in KNIME 2.9.4.
I see the slowdown only when I load the 27 images from the folder containing all the raw data (approx. 10,000 files). When I copy a small subset to a new folder, the images load at good speed. I see this behaviour both when I manually select 27 files and when I provide a table containing the image paths to the Image Reader node.
I have the feeling this is not related to the file-/imagetype but I will send you some images anyway later today so you can have a look.
As soon as you have sent us some example images, we will test them. Maybe we are doing something wrong internally. If FIJI can load them that fast, we should be able to do so as well, since we internally use exactly the same libraries ;-)
thanks for the data. I found the problem and fixed it. We will release KNIME Image Processing 1.2.0 tomorrow and this fix just made it to be part of the release ;-)
The problem was that each file tried to find its metadata in the directory and therefore scanned all the other files in the same directory, which is pretty expensive if you have 10k files.
The trick is to switch off the option "isGroupFile" in the dialog of the Image Reader. In version 1.1.x this didn't really work, but in the new version 1.2.0 this will speed up the Image Reader dramatically.
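To illustrate why the per-file directory scan hurts so much, here is a minimal Python sketch. It is not the actual SCIFIO or KNIME code; both "reader" functions are hypothetical stand-ins that only count how many directory entries or files get touched, and the folder holds 1,000 dummy files rather than 10k to keep it quick.

```python
import os
import tempfile

def load_with_directory_scan(paths):
    """Hypothetical reader: lists the whole parent directory once per file."""
    entries_visited = 0
    for path in paths:
        with os.scandir(os.path.dirname(path)) as entries:
            for _ in entries:
                entries_visited += 1
    return entries_visited

def load_without_directory_scan(paths):
    """Hypothetical reader: touches only the requested files."""
    files_visited = 0
    for path in paths:
        os.stat(path)
        files_visited += 1
    return files_visited

def make_dummy_folder(root, n_files):
    """Create n_files empty placeholder files and return their paths."""
    paths = []
    for i in range(n_files):
        path = os.path.join(root, f"img_{i:05d}.tif")
        open(path, "w").close()
        paths.append(path)
    return paths

with tempfile.TemporaryDirectory() as root:
    all_paths = make_dummy_folder(root, 1000)
    one_well = all_paths[:27]          # the 27 images of one well
    scanned = load_with_directory_scan(one_well)
    direct = load_without_directory_scan(one_well)

print(f"with directory scan: {scanned} entries visited")
print(f"without scan:        {direct} files visited")
```

The scanning variant visits 27 x 1,000 = 27,000 entries for a single well, compared with 27 for the direct one. With 10k files in the folder the same loop would visit 270,000 entries per well, which also fits the observation that a small copied subset loads quickly.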
PS: http://www.knime.org/about/events/webinar-what-is-new-in-knime-210 may be interesting for you as well.
That's great news, this will help a lot. It seems the timing was just about right :-)
Looking forward to testing the new release.
is it working with KNIME Image Processing 1.2.0?
Yes, it is working with very good performance now....thank you very much again!
Can you give more details about the "Load group files" option? It does indeed seem to slow down the reading of the images. Still, it is checked by default, so in what cases is this option useful?
Thanks (nice new forum interface, by the way!)
This option is for file formats that store one image distributed over several files, e.g. one tif file per time point. It is checked by default because we try to err on the side of reading a file correctly rather than reading it fast. We should probably document this behavior better though.
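As a rough illustration of what "one image distributed over several files" means, here is a small Python sketch that groups file names into datasets. The naming scheme (`<base>_t<timepoint>.tif`) and the function name are assumptions made up for this example; the real grouping logic in SCIFIO and Bio-Formats is format-specific and may work quite differently.

```python
import re
from collections import defaultdict

# Hypothetical naming scheme: <base>_t<timepoint>.tif
PATTERN = re.compile(r"^(?P<base>.+)_t(?P<t>\d+)\.tif$")

def group_by_dataset(filenames):
    """Collect files that belong to the same image, ordered by time point."""
    groups = defaultdict(list)
    for name in filenames:
        match = PATTERN.match(name)
        if match:
            groups[match.group("base")].append((int(match.group("t")), name))
        else:
            # Files that do not follow the scheme form their own single-file group.
            groups[name].append((0, name))
    return {base: [n for _, n in sorted(members)] for base, members in groups.items()}

files = ["A01_t002.tif", "A01_t001.tif", "B07_t001.tif", "overview.tif"]
grouped = group_by_dataset(files)
for base, members in grouped.items():
    print(base, "->", members)
```

Finding the members of a group requires looking at the other files in the folder, which is why the option costs time in large directories. If every file is a complete image on its own, as in the case discussed in this thread, nothing is lost by switching it off.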