How to increase the Search performance from a Data set in KNIME?

A interactive search widget (multi column) knime model is designed to retrive the results from a huge data set, currently this is working with a 12M rows of data which is having aroung 7 columns.

There is a data additon daily happening at the rate of 1M rows to the above said data set. Current performance of the Data retrieval is poor, eventhough the 12M data is kept in a Table and connected directly to the Widget component.

Any suggestion to increase the search performace of this model? Please suggest.


First of all - 12m rows is pretty heavy I think and with that growth rate (unless there are also records dropping off…) I’m not sure how sustainable it is to keep the data in .table format.

What you can try is:

  • Wrap both the reader node and the filter widget into another component and activate “simple streaming” - you’ll need to install KNIME Streaming Execution extension for this. You simply right-click => configure on the new component go to Job Manager Selection tab and switch to Simple Streaming from the drop down. Then play around with the chunk size and see what performs best.
    You can use Timer Info Node to get the execution time and compare.

  • check what the maximum allowed RAM usage is in knime.ini in the root folder of your knime installation. The default is this:

Check how much RAM you have available and try increase it to maybe half of your computers RAM (so if you have 32gb then set it to 16 gb). For 16gb you change that red highlighted row to -Xmx16g

Overall it might be worthwhile investigating if the data can be moved to a well-performing database although not sure how well that works with the interactive search widget.


@gokulpnair besides using a proper database with indexes you could try a local standalone database like H2 or SQLite, maybe create an index and try to search this database columns.

CREATE INDEX idx_column_name ON table_name(column_name);

or you can even try composite indexes:

CREATE INDEX idx_first_last_name ON employees(first_name, last_name);

But clever tricks can only substitute power of your machine or the use of a proper database on a server so far …


I m not sure this will improve search performance but you idea to check for overall performance settings is a good one.


Agree it’s not guaranteed as it depends on the WF set up (which nodes are used and are they compatible with streaming, is the data read every time or only at the beginning…), but under certain circumstances it can help - that’s why I thought it is worth a try :slight_smile:

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.