Hello KNIME community
I would like to check whether a table cell contains a string (an accession number) from another dataset. I would also like to append a new column to the dataset holding the accession number that was found.
I tried the Reference Row Filter, but it only works with exact matches. In this case I need to find out whether an accession number occurs inside a string that contains one or more accession numbers.
Do you have any suggestions?
I believe the easiest way to do this is with a Rule-based Row Filter and a loop over your patterns of interest. Have a look at the attached workflow and let us know if it does what you need.
It works, thanks.
Do you know if it is possible to improve the performance?
In fact, when I did the same thing in Java, it took just a few seconds to compare the String attributes of two objects using the "contains" method, whereas it takes more than 1 minute with KNIME.
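For readers who want to reproduce the comparison, here is a minimal sketch of what a plain-Java "contains" approach might look like. The class name `ContainsMatcher`, the method `match`, and the sample data are hypothetical and only illustrate the idea; this is not the code that was actually timed in this thread.

```java
import java.util.ArrayList;
import java.util.List;

public class ContainsMatcher {

    // For each row, collect every accession number that occurs in it
    // as a literal substring. The result list is parallel to `rows`.
    static List<List<String>> match(List<String> rows, List<String> accessions) {
        List<List<String>> result = new ArrayList<>();
        for (String row : rows) {
            List<String> found = new ArrayList<>();
            for (String accession : accessions) {
                if (row.contains(accession)) {
                    found.add(accession);
                }
            }
            result.add(found);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample data, assuming accession numbers are plain strings.
        List<String> rows = List.of("P12345;Q67890", "A11111");
        List<String> accessions = List.of("Q67890", "Z99999");
        System.out.println(match(rows, accessions));
    }
}
```

Note that this does rows × patterns substring checks, so with about 150 patterns and 40000 rows it amounts to roughly 6 million `contains` calls.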
I think flexibility comes at a price. The Java contains method looks for exact (literal) matches, while the rules evaluate regexes and can be more complex. (I guess the few seconds for contains were not just for two single strings, but for many more.) Given that KNIME has to select the columns from the rows and evaluate all rule conditions before returning the filtered table, a bit more than 1 minute might not be too bad. You might try surrounding your pattern with \Q and \E, or finding a more efficient regular expression. Anyway, if you find a hot spot with a profiler, that might be worth looking at.
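To illustrate the \Q and \E suggestion in plain Java: `Pattern.quote` wraps a string in \Q...\E so that characters such as "." are matched literally instead of as regex metacharacters. The sketch below assumes an accession number that contains a dot; the class and method names are hypothetical, and how the KNIME rule engine handles the same pattern internally is not shown here.

```java
import java.util.regex.Pattern;

public class QuotedPatternDemo {

    // Build a "contains" regex in which the accession number is taken literally.
    static Pattern literalContains(String accession) {
        return Pattern.compile(".*" + Pattern.quote(accession) + ".*");
    }

    // Same idea without quoting: "." in the accession matches any character.
    static Pattern unquotedContains(String accession) {
        return Pattern.compile(".*" + accession + ".*");
    }

    public static void main(String[] args) {
        String accession = "NP_000509.1";   // made-up example value
        String cell = "NP_000509X1";        // differs only where the dot is

        System.out.println(unquotedContains(accession).matcher(cell).matches());
        System.out.println(literalContains(accession).matcher(cell).matches());
    }
}
```

Besides avoiding accidental matches, a quoted pattern gives the regex engine nothing to interpret inside the accession number itself.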
If there are many patterns to search for, this approach is likely not the most efficient. What are the approximate numbers of patterns and rows that you are working with? I may be able to do better :)
The number of patterns is around 150 and the data has more than 40000 rows. So it is not a huge comparison.
In the event you're feeling benevolent, I'm facing a similar problem. The difference is that I have about 1 million patterns to check against, and the number of rows varies from 1,000 to 100,000 a day.
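With a pattern count in that range, looping over every pattern for every row becomes expensive. One possible alternative, sketched below, is to split each cell into tokens and look each token up in a hash set of known accession numbers. This is a different technique from the regex loop discussed above, and it only works under the assumption that the accession numbers inside a cell are separated by delimiters (here: semicolons, commas, or whitespace, which is a guess). The names `TokenLookup` and `findAccessions` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class TokenLookup {

    // Split the cell on assumed delimiters and keep the tokens that are
    // known accession numbers. The cost per cell depends on the number of
    // tokens in the cell, not on the number of known accession numbers.
    static List<String> findAccessions(String cell, Set<String> known) {
        List<String> found = new ArrayList<>();
        for (String token : cell.split("[;,\\s]+")) {
            if (known.contains(token)) {
                found.add(token);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Made-up sample data.
        Set<String> known = Set.of("P12345", "Q67890");
        System.out.println(findAccessions("P12345; Q67890,X00000", known));
    }
}
```

If the accession numbers are embedded in free text without delimiters, this sketch does not apply as written and a true substring search is still needed.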