Check if a cell contains a string

joel · October 14, 2014, 9:53am

Hello KNIME community

I would like to check whether a table cell contains a string (an accession number) from another dataset. I would also like to append a new column to the dataset containing the accession number that was found.

I tried the Reference Row Filter, but it only works with exact matches. In this case I need to find whether an accession number occurs within a string that contains one or more accession numbers.

Do you have any suggestions?

Regards,

Joel

Aaron_Hart · October 14, 2014, 12:00pm

Hi Joel,

I believe the easiest way to do this is with a Rule-based Row Filter and a loop over your patterns of interest. Have a look at the attached workflow and let us know whether it does what you need.
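For readers without the attached workflow, here is a minimal plain-Java sketch of the same idea: loop over the patterns, run one "filter" pass over all rows per pattern, and tag each matching row with the accession number that hit, which is the appended column. This is not the workflow itself. The class name `LoopOverPatterns`, the `Hit` record and the sample accession numbers are hypothetical, and the record syntax assumes Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;

public class LoopOverPatterns {

    // One output entry per (row, accession) hit, like concatenating the
    // results of one row filter per loop iteration.
    record Hit(String row, String accession) {}

    static List<Hit> findHits(List<String> rows, List<String> accessions) {
        List<Hit> hits = new ArrayList<>();
        for (String acc : accessions) {          // outer loop: one iteration per pattern
            for (String row : rows) {            // inner pass: filter every row
                if (row.contains(acc)) {         // literal substring test, no regex
                    hits.add(new Hit(row, acc)); // "new column" = the matched accession
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("P12345;Q67890", "A11111", "Q67890"); // hypothetical cells
        List<String> accessions = List.of("P12345", "Q67890");           // hypothetical patterns
        for (Hit h : findHits(rows, accessions)) {
            System.out.println(h.row() + " -> " + h.accession());
        }
    }
}
```

A row containing several accession numbers appears once per match. The cost is one substring check per (pattern, row) pair, so it grows with both the number of patterns and the number of rows.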

Regards,

Aaron

joel · October 14, 2014, 3:08pm

Hi Aaron

It works, thanks.

Do you know whether it is possible to improve the performance?

In fact, when I did the same thing in Java, it took just a few seconds to compare the String attributes of two objects using the "contains" method, whereas it takes more than 1 minute in KNIME.

Regards,

Joel

aborg · October 14, 2014, 5:26pm

Hi Joel,

I think flexibility comes at a price. Java's contains method looks for exact substring matches, while the rules work with regular expressions and can be more complex. (I guess the few seconds for contains were not for just two strings but for many more. Given that KNIME has to select the columns from the rows and evaluate all rule conditions before returning the filtered table, a bit more than 1 minute might not be too bad. You might try surrounding your pattern with \Q and \E, or finding a more efficient regular expression.) Anyway, if you find a hot spot with a profiler, it might be worth looking into.
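To illustrate the \Q and \E suggestion in Java's own regex engine: wrapping a pattern in \Q...\E (which is what `java.util.regex.Pattern.quote` produces) makes it match literally, so characters such as '.' inside an accession number are no longer treated as regex metacharacters. The class and method names and the sample strings below are hypothetical; the sketch only demonstrates the matching semantics and makes no claim about speed.

```java
import java.util.regex.Pattern;

public class QuoteDemo {

    // Unquoted: '.' is a metacharacter and matches any single character.
    static boolean matchesRaw(String text, String pattern) {
        return Pattern.compile(pattern).matcher(text).find();
    }

    // Quoted: Pattern.quote wraps the pattern in \Q...\E (and handles an
    // embedded \E safely), so the whole pattern is matched literally.
    static boolean matchesQuoted(String text, String pattern) {
        return Pattern.compile(Pattern.quote(pattern)).matcher(text).find();
    }

    public static void main(String[] args) {
        String text = "P12345X1";
        String accession = "P12345.1"; // hypothetical versioned accession number
        System.out.println(matchesRaw(text, accession));    // true: '.' matched 'X'
        System.out.println(matchesQuoted(text, accession)); // false: needs a literal '.'
        System.out.println(Pattern.quote(accession));       // \QP12345.1\E
    }
}
```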

Cheers, gabor

Aaron_Hart · October 15, 2014, 10:44am

If there are many patterns to search for, this approach is likely not the most efficient. What are the approximate numbers of patterns and rows you are working with? I may be able to do better :)

joel · October 15, 2014, 2:07pm

Hi Aaron

There are around 150 patterns and more than 40,000 rows, so it is not a huge comparison.
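At this size, one pass per pattern means 150 × 40,000 = 6,000,000 row checks. One possible alternative, sketched here in plain Java as an assumption rather than as what the workflow does, is to combine all quoted accession numbers into a single alternation regex and scan each cell once, collecting every accession found. The names `SinglePassAlternation`, `buildPattern` and `findAll` and the sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class SinglePassAlternation {

    // Build one regex "\Qacc1\E|\Qacc2\E|..." from all accession numbers.
    // Longest first, so that if one accession is a prefix of another the
    // longer one wins at a given position (alternatives are tried in order).
    static Pattern buildPattern(List<String> accessions) {
        String alternation = accessions.stream()
                .sorted((a, b) -> Integer.compare(b.length(), a.length()))
                .map(Pattern::quote) // literal matching for each accession
                .collect(Collectors.joining("|"));
        return Pattern.compile(alternation);
    }

    // Scan one cell once and return every accession found, left to right.
    static List<String> findAll(Pattern p, String cell) {
        List<String> found = new ArrayList<>();
        Matcher m = p.matcher(cell);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        Pattern p = buildPattern(List.of("P12345", "Q67890")); // hypothetical patterns
        for (String cell : List.of("P12345;Q67890", "A11111", "Q67890")) {
            // The joined matches could become the appended column; empty means no hit.
            System.out.println(cell + " -> " + String.join(",", findAll(p, cell)));
        }
    }
}
```

Matches are non-overlapping, so an accession hidden inside another matched accession is not reported separately.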

Regards,

Joel

theamazingdrew · February 19, 2016, 12:28am

Aaron

In case you're feeling benevolent: I'm facing a similar problem. The difference is that I have about 1 million patterns to check against, and the number of rows varies from 1,000 to 100,000 a day.
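With about 1 million patterns, both a loop per pattern and a single giant alternation regex are likely impractical. If (and this is an assumption about the data) the accession numbers inside a cell are separated by delimiters such as ';', ',' or whitespace, one option is to load all patterns into a hash set once and look up each token of each cell. The per-cell cost then grows with the number of tokens in the cell rather than with the number of patterns. The class name, delimiter set and sample data below are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class TokenLookup {

    // ASSUMPTION: accession numbers in a cell are separated by ';', ',' or whitespace.
    private static final String DELIMITERS = "[;,\\s]+";

    // Load all accession numbers (possibly ~1 million) into a hash set once.
    static Set<String> buildIndex(List<String> accessions) {
        return new HashSet<>(accessions);
    }

    // Split the cell into tokens and keep those present in the index.
    // Each lookup is an expected constant-time hash operation.
    static List<String> findAll(Set<String> index, String cell) {
        List<String> found = new ArrayList<>();
        for (String token : cell.trim().split(DELIMITERS)) {
            if (!token.isEmpty() && index.contains(token)) {
                found.add(token);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Set<String> index = buildIndex(List.of("P12345", "Q67890")); // hypothetical data
        for (String cell : List.of("P12345; Q67890", "A11111", " Q67890 ")) {
            System.out.println(cell + " -> " + findAll(index, cell));
        }
    }
}
```

This only finds whole-token matches. If accession numbers can be embedded in other text without delimiters, a multi-pattern substring algorithm such as Aho-Corasick would be the more appropriate tool.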

Thoughts?

Drew