RegEx case sensitivity in Document Viewer Node

TimB · June 5, 2014, 3:10pm

I'd like to report a small issue with the regular expression (RegEx) visualization in the Document Viewer node. From time to time I use it to figure out which parts of a document could be filtered with a certain regular expression. To investigate the usage of some specific abbreviations in a collection of documents, I tried to detect abbreviations by RegEx. Let’s take a simple one here:

Find all terms that consist of three capital letters, i.e. [A-Z]{3}

In an example series of:

abc ABC aBc

The term “ABC” should be identified, but it is not marked in this view.

Using the RegEx [a-c]{3} marks all three terms, although a character class such as [a-c] (or [a-z] in general) is case sensitive and should only match the lowercase “abc”.
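For reference, this is how a case-sensitive regex engine treats the example. It is a short Python sketch using the standard `re` module (which is case sensitive by default), not the Document Viewer's own code; the word boundaries `\b` are an addition here so that only whole terms are reported.

```python
import re

text = "abc ABC aBc"

# No flags are passed, so matching is case sensitive.
upper_matches = re.findall(r"\b[A-Z]{3}\b", text)  # three capital letters
lower_matches = re.findall(r"\b[a-c]{3}\b", text)  # three lowercase letters a to c

print(upper_matches)  # ['ABC']
print(lower_matches)  # ['abc']
```

So only “ABC” should be marked for [A-Z]{3}, and only “abc” for [a-c]{3}.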

Speculating from some online articles I read and my (fairly limited) knowledge of programming, this might be due to how the RegEx term is transformed in the search code. The KNIME filter nodes that use RegEx have an additional “case sensitive” check box, and there it works fine. Nothing like this exists here, which leads to wrong output for such a RegEx, doesn’t it?

Maybe the search field could be modified to deliver correct results for case-sensitive searches/RegEx as well?
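A search with a case-sensitivity option, similar to the check box of the filter nodes, could leave the terms untouched and only switch a matching flag. This is a minimal Python sketch under stated assumptions: `matching_terms` and its `case_sensitive` parameter are hypothetical names, terms are assumed to be whitespace-separated, and a term counts as marked only if the whole term matches.

```python
import re

def matching_terms(text, pattern, case_sensitive=True):
    """Hypothetical helper: return terms that the pattern matches completely."""
    # Case sensitivity is controlled by a flag; the terms are never lowercased.
    flags = 0 if case_sensitive else re.IGNORECASE
    regex = re.compile(pattern, flags)
    return [term for term in text.split() if regex.fullmatch(term)]

text = "abc ABC aBc"
print(matching_terms(text, r"[A-Z]{3}"))                        # ['ABC']
print(matching_terms(text, r"[a-c]{3}"))                        # ['abc']
print(matching_terms(text, r"[A-Z]{3}", case_sensitive=False))  # ['abc', 'ABC', 'aBc']
```

Keeping the original terms and changing only the flag means both behaviors stay available without altering the document text.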

FYI: The freeware Notepad++ has a regular expression search function that also does the job of “RegEx” testing nicely (besides several online resources).

kilian.thiel · June 5, 2014, 6:17pm

Hello Tim,

the terms are converted to lower case before regex matching. This should definitely not happen. I will put this on the list. Thank you for posting.
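The effect of lowercasing before matching can be reproduced with a small Python sketch. `buggy_matching_terms` is a hypothetical name used only for illustration, not the Document Viewer's actual code, and whitespace tokenization is an assumption. Because every term is lowercased first, [A-Z]{3} can never match, while [a-c]{3} matches every term regardless of its original case.

```python
import re

def buggy_matching_terms(text, pattern):
    """Hypothetical reproduction: each term is lowercased before matching."""
    regex = re.compile(pattern)
    # The original term is returned (it is what would be highlighted),
    # but the regex only ever sees its lowercase form.
    return [term for term in text.split() if regex.fullmatch(term.lower())]

text = "abc ABC aBc"
print(buggy_matching_terms(text, r"[A-Z]{3}"))  # []
print(buggy_matching_terms(text, r"[a-c]{3}"))  # ['abc', 'ABC', 'aBc']
```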

Cheers, Kilian

TimB · June 10, 2014, 11:14am

Thanks for the quick answer.