Row filter based on a group of words

odiaz2309 · November 1, 2022, 10:22pm

Hi team,

I have a column in a table with customer names, which I need to filter based on words contained in part of their names (for example “S.A.”, “SA”, “S A”). I have tried the Rule-based Row Filter node like this:

Unfortunately, it doesn’t work, because I need only the specific word exactly as I wrote it in the node: an exact match. The result also shows names with a partial match, like S.A.S. or SAS, which are other kinds of customers, and even any name that contains SA inside it (for example “SALES”).

I’d be thankful for any help with this problem.

Adrix · November 1, 2022, 10:42pm

Hi,

In fact, SALES does match the rule as it is written.
Apologies if this sounds a little simplistic, but why not include a blank space in the position you want?
For example: “* SA *”
The way you present the problem, I think that will do the job.

odiaz2309 · November 1, 2022, 10:55pm

Don’t worry Adrix, you’re helping me

Actually, just after opening the forum post, I tried another option, putting only one “*” at the beginning of the word, and it works. However, I would like to find an easy way to do it, because in the end what I need is to classify the customers by the abbreviations at the end of their names. For example, if the name contains the abbreviation “SA”, “S A”, “S.A.” or “S.A”, assign the enterprise type “ANONYMOUS SOCIETY”. In other words, based on a defined group of abbreviations, I want to create a new column with their meaning.
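A minimal sketch of that classification idea in Python, for prototyping outside KNIME: a lookup from a company type to its group of abbreviations, checked against the end of each name. The `ABBREVIATIONS` table, the `company_type` helper and the sample names are hypothetical; only “ANONYMOUS SOCIETY” and the four abbreviations come from the post. In KNIME itself the equivalent would be a Rule Engine node whose rules output the type string instead of TRUE.

```python
from typing import Optional

# Hypothetical lookup table: company type -> abbreviations that identify it.
# Only the "ANONYMOUS SOCIETY" group comes from the post; add more groups as needed.
ABBREVIATIONS = {
    "ANONYMOUS SOCIETY": ["S.A.", "S.A", "S A", "SA"],
}


def company_type(name: str) -> Optional[str]:
    """Return the type whose abbreviation ends the name, or None if none does."""
    upper = name.strip().upper()  # compare case-insensitively
    for kind, abbreviations in ABBREVIATIONS.items():
        for abbr in abbreviations:
            # Require a space before the abbreviation so that names ending in
            # "...SA" as part of a word (e.g. "CASA") are not classified.
            if upper.endswith(" " + abbr):
                return kind
    return None


# Made-up names, used only to illustrate building the new column.
names = ["ACME S.A.", "Globex sa", "INITECH S.A.S.", "SALES CORP"]
new_column = [company_type(n) for n in names]
print(list(zip(names, new_column)))
```

Because the check only looks at the end of the name, an abbreviation followed by further words is not detected by this sketch.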

Adrix · November 1, 2022, 11:01pm

Got it.
So it will always end with “* SA”, “* S.A” or “* S A” (with a space after the star), correct?

If the solution above doesn’t solve it, try this; it will also cover cases like Sa, S.A, sa …

$Nombre o Razón Social$ MATCHES "(?i).* SA$"=>TRUE
$Nombre o Razón Social$ MATCHES "(?i).* S\.A$"=>TRUE
$Nombre o Razón Social$ MATCHES "(?i).* S A$"=>TRUE

There is a backslash before the dot in the second line (S\.A), but it disappears when I paste the text here.
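To check what these three rules accept before running them in KNIME, they can be reproduced in Python. This is a sketch under one assumption: that MATCHES must match the whole cell value (as Java’s `String.matches` does), which is why each pattern starts with `.*`; `re.fullmatch` mimics that. For these simple patterns, `(?i)` and the escaped dot behave the same in Python’s `re` as in Java regex. The `is_sa` helper and the test names are illustrative.

```python
import re

# The three rule patterns, with the backslash restored so "\." means a literal dot.
PATTERNS = [r"(?i).* SA$", r"(?i).* S\.A$", r"(?i).* S A$"]


def is_sa(name: str) -> bool:
    """True if at least one rule would return TRUE for this name.

    Assumption: MATCHES needs the whole value to match, so re.fullmatch is used.
    """
    return any(re.fullmatch(p, name) for p in PATTERNS)


for name in ["ACME SA", "acme s.a", "ACME S A", "ACME SXA", "ACME SAS", "SALES"]:
    print(name, "->", is_sa(name))
```

Without the backslash, the dot matches any character, so “ACME SXA” would also pass the second rule. Note too that none of the three rules accepts a trailing dot, as in “ACME S.A.”.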

odiaz2309 · November 2, 2022, 2:51am

Thanks a lot, but I just realized that the abbreviations are not always at the end of the customer names; they can also be followed by one or two words (for example GOOGLE SA ANALYTICS). So if I put “* SA” in the Rule-based Row Filter node, the result only gives me rows like GOOGLE ANALYTICS SA, where the abbreviation is at the end of the name.
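One way to accept the abbreviation anywhere in the name, but only as a separate token, is a regular expression with lookarounds instead of LIKE wildcards. A minimal Python sketch follows; the pattern `SA_TOKEN` and the helper `has_sa` are hypothetical, not something proposed in the thread. Java’s regex engine (which KNIME is built on) supports the same lookaround syntax, but inside a MATCHES rule the pattern would presumably need `.*` on both ends if MATCHES requires a full match.

```python
import re

# Hypothetical pattern: "S", optional dot, optional space, "A", optional dot,
# standing alone as its own token. The lookarounds reject a letter, digit,
# underscore or dot directly before or after it, so SALES, SAS, USA and
# S.A.S. are not hits.
SA_TOKEN = re.compile(r"(?<![\w.])S\.?\s?A\.?(?![\w.])", re.IGNORECASE)


def has_sa(name: str) -> bool:
    """True if the SA abbreviation appears anywhere in the name as its own token."""
    return SA_TOKEN.search(name) is not None


# Made-up names for illustration.
for name in ["GOOGLE SA ANALYTICS", "GOOGLE ANALYTICS SA", "ACME S.A.",
             "SALES CORP", "ACME S.A.S.", "ACME USA"]:
    print(name, "->", has_sa(name))
```

A spaced-out form such as “S A S” would still be reported, because “S A” is found as a token inside it; a more specific rule checked first would be needed if that case matters.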

Where can I find an explanation of the position of the * or of things like the (?i) you suggested? Is there a forum or link where I can understand the logic of these characters when building a text filter? I have seen many similar ones.

mlauber71 · November 2, 2022, 12:22pm

@odiaz2309 you can check out the different options to use RegEx to identify a string like “S.A.S.” here:

Tools like https://regex101.com/ and https://regexr.com/ let you test and train your RegEx code and also explain what it is doing. And then (well) googling often helps. You might then have to check the various quirks of implementing it in KNIME (namely escaping and double escaping, and when to use which).

Adrix · November 2, 2022, 12:32pm

The (?i) makes the RegEx case-insensitive.
@mlauber71 already provided the references (I don’t have any better ones).

IMO, if you are in a hurry, stick to the LIKE expression and the wildcard “*”.

The * means “anything”: any text before or after the word, depending on whether it is placed before or after it.
Simply adjust the blank space.

Example:
“* SA *”
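The effect of the star’s position can be tried out in Python with `fnmatch.fnmatchcase`, which implements the same `*` (any run of characters) and `?` (one character) wildcards, matches against the whole value, and is case-sensitive. Treating it as equivalent to KNIME’s LIKE is an assumption (fnmatch additionally understands `[...]` character sets), and the names are made up.

```python
from fnmatch import fnmatchcase

# Made-up customer names for illustration.
NAMES = ["ACME SA", "SA ACME", "GOOGLE SA ANALYTICS", "SALES CORP"]


def matching_names(pattern: str) -> list:
    """Names that the wildcard pattern covers completely (assumed LIKE-style)."""
    return [n for n in NAMES if fnmatchcase(n, pattern)]


for pattern in ["*SA*", "* SA", "SA *", "* SA *"]:
    print(pattern, "->", matching_names(pattern))
```

“*SA*” catches every name, including SALES CORP, while “* SA *” only finds the abbreviation between two words; names that end or start with it need the “* SA” and “SA *” rules as well.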

Hope this helps. If it doesn’t, please provide a table with sample data; that would make it easier to help with examples.

odiaz2309 · November 8, 2022, 8:05pm

Hi mlauber71,

The first option, “Rule Engine”, works perfectly.

Thanks a lot

system · November 15, 2022, 8:05pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.