Identify a variable number of similar sub-strings in a string and identify their positions for extraction of the sub-strings and nearby text

Hello All, new to KNIME and trying to solve the following:

I have a string from which I want to extract sub-strings that all begin with the same text, in this example “hoi”. The number of times “hoi” appears can vary, and the text in between does not follow any pattern. Whenever “hoi” is encountered, I want to pick up “hoi” plus, let’s say, the next character to its right. For that I first need to know the position of each “hoi” in the string. A loop seems the logical solution: one that stops as soon as no further “hoi” is found after the last match. I have been looking at loop examples and trying to get one working, but lacking experience I get stuck. Does anybody out there have any bright ideas? :smiley:

I attached an example in which I search for “hoi” 4 times in the source string and each time store the next position in a variable. Of course this is not the solution, as it uses a fixed number of runs, but it gives an idea of the problem I want to solve.

Test.knwf (34.9 KB)
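For readers who want to see the logic of the question outside KNIME, here is a minimal Python sketch of the loop described above. It is only an illustration, not a KNIME workflow: the function names, the sample string, and the choice of one extra character are assumptions made for the example.

```python
def find_positions(text, token="hoi"):
    """Return the 0-based start index of every occurrence of token."""
    positions = []
    start = text.find(token)
    while start != -1:  # the loop stops once no further match is found
        positions.append(start)
        # continue searching right after the match just found
        start = text.find(token, start + len(token))
    return positions


def extract_with_next(text, token="hoi", extra=1):
    """Return each occurrence of token plus `extra` characters to its right."""
    return [text[p:p + len(token) + extra] for p in find_positions(text, token)]


sample = "abhoi1xyzhoi2qhoi3"  # hypothetical input
print(find_positions(sample))     # [2, 9, 14]
print(extract_with_next(sample))  # ['hoi1', 'hoi2', 'hoi3']
```

The number of iterations depends only on how many matches the string contains, which is the behaviour the fixed four-run example cannot provide.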

Hi @Lynkie01 and welcome to the KNIME forum

Please find below a possible solution to your question:

It is based on the Cell Splitter and Moving Aggregation nodes:

20211006 Pikairos Identify variable amount of similar sub-strings in a string and identify there positions for extraction of the sub-strings and nearby text.knwf (118.6 KB)
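The internals of the attached workflow are not shown in this thread, so the following Python sketch is only one plausible reading of how a split combined with a running aggregation can recover the positions. It assumes the string is split on “hoi” and that a cumulative sum over the piece lengths yields each match position; the function names and sample string are hypothetical.

```python
from itertools import accumulate


def positions_via_split(text, token="hoi"):
    """Recover match positions from a split plus a running sum of lengths."""
    pieces = text.split(token)
    # Every piece except the last one is followed by one occurrence of token.
    lengths = [len(piece) for piece in pieces[:-1]]
    running = accumulate(lengths)  # cumulative length of the pieces so far
    # The k-th match is preceded by k earlier tokens, so add k * len(token).
    return [total + k * len(token) for k, total in enumerate(running)]


def following_chars(text, token="hoi", extra=1):
    """Return the first `extra` characters after each occurrence of token."""
    return [piece[:extra] for piece in text.split(token)[1:]]


sample = "abhoi1xyzhoi2qhoi3"  # hypothetical input
print(positions_via_split(sample))  # [2, 9, 14]
print(following_chars(sample))      # ['1', '2', '3']
```

Because the split produces as many pieces as the string requires, no explicit loop or fixed number of runs is needed.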

Hope it helps.

Best

Ael

7 Likes

Dear Ael,

Works like a clock! I’m impressed. Thanks for your swift feedback, much appreciated.

1 Like

I tested only your first workflow. If you can use the Regex Extractor node, you can also do it with JSON.


br

2 Likes

Hi @Daniel_Weikert

This is a very interesting solution. Could you please upload your workflow here? Thanks.

Best

Ael

KNIME_project.knwf (15.4 KB)

br

2 Likes

Hello there,

seems like a job for regex? (regexReplace() function can be used within String Manipulation node.)

This one comes pretty close: regexReplace($Test$, ".*?(hoi.{3})" , "$1\n"). The issue is the text after the last hoi, which I can’t seem to catch. Any ideas?
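To make the trailing-text issue concrete, here is a small Python sketch. It uses Python's `re` module rather than KNIME's regexReplace(), so the replacement syntax differs (`\1` instead of `$1`), and the function names and sample string are assumptions. It first mirrors the replace-style pattern, then shows an extract-style alternative that only collects the matches and therefore never sees the leftover text.

```python
import re


def replace_style(text, token="hoi", width=3):
    """Mirror the replace approach: keep each match, drop the text before it."""
    pattern = ".*?(" + re.escape(token) + ".{" + str(width) + "})"
    # Text after the last match is never matched, so it stays in the result.
    return re.sub(pattern, r"\1\n", text, flags=re.DOTALL)


def extract_style(text, token="hoi", width=3):
    """Collect the matches directly instead of rewriting the whole string."""
    pattern = re.escape(token) + ".{" + str(width) + "}"
    return re.findall(pattern, text, flags=re.DOTALL)


sample = "xxhoiabcyyyhoidefzz"  # hypothetical input
print(repr(replace_style(sample)))  # 'hoiabc\nhoidef\nzz'
print(extract_style(sample))        # ['hoiabc', 'hoidef']
```

In the replace-style result the trailing "zz" survives because nothing in the pattern can consume it, which matches the problem described above.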

Btw welcome to KNIME Community @Lynkie01!

Br,
Ivan

1 Like

Thanks Daniel, a very effective solution using NodePit for KNIME. It works well with the extension installed. Thanks for your effort.

1 Like

Hi Ipazin, thanks for your post showing the power of regex! The downside of this solution is that you need to be sure how often “hoi” appears in the string you are processing. The other solutions posted do not have that issue and are therefore a bit more flexible. In the situation for which I’m building the KNIME solution, that flexibility is a plus. But as said, a nice example of regex!

1 Like
