Replace - PadLeft

andymesmo · March 8, 2022, 6:08pm

Hey guys,

I’m getting better at KNIME and I need some help. I think it’s a little short for the result I need. Attached is the data flow and the comment inside the node.

I need to use the column express node and make a change to the first occurrence of the specific lines when I encounter a string |50| and change to |53|. I tried to use the padLeft function in the node and substring code but, it’s swapping all occurrences.
The verbose comment is inside the node.

The original file is huge and unstructured so I don’t use a delimiter. This file is a sample only.

Thanks.
firstsearch.knwf (20.1 KB)

bruno29a · March 8, 2022, 6:40pm

Hi @andymesmo , the padLeft() function is used for padding something, in this case, to the left.

For example padLeft("1", 3, "0") means I want to have at least 3 characters, and fill up with “0” if the input data is less than 3 characters. So “1” will become “001”.

Similarly padLeft("11", 3, "0") will become “011” and padLeft("111", 3, "0") or padLeft("1111", 3, "0"), etc will remain as original data, since they are already 3 or more characters.

I haven’t looked at your workflow, but I can’t see how padLeft can be used based on the explanation of “change to the first occurrence of the specific lines when I encounter a string |50| and change to |53|”. Could it be that you misunderstood what the padLeft function does?

EDIT: I downloaded your workflow. In your expression:
replace(column("Col0"), padLeft("|50|", 0),"|53|")

padLeft("|50|", 0) basically would be just the same as the original string “|50|”.

So your expression is the same as:
replace(column("Col0"), "|50|", "|53|")

which means replace all instances of “|50|” with “|53|”, provided the if conditions are met that is.

What exactly did you want to change? Can you explain a bit? It sounds like not all “|50|” should be change. In which case should it change?

duristef · March 8, 2022, 6:56pm

if you need to change the first occurrence of “|50|” into “|53|” this regex should do the job regexReplace(column("Col0"), "^(.*?)\\|50\\|(.*)$", "$1|53|$2")
in the “Column Expression” node. Or, you can use a slightly different string in the “String Manipulation” node regexReplace($Col0$, "^(.*?)\\|50\\|(.*)$", "$1|53|$2")

bruno29a · March 8, 2022, 7:16pm

Nice one @duristef . It’s probably needed in the Column Expression since this replace seems to be applied only under certain conditions.

@andymesmo the regexReplace suggested by @duristef will work if you want to replace only the first occurrence of “|50|”.

With respect to padLeft("|50|", 0), you can always test this separately, in the Column Expressions. Just create a new expression (that you can delete after), add the expression, click on Evaluate, and you can see the result:

Similarly, you can test @duristef 's expression like this:

bruno29a · March 8, 2022, 7:41pm

Hi @andymesmo ,

Just going back to what you were attempting, with substr().

Basically the high level logic should be:

Get substr from beginning to before first occurrence of your search string ("|50|")
Get substr from after first occurrence of your search string to the end
Glue both substrings with your replace string ("|53|")

Something like this would do the trick:

                searchstring = "|50|";
                idx_of_searchstring = indexOf(column("Col0"), searchstring);
                idx_of_string_after_searchstring = idx_of_searchstring + length(searchstring);
                join(substr(column("Col0"), 0, idx_of_searchstring), "|53|", substr(column("Col0"), idx_of_string_after_searchstring));

As usual, we can test this:

stringdata = "|abc|50|50|123|50|12|";
searchstring = "|50|";
idx_of_searchstring = indexOf(stringdata, searchstring);
idx_of_string_after_searchstring = idx_of_searchstring + length(searchstring);
join(substr(stringdata, 0, idx_of_searchstring), "|53|", substr(stringdata, idx_of_string_after_searchstring));

Results:
|abc|53|50|123|50|12|

And this would go right in your if block:

andymesmo · March 8, 2022, 7:44pm

@bruno29a

Thanks a lot for the help too.

@duristef’s suggestion worked

Once again, thank you for everything

bruno29a · March 8, 2022, 7:47pm

No problem @andymesmo , pleasure to help.

If you used the regexReplace suggested by @duristef , then you should assign the Solution to @duristef 's post, which is the #3, not to me, since the credit should go to @duristef in that case

andymesmo · March 8, 2022, 7:50pm

@duristef

Thanks a lot for the help

Your suggestion has been used successfully.

Can you please explain to me the function with its respective characters used to obtain the result?

Thanks.

andymesmo · March 8, 2022, 8:32pm

@bruno29a

Very good!!!
Sensational.

I also liked this method used with these other functions.

Congratulations.
The good thing about KNIME is that it is very flexible with several solution options for the same case.

@bruno29a, this suggestion will also be useful.

I’m already using @duristef’s suggestion

Thank you all.

bruno29a · March 8, 2022, 8:40pm

Hi @andymesmo , the regular expression "^(.*?)\\|50\\|(.*)$" basically means exactly the same thing as what I did with the substr().

Before looking at the expression, let’s first look at this one instead:
"^(.*?)50(.*)$"

The difference here is that we are looking for “50” instead of “|50|” (I’ll get to “|50|” later).
^(.*?) means anything from the beginning of the string up to the first occurrence of “50”. If it was ^(.*), it would be up to the last occurrence of “50”. That’s the first expression, and the results can be referenced to by $1

The other group after “50” is (.*)$ means anything after “50” to the end of the string, and the result of this expression is referenced to by $2 (group 1 as $1, group 2 as $2, group 3 as $3, etc).

So, the expression should be:
^(.*?)\|50\|(.*)$

Now, because we are passing the expression as a string, we need to also escape the slash, since the slash itself is also interpreted as the escape character.

That is why the expression ends up being: "^(.*?)\\|50\\|(.*)$"

For the replace portion "$1|53|$2", it’s basically gluing the result of the first group and the second group with “|53|” in between.

andymesmo · March 8, 2022, 8:49pm

@bruno29a

Surprising!!!

My God!
There is no better explanation than this.

Very enlightening each step.

Thank you very much.

duristef · March 8, 2022, 8:50pm

@andymesmo,
@bruno29a 's explanation is absolutely perfect.
Regular expressions, basically, are powerful “pattern finders/replacers”. If you want to learn how to build them you can find many good tutorials online, for example https://www.regular-expressions.info/ or https://regexr.com/, where you can also test the regex you build.

andymesmo · March 8, 2022, 8:54pm

@duristef

Excellent shared content too.

I was not aware of these tutorials regarding the function. I’m already following.

I learned a lot today from these high-level interactions.

Thank you all

duristef · March 8, 2022, 10:12pm

Glad I was able to help!

andymesmo · March 12, 2022, 4:35pm

After clarification and research, I added below some of the variables used:

system · March 19, 2022, 4:35pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.