Replace in Column Expression

Guys,
Once again I ask for help, because I believe that because I am a beginner I am applying some wrong rule.

I learned here on the forum with the help when I needed to use the Column Expression node where it was useful at the time. Now I’m applying what I learned in a case but I’m not succeeding.

I need to change each line of the file by applying the following rule:

change the last line information from |12120| to |23110|

also change from |41100| to |23110|

each time it contains the following information in the line:
“|C191|” or “|C195|” and “|98|2202|” or “|98|1202|” and the last field like |12120| or |41100|

If there is another method with another node to apply this rule, I’m ready to learn as well.

Here’s a small table model and the code I’m trying to apply. My code is applying to every file even on lines that shouldn’t be changed.

Thank you for everyone who can help me achieve the result.

if(or(and(contains(column(“Col0”),"|C191|","|C195|")),("|98|1202|","|98|2202|")) == true)
{
replace(column(“Col0”),"|12120|","|23110|");
replace(column(“Col0”), “|41100|”, “|23110|”)
}

Replace.knwf (10.7 KB)

Hi @andymesmo , would it be possible to share some sample data?

1 Like

Definitely, @bruno29a .
Follow sample

tks
efdSample2.txt (2.1 KB)

Hi @andymesmo , thank you for sharing the data.

Is a whole like a record, as in there is only 1 column, or is the data delimited by pipe “|”?

You mentioned “and the last field like |12120| or |41100|”, does this mean that it’s pipe delimited?

If it is delimited by pipe, some rows are missing columns. You can see for example, line 5, it has 9 pipes as opposed to the 4 lines above it that have 13 pipes. Similarly for lines 12, 15, 18, 26, 27 and 32. Is that “normal”? If that is the case, node that the last information on these lines will be in the 8th or 9th column instead of the last column.

1 Like

@bruno29a , thank you for returning.

The file does have different structures because it is a tax file with different standards. I don’t consider “|” (pipe) as a delimiter because of the structure of the file itself. This is a small sample of the 48 million lines that have the original file where I need to make changes to the rule mentioned in the first message.

I developed a few lines to meet the need, but with this method, I need to run a few times making the changes every “run”

When I tried to nest with “Else if” I ended up making changes to lines that it shouldn’t, so I did it in part at the moment. The purpose is to try to apply a rule to one node or another, but I want to avoid running the same flow sometimes having to make changes between one execution and the other.

Follow the code I used temporarily where I changed the “IF” and “Replace” line to apply all the rules

//alterar |C195|; |98|2202| → |41100|,|23110|

if(and(contains(column(“Col0”),"|C195|"), (contains(column(“Col0”),"|98|2202|", “i”))) == true)

{
    replace(column("Col0"),"|41100|","|23110|")  //alterar |41100| para |12120|
   
}

//alterar |C191|; |98|2202| → |41100|,|23110|
else if
(and(contains(column(“Col0”),"|C191|"), (contains(column(“Col0”),"|98|2202|", “i”))) == true)
{
replace(column(“Col0”),"|41100|","|23110|") //alterar |41100| para |12120|
}

else 

{
    column("Col0")

}

Hi @andymesmo , I understand for the file structure.

Your use of if and else if and else seems to be in order, though I’ve not looked into the conditions in them.

It’s a bit hard to grasp your rules/conditions, and I think that’s perhaps the issue if you are not getting the expected results. You need to just revise the rules and write them down clearly, and it will help you write the correct conditions in the if statements.

For example, may be use some brackets to clearly define which OR goes with what, and which AND goes with what:
“|C191|” or “|C195|” and “|98|2202|” or “|98|1202|”

And for this part:
replace(column(“Col0”),"|12120|","|23110|");
replace(column(“Col0”), “|41100|”, “|23110|”)

This is going to change ALL instances of “|12120|” or of “|41100|”. Is that what you want, or did you want to change only these if they are at the end of the string?

1 Like

Hi @andymesmo , I went ahead and made the assumption that your rules are actually:
(“|C191|” or “|C195|”) and (“|98|2202|” or “|98|1202|”)

And of course, the last substring should be “|12120|” or “|41100|”

So basically, the rules would come down to:
if record has substring of (“|C191|” or “|C195|”) and (“|98|2202|” or “|98|1202|”) and last part of record is
“|12120|” or “|41100|”, then replace last part of record with “|23110|”.

I implemented this piece of Column Expression:

searchstring_length = length("|12120|"); //Can also use |41100|, they're of the same length
record_length = length(column("Col0"));

searchstring_position = record_length - searchstring_length;
last_substring = substr(column("Col0"), searchstring_position);

if((contains(column("Col0"), "|C191|") || contains(column("Col0"), "|C195|")) && (contains(column("Col0"), "|98|2202|") || contains(column("Col0"), "|98|1202|")) && (last_substring == "|12120|" || last_substring == "|41100|")) {
    join(substr(column("Col0"), 0, searchstring_position+1), "|23110|");
} else {
    column("Col0");
}

This is the result I get based on the logic:

Explanation: In the end, I did a replace that is done without a replace() function. I applied the rules I stated above, and if the rules are satisfied, I took the substring up to the last part (so record excluding the last part) and then added “|23110|” to it.

If the rules are not satisfied, then keep as is.

I saved the results in a new column so you can see which rows changed.

Here’s the workflow: Replace in Column Expression.knwf (8.0 KB)

1 Like

@bruno29a , thanks again for the help master.

The rules you applied are exactly as you understood them. In fact, it can be confusing, but thank you for having the clarity of understanding.

I confess that I’m learning a lot using the tool in real situations. I don’t know all the “syntaxes” like using the logical operator && and the operator | | in the expression within a code. Maybe that’s why I have a limitation to develop code like yours, with a high level of knowledge. One day I want to reach that level of yours. Now I looked for the logic table on the internet and I understand the syntax better.

For the part below, the goal is to change ALL instances from |12120|" to “|23110|” and from |41100|” to “|23110| obeying the rule in the lines that contain (“|C191|” or “|C195|”) and (“|98|2202|” or “|98|1202|”) and this was the attempt with the replacement. But your code perfectly met the case I presented.

replace(column(“Col0”),"|12120|","|23110|");
replace(column(“Col0”), “|41100|”, “|23110|”)

Grateful

No problem @andymesmo .

“&&” simply means “AND”, and
“||” simply means “OR”

1 Like

@bruno29a , what you did in the code was spectacular and it really met what needs to be done in the changes when I ran it on the original file but, I noticed that there was a small change in Kb in the size of the file in your property.

Could you please clarify about the variable (searchstring_position = record_length - searchstring_length) ?

In the original file, there can be no change in its structure, except for the content of the rule, even if, for example, there is a space at the end of the line in the original file. It must be kept.
I will share the node settings in the zipped file.

I noticed that in the variable (searchstring_position ) stores the length of the column - the length of the string to change “|12120|” and that in the use of “join” this variable was used with “+1”

Taking advantage of the clarification, in the “join” line what does the “0” element mean? What happens on that line, please?
join(substr(column(“Col0”), 0, searchstring_position+1), “|23110|”)

In your message below the image with the result there is an explanation. I believe that modifying this should solve it. I’m uploading the original zipped file (the smallest I have). I tried to tinker with the “join” code to only change and compare the file size. Unsuccessfully.

Below are the file sizes after testing.
Name Size
ASV - EFD 022019_changed.txt 202029 KB
ASV - EFD 022019_changed_knime.txt 202044 KB

Compressed files: ASV - EFD 022019_changed.zip - Google Drive

Hi @andymesmo , I’m using the file that you provided, without modification.

After you clarified that you want to replace ALL instances of |12120| and |41100|, use the replace() function as you showed, instead of the join(). The join() that I used was to replace only these instances at the end of the record since I thought that’s what you wanted to do. The join() will not replace all of the instances, only those at the end.

The replace() statement will replace ALL instances, so use the replace that you have, that’s good enough

replace(column("Col0"),"|12120|","|23110|");
replace(column("Col0"), "|41100|", "|23110|");

The “0” in the “join” line actually belongs to the substr() function, which simply says to start at index 0. You can read more on the function by clicking on it from the function list in the Column Expression.

EDIT: Actually for the replace() statements, you should use nested replace() statements in order to apply both, like this:
replace(replace(column("Col0"),"|12120|","|23110|"), "|41100|", "|23110|");

1 Like

@bruno29a

The use of the nested replace brought a more effective result where the file size did not change as expected, meeting the criteria of the rules.

If at any point I’ve been pushy, or uncomfortable, I apologize @bruno29a but I was able to gain more knowledge with your clarification and guidance.

thank you so much

Hey @andymesmo , there’s nothing to apologize for, you have not been pushy or uncomfortable at all.

You are most welcome and I am happy to help.

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.