String Split and Replace by

Toby76 · May 5, 2021, 6:50am

Dear Forum,

I have the challenge to derive a new category based on last two digits of a Cost Center code.
Operation Description:
1st Step: Out of the Cost Center code last two digits should be split off as shown in Column “Appendix”
2nd Step: The 2 digit appendix string should be converted as below:
IN '-> CC
AU '-> CA
SE '-> CS
Blank → MIXED

I have attached the data set. Appreciate your hints on how to solve it a lot.

Thanks you very much in advance!
Best regards,
Data_Set.xlsx (11.5 KB)

takbb · May 5, 2021, 7:08am

Hi @Toby76 Can I just clarify your Operation Description.

1st Step… when you say “last two digits should be split off”, in your example, Appendix appears to contain last two letters but is blank if there are not letters. Is the rule that appendix is actually the last two “characters” if they are non-numeric, or that it is the last two characters provided that the Cost Center ID has a length of 8?

thanks

Toby76 · May 5, 2021, 7:13am

Dear takbb, thank you for your question. There is also the case of 6digit cost center. In that case, it blank should be classified as “mixed”.
Please let me know if i need to clarify more.
Again , thank you!

takbb · May 5, 2021, 7:17am

Hi, thanks but I was more trying to clarify the derivation of your “Appendix” column. On the basis of the rules you stated, it looks to me like Appendix for cost center ID “818875” would be “75” as you said it should be the last two digits. Your rules as stated don’t describe how Appendix would ever be blank, so I’ll assume you are right-padding the cost center id to be 8 characters in length.

Toby76 · May 5, 2021, 7:35am

Now I think understand, thank you for asking again. For 6digit basically nothing needs to be split off. 6 digit Cost Center’s category shall be indicated as “Mixed”. Only the 8 digit Cost Centers carry the identifier to derive CA, CC, CS.

takbb · May 5, 2021, 7:39am

Ok, no problem. My padding to 8 characters will satisfy that requirement then.

Here is a workflow containing a couple of ways of achieving this.

In both cases a String Manipulation node is used to handle your first Step. It right-pads to 8 characters and then takes the final 2 characters from the cost center id. So it will contain the two characters, or in the case of cost centre id being only 6 characters long, it will contain an empty string, as it then strips any white space.

strip(substr(padRight($Cost Center ID$,8),6,2))
This piece of code says pad the cost center id with spaces on the right to ensure it is 8 characters in length, then take the substring of that starting at position 6 with length 2 characters, and from the result strip any whitespace.

After that, in the first flow (the basic option) step 2 is handled by a hand-coded set of rules in a rule engine. I’ve included a catch all at the end as I don’t know your data, and it is possible that your rules won’t cover all eventualities, in which case it writes “unknown” to the Result column

$Appendix$="IN" =>"CC"
$Appendix$="AU" => "CA"
$Appendix$="SE" =>"CS"
MISSING $Appendix$ OR $Appendix$="" =>"MIXED"
TRUE => "unknown"

You would need to adapt this to your data as required.

The second suggestion is a little more advanced, but provides a lookup table to satisfy your second step. This translates the two character codes, but doesn’t match the empty string, so that is handled by a subsequent rule engine. There are a couple of extra nodes to ensure your rows are put back in the correct order as the Join can affect the ordering.

It’s not bullet-proof and there are other ways of achieving this too, but hopefully gives you something to work with
KNIME_StringSplitReplace.knwf (38.7 KB)

takbb · May 5, 2021, 7:57am

A “no-code” alternative to the String Manipulation could be to use a “Cell Splitter By position” node with the following config

which tells it to split into two columns “X-Discard” and “Appendix” at position 6, so the first 6 characters go into the first column (which we will be discarding, hence the name I chose) and the remainder into “Appendix”.

You would then add a column filter to remove the unwanted “X-Discard” column.

Toby76 · May 5, 2021, 8:49am

Hello Brian, thanks a lot for your effort and help. I will try it based on your provided workflow.
Be safe!

Toby76 · May 5, 2021, 10:33am

Exactly what I needed, works like a charm!
Thanks a lot Brian!

system · May 12, 2021, 10:34am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.