Multiple string manipulations and best-practice advice

This is my first contact with KNIME, after searching in Gemini for a fast data manipulation and automation workflow. I'm from Germany (are there any German groups here?).

I need KNIME for address manipulation. As you can see in my screenshot, I've tried to manipulate the “Briefanrede” (letter salutation) variable. It contains many mistakes, and I want to either correct it or build my own.

  1. Nachname (last name) contains the Adelstitel (title of nobility) and the Namenszusatz (name affix), here “Graf von”. Only Zitzewitz is the actual Nachname.
  2. Briefanrede is “Herr Professor Graf von Zitzewitz”, but only “Sehr geehrter Graf von Zitzewitz” is correct. When an Adelstitel is filled, the Briefanrede is only Adelstitel + Namenszusatz + Nachname. When the AkadTitel field contains Prof.Dr, Prof. or Prof. Dr. med., we only write something like “Sehr geehrter Professor Mustermann”. The structures are complex.

Which nodes are the best and easiest to build this with? I'm no scripting expert. Would it be possible to list all AkadTitel values first, build a list from them, and do a replacement with it?

Sorry for my English.

If it's getting “more” complex, I would always opt for the Rule Engine (Dictionary) node.
That way you have your data going in on one port and a second input for your rules.

Unfortunately, the Rule Engine is still limited to selecting between fixed results and cannot do operations (you cannot say: if a is present and b is missing, write Text + b).

KNIME has always been lackluster in this regard. Obviously, you can write long or nested if...elseif...else statements in the Expression node or the Column Expressions node (or Java Snippet / Python Snippet), but all of those fall short as well.
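To make the if/elseif/else idea concrete, here is a minimal Python sketch of the two salutation rules from the question. It is only an illustration, not a KNIME-specific implementation: the function name `build_briefanrede` and its parameters are made up, it assumes the name parts have already been separated into their own fields, it only covers the masculine form, and the `Herr` fallback for rows without any title is my own assumption.

```python
def build_briefanrede(nachname, adelstitel="", namenszusatz="", akad_titel=""):
    """Build a masculine letter salutation from already separated name fields."""
    # Treat missing values (None) like empty strings and trim whitespace.
    nachname = (nachname or "").strip()
    adelstitel = (adelstitel or "").strip()
    namenszusatz = (namenszusatz or "").strip()
    akad_titel = (akad_titel or "").strip()

    if adelstitel:
        # Rule from the question: a filled Adelstitel overrides everything else.
        parts = [adelstitel, namenszusatz, nachname]
    elif akad_titel.lower().startswith("prof"):
        # Rule from the question: Prof., Prof.Dr, Prof. Dr. med. -> "Professor".
        parts = ["Professor", namenszusatz, nachname]
    else:
        # Assumption (not stated in the question): plain fallback with "Herr".
        parts = ["Herr", namenszusatz, nachname]

    # Skip empty parts so no double spaces appear in the result.
    return "Sehr geehrter " + " ".join(p for p in parts if p)


print(build_briefanrede("Zitzewitz", "Graf", "von", "Prof."))
# Sehr geehrter Graf von Zitzewitz
print(build_briefanrede("Mustermann", akad_titel="Prof. Dr. med."))
# Sehr geehrter Professor Mustermann
```

The order of the branches encodes the priority: nobility first, then professor, then everything else. Further cases would be added as additional `elif` branches, which is exactly where this style starts to get long.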

There are still plenty of ways to solve this, and it heavily depends on your data.
My favorite approach (even if it is by far not the best for performance) is to use the Row Splitter (I usually prefer the Rule-based Row Splitter) to break the data stepwise into smaller chunks and process each chunk separately. At the end you just concatenate all the small tables. The nice part is that the remaining dataset gets smaller and smaller, and you can always easily catch “new” variations or cases you haven't considered: if your last chunk is not empty, something is still unhandled.
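The same split, process, concatenate pattern can be sketched in plain Python to show the logic outside of KNIME. Everything here is hypothetical: `split_stepwise`, the `case` column and the sample rows are invented for illustration, and each (condition, handler) pair only stands in for one row splitter plus the nodes that process its matching chunk.

```python
def split_stepwise(rows, steps):
    """Apply (condition, handler) pairs in order; return processed rows and the rest."""
    remaining = list(rows)
    done = []
    for matches, handle in steps:
        # Rows matching this step's condition form the chunk to process now.
        chunk = [row for row in remaining if matches(row)]
        # Everything else is passed on to the next step.
        remaining = [row for row in remaining if not matches(row)]
        # "Concatenate" the processed chunk onto the overall result.
        done.extend(handle(row) for row in chunk)
    return done, remaining


# Made-up sample rows, using the field names from the question.
rows = [
    {"Adelstitel": "Graf", "AkadTitel": "Prof."},
    {"Adelstitel": "", "AkadTitel": "Prof. Dr. med."},
    {"Adelstitel": "", "AkadTitel": "Dipl.-Ing."},
]

steps = [
    (lambda r: bool(r["Adelstitel"]), lambda r: {**r, "case": "nobility"}),
    (lambda r: r["AkadTitel"].startswith("Prof"), lambda r: {**r, "case": "professor"}),
]

done, leftover = split_stepwise(rows, steps)
print(len(done), "rows handled")
print("not handled yet:", leftover)
```

Anything still in `leftover` after the last step is a variation that no rule covers yet, which is the check described above for the final chunk.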
