String Manipulation and cleaning data

I connect to a csv file and extract a list of first names
They are not cleaned and have examples like Title in the Firstname Mr or Miss
or Full stops in them, or they have two names like Steve White, where we want to split them up and put Steve as the first name and White as the last name.
If the firstname is Mr.Steve then we would want to put Mr in the title field and the update the firstname to Steve.
I know I can find things with RegEx but need a little bit of help to create the logic to split things up rather than just do one step

Hi @arvindev ,
You might want to try using the Cell splitter node, you split on “.” and output the string as new columns, and then you can rename those columns using Column Renamer.

Let us know if it helps.

Thanks,
Sanket

1 Like

Thanks foy your help! I did as you said and now I get the two fields. What I want to do next is delete any fields thats are less than 1 characters in the first split column.
And then what I need help with is how do I compare the columns and perform updates on calculated function.
For example if the Column 1 field is ‘Mr’ then I want to replace it with the Column 2 value.
I also want to know how I can create a new column called Title if the Firstcolumn has ‘Mr’ or ‘Mrs’ in it and then replace it with the Column 2 value.

You’d need to use a column expressions node to calculate the length of the value in a column, then you can filter rows based on thata column.

Use a column expressions or rule engine node for your other comparisons/replacements.

Where can I find any examples of Column expressions i…e examples of complex use of expression builder?

@arvindev here is an example of how to employ a local LLM to sich a split with somewhat unstructured data. Maybe you can adapt the prompt.

2 Likes