Remove everything after a specific string

JulianK98 · April 19, 2022, 3:31pm

How is it possible to define a string, after which everything else in a cell should be removed?

badger101 · April 19, 2022, 3:34pm

Hi @JulianK98 , depending on the pattern that you see in all the rows, you’ll have to use ‘Cell Splitter’ Node, ‘Cell Splitter by Position’ Node, or ‘String Manipulation’ Node (using its “substr” formula) followed by Column Filter Node.

You’d have to upload a sample of data if you’re not sure how, so people here can help.

JulianK98 · April 19, 2022, 3:44pm

Hi, thank you!

Unfortunately I really don’t know how.
Here’s a sample. I would like to remove everything except “Stadt Leverkusen” and “Stadt Marl” in this List.

Test1.xlsx (9.0 KB)

badger101 · April 19, 2022, 4:06pm

Solution Stripping.knwf (41.8 KB)

I’m sure others can do it way simpler than what I did here; please find the solution as attached.

Update: Sorry when I re-read your question, you want to extract only the first two words right? Give me a few minutes to re-do the workflow, cause the one I sent was the opposite of what you wanted.

badger101 · April 19, 2022, 4:19pm

Solution Stripping.knwf (47.2 KB)

Here’s the updated solution.

What you seek is either in the output of the last Node (i.e. Duplicate Row Filter), or in the output of the second last Node (Concatenate), depending on the final form of the extract that you want. @JulianK98

JulianK98 · April 19, 2022, 4:22pm

Oh thank you very much, that works here! But what if the original sheet is way longer and there are several parts like these. And in between there are also parts of the list which can stay the same. Then this Partitioning Nod won’t work, right?

badger101 · April 19, 2022, 4:31pm

By “way longer”, your concern is that there’ll be more variations of the first two words other than merely “Stadt Leverkusen” and “Stadt Marl” right?

And can you clarify on what you meant by “parts of the list which can stay the same”?

I can adjust the workflow for the first concern using GroupBy Node.

Also, can you inform me what the final form of the extract you would like to see? (e.g. Is the end goal is to extract the names of the cities mentioned in the whole database?)

JulianK98 · April 19, 2022, 7:24pm

Here’s an example about how the real sheet I have to reorganize looks like. (It’s about German cities and public organizations of the cities btw)
In the end I want just the city names in the list (like Stadt Marl, Stadt Leverkusen etc). As you can see, there are some rows in which are already just the names of a city (rows 11,12,23).
So these rows can stay the same.
And for the other parts I have to delete all the stuff behind the city name.
So I would have to execute your workflow for specific parts of the list (row 1-10, row 13-22, row 24-27…). And that won’t work simply with the Partitioning Nod?
Test1.xlsx (9.1 KB)

badger101 · April 19, 2022, 7:37pm

Hi @JulianK98 ,

Stadt Workflow.knwf (27.1 KB)

For the new dataset file you shared, here’s the solution to your end goal.

This works following two conditions:

Each row of your dataset begins with the two words following the same pattern of Stadt + X.
There is no other characters (including spaces) before the first word (Stadt).

Additional note:

X = name of the city
If X is a two-word city name, that name should be hyphenated. (I’m not saying Baden-Württemberg is a city, but let’s assume it is for the sake of argument, it’s a good example of a hyphenated compound word that will work for my Solution).
If X contains an umlaut like Köln, make sure all of Köln variants are consistent, as opposed to Koeln, for example. Inconsistency of spellings & umlauts will lead to inaccurate duplicates in the final result.

system · July 18, 2022, 7:37pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.