Identify prefix in a column

Claire · September 24, 2020, 12:16pm

Hi,

In my table, I have a column which contains sample names. These sample names have a common prefix which varies from one experiment to another. I need to remove this prefix. Is there a node which can identify that prefix for me?

Thanks for your help,
Regards,
Claire

tommy · September 24, 2020, 12:43pm

hi @Claire, is there any common rule regarding the prefix?

number of characters at the beginning of the string?
delimiter “-”, “_”, " " or other?
If so, the String Manipulation node would be the solution.
If not, it gets more complicated (Cell Splitter, Constant Value Column Filter, …)
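
For readers who want to see the two simple cases outside of KNIME, here is a minimal Python sketch. It is only an illustration of the idea, not what the String Manipulation node does internally; the function names and the sample name `EXP01_A-1` are made up for this example.

```python
def strip_fixed_length(name: str, n: int) -> str:
    """Drop the first n characters (case: prefix has a known length)."""
    return name[n:]


def strip_up_to_delimiter(name: str, delimiter: str = "_") -> str:
    """Drop everything up to and including the first delimiter.

    If the delimiter does not occur, the name is returned unchanged.
    """
    head, sep, tail = name.partition(delimiter)
    return tail if sep else name


print(strip_fixed_length("EXP01_A-1", 6))     # hypothetical sample name
print(strip_up_to_delimiter("EXP01_A-1"))
```

Note that the delimiter variant cuts at the first occurrence only, so it breaks as soon as the delimiter also appears inside the prefix itself.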

Greetz, Tommy

Claire · September 24, 2020, 1:02pm

Hi Tommy,
Unfortunately no. I thought about using a delimiter but users are not following consistent rules and the same delimiter can be used both in the common prefix and then in the remaining part of the sample name.

Cheers,
Claire

HansS · September 24, 2020, 1:07pm

What is the length of the prefix? Is it always the same?

Claire · September 24, 2020, 1:08pm

From one experiment to another, no. But within the same column, yes.

Cheers,
Claire

tommy · September 24, 2020, 2:34pm

hi @Claire

So you have to generate all possible prefixes as substrings and check whether the values in the column start with the corresponding substring. Then you have to choose the minimum of the matching substrings. That would be your prefix.

Please have a look at the following HUB workflow:

In my example, the maximum length of the prefix is 10 (= number of loops). You may increase or decrease that number. You have to re-run the prefix simulation (top part) for each of your experiments.
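
The same idea can be sketched in Python for anyone who wants to check the logic outside of KNIME. This is not the workflow itself, just a rough equivalent under a few assumptions: candidate prefixes are taken from the first value, `max_len` plays the role of the number of loops, and the sketch keeps the longest candidate that every value starts with. The function names and sample names are invented for the example.

```python
def common_prefix(values, max_len=10):
    """Return the longest prefix (at most max_len characters) shared by all values."""
    if not values:
        return ""
    first = values[0]
    prefix = ""
    # Try candidates of growing length: first[:1], first[:2], ...
    for n in range(1, min(max_len, len(first)) + 1):
        candidate = first[:n]
        if all(v.startswith(candidate) for v in values):
            prefix = candidate
        else:
            # A longer candidate cannot match once a shorter one has failed.
            break
    return prefix


def strip_common_prefix(values, max_len=10):
    """Remove the shared prefix from every value."""
    prefix = common_prefix(values, max_len)
    return [v[len(prefix):] for v in values]


samples = ["EXP01_A-1", "EXP01_B-2", "EXP01_C-3"]  # hypothetical sample names
print(common_prefix(samples))
print(strip_common_prefix(samples))
```

As in the workflow, the result depends on the values in one column, so it has to be recomputed per experiment. If no length limit is needed, Python's standard library function `os.path.commonprefix` also compares a list of strings character by character.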

Hope that helps, Greetz, Tommy

Claire · September 25, 2020, 8:34am

Hi Tommy,

Many thanks for your example. It’s great.

Cheers,
Claire
