Split one column to multi columns based on specific string

Hi there,

I am trying something simple but cannot find it the forum. So apologizes in advance if it is too obvious.

I have one single column with a lot of strings and need to parse data in it at a later time. However, within the same column there’s one String that appears as a common pattern. Every time that same string appears, I would like to break out to another column and continue as I will have this happening several times.

Eg:
xyz
abc
sdf
sdfsdf
aaa
a
dfdsfadsgag
sdfd
xyz
qwqq
qqqq
zzzzz
343ass
sdfasd
923743274

Output col1:
xyz
abc
sdf
sdfsdf
aaa
a
dfdsfadsgag
sdfd

Output col2:
xyz
qwqq
qqqq
zzzzz
343ass
sdfasd
923743274

Thanks a lot
J.

Hi @jarviscampbell

From your data example, I understand that you are wanting to break each group of rows into a new column whenever a row containing “xyz” is encountered.

If you have data in a column that you want to group together based on a “marker” row, then a “pattern” that I find useful for this is a Rule Engine followed by a Moving Aggregation. Those two nodes don’t obviously stand out as the ones to use in this situation but they do work remarkably well together.

image

Firstly the Rule Engine is used to create a new column to mark the “group-start” rows. It is important that it marks these rows with a 1, and all other rows with a 0.

This returns the following result:
image

It doesn’t look much, but combining that with a Moving Aggregation to cumulatively sum these “ColumnGroup” values:


We get…
image

and you can see that each of the groups of rows that we want to break out now have a unique label identifying their group.

Now we need to batch these up so we can turn them into columns. To collect rows together, the GroupBy node can be used to turn multiple rows into a single row containing a List[].

image


and then follow it with a Column Filter to retain ONLY the List column:

so we get this:
image

After that, the Transpose can switch this from being rows to columns, so now each of the groups becomes a column,
image

followed by an UnGroup node which is very useful in turning single row List[] items back into multiple rows

image

Some tidying up, and you have output similar to what you are after

image
image

Split rows in to columns.knwf (49.0 KB)

5 Likes

Thank you very much @takbb for your throughly analysis. I am trying to run my own file w your solution, however, apologizes in advance, but I should have mentioned that the ‘xyz’ is part of the string in a cell.

Unfortunately with your solution it only accounts for a cell with exact match.

So Column 1 will be:
xyz saaaa dfdf
abc adsfsafd
sdf
sdfsdf
aaa
a
dfdsfadsgag
sdfd
xyz wlaksjdf
qwqq sadfaf
qqqq
zzzzz
343ass
sdfasd
923743274

Output col1:
xyz saaaa dfdf
abc adsfsafd
sdf
sdf
sdfsdf
aaa
a
dfdsfadsgag
sdfd

Output col2:
xyz wlaksjdf
qwqq sadfaf
qwqq
qqqq
zzzzz
343ass
sdfasd
923743274

Thanks a lot,
J.

Hi @jarviscampbell , you could adjust the rule engine rule to either use LIKE and wildcards, or MATCHES and a regular expression. The idea of how to do the rest should remain the same.

2 Likes

Worked like a charm @takbb, Thank you very much!

2 Likes

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.