How to split the contents of a column into different records, having rest of the columns duplicated?

Hi

I have a table like this

Col1 Col2
record1 1. sample content_1 of record1
2. sample content_2 of record1
3. sample content_3 of record1

record2 1. sample content_1 of record2
2. sample content_2 of record2
3. sample content_3 of record2

I want the table like this
Col1 Col2
record1 1. sample content_1 of record1
record2 2. sample content_2 of record1
record3 3. sample content_3 of record1

record4 1. sample content_1 of record2
record5 2. sample content_2 of record2
record6 3. sample content_3 of record2

Note: The table is an Excel Sheet and
1. sample content_1 of record1
2. sample content_2 of record1
3. sample content_3 of record1 is in one cell.

Thanks

Hi Nagaraja,

See attached workflow, hope it helps.

  1. Instead of XLSX file as a data source I have used “Table Creator” node with manually appended data.
  2. Data contained in single cell has to be splitted somehow, I have used “Regex split” node,
    but if you have there some delimiter like comma it’s much easier.

Martin K.

Split_content.knwf (4.7 KB)

1 Like

Hi Martin,

Thanks for your time.

Well, I think I was failed to place my query properly,
The intention of the splitting here is to extract all the contents which are in the same cell with bullets(in this case it’s numbers) irrespective of the contents it has.

(1. \D+\d \D\D record\d) (2. \D+\d \D\D record\d) (3. \D+\d \D\D record\d) in this Regex the key word “record” is static, I mean, it can contain anything and any number of words.
All I want is the contents with bullets should come in different rows, irrespective of texts in it(because it’s not “record” at the end always, it could be a random word, a dot or just a space, literally anything), number of words it has.

Hoping for a more dynamic solution,

Thank you very much.

Hi Nagaraja,

Sorry, I have placed such a horrific regexp string just to demonstrate possibility to split text with regular expression :smiley: . Instead of this, you could try “Cell Splitter” node, if the bullet is used as a delimiter.
Check the corresponding cell in the table outputting from Excel Reader node if this is applicable.

Martin K.

Thanks for the reply Martin,

Yeah I thought of using the “Cell Splitter” but I was wondering, how am I going to use multiple delimiters(in this case they are 1. 2. 3. 4. so on)

Hi Nagaraja,

If content of the cell is divided by non-static delimiters like numbers 1. 2. 3. etc.,
apply RegexReplace function in String Manipulation node to replace variable delimiter with
static value of your choice like semicolon. Then use Cell splitter node to split single content into
appropriate columns.
See attached example.

Multiple_delimiters.knwf (7.6 KB)

Martin K.

Thanks for the logic Martin.

There’s still a problem,
If at all I have a number in the content like “first 25 string” then it gets split too.

Can you please give me a more robust Regex?

Thanks

Hi Nagaraja,

Try updated version, I hope it’s O.K.

Martin K.

Multiple_delimiters2.knwf (10.0 KB)

1 Like

Brilliant, Thank you very much Martin.

Can I also take “space” into consideration? Like "1. " instead of “1.”
This is just to avoid any sentences ending with a number followed by full stop(.)

I’m sorry for asking simple questions, I’m very new to Regex and I will be more than happy if you can share some materials on Regex for me go through.

Thanks for your help.

Hi @Nagaraja_Ganiga,
you can try the following website: https://regex101.com/ to interactively build regex expressions.
best,
Gabriel

2 Likes

Hi Nagaraja,

As gab1one posted, try regex101.com or other online regex testing page to experiment with
various alternatives of regex expression. It’s not so easy to tune a single expression in such
a way that all variants of the data strings would be covered to split cell according expected result.
Regards !

Martin K.