Splitting data in cells

svijay · March 7, 2023, 2:52pm

Hi Everyone,

I’m trying to split data in an excel spreadsheet so the output is event name and data

There are several event names + dates that I need to split.

I have created a reference table and joined the data but that is quite time consuming for large quantities of data. Is there an alternative solution? Can you help, please?

Thank you!!!

gonhaddock · March 7, 2023, 3:10pm

Hello @svijay
Assuming that your text start with ‘event’ separated with comma - space, and ‘date’ format afterwards starting always from day; You can test with the following code in a ‘Regex Split’ node:

(.*),\s+(\d+st.*|\d+nd.*|\d+rd.*|\d+th.*)

BR

svijay · March 7, 2023, 3:57pm

What if there is no comma? between ‘event’ and ‘date’?

svijay · March 7, 2023, 4:00pm

I just tried it and it did not work!

gonhaddock · March 7, 2023, 4:40pm

Hello @svijay
Yes, it was too literal…

Trying to deliver a clean separator its good to add OR alternatives, aiming to remove the comma in one step when there is a case:

(.*?)[\s,.;]+(\d+st.*|\d+nd.*|\d+rd.*|\d+th.*)

with this code will split any of those characters included in square brackets. If you find more special characters in the separator that you want to exclude, just add them into the code.

Clean, ist it?

BR

svijay · March 8, 2023, 3:18pm

Thank you this code is super useful.

One question what if the text looks like this:
Playing professional : basketball, 1st January, 3:00pm - 5:00pm

Will the extra “:” create a problem? Would the code change?

gonhaddock · March 8, 2023, 4:25pm

Hello @svijay
No, it won’t affect. So you can drop your ‘colon (:)’ within the square brackets.

It does only affect the symbols together to the last separator group space/es (\s+) plus special characters in any sequential order, and as many as keep in the + (see example):

(.*?)[\s,.;:#]+(\d+st.*|\d+nd.*|\d+rd.*|\d+th.*)

This magic happens because of the ‘?’ in the first capturing group, that turns it into lazy mood.

BR

svijay · March 8, 2023, 4:37pm

Thank you very much!

Is there anything else I need to input into Regex node?

For some reason the code is not working and I’m wondering now I’m doing something incorrectly. Thank youuu

gonhaddock · March 8, 2023, 4:51pm

Hi @svijay
Based on the picture in your description you are selecting $Event name$ column, that is the already split sample. You would need to select an ‘splitable’ column. Otherwise you should provide a testing sample in a file or a shared workflow.

Please find attached the latest posted sample text in a workflow:
20230308_splitting_text_stepping_on_staring_of_date_v0.knwf (12.9 KB)

Good luck!

svijay · March 8, 2023, 4:57pm

Thank you - it is strange it is appearing as something already split when from my side I am doing exactly what is showcased in the sample text!

Thanks for you’re help - I’ll figure it out.

gonhaddock · March 8, 2023, 5:14pm

Maybe your text have some unicode hidden characters, that are not included in the regex. By providing few cells of the text column, we could inspect those problems…

svijay · March 9, 2023, 2:51pm

Hi, so I have figured out went wrong. The above formula works very well for the example I shared with you. However, that example was using dummy data. The actual data is sensitive so I cannot easily share.

The actual data consists of multiple event names so I’m now wondering if the (\d+st.|\d+nd.|\d+rd.|\d+th.) is ok to use if the event names are different? As I think the code might not be applicable to different event names. Thank you so much for all your help!

system · June 7, 2023, 2:52pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.