Extracting only the last few words.

badger101 · April 19, 2022, 10:51am

Greetings!

I want to extract the last few words from sentences of varying length.

I searched the forum and found this closed thread (see below), but it only deals with extracting the last word, not the last few words. In my case I want to extract the last 2 or 3 words. I don’t think the Column Aggregator (mentioned in that thread) has such a configuration.

I think I could use the Column Filter node, but that would require the table to be rearranged (please refer to the attached image).

Would appreciate some help with this.

How do I rearrange the table? If I rearrange it with the Column Resorter node, it will affect all rows.
If there are other ways to do this without rearranging the table, please share. Thanks!

gonhaddock · April 19, 2022, 11:12am

Hello @badger101
Maybe this workflow can be useful for your case…

You may need to name your columns sequentially, and then work with the ‘Rank’ node in descending order.

BR

badger101 · April 19, 2022, 11:30am

Thanks @gonhaddock , I have read the thread that the workflow comes from, but I don’t fully understand the issue discussed there, so I can’t see how it applies to my case. If you could explain it in simple terms and point out the connection, that would be helpful.

bruno29a · April 19, 2022, 12:54pm

Hi @badger101 , here’s a way to do this:

Input:

Results:

The idea is to do this using Regex.

The approach:
1- Aggregate all columns so that each row becomes a string (Concatenate with a space while ignoring the missing values):

2- Use Regex to extract the last 2 words:

3- Split the 2 words into 2 columns using the Cell Splitter

Here’s the workflow: Extract only last few words.knwf (10.2 KB)
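For readers without the workflow at hand, the three steps can be sketched outside KNIME. This is a rough Python equivalent, not the workflow itself: the function name `last_two_words` and the sample row are made up for illustration, and the pattern is the one quoted later in this thread, with Python's `\2 \3` backreferences in place of `$2 $3`.

```python
import re

def last_two_words(cells):
    """Return the last two words of a row given as a list of cell values."""
    # Step 1: concatenate the cells with a space, skipping missing values
    # (None or empty strings stand in for missing cells here).
    joined = " ".join(c for c in cells if c)
    # Step 2: keep only the last two words. If the row has fewer than
    # three words the pattern does not match and the text is left as-is.
    kept = re.sub(r"(.*)\s(\w+)\s(\w+)$", r"\2 \3", joined)
    # Step 3: split the result into separate values, one per output column.
    return kept.split(" ")

# Hypothetical row with one missing cell.
print(last_two_words(["the", "quick", None, "brown", "fox"]))  # ['brown', 'fox']
```

Extending this to the last 3 words would mean adding one more `\s(\w+)` group and a `\4` in the replacement.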

badger101 · April 19, 2022, 1:01pm

@bruno29a Thanks, I’ll upload the workflow soon and will update here.

badger101 · April 19, 2022, 1:55pm

@bruno29a Thank you again. Your solution works for the dummy data I gave. However, my real data consists of server names, and the regex formula you provided in the workflow, which is

regexReplace($Concatenate$, "(.*)\s(\w+)\s(\w+)$", "$2 $3")

only works for some rows. In the image attached below, the rows highlighted in green gave the results I wanted. The uncolored rows gave incorrect output.

(Note: The example server names are edited for privacy reasons. I replaced the real letters and numbers with random ones, but the structure and positions are similar to the real raw data.)

Can you help me understand the regex formula you provided, so I can manually edit it to fit my data?

badger101 · April 19, 2022, 2:11pm

Update: I think I have found an answer to my last question. I just need to use the URL Domain Extractor node, and add http:// or https:// at the beginning of the server names for the node to work.

Thank you. I have marked your workflow as Solution since it works on the dummy data. @bruno29a
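As a loose analogy for why the scheme prefix matters, here is a small Python sketch using the standard library's `urlparse`. It is not the URL Domain Extractor node, and its output may differ from what the node returns; the helper name `host_of` and the `example.com` server name are hypothetical.

```python
from urllib.parse import urlparse

def host_of(name):
    """Return the host part of a server name, adding a scheme if needed."""
    # Without a scheme, urlparse treats the whole string as a path and
    # reports no host, so prepend one when it is missing.
    if "://" not in name:
        name = "http://" + name
    return urlparse(name).hostname

# Hypothetical server name, not taken from the real data.
print(urlparse("srv-01.example.com/status").hostname)  # None
print(host_of("srv-01.example.com/status"))            # srv-01.example.com
```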

bruno29a · April 19, 2022, 2:12pm

Hi @badger101 , it seems that \w does not match dashes (-), so words containing a dash are not captured.

I modified the expression to include the dashes.

Here’s the updated workflow: Extract only last few words.knwf (10.4 KB)
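To make the difference concrete, here is a small Python sketch comparing the original pattern with one possible dash-aware variant. The exact expression in the updated workflow is not shown in this thread, so `[\w-]+` is an assumption about its shape; the names `ORIGINAL`, `WITH_DASH`, `last_two` and the sample row are invented for illustration.

```python
import re

# (.*)   everything before the last two words (greedy)
# \s     one whitespace character
# (\w+)  a word made of letters, digits or underscore only
# $      end of the string
ORIGINAL = r"(.*)\s(\w+)\s(\w+)$"
# Assumed fix: allow dashes inside each of the two captured words.
WITH_DASH = r"(.*)\s([\w-]+)\s([\w-]+)$"

def last_two(pattern, text):
    """Replace the whole text with its last two captured words."""
    return re.sub(pattern, r"\2 \3", text)

row = "rack srv-01 db-02"  # hypothetical server-like names
print(last_two(ORIGINAL, row))   # rack srv-01 db-02  (no match, unchanged)
print(last_two(WITH_DASH, row))  # srv-01 db-02
```

If the names also contain dots or other punctuation, `\S+` (any run of non-whitespace characters) in place of `[\w-]+` is a broader alternative.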

system · April 26, 2022, 2:13pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.