Selecting sections of URL strings

Hi all,

I know there is a node that offers the equivalent of Excel’s ‘text to columns’, and there are options to take segments of strings based on their position in the string. However, is there a way to extract sections that appear in different positions within a URL?

For example, say I have the set of URLs below and only want the final set of numbers (here represented by 12345, but of course each number is different in the real scenario):

www.example.com/folder/subfolder/12345
www.example.com/folder/subfolder/12345?random
www.example.com/folder/12345
www.example.com/folder/12345?random
www.example.com/folder/subfolder/docid=12345
www.example.com/f/12345
www.example.com/folder/6789/12345

How might I extract these numbers? They are document IDs that I need to use as unique identifiers.

Thank you in advance for any help,

Kind regards,

Steve

Assuming you only want the numerals, a Java Snippet node is probably the best bet:

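    // Take the last path segment, e.g. "12345?random" or "docid=12345"
    File f = new File(c_url);
    // Remove the non-digit characters around the digits, e.g. "12345?random" -> "12345"
    C_output = f.getName().replaceAll("[^\\d]*([\\d]+)[^\\d]*", "$1");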

Here, c_url refers to your incoming column and C_output to your new column.

You will also need a line as follows in the custom imports section of the snippet:

    import java.io.File;

Steve
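To try the snippet's expression outside KNIME, it can be wrapped in a small plain-Java program. This is only a sketch: the class name LastSegmentId, the method extractId and the hard-coded sample list are illustrative assumptions; inside the Java Snippet node only the expression lines and the import are needed.

```java
import java.io.File;

// Hypothetical standalone harness for the Java Snippet expression above.
class LastSegmentId {

    // Same logic as the snippet: File.getName() returns the last path segment,
    // then the regex removes the non-digit characters around the digits.
    static String extractId(String url) {
        File f = new File(url);
        return f.getName().replaceAll("[^\\d]*([\\d]+)[^\\d]*", "$1");
    }

    public static void main(String[] args) {
        String[] urls = {
            "www.example.com/folder/subfolder/12345",
            "www.example.com/folder/subfolder/12345?random",
            "www.example.com/folder/12345",
            "www.example.com/folder/12345?random",
            "www.example.com/folder/subfolder/docid=12345",
            "www.example.com/f/12345",
            "www.example.com/folder/6789/12345"
        };
        for (String url : urls) {
            System.out.println(url + " -> " + extractId(url));
        }
    }
}
```

Every sample URL should yield 12345, because getName() isolates the final segment (for example "12345?random" or "docid=12345") before the regex discards the surrounding non-digits. If a final segment ever contained two separate runs of digits, replaceAll would join them.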


Hi @SteveO

Another way to do it is to replace all possible “strange” characters in your URL with a String Manipulation node. Then use a Cell Splitter node to split the URL on the delimiter “/”. With a Column Aggregator you can find the minimum value over the resulting columns (numbers sort before letters). See the enclosed workflow.

split_url.knwf (10.5 KB)

Regards,
Hans
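The attached workflow is not reproduced here, so the following plain-Java sketch of the same three steps rests on assumptions: it treats only '?' and '=' as the “strange” characters and turns them into '/', and the class SplitAndMin and its method extractId are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical plain-Java sketch of String Manipulation -> Cell Splitter ->
// Column Aggregator. Which characters count as "strange" is an assumption.
class SplitAndMin {

    static String extractId(String url) {
        // Assumed cleanup step: turn '?' and '=' into the '/' delimiter
        String cleaned = url.replaceAll("[?=]", "/");
        // Cell Splitter equivalent: one "column" per segment
        String[] cells = cleaned.split("/");
        // Column Aggregator equivalent: lexicographic minimum of the segments
        return Arrays.stream(cells).min(String::compareTo).orElse("");
    }

    public static void main(String[] args) {
        System.out.println(extractId("www.example.com/folder/subfolder/docid=12345"));
        System.out.println(extractId("www.example.com/folder/6789/12345"));
    }
}
```

String comparison is character by character, so digits sort before letters and a numeric segment usually wins. It also means the minimum is not necessarily the last number: for www.example.com/folder/1000/99999 this sketch returns 1000.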

I think using the String Manipulation node is simpler.


Note, however, that for the last URL this joins the numbers (6789 and 12345) instead of returning only the last one.
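The expression from this reply is not preserved. Assuming it simply removed every non-digit character, the following Java sketch (class and method names are hypothetical) illustrates why the last URL ends up with joined numbers.

```java
// Hypothetical sketch: ASSUMES the approach strips every non-digit character.
class StripNonDigits {

    static String digitsOnly(String url) {
        // Delete every character that is not 0-9
        return url.replaceAll("[^0-9]", "");
    }

    public static void main(String[] args) {
        // Works when the URL contains a single run of digits
        System.out.println(digitsOnly("www.example.com/folder/12345?random"));
        // Joins both numbers for the last sample URL
        System.out.println(digitsOnly("www.example.com/folder/6789/12345"));
    }
}
```

The first call prints 12345 and the second prints 678912345, since nothing separates the two digit runs once the slash is removed.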


Hi,

You can use a modified version of the regex provided by @s.roughley in a String Manipulation node suggested by @izaychik63:

regexReplace($column1$, ".*\\/[^\\d]*([\\d]+).*", "$1")

This regex works for the last URL as well.
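Java's String.replaceAll accepts the same pattern string and $1 replacement, so the regex can be checked against all sample URLs in plain Java. A minimal sketch; the class LastNumberRegex and method extractId are hypothetical names, not part of KNIME.

```java
// Hypothetical harness applying the String Manipulation regex in plain Java.
class LastNumberRegex {

    // Greedy ".*" backtracks to the last '/' that is followed by optional
    // non-digits and then digits; the digits are kept via group 1.
    static String extractId(String url) {
        return url.replaceAll(".*\\/[^\\d]*([\\d]+).*", "$1");
    }

    public static void main(String[] args) {
        String[] urls = {
            "www.example.com/folder/subfolder/12345",
            "www.example.com/folder/subfolder/12345?random",
            "www.example.com/folder/12345",
            "www.example.com/folder/12345?random",
            "www.example.com/folder/subfolder/docid=12345",
            "www.example.com/f/12345",
            "www.example.com/folder/6789/12345"
        };
        for (String url : urls) {
            System.out.println(url + " -> " + extractId(url));
        }
    }
}
```

Each sample prints 12345, including .../6789/12345, because the match starts from the last qualifying slash. A URL with no digits after any slash (for example www.example.com/folder) does not match, so replaceAll returns it unchanged.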

Best,
Armin


Nice one :)
Ivan


Wow, these are all excellent. Thank you very much to all of you! I have to say, this forum seems full of extremely helpful people. Very kind of you!


This worked perfectly!

Is there a way to get the result while also keeping the original URL in another column? I assume I need to use a node beforehand to duplicate the column first?

If you use the “Append Column” option, the original string will remain untouched and the result will appear in a new column.

Armin

Great, thank you!

