Hi all, hope everyone is doing well. I have a table with one column for the full URL string and another column for just the domain portion of that URL. There are many more columns, but the relevant part looks like this (blueshoes/greenshoes/pinkshoes/).
I need to extract the subdirectory url after the domain portion using regex, like this.
For a single row, I can take the full URL and replace the domain portion with String Replacer to get the rest of the URL, because I can hard-code the domain as the Pattern for that one URL. But the list has many different URLs, so my list of replacement patterns keeps growing. In the workflow I am able to create a .txt file with [“domain portion of url with / at the end”, “/”] entries to use as the dictionary. It is meant to remove the domain portion up to and including .com/ and then add the / back.
It doesn’t do anything. I have a row with https://www.site1.com/123/456, which should output /123/456, but the result is still the original full URL.
Is there a better way to do this or am I missing something with the node config?
Thank you very much and stay healthy!!
So you have some URLs like https://www.site1.com/123/456 and you want /123/456 as output.
Is this what you want?
Yes, and I’m able to do it for one row, or when the entire data sheet contains URLs from one domain with different subdirectories. But I have URLs from many different domains, one per row (https://www.site1.com/asdf/qwer/asdf, https://www.differentsite.com/oij/nj/rth, and so forth). To strip the domain portion from each row, I’m trying to replace https://www.site1.com/ with / inside a loop over the rows.
Check this one:
KNIME_project5.knwf (7.6 KB)
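The contents of the attached workflow are not shown in the thread. As a rough sketch of the general idea (one regex that removes the scheme and host from every row, whatever the domain), here is a Python version. The pattern and the name `strip_domain` are assumptions for illustration, not taken from the workflow.

```python
import re

# Assumed pattern: an optional scheme such as "https://", followed by the host,
# i.e. everything up to (but not including) the first "/".
DOMAIN_PREFIX = re.compile(r"^(?:[a-zA-Z][a-zA-Z0-9+.-]*://)?[^/]+")


def strip_domain(url: str) -> str:
    """Return the part of the URL after the domain, e.g. '/123/456'."""
    return DOMAIN_PREFIX.sub("", url, count=1)


urls = [
    "https://www.site1.com/123/456",
    "https://www.differentsite.com/oij/nj/rth",
]
for u in urls:
    print(strip_domain(u))
# /123/456
# /oij/nj/rth
```

Because the pattern matches any host, no per-domain dictionary or row loop is needed; the same replacement applies to every row.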
Hi @mehrdad_bgh Wow, power of regex with genius. This is AWESOME!!! It’s so much simpler than all the steps I have (in my unsuccessful attempt). Let me run more urls now. Thank you so so much, @mehrdad_bgh!!
Hi @mehrdad_bgh, yes this totally works. Is there a way to get just the first-level directory portion? That was my next step, and I couldn’t fit it into the approach I was trying. You proved that I need to learn regex. Thank you, @mehrdad_bgh!!
I’m glad it worked for you.
Try this regex for your first level directory:
Let me know if you need anything else.
I’m having difficulty incorporating this step (/[^/]+.) into the workflow you shared earlier. I’m getting mixed results. I’ll keep trying. Thank you again, Sir!!!
Try this one:
KNIME_project5.knwf (6.5 KB)
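Again, the workflow itself is not shown here. A hedged Python sketch of how a single regex could capture only the first-level directory, building on the `/[^/]+` fragment mentioned above: the full pattern and the name `first_level_directory` are assumptions, and the character class also stops at `?` and `#` so that query strings and fragments are not included.

```python
import re

# Assumed pattern: skip the optional scheme and the host, then capture
# the first "/segment" that follows in group 1.
FIRST_DIR = re.compile(r"^(?:[a-zA-Z][a-zA-Z0-9+.-]*://)?[^/]+(/[^/?#]+)")


def first_level_directory(url: str) -> str:
    """Return the first path segment with its leading slash, or '' if there is none."""
    match = FIRST_DIR.match(url)
    return match.group(1) if match else ""


print(first_level_directory("https://www.site1.com/asdf/qwer/asdf"))      # /asdf
print(first_level_directory("https://www.differentsite.com/oij/nj/rth"))  # /oij
```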
Yes, this works beautifully. And it’s just that one regex. Thank you again so much for your support and help!! I’m loving KNIME and the community and folks like you providing so much support. Thank you for your kindness and please stay safe and healthy. I’m going to keep working on it now.
Thank you, Sir.