Text processing on URLs

I'm trying to do text processing on a list of URLs to see which words appear most frequently in the URL paths. However, when I replace the symbols [!#$%&'\"*+,.\?:;]+ with a space (" "), the whole document is still treated as one word, for example when I use the "Bag of Words creator".

As an example, I want this URL

https://www.modernghana.com/news/788266/church-must-join-sexual-health-education-campaign.html

to give me something like ["church", "sexual", "health", "education", "campaign"], but what I get is just the single term "church must join sexual health education campaign".

It's a bit hard to explain, but I hope you understand the problem. 

Hey marit,

I can't reproduce your problem. Which nodes did you use? 

I've attached a screenshot of my solution. The tokenization of the document works without any problems for me.
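In case it helps to compare against something outside KNIME, here is a minimal Python sketch of the result you describe. It is not the workflow from the screenshot, just an illustration: the helper name `path_tokens` and the `STOP_WORDS` set are made up for this example, and the stop words are chosen only so that the output matches your expected list.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Hypothetical stop list, picked only to reproduce the expected output above.
STOP_WORDS = {"news", "must", "join"}

def path_tokens(url):
    """Split the path of a URL into lowercase word tokens."""
    path = urlparse(url).path
    # Drop a trailing file extension such as ".html".
    path = re.sub(r"\.[A-Za-z0-9]+$", "", path)
    # Keep runs of letters only, so "/", "-", digits and punctuation
    # all act as token separators.
    words = re.findall(r"[A-Za-z]+", path)
    return [w.lower() for w in words if w.lower() not in STOP_WORDS]

urls = [
    "https://www.modernghana.com/news/788266/"
    "church-must-join-sexual-health-education-campaign.html",
]

# Count how often each token appears across all URL paths.
counts = Counter(tok for u in urls for tok in path_tokens(u))
print(path_tokens(urls[0]))
print(counts.most_common(5))
```

The point is that each word ends up as its own token before counting, rather than the cleaned path being kept as one string.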

Cheers,

Julian
