I'm trying to process the text of a list of URLs to see which words appear most frequently in their paths. However, when I replace the symbols [!#$%&'\"*+,.\?:;]+ with a space (" "), the whole document is still treated as a single word, for example when I pass it to the "Bag of Words creator".
As an example, I would like this URL
https://www.modernghana.com/news/788266/church-must-join-sexual-health-education-campaign.html
to give me something like ["church", "sexual", "health", "education", "campaign"], but what I get is just the single term "church must join sexual health education campaign".
It's a bit hard to explain, but I hope you understand the problem.
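To make the goal more concrete, here is a minimal Python sketch of the behaviour I'm after. It is not the tool I'm actually using, and the function name path_words and the STOP_WORDS set are hypothetical, picked only to mirror the example above. The point is that the path is split into separate tokens rather than kept as one string with spaces in it.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Hypothetical stop-word list, chosen only to match the example in this post.
STOP_WORDS = {"must", "join"}

def path_words(url, stop_words=STOP_WORDS):
    """Return the lowercase alphabetic words found in a URL's path."""
    path = urlparse(url).path
    # Drop a trailing file extension such as ".html".
    path = re.sub(r"\.[A-Za-z0-9]+$", "", path)
    # Each run of letters is one token; digits and punctuation act as separators.
    tokens = re.findall(r"[a-z]+", path.lower())
    return [t for t in tokens if t not in stop_words]

url = "https://www.modernghana.com/news/788266/church-must-join-sexual-health-education-campaign.html"
print(path_words(url))
# ['news', 'church', 'sexual', 'health', 'education', 'campaign']

# Word frequencies over a list of URLs (here a list with a single entry).
counts = Counter(word for u in [url] for word in path_words(u))
print(counts.most_common(3))
```

Note that this sketch also returns "news", because it looks at the whole path; restricting it to the last path segment would leave only the words from the article title.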