How to get a substring from a string column in KNIME?

Hello all,

I’m relatively new to KNIME and I’m facing a few difficulties. I have a table file which I imported/read into my workflow. This file contains various tweets from different users in the column “Tweet”, and this column holds the whole text as well as all hashtags. I want to extract these hashtags into a new column and remove all the surrounding text. Is there a simple way to do it? I tried different nodes but so far haven’t been successful. As an example, my data is shown below (I only want to get the marked words):

Thank you in advance

Regards
hm1995

Hello @hm1995,

welcome to KNIME Community!

Usually (in simple cases), to get a substring from a string one can use the substr() function from the String Manipulation node. However, that won’t work in your case. So I am sharing with you a workflow from KNIME Hub which analyzes Twitter data. Take a look and hopefully you’ll manage to find a way to extract the hashtags. If not, or if you have some questions, feel free to come back to this topic and I’m sure someone will give you a hand.
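
To illustrate why a position-based substring is not enough here, below is a small Python sketch. Python slicing only stands in for the idea of substr(); the helper name `fixed_substring` and the sample tweets are made up for illustration. A fixed start and length only finds the hashtag when it happens to sit at that exact position.

```python
def fixed_substring(text: str, start: int, length: int) -> str:
    """Return `length` characters of `text` beginning at index `start`."""
    return text[start:start + length]

# Hypothetical tweets: the same hashtag at two different positions.
tweet_a = "#KNIME is great"
tweet_b = "I really like #KNIME"

# The same offsets work for one tweet and fail for the other.
print(fixed_substring(tweet_a, 0, 6))  # "#KNIME"
print(fixed_substring(tweet_b, 0, 6))  # "I real"
```

Since hashtags can appear anywhere in a tweet, and any number of times, a pattern-based approach is usually a better fit.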

Br,
Ivan

Hello Ivan,

thank you for your reply. I tried it out in my workflow, but unfortunately it couldn’t extract all the hashtags from my tweets. Is there any other way to do it?

Thank you for your help.

Regards
hm1995

Hi,

You can do it with the “Regex Extractor” node (Palladian extension),
something like this …
Regex extract.knwf (13.3 KB)

I chose the Rows option for the output; you can also choose list, columns, …
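
For readers who cannot open the attached workflow, the regex idea can be sketched outside KNIME in Python. This is only an illustration under assumptions: the pattern `#\w+`, the function name `extract_hashtags`, and the sample tweets are not taken from the workflow, which may use a different expression.

```python
import re
from typing import List

# Assumed pattern: a '#' followed by one or more word characters.
HASHTAG_PATTERN = re.compile(r"#\w+")

def extract_hashtags(tweet: str) -> List[str]:
    """Return every hashtag in the tweet, in order of appearance."""
    return HASHTAG_PATTERN.findall(tweet)

# Hypothetical sample data standing in for the "Tweet" column.
tweets = [
    "Learning #KNIME for #DataScience today",
    "No tags here",
]

# One output row per match, similar in spirit to a "rows" style output.
for tweet in tweets:
    for tag in extract_hashtags(tweet):
        print(tag)
```

A pattern this simple also matches a `#` in the middle of a word, so it may need tightening for real data.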

Regards
Andrej

Hello @hm1995,

Did you manage to extract some hashtags but not all? There is a Row Filter inside the Extract Hashtags metanode that keeps only the top 100 hashtags based on count. Maybe that’s the reason?
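
To show how such a filter can hide hashtags, here is a rough Python sketch of "keep only the N most frequent". The function name `top_hashtags` and the sample list are made up for illustration, and the metanode's actual logic may differ.

```python
from collections import Counter
from typing import Iterable, List

def top_hashtags(hashtags: Iterable[str], n: int = 100) -> List[str]:
    """Return the n most frequent hashtags, most frequent first."""
    counts = Counter(hashtags)
    return [tag for tag, _ in counts.most_common(n)]

# Hypothetical extracted hashtags, one entry per occurrence.
sample = ["#knime", "#data", "#knime", "#ml", "#data", "#knime"]

# With n=2 the rarest tag ("#ml") is dropped from the result.
print(top_hashtags(sample, n=2))
```

Any hashtag outside the top N disappears from the output, which would look like an incomplete extraction.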

Another option is a regex, as @andrejz demonstrated, as long as the pattern is written well enough :wink:

Br,
Ivan

Hi Andrej,

thank you, it worked quite well in that case :slight_smile:

Regards
hm1995
