Number filter node text processing

Hello everyone,

It seems that the Number Filter node doesn't work in my workflow. There are some number terms in the input document, and when I look at the output of the node, they are still there!

In fact, when I apply the Bag of Words node without any POS tagger node before it (OK, I know, there will not be any tag associated with any word), the BoW node seems to do some kind of filtering and deletes some of my terms! :S

Is this normal? I mean, I must be doing something wrong, but I don't know what!

Can anyone help me?

Have a nice day!

Hi enribueno,

I am not quite sure I understand your problem correctly. You have numbers in your input document, then you create a bag of words (without POS tagging before) and then you count (TF), but not all the numbers that you expect show up in the BoW. Is that what you mean?

Maybe this is due to tokenization. Can you share a small example workflow and data set that show the problem? That would make it much easier for me to see what exactly the problem is.
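
To illustrate the tokenization point, here is a minimal Python sketch. It is not how the KNIME nodes are implemented; the whitespace tokenizer and the `is_number` rule are assumptions made up for this example. It shows how a filter that only drops terms made up entirely of digits leaves terms such as "3rd" or "42." untouched when the tokenizer keeps them as single tokens.

```python
import re

def whitespace_tokenize(text):
    # Split on whitespace only; punctuation stays attached to the tokens.
    return text.split()

def is_number(term):
    # Hypothetical rule: a term counts as a number only if it consists of
    # digits, optionally with one decimal separator (. or ,) in the middle.
    return re.fullmatch(r"\d+([.,]\d+)?", term) is not None

def number_filter(terms):
    # Keep every term that is not a pure number.
    return [t for t in terms if not is_number(t)]

text = "In 2014 the 3rd report listed 12,5 percent and 42."
tokens = whitespace_tokenize(text)
filtered = number_filter(tokens)
print(filtered)
```

In this sketch "2014" and "12,5" are removed, while "3rd" and "42." survive because they are not pure numbers as far as the filter is concerned. Something similar may explain number terms that remain after filtering.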

Cheers, Kilian

Thank you kilian.thiel, but I have solved this problem! :)

Now I have another one, haha. When I want to extract the RSS info, I do the same as I've seen in other examples:

"Table Creator" node (e.g. http://rss.nytimes.com/services/xml/rss/nyt/Politics.xml) >> "HTTP Retriever" >> "Feed Parse" node >> "String to Document" node --------> that runs great!
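
Outside of KNIME, the parsing step of this chain can be sketched roughly as follows in Python, using only the standard library. This is an illustration, not what the nodes do internally: the sample feed is made up, and `parse_feed` is a hypothetical helper. In a real run the XML text would come from the HTTP response.

```python
import xml.etree.ElementTree as ET

# Made-up sample feed for illustration only; a real run would use the
# XML text returned by the HTTP request instead.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First headline</title>
      <description>Text of the first entry.</description>
    </item>
    <item>
      <title>Second headline</title>
      <description>Text of the second entry.</description>
    </item>
  </channel>
</rss>"""

def parse_feed(xml_text):
    # Parse the RSS XML and turn every <item> into a simple document dict.
    root = ET.fromstring(xml_text)
    documents = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        body = item.findtext("description", default="")
        documents.append({"title": title, "text": body})
    return documents

docs = parse_feed(SAMPLE_RSS)
print(len(docs), [d["title"] for d in docs])
```

If the response is not well-formed RSS (for example an HTML page), `ET.fromstring` raises a `ParseError`, or the loop finds no `<item>` elements and the document list stays empty.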

 

But when I want to filter the RSS feed (using, for example, tools such as feedrinse or feedsifter) and put the resulting URL in the "Table Creator" node cell, I cannot obtain a list of documents (to apply text mining).

E.g. I applied the filter "Obama" to the previous RSS feed (using the feedsifter tool), which gives the URL shown below:

http://feedsifter.com/ f=http%3A%2F%2Frss.nytimes.com%2Fservices%2Fxml%2Frss%2Fnyt%2FPolitics.xml&Obama
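
As a quick sanity check on the URL above, this small Python sketch uses the standard library to percent-encode the original feed address and compare it with the encoded part of the filtered URL. It also checks the pasted string for whitespace, exactly as it appears in this post. It only verifies the encoding; the query-string layout that feedsifter expects is not confirmed here.

```python
from urllib.parse import quote, unquote

original_feed = "http://rss.nytimes.com/services/xml/rss/nyt/Politics.xml"
encoded_part = "http%3A%2F%2Frss.nytimes.com%2Fservices%2Fxml%2Frss%2Fnyt%2FPolitics.xml"

# safe="" makes quote() encode ":" and "/" as well.
encoded = quote(original_feed, safe="")
decoded = unquote(encoded_part)

print(encoded == encoded_part)   # does the encoded part match the feed?
print(decoded == original_feed)  # does it decode back to the feed?

# The URL exactly as pasted in this post (copied verbatim, including the space).
pasted = ("http://feedsifter.com/ f=http%3A%2F%2Frss.nytimes.com"
          "%2Fservices%2Fxml%2Frss%2Fnyt%2FPolitics.xml&Obama")

# Whitespace inside a URL is worth ruling out before looking further.
has_space = any(ch.isspace() for ch in pasted)
print(has_space)
```

The encoded part matches the original feed address, so the encoding itself looks fine. The pasted string does contain a space, which would be the first thing to check in the "Table Creator" cell.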

I don't know which step I need to take in order to fix it :S

Could you help me?

Thanks in advance. Have a nice day!