Parsing 10-Ks

Hi everyone,

I have a question about the Bag Of Words Creator node. I’m trying to count the occurrences of specific words in American 10-K filings across multiple documents. However, the Bag Of Words Creator takes all words into account.


Using a filter after the BOWC or after the TF node does not work.
How could I achieve that?

Could you share your existing workflow? That would make it easier for someone to help. Forgetting KNIME for the moment, what/how would you like to filter?

KNIME_project.knwf (26.7 KB)
I would like to count only words like “AI”, “blockchain”, etc.
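
For illustration outside KNIME, counting only a fixed list of terms could look like the following Python sketch. The keyword list, the file names, and the sample texts are made up for the example, and the whole-word, case-insensitive matching is an assumption about what "count" should mean here, not something stated in the thread.

```python
import re

# Hypothetical keyword list, for illustration only.
KEYWORDS = ["AI", "blockchain"]


def count_keywords(text, keywords):
    """Return {keyword: number of whole-word, case-insensitive matches}."""
    counts = {}
    for kw in keywords:
        # \b keeps "AI" from matching inside words such as "paid".
        pattern = r"\b" + re.escape(kw) + r"\b"
        counts[kw] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts


# Made-up stand-ins for the text of two 10-K filings.
docs = {
    "filing_a.txt": "We invest in AI. Our AI products use blockchain.",
    "filing_b.txt": "No relevant terms are mentioned in this paid filing.",
}

for name, text in docs.items():
    print(name, count_keywords(text, KEYWORDS))
```

Restricting the count to a predefined list, rather than filtering a full bag of words afterwards, keeps the result table small and makes it obvious when a keyword never occurs in a document.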

The data is stored on your computer and consequently not available in the workflow. Create a folder named “data” in your workflow and store the data there.

KNIME_project.knwf (1.3 MB)
Hope this works.

Got it. I’m working on something. I’m not a text analysis expert, but I think I may have something useful. I’ll get back to you.

Try this. The component has the following tables:
Term count for a selected term
TF for a selected term
IDF for a selected term
TF for all terms
IDF for all terms
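
To make the three quantities in those tables concrete, here is a rough Python sketch of a term count, a relative TF, and an IDF for one selected term. It uses common textbook definitions (TF = count / tokens in the document, IDF = ln(N / number of documents containing the term)). This is an assumption: the KNIME TF and IDF nodes offer their own variants, so the numbers may differ from what the component produces. The function names and sample texts are made up for the example.

```python
import math
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def term_stats(docs, term):
    """Return one row per document with count, TF and IDF for `term`."""
    term = term.lower()
    tokenized = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: number of documents containing the term.
    doc_freq = sum(1 for toks in tokenized.values() if term in toks)
    # Assumed IDF variant: natural log of N / df; 0.0 if the term never occurs.
    idf = math.log(n_docs / doc_freq) if doc_freq else 0.0
    rows = []
    for name, toks in tokenized.items():
        count = toks.count(term)
        # Relative term frequency: share of the document's tokens.
        tf = count / len(toks) if toks else 0.0
        rows.append({"document": name, "count": count, "tf": tf, "idf": idf})
    return rows


# Made-up stand-ins for the text of two filings.
docs = {
    "filing_a.txt": "AI and blockchain and AI",
    "filing_b.txt": "nothing relevant here",
}

for row in term_stats(docs, "AI"):
    print(row)
```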


