TF-IDF top 10 terms by document, visualisation

MvBreemen · June 1, 2020, 4:16pm

I have a bag of words (BoW) with 12 documents, approx. 3,000 terms per document, 36,000 terms in total. For each term I have calculated the TF-IDF. Now I want to filter, for each document, the 20 terms with the highest TF-IDF score and visualise them in a bar chart. I should end up with 12 bar charts, one per document, each showing the TF-IDF of that document's top 20 terms. I have seen many examples that just filter the top X words ACROSS all documents, but what is the point of TF-IDF if you cannot compare across documents?

Does anybody have any advice?

Jeany · June 3, 2020, 3:59pm

Hi,

Thank you for your question. I put together a quick and dirty workflow (IDF_perdoc.knwf) that you can hopefully adjust to your needs easily. The trick is to use a Group Loop Start node to loop over all documents, filter the top 20 words for each document, display them in a bar chart, and save the chart in a table using the Image to Table node. The output of the Loop End node then contains the collected bar charts, one per document.

Hope that helps,
Jeany
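
For readers who prefer a script over a workflow, the same per-document logic (group by document, keep the top N terms, draw one bar chart each) can be sketched in Python. This is only an illustration, not the contents of the attached workflow: the function names, the (document, term, tfidf) row layout, the sample values and the output file name are all assumptions, and the plotting helper additionally assumes matplotlib is installed.

```python
from collections import defaultdict
from heapq import nlargest


def top_terms_per_document(rows, n=20):
    """Return {document: [(term, score), ...]} holding the n highest scores per document."""
    by_doc = defaultdict(list)
    for doc, term, score in rows:
        by_doc[doc].append((term, score))
    # nlargest returns the pairs sorted from highest to lowest score
    return {
        doc: nlargest(n, pairs, key=lambda pair: pair[1])
        for doc, pairs in by_doc.items()
    }


def plot_top_terms(top_terms, path="tfidf_top_terms.png"):
    """Draw one horizontal bar chart per document (hypothetical helper, needs matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display required
    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(
        len(top_terms), 1, figsize=(6, 4 * len(top_terms)), squeeze=False
    )
    for ax, (doc, pairs) in zip(axes[:, 0], top_terms.items()):
        terms, scores = zip(*pairs)
        ax.barh(terms, scores)
        ax.invert_yaxis()  # highest score at the top
        ax.set_title(doc)
        ax.set_xlabel("TF-IDF")
    fig.tight_layout()
    fig.savefig(path)


# Made-up example rows: (document, term, tfidf)
rows = [
    ("doc1", "knime", 0.42), ("doc1", "workflow", 0.31), ("doc1", "node", 0.05),
    ("doc2", "pubmed", 0.50), ("doc2", "search", 0.12), ("doc2", "knime", 0.08),
]
top = top_terms_per_document(rows, n=2)
```

Calling plot_top_terms(top) would then write a single image with one chart per document. Selecting the top terms inside each document, rather than across the whole corpus, is what keeps the charts comparable between documents.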

MvBreemen · June 5, 2020, 10:33am

Hi Jeany,
How do I get the datafile that is the start of your workflow?

Jeany · June 5, 2020, 10:52am

Hi,

The data is automatically extracted using a shared component (https://kni.me/c/m7PihSx4L8-tdftr) that gathers data from PubMed, in this case by searching for the word “KNIME”. You can simply execute it and get the data. If you want to adjust the search you can do so via the configuration of the shared component.

Let me know if you run into problems,
Jeany
