Online Course: [L4-TP] Introduction to Text Processing

Course Focus

This course is about text mining, its theory, concepts, and applications. Specifically, the course focuses on the acquisition, processing and mining of textual data with KNIME Analytics Platform.

You will learn how to use the Text Processing Extension to read textual data into KNIME, enrich it semantically, preprocess it, transform it into numerical data, and extract information and knowledge from it through descriptive analytics (data visualization, clustering) and predictive analytics (regression, classification) methods. The course also covers popular text mining applications including social media analytics, topic detection and sentiment analysis.

Put what you’ve learnt into practice with the hands-on exercises.

Course content

  • Introduction to Text Processing and Importing
  • Text Tokenization and Enrichment
  • Preprocessing
  • Transformation and Classification Models
  • Visualization, Clustering, and Topic Modeling
  • Movie Forecasting Use Case, Recap and final Q&A

If you are interested in signing up:

Are the exercises the same as in your online course?

Dear Outi,

Exercises are different between the self-paced course and the instructor-led course (the one currently ongoing). I hope this answers your question.


Dear Satoru
Can you share a link to the sentiment word dictionary, please, and perhaps demonstrate how to bring it into KNIME? I only found .txt file versions.
many thanks

Dear Outi,

You probably have to create a list of positive and negative terms from the .txt version. You can threshold the PosScore or NegScore to choose positive or negative terms, respectively. Some entries may have a single term in the SynsetTerms column, while others may have multiple terms.
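
As a rough sketch of the thresholding idea above, the following Python snippet splits SentiWordNet-style lines into a positive and a negative term list. It assumes the tab-separated column layout of the .txt distribution (POS, ID, PosScore, NegScore, SynsetTerms, Gloss). The function name `split_sentiwordnet`, the threshold of 0.5, and the sample rows are illustrative assumptions, not part of the course material or of SentiWordNet itself.

```python
def split_sentiwordnet(lines, threshold=0.5):
    """Return (positive_terms, negative_terms) from SentiWordNet-style lines.

    The threshold is an arbitrary illustrative choice; tune it to your data.
    """
    positive, negative = set(), set()
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and the '#' comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 5:
            continue
        try:
            pos_score = float(fields[2])
            neg_score = float(fields[3])
        except ValueError:
            continue
        # SynsetTerms holds one or more space-separated terms such as "good#1";
        # drop the sense number and turn underscores into spaces.
        terms = [t.rsplit("#", 1)[0].replace("_", " ") for t in fields[4].split()]
        if pos_score >= threshold:
            positive.update(terms)
        if neg_score >= threshold:
            negative.update(terms)
    return positive, negative


# Made-up rows in the assumed column layout (not real SentiWordNet entries).
sample = [
    "# POS\tID\tPosScore\tNegScore\tSynsetTerms\tGloss",
    "a\t00000001\t0.75\t0\tgood#1 well-made#2\tillustrative gloss",
    "a\t00000002\t0\t0.625\tbad#1\tillustrative gloss",
    "n\t00000003\t0\t0\tkitchen_table#1\tillustrative gloss",
]
positive_terms, negative_terms = split_sentiwordnet(sample)
print(sorted(positive_terms), sorted(negative_terms))
```

Because the same word can appear in several entries with different scores, a term may end up in both lists; you would need to decide how to resolve that before using the lists as dictionaries. Each list could then be written out one term per line and read into KNIME as a dictionary for tagging.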

Unfortunately, SentiWordNet has not been incorporated into KNIME yet…


aaa, that’s why you used MPQA-OpinionCorpus ?!
thank you Satoru!

Dear Satoru
my apologies, another question:
I’m going through 11- Visualisation exercise (Solution).
If I just want to get sentiment tag clouds, can I jump straight from ‘Preprocessing 2 (with Bag of Words)’ to Visualisation (with TF, Document Data Extractor etc)?

When using multiple dictionaries for tagging, what was the trick to keep the ‘newest’ dictionary from overwriting previous tags?
many thanks again

Dear Outi,

To generate a tag cloud with sentiment information, you need to tag the data with positive and negative sentiment tags. Then you need to perform the preprocessing steps, then generate TF, then you are ready to generate a tag cloud. You can assign different colors based on sentiment tags in the tag cloud.
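
Outside KNIME, the same sequence (tag, preprocess, compute TF, then colour by tag) can be sketched in a few lines of Python. This is only an illustration of the logic, not how the KNIME nodes are implemented. The function name `sentiment_term_frequencies`, the tag labels, and the tiny word lists are assumptions made up for the example.

```python
from collections import Counter


def sentiment_term_frequencies(tokens, positive, negative, stop_words=frozenset()):
    """Return (term, relative_tf, tag) rows, most frequent term first."""
    # 1. Tag each term by dictionary lookup (labels are illustrative).
    tagged = []
    for token in tokens:
        term = token.lower()
        if term in positive:
            tag = "POSITIVE"
        elif term in negative:
            tag = "NEGATIVE"
        else:
            tag = "NONE"
        tagged.append((term, tag))

    # 2. Preprocess: here only stop-word removal, as a stand-in for the
    #    fuller preprocessing chain.
    kept = [(term, tag) for term, tag in tagged if term not in stop_words]
    if not kept:
        return []

    # 3. Relative term frequency: occurrences divided by the number of kept terms.
    counts = Counter(term for term, _ in kept)
    tags = dict(kept)
    total = len(kept)
    return [(term, n / total, tags[term]) for term, n in counts.most_common()]


tokens = "The movie was great great but the ending was awful".split()
rows = sentiment_term_frequencies(
    tokens,
    positive={"great"},
    negative={"awful"},
    stop_words={"the", "was", "but"},
)
for term, tf, tag in rows:
    print(term, round(tf, 2), tag)
```

In a tag cloud, the relative TF would drive the font size and the tag column would drive the colour.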

Unfortunately, there is no way to keep tags from the first tagging only. If terms are tagged with multiple tags, you may need to use the Tags To String node, for example, and select the particular type of tag you are interested in.