01_Topic Detection Analysis_Training

This workflow applies the Topic Extractor (Parallel LDA) node to detect 10 topics and describe each of them with 5 keywords. Latent Dirichlet Allocation (LDA) is a generative probabilistic model used as an unsupervised algorithm: given a collection of documents, it finds the top n topics, each described by its m most relevant keywords. LDA represents documents as random mixtures over latent topics, where each topic is characterized by a distribution over words (Blei, Ng and Jordan, 2003). In KNIME Analytics Platform, LDA is implemented by the Topic Extractor (Parallel LDA) node, available in the Text Processing extension. The workflow as a whole trains the topic model. In addition to the Topic Extractor (Parallel LDA) node, it includes steps for importing, cleaning up, and transforming the data.
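To make the idea of "topics as distributions over words" concrete, here is a minimal sketch of LDA fitted with collapsed Gibbs sampling in plain Python. This is a swapped-in textbook technique, not the implementation used by the Topic Extractor (Parallel LDA) node. The function names `lda_gibbs` and `top_keywords`, the hyperparameters `alpha` and `beta`, the iteration count, and the toy corpus are all hypothetical choices for illustration. The input is assumed to be already imported, cleaned up, and tokenized, which mirrors the preprocessing steps the workflow performs before topic extraction.

```python
import random

def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA with collapsed Gibbs sampling (hypothetical helper, illustration only).

    docs: list of token lists (already cleaned and transformed).
    Returns the topic-word count table and the sorted vocabulary.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * num_topics for _ in docs]        # tokens per (document, topic)
    nkw = [[0] * V for _ in range(num_topics)]    # tokens per (topic, word)
    nk = [0] * num_topics                         # tokens per topic
    z = []                                        # topic assignment of every token
    # Start from a random topic for each token
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(num_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][word_id[w]] += 1
            nk[k] += 1
        z.append(zd)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove this token from the counts before resampling it
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Unnormalized full conditional p(z = t | all other assignments)
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    return nkw, vocab

def top_keywords(nkw, vocab, m):
    """Describe each topic by its m most frequently assigned words."""
    return [
        [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:m]]
        for row in nkw
    ]

# Toy corpus (hypothetical): two loose themes, sports and finance
docs = [
    ["ball", "goal", "team", "match"],
    ["team", "goal", "coach", "ball"],
    ["stock", "market", "price", "trade"],
    ["price", "trade", "bank", "stock"],
]
nkw, vocab = lda_gibbs(docs, num_topics=2, iterations=100)
for k, words in enumerate(top_keywords(nkw, vocab, m=3)):
    print(k, words)
```

On a real corpus, calling `lda_gibbs(docs, num_topics=10)` followed by `top_keywords(nkw, vocab, m=5)` would correspond to the workflow's setting of 10 topics with 5 keywords each. The resulting keyword lists depend on the random seed, the hyperparameters, and the number of iterations.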

This is a companion discussion topic for the original entry at https://kni.me/w/aU8cH900Zyq7LOzv