Here we transform a collection of documents into numerical vectors. The dataset used in this example is the KNIME Forum Dataset. After the pre-processing phase, the relative term frequency of each term is computed inside the Transformation component. The input data set is partitioned into a training set and a test set. The term frequencies from the training set are used to build a vector representation of the distinct terms identified by the Bag of Words (BoW), using a Document Vector node. The same Document Vector transformation is then applied to the documents in the test set.
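The workflow itself uses KNIME nodes, but the same idea can be sketched in Python. The snippet below is a minimal, hypothetical analogue (using scikit-learn's `CountVectorizer` and a toy corpus, not the actual KNIME Forum Dataset): the vocabulary is fit on the training documents only, relative term frequencies are computed, and the identical transformation is then applied to the test documents.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy corpus standing in for the pre-processed forum documents
train_docs = ["knime text processing nodes",
              "building a document vector",
              "term frequency of each term"]
test_docs = ["document vector for new text"]

# Build the vocabulary (BoW) from the training set only
vectorizer = CountVectorizer()
train_counts = vectorizer.fit_transform(train_docs)

def relative_tf(counts):
    """Relative term frequency: term count divided by document length."""
    lengths = np.asarray(counts.sum(axis=1)).ravel()
    lengths = np.maximum(lengths, 1)  # guard against empty documents
    return counts.toarray() / lengths[:, None]

train_tf = relative_tf(train_counts)

# Apply the same vocabulary to the test set; terms unseen in training are ignored
test_tf = relative_tf(vectorizer.transform(test_docs))

print(train_tf.shape, test_tf.shape)  # both share the training vocabulary's columns
```

Fitting the vectorizer on the training partition alone mirrors the workflow's design: the test documents are projected into the training vocabulary, so both partitions end up with identically shaped document vectors.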
This is a companion discussion topic for the original entry at https://kni.me/w/hOzdDxM3RkKuX1oU