Hello, I'm using the NGram Creator node to pull specific information from a set of parsed PDF files, and I'm trying to understand how this node works.
Is there a link that could provide more information about the node? Also, what do you prefer to use to pull information from a PDF?
Thanks,
Hello mmon9998,
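An n-gram is a contiguous sequence of n items, typically words or characters, taken from a text. For example, the word 2-grams of "the cat sat" are "the cat" and "cat sat".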
The NGram Creator node in KNIME Analytics Platform lets you specify whether to create word or character N-Grams. It also lets you choose the size of the N-Grams (2-grams, 3-grams, etc.). Furthermore, you can specify whether the node outputs a bag-of-words-like structure or a data table containing the N-Grams and their frequencies in the corpus and in the documents.
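To make the idea more concrete, here is a minimal plain-Python sketch of word and character N-Grams and of the two kinds of frequencies. It is only an illustration of the concept, not how the KNIME node is implemented: the function names are made up for this example, and lowercasing plus splitting on whitespace is a simplifying assumption (the node works on KNIME's own tokenized documents).

```python
from collections import Counter


def word_ngrams(text, n):
    """Return all word n-grams of a text (assumes whitespace tokenization)."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n):
    """Return all character n-grams of a text, spaces included."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_frequencies(documents, n):
    """Count word n-grams over a list of documents.

    corpus_freq: total number of occurrences across all documents.
    doc_freq:    number of documents containing the n-gram at least once.
    """
    corpus_freq = Counter()
    doc_freq = Counter()
    for doc in documents:
        grams = word_ngrams(doc, n)
        corpus_freq.update(grams)
        doc_freq.update(set(grams))  # each document counts at most once
    return corpus_freq, doc_freq


# Hypothetical two-document corpus, for illustration only.
docs = ["the cat sat on the mat", "the cat ate the fish"]
corpus_freq, doc_freq = ngram_frequencies(docs, 2)

print(word_ngrams(docs[0], 2))
print(char_ngrams("knime", 3))
print(corpus_freq.most_common(3))
```

In this toy corpus the 2-gram "the cat" occurs once in each document, so both its corpus frequency and its document frequency are 2, while every other 2-gram occurs only once.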
Regarding the usage of the node, it really depends on the type of information you want to extract and what you want to achieve. On the EXAMPLE Server, under 08_Other_Analytics_Types/01_Text_Processing/07_Sentiment_Classification_with_NGrams you will find an example workflow that shows an application of the NGram Creator node.
A good resource for N-Grams is available here: http://web.mit.edu/6.863/www/fall2012/readings/ngrampages.pdf
Hope that helps,
Best,
Vincenzo