Let me give you an overview of what I am doing, so that you have a better idea and can point me in the right direction.
I’m using KNIME to extract information from PDF files (mostly Future Trend Reports). I’m using the following workflow to capture the information these files have in common.
The result I’m getting is as follows:
And here is the Excel file for that cloud:
The output consists of common single words such as “digital”, “data”, and “automation”, but on their own they don’t tell me much. I need to detect the common phrases used in these documents (reports), because an individual word gives no indication of what, for example, “data” means in context. Is it “more data is needed”, or “data threat”, or something else? The same applies to “automation” and the other words.
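To make the kind of output I’m after concrete, here is a minimal sketch in plain Python (not KNIME) that counts two-word phrases across documents. The sample sentences, the stop-word list, and the function names `tokenize` and `count_ngrams` are all made up for illustration; this only shows the idea, not how it should be done in a KNIME workflow.

```python
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "is", "of", "and", "to", "in", "for", "are"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def count_ngrams(documents, n=2):
    """Count n-word phrases across all documents.

    Phrases that start or end with a stop word are skipped.
    Sentence boundaries are ignored in this simple sketch.
    """
    counts = Counter()
    for doc in documents:
        tokens = tokenize(doc)
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] in STOP_WORDS or gram[-1] in STOP_WORDS:
                continue
            counts[" ".join(gram)] += 1
    return counts

# Made-up sample text standing in for the extracted PDF content.
docs = [
    "More data is needed. Data threat is rising.",
    "The data threat grows as automation spreads.",
]
phrase_counts = count_ngrams(docs, n=2)
print(phrase_counts.most_common(1))
```

With this kind of result I would see “data threat” as a recurring phrase instead of just “data”.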
So, I need to detect multi-word phrases in these documents (PDF reports). Could you tell me how to do that? I haven’t found any example of it on the KNIME platform.