This thread is for posting solutions to “Just KNIME It!” Challenge 38. This week we’ll dive into text processing in order to extract the context surrounding a target word.
I went back to some R scripting again for the flexibility of the ‘ggwordcloud’ chart. The R branch within the component can also be deleted (if you aren’t an R fan), and the analysis will keep working, returning a matrix of preceding and trailing words that is available at the component’s output port.
As the test dataset is very short, I settled on a simple bag-of-words analysis and abandoned my initial plan to perform trigram analysis and so on.
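For anyone curious what a bag-of-words count over the neighboring words might look like outside KNIME, here is a minimal sketch in Python (not the R or KNIME nodes used in the component). The function name `bag_of_words` and the sample word list are made up for illustration.

```python
from collections import Counter


def bag_of_words(context_words):
    """Count how often each context word occurs, ignoring case."""
    return Counter(word.lower() for word in context_words)


# Hypothetical words collected around a target word (illustrative only).
neighbors = ["the", "quick", "The", "lazy", "quick"]
counts = bag_of_words(neighbors)

# Show the most frequent neighbors first.
for word, freq in counts.most_common():
    print(word, freq)
```

Word order is ignored here, which is what makes this simpler than a trigram analysis.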
Hello KNIMErs,
This is my shared component on the Hub, named ‘Word Windows’. Some improvements since my last post have been deployed: a configurable N-word neighborhood window, case-insensitive matching of the target word input, and the word cloud assigned to an output port as an image.
@kwatari I didn’t realise the ‘Term Neighborhood Extractor’ node was available either, as I did not explore any other solution in advance… I built the workflow from scratch based on regex detectors.
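To give a rough idea of the regex-based approach, here is a small Python sketch of an N-word window around a case-insensitive target. It is not the actual component (which is built from KNIME nodes plus an optional R branch). The names `word_windows`, `sample`, and the `\w+` tokenization are assumptions for illustration.

```python
import re


def word_windows(text, target, n=2):
    """Return (preceding, match, trailing) for each occurrence of target.

    Tokenization is a plain \\w+ split, so punctuation is dropped and
    contractions are broken apart; this is a simplification.
    """
    tokens = re.findall(r"\w+", text)
    # Escape the target so it is matched literally, ignoring case.
    pattern = re.compile(re.escape(target), re.IGNORECASE)
    rows = []
    for i, tok in enumerate(tokens):
        if pattern.fullmatch(tok):
            before = tokens[max(0, i - n):i]   # up to n preceding words
            after = tokens[i + 1:i + 1 + n]    # up to n trailing words
            rows.append((before, tok, after))
    return rows


# Hypothetical sample text (illustrative only).
sample = "The cat sat on the mat while the Cat slept"
rows = word_windows(sample, "cat", n=2)
for before, match, after in rows:
    print(before, match, after)
```

Each row holds the preceding and trailing words for one hit, similar in spirit to the matrix the component returns. Windows are simply shortened at the start or end of the text.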
Thanks to @victor_palacios for proposing these interesting challenges. I’ve just learnt how to share a component.