I'm trying to configure a Python node to run the Phrases module from gensim, which replaces frequently collocated tokens with a single 'bigram' token (e.g. 'new', 'york' becomes 'new_york'). I'm running into issues because gensim works mainly with lists of lists, and I cannot get the output into a DataFrame of Documents for further text processing.
I can get the list of lists (one list of tokens per document) processed correctly (tokens vs. tokens_bigram), but I'm not sure how to turn the result into a DataFrame that the node can output and that other text processing nodes can read. See script attached.
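To make the question concrete, here is a minimal sketch of the conversion I have in mind, not a tested KNIME workflow. It assumes the node's output is a pandas DataFrame and that joining each token list back into one space-separated string per row is acceptable. The hard-coded `tokens_bigram` stands in for the gensim output, and the helper `tokens_to_frame` and the column name `text_bigram` are made up for illustration.

```python
import pandas as pd

# Stand-in for the gensim Phrases output: one list of tokens per document.
# (Hard-coded here so the snippet runs without gensim.)
tokens_bigram = [
    ["i", "love", "new_york"],
    ["new_york", "is", "big"],
]


def tokens_to_frame(token_lists, column="text_bigram"):
    """Build a one-column DataFrame with one space-joined string per document.

    The helper name and the default column name are hypothetical.
    """
    joined = [" ".join(tokens) for tokens in token_lists]
    return pd.DataFrame({column: joined})


# In a KNIME Python node this frame would be assigned to the node's output
# table; how that is done depends on the KNIME version, so it is left out.
df = tokens_to_frame(tokens_bigram)
print(df)
```

As far as I understand, this would give a plain string column rather than a Document column, so a downstream node such as Strings To Document would probably still be needed, with a whitespace-based tokenizer so that 'new_york' stays a single term.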
Does anyone have experience using KNIME and gensim for text preprocessing?