DL4J Text mining: Cannot reproduce results from Word Vector Learner (KNIME 3.3) with Doc2Vec Learner (KNIME 3.4)

Dear KNIME DL4J people,

I haven't found anything useful on this topic on the internet so far, so I am opening a new thread.

Recently, I upgraded the KNIME installation on my office machine from 3.3.2 to 3.4. Unfortunately, I lost all the old (deprecated) nodes, so my text mining workflow failed. It is very similar to the example knime://EXAMPLES/04_Analytics/14_Deep_Learning/07_Simple_Document_Classification_Using_Word_Vectors, but uses my own data.

I tried to restore the workflow using the Doc2Vec Learner and the new Vocabulary Extractor node instead of the old nodes plus filter. I tried both algorithms and a separate preprocessing pipeline, but could not reproduce my old ROC AUC of about 0.9. Instead, I reached about 0.5, which seems a very significant difference.

How should I set up the pipeline, and the parameters that were not available in the 3.3 learner, so that I can reproduce the results?

Additionally, the new results look more similar to what I get if I first split my data into a training and a test set, apply the Word Vector Learner only to the training set, and use Word Vector Apply on the test set. (In the example workflow, the Word Vector Learner is applied before splitting.)
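A toy sketch of why the order of learning and splitting can matter. It assumes a plain token vocabulary as a stand-in for learned word vectors; the functions `learn_vocabulary` and `coverage` and the sample documents are hypothetical and not part of KNIME or DL4J. When the learner sees the test documents before the split, every test token is already known, which may inflate evaluation scores compared with learning on the training set only.

```python
def learn_vocabulary(docs):
    """Hypothetical stand-in for a learner: collects every token seen while fitting."""
    return {token for doc in docs for token in doc.lower().split()}


def coverage(vocab, docs):
    """Fraction of tokens in docs that the learned vocabulary already knows."""
    tokens = [t for doc in docs for t in doc.lower().split()]
    return sum(t in vocab for t in tokens) / len(tokens)


# Hypothetical example documents, already split into training and test sets.
train = ["good product fast delivery", "bad product slow delivery"]
test = ["excellent service", "terrible product"]

leaky_vocab = learn_vocabulary(train + test)  # learner applied before splitting
clean_vocab = learn_vocabulary(train)         # learner applied to training set only

print(coverage(leaky_vocab, test))  # 1.0: test data leaked into the learner
print(coverage(clean_vocab, test))  # 0.25: only "product" is known
```

With real embeddings the effect is less clear-cut than full token overlap, but the principle is the same: anything learned from the test set before evaluation no longer measures generalization.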

I would be very grateful for any helpful hints!

Kind regards, Anna


Hi Anna,

your problem most likely results from changed default parameters between the deprecated and the new Doc2Vec Learner node. I just answered a very similar question in the forum (https://www.knime.com/forum/knime-textprocessing/difference-between-deprecated-and-current-doc2vec-learner-and-vocabulary#comment-27237). You could have a look there and set the parameters accordingly.

I hope that helps.