Deep learning + text processing question (word2vec + CNN)

Dear KNIME wizards

I am trying to build a text classifier using deep learning. I am able to train a word2vec (or doc2vec) model (using the Word Vector Learner node) and to learn and make predictions with traditional data mining classifiers such as decision trees, naive Bayes, and SVM.

I am getting stuck when I want to use a convolutional deep learning network for classification. It looks like the convolutional deep learning network in KNIME was designed only for images (it has image-specific parameters that must be defined) and doesn't take the output from word2vec.

Does anybody have an idea how to use a convolutional deep learning network for text classification in KNIME? I attach a workflow with an example for your tweaking. I would appreciate an example workflow, if anybody has one.

Thanks!! 

Double post, please answer here: https://tech.knime.org/forum/knime-labs-general/deep-learning-text-processing-question-word2vec-cnn

Hi,

I am trying to classify sentiments using the Word Vector Learner in KNIME, and I have followed the example "Sentiment Classification Using Word Vectors" (source: https://www.knime.org/nodeguide/analytics/deep-learning/sentiment-classification-using-word-vectors).

I don't understand why the vocabulary includes labels such as "DOC_1", "DOC_2", etc. as words in my case, while nothing like this happens in the example. Why would it include the labels of the documents in the vocabulary? Please help me, I am kind of stuck and this is very disturbing. I am unable to attach the file because of its big size.

Thanks.

Hi Aleenah,

In the mentioned example we are using the Word Vector Learner to learn a vector for the whole document and not for single words only. In KNIME there are two methods available: Word2Vec, which learns vector representations of single words, and Doc2Vec, which learns vector representations of single words AND labels. Currently, the Vocabulary Extractor extracts both labels and words, depending on the model that was trained, and puts them into a single table. We realized that this is not very intuitive (thanks for your feedback!), so this will be changed in the upcoming release. In summary, if you train a Doc2Vec model, the vocabulary will include label vectors in addition to the word vectors.
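
To illustrate what a combined table means in practice, here is a minimal Python sketch. It is not KNIME code: the function name `split_vocabulary`, the toy vectors, and the dictionary layout are all made up for illustration, assuming only that label rows (such as "DOC_1") and word rows share one table and that you know the document labels you trained with.

```python
from typing import Dict, Iterable, List, Tuple

Vector = List[float]


def split_vocabulary(
    vocab: Dict[str, Vector], doc_labels: Iterable[str]
) -> Tuple[Dict[str, Vector], Dict[str, Vector]]:
    """Split a combined vocabulary table into word rows and label rows.

    Hypothetical helper: `vocab` stands in for the extracted vocabulary
    table, `doc_labels` for the labels used when training the Doc2Vec model.
    """
    label_set = set(doc_labels)
    words = {tok: vec for tok, vec in vocab.items() if tok not in label_set}
    labels = {tok: vec for tok, vec in vocab.items() if tok in label_set}
    return words, labels


# Toy stand-in for a Doc2Vec vocabulary table (all values are invented).
vocab = {
    "good": [0.1, 0.3],
    "bad": [-0.2, 0.4],
    "DOC_1": [0.5, 0.5],
    "DOC_2": [-0.5, 0.1],
}

words, labels = split_vocabulary(vocab, ["DOC_1", "DOC_2"])
print(sorted(words))   # word rows only
print(sorted(labels))  # label (document) rows only
```

Filtering on the known labels rather than on a name pattern like "DOC_" avoids dropping a real word that happens to look like a label.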

I hope this answers your question.

Cheers

David