Document classification with a neural network


I am trying to do document classification using a neural network. There are two categories, let's say Positive and Negative. I am happy with my preprocessing steps, and the term list looks okay to me. As far as I can tell, the only neural network node that accepts a document vector is the DL4J Feedforward Learner (or did I miss something?).

I have two problems:
My first approach was to create a Bag of Words, then calculate the TF and delete terms that occur only once across all documents. After that I trained my network. But it classified nothing in the test data as positive and had the same percentages for every tested document (Pos: 10.5%). Did I do something completely wrong here?

Then I thought about working with NGrams. If I choose NGram Frequencies, I lose the information about which term belongs to which document. Can I keep that somehow? If I choose NGram Bag of Words, I do not get a frequency of how often my NGrams occur in the document corpus.
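
To make the n-gram problem concrete outside of KNIME, here is a minimal plain-Python sketch of what I am after: n-gram counts that stay attached to their document, plus corpus-wide frequencies derived from them. The function names, the toy corpus, and the whitespace tokenization are assumptions for illustration only; this is not how the KNIME NGram nodes work internally.

```python
from collections import Counter


def ngrams(tokens, n=2):
    """Return the word n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts_per_document(docs, n=2):
    """Map each document id to a Counter of the n-grams it contains."""
    return {
        doc_id: Counter(ngrams(text.lower().split(), n))
        for doc_id, text in docs.items()
    }


# Hypothetical toy corpus, keyed by document id.
docs = {
    "doc1": "the movie was very good",
    "doc2": "the movie was not good",
}

# Per-document counts keep the link between n-gram and document.
per_doc = ngram_counts_per_document(docs, n=2)

# Corpus-wide frequencies are the sum of the per-document counters.
corpus_counts = sum(per_doc.values(), Counter())
```

Keeping the per-document counters as the primary structure means the corpus frequency can always be recomputed from them, while the reverse is not possible.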

Has anybody already done such a classification using a neural network and can show me an example workflow?

Thank you and kind regards,

Hey @jk1006,

sorry for the late answer.
Did you use the Document Vector node to create a Document Vector?
The RProp MLP Learner/MultiLayerPerceptron Predictor as well as the PNN Learner/PNN Predictor should also work well.

There is a document classification workflow on the example server or here. It uses other ML methods, but those nodes can easily be exchanged for the neural network nodes mentioned above.
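
For readers who want to see the idea behind a document vector without opening KNIME, here is a rough plain-Python sketch: one row per document, one column per term, with relative term frequency as the value. The function name, the `min_total_count` parameter, and the toy documents are hypothetical, and reading "terms that occur only once" as "total count below 2 across the corpus" is an assumption about the question. It is not the Document Vector node's implementation.

```python
from collections import Counter


def build_document_vectors(docs, min_total_count=2):
    """Build relative-TF vectors, dropping terms that are rare in the corpus.

    docs: list of raw text strings (tokenized on whitespace here).
    min_total_count: keep only terms whose corpus-wide count reaches this value.
    Returns (vocab, vectors), where vectors[i] is aligned with vocab.
    """
    tokenized = [text.lower().split() for text in docs]

    # Corpus-wide term counts, used only for pruning rare terms.
    total = Counter(tok for toks in tokenized for tok in toks)
    vocab = sorted(t for t, c in total.items() if c >= min_total_count)

    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        length = len(toks) or 1  # avoid division by zero for empty documents
        vectors.append([counts[t] / length for t in vocab])
    return vocab, vectors


# Hypothetical toy documents.
docs = ["good good film", "bad film", "good plot"]
vocab, vectors = build_document_vectors(docs)
```

If every row came out identical or all zero, a learner would have nothing to separate the classes on, so it may be worth inspecting the vectors before training when all test documents receive the same score.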