I am trying to classify documents with a neural network. There are two categories, let's say Positive and Negative. I am happy with my preprocessing steps, and the term list looks okay to me. The only neural network that accepts a document vector seems to be the DL4J Feedforward Learner (or did I miss something?).
I have two problems:
My first approach was to create a Bag of Words, then calculate the TF and delete terms that occur only once across all documents. After that I trained my network. But it classified nothing in the test data as positive and had the same percentage on every tested document (Pos: 10.5%). Did I do something completely wrong here?
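To make the steps above concrete, here is a minimal Python sketch of what I mean. It is not my actual workflow, only an illustration: the function names (`tokenize`, `build_tf_vectors`), the whitespace tokenizer, and the toy documents are made up, and I am assuming that "occurs only once" means a total count of 1 over the whole corpus and that TF is the relative frequency within a document.

```python
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def build_tf_vectors(docs, min_total_count=2):
    """Return (vocab, vectors) with one relative-TF vector per document.

    Terms whose total count over all documents is below
    min_total_count are dropped (assumption: "occurs only once"
    refers to the corpus-wide count).
    """
    tokenized = [tokenize(d) for d in docs]

    # Corpus-wide term counts, used only for filtering.
    total = Counter(t for doc in tokenized for t in doc)
    vocab = sorted(t for t, c in total.items() if c >= min_total_count)

    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        n = len(doc) or 1  # avoid division by zero for empty documents
        vectors.append([counts[t] / n for t in vocab])
    return vocab, vectors


docs = ["good movie good", "bad movie", "great"]
vocab, vectors = build_tf_vectors(docs)
for doc, vec in zip(docs, vectors):
    print(doc, vec)
```

Note that in this toy example the third document ends up as an all-zero vector, because its only term was filtered out.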
Then I thought about working with NGrams. If I choose NGram Frequencies, I lose the information about which term belongs to which document. Can I keep that somehow? If I choose NGram Bag of Words, I do not get how often each NGram occurs in the document corpus.
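What I would like to end up with is roughly the following, again as a hedged Python sketch rather than anything tool-specific. The helper names (`word_ngrams`, `ngram_counts`) are hypothetical, and I am assuming word NGrams (not character NGrams) with a simple whitespace split. The point is that the per-document counts and the corpus-wide counts are kept side by side.

```python
from collections import Counter


def word_ngrams(tokens, n=2):
    # Sliding window of n consecutive tokens, joined by a space.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(docs, n=2):
    """Return (per_doc, corpus).

    per_doc: one Counter per document (keeps the document membership).
    corpus:  a single Counter with the frequency over all documents.
    """
    per_doc = [Counter(word_ngrams(d.lower().split(), n)) for d in docs]
    corpus = Counter()
    for counts in per_doc:
        corpus.update(counts)
    return per_doc, corpus


docs = ["not good at all", "not good"]
per_doc, corpus = ngram_counts(docs)
for i, counts in enumerate(per_doc):
    for gram, c in counts.items():
        # document index, NGram, count in this document, count in corpus
        print(i, gram, c, corpus[gram])
```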
Has anybody already done such a classification using a neural network and can show me an example workflow?
Thank you and kind regards,