Document classification with neural network

jk1006 · June 20, 2018, 7:31pm

Hi,

I am trying to classify documents using a neural network. There are 2 categories, let’s say Positive and Negative. I am pleased with my preprocessing steps, and the term list looks okay to me. As far as I can tell, the only neural network node that accepts a document vector is the DL4J Feedforward Learner (or did I miss something?).

I have two problems:
My first approach was to create a Bag of Words, then calculate the TF and delete terms that occur only once across all documents. After that I trained my network. But it classified nothing as positive in the test data and gave the same percentage for every tested document (Pos: 10.5%). Did I do something completely wrong here?
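For anyone who wants to check these preprocessing steps outside KNIME, they can be sketched roughly as below. This is a plain-Python illustration, not what the KNIME nodes do internally. `build_tf_vectors` and `min_total_count` are hypothetical names, TF is assumed to be the relative frequency within a document, and "occur only once" is read as one occurrence across the whole corpus.

```python
from collections import Counter

def build_tf_vectors(docs, min_total_count=2):
    """docs: list of token lists. Returns (vocabulary, list of TF vectors)."""
    # Count how often each term occurs over the whole corpus.
    totals = Counter(term for doc in docs for term in doc)
    # Keep only terms that occur at least min_total_count times overall.
    vocab = sorted(t for t, c in totals.items() if c >= min_total_count)
    index = {t: i for i, t in enumerate(vocab)}

    vectors = []
    for doc in docs:
        counts = Counter(t for t in doc if t in index)
        n = len(doc) or 1  # guard against empty documents
        vec = [0.0] * len(vocab)
        for term, count in counts.items():
            # Relative TF: count divided by the original document length.
            vec[index[term]] = count / n
        vectors.append(vec)
    return vocab, vectors

docs = [
    "good movie good plot".split(),
    "bad movie".split(),
    "good acting".split(),
]
vocab, vectors = build_tf_vectors(docs)
print(vocab)    # ['good', 'movie']
print(vectors)  # [[0.5, 0.25], [0.0, 0.5], [0.5, 0.0]]
```

Printing the vectors like this makes it easy to see whether many documents end up with all-zero or near-identical rows after filtering.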

Then I thought about working with NGrams. If I choose NGram Frequencies, I lose the information about which term belongs to which document. Can I preserve that somehow? If I choose NGram Bag of Words, I do not get a frequency of how often my NGrams occur in the document corpus.
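What I am looking for is both views at once: the n-grams per document and their frequency in the whole corpus. A minimal plain-Python sketch of that idea, independent of the KNIME NGram node, could look like this. The function names and the document ids are hypothetical, and word-level n-grams are assumed.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the word-level n-grams of a token list as strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_counts_per_document(docs, n=2):
    """docs: dict mapping document id -> token list.

    Returns (per_doc, corpus): per_doc keeps the document association,
    corpus holds the total frequency of each n-gram over all documents.
    """
    per_doc = {doc_id: Counter(ngrams(tokens, n)) for doc_id, tokens in docs.items()}
    corpus = Counter()
    for counts in per_doc.values():
        corpus.update(counts)  # add this document's counts to the totals
    return per_doc, corpus

docs = {
    "doc1": "not good at all".split(),
    "doc2": "not good".split(),
}
per_doc, corpus = ngram_counts_per_document(docs, n=2)
print(per_doc["doc1"]["not good"])  # 1
print(corpus["not good"])           # 2
```

Keeping the per-document counters around means the corpus frequency can be used for filtering while each row still belongs to one document.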

Has anybody already done such a classification using a neural network and could show me an example workflow?

Thank you and kind regards,

julian.bunzel · June 26, 2018, 11:31am

Hey @jk1006,

Sorry for the late answer.
Did you use the Document Vector node to create a Document Vector?
The RProp MLP Learner/MultiLayerPerceptron Predictor as well as the PNN Learner/PNN Predictor should also work well.

There is a document classification workflow on the example server or here. It uses other ML methods, but the nodes used there can easily be exchanged for the neural network nodes mentioned above.
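To illustrate what such a learner/predictor pair does with document vectors, here is a minimal plain-Python sketch of a feedforward network with one hidden layer. It is not the RProp, PNN or DL4J implementation: it uses ordinary stochastic gradient descent, and `TinyMLP`, the toy vectors and all parameter values are assumptions made up for the example.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class TinyMLP:
    """One hidden layer, sigmoid activations, single output P(positive)."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)  # fixed seed for reproducible weights
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        p = sigmoid(sum(w * hi for w, hi in zip(self.w2, h)) + self.b2)
        return h, p

    def predict_proba(self, x):
        return self.forward(x)[1]

    def train(self, X, y, epochs=2000, lr=0.5):
        for _ in range(epochs):
            for x, target in zip(X, y):
                h, p = self.forward(x)
                # Cross-entropy loss: error at the output pre-activation.
                d_out = p - target
                # Backpropagate to the hidden layer before updating w2.
                d_hid = [d_out * w * hi * (1.0 - hi)
                         for w, hi in zip(self.w2, h)]
                for j in range(len(self.w2)):
                    self.w2[j] -= lr * d_out * h[j]
                self.b2 -= lr * d_out
                for j, row in enumerate(self.w1):
                    for i in range(len(row)):
                        row[i] -= lr * d_hid[j] * x[i]
                    self.b1[j] -= lr * d_hid[j]

# Toy document vectors over the terms ["good", "bad", "movie"].
X = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
y = [1, 0, 1, 0]  # 1 = Positive, 0 = Negative

model = TinyMLP(n_in=3, n_hidden=3)
model.train(X, y)
for x in X:
    print(x, round(model.predict_proba(x), 3))
```

If a network like this returns the same probability for every document, it is worth checking whether the input vectors actually differ between documents and how the two classes are distributed in the training data.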

Cheers,

Julian
