Effects of sentence length and sentence extraction position on classification in KNIME?

Dear all

How can I test the effect of sentence length, and of the position from which sentences are extracted, on classification in KNIME?

Hi Negaresma,

I am not quite sure I understand you correctly, but here are some thoughts. The length of sentences has no direct effect on the classification of texts in KNIME. Classification is ultimately based on feature vectors created from words or n-grams, and the order of the features is not preserved (bag of words). Two documents that contain exactly the same features (let’s assume just words here) with exactly the same frequencies will be classified identically by the same classifier, regardless of whether the words in one document are arranged in short sentences and those in the other in long sentences. Likewise, models trained on such documents will be identical.
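To illustrate the bag-of-words point, here is a minimal plain-Python sketch (not KNIME's own implementation). It counts lowercase word tokens and discards order, punctuation and sentence boundaries, so two hypothetical documents with the same words, once split into short sentences and once joined into one long sentence, yield identical feature vectors. The tokenizer regex and the helper names `bag_of_words` and `to_vector` are assumptions for illustration.

```python
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase, extract word tokens and count them.

    Word order, punctuation and sentence boundaries are discarded.
    (Hypothetical helper; the regex is a simplifying assumption.)
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens)


def to_vector(bow: Counter, vocabulary: list) -> list:
    """Map a bag of words onto a fixed vocabulary as a count vector."""
    return [bow.get(term, 0) for term in vocabulary]


# Hypothetical documents: same words, same counts,
# arranged as three short sentences vs. one long sentence.
short_sentences = "The cat sat. The dog ran. A bird sang."
long_sentences = "The cat sat, the dog ran, a bird sang."

bow_short = bag_of_words(short_sentences)
bow_long = bag_of_words(long_sentences)
vocabulary = sorted(set(bow_short) | set(bow_long))

# Identical feature vectors, so any classifier sees the same input.
print(to_vector(bow_short, vocabulary) == to_vector(bow_long, vocabulary))
```

Because the classifier only ever sees the count vector, anything that leaves the counts unchanged (such as re-splitting sentences) cannot change its prediction.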

However, classifying short texts (single sentences or tweets) can be difficult, because they contain few discriminative features (words, n-grams). See also http://tech.knime.org/forum/knime-textprocessing/a-question-about-weightening.

To test the effect of sentence length on classification, you need at least two sets of labeled documents, one with short sentences and one with long sentences. Train a separate model on each set and compare the results.
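The procedure above can be sketched outside KNIME in plain Python. This is a minimal sketch under several assumptions: the "short" and "long" sets are obtained by splitting one labeled corpus on average words per sentence, using a naive split on `.`, `!` and `?` and a hypothetical threshold; the classifier is a hand-written multinomial Naive Bayes standing in for whatever learner you use in KNIME; and `split_by_sentence_length`, `NaiveBayes` and `compare` are hypothetical helpers, not KNIME functions.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def avg_sentence_length(text):
    """Average words per sentence, naively splitting on . ! ? (assumption)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    return sum(len(tokenize(s)) for s in sentences) / len(sentences)


def split_by_sentence_length(docs, threshold):
    """Split (text, label) pairs into short- and long-sentence sets.

    `threshold` (words per sentence) is a hypothetical choice.
    """
    short, long_ = [], []
    for text, label in docs:
        target = short if avg_sentence_length(text) <= threshold else long_
        target.append((text, label))
    return short, long_


class NaiveBayes:
    """Minimal multinomial Naive Bayes with add-one smoothing."""

    def fit(self, docs):
        self.class_counts = Counter(label for _, label in docs)
        self.word_counts = defaultdict(Counter)
        for text, label in docs:
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of every token.
            score = math.log(n_docs / total_docs)
            for w in tokenize(text):
                score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, docs):
    if not docs:
        return float("nan")
    return sum(model.predict(t) == y for t, y in docs) / len(docs)


def compare(train_docs, test_docs, threshold):
    """Train one model per sentence-length set and score it on the matching test set."""
    train_s, train_l = split_by_sentence_length(train_docs, threshold)
    test_s, test_l = split_by_sentence_length(test_docs, threshold)
    results = {}
    for name, train, test in (("short", train_s, test_s), ("long", train_l, test_l)):
        if train and test:
            results[name] = accuracy(NaiveBayes().fit(train), test)
    return results


if __name__ == "__main__":
    # Hypothetical toy data, for illustration only.
    train = [
        ("Good film. Great cast.", "pos"),
        ("Bad film. Awful plot.", "neg"),
        ("The film was good and the cast was great overall.", "pos"),
        ("The film was bad and the plot was awful overall.", "neg"),
    ]
    test = [
        ("Great cast. Good film.", "pos"),
        ("The plot was awful and the film was bad overall.", "neg"),
    ]
    print(compare(train, test, threshold=5))
```

For a meaningful comparison, the two sets should otherwise be as similar as possible (size, label balance, topic), so that differences in the scores can be attributed to sentence length rather than to other properties of the data.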

I hope this helps.

Cheers, Kilian