When I run my documents through BoW, it processes the titles as well. Since I am analyzing emails, this is rather inappropriate.
Is there a way to prevent this, other than assigning a meaningless column as the "title" column and filtering its contents out with a regex later?
Words in the title are processed as well. This cannot be avoided. As a workaround, you could use a string column containing, e.g., numbers as the title column in the Strings To Document node, and filter these numbers out later with the Number Filter node.
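The workaround is configured in the KNIME GUI, so the following is only a plain-Python sketch of the idea, not KNIME's API. It assumes each email is a `(title, body)` pair whose title is a purely numeric string (e.g. a row ID). It also assumes that the bag of words counts title and body terms together, and that the number filter simply drops all-digit terms. The helpers `tokenize`, `bag_of_words`, and `number_filter` are hypothetical and do not mirror KNIME's actual tokenizer.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical, simplified tokenizer: lowercase word characters only.
    return re.findall(r"\w+", text.lower())


def bag_of_words(title: str, body: str) -> Counter:
    # As described in the answer, terms from the title are counted
    # together with the terms from the body.
    return Counter(tokenize(title) + tokenize(body))


def number_filter(bow: Counter) -> Counter:
    # Hypothetical stand-in for a number filter: drop every term that
    # consists only of digits, including the numeric "title" terms.
    return Counter({term: n for term, n in bow.items() if not term.isdigit()})


if __name__ == "__main__":
    emails = [("1", "Meeting moved to Friday"), ("2", "Call me at 5")]
    for title, body in emails:
        print(number_filter(bag_of_words(title, body)))
```

Note the trade-off this sketch makes visible: the filter cannot tell where a number came from, so numbers that appear in the email body (the "5" in the second example) are removed along with the numeric titles.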