01_From_Strings_to_Documents

This workflow shows how to convert string-type columns into document-type columns. A document contains much more than plain text: it holds the tokenized text together with document meta-information such as title, author, etc. Tokenized text means that the sentences and words inside the text have already been identified. TextProcessing nodes work on document-type columns, not on simple string-type columns, so the text strings must be converted into documents before any kind of text processing can take place.
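
To make the difference between a string and a document more concrete, here is a rough Python sketch of the idea. It is not KNIME's API or its actual tokenizer: the `Document` class, the `string_to_document` function, and the regex-based splitting are all hypothetical and only illustrate what "tokenized text plus meta-information" could look like.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class Document:
    """Hypothetical stand-in for a document cell: meta-information plus tokenized text."""
    title: str
    author: str
    # One inner list of word tokens per sentence.
    sentences: List[List[str]] = field(default_factory=list)


def string_to_document(text: str, title: str = "", author: str = "") -> Document:
    """Illustrative conversion only; real tokenizers are far more careful."""
    # Naive sentence split: break on whitespace that follows '.', '!' or '?'.
    raw_sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Naive word tokenization: runs of word characters, or single punctuation marks.
    tokenized = [re.findall(r"\w+|[^\w\s]", s) for s in raw_sentences]
    return Document(title=title, author=author, sentences=tokenized)


doc = string_to_document(
    "KNIME is open source. It has text nodes!",
    title="Demo",
    author="Jane Doe",
)
print(doc.title, doc.author, len(doc.sentences))
```

The plain string carries no structure, whereas the resulting object knows its title, its author, and where each sentence and word begins and ends. That extra structure is what downstream text processing steps rely on.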


This is a companion discussion topic for the original entry at https://kni.me/w/vREIOG05YsOudoPA