03_Streaming Document Vector Hashing Creation

Hub · August 9, 2020, 11:23pm

Here the whole workflow is executed in a streaming fashion. Its aim is to create a vector space from the collection of documents being analyzed, using the Document Vector Hashing node. The node creates document vectors with a fixed number of dimensions using various hashing methods. The workflow starts by reading the data and converting the strings into documents, which are then preprocessed, i.e. filtered and stemmed. All the preprocessing steps take place in the Streaming Pre-processing component. Then a bag of words is created, and finally the documents are transformed into numerical/binary document vectors with the Document Vector Hashing node.
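
For readers who want to see the idea outside of KNIME, here is a minimal Python sketch of the same pipeline: preprocess each document, then hash its terms into a fixed-size vector, one document at a time. It is an illustration only and not what the Document Vector Hashing node does internally. The MD5-based bucket function, the tiny stop word list, the crude plural-stripping "stemmer", and all function names (`preprocess`, `bucket`, `hash_vector`, `stream_vectors`) are assumptions made for this example.

```python
import hashlib
import re
from typing import Iterable, Iterator, List

# Hypothetical stop word list, standing in for the filtering step.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to"}


def preprocess(text: str) -> List[str]:
    """Lowercase, tokenize, drop stop words, strip a trailing 's'.

    The suffix stripping is a crude stand-in for a real stemmer.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    kept = [t for t in tokens if t not in STOP_WORDS]
    return [t[:-1] if t.endswith("s") and len(t) > 3 else t for t in kept]


def bucket(term: str, dim: int) -> int:
    """Map a term to an index in [0, dim) with a deterministic hash.

    MD5 is an assumption here; any stable hash function would do.
    """
    digest = hashlib.md5(term.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % dim


def hash_vector(tokens: List[str], dim: int, binary: bool = True) -> List[int]:
    """Build a fixed-length vector: 0/1 flags if binary, else term counts."""
    vec = [0] * dim
    for term in tokens:
        i = bucket(term, dim)
        vec[i] = 1 if binary else vec[i] + 1
    return vec


def stream_vectors(
    texts: Iterable[str], dim: int = 16, binary: bool = True
) -> Iterator[List[int]]:
    """Yield one vector per document without holding the corpus in memory."""
    for text in texts:
        yield hash_vector(preprocess(text), dim, binary)


if __name__ == "__main__":
    docs = ["The cats sat in the garden", "A dog is in the garden"]
    for vector in stream_vectors(docs, dim=16):
        print(vector)
```

Because the vector length is fixed up front and no vocabulary has to be collected first, each document can be vectorized as soon as it arrives, which is what makes hashing a natural fit for streaming execution. The trade-off is that different terms can collide in the same dimension.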

This is a companion discussion topic for the original entry at https://kni.me/w/q-uNFTeuYvUP-c9r