Document type

Hi to all

I am new to text processing, but as far as I can see, the document-modifying nodes (taggers or filters) do not carry other columns (e.g. class strings) through. If I understand correctly, the only way is to inject a string into the document cell (e.g. the document type) so that the information can be re-extracted at the end of the sequence of tagging/filtering nodes. As far as I can see, the clustering example shows that, by concatenating two different tables, one can eventually differentiate documents. Is there a simple way to inject a string or other data into a document? How can one modify the document type, for instance?



Hi Andrea,

you could e.g. assign a category to the document when creating it with a Parser node (have a look at the parser's configuration dialog). Later on in the workflow, when you want to distinguish between documents, you can extract the category with the "Document Data Extractor" as a StringCell. Is this what you are looking for?
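
As a rough illustration of the idea only (plain Python, not KNIME's API; the `Document` class and all function names here are hypothetical): the category is stored inside the document, so it survives any step that only modifies the text and can be read back at the end.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Document:
    """Hypothetical stand-in for a document cell: text plus metadata."""
    text: str
    category: str  # stored inside the document itself


def parse(text: str, category: str) -> Document:
    # Assumption: the category is assigned once, at creation time.
    return Document(text=text, category=category)


def lowercase_filter(doc: Document) -> Document:
    # replace() copies every field that is not overridden,
    # so the category is carried along unchanged.
    return replace(doc, text=doc.text.lower())


def extract_category(doc: Document) -> str:
    # Read the metadata back out as a plain string at the end.
    return doc.category


doc = parse("Aspirin Reduces Fever", category="query: aspirin")
processed = lowercase_filter(doc)
print(extract_category(processed))
```

The point of the sketch is that no extra column has to travel alongside the document; whatever is needed later is attached to it up front.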

Best, Kilian

Hi Kilian

Indeed, it works. What I wanted to do is propagate a string variable $query$, which I used to search PubMed with the grabber nested in a loop over $query$, so that I can analyze the data according to it.

I wanted to inject this variable (as a custom tag?) into a document coming from a source. I will have to save the documents and rerun the loop with a parser, then. I wonder what happens after parsing, when I need to generate a BoW or do tagging. It is not trivial to pass this information along if it is not part of the document, correct?


