I would like to find specified strings in a document using the Dictionary Tagger.
The input data is an Excel file with at least one column and 500 rows.
Each cell in these rows contains text information that I would like to analyze.
My workflow looks like this:
1st input file: Excel file reader → strings to document
2nd input file: Dictionary file (self created table)
Both inputs are connected to the Dictionary Tagger. The workflow runs and ends with a green status, but none of the information from the 2nd input file is found/selected in the other document. The output table contains exactly the same information as the input file…
To check if and how a document was tagged, you need to use the Document Viewer node and inspect individual documents. Alternatively, you can use a Tag Filter node and filter out all terms except those which have been tagged by the Dictionary Tagger before. Then build a bag of words and see what has been tagged.
To get a table, use the Bag Of Words Creator. To convert tags and/or terms to strings, use the Tags To String or Term To String node. On the bag of words (bow), use the TF node to calculate term frequencies. You can also use the GroupBy node to aggregate over the whole corpus or over all documents.
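For illustration only, here is a rough Python sketch of what the steps above do conceptually. It is not KNIME code and does not use any KNIME API; the sample texts, the dictionary entries, and the function names are all hypothetical, and whitespace splitting is a simplifying assumption in place of KNIME's tokenization.

```python
from collections import Counter

# Hypothetical sample data standing in for the Excel column and the dictionary table.
documents = {
    "doc1": "The pump motor failed and the pump was replaced",
    "doc2": "Valve leak detected near the motor housing",
}
dictionary = {"pump", "motor", "valve"}


def tag_terms(text, dictionary):
    """Return the terms of `text` that match a dictionary entry (case-insensitive)."""
    tokens = text.lower().split()
    return [t for t in tokens if t in dictionary]


def bag_of_words(documents, dictionary):
    """Build (document, term, count, relative frequency) rows for tagged terms only."""
    rows = []
    for doc_id, text in documents.items():
        total = len(text.split())
        counts = Counter(tag_terms(text, dictionary))
        for term, count in sorted(counts.items()):
            rows.append((doc_id, term, count, count / total))
    return rows


def corpus_counts(rows):
    """Sum the counts per term over all documents, similar to a GroupBy on the term."""
    totals = Counter()
    for _doc_id, term, count, _tf in rows:
        totals[term] += count
    return dict(totals)


rows = bag_of_words(documents, dictionary)
totals = corpus_counts(rows)
```

Here `rows` plays the role of the bag of words with a term frequency column, and `totals` the role of the aggregated table over the whole corpus.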
Thanks for the fast feedback!
Does that mean the bag of words excludes the words previously identified by the Dictionary Tagger?
I've checked several settings, but I did not get the expected values.
The Dictionary Tagger found the words that I maintain in a separate table.
I can see these words in the Document Viewer when using MWT.
The goal now is to show these words in a separate file (ideally xls, csv…).
Make sure that you selected the right Document Column in the Bag Of Words Creator node, i.e. the column containing the tagged and preprocessed documents. All tags will be shown in the bow, if there are any.
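As for getting the tagged words into a separate file, the idea can be sketched in plain Python as below. This is only an illustration outside KNIME; the rows and the function name `write_tagged_terms` are hypothetical. Inside a workflow, a writer node such as CSV Writer attached to the bag of words table would presumably take this role.

```python
import csv
import io

# Hypothetical tagged terms, e.g. what remains after filtering on dictionary tags.
tagged_rows = [
    ("doc1", "motor", 1),
    ("doc1", "pump", 2),
    ("doc2", "valve", 1),
]


def write_tagged_terms(rows, handle):
    """Write (document, term, count) rows as CSV with a header line."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["document", "term", "count"])
    writer.writerows(rows)


# An in-memory buffer is used here; an open file handle works the same way.
buffer = io.StringIO()
write_tagged_terms(tagged_rows, buffer)
csv_text = buffer.getvalue()
```

To write an actual file, pass a handle from `open("tagged_terms.csv", "w", newline="")` instead of the buffer (the file name is just an example).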