I’m trying to analyze resumes and job descriptions to create tag clouds and do similarity matching. The keywords in this domain are technical skills and specialized fields of study. I’ve been able to tag documents with a custom dictionary of terms using the Dictionary Tagger (NE / UNKNOWN), and the tagged terms appear highlighted in a document viewer. When I move on to subsequent nodes like Tag Cloud, the output never reflects my custom dictionary; it just shows general terms. Is this a use case where I need a custom tag set?
Is there a different way I should configure the Tag Cloud, or is there a different keyword approach I should use to visualize these specialized skills?
Your approach seems fine so far. The Tag Cloud node shows all terms, though. If you want to see only the terms that you tagged previously, you can either use the Tag Filter node to keep only terms with the NE / UNKNOWN tag or simply use the Modifiable Term Filter node. Tagger nodes like the Dictionary Tagger set tagged terms to an unmodifiable state. If you apply the Modifiable Term Filter after tagging, it removes all untagged words, so only the tagged ones remain and you should get the Tag Cloud you expected.
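To make the idea concrete outside of the workflow editor, here is a minimal Python sketch of "tag with a dictionary, then drop everything untagged before counting". It is only a conceptual illustration and not the KNIME API: the `Term` class, the function names, and the sample sentence are all hypothetical, and the matching is a simple case-insensitive single-word lookup.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Term:
    """Hypothetical stand-in for a term; tag is None while the term is untagged."""
    text: str
    tag: Optional[str] = None


def tag_terms(words, dictionary, tag="UNKNOWN"):
    """Attach `tag` to every word that appears in the dictionary (case-insensitive)."""
    lookup = {entry.lower() for entry in dictionary}
    return [Term(w, tag if w.lower() in lookup else None) for w in words]


def keep_tagged(terms):
    """Drop all untagged terms, so only dictionary matches remain."""
    return [t for t in terms if t.tag is not None]


def term_counts(terms):
    """Count term frequencies, which is what a tag cloud would size words by."""
    return Counter(t.text.lower() for t in terms)


# Made-up sample text and dictionary, for illustration only.
words = "Experience with Python and SQL for the analysis of data in Python".split()
skills = ["python", "sql"]

tagged = tag_terms(words, skills)
all_counts = term_counts(tagged)                 # includes "the", "and", ...
skill_counts = term_counts(keep_tagged(tagged))  # only dictionary terms

print(skill_counts)
```

Counting over `tagged` directly corresponds to feeding the unfiltered documents into the cloud, while counting over `keep_tagged(tagged)` corresponds to filtering first.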
I hope this helps.
Your method of using the Tag Filter yielded usable results. Thank you.
I ran a model with two different dictionaries. One produced a tag cloud with useful results and a second had some useful terms displayed but many words like “the”, “e”, “be”, and “to”, which were not dictionary terms by themselves. Any idea why this happens?
Usually only the words that were tagged beforehand are displayed (if you use the Modifiable Term Filter node). Maybe you mixed up the columns in the configuration. Tagger and Preprocessing nodes both have an option to create a new column, so it could be that you are visualizing the original column or a partly processed one.
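One way to check for this kind of mix-up is to compare the terms that end up in the cloud against the dictionary. The Python sketch below is a rough illustration under stated assumptions, not KNIME code: the column names `original` and `filtered`, the sample rows, and both helper functions are hypothetical.

```python
from collections import Counter


def cloud_counts(rows, column):
    """Count lowercase words from one column, as a tag cloud would."""
    counts = Counter()
    for row in rows:
        counts.update(word.lower() for word in row[column].split())
    return counts


def unexpected_terms(counts, dictionary):
    """Return cloud terms that are not dictionary entries, sorted alphabetically."""
    allowed = {entry.lower() for entry in dictionary}
    return sorted(term for term in counts if term not in allowed)


# Made-up table with an original column and a filtered column.
rows = [
    {"original": "the candidate has to be fluent in Java", "filtered": "Java"},
    {"original": "Java and Spark experience", "filtered": "Java Spark"},
]
dictionary = ["java", "spark"]

wrong = cloud_counts(rows, "original")   # visualizing the unprocessed column
right = cloud_counts(rows, "filtered")   # visualizing the filtered column

print(unexpected_terms(wrong, dictionary))
print(unexpected_terms(right, dictionary))
```

If the list of unexpected terms is not empty, the cloud is most likely being built from a column that has not gone through the filter.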