Custom Dictionary and Compound Terms

Hello All,

I am having an issue with tagging compound terms from a custom dictionary. I created the dictionary of terms from an index to a list of documents. I then want to scan the indexes and tag them with named-entity (NE) tags.

In my dictionary I may have a term like "supreme marvelous hero of the universe" tagged as a Person.  However, when I create the bag of words, the term is not preserved.  Instead I will see:

[supreme marvelous hero(PERSON)]




Is there a way to preserve the compound term as a tagged entity in the bag of words (BOW)?



Hi Mark,

I guess you have many compound or multi-word terms in your dictionary and some of them may be conflicting, meaning you have:

"supreme marvelous hero of the universe"

and also have

"supreme marvelous hero"



in the dictionary. If this is the case, not only the first entry but also the following ones can and will be tagged by the Dictionary Tagger. The order of the terms in the dictionary matters, because the tagger tries to match the dictionary entries top down. In the described case, the tagger would first combine "supreme marvelous hero of the universe" into one term and tag it, but then it would split the term again, since the subsequent dictionary entries match as well.
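To check whether a dictionary actually contains such conflicting entries, a small script outside the workflow can help. The following is only a rough sketch in plain Python, not part of the Dictionary Tagger: the function name `find_nested_entries` and the third dictionary entry are made up for illustration, and it assumes simple whitespace tokenization and case-insensitive comparison.

```python
def find_nested_entries(entries):
    """Return (shorter, longer) pairs where the words of `shorter`
    appear as a contiguous run inside `longer`.

    Assumption: terms are split on whitespace and compared
    case-insensitively; a real tokenizer may behave differently.
    """
    tokenized = [(entry, entry.lower().split()) for entry in entries]
    conflicts = []
    for short, short_tokens in tokenized:
        if not short_tokens:
            continue  # skip empty lines in the dictionary
        n = len(short_tokens)
        for long, long_tokens in tokenized:
            if n >= len(long_tokens):
                continue  # only compare against strictly longer entries
            # Slide a window of n words over the longer entry.
            if any(long_tokens[i:i + n] == short_tokens
                   for i in range(len(long_tokens) - n + 1)):
                conflicts.append((short, long))
    return conflicts


dictionary = [
    "supreme marvelous hero of the universe",
    "supreme marvelous hero",
    "galactic overlord",  # hypothetical entry without any conflict
]

for short, long in find_nested_entries(dictionary):
    print(f'"{short}" is contained in "{long}"')
```

Every pair it reports is a candidate for the splitting effect described above, so those are the entries to look at first when reviewing the dictionary.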

Attached you find an example workflow, showing two different tagging results based on the order of the dictionary entries.

If you want to keep the longest sequences of words, sort the dictionary by the length of the strings.
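If the dictionary is prepared outside the workflow, the sorting can be done with a few lines of Python. This is a minimal sketch with a hypothetical helper `sort_by_length`; it measures length in characters and leaves the direction as a parameter, because which order gives the desired result is best verified against the two variants in the example workflow.

```python
def sort_by_length(entries, longest_first=True):
    """Sort dictionary entries by string length (in characters).

    `longest_first` is an illustrative switch; pick the direction that
    keeps the long terms intact in your own tagging results.
    The sort is stable, so entries of equal length keep their order.
    """
    return sorted(entries, key=len, reverse=longest_first)


entries = [
    "supreme marvelous hero",
    "supreme marvelous hero of the universe",
]

# Print one term per line, ready to be saved as a dictionary file.
for term in sort_by_length(entries):
    print(term)
```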

Cheers, Kilian

Thank you, Kilian! I will put that into practice and review the dictionaries for consistency. That is a wonderful piece of insight, and I will debug the process accordingly.

Thank you again for your time and insight!