Is it possible to use a custom dictionary of multi-word terms so that their components are kept together instead of being split? If so, how? E.g. CHI SQUARED -> CHI_SQUARED



You can try to use the Reference Row Splitter node to first split the rows that occur in your dictionary from the data, apply your transformations on the rest of the data, and then join them with the unchanged rows.



You can use the Wildcard Tagger node and enable the setting "Multi word/term". The node then tries to find matches across terms within each sentence of the documents. If a dictionary entry matches one or more consecutive terms, the match is tagged as a single term. This lets you change the granularity and tag e.g. "Text Mining" as one term. The node takes a dictionary as its second input, which can contain regexes and wildcards.
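Outside of KNIME, the same idea (matching dictionary patterns across several words and treating each match as one term) can be sketched in a few lines of Python. This is only an illustration, not how the tagger node is implemented. The pattern list and the function name `merge_multi_terms` are made up for the example.

```python
import re

# Hypothetical dictionary: each entry is a regex describing a multi-word term.
multi_term_patterns = [r"chi[\s-]squared?", r"text\s+mining"]

def merge_multi_terms(sentence, patterns):
    """Replace every dictionary match with a single underscore-joined token."""
    # Combine all entries into one case-insensitive pattern with word boundaries.
    combined = re.compile(
        r"\b(?:" + "|".join(f"(?:{p})" for p in patterns) + r")\b",
        re.IGNORECASE,
    )

    def join_words(match):
        # Collapse the whitespace or hyphens inside the match into underscores.
        return re.sub(r"[\s-]+", "_", match.group(0))

    return combined.sub(join_words, sentence)

merged = merge_multi_terms(
    "The chi squared test is common in Text Mining.", multi_term_patterns
)
print(merged)
```

Once the multi-word terms are joined with underscores, a whitespace tokenizer will no longer split them.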

I solved this for combinations of up to 7 words by tagging the combinations starting with the longest one and working down to single words. In each step the text is checked for combinations of n words. If a combination exists, it is tagged and filtered out. The remaining text is then checked for combinations of n-1 words, those are filtered out, and so on. This ensures that all word combinations are recognized, and that a longer combination is never broken up by a shorter one it contains.
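The longest-first procedure can be sketched in Python roughly as follows. This is a minimal sketch under my own assumptions, not the actual workflow: the dictionary is a set of lower-cased, space-separated terms, the text is already tokenized, words not covered by any dictionary entry are kept as single terms, and the name `tag_longest_first` is hypothetical.

```python
def tag_longest_first(tokens, dictionary, max_len=7):
    """Tag multi-word terms in passes, from max_len words down to 1 word."""
    merged = [None] * len(tokens)   # merged[i] holds the term starting at i
    taken = [False] * len(tokens)   # True once a token belongs to a tagged term

    for n in range(max_len, 0, -1):
        i = 0
        while i + n <= len(tokens):
            window = tokens[i:i + n]
            is_free = not any(taken[i:i + n])
            key = " ".join(word.lower() for word in window)
            # Assumption: leftover single words are always kept (n == 1).
            if is_free and (n == 1 or key in dictionary):
                merged[i] = "_".join(window)
                for j in range(i, i + n):
                    taken[j] = True   # "filter out" the matched words
                i += n
            else:
                i += 1

    return [term for term in merged if term is not None]

dictionary = {"chi squared", "chi squared test statistic"}
tokens = "the chi squared test statistic and chi squared".split()
terms = tag_longest_first(tokens, dictionary)
print(terms)
```

Because the 4-word pass runs before the 2-word pass, "chi squared test statistic" is tagged as a whole, and only the later, standalone "chi squared" is picked up by the shorter entry.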

