I have a set of documents in which references appear both as full titles and as acronyms (e.g., Too Much Information vs. TMI). I'd like to replace the full titles with their acronyms to make the references consistent and to reduce noise in the document-term matrix I eventually work with.
In a previous post on a similar topic, the recommendation was to convert the documents to a Bag of Words, use the Dictionary Replacer or String (RegEx) Replacer nodes (with deep preprocessing enabled) to make the replacements, and then work with the resulting documents after regrouping them. In my case, because the titles are compound terms, the Bag of Words conversion destroys each reference by splitting the title into its individual terms (e.g., Too Much Information --> [Too] [Much] [Information]), so the multi-word title can no longer be matched and replaced. The String Replacer appears to work directly on the document, but I have a fairly large number of these cases, so I'd need a long chain of String Replacer nodes, which would be cumbersome.
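To illustrate the kind of replacement being asked for (whole multi-word titles swapped for acronyms on the raw text, before any tokenization splits them), here is a minimal Python sketch. It is not a KNIME node; it simply assumes the document text is available as a plain string. It compiles the whole title-to-acronym dictionary into one regular expression, so a single pass replaces every title instead of one replacer step per title. The function name `build_replacer` and the sample mapping are hypothetical, and the word-boundary anchors assume each title begins and ends with a letter or digit.

```python
import re
from typing import Callable, Dict


def build_replacer(mapping: Dict[str, str]) -> Callable[[str], str]:
    """Return a function that replaces every full title in `mapping` with its acronym."""
    if not mapping:
        # Nothing to replace: return the text unchanged.
        return lambda text: text

    # Try longer titles first so that, at the same position, a title that
    # contains a shorter one (e.g. "Too Much Information" vs. "Information") wins.
    titles = sorted(mapping, key=len, reverse=True)

    def title_pattern(title: str) -> str:
        # Escape each word and allow any run of whitespace (including line
        # breaks) between the words of a title.
        return r"\s+".join(re.escape(word) for word in title.split())

    # One compiled pattern covers the whole dictionary; \b keeps matches on
    # word boundaries (assumes titles start and end with word characters).
    pattern = re.compile(r"\b(?:" + "|".join(title_pattern(t) for t in titles) + r")\b")

    # Matches are looked up by their whitespace-normalized form.
    lookup = {" ".join(t.split()): acronym for t, acronym in mapping.items()}

    def replace(text: str) -> str:
        return pattern.sub(lambda m: lookup[" ".join(m.group(0).split())], text)

    return replace


if __name__ == "__main__":
    replace = build_replacer({
        "Too Much Information": "TMI",
        "Frequently Asked Questions": "FAQ",
    })
    # Prints: TMI is often shortened. See the FAQ.
    print(replace("Too Much Information is often shortened. See the Frequently\nAsked Questions."))
```

Because the replacement runs on the untokenized text, the acronym ends up as a single term once the documents are later converted to a Bag of Words. Matching is case-sensitive here; case-insensitive matching would need the lookup to normalize case as well.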
Have I overlooked a node (or option in a node) that will let me deal with this situation?