Altering Contractions Using String Manipulator or Regex

I’d like to alter contractions in a group of Tweets using either the String Manipulator node or the Regex Filter. Regardless of what I try, I cannot consistently remove the same contraction. For example, I’d like to change “I’m” to “I am” or remove the apostrophe altogether. I have managed to make both Regex and String Manipulator work, but only partially. They will either remove the first instance they find and skip the rest, or split the “I” and “m”, leaving me with “I am’Iam”. I believe the issue is with the apostrophe but, as with the contractions, I can’t seem to remove apostrophes consistently. Any suggestions would be welcome.

Here’s an example for the search:

Removing the apostrophe is easy, but here is my suggestion for transforming contractions to their full form:

replace.knwf (30.4 KB)

In this workflow I have a “Table Creator” node containing a single row as an example.
First I converted all letters to lowercase. Then, in a recursive loop, I replaced the contractions with their full forms using an example dictionary (which I produced with another “Table Creator” node) and the “replace” function in the “String Manipulation” node. (The maximum number of iterations in the loop end must equal the number of rows in your dictionary.)
To create a complete dictionary you can use this list at Wikipedia.
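
For anyone who wants to try the same dictionary idea outside KNIME, here is a minimal sketch in Python. It is not the attached workflow. The `CONTRACTIONS` dictionary and the `expand_contractions` function are hypothetical names made up for illustration, and the dictionary has only a few sample entries. Instead of looping once per dictionary row, it builds a single regular expression so that every occurrence is replaced in one pass.

```python
import re

# Hypothetical sample dictionary; a complete list would be much longer.
CONTRACTIONS = {
    "i'm": "i am",
    "it's": "it is",
    "can't": "cannot",
    "don't": "do not",
}

# One alternation pattern covering every dictionary key, with word
# boundaries so that only whole contractions are matched.
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(key) for key in CONTRACTIONS) + r")\b"
)


def expand_contractions(text: str) -> str:
    """Lowercase the text, then replace every known contraction."""
    lowered = text.lower()
    return PATTERN.sub(lambda match: CONTRACTIONS[match.group(1)], lowered)


print(expand_contractions("I'm sure it's fine, but I'm late"))
```

Note that this sketch only matches the straight apostrophe ('), so text containing other apostrophe characters would need to be normalized first.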



Thanks to izaychik63 and Armin for the quick replies. It turns out the root problem was that some of the text used an apostrophe character that String Manipulator and the Regex Filter were not registering. For example, “it's” (straight apostrophe) was recognized and the apostrophe removed, but many of the contractions used a different apostrophe character, as in “it’s”, which would not register with any of KNIME’s nodes. I fixed the problem by searching for and removing both types of apostrophe.
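
To illustrate the fix, here is a small Python sketch that handles several apostrophe-like characters in one step. The character list and the `normalize_apostrophes` function are assumptions for illustration. The exact characters present in a given set of Tweets may differ, so it is worth checking the data first.

```python
import re

# Characters that commonly stand in for an apostrophe (an assumed list):
# ' straight apostrophe, U+2019 right single quote, U+2018 left single
# quote, U+02BC modifier letter apostrophe, ` grave accent, U+00B4 acute.
APOSTROPHES = re.compile("['\u2019\u2018\u02bc`\u00b4]")


def normalize_apostrophes(text: str, replacement: str = "'") -> str:
    """Replace every apostrophe-like character with `replacement`."""
    return APOSTROPHES.sub(replacement, text)


sample = "it\u2019s and it's"
print(normalize_apostrophes(sample))      # map all variants to a straight '
print(normalize_apostrophes(sample, ""))  # or remove them entirely
```

Mapping every variant to the straight apostrophe first means any later contraction lookup only has to handle one form. Passing an empty replacement removes the apostrophes instead.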