Remove punctuation and uppercase

text-processing
#1

I tried to use this framework to change uppercase to lowercase and remove punctuation, using the following expression in a column expression node:

upperCase(replace(column(variable("currentColumnName")), " ", ""))

but somehow the result does not appear. How can I save the result to a CSV file? And is it possible to remove emoticons using talend? Thanks!

#2

Hi,
if you want to change uppercase to lowercase, you have to use the lowerCase function in your expression instead of upperCase. For the punctuation, can you try using the replaceChars function? It should work for your use case:

lowerCase(replaceChars(column(variable("currentColumnName")), "/[.,\/#!$%\^&\*;:{}=\-_`~()]", ""))

(I got the punctuation characters from here; feel free to modify them for your needs.)
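
If you want to check which characters that set actually removes before wiring it into the node, here is a minimal sketch of the same idea in plain Python, outside the workflow. It assumes the intended set is the characters listed between the square brackets in the expression above; the names PUNCTUATION and clean_text are made up for illustration.

```python
# Assumed character set: the characters listed inside the brackets of the
# expression above, with the escaping backslashes dropped.
PUNCTUATION = ".,/#!$%^&*;:{}=-_`~()"

# With three arguments, str.maketrans maps every character of the third
# argument to None, so translate() deletes those characters.
_STRIP_TABLE = str.maketrans("", "", PUNCTUATION)


def clean_text(text: str) -> str:
    """Lowercase the text and delete every character in PUNCTUATION."""
    return text.lower().translate(_STRIP_TABLE)


if __name__ == "__main__":
    print(repr(clean_text("Hello, World! :-)")))
```

Note that spaces are kept, so a trailing text emoticon leaves a trailing space behind; add a strip() call if that matters for your data.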

I am not sure what you mean here:

remove emoticon using talend

but the above replacement should take care of most emoticons, since text emoticons such as :-) are built from punctuation characters. If you want to remove emoticons but keep other punctuation, it will be more difficult. You might want to use one of the regular expressions discussed here on stackoverflow.com.
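
As a rough sketch of that harder case, the following Python snippet removes emoji while leaving ordinary punctuation alone. It assumes that "emoticon" means Unicode emoji rather than text smileys, and the ranges below are a hand-picked, non-exhaustive selection, not the pattern from the linked discussion. The names EMOJI_PATTERN and remove_emoji are made up for illustration.

```python
import re

# Assumption: "emoticons" are Unicode emoji. These ranges cover several
# common blocks but are not a complete list of all emoji code points.
EMOJI_PATTERN = re.compile(
    "["
    "\U0001F300-\U0001F5FF"  # miscellaneous symbols and pictographs
    "\U0001F600-\U0001F64F"  # emoticon faces
    "\U0001F680-\U0001F6FF"  # transport and map symbols
    "\U0001F900-\U0001F9FF"  # supplemental symbols and pictographs
    "\u2600-\u26FF"          # miscellaneous symbols
    "\u2700-\u27BF"          # dingbats
    "]+"
)


def remove_emoji(text: str) -> str:
    """Delete characters in the listed emoji ranges; keep everything else."""
    return EMOJI_PATTERN.sub("", text)


if __name__ == "__main__":
    print(repr(remove_emoji("Nice work! \U0001F600\U0001F44D")))
```

Text smileys such as :-) pass through unchanged here, because they consist only of punctuation characters.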
Kind regards
Alexander
