Filtering out Chinese, Korean, Japanese, Thai characters with Python Script Node or Rule Engine using regular expression

Hi all, I am trying to filter data records that contain any Asian characters so that I can run a translator on them later. I tried a Python Script node with exactly the same code I use in Spyder; it works in Spyder but not in the Python Script node in KNIME.


Here is the code:
output_table_1 = input_table_1.copy()

def contain_chinese(check_str):
    for ch in check_str:
        if '\u4e00' <= ch <= '\u9fa5':
            return True
    return False

def contain_korean(check_str):
    for ch in check_str:
        if '\uac00' <= ch <= '\ud7a3':
            return True
    return False

def contain_japanese(check_str):
    for ch in check_str:
        if '\u0800' <= ch <= '\u4e00':
            return True
    return False

output_table_1 = output_table_1[output_table_1['Customer Name'].apply(lambda x: contain_chinese(x))]
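The post does not say how the code fails in KNIME, so the following is only a sketch of one thing that often differs between a Spyder session and a node reading a real table: missing cells. It assumes missing values arrive as None or NaN, which would break a `for ch in check_str` loop. The Unicode range is the one from the post; the name `contains_chinese` and the sample list are made up for illustration.

```python
import re

# Range taken from the post: U+4E00 to U+9FA5 (CJK ideographs).
CHINESE = re.compile("[\u4e00-\u9fa5]")


def contains_chinese(value):
    """Return True only for text values with at least one CJK ideograph."""
    # Assumption: missing cells show up as None or float('nan').
    # Neither can be iterated as text, so treat them as "no match".
    if not isinstance(value, str):
        return False
    return CHINESE.search(value) is not None


# Hypothetical sample values standing in for the 'Customer Name' column.
names = ["Alice", "张伟", None, float("nan"), "Li 李"]
flags = [contains_chinese(n) for n in names]
print(flags)  # [False, True, False, False, True]
```

In the node this could be applied as `output_table_1['Customer Name'].apply(contains_chinese)`. Separately, if the node happens to run Python 2, `'\u4e00'` in a plain (non-`u`) string literal is not an escape at all, which would also explain a difference from Spyder; that is worth checking in the node's Python settings.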

I also tried a regular expression, but it doesn't work well either.
Here is an example for Japanese; it returns records that contain no Japanese characters at all: ^.([\u0800-\u4e00]).
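One possible reason for the false hits, offered as a guess rather than a diagnosis: the range U+0800 to U+4E00 is much wider than Japanese. It also covers Thai (U+0E00 to U+0E7F) and the General Punctuation block (U+2000 to U+206F), so an English name with a curly apostrophe or an en dash falls inside it. The sketch below compares that range with a narrower one that is an assumption of mine, Hiragana plus Katakana (U+3040 to U+30FF); the sample names are invented.

```python
import re

# Range from the post: U+0800 to U+4E00.
BROAD = re.compile("[\u0800-\u4e00]")
# Assumed narrower alternative: Hiragana and Katakana only (U+3040 to U+30FF).
KANA = re.compile("[\u3040-\u30ff]")

# Hypothetical sample names: curly apostrophe, en dash, Katakana, Thai, ASCII.
samples = ["O’Brien Ltd", "Acme – Europe", "トヨタ", "ไทย", "Plain ASCII"]

for s in samples:
    # Print whether each pattern finds at least one character in the name.
    print(s, bool(BROAD.search(s)), bool(KANA.search(s)))
```

With these samples the broad range flags the first four names, while the kana range flags only the Katakana one. Kana alone will miss names written purely in kanji, and kanji share code points with Chinese, so the two cannot be separated by range alone.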

So I ended up filtering out all records made up only of Latin characters with a Rule Engine Row Filter, but the output still contains Latin-only records.
Here is the rule I used, with "exclude TRUE matches" selected: $Customer Name$ MATCHES "^[a-zA-Z0-9_?@&.,,~()()^:;/=+~'’ àáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ-]+" => TRUE
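A whitelist like this only counts a name as Latin if every single character is on the list, so one unlisted symbol (a double quote, `#`, an uppercase accented letter such as `É`) is enough for an English name to slip through. To see which characters are responsible, something along these lines could be run in Python. It is a sketch: the character class is copied from the rule above with the repeated characters dropped, and `offending_chars` is a hypothetical helper name.

```python
import re

# Character class copied from the rule in the post (duplicates removed).
ALLOWED = re.compile(r"[a-zA-Z0-9_?@&.,~()^:;/=+'’ àáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ-]")


def offending_chars(name):
    """Return the sorted, de-duplicated characters of name not on the whitelist."""
    return sorted({ch for ch in name if not ALLOWED.fullmatch(ch)})


print(offending_chars("Müller GmbH"))          # []
print(offending_chars('ACME "Holdings" #2'))   # ['"', '#']
print(offending_chars("ÉCOLE"))                # ['É']
```

Running this over the English records that still come through should show which characters need to be added to the class.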

If anyone can give some advice, it would be really appreciated!
Thank you in advance!

@xli you might want to check whether you have to double-escape the Unicode escape sequences.
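To make the double-escape point concrete on the Python side, here is a small sketch of the difference between the two spellings. Whether the Rule Engine needs one backslash or two depends on how it parses string literals, which is not verified here; the snippet only shows Python's behaviour.

```python
import re

single = "\u4e00"    # Python turns the escape into one character: 一
double = "\\u4e00"   # six characters: backslash, u, 4, e, 0, 0

print(len(single), len(double))  # 1 6

# Python's re module (3.3+) also resolves \uXXXX inside a pattern, so a
# character class built from either spelling matches the same text.
from_single = re.compile("[" + single + "-\u9fa5]")
from_double = re.compile("[" + double + "-\\u9fa5]")

print(bool(from_single.search("张伟")), bool(from_double.search("张伟")))    # True True
print(bool(from_single.search("Alice")), bool(from_double.search("Alice")))  # False False
```

So in Python 3 both forms behave the same inside a regular expression, and double escaping only matters for tools that would otherwise consume the backslash before the regex engine sees it.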

There are several other threads and examples about RegEx that might help.

Hi, thanks for your reply.
I tried double escaping, but it still doesn't work. It is not that it fails completely: the Rule Engine Row Filter did return some records containing Asian characters, but it also returned records that are entirely in English.

