I am requesting help with this issue because we work with data in many different languages. I also think many people will need this in the future.
If you have an alternative solution in the meantime, please share it.
As an alternative, you can use the Redfield NLP nodes, which run spaCy under the hood. spaCy is a Python NLP library that supports multiple languages, including Korean and Japanese.
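For context, here is a minimal sketch of what spaCy tokenization looks like outside of KNIME. It uses spaCy's language-agnostic blank "xx" pipeline, which needs no model download; the sample sentence is illustrative. (Note that the dedicated Korean and Japanese pipelines, `spacy.blank("ko")` and `spacy.blank("ja")`, require extra tokenizer dependencies such as mecab-ko and SudachiPy.)

```python
import spacy

# Language-agnostic tokenization sketch. "xx" is spaCy's
# multi-language blank pipeline, so no trained model is needed.
nlp = spacy.blank("xx")

# Mixed English/Korean sample text (illustrative only).
doc = nlp("Hello world! 안녕하세요 세계")
tokens = [t.text for t in doc]
print(tokens)
```

The Redfield nodes wrap this kind of pipeline and expose the results as KNIME Document cells, so you don't have to write Python yourself.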
Hi @Artem,
Thanks for the answer. I don't know how to do it yet, but I will try. If you have an example, please share it.
My goal is to categorize the texts in the dataset according to a reference word and code list.
I actually have a working solution that performs very well; you can find it in the attached link. However, I cannot proceed because of the language problem, and I am looking for a way to solve it.
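To make the goal concrete, here is a minimal dictionary-based categorization sketch in plain Python. The category names and reference words below are invented for illustration; the actual code list lives in the linked workflow.

```python
# Hypothetical reference word / code list (illustrative only).
CODE_LIST = {
    "billing": ["invoice", "payment", "refund"],
    "shipping": ["delivery", "package", "tracking"],
}

def categorize(text: str) -> list[str]:
    """Return every category whose reference words appear in the text."""
    tokens = set(text.lower().split())
    return [category for category, words in CODE_LIST.items()
            if tokens & set(words)]

print(categorize("The invoice for my payment is missing"))  # ['billing']
```

The language problem enters exactly at the tokenization step: naive whitespace splitting (as above) breaks down for languages like Korean and Japanese, which is where a proper NLP tokenizer is needed.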
I briefly looked at this thread, and I believe you can simply include the Redfield NLP nodes in your workflow and use them together with the Text Processing nodes. In other words, you can replace the nodes that do the tokenization, lemmatization, and the different types of tagging with the new ones.
Just to emphasize: the Redfield NLP nodes are fully compatible with the Text Processing nodes, since they also operate on and produce the Document data type.
Here is an example that shows how these nodes work.