How to translate Chinese PDF documents to English?

michael19602016 · January 16, 2020, 8:52pm

Dear All,

I would like to translate one or more Chinese (patent) documents, preferably to English, using for example the Amazon (or Google) Translate node. To do this, I tried to read some documents with the PDF Parser node. The settings were: Stanford NLP ChineseTokenizer and Charset = ISO-2022-CN.

However, the node created a table, displaying the path of the input documents in the “Document” column, but no text appeared.

My question is: what do I have to do to “feed” the translation node properly, starting from some Chinese-language PDF documents in a certain folder?

Kind regards

Michael

marten_kose · January 20, 2020, 1:34pm

I assume you checked the “Use file path as title” box in the configuration dialog of the PDF Parser node. That is why the file path is included. If you want to make use of the text (which is stored within the document column), you can use the Document Data Extractor node, or use the Text Processing nodes, which can work with document columns.
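Once the body text has been extracted (for example, written out to a file or table), a quick sanity check is to count how many Chinese characters it actually contains. The following is only a minimal Python sketch of that check, run outside KNIME; the function names `count_cjk` and `looks_chinese` and the 0.3 threshold are made up for illustration, and only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) is considered.

```python
def count_cjk(text: str) -> int:
    """Count characters in the basic CJK Unified Ideographs block (U+4E00..U+9FFF)."""
    return sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")


def looks_chinese(text: str, min_ratio: float = 0.3) -> bool:
    """Return True if at least min_ratio of the non-whitespace characters are CJK.

    The 0.3 default is an arbitrary illustrative threshold, not a standard value.
    """
    visible = [ch for ch in text if not ch.isspace()]
    if not visible:
        return False  # empty or whitespace-only text cannot be Chinese
    return count_cjk(text) / len(visible) >= min_ratio
```

If the check fails on text that should be Chinese, the problem is most likely in the parsing step rather than in the translation step.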

michael19602016 · January 20, 2020, 5:44pm

Hi Marten,

thanks for your reply. I tried unchecking the “Use file path as title” box, but unfortunately the picture didn’t change: the path is still used as the title. I also tried to extract the body text from the documents using the Document Data Extractor, but I wasn’t able to see anything like Chinese text. It looks to me as if the node doesn’t work with Chinese…

I’m also not sure what to do once the parsing works: which intermediate steps are needed between the parser and the translation node?

The patent PDF document is written entirely in Chinese characters (apart from some numbers). Are my settings (Stanford NLP ChineseTokenizer and Charset = ISO-2022-CN) right?

Is there an example workflow that reads a Chinese PDF document and translates it using Amazon Translate?

Sorry for asking so many questions, but at the moment I’m a little bit helpless…

Kind regards

Michael

marten_kose · January 21, 2020, 12:41pm

Hi @michael19602016,

I’ve created a simple workflow that parses a PDF with some Chinese text: https://kni.me/w/nMzlLB5jqXX5DQ1I

If you check hub.knime.com for Amazon Translate (https://kni.me/n/hf2q-2A8J1mkPyFs), you’ll find two example workflows using this node.

You can use the example workflows as reference for building your own workflow.
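To illustrate the intermediate steps between parsing and translating, here is a minimal Python sketch of the general idea, independent of KNIME: split the extracted Chinese text into chunks that stay under a request size limit, then translate the chunks one by one. The names `split_for_translation` and `translate_chunks` are hypothetical, and the 10,000-byte limit is an assumption that should be checked against the documentation of the translation service you use. The translator is passed in as a plain function, so the sketch runs without any cloud account.

```python
import re
from typing import Callable, List

# Assumed per-request limit in UTF-8 bytes; verify it for your translation service.
MAX_BYTES = 10_000


def _size(text: str) -> int:
    """Size of the text in UTF-8 bytes (a Chinese character usually takes 3)."""
    return len(text.encode("utf-8"))


def split_for_translation(text: str, max_bytes: int = MAX_BYTES) -> List[str]:
    """Split text into chunks of at most max_bytes, preferring sentence ends."""
    if max_bytes < 4:
        raise ValueError("max_bytes must be at least 4 (largest UTF-8 character)")
    # Split after Chinese sentence punctuation or a newline, keeping the delimiter.
    sentences = [s for s in re.split(r"(?<=[。！？\n])", text) if s]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if _size(current + sentence) <= max_bytes:
            current += sentence
            continue
        if current:
            chunks.append(current)
            current = ""
        # The sentence starts a new chunk; if it is too long on its own,
        # it is cut character by character.
        for ch in sentence:
            if current and _size(current + ch) > max_bytes:
                chunks.append(current)
                current = ""
            current += ch
    if current:
        chunks.append(current)
    return chunks


def translate_chunks(chunks: List[str], translate: Callable[[str], str]) -> str:
    """Translate each chunk with the supplied function and join the results."""
    return " ".join(translate(chunk).strip() for chunk in chunks)
```

In the sketch, `translate` can be any function that takes a string and returns its translation. With Amazon Translate, for instance, it could wrap a call to the service's text translation API with Chinese as source and English as target language; in KNIME, the Amazon Translate node plays that role.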

Best,
Marten

michael19602016 · January 26, 2020, 8:17pm

Hi Marten,

thanks for your reply and the examples. Since I’m not so familiar with flow variables yet, I think it will be a good idea to go through the book “Advanced Luck” first.

Kind regards

Michael
