I would like to translate one or more Chinese (patent) documents, preferably into English, using for example the Amazon (or Google) Translate node. To do this, I tried to read in some documents with the PDF Parser node, using these settings: Stanford NLP ChineseTokenizer and Charset = ISO-2022-CN.
However, the node created a table that displays the path of the input documents in the “Document” column, but no text appears.
My question is: what do I have to do to “feed” the translation node properly, starting from a set of Chinese-language PDF documents in a given folder?
I assume you checked the “Use file path as title” box in the configuration dialog of the PDF Parser node; that is why the file path is included. If you want to make use of the text (which is stored within the Document column), you can use the Document Data Extractor node, or use the Text Processing nodes, which can work with Document columns.
Thanks for your reply. I tried unchecking the “Use file path as title” box, but unfortunately nothing changed: the path is still used as the title. I also tried to extract the body text from the documents using the Document Data Extractor, but I wasn't able to see anything resembling Chinese text. It looks to me as if the node doesn't work with Chinese…
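One way to tell whether the extracted body text is really missing, rather than just hard to read in the table view, is to measure how much of it falls in the CJK Unified Ideographs block (U+4E00 to U+9FFF). The following is a minimal Python sketch, assuming the body text from the Document Data Extractor output is available as a plain string; the function name `cjk_ratio` is hypothetical and not part of KNIME.

```python
def cjk_ratio(text: str) -> float:
    """Return the fraction of non-whitespace characters that are CJK Unified Ideographs.

    Only the basic block U+4E00..U+9FFF is counted; rarer extension blocks are ignored.
    """
    chars = [c for c in text if not c.isspace()]
    if not chars:
        # Empty (or whitespace-only) text: nothing was extracted.
        return 0.0
    cjk = sum(1 for c in chars if "\u4e00" <= c <= "\u9fff")
    return cjk / len(chars)


print(cjk_ratio("专利申请 2019"))  # 4 ideographs out of 8 non-space characters
```

For a document that is almost entirely Chinese, a ratio close to zero suggests the Chinese text is not being extracted at all (one possible reason is a PDF that contains scanned images rather than a text layer), while a ratio near one means the text is there and only its display or later processing needs attention.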
I'm also not sure what I would need to do if the parsing did work (which intermediate steps are needed between the parser and the translation node).
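One intermediate step that is often useful before calling a translation service: such APIs typically cap the size of a single request, and most Chinese characters take 3 bytes each in UTF-8, so a long patent can exceed the limit. The Python sketch below splits text into chunks under a byte budget, preferring to break after full-width sentence punctuation (。！？) and falling back to a character-level split only for an overly long sentence. The `max_bytes` value is an assumption to be checked against the current Amazon Translate (or Google) documentation, and the function names are hypothetical, not KNIME or AWS APIs.

```python
import re

# Split *after* full-width Chinese sentence-ending punctuation (zero-width lookbehind).
_SENTENCE_END = re.compile(r"(?<=[。！？])")


def _utf8_len(s: str) -> int:
    return len(s.encode("utf-8"))


def _hard_split(s: str, max_bytes: int) -> list[str]:
    """Character-level fallback for a single sentence longer than max_bytes."""
    pieces, current, size = [], [], 0
    for ch in s:
        n = _utf8_len(ch)
        if current and size + n > max_bytes:
            pieces.append("".join(current))
            current, size = [], 0
        current.append(ch)
        size += n
    if current:
        pieces.append("".join(current))
    return pieces


def chunk_for_translation(text: str, max_bytes: int) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_bytes UTF-8 bytes."""
    if max_bytes < 4:
        raise ValueError("max_bytes must be at least 4 to hold any single UTF-8 character")
    sentences = [s for s in _SENTENCE_END.split(text) if s]
    chunks, current, size = [], [], 0
    for sentence in sentences:
        n = _utf8_len(sentence)
        if n > max_bytes:
            # Flush what we have, then split the oversized sentence by characters.
            if current:
                chunks.append("".join(current))
                current, size = [], 0
            chunks.extend(_hard_split(sentence, max_bytes))
            continue
        if current and size + n > max_bytes:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(sentence)
        size += n
    if current:
        chunks.append("".join(current))
    return chunks


print(chunk_for_translation("甲乙。丙丁。", 9))
```

Each chunk could then be sent as its own row to the translation node and the translated pieces concatenated again in their original order. Breaking at sentence ends, rather than at a fixed character count, keeps each request grammatically complete, which generally helps translation quality.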
Apart from some numbers, the patent PDF document consists entirely of Chinese characters. Are my settings (Stanford NLP ChineseTokenizer and charset = ISO-2022-CN) right?
Is there an example workflow that reads a Chinese PDF document and translates it using Amazon Translate?
Sorry for asking so many questions, but at the moment I'm a little bit helpless…
Thanks for your reply and the examples. Since I'm not yet very familiar with flow variables, I think it would be a good idea to go through the book “Advanced Luck” first.