Add column with filename to final table

Hi,

I have the following workflow:

"PDF Parser" --> "Document Data Extractor" --> "Java Snippet (simple)" (here I extract the file name of the PDF document) --> "Column Filter" --> some preprocessing nodes --> "Dict Replacer" --> "Keygraph keyword extractor" --> "CSV Writer"
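The post does not show the Java Snippet code itself. As a minimal sketch in plain Java (not KNIME's snippet API), assuming the PDF's full file path is available as a string, the file-name extraction step could look like this:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class FileNameExtractor {

    // Returns the last path component without its extension,
    // e.g. "/data/pdfs/annual_report.pdf" -> "annual_report".
    static String fileNameWithoutExtension(String filePath) {
        Path fileName = Paths.get(filePath).getFileName();
        if (fileName == null) {
            return ""; // a root path such as "/" has no file name
        }
        String name = fileName.toString();
        int dot = name.lastIndexOf('.');
        // Only strip when the dot is not the first character, so ".hidden" stays intact.
        return dot > 0 ? name.substring(0, dot) : name;
    }

    public static void main(String[] args) {
        // Hypothetical example path; in the workflow it would come from a column.
        System.out.println(fileNameWithoutExtension("/data/pdfs/annual_report.pdf"));
    }
}
```

Only the last extension is removed, so a name like `notes.v2.pdf` becomes `notes.v2`.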

My problem is that the preprocessing nodes "delete" the document name (and some of them accept only one column as input). The "Keygraph keyword extractor" node produces the following table:

RowID Keyword Score Document
0 Keyword1 100 ""
1 Keyword2 80 ""
2 Keyword3 60 ""

Is there any way to add a column to this table that contains the extracted file name? (I extracted the file name earlier in the Java Snippet node.) The number of keywords may vary, so a solution that adapts to that would be great.

Many thanks in advance!


Best

Simon

Hi Simon,

that is a known issue, and it will change with the new Text Processing preprocessing nodes that will be released with 3.1. The new nodes will not swallow additional columns.

For now you need a workaround: use the Joiner node to join the data back to the output table of the Keygraph node. You can try to join by document; if that does not work, extract the title of the document with the Document Data Extractor and join by that title column.
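Conceptually, the Joiner step is an inner join on the title column: each keyword row looks up the file name of its document, so it works for any number of keywords per document. A plain-Java sketch of that logic (Java 16+ for records), assuming hypothetically that the keyword table carries the document title and that an upstream table maps titles to file names. In the workflow itself the Joiner node does this without any code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class KeywordFileNameJoin {

    // Keygraph output row, assuming the document title was extracted into its own column.
    record KeywordRow(String keyword, int score, String title) {}

    // Upstream row: document title plus the file name extracted in the Java Snippet.
    record DocumentRow(String title, String fileName) {}

    // Keyword row with the file name appended as an extra column.
    record JoinedRow(String keyword, int score, String title, String fileName) {}

    // Inner join on the title column. Keyword rows without a matching title are dropped.
    // Assumes titles are unique: with duplicate titles the last file name wins,
    // whereas a real join would emit one row per match.
    static List<JoinedRow> joinByTitle(List<KeywordRow> keywords, List<DocumentRow> documents) {
        Map<String, String> fileNameByTitle = new HashMap<>();
        for (DocumentRow doc : documents) {
            fileNameByTitle.put(doc.title(), doc.fileName());
        }
        List<JoinedRow> result = new ArrayList<>();
        for (KeywordRow row : keywords) {
            String fileName = fileNameByTitle.get(row.title());
            if (fileName != null) {
                result.add(new JoinedRow(row.keyword(), row.score(), row.title(), fileName));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample data shaped like the Keygraph table in the question.
        List<KeywordRow> keywords = List.of(
                new KeywordRow("Keyword1", 100, "Doc A"),
                new KeywordRow("Keyword2", 80, "Doc A"),
                new KeywordRow("Keyword3", 60, "Doc A"));
        List<DocumentRow> documents = List.of(new DocumentRow("Doc A", "docA.pdf"));
        for (JoinedRow row : joinByTitle(keywords, documents)) {
            System.out.println(row);
        }
    }
}
```

Because the lookup is per row, three keywords for one document simply produce three rows that all carry the same file name.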

Cheers, Kilian

Hi Kilian,

it works with the "Document Data Extractor"!
Many thanks for your support!

Best
Simon