Hi,
I have the following workflow:
PDF Parser --> Preprocessing Nodes --> NGram Creator --> Document Data Extractor --> Java Snippet (simple) [I use this to extract the File_Name of the different documents] --> Column Filter
The result of the Column Filter node is:
Row ID | Ngram | Document frequency | File_Name |
0 | word1 word2 word3 | 1 | EP0000001A1 |
1 | word4 word5 word6 | 4 | EP0000002A1 |
2 | word4 word5 word6 | 2 | EP0000002A2 |
I want to filter the table above using another table (TABLE1) that contains a list of words. Every row whose Ngram contains one of these words should be removed.
TABLE1:
word1 |
word2 |
The result should be:
Row ID | Ngram | Document frequency | File_Name |
1 | word4 word5 word6 | 4 | EP0000002A1 |
2 | word4 word5 word6 | 2 | EP0000002A2 |
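To make the intended filtering explicit, here is a minimal sketch of the logic in plain Python. This is only an illustration and not a KNIME node; the names `rows`, `table1` and `filter_rows` are made up for this example, and I assume a row should be dropped as soon as any word of its Ngram appears in TABLE1.

```python
# Illustration only: plain Python, not KNIME. All names are hypothetical.
rows = [
    {"row_id": 0, "ngram": "word1 word2 word3", "doc_freq": 1, "file_name": "EP0000001A1"},
    {"row_id": 1, "ngram": "word4 word5 word6", "doc_freq": 4, "file_name": "EP0000002A1"},
    {"row_id": 2, "ngram": "word4 word5 word6", "doc_freq": 2, "file_name": "EP0000002A2"},
]

# TABLE1: the words used for filtering.
table1 = {"word1", "word2"}


def filter_rows(rows, words):
    """Keep only rows whose n-gram shares no word with the word list.

    Each row is passed through unchanged, so File_Name stays attached.
    """
    return [row for row in rows if words.isdisjoint(row["ngram"].split())]


filtered = filter_rows(rows, table1)
for row in filtered:
    print(row["row_id"], row["ngram"], row["doc_freq"], row["file_name"])
```

The important point is that the filter only decides which rows to keep and never rebuilds them, so all original columns survive.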
So I tried to add the following nodes to the workflow above:
[...] Column Filter --> Strings to Document --> Dictionary Tagger + TABLE1 (see above) --> General Tag Filter --> Reference Row Filter + Table (with space "")
The result of the Reference Row Filter is a filtered table in which the "File_Name" column is missing:
Row ID | Ngram | Document frequency | |
1 | word4 word5 word6 | 4 | |
2 | word4 word5 word6 | 2 | |
Is there any way to keep or re-add this column? Or is there another way to filter those rows? The Document Data Extractor doesn't work here; I guess that's because of the "Strings to Document" node.
Many thanks in advance!
Best
Simon