Filter the results of the NGram Creator node

am_dbs · December 1, 2015, 3:06pm

Hi,

I have the following workflow:

PDF Parser --> Preprocessing Nodes --> NGram Creator --> Document Data Extractor --> Java Snippet (simple) [I use this to extract the File_Name of the different documents] --> Column Filter

The result of the Column Filter node is:

Row ID	Ngram	Document frequency	File_Name
0	word1 word2 word3	1	EP0000001A1
1	word4 word5 word6	4	EP0000002A1
2	word4 word5 word6	2	EP0000002A2

I want to filter the table above using another table (TABLE1), which contains a list of words. Rows whose Ngram contains any of these words should be removed.

TABLE1:

word1

word2

The result should be:

Row ID	Ngram	Document frequency	File_Name
1	word4 word5 word6	4	EP0000002A1
2	word4 word5 word6	2	EP0000002A2

So I tried to add the following nodes to the workflow above:

[...] Column Filter --> Strings to Document --> Dictionary Tagger + TABLE1 (see above) --> General Tag Filter --> Reference Row Filter + Table (with space "")

The result of the Reference Row Filter is a correctly filtered table, but the "File_Name" column is missing:

Row ID	Ngram	Document frequency
1	word4 word5 word6	4
2	word4 word5 word6	2

Is there a way to add this column back? Or maybe another way to filter those rows? The Document Data Extractor doesn't work here; I guess that's because of the "Strings to Document" node.

Many thanks in advance!

Best
Simon

kilian.thiel · December 2, 2015, 10:30am

Hi Simon,

what about joining the file name from the first table to the last table by RowID?
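The logic of such a join can be sketched in plain Python. This is only an illustration of the idea, not KNIME code: the two dictionaries are hypothetical stand-ins for the Column Filter output and the Reference Row Filter output, and the sketch assumes the RowIDs survive unchanged through the nodes in between. In a workflow, the equivalent step would be a join on the RowID.

```python
# Hypothetical stand-in for the Column Filter output, keyed by RowID.
original = {
    "0": {"Ngram": "word1 word2 word3", "Document frequency": 1, "File_Name": "EP0000001A1"},
    "1": {"Ngram": "word4 word5 word6", "Document frequency": 4, "File_Name": "EP0000002A1"},
    "2": {"Ngram": "word4 word5 word6", "Document frequency": 2, "File_Name": "EP0000002A2"},
}

# Hypothetical stand-in for the Reference Row Filter output (File_Name lost).
filtered = {
    "1": {"Ngram": "word4 word5 word6", "Document frequency": 4},
    "2": {"Ngram": "word4 word5 word6", "Document frequency": 2},
}


def join_file_name(filtered_rows, original_rows):
    """Inner join on RowID: copy File_Name from the original table."""
    joined = {}
    for row_id, row in filtered_rows.items():
        if row_id in original_rows:
            joined[row_id] = {**row, "File_Name": original_rows[row_id]["File_Name"]}
    return joined


result = join_file_name(filtered, original)
for row_id, row in result.items():
    print(row_id, row["Ngram"], row["Document frequency"], row["File_Name"])
```

Because the join only uses the RowID as key, it does not matter that the filtered table went through a document conversion in between, as long as the RowIDs were not regenerated.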

I also attached an example workflow, showing two other filtering variations.

1) Create a set out of the ngrams, ungroup the set, apply a Reference Row Filter to the ungrouped table (filtering out rows that are contained in the dictionary), group again, and compare the set lengths (filtered and original). If the lengths are not equal, filter the row out.

2) Create a set out of the ngrams, create a set out of the dictionary, compare the sets with the Subset Matcher, then apply a Reference Row Filter.
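The set logic behind both variations can be sketched in plain Python. This is a rough illustration only and not the content of the attached workflow: the table data is taken from the example in the question, the function names are hypothetical, and the second function uses a simple set-overlap check as a stand-in for the comparison step of variation 2.

```python
# Example rows from the question; TABLE1 is the dictionary of words.
rows = [
    {"RowID": "0", "Ngram": "word1 word2 word3", "Document frequency": 1, "File_Name": "EP0000001A1"},
    {"RowID": "1", "Ngram": "word4 word5 word6", "Document frequency": 4, "File_Name": "EP0000002A1"},
    {"RowID": "2", "Ngram": "word4 word5 word6", "Document frequency": 2, "File_Name": "EP0000002A2"},
]
dictionary = {"word1", "word2"}


def keep_by_length(ngram, words):
    """Variation 1: drop dictionary terms, then compare set lengths."""
    terms = set(ngram.split())
    remaining = {t for t in terms if t not in words}
    # Equal lengths mean no term of the ngram was in the dictionary.
    return len(remaining) == len(terms)


def keep_by_overlap(ngram, words):
    """Variation 2 (stand-in): keep the row if the two sets share no term."""
    return set(ngram.split()).isdisjoint(words)


filtered_1 = [r for r in rows if keep_by_length(r["Ngram"], dictionary)]
filtered_2 = [r for r in rows if keep_by_overlap(r["Ngram"], dictionary)]

for r in filtered_1:
    print(r["RowID"], r["Ngram"], r["Document frequency"], r["File_Name"])
```

Both checks work on whole rows of the original table, so the File_Name column is carried along and no later join is needed.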

Cheers, Kilian

ngram_filtering.zip

am_dbs · December 2, 2015, 5:35pm

Hi Kilian,

many thanks for your answer!

what about joining the file name from the first table to the last table by RowID?

It works! It's that simple! ;-)

Many thanks!

Best
Simon
