Solutions to "Just KNIME It!" Challenge 37

alinebessa · October 5, 2022, 12:41pm

This thread is for posting solutions to “Just KNIME It!” Challenge 37. This week we’ll be working with text deduplication using PDF parsing techniques!

Here is the challenge: Just KNIME It!

Feel free to link your solution from KNIME Hub as well!

And as always, if you have an idea for a challenge we’d love to hear it! Tell us all about it here.

Season 1 of “Just KNIME It!” is slowly coming to an end: we’ll wrap up on October 26!

lelloba · October 5, 2022, 1:47pm

Hello,

here is my solution:

Two alternatives:

In the first one, I detect the rows containing the incipit of the document and their positions. Everything from the position of the second incipit onward is discarded.
In the second one, I simply use a Duplicate Row Filter to get rid of all duplicate rows.
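
For readers who think better in code, the two alternatives can be sketched roughly in Python. This is only an illustration of the logic, not the KNIME workflow itself: the function names and the sample lines are made up, and it assumes the document has already been split into one string per row and that the incipit is simply the first row.

```python
def truncate_at_second_incipit(lines):
    """Alternative 1: keep only the rows before the second occurrence
    of the incipit (assumed here to be the first row)."""
    if not lines:
        return []
    incipit = lines[0]
    for i in range(1, len(lines)):
        if lines[i] == incipit:
            # Everything from the second incipit onward is discarded.
            return lines[:i]
    # The incipit never repeats, so there is nothing to discard.
    return list(lines)


def drop_duplicate_rows(lines):
    """Alternative 2: keep the first occurrence of every row,
    preserving the original order."""
    seen = set()
    result = []
    for line in lines:
        if line not in seen:
            seen.add(line)
            result.append(line)
    return result


# Hypothetical sample: a three-row document whose text appears twice.
sample = [
    "Title",
    "First paragraph.",
    "Second paragraph.",
    "Title",
    "First paragraph.",
    "Second paragraph.",
]

by_incipit = truncate_at_second_incipit(sample)
by_dedup = drop_duplicate_rows(sample)
```

On this sample both functions return the same three rows. They can differ on real text: the second approach also removes rows that legitimately repeat inside a single copy of the document, while the first only cuts at the repeated opening row.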

Have a nice day,
Raffaello

gonhaddock · October 5, 2022, 3:18pm

Hello KNIMErs,

Here is my solution to #justknimeit-37 :

KNIME Hub > gonhaddock > Spaces > Just_KNIME_It > Just KNIME It _ Challenge 037

BR

erik_pinter · October 5, 2022, 9:44pm

Here’s my solution using only 1 node (in addition to the Tika-Reader)

Cheers,
Erik

Adrix · October 5, 2022, 10:46pm

Hi all,

I was clueless, so I based my answer on a mix of the first two I saw (from @lelloba and @gonhaddock).

I also came up with a question that I would appreciate some help with:

victor_palacios · October 6, 2022, 2:17pm

Hi @Adrix, Regex Split requires capture groups, written with parentheses (). You can see my examples of this here:
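
Separately from the linked examples, here is a rough Python illustration of what capture groups do. It is not the Regex Split node itself, and the pattern and input string are made up for the example: only the parts of the pattern wrapped in parentheses come back as separate pieces, so a pattern without any parentheses matches but yields nothing to split into.

```python
import re

text = "challenge-37"  # hypothetical input string

# Two capture groups: one for the word, one for the number.
with_groups = re.match(r"(\w+)-(\d+)", text)
parts = with_groups.groups() if with_groups else ()

# Same pattern without parentheses: it still matches,
# but there are no groups to extract.
without_groups = re.match(r"\w+-\d+", text)
no_parts = without_groups.groups() if without_groups else ()
```

Here `parts` holds the two captured pieces and `no_parts` is empty, which is the situation to look for when a split produces no output columns.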

rfeigel · October 6, 2022, 3:33pm

Here’s my solution

ndwulst · October 6, 2022, 4:26pm

Dear all,

Here’s my solution:

Regards,

AnilKS · October 8, 2022, 12:06pm

My take on Challenge 37: text deduplication.

kwatari · October 8, 2022, 12:47pm

Hi all, here is my solution.

cf_123 · October 8, 2022, 4:31pm

Hi,
here is my solution: jKi-37 – KNIME Hub

eamendola · October 8, 2022, 8:17pm

Afternoon everyone!
Here is my submission for this one, rather similar to the other solutions!

ersy · October 9, 2022, 1:06pm

Hi everyone,
Here is my solution.

alinebessa · October 11, 2022, 1:25pm

As always on Tuesdays, here’s our solution to last week’s #justknimeit challenge!

See you tomorrow for a new challenge!

system · January 9, 2023, 1:26pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.