Solutions to "Just KNIME It!" Challenge 37

This thread is for posting solutions to “Just KNIME It!” Challenge 37. This week we’ll be working with text deduplication using PDF parsing techniques! :open_book: :bookmark_tabs: :books:

Here is the challenge: Just KNIME It!

Feel free to link your solution from KNIME Hub as well!

And as always, if you have an idea for a challenge we’d love to hear it! :heart_eyes: Tell us all about it here.

Season 1 of “Just KNIME It!” is slowly coming to an end: we’ll wrap up on October 26!

Hello,

here is my solution:

Two alternatives:

  1. In the first one, I detect the rows containing the incipit of the document and their positions. Everything from the position of the second incipit onward is discarded.
  2. In the second one, I simply use a Duplicate Row Filter to get rid of all duplicate rows.
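
For readers who want to see the two ideas outside of KNIME, here is a minimal Python sketch. It is not the workflow itself: the function names, the `incipit` argument, and the sample rows are all hypothetical, and it assumes the parsed PDF has already been turned into a list of text rows.

```python
from typing import List


def cut_at_second_incipit(rows: List[str], incipit: str) -> List[str]:
    """Alternative 1 (sketch): keep everything before the second row containing the incipit."""
    # Positions of all rows that contain the opening words of the document.
    positions = [i for i, row in enumerate(rows) if incipit in row]
    if len(positions) < 2:
        # The incipit appears at most once, so there is nothing to discard.
        return list(rows)
    return rows[:positions[1]]


def drop_duplicate_rows(rows: List[str]) -> List[str]:
    """Alternative 2 (sketch): keep only the first occurrence of each row, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


# Hypothetical sample: the same short document extracted twice in a row.
rows = [
    "Title of the document",
    "First paragraph.",
    "Second paragraph.",
    "Title of the document",
    "First paragraph.",
    "Second paragraph.",
]

print(cut_at_second_incipit(rows, "Title of the document"))
print(drop_duplicate_rows(rows))
```

On this sample both approaches give the same three rows. They can differ on real text: the second approach also removes lines that legitimately repeat inside a single copy of the document, while the first only cuts at the repeated opening.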

Have a nice day,
Raffaello

Hello KNIMErs,

Here is my solution to #justknimeit-37 :

KNIME Hub > gonhaddock > Spaces > Just_KNIME_It > Just KNIME It _ Challenge 037

BR

Here’s my solution using only 1 node (in addition to the Tika-Reader) :wink:

Cheers,
Erik

Hi all,

I was clueless, so I based my answer on a mix of the first two I saw (from @lelloba and @gonhaddock).

I also came up with a question that I would appreciate some help with:

Hi @Adrix , Regex Split requires groups using (). You can see my examples of this here:
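
As a rough illustration of the point about groups (separate from the linked examples, and written in Python rather than KNIME), the sketch below shows how parentheses decide which parts of a match come back as separate pieces. It assumes capture groups in the Regex Split node behave in a comparable way; the sample line and pattern are made up.

```python
import re

# Hypothetical input line; not taken from the challenge data.
line = "Chapter 12: Deduplication"

# With capture groups: each pair of parentheses yields one output piece.
with_groups = re.compile(r"Chapter (\d+): (.+)")
match = with_groups.fullmatch(line)
parts = list(match.groups()) if match else []

# Without capture groups: the pattern still matches, but there are no pieces to return.
without_groups = re.compile(r"Chapter \d+: .+")
no_parts = without_groups.fullmatch(line).groups()

print(parts)
print(no_parts)
```

The second pattern matches the same text, yet yields an empty tuple because nothing is wrapped in `()`.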

Here’s my solution

Dear all,

Here’s my solution:

Regards,

My take on challenge 37: text deduplication.

Hi all, here is my solution.

Hi,
here is my solution: jKi-37 – KNIME Hub

Afternoon everyone!
Here is my submission, which is rather similar to the other solutions!

Hi everyone,
Here is my solution.

As always on Tuesdays, here’s our solution to last week’s #justknimeit challenge! :blush:

See you tomorrow for a new challenge!
