This thread is for posting solutions to “Just KNIME It!” Challenge 37. This week we’ll be working with text deduplication using PDF parsing techniques!
Here is the challenge: Just KNIME It!
Feel free to link your solution from KNIME Hub as well!
And as always, if you have an idea for a challenge we’d love to hear it! Tell us all about it here.
Season 1 of “Just KNIME It!” is slowly coming to an end: we’ll wrap up on October 26!
Here is my solution:
- In the first one, I detect the rows containing the incipit of the document and their position. Everything from the position of the second incipit onward has to be discarded.
- In the second one, I simply use a duplicate row filter to get rid of all duplicate rows.
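For readers outside KNIME, the two approaches above can be sketched in plain Python. This is only an illustration, not the posted workflow: the function names, the `incipit` argument, and the sample rows are hypothetical, and the rows are assumed to be the text lines extracted from the PDF.

```python
def truncate_at_second_incipit(rows, incipit):
    """Approach 1: keep only the rows before the second row containing the incipit."""
    # Positions of every row that contains the opening words of the document.
    positions = [i for i, row in enumerate(rows) if incipit in row]
    if len(positions) < 2:
        # The incipit appears at most once, so nothing is duplicated.
        return list(rows)
    return list(rows[:positions[1]])


def drop_duplicate_rows(rows):
    """Approach 2: keep the first occurrence of each row, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


# Hypothetical sample: a short document that was pasted twice.
sample = ["Title", "line a", "line b", "line a", "Title", "line a", "line b", "line a"]
first_copy = truncate_at_second_incipit(sample, "Title")
unique_rows = drop_duplicate_rows(sample)
```

In this sketch the two approaches differ on purpose: the first keeps the repeated "line a" inside the first copy, while the second also drops lines that legitimately repeat within a single copy of the document.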
Have a nice day!
Here is my solution to #justknimeit-37:
KNIME Hub > gonhaddock > Spaces > Just_KNIME_It > Just KNIME It _ Challenge 037
Here’s my solution using only 1 node (in addition to the Tika-Reader).
I was clueless, so I based my answer on a mix of the first two I saw (from @lelloba and @gonhaddock).
I also came up with a question that I would appreciate some help with:
Hi @Adrix, Regex Split requires capture groups, written with (). You can see my examples of this here:
My take on challenge 37: text deduplication.
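As a rough illustration of the point about groups, here is a sketch using Python's `re` module rather than the KNIME node itself (KNIME uses Java regular expressions). The helper name `regex_split` and the sample string are hypothetical, and the sketch assumes the pattern has to match the whole input.

```python
import re


def regex_split(text, pattern):
    """Return one value per capture group, or an empty list if nothing is captured."""
    # Assumption of this sketch: the pattern must match the entire input string.
    match = re.fullmatch(pattern, text)
    return list(match.groups()) if match else []


# The pattern matches, but has no (), so there is nothing to split into columns.
without_groups = regex_split("2022-10-26", r"\d{4}-\d{2}-\d{2}")

# Each pair of parentheses yields one output value.
with_groups = regex_split("2022-10-26", r"(\d{4})-(\d{2})-(\d{2})")
```

The same pattern produces no output values until the parts of interest are wrapped in parentheses.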
Hi all, here is my solution.
Here is my solution: jKi-37 – KNIME Hub
Afternoon everyone!
My submission for this challenge, a rather similar solution to the others!
Here is my solution.
As always on Tuesdays, here’s our solution to last week’s #justknimeit challenge!
See you tomorrow for a new challenge!