Solutions to "Just KNIME It!" Challenge 18

This thread is for posting solutions to “Just KNIME It!” Challenge 18. Feel free to link your solution from KNIME Hub as well!

Here is the challenge of the week: Just KNIME It! | KNIME

Have an idea for a challenge? We’d love to hear it! :heart_eyes: Please write it here.

And remember: the more you participate, the more participation badges you may end up getting. Fancy, huh? :wink: Just remember to correctly mark your solution in the Hub with tag justknimeit-18. :grin:

2 Likes

Hello Knimers,

here is my solution:

Have a nice day,
RB

3 Likes

Why do I have a feeling of deja vu for this particular challenge? :sweat_smile: @victor_palacios

4 Likes

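I found the reason: URLs of this kind are blocked in China, which is why the download fails. Please don't store the files in S3, or else add another download channel.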
https://knime-hubprod-catalog-service-eu-central-1.s3.eu-central-1.amazonaws.com/archives/00/00/00000003DC8A.knwf?response-content-disposition=attachment%3B%20filename*%3DUTF-8''justknimeit%20-%2018%20-%20Raffaello%20Barri.knwf&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20220526T022155Z&X-Amz-SignedHeaders=host&X-Amz-Expires=600&X-Amz-Credential=AKIAXLA4CVAR6UW4FREN%2F20220526%2Feu-central-1%2Fs3%2Faws4_request&X-Amz-Signature=13017f219faf554bfab56daa7d1e6050bee8f309f254356c1fa0e9fe1c7dbb16

1 Like

Hi Siry, try to write this suggestion in this section of the forum:

image

Someone from KNIME will certainly read it!

RB

2 Likes

Hi Knimers,

here is my solution

and some screenshots:
Bildschirmfoto von 2022-05-26 15-07-18

Happy Father's Day :wink:
Andrew

3 Likes

Hello KNIMErs, Here is my solution for Challenge 18

3 Likes

Hello Guys,

Here’s my solution
image

BR,
Ilkka

3 Likes

Hello KNIMErs,

Here is my solution to #justknimeit-18 :

KNIME Hub > gonhaddock > Spaces > Just_KNIME_It > Just KNIME It _ Challenge 018

Although I had a sense of deja vu as well, I tried my own approach based on the Regex Match function.

I’m not completely convinced by it, as typos can pass undetected through the workflow.

Would you have some advice on how to cover this issue?

BR

2 Likes

Hi @gonhaddock , for the challenge, it’s been mentioned: ‘Also don’t worry about getting 100% accuracy.’ I think the intention behind this particular challenge is to show different methods to achieve a simple task of categorization. Any method including yours will fit into this intention, as long as it uses the categories provided.

4 Likes

Here’s my solution. It uses three nodes by embedding the categories with appropriate wildcards in a Rule Engine node.

REF Challenge 18a
REF Challenge 18.knwf (42.9 KB)
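For readers who have not opened the workflow, the wildcard idea can be sketched outside KNIME. This is only a rough Python illustration, not the Rule Engine configuration from the attached workflow: the patterns and category names in `RULES` are hypothetical placeholders, and the matching uses the standard-library `fnmatch` module instead of KNIME's rule syntax.

```python
from fnmatch import fnmatchcase

# Hypothetical (pattern, category) pairs; NOT the challenge's actual category list.
# Rules are checked in order, and the first matching pattern wins.
RULES = [
    ("*refil*", "Refill"),    # also catches the misspelling "refil"
    ("*refund*", "Refund"),
    ("*refer*", "Referral"),
]


def categorize(text, rules=RULES, default="Other"):
    """Return the category of the first wildcard pattern that matches the text."""
    lowered = text.lower()  # patterns are lowercase, so lowercase the input too
    for pattern, category in rules:
        if fnmatchcase(lowered, pattern):
            return category
    return default


if __name__ == "__main__":
    for sample in ["Need a prescription refil", "REFUND request", "General question"]:
        print(sample, "->", categorize(sample))
```

As with the Rule Engine, rule order matters: a text matching several patterns gets the category of the first one listed.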

1 Like

Hi,
here are my 2 different approaches for this challenge:

2 Likes

Hello @badger101
Thanks for the clarification; I was already aware of that. In my case, however, I take the challenge as an opportunity to learn about subjects outside the scope of my daily work rather than as a goal in itself, and I learn from all your different approaches to the same task.

My question comes purely from curiosity about the subject: is there a preconfigured node or component (or, more abstractly, a method) that can return a kind of fuzzy score quantifying every possible matching combination, followed by a tolerance cut?
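To make the idea concrete, here is a minimal Python sketch of what "score every combination, then apply a tolerance cut" could look like. It is not a KNIME node: it uses `difflib.SequenceMatcher` from the standard library, and the category list and the `cutoff` value of 0.8 are assumptions picked only for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical category keywords; NOT the challenge's actual list.
CATEGORIES = ["refill", "refund", "referral"]


def score_all(term, categories=CATEGORIES):
    """Similarity ratio (0.0 to 1.0) between the term and every category."""
    term = term.lower()
    return {c: SequenceMatcher(None, term, c).ratio() for c in categories}


def best_match(term, categories=CATEGORIES, cutoff=0.8):
    """Best-scoring category, or None if nothing reaches the tolerance cut."""
    scores = score_all(term, categories)
    best = max(scores, key=scores.get)
    return best if scores[best] >= cutoff else None


if __name__ == "__main__":
    print(score_all("refil"))   # one score per category
    print(best_match("refil"))  # a typo that should still land on "refill"
    print(best_match("xyz"))    # nothing similar enough
```

The cutoff is the part that needs tuning: too low and unrelated terms get matched, too high and typos are rejected again.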

I don’t think a classical NLP method like Bag of Words fits this challenge, as that approach requires extensive training data…

BR

@gonhaddock I’ve never worked on a supervised ML project, so I can’t answer from experience; I have only worked with unlabeled corpora. If we want to obtain as many matches as possible without human labeling, the misspelled words should be addressed first. Our matching tool is only as good as the dataset. I saw that one of the solutions had already addressed this the simple way, by using wildcards. That’s one way to do it, but it still requires human intervention. It works like a charm for a small dataset like this, but it won’t for large datasets. (A stemming tool gives a similar effect without human intervention, but stemming won’t be accurate for large datasets either, since so many English words start with the same characters, e.g. referral, reference, refill, refund.)
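The shared-prefix problem can be shown with a deliberately crude Python sketch. Note the assumption: `crude_stem` is plain prefix truncation, used here as a stand-in for the effect described above; it is not a real stemming algorithm, and real stemmers apply suffix rules that behave differently.

```python
from collections import defaultdict


def crude_stem(word, length=3):
    """Stand-in for stemming: keep only the first `length` characters."""
    return word.lower()[:length]


def group_by_stem(words, length=3):
    """Group words that collapse to the same crude stem."""
    groups = defaultdict(list)
    for word in words:
        groups[crude_stem(word, length)].append(word)
    return dict(groups)


if __name__ == "__main__":
    words = ["referral", "reference", "refill", "refund"]
    print(group_by_stem(words, length=3))  # all four share the prefix "ref"
    print(group_by_stem(words, length=5))  # a longer prefix separates some of them
```

A short prefix merges unrelated words, while a longer one starts to miss misspellings again, which is the trade-off on larger vocabularies.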

3 Likes

@gonhaddock Also: Fun fact, there is a spellchecker node available exclusively via NodePit. Spell Checker (simple) — NodePit

but since it’s an ‘unsigned software’ (whatever that means), I’ll get this notification when trying to install the extension:

securitywarningnodepit

As of this date, if I go to the page and click on the Developer section to reveal the source code, it’s not available. As I sometimes can be a risk-avoiding individual, I’ve never proceeded. I wonder if there’s an active user of that node who could share their experience here.

3 Likes

hi @badger101
Sure, security first.
For the time being, the easiest option (though not an elegant one) is probably to append the detected misspellings, with their corresponding wildcards, in an additional Table Creator, as suggested. (I’m not saying I’ll do it, as it doesn’t add value to the approach.)

Testing the similarities could be another option :wink: but I’d need to fully understand it or fully develop a kind of fuzzy probability check…
Thanks for your time.

2 Likes

@gonhaddock Addressing the misspelled terms can also be done in various other ways. Advanced KNIME users like you might want to check out this thread and somehow find a way to integrate it into KNIME by creating a new component. It’s a possibility.

2 Likes

Hehehehe the differences can be subtle sometimes, but they’re still there! :grin: :grin:

2 Likes

Hi everyone,
here is my solution.
It will probably perform poorly on large datasets.
I did not use the Rule Engine node.

1 Like

Hi, here is my solution. I used some nodes that are new to me, and the result is a set of tags.

I would not be happy if I needed to use these tags as input for any downstream workflows, so I’m happy to see that all your solutions take a different approach.

1 Like