Can someone point me to a dataset structure that will be able to act as the input to the Association Rule Learner extension? The documentation says Transaction List but I can’t find a Node for this type.
Thanks in advance!
@Brock_Tibert the rule learner requires list-type or grouped variables. You can find some examples here. The use is not always straightforward.
Also from the description of the Association Rule Learner – KNIME Community Hub
“The underlying data structure used by the algorithm can be either an ARRAY or a TIDList.”
Thank you for the fast response! I am completely new to KNIME and trying to determine if it fits for an MBA class I run. I mention this only because I wasn’t aware I could search for the extensions via the link you sent, which also provided links to related workflows. I will poke around and ask if I have any other questions.
@Brock_Tibert great you are exploring knime.
You can search the knime hub for nodes as well as sample workflows and also use filters and tags. Workflows might have additional links and information under “external resources” that might be worth exploring.
And you can always ask questions here in the forum. I am sure the KNIME community is happy to help with specific questions about technical details, but also about the general experience of using KNIME.
About that - maybe the best quick introduction to what KNIME is can be found here.
And then if you want to explore more advanced stuff that also might involve some python use I might humbly point you to my own collection of articles:
Hello @Brock_Tibert ,
thank you for the feedback. We have heard similar feedback about the partially confusing usability of this node before, and we plan to improve it when migrating it to the new user interface. I would be happy to hear your observations and opinions once you have had a look at the example workflows.
Have a nice day,
@mlauber71 Just gave you a follow. I am super impressed after 24 hours of poking around. Very promising, and candidly my only concern is whether it’s too powerful/feature-rich. I have used Orange for years now because it’s easy to get started and the basics are easy to internalize. Orange has some hard edges, and I feel like this semester I heard about them a lot more, which is why I am exploring other “no code” options. It might mean I change how I teach the course, but I am mostly up and running 24 hours later.
@nan The example workflow here was most helpful. It shows a dataset of transaction line items that we have to manipulate before the Node can run over the rule set. It’s not the end of the world, but being new to KNIME I wasn’t even thinking about the power of GroupBy to make that happen. In short, there is a pathway to build the dataset, but the feedback I have is that the expected input structure isn’t super clear. Maybe it’s overkill, but perhaps a node could yield a dataset in the format(s) expected by the Node? Power users of the tool might know to use GroupBy, but it means a few extra steps for me to walk my MBA students through the reasoning.
Why not make getting from the raw data to the data structure required for association rule mining part of the learning process?
In the real world, data is hardly ever in a shape you can use right out of the box for data mining. Plus, many data processing paths lead to a given target data structure.
A dataset for association rule learning is organised as follows:
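(The original illustration of the structure is not reproduced here.) A minimal sketch of the typical shape, assuming one row per transaction and one boolean column per item — the item names are hypothetical:

```python
import pandas as pd

# One row per transaction; one True/False column per item
# indicating whether that item appears in the transaction.
transactions = pd.DataFrame(
    {
        "transaction_id": ["T1", "T2", "T3"],
        "bread":  [True,  True,  False],
        "butter": [True,  False, False],
        "milk":   [False, True,  True],
    }
).set_index("transaction_id")

print(transactions)
```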
So whether or not you need GroupBy depends on whether your transaction data is already in that shape. Sometimes the data has one row per transaction ID and product ID - in which case you will need the GroupBy or Pivot node. Plus, you might need the Missing Value node to recode missing values to false or 0.
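Outside KNIME, the GroupBy/Pivot step described above can be sketched in a few lines of pandas — a hedged equivalent, assuming long-format data with one row per (transaction ID, product ID) pair and hypothetical column names:

```python
import pandas as pd

# Long format: one row per (transaction, product) pair.
long = pd.DataFrame({
    "transaction_id": ["T1", "T1", "T2", "T2", "T3"],
    "product":        ["bread", "butter", "bread", "milk", "milk"],
})

# Pivot to one row per transaction with one boolean column per
# product; combinations absent from the data become False, which
# covers the "recode missing values" step in one go.
wide = pd.crosstab(long["transaction_id"], long["product"]).astype(bool)

print(wide)
```

In KNIME itself the same result comes from the Pivot (or GroupBy) node followed, if needed, by the Missing Value node.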
If the data are already in the expected shape, you can use the Create Bit Vector node to pack the item columns into a single bit vector column that’s going to be used for association rule mining.
The bit vector also has a visual advantage: once you see the bit vector column per transaction, it is quite easy to identify similar transactions at a glance, simply by looking at the dataset. You can even sort the transactions to make it more obvious.
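The bit-vector idea can be illustrated in plain Python — a rough approximation of what the Create Bit Vector node does, not its actual implementation, using the same hypothetical boolean table as before:

```python
import pandas as pd

# One row per transaction, one boolean column per item.
wide = pd.DataFrame(
    {"bread": [True, True, False],
     "butter": [True, False, False],
     "milk": [False, True, True]},
    index=["T1", "T2", "T3"],
)

# Pack the item columns into a single bit-string column per transaction.
wide["bit_vector"] = wide.apply(
    lambda row: "".join("1" if v else "0" for v in row), axis=1
)

# Sorting on the bit vector groups similar transactions together,
# which is what makes them easy to spot at a glance.
print(wide.sort_values("bit_vector"))
```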
It’s a fair question, but this is for an MBA class that, by design, is not a programming class, nor is it a course that aims to sharpen the skills of “data scientists.” The course targets students nearing the end of their degree who want to understand analytics through a managerial and strategic lens. Orange has made this approachable and very intuitive, but has recently started to show some hard edges as students aspire to tackle harder problems.
In short, I think KNIME is fantastic, but it would be one component of the class used at specific points in the semester. As such, spending considerable amounts of time to teach the software and specific data analysis patterns is outside the scope of this offering. I would likely have to give them a partially completed workflow, which is not the end of the world.
As I have noted in other posts, I am still trying to wrap my head around what is possible with KNIME. I will check out the Bit Vector node. Thanks for the suggestion.
I think that’s a reasonable way forward.
Executive profile or not, showing that there is a (hidden - see my next paragraph) data processing layer out there in the real world is a valuable lesson for any student who comes into contact with analytics or modeling.
Btw once you’ve created the clean student-ready workflow, you can select the data processing nodes altogether and wrap them into a Metanode or Component. This way the students won’t see the details in the data processing layer.
For further details on the possibilities, I suggest you check out, if you haven’t already, the two free books “Beginner’s Luck” and “Advanced Luck”. They are well written with screenshots and easy to follow instructions.
@Brock_Tibert you might want to think about using Components. They allow you to collect (and to a degree hide) more complex operations, but they also allow for user input. And more recently they can function as report generators and export content as PDF or HTML.
Most important thing to keep in mind: if you use flow variables, you have to explicitly allow them to be used inside a component. When you want to try a specific task, just browse the Hub or ask in the forum. More often than not someone will have an example or can easily build one.
Thank you for such a detailed response and follow-up! I have seen references to Metanode and Components, I just haven’t had time to wrap my head around everything just yet, including the materials and resources to help myself (and the students) learn.
Some students will gravitate to the visual programming experience, but the core goal of using a tool like KNIME is to demonstrate “analytics” in action above and beyond the cases and class discussions. The more I can simplify for them, the better.
Thanks again for the references, I will take a look!
So if you manage to design the data processing yourself with KNIME’s nodes and then wrap it into a component (which makes a group of nodes behave like a single composite node), you can distribute that component to your students for them to work with. You even get to name it.
I can work with that, for sure.