Matching Test and Control Stores

Hello,

I am unable to share my data, but I will do my best to give a detailed explanation of my situation.

I have two sets of data: test and control stores. Each dataset has the following columns:

  1. Store number (str)
  2. Total sales $ (dbl)
  3. Total units sold (dbl)
  4. Demographic code (int): this is a nominal (categorical) variable.

I need to match a control store to each test store, using columns 2-4 as the matching criteria.

In Alteryx, I used the AB Controls node to do this, but I’m not sure whether KNIME has an equivalent. The closest thing I could find was the K Nearest Neighbor node, but I couldn’t replicate my Alteryx results with it, and replicating them is my primary goal.

I suspect that the distance formula used by the K Nearest Neighbor node in KNIME differs from the one used by the AB Controls node in Alteryx, but I’m still relatively new to both programs, so I’m running into a brick wall at the moment.
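To illustrate why two tools could return different matches from the same inputs, here is a minimal sketch in Python. It uses made-up numbers rather than the real data, and it assumes plain Euclidean distance with optional z-score scaling. Whether the KNIME or Alteryx nodes scale columns this way is an assumption that has not been verified here.

```python
import math
from statistics import mean, pstdev

# Hypothetical stores: (total sales $, total units sold). All values are made up.
test_store = (100_000.0, 500.0)
controls = {
    "C1": (101_000.0, 900.0),  # close in sales, far in units
    "C2": (104_000.0, 520.0),  # farther in sales, close in units
    "C3": (90_000.0, 100.0),   # far in both
}

def euclidean(a, b):
    """Plain Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(target, candidates):
    """Return the name of the candidate closest to target."""
    return min(candidates, key=lambda name: euclidean(target, candidates[name]))

def zscore_all(rows):
    """Standardize each column with its mean and population standard deviation."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against a constant column
    return [tuple((v - m) / s for v, m, s in zip(r, mus, sds)) for r in rows]

# Match on raw values: the sales column dominates because its scale is larger.
raw_match = nearest(test_store, controls)

# Match on standardized values: both columns contribute comparably.
names = list(controls)
scaled = zscore_all([test_store] + [controls[n] for n in names])
scaled_match = nearest(scaled[0], dict(zip(names, scaled[1:])))

print(raw_match, scaled_match)  # C1 C2
```

With these numbers the unscaled match is C1 and the scaled match is C2, so a difference in normalization alone would be enough to change the results.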

Any help would be appreciated!


Hi @daniel_yi,

Have you tried using the Joiner node?

Hey @elsamuel,

Thanks for replying to my question.

My goal isn’t to join the two tables, but rather to match each store in one dataset to a store in the other based on their similarity in columns 2-4. The two tables do not share any store numbers.

The reason why I’m doing this is to compare the performance of test stores against the control store that they are directly matched to.

Hi @daniel_yi,
Maybe the Similarity Search node is the right one for you? The KNIME Distance Matrix extension provides a couple of helper nodes to create distance information from rows. However, you either have to devise your own distance measure using the Aggregated Distance node or turn your nominal column into a numeric one and use one of the options of the Numeric Distances node.
Kind regards
Alexander
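The idea of a custom aggregated distance can be sketched outside KNIME as follows. This is only an illustration in Python with made-up rows: it assumes a distance that adds a scaled Euclidean term for the two numeric columns to a fixed penalty when the demographic codes differ. The function names and the `nominal_weight` parameter are hypothetical, and this is not claimed to be the formula used by either the Aggregated Distance node or Alteryx's AB Controls.

```python
import math
from statistics import pstdev

# Hypothetical rows: store number -> (total sales $, total units sold, demographic code)
test_stores = {
    "T1": (100_000.0, 500.0, 3),
    "T2": (60_000.0, 300.0, 1),
}
control_stores = {
    "C1": (99_000.0, 505.0, 1),
    "C2": (102_000.0, 520.0, 3),
    "C3": (61_000.0, 310.0, 1),
}

def numeric_scales(rows):
    """Population standard deviation of each numeric column (sales, units)."""
    return [pstdev([r[i] for r in rows]) or 1.0 for i in range(2)]

def mixed_distance(a, b, scales, nominal_weight=1.0):
    """Scaled Euclidean distance on numeric columns plus a mismatch penalty."""
    numeric = math.sqrt(sum(((a[i] - b[i]) / scales[i]) ** 2 for i in range(2)))
    mismatch = 0.0 if a[2] == b[2] else 1.0  # nominal codes: equal or not
    return numeric + nominal_weight * mismatch

def match_controls(tests, controls, nominal_weight=1.0):
    """Map each test store to its nearest control store (controls may repeat)."""
    scales = numeric_scales(list(tests.values()) + list(controls.values()))
    matches = {}
    for t_name, t_row in tests.items():
        matches[t_name] = min(
            controls,
            key=lambda c: mixed_distance(t_row, controls[c], scales, nominal_weight),
        )
    return matches

matches = match_controls(test_stores, control_stores)
print(matches)  # {'T1': 'C2', 'T2': 'C3'}
```

Setting `nominal_weight=0.0` ignores the demographic code, and T1 then matches C1 instead of C2, which shows how much the treatment of the nominal column can change the result. Note that this sketch matches with replacement, so one control store can be assigned to several test stores.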


Hi @AlexanderFillbrunn,

I tried the Similarity Search node before, and it gave me exactly the same results as the K Nearest Neighbor node. After looking into it a bit more, it seems like the only option is to figure out how Alteryx’s AB Controls node calculates its distance measure and replicate it with the Aggregated Distance node (maybe?).

It seems like there isn’t an easier solution to this, so I guess I’ll close this topic.

Thank you!

