Finding patterns in CSV sheets with KNIME?

I have a CSV sheet with several columns and rows. One column (A) contains different numbers. These numbers depend on the items in the remaining columns. For each number in column (A), I want to know which items the other columns contain and which of them are decisive for that number being picked in column (A).

Hi @Jinni

Can you supplement your question with a sample in a workable format of your current input and desired output? It’s a bit difficult to understand what you’re trying to achieve.


I can't show you the actual data, but I can give you an example. I have a CSV, and the first column (column A) contains numbers like 1068937. The other columns (B-Z) contain other numbers or names. In total the CSV has 30,000 rows but only 2,000 different numbers in column A, so some numbers appear more than once. For rows that share the same number in column A, I want to find out what the patterns in the other columns (B-Z) are. For example, if the number in the first column of a row is 1068937, then in the same row column B must contain a number between 5 and 10, column C must contain a name like “headquarters”, column D must be empty, and so on. So I want to know which columns influence the number in column A, and how. Do you know what I mean?

Hi @Jinni

No, not really, unfortunately. This sounds like a case where you give us an incomplete scenario (one that does not cover all the cases), we spend time and effort solving it, and you then come back with another sample that the solution does not handle, so we might have to start all over again. I’m afraid this will turn into a lot of back-and-forth for clarification.

If you cannot provide the real data, which is understandable because of privacy issues, can you build some fake data to support what you are trying to explain? I know this requires additional work for you, but in the end, you need to help us help you 🙂

We also need to know what the results should look like. For example, what do “I want to find out what the patterns of the other columns (B-Z) are” and “I want to know which column influences the number in column A and how” translate to? How should this be represented in the result?


At first glance, this sounds like a scenario for one or more duplicate-row checks, where you keep the duplicate rows and then analyse the results. That said, I can only echo Bruno’s message.
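
To make the "keep the duplicates, then analyse them" idea a bit more concrete, here is a minimal sketch in plain Python rather than KNIME. It groups rows by column A and describes what each other column looks like within a group (a numeric range, a set of names, or always empty). Everything in it is an assumption for illustration: the column names A-D, the fake rows, and the helper names `summarise_column` and `patterns_by_key` are made up, and it only describes the groups. It does not tell you which columns are actually decisive for the value in column A.

```python
import csv
import io
from collections import defaultdict

# Hypothetical fake data: the column names and all values are invented.
FAKE_CSV = """A,B,C,D
1068937,5,headquarters,
1068937,10,headquarters,
1068937,7,headquarters,
2000001,3,branch,x
2000001,3,depot,y
"""


def summarise_column(values):
    """Describe the values one column takes within a single group."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "always empty"
    try:
        numbers = [float(v) for v in non_empty]
    except ValueError:
        # Not purely numeric: report the distinct values seen.
        summary = "one of " + str(sorted(set(non_empty)))
    else:
        low, high = min(numbers), max(numbers)
        if low == high:
            summary = f"constant {low:g}"
        else:
            summary = f"between {low:g} and {high:g}"
    if len(non_empty) < len(values):
        summary += " (sometimes empty)"
    return summary


def patterns_by_key(rows, key="A"):
    """Group rows by the key column and summarise every other column per group."""
    groups = defaultdict(lambda: defaultdict(list))
    for row in rows:
        for column, value in row.items():
            if column != key:
                # Treat missing cells as empty strings.
                groups[row[key]][column].append((value or "").strip())
    return {
        k: {col: summarise_column(vals) for col, vals in cols.items()}
        for k, cols in groups.items()
    }


rows = list(csv.DictReader(io.StringIO(FAKE_CSV)))
patterns = patterns_by_key(rows)
for key_value, columns in patterns.items():
    print(key_value, columns)
```

With the fake rows above, the group for 1068937 comes out as B "between 5 and 10", C "one of ['headquarters']" and D "always empty", which is roughly the kind of result the question describes. In KNIME, a GroupBy node on column A with suitable aggregations would be the natural place to start for something similar.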
