Could you help me make a list of the most frequent values in a dataset?

Can you tell me a bit more about the data, or maybe upload a sample? Depending on its type, we can decide how to proceed.

So I have an example dataset about airline delays. I would like to filter it, or build a list from it, to identify which airline or airport has the most delayed flights.

I assume each row is one flight. Is that correct?

I’d proceed in this way:

  1. Identify which flights are late, e.g. by adding a “Late? Y/N” column.
  2. Use a Pivoting node: in “Groups” put the airline; in “Pivots” put the attribute computed in step 1 (Late? Y/N); in “Manual Aggregation” put a count of the flight IDs.
  3. Repeat step 2 with the departure airport, and then the arrival airport, as the group.
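The steps above describe a KNIME workflow. As a rough illustration of what they compute, here is a plain-Python sketch outside KNIME. The field names, the sample rows, and the 15-minute lateness threshold are all hypothetical; the thread does not define what counts as “late”.

```python
# Hypothetical sample data: one dict per flight (one row = one flight).
# Field names and values are invented for illustration.
flights = [
    {"flight_id": 1, "airline": "Airline A", "origin": "AMS", "dest": "BER", "delay_min": 25},
    {"flight_id": 2, "airline": "Airline A", "origin": "BER", "dest": "AMS", "delay_min": 0},
    {"flight_id": 3, "airline": "Airline A", "origin": "AMS", "dest": "MAD", "delay_min": 40},
    {"flight_id": 4, "airline": "Airline B", "origin": "MAD", "dest": "AMS", "delay_min": 5},
    {"flight_id": 5, "airline": "Airline B", "origin": "AMS", "dest": "BER", "delay_min": 16},
    {"flight_id": 6, "airline": "Airline B", "origin": "BER", "dest": "MAD", "delay_min": 0},
    {"flight_id": 7, "airline": "Airline B", "origin": "MAD", "dest": "BER", "delay_min": 0},
]

# Assumption: a flight is "late" if it was delayed by more than 15 minutes.
LATE_THRESHOLD_MIN = 15


def late_flag(flight):
    """Step 1: derive the Late? Y/N attribute for one flight."""
    return "Y" if flight["delay_min"] > LATE_THRESHOLD_MIN else "N"


def pivot_counts(rows, group_col):
    """Step 2: count flights per group (rows) and per Late? Y/N value (columns)."""
    table = {}
    for row in rows:
        cell = table.setdefault(row[group_col], {"Y": 0, "N": 0})
        cell[late_flag(row)] += 1
    return table


# Step 3: the same pivot, grouped by airline, departure airport and arrival airport.
for column in ("airline", "origin", "dest"):
    print(column, pivot_counts(flights, column))
```

To get the ranking the question asks for, each table can then be sorted by its "Y" count, e.g. `sorted(table.items(), key=lambda kv: kv[1]["Y"], reverse=True)`.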

This only gives you absolute counts. Do you need a relative view too (e.g. Easyjet: 70% on time, 30% delayed)?
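A relative view simply divides each group's counts by that group's total number of flights. A minimal plain-Python sketch, assuming a Y/N count table like the one the pivot produces; the airline names and numbers below are invented for illustration:

```python
# Hypothetical Late? Y/N counts per airline, i.e. the kind of table the
# pivot produces. The numbers are invented for illustration.
counts = {
    "Airline A": {"Y": 3, "N": 7},
    "Airline B": {"Y": 8, "N": 72},
}


def relative_view(table):
    """Convert absolute Y/N counts into percentages of each group's flights."""
    result = {}
    for group, c in table.items():
        total = c["Y"] + c["N"]  # assumes every group has at least one flight
        result[group] = {
            "on_time_pct": 100 * c["N"] / total,
            "delayed_pct": 100 * c["Y"] / total,
        }
    return result


print(relative_view(counts))
```

With these made-up numbers, Airline A comes out at 70% on time and 30% delayed, and Airline B at 90% on time and 10% delayed.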


What is the relative view, if I may ask?

Take this example:


If you look at absolute values, Knime Air is worse: it has been late on 6 flights.
With relative numbers, Air Alteryx is worse, since 20% of its flights were late.
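To make the point concrete, the sketch below ranks airlines both ways and shows the two rankings can disagree. The airline names and counts are hypothetical (they are not the figures from the example above), and it assumes every airline has at least one flight.

```python
# Hypothetical per-airline totals; the numbers are invented for illustration.
stats = {
    "Airline A": {"late": 9, "total": 90},
    "Airline B": {"late": 5, "total": 20},
}


def worst_by_absolute(s):
    """Airline with the highest number of late flights."""
    return max(s, key=lambda airline: s[airline]["late"])


def worst_by_rate(s):
    """Airline with the highest share of late flights (assumes total > 0)."""
    return max(s, key=lambda airline: s[airline]["late"] / s[airline]["total"])


print(worst_by_absolute(stats))  # Airline A: 9 late flights vs 5
print(worst_by_rate(stats))      # Airline B: 25% late vs 10%
```

Airline A has more late flights in total, but Airline B is late on a larger share of its (fewer) flights.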

What you consider “worse” depends on what you want to take into account.
So my question is: do you need only absolute values, or percentages as well?

