Hi there,
I need help determining the best solution for a problem I am running into.
I’m working with two tables - inputs and activities
activities contains summarized data by activity_id and is the most accurate of the two tables.
inputs contains detailed data by activity_id including individual inputs that make up a certain activity.
However, the inputs table has extraneous information that needs to be excluded.
In other words - the sum of the total_cost from inputs does not equal the sum of the total_cost from activities (group-by activity_id)
There is no easy way to determine whether or not an input should be removed so the best solution I can think of is to use the summarized data from the activities table to determine whether or not to exclude an input.
I’ve attached a workflow for reference (with already grouped and joined data from inputs and activities)
Activity-Inputs.knwf (745.6 KB)
I don’t think this is the best solution because it doesn’t provide the desired results.
Here are a few scenarios to consider…
Only one input from a group of activity_ids needs removal
Examples: 4146200
Successfully identified which input to remove based on the logic entered in the columns expressions node. However, this will require me to re-run the same nodes and do a second review.
First review:
Second review:
More than 2 inputs wrongly identified for removal
Examples: 4481917, 4456865
The current logic in the column expressions node marked 2 inputs to remove instead of 1. I’m aware that it will be near impossible to determine exactly which of the two inputs needs to be removed, but I will be satisfied if the workflow removes at least one to match up to the summarized data.
More than 2 inputs are needed to be removed but current logic can’t figure it out
Examples: 4432176, 4432177
Lines highlighted red need to be removed but I don’t know how to do this.
Any insight and advice is greatly appreciated. Thank you.