Reducing Data

I have data where I have groups of reported salaries. In this example, different companies report the individual salary rates of their staff. Is there a way to control for situations where one company makes up 25% or more of the reported salaries for all groups? I would like to reduce that company's influence on the data while still maintaining its mean salary. I was thinking of a Row Filter node, but wondered whether a different approach is possible.
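One alternative to filtering rows out is to keep every row and give each one a weight. A dominant company's total weight is capped at 25% of the data set, and every row within a company gets the same weight, so that company's own mean salary is unchanged. This is a minimal plain-Python sketch, not a KNIME node. The function name `capped_weights`, the 25% default, and the input format (one company label per salary row) are assumptions for illustration.

```python
from collections import Counter

def capped_weights(companies, cap=0.25):
    """Return one weight per row so that no company's share of the total
    weight exceeds `cap`. All rows of a company share one weight, so the
    company's own (weighted) mean salary is unchanged."""
    counts = Counter(companies)
    if len(counts) * cap < 1:
        # e.g. three companies cannot each hold at most 25% of the total
        raise ValueError("cap is too small for this many companies")
    capped = set()
    while True:
        # Capped companies each contribute cap * total; uncapped companies
        # contribute one unit of weight per row. Solve for the total.
        uncapped_rows = sum(n for c, n in counts.items() if c not in capped)
        total = uncapped_rows / (1 - cap * len(capped))
        over = [c for c, n in counts.items()
                if c not in capped and n > cap * total]
        if not over:
            break
        # Cap the largest offender, then recompute the total and re-check.
        capped.add(max(over, key=counts.get))
    per_row = {c: (cap * total / n if c in capped else 1.0)
               for c, n in counts.items()}
    return [per_row[c] for c in companies]

# Hypothetical example: A reported six rows, B two, C and D one each.
rows = ["A"] * 6 + ["B"] * 2 + ["C", "D"]
weights = capped_weights(rows)
```

In this example, A's rows are scaled to 1/6 each and B's to 0.5 each, so each of the four companies ends up with 25% of the total weight. Any weighted statistic computed with these weights then limits each company's influence without discarding rows.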

I’m not sure what you want to do downstream, but if all you want is an unweighted mean salary by company and job description, take a look at this simple example using the GroupBy node.
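The GroupBy idea can be sketched in plain Python (not KNIME itself) to show how averaging per-company means differs from pooling every salary. The `(company, job, salary)` tuple layout and the function names are assumptions, since the actual data format isn't known.

```python
from collections import defaultdict
from statistics import mean

def company_job_means(rows):
    """Mean salary per (company, job) pair, like grouping on both columns."""
    groups = defaultdict(list)
    for company, job, salary in rows:
        groups[(company, job)].append(salary)
    return {key: mean(salaries) for key, salaries in groups.items()}

def job_mean_of_company_means(rows):
    """Per job, average the company means so each company counts once,
    however many salaries it reported."""
    per_job = defaultdict(list)
    for (company, job), company_mean in company_job_means(rows).items():
        per_job[job].append(company_mean)
    return {job: mean(means) for job, means in per_job.items()}

# Hypothetical data: company A reported three analysts, company B one.
rows = [("A", "Analyst", 50000), ("A", "Analyst", 52000),
        ("A", "Analyst", 54000), ("B", "Analyst", 60000)]
```

Here A's mean is 52000 and B's is 60000, so the mean of company means for Analyst is 56000. Pooling all four salaries would give 54000, because A's extra rows pull the figure toward A.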

I will be running weighted statistics: in this case, the incumbent (employee) average and other statistics (e.g., the 25th percentile).
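A minimal sketch of incumbent-weighted statistics, assuming each salary row carries a weight (for example, one that scales down a company that reported many incumbents). The weighted percentile uses one simple, non-interpolating convention: the smallest salary whose cumulative weight reaches the requested fraction of the total. Other definitions exist and can give slightly different answers. The function names and data are hypothetical.

```python
def weighted_mean(salaries, weights):
    """Sum of weight * salary divided by the total weight."""
    total = sum(weights)
    return sum(s * w for s, w in zip(salaries, weights)) / total

def weighted_percentile(salaries, weights, p):
    """Smallest salary whose cumulative weight reaches fraction p
    (0 < p <= 1) of the total weight. Non-interpolating."""
    pairs = sorted(zip(salaries, weights))
    target = p * sum(weights)
    cumulative = 0.0
    for salary, weight in pairs:
        cumulative += weight
        if cumulative >= target:
            return salary
    return pairs[-1][0]  # guards against floating-point shortfall near p = 1

# Hypothetical data: company A reported three analysts, company B one.
# A's rows get weight 1/3 each so A and B carry equal total weight.
salaries = [40000, 42000, 44000, 60000]
weights = [1 / 3, 1 / 3, 1 / 3, 1.0]
```

With these weights the weighted mean is 51000, the average of A's mean (42000) and B's (60000), and the weighted 25th percentile is 42000.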

I uploaded a new workflow under the same name; you can access it with the original link. It calculates statistics by company and title, and statistics for all companies by title. This seems like a straightforward solution. Trying to remove individual rows for companies with proportionally large amounts of data raises the problem of how to partition the data so that the remaining rows are representative of the full data set. There’s probably a way to do it, but it’s going to be complicated. I don’t know how your data is formatted, so I guessed at a reasonable configuration.
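The two levels of statistics described here can be sketched in plain Python rather than as a KNIME workflow. The `(company, title, salary)` layout, the choice of count, mean, and 25th percentile, and the function names are all assumptions. The percentile uses `statistics.quantiles` with `method="inclusive"`, which interpolates linearly between sorted values.

```python
from collections import defaultdict
from statistics import mean, quantiles

def summarize(salaries):
    """Count, mean and 25th percentile of one group of salaries."""
    if len(salaries) > 1:
        p25 = quantiles(salaries, n=4, method="inclusive")[0]
    else:
        p25 = salaries[0]  # a single salary is its own percentile
    return {"count": len(salaries), "mean": mean(salaries), "p25": p25}

def stats_tables(rows):
    """Return (stats by company and title, stats by title over all companies)."""
    by_company_title = defaultdict(list)
    by_title = defaultdict(list)
    for company, title, salary in rows:
        by_company_title[(company, title)].append(salary)
        by_title[title].append(salary)
    return ({key: summarize(v) for key, v in by_company_title.items()},
            {key: summarize(v) for key, v in by_title.items()})

# Hypothetical data in the assumed layout.
rows = [("A", "Analyst", 50000), ("A", "Analyst", 52000),
        ("A", "Analyst", 54000), ("B", "Analyst", 60000)]
by_company_title, by_title = stats_tables(rows)
```

The per-title table pools every incumbent, so a company that reported many rows still dominates it. The per-company table is where that company's contribution can be inspected separately.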


Thanks for the examples and insight. I appreciate it.