Reducing Data

kmarrs901 · October 10, 2022, 5:51pm

I have data made up of groups of reported salaries; in this example, different companies report the individual salary rates of their staff. Is there a way to control for situations where one company makes up 25% or more of the reported salaries across all groups? I would like to reduce the influence of that company on the data while still maintaining its mean salary. I was thinking of using a Row Filter node, but I wondered whether there is a different approach, if one is possible.
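A minimal Python sketch of the check described above. It assumes each reported salary is one row stored as a (company, title, salary) tuple; that layout, the function name dominant_companies, and the sample values are hypothetical. It counts rows per company and flags any company whose share of all rows is at least the 25% threshold.

```python
from collections import Counter


def dominant_companies(rows, threshold=0.25):
    """Return {company: share} for companies whose share of all rows >= threshold.

    `rows` is assumed (hypothetically) to be a list of (company, title, salary) tuples.
    """
    counts = Counter(company for company, _title, _salary in rows)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items() if n / total >= threshold}


# Hypothetical sample: company "A" reports 3 of the 6 salaries (50%).
rows = [
    ("A", "Analyst", 50000), ("A", "Analyst", 52000), ("A", "Analyst", 51000),
    ("B", "Analyst", 60000), ("C", "Analyst", 58000), ("D", "Analyst", 62000),
]
print(dominant_companies(rows))  # {'A': 0.5}
```

The flagged companies could then feed a filtering or weighting step instead of being dropped outright.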

rfeigel · October 10, 2022, 7:10pm

I’m not sure what you want to do downstream, but if all you want is an unweighted mean salary by company and job description, take a look at this simple example using the GroupBy node.
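The grouping suggested here can be approximated in Python as a rough illustration; it is not the linked example itself. The (company, title, salary) row layout, the function name, and the sample data are assumptions. Each company/title combination collapses to a single mean, so a company that reports many salaries still produces only one row per title.

```python
from collections import defaultdict
from statistics import mean


def mean_by_company_and_title(rows):
    """Unweighted mean salary for each (company, title) group.

    Rows are assumed (hypothetically) to be (company, title, salary) tuples.
    """
    groups = defaultdict(list)
    for company, title, salary in rows:
        groups[(company, title)].append(salary)
    return {key: mean(salaries) for key, salaries in groups.items()}


# Hypothetical sample data.
rows = [
    ("A", "Analyst", 50000), ("A", "Analyst", 52000), ("A", "Analyst", 51000),
    ("B", "Analyst", 60000), ("C", "Analyst", 58000), ("D", "Analyst", 62000),
]
means = mean_by_company_and_title(rows)
print(means[("A", "Analyst")])  # 51000
```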

kmarrs901 · October 10, 2022, 11:12pm

I will be running weighted statistics: in this case, the incumbent (employee) average and other statistics (e.g., the 25th percentile).
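The thread does not settle on a weighting scheme, so what follows is one possible approach, not the workflow's method: inverse company-size weighting. Each row gets weight 1 / (number of rows its company reported), so every company contributes a total weight of 1, and a dominant company's own mean is unchanged because all of its rows share the same weight. The sketch compares the plain incumbent average with the company-weighted mean and computes a weighted 25th percentile, defined here as the smallest salary whose cumulative weight reaches 25% of the total (one of several conventions). Function names and sample data are hypothetical, and the rows are assumed to be (company, title, salary) tuples for a single title.

```python
from collections import Counter
from fractions import Fraction


def company_weights(rows):
    """Weight each row by 1 / (number of rows its company reported)."""
    counts = Counter(company for company, _title, _salary in rows)
    return [Fraction(1, counts[company]) for company, _title, _salary in rows]


def weighted_mean(values, weights):
    """Sum of weight * value divided by the total weight."""
    return float(sum(w * v for v, w in zip(values, weights)) / sum(weights))


def weighted_percentile(values, weights, p):
    """Smallest value whose cumulative weight reaches p * total weight.

    Exact fractions keep the cumulative comparison free of floating-point
    rounding. No interpolation between neighbouring values is done.
    """
    pairs = sorted(zip(values, weights))
    target = Fraction(p) * sum(weights)
    cumulative = 0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= target:
            return value
    return pairs[-1][0]


# Hypothetical rows for one job title; company "A" reports half of them.
rows = [
    ("A", "Analyst", 50000), ("A", "Analyst", 52000), ("A", "Analyst", 51000),
    ("B", "Analyst", 60000), ("C", "Analyst", 58000), ("D", "Analyst", 62000),
]
salaries = [salary for _company, _title, salary in rows]
weights = company_weights(rows)

print(weighted_mean(salaries, [1] * len(salaries)))  # 55500.0 (incumbent average)
print(weighted_mean(salaries, weights))              # 57750.0 (each company counts once)
print(weighted_percentile(salaries, weights, 0.25))  # 52000
```

A milder variant would reduce the weight only for companies above the 25% threshold and leave the others at 1.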

rfeigel · October 11, 2022, 3:02am

I uploaded a new workflow under the same name; you can access it with the original link. It calculates statistics by company and title, and statistics for all companies by title. This seems like a straightforward solution. Trying to remove individual rows for companies with a disproportionately large share of the data raises the problem of how to partition the data so that the remaining rows are still representative of the whole data set. There’s probably a way to do it, but it’s going to be complicated. I don’t know how your data is formatted, so I guessed at a reasonable layout.
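The per-title summary across all companies can be sketched in Python as follows. This is not the uploaded workflow; the function name, the (company, title, salary) row layout, and the sample data are assumptions. The 25th percentile uses statistics.quantiles (Python 3.8+) with method="inclusive", which interpolates linearly between ranked values; other tools may follow a different percentile convention.

```python
from collections import defaultdict
from statistics import mean, quantiles


def stats_by_title(rows):
    """Count, mean and 25th percentile of salaries per title, pooling all companies."""
    by_title = defaultdict(list)
    for _company, title, salary in rows:
        by_title[title].append(salary)
    summary = {}
    for title, salaries in by_title.items():
        # quantiles() needs at least two values on older Python versions.
        if len(salaries) > 1:
            p25 = quantiles(salaries, n=4, method="inclusive")[0]
        else:
            p25 = salaries[0]
        summary[title] = {"n": len(salaries), "mean": mean(salaries), "p25": p25}
    return summary


# Hypothetical sample data with two titles.
rows = [
    ("A", "Analyst", 50000), ("A", "Analyst", 52000), ("A", "Analyst", 51000),
    ("B", "Analyst", 60000), ("C", "Analyst", 58000), ("D", "Analyst", 62000),
    ("B", "Manager", 90000), ("C", "Manager", 94000),
]
for title, stats in stats_by_title(rows).items():
    print(title, stats)
```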

kmarrs901 · October 12, 2022, 12:25pm

Thanks for the examples and insight. I appreciate it.
