I have a dataset that has 3,600 fields and is extremely sparse.
I want to replace 30% of the missing values with estimated values.
I also have a column that indicates which cluster each record belongs to.
For each record in each cluster, I want to do the following:
replace 30% of the missing values with, for example, the average values of the corresponding cluster.
So I think I need to implement a loop that filters the records for each cluster and then proceeds from there.
I do not know how to implement the loop.
Can anyone help, please?
Edited: I used the Group Loop Start node and it seems to be what I am looking for. But how can I calculate the average of all fields for one record? There are about 3,600 fields.
Another solution would be to use stratified sampling on the missing value column. It would be worth a try.
You can calculate the average using the GroupBy node.
Best regards, Iris
The advantage of Iris's alternative suggestion (stratified hot deck imputation) is that it does not impact the data distribution - it can even be used over groups of variables to maintain internal consistency and thus correlations.
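For anyone who wants to see the idea outside of a workflow, here is a minimal pandas sketch of hot deck imputation stratified by cluster. It is only an illustration under assumptions: the function name `hot_deck_by_cluster`, the column names, and the toy data are all made up, and each missing cell is simply filled with a randomly drawn observed value from the same column and the same cluster.

```python
import numpy as np
import pandas as pd


def hot_deck_by_cluster(df, cluster_col, value_cols, seed=0):
    """Fill missing cells with random donor values from the same cluster.

    Hypothetical helper: assumes a unique row index and that sampling
    donors with replacement is acceptable.
    """
    rng = np.random.default_rng(seed)
    out = df.copy()
    # groups maps each cluster label to the index labels of its rows
    for _, idx in out.groupby(cluster_col).groups.items():
        for col in value_cols:
            block = out.loc[idx, col]
            donors = block.dropna().to_numpy()
            missing = block.index[block.isna()]
            # Skip when there is nothing to fill or nothing to draw from
            if len(donors) == 0 or len(missing) == 0:
                continue
            out.loc[missing, col] = rng.choice(donors, size=len(missing))
    return out


# Toy data (made up for illustration)
demo = pd.DataFrame({
    "cluster": ["a", "a", "a", "b", "b", "b"],
    "x": [1.0, np.nan, 3.0, 10.0, np.nan, np.nan],
})
filled = hot_deck_by_cluster(demo, "cluster", ["x"])
```

Because the filled values are drawn from values that were actually observed in the cluster, the within-cluster distribution is roughly preserved, which is the advantage described above.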
Regarding the imputation by the mean, if all variables of interest are of the same data type, then you can use GroupBy's convenient type matching to select the variables automagically.
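As a rough sketch of the mean imputation outside of the workflow tool, the pandas code below fills a fraction of the missing cells with the cluster mean. It rests on assumptions that the original question leaves open: "30%" is read as 30% of the missing cells of each column within each cluster, the cells to fill are chosen at random, and the function name `impute_fraction_with_cluster_mean` and all column names are hypothetical.

```python
import numpy as np
import pandas as pd


def impute_fraction_with_cluster_mean(df, cluster_col, value_cols,
                                      fraction=0.3, seed=0):
    """Fill `fraction` of the missing cells per cluster and column
    with that cluster's column mean.

    Hypothetical helper: assumes a unique row index and numeric columns.
    """
    rng = np.random.default_rng(seed)
    out = df.copy()
    for _, idx in out.groupby(cluster_col).groups.items():
        for col in value_cols:
            block = out.loc[idx, col]
            missing = block.index[block.isna()].to_numpy()
            n_fill = int(round(fraction * len(missing)))
            # Nothing to fill, or no observed values to average
            if n_fill == 0 or block.notna().sum() == 0:
                continue
            chosen = rng.choice(missing, size=n_fill, replace=False)
            out.loc[chosen, col] = block.mean()  # mean ignores NaN
    return out


# Toy data (made up): cluster "a" has 2 observed and 10 missing values
demo = pd.DataFrame({
    "cluster": ["a"] * 12 + ["b"] * 5,
    "x": [2.0, 4.0] + [np.nan] * 10 + [1.0] * 5,
})
result = impute_fraction_with_cluster_mean(demo, "cluster", ["x"])
```

With 3,600 columns the double loop will be slow, so treat it as a way to check the logic on a sample rather than something tuned for the full table.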
What do you mean by calculating an average of all fields for a single record? OK, whatever your use case, an easy way would be to unpivot the data and apply a GroupBy on the record ID to calculate the average of the value column.
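To make the unpivot idea concrete, here is a small pandas sketch, assuming a hypothetical `record_id` column and made-up field names `f1` to `f3`. `melt` plays the role of the unpivot step, and the group-by on the record ID then gives one average (and sum) per record, with missing values ignored.

```python
import numpy as np
import pandas as pd

# Toy data (made up): one row per record, one column per field
wide = pd.DataFrame({
    "record_id": [1, 2, 3],
    "f1": [1.0, np.nan, 4.0],
    "f2": [3.0, 5.0, np.nan],
    "f3": [np.nan, np.nan, 6.0],
})

# Unpivot: one row per (record, field) pair
long = wide.melt(id_vars="record_id", var_name="field", value_name="value")

# Group by record ID and aggregate the value column; NaN is skipped
row_stats = long.groupby("record_id")["value"].agg(["mean", "sum", "count"])

# The same row average computed directly on the wide table
direct_mean = wide.set_index("record_id").mean(axis=1)
```

In pandas the direct row-wise mean is the shorter route; the unpivot version is shown because it mirrors the unpivot and group-by steps suggested above.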
Thanks for the reply, but honestly I did not get the idea.
With some help from Iris I was able to calculate the sum of the fields for a record.