Dealing with conditional missing values

Hello,

I have 3 count features, each with 2 associated time features:

GR1_COUNT Number of purchases per day merchandise group1
GR1_FIRST_TIME Time of the first purchase on the day
GR1_LAST_TIME Time of the last purchase of the day

GR2_COUNT Number of purchases per day merchandise group2
GR2_FIRST_TIME Time of the first purchase on the day
GR2_LAST_TIME Time of the last purchase of the day

GR3_COUNT Number of purchases per day merchandise group3
GR3_FIRST_TIME Time of the first purchase on the day
GR3_LAST_TIME Time of the last purchase of the day

The time data is only available if GRx_COUNT > 0.

I am running k-means clustering. Currently the missing time values are replaced by 0. This is bad for the analysis, because the 0 values distort the clustering as well as the min, mean … values.

Does anyone have an idea how to deal with such conditional data?

Thanks in advance.

Warm regards,

Michel

Hi,
What is the purpose of the clustering? You could just make a rule up front that says: everything with GRx_COUNT = 0 is in its own cluster. You filter those rows out and only run k-means on the rest. It’s still a clustering of the full dataset, just not pure k-means anymore.
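
A minimal sketch of that rule in Python, for one merchandise group only. Several things here are assumptions for illustration and not part of the original setup: the rows are plain dicts, times are stored as hours of the day, missing times are kept as None instead of 0, and the names NO_PURCHASE, kmeans and cluster_with_rule are hypothetical. The small kmeans helper is a plain Lloyd's algorithm that only keeps the example self-contained; a library implementation could take its place.

```python
import random

NO_PURCHASE = -1  # hypothetical label reserved for rows with GRx_COUNT == 0


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # needs at least k points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # empty cluster keeps its center
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def cluster_with_rule(rows, k, count_key="GR1_COUNT",
                      feature_keys=("GR1_COUNT", "GR1_FIRST_TIME", "GR1_LAST_TIME")):
    """Rows with count 0 get NO_PURCHASE; only the rest go through k-means."""
    active = [i for i, row in enumerate(rows) if row[count_key] > 0]
    labels = [NO_PURCHASE] * len(rows)
    points = [tuple(float(rows[i][f]) for f in feature_keys) for i in active]
    for i, label in zip(active, kmeans(points, k)):
        labels[i] = label
    return labels


# Made-up example data: times are hours of the day, None where no purchase.
rows = [
    {"GR1_COUNT": 0, "GR1_FIRST_TIME": None, "GR1_LAST_TIME": None},
    {"GR1_COUNT": 2, "GR1_FIRST_TIME": 8.0, "GR1_LAST_TIME": 9.5},
    {"GR1_COUNT": 3, "GR1_FIRST_TIME": 8.5, "GR1_LAST_TIME": 10.0},
    {"GR1_COUNT": 1, "GR1_FIRST_TIME": 19.0, "GR1_LAST_TIME": 19.0},
    {"GR1_COUNT": 2, "GR1_FIRST_TIME": 18.5, "GR1_LAST_TIME": 20.0},
    {"GR1_COUNT": 0, "GR1_FIRST_TIME": None, "GR1_LAST_TIME": None},
]
labels = cluster_with_rule(rows, k=2)
```

Because the GRx_COUNT = 0 rows never reach k-means, their missing times no longer pull the cluster centers or the min and mean values toward 0. The sketch does not scale the features; since counts and times are in different units, standardizing them before clustering is probably worth considering. With three groups you would also have to decide what to do with rows where only some of the counts are 0.
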
Kind regards,
Alexander
