Outlier Detection in Medical Claims

The goal of the workflow is to identify outliers in medical claim data, such as claims with an unusually high cost for a certain disease. To find these outliers, the input data is grouped by the target variable (disease), and the interquartile range (IQR), i.e. the difference between the 3rd and 1st quartile, is computed for the numerical variable in question (cost of stay). Outliers are all records that lie outside the permitted interval [1st quartile - x * IQR, 3rd quartile + x * IQR], where the factor x (e.g. 1.5) is specified by the analyst.

The upper branch of the workflow performs this analysis and lets the user change the group and aggregation columns via the metanode context menu. The lower branch refines this approach and identifies outliers across several variables, e.g. an unusually long or short length of stay for a given disease and payment amount. To achieve this, the user has to select two groups, such as disease and payment amount.

Data Description

The workflow analyses the Basic Stand Alone (BSA) Inpatient Public Use File (PUF) named “CMS 2008 BSA Inpatient Claims PUF”, which contains information from 2008 Medicare inpatient claims. This is a claim-level file in which each record is an inpatient claim incurred by a 5% sample of Medicare beneficiaries. The PUF provides some demographic and claim-related variables. However, as beneficiary identities are not provided, it is not possible to link claims that belong to the same beneficiary in the CMS 2008 BSA Inpatient Claims PUF.
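For readers who want to see the grouped IQR rule from the workflow description outside of KNIME, here is a minimal Python sketch. It is not the workflow's actual implementation: the function name `iqr_outliers`, the column names `disease`, `payment`, `cost` and `days`, and the sample records are all hypothetical, and the quartiles use the "inclusive" interpolation method from Python's `statistics` module, which may differ slightly from the method KNIME applies.

```python
from collections import defaultdict
from statistics import quantiles


def iqr_outliers(records, group_keys, value_key, factor=1.5):
    """Return records whose value lies outside
    [Q1 - factor * IQR, Q3 + factor * IQR] of their own group.

    records    -- list of dicts (hypothetical stand-in for the claims table)
    group_keys -- tuple of column names to group by, e.g. ("disease",)
    value_key  -- numerical column to check, e.g. "cost"
    factor     -- the analyst-chosen multiplier x (1.5 is a common choice)
    """
    # Collect the values of each group.
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[k] for k in group_keys)].append(rec[value_key])

    # Compute the permitted interval per group.
    bounds = {}
    for group, values in groups.items():
        if len(values) < 2:
            continue  # quartiles need at least two data points
        q1, _, q3 = quantiles(values, n=4, method="inclusive")
        iqr = q3 - q1
        bounds[group] = (q1 - factor * iqr, q3 + factor * iqr)

    # Keep only the records that fall outside their group's interval.
    outliers = []
    for rec in records:
        group = tuple(rec[k] for k in group_keys)
        if group in bounds:
            low, high = bounds[group]
            if not low <= rec[value_key] <= high:
                outliers.append(rec)
    return outliers


# Hypothetical sample data, not taken from the CMS file.
claims = [
    {"disease": "A", "cost": 100},
    {"disease": "A", "cost": 110},
    {"disease": "A", "cost": 105},
    {"disease": "A", "cost": 95},
    {"disease": "A", "cost": 102},
    {"disease": "A", "cost": 900},
    {"disease": "B", "cost": 5000},
    {"disease": "B", "cost": 5200},
    {"disease": "B", "cost": 4900},
    {"disease": "B", "cost": 5100},
]

outliers = iqr_outliers(claims, group_keys=("disease",), value_key="cost")
print(outliers)
```

Passing two column names, for example `group_keys=("disease", "payment")` with `value_key="days"`, would correspond roughly to the refinement in the lower branch, where outliers are determined within each combination of the two groups.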


This is a companion discussion topic for the original entry at https://kni.me/w/BZOZNILcihfmDsuE