I am having a lot of trouble adapting the 09_Wide_and_Deep_Learning_on_Census_Dataset example workflow to use other data sets. I like this workflow because it auto-configures the model shape and handles string and numerical input data transparently.
I want to adapt this example workflow for use with other tabular data sets, such as the Wisconsin Breast Cancer data set from the UCI Machine Learning Repository. I have also tried other data sets, such as the Boston Housing data set. All of my test models are perfect (100% accurate), which makes me think I have made a mistake somewhere in the configuration. I wonder whether the target variable somehow slipped into the set of predictor variables (the workflow is quite complex).
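One rough way to test the leakage suspicion outside the workflow is to export the training table and scan it. Below is a minimal sketch of a hypothetical helper, `find_leakage` (not part of the example workflow). It assumes each row is a dict keyed by column name. It flags two things: the target column appearing in the predictor list, and any predictor whose values each map to exactly one target value (for example, a coded copy of the diagnosis). It is only a heuristic for discrete columns. A numeric column that separates the classes by a threshold will not be caught.

```python
from collections import defaultdict


def find_leakage(rows, feature_cols, target_col):
    """Flag likely target leakage in a table given as a list of dicts.

    Returns a list of (column, reason) tuples.
    """
    findings = []
    # Check 1: the target itself has been listed among the predictors.
    if target_col in feature_cols:
        findings.append((target_col, "target column is in the feature list"))
    # Check 2: a predictor whose every distinct value maps to a single
    # target value, i.e. it behaves like a relabelled copy of the target.
    for col in feature_cols:
        if col == target_col:
            continue
        targets_per_value = defaultdict(set)
        for row in rows:
            targets_per_value[row[col]].add(row[target_col])
        n_distinct = len(targets_per_value)
        # Skip ID-like columns (unique per row); they "determine" the
        # target trivially but cannot generalise to unseen rows.
        if n_distinct < len(rows) and all(
            len(t) == 1 for t in targets_per_value.values()
        ):
            findings.append((col, "feature value determines the target"))
    return findings


# Hypothetical toy rows; the column names are illustrative only.
rows = [
    {"id": 1, "size": "large", "diagnosis_code": "M", "diagnosis": "malignant"},
    {"id": 2, "size": "small", "diagnosis_code": "B", "diagnosis": "benign"},
    {"id": 3, "size": "large", "diagnosis_code": "B", "diagnosis": "benign"},
    {"id": 4, "size": "small", "diagnosis_code": "M", "diagnosis": "malignant"},
]
print(find_leakage(rows, ["id", "size", "diagnosis_code"], "diagnosis"))
```

In this toy table only `diagnosis_code` is flagged. If a real export flags a column like that, it is a strong hint that the 100% accuracy comes from leakage and not from the model.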
I want to use this workflow on 10,000 records of clinical data to predict the incidence of precancerous colon polyps at the UCI Medical Center. My bagged boosted tree model is only about 70% accurate, and I am hoping that the deep learning model can beat that. Presumably some configuration changes beyond specifying the target variable are needed to use other data, but I can't figure out what I missed.
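Before comparing the tree model with the deep learning model, it may help to know the accuracy of the trivial "always predict the most common class" rule. That baseline depends on the class balance. The sketch below computes it. The 30% polyp prevalence in the usage line is a hypothetical figure chosen for illustration, not a property of the actual clinical data.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent class."""
    if not labels:
        raise ValueError("labels must not be empty")
    # most_common(1) returns [(label, count)] for the most frequent label.
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


# Hypothetical labels: 30% of patients with polyps, 70% without.
labels = ["no_polyp"] * 7 + ["polyp"] * 3
print(majority_baseline_accuracy(labels))
```

If the real prevalence were close to that hypothetical 30%, a 70% accuracy would be no better than the trivial rule, and a metric such as AUC or per-class recall would be a fairer basis for comparison.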
Any suggestions, anyone?