You work for a real estate company and want to evaluate if machine learning can help you determine median housing prices better. Which models would you select first to start studying and comparing techniques?
Here is the challenge. Let’s use this thread to post our solutions to it, which should be uploaded to your public KNIME Hub spaces with the tag JKISeason4-20.
Need help with tags? To add tag JKISeason4-20 to your workflow, go to the description panel in KNIME Analytics Platform, click the pencil to edit it, and you will see the option for adding tags right there. Let us know if you have any problems!
I also used the AutoML component (which I think is the GOAT for tasks like this, really love it). I trained every model it offers, and for me deep learning was the winner (with an R² of 0.7155).
I initially started with the Random Forest model as a baseline for testing and comparing models. I'm not sure my Random Forest results are correct, though: they seem to indicate strong correlations between the training and test datasets.
But from looking at the other solutions, it seems the AutoML component is the way to go: it does a lot of the tedious/hard work for us and provides a more comprehensive comparison across models.
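For anyone wanting to sanity-check a result like this outside of KNIME, here is a minimal sketch of comparing R² on the training data against R² on held-out test data. It is not the workflow from this thread: the data is synthetic, the `income` and `price` columns are made-up stand-ins, and a one-feature least-squares line is swapped in for the Random Forest just to keep the example dependency-free. The idea is only that a training score far above the test score hints at overfitting, while similar scores suggest the model generalizes.

```python
import random


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def fit_line(xs, ys):
    """Ordinary least squares for a single feature; returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


# Synthetic stand-in data (an assumption, not the challenge dataset):
# one feature, a linear signal, and Gaussian noise.
rng = random.Random(42)
income = [rng.uniform(1.0, 10.0) for _ in range(200)]
price = [40.0 * x + 50.0 + rng.gauss(0.0, 60.0) for x in income]

# Simple 80/20 split; the rows are already in random order.
split = int(0.8 * len(income))
train_x, test_x = income[:split], income[split:]
train_y, test_y = price[:split], price[split:]

# Fit on the training rows only, then score both partitions.
slope, intercept = fit_line(train_x, train_y)
train_r2 = r_squared(train_y, [slope * x + intercept for x in train_x])
test_r2 = r_squared(test_y, [slope * x + intercept for x in test_x])

print(f"train R2: {train_r2:.4f}")
print(f"test  R2: {test_r2:.4f}")
print(f"gap     : {train_r2 - test_r2:.4f}")
```

With a flexible model such as a Random Forest, the training score is often much higher than the test score, so the test R² is the number worth comparing across models.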
@jproudfoot111 had the good idea of including geospatial context in this challenge, and I thought I'd follow suit, but I only included the lat/long coordinates and ignored the county information (though in real life that is definitely information you need in the real estate business). I don’t work with geospatial data that often, so this was a good exercise to get some practice. The geospatial view definitely provides excellent context for understanding your real estate market, compared with just looking at statistical figures.
Stay tuned for tomorrow’s new challenge! We will explore our very own KNIME Forum. Let’s experiment with its content to hone our text processing skills, especially text summarization! You can even go one step further and visualize these summaries!