01_Semi_Automated_ML

This workflow performs semi-automated data blending of two different datasets. Each dataset contains columns unique to it as well as overlapping columns that appear in both datasets. The first part of the workflow is the machine learning algorithm, which matches corresponding rows: it uses numeric and string distance metrics to calculate the distance between rows from tables 1 and 2, based on the selected columns. Based on this distance, the domain expert can decide whether to trust the prediction or to inspect the results and correct them if needed; this is handled in the second part of the workflow. The workflow generalizes well: the algorithmic part serves simply as an example and can be replaced with any ML algorithm, and the interactive views can be easily adapted to different use cases.
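
The row-matching idea described above can be sketched outside of KNIME in a few lines of Python. This is only an illustration under stated assumptions, not the workflow's actual implementation: the function names, the sample tables, the choice of `difflib.SequenceMatcher` as the string metric, the relative difference as the numeric metric, the unweighted mean, and the `threshold` value of 0.2 are all hypothetical choices made for this example.

```python
from difflib import SequenceMatcher


def string_distance(a, b):
    # Assumed string metric: 1 - similarity ratio, case-insensitive.
    return 1.0 - SequenceMatcher(None, a.lower(), b.lower()).ratio()


def numeric_distance(x, y):
    # Assumed numeric metric: difference relative to the larger magnitude.
    scale = max(abs(x), abs(y))
    return 0.0 if scale == 0 else abs(x - y) / scale


def row_distance(row1, row2, string_cols, numeric_cols):
    # Unweighted mean of the per-column distances over the selected columns.
    parts = [string_distance(row1[c], row2[c]) for c in string_cols]
    parts += [numeric_distance(row1[c], row2[c]) for c in numeric_cols]
    return sum(parts) / len(parts)


def match_rows(table1, table2, string_cols, numeric_cols, threshold=0.2):
    # For each row of table 1, pick the closest row of table 2 and flag
    # matches whose distance exceeds the (hypothetical) review threshold.
    results = []
    for i, r1 in enumerate(table1):
        dists = [row_distance(r1, r2, string_cols, numeric_cols) for r2 in table2]
        j = min(range(len(dists)), key=dists.__getitem__)
        results.append({
            "row1": i,
            "row2": j,
            "distance": dists[j],
            "needs_review": dists[j] > threshold,
        })
    return results


# Made-up sample data: "name" and "revenue" are the overlapping columns.
table1 = [
    {"name": "Acme Corp", "revenue": 100.0},
    {"name": "Globex", "revenue": 50.0},
]
table2 = [
    {"name": "Globex", "revenue": 50.0},
    {"name": "ACME Corporation", "revenue": 101.0},
]

matches = match_rows(table1, table2, string_cols=["name"], numeric_cols=["revenue"])
for m in matches:
    print(m)
```

Rows flagged with `needs_review` correspond to the cases a domain expert would inspect and correct in the interactive part; the remaining matches could be accepted as predicted.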


This is a companion discussion topic for the original entry at https://kni.me/w/F-cTVfoLv1Of-MgW