Synapse PySpark Delta Lake merge SCD Type 2 without primary key

mlauber71 · December 9, 2023, 5:53am

@jdisunilkumar, welcome to the KNIME forum. You could try a similarity search over the entire row to match incoming records to existing ones.
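Here is a minimal sketch of such a row-level similarity match. It is plain Python rather than KNIME or PySpark: each row is flattened into one string and compared with `difflib.SequenceMatcher`. The sample rows, the helper names and the 0.8 threshold are hypothetical and would need tuning on real data.

```python
from difflib import SequenceMatcher


def row_to_text(row: dict) -> str:
    # Join all values in a fixed (sorted) column order so rows are comparable.
    return "|".join(str(row[col]) for col in sorted(row))


def best_match(incoming: dict, existing: list, threshold: float = 0.8):
    """Return (index, score) of the most similar existing row.

    The index is None when no row reaches the (hypothetical) threshold.
    """
    target = row_to_text(incoming)
    best_idx, best_score = None, 0.0
    for idx, row in enumerate(existing):
        score = SequenceMatcher(None, target, row_to_text(row)).ratio()
        if score > best_score:
            best_idx, best_score = idx, score
    if best_score < threshold:
        return None, best_score
    return best_idx, best_score


# Hypothetical dimension rows without a key column.
existing = [
    {"name": "Anna Meier", "city": "Berlin", "segment": "retail"},
    {"name": "Tom Becker", "city": "Hamburg", "segment": "wholesale"},
]
# Same customer, one attribute changed (the slowly changing part).
incoming = {"name": "Anna Meier", "city": "Berlin", "segment": "premium"}

idx, score = best_match(incoming, existing)
print(idx, round(score, 2))
```

Comparing every incoming row with every existing row grows quadratically, so on a large Delta table you would probably want to narrow the candidates first (for example by a column that rarely changes) before scoring.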

If the dimensions really are slowly changing, this might work. You could also use this to introduce an artificial ID, which would make the rows easier to work with (for example as a merge key).
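One way to build such an artificial ID is to hash the columns you treat as the record's identity. The sketch below uses plain Python and `hashlib`; `artificial_id`, the sample rows and the choice of identity columns are hypothetical. In PySpark the same idea is commonly written as `sha2(concat_ws(...), 256)` from `pyspark.sql.functions`, but `concat_ws` skips nulls, so you may want to replace them with a placeholder first.

```python
import hashlib


def artificial_id(row: dict, identity_columns: list) -> str:
    """Deterministic surrogate ID built from the chosen identity columns."""
    parts = []
    for col in identity_columns:
        value = row.get(col)
        # Normalise strings so whitespace or case differences do not change the ID.
        if isinstance(value, str):
            value = value.strip().lower()
        # Include the column name and use repr() so None and the string "None" stay distinct.
        parts.append(f"{col}={value!r}")
    # The ASCII unit separator is unlikely to appear in the data,
    # so neighbouring values cannot run into each other.
    payload = "\x1f".join(parts)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


row_a = {"first_name": "Anna ", "last_name": "Meier", "birth_year": 1985, "segment": "retail"}
row_b = {"first_name": "anna", "last_name": "MEIER", "birth_year": 1985, "segment": "premium"}

identity = ["first_name", "last_name", "birth_year"]  # hypothetical choice
print(artificial_id(row_a, identity) == artificial_id(row_b, identity))  # True
```

Because `segment` is left out of the identity, the changed attribute does not affect the ID, which is what an SCD Type 2 merge needs in order to close the old version and insert the new one under the same key.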

How well this works depends on how many columns there are, what kind of data they hold, and how that data typically changes. For example, you might use only the string values if you suspect they are enough to identify a record.

Where this would fail is with things like addresses: if a customer changes their address, the row looks completely different but still belongs to the same customer. In such a case you could try relying on the numeric columns only, since the customer's portfolio or purchases might not have changed.
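The address problem can be illustrated with a small sketch: a key hashed over the string columns changes when the address changes, while a key over the numeric columns stays stable. The record, its column names and the helper functions are hypothetical. Keep in mind that a numeric-only key can also collide when two different customers happen to share the same numbers.

```python
import hashlib


def key_from(row: dict, columns: list) -> str:
    # Hash "column=value" pairs for the selected columns in a fixed order.
    payload = "\x1f".join(f"{col}={row[col]!r}" for col in columns)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def columns_of_type(row: dict, kinds: tuple) -> list:
    # bool is a subclass of int in Python, so exclude it from numeric columns.
    return sorted(
        col for col, value in row.items()
        if isinstance(value, kinds) and not isinstance(value, bool)
    )


# Hypothetical customer before and after moving house.
before = {
    "name": "Anna Meier",
    "address": "Hauptstr. 1, Berlin",
    "customer_since": 2015,
    "credit_limit": 5000.0,
}
after = {**before, "address": "Ringweg 7, Hamburg"}

string_cols = columns_of_type(before, (str,))
numeric_cols = columns_of_type(before, (int, float))

# String-based identity breaks: the address is part of it.
print(key_from(before, string_cols) == key_from(after, string_cols))    # False
# Numeric-based identity survives the move.
print(key_from(before, numeric_cols) == key_from(after, numeric_cols))  # True
```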

And just as a remark: database systems without proper key values are … not a good idea. But I assume you already know that.