SCD.knwf (950.0 KB)
I thought to take a stab at a more complex, Type 7 SCD problem using @stelfrich’s great component (found here) as a base. I made some updates to more fully cover Type 7, but could use a hand from the community to make this a bit more robust.
In the attached SCD workflow, I have a list of Scenarios on the left side that I copy rows from into the “Incremental Reporting” Table Creator (this would mimic incremental CSVs or data pulls, that may or may not contain duplicate/redundant rows).
For my scenario tests, I copy one row at a time and evaluate the effect. When a scenario fails, I copy over the workflow and make changes until fixed, and copy a new workflow again below and proceed down the scenarios until the next failure.
I believe the new SCD component passes scenarios 1-7, but fails on scenario 8 (which is, what happens when we need to load an initial, older SCD dataset that needs to be updated to this new system).
Appreciate any support on evaluating this updated SCD component.
Scenario Test Table: (current component passes scenarios 1-7)
Scenario | Sold Date | List Price | Last Status | ListingID | Original Price | Sold PRice | Updated Time Stamp | Postal Code |
---|---|---|---|---|---|---|---|---|
1: First row | 2160000.0 | New | Z1033099 | 1699000.0 | 2023-03-12 | M6H1N7 | ||
2: Unsold (duplicate), repeat insert. | 2160000.0 | New | Z1033099 | 1699000.0 | 2023-03-12 | M6H1N7 | ||
3 Unsold, Updated Timestamp | 2160000.0 | New | Z1033099 | 1699000.0 | 2023-03-13 | M6H1N7 | ||
4 Unsold, Updated Timestamp, Price Changed | 2200000 | PriceChange | Z1033099 | 1699000.0 | 2023-03-14 | M6H1N7 | ||
5 Sold | 2023-03-15 | 2200000 | Sold | Z1033099 | 1699000.0 | 2200000.0 | 2023-03-15 | M6H1N7 |
6 Sold Duplicate | 2023-03-15 | 2200000 | Sold | Z1033099 | 1699000.0 | 2200000.0 | 2023-03-15 | M6H1N7 |
7 Sold Duplicate, Date Change (e.g., some irrelevant / nonvisible field changed) | 2023-03-15 | 2200000 | Sold | Z1033099 | 1699000.0 | 2200000.0 | 2023-03-16 | M6H1N7 |
8 All Scenarios Imported at Once | 2023-03-15 | 2200000 | Sold | Z1033099 | 1699000.0 | 2200000.0 | 2023-03-16 | M6H1N7 |
There is a potential scenario 9 where if columns XYZ did not change, but the updated time stamp (an external field) did change, then just update the “valid/current row” with the new updated time stamp. Updating the component to this scenario would allow for optimizing the overall SCD table size further when redundant updates are made.
Expected SCD Final State from Scenario 1-8:
Sold Date | List Price | Last Status | ML Number | Original Price | Sold Price | Updated Timestamp | Postal Code |
---|---|---|---|---|---|---|---|
2160000.0 | New | Z1033099 | 1699000.0 | 2023-03-12 | M6H1N7 | ||
2160000.0 | New | Z1033099 | 1699000.0 | 2023-03-13 | M6H1N7 | ||
2200000 | PriceChange | Z1033099 | 1699000.0 | 2023-03-14 | M6H1N7 | ||
2023-03-15 | 2200000 | Sold | Z1033099 | 1699000.0 | 2200000.0 | 2023-03-15 | M6H1N7 |
2023-03-15 | 2200000 | Sold | Z1033099 | 1699000.0 | 2200000.0 | 2023-03-16 | M6H1N7 |
Expected with 9th Bonus Scenario:
Sold Date | List Price | Last Status | ML Number | Original Price | Sold Price | Updated Timestamp | Postal Code |
---|---|---|---|---|---|---|---|
2160000.0 | New | Z1033099 | 1699000.0 | 2023-03-13 | M6H1N7 | ||
2200000 | PriceChange | Z1033099 | 1699000.0 | 2023-03-14 | M6H1N7 | ||
2023-03-15 | 2200000 | Sold | Z1033099 | 1699000.0 | 2200000.0 | 2023-03-16 | M6H1N7 |