Compound primary key
Here it is sufficient to insert all the key parts into the appropriate GROUP and PARTITION BY statements, as well as into the join condition. In the data vault, you can get around the problem right away by making use of surrogate keys.
Gaps in the history
The gaps in a history are either intentional or they are the result of errors either in the source system or the integration process. Irrespective of which it happens to be, these gaps are temporarily closed during the distillation process by "ghost records" which are removed at the end of the process.
Valid-from and valid-to dates are not identical
In some environments, the valid-to date of the previous version does not match the valid-from date of the successor, without a gap being stated, e.g.:
Here it usually helps if you increase the valid-to date by the appropriate unit (here: one second, in the code example: one day). Make sure that the end date does not overflow beyond the year 9999.
Depending on the content required after distillation, the increment at the end can be deducted from the valid-to date.
No reliable dummy value
In the first version, we added a dummy value to the attribute to tag the rows. However, there is always some likelihood of this value occurring in the data itself. We therefore added the column "flag" in the intermediate steps, which is set to "-" by default and may contain the values "PREV-DUMMY", "GAP_GHOST", "UNCHANGED" and "END-DUMMY".
Several relevant attributes
Normally, it is not just the changes to a single attribute that are relevant for versioning. In our example, versioning is carried out by category and status; the designation is of no interest. By introducing the hashdiff column as the hash value (MD5 in this instance) of all the relevant concatenated attributes, you do away with the need for the AND-linked query, as can be seen in the following pattern:
The necessary NULL substitution would, however, bring back the unreliable dummy value problem. Without COALESCE() the comparison becomes even more complex. The reader is welcome to try it.
On the other hand, you can simply compare the hashdiff with EQUAL TO. The attribute itself must be carried over.