After reading an XML file, I am faced with the problem of having to restructure data because the XML file standard has changed, but the database structure cannot change with it.
This is a real example from the music industry. Each song has several identification numbers, which have to be archived in a cleanly structured way together with the title, artist and producer.
After reading the XML file, the structure looks like this:
The ID_TYP_KNZ column contains an array with the names of the individual identifiers and the ID_WERT column contains a second array with the corresponding identification numbers.
Finally, the structure should look like this:
The values from the first array, the TYP_ARRAY, are used as column headings and the values from the second array, the VALUE_ARRAY, are assigned to the respective TYP columns.
But it gets a little more complicated. Unfortunately, the TYP_ARRAYs are not stringent. The order of the individual identifiers varies and almost all identifiers are never supplied. The values of the TYP_ARRAYs must therefore be carefully compared with the columns. If TYP_ARRAY[2] is matched to the second column, as can be seen in the fifth and sixth row with EAN_UPC, for example, VALUE_ARRAY[2] must be written to the cell in the second column.
What does this workflow look like in KNIME?