JSON dynamic array search

Good evening,

I have to pcroceed XML files. I found a topic in the forum where the XML file was converted to JSON with XML to JSON Node.with JSON Path Node I can extract my data of interest for one file. In this case for example the data ist in JSON array $[‘VehicleReport’][‘ComponentList’][‘Component’][43][’…’]. No I want to read in several files. My problem is, that the “[43]” index is dynamic so my data could be in index [39] or [67]…
I have a keyword that is on the top level of the nested JSON array. So my questions are

  1. How can I search for the keyword in a JSON table an extract the index of the array?
  2. How can I use this index my workflow found in JSON Path Node to extract my arrays I’m interested in?

Have anyone an idea?

Greetings,
Brotfahrer

1 Like

maybe it helps if you upload a sample file
bR

2 Likes

@Brotfahrer as @Daniel_Weikert has said an example might help.

The hub repository of @mw seems to contain various examples that deal with the unwrapping of XML and JSON files into tables. Maybe you try to further explore there also:

Then with an idea by @Andrew_Steel there was an example where an asterisk was used to introduce a dynamic element into an XPath:

1 Like

Hello @Daniel_Weikert and @mlauber71

thank you for your reply. I will try to create some example data that I can share in the forum.
Until I can share some test data I will show you a screenshot of a XPath Node which shows the content of the XML in a better way.

As you can see the sctucture of my XML files they have variouse times the same node names “Component” in the XML Cell Preview. When I extract them the XPath Node give the XPathe Query a numerical index. This index for my keyword “VGSNAG3” can change in every XML file. So I need to know in which “Component” node I find my keyword “VGSNAG3” and then tell the XPath Node only to extract this XML node.

I hope you have now a better understanding of my problem.

Brotfahrer

Here are 3 sample XML files. Each file has included my keyword “VGSNAG3” but in each file at a different position/index. I hope you can show me a way how to find the different positions of my keyword.
Sample1.xml (1.8 MB)
Sample2.xml (490.2 KB)
Sample3.xml (1.8 MB)

Thx in advanced
Brotfahrer

@Brotfahrer I built a demo workflow to show how you might approach this issue.

The key might be to use dynamic searches for certain values in the XML/JSON file and then continue the extraction. KNIME also offers nodes to sort of automatically extract JSON structures but they might run quite long and maybe produce results that can be hard to interpret.

OK so first you search for the cell and the path where your desired object “VGSNAG3” might be found. JSON Path should offer a broaf variety of search options. Unfortunately the sampel syntax with KNIME does not cover them to soem depth and I was not instantly able to adapt more complex search algorithms from other web sites.

$['VehicleReport'][*]['Component'][*]['ECUShortName']['text']

You allow the search structure some flexibility. The art is later to only keep the information you want. You can extract precise values of cells, paths to later reuse or collection of these objects that would appear as lists (and you might later have to ungroup etc.).

Not: you also extract the sub-JSON block that would belong to the “ECUShortNames” as -well_ another JSON block you could deal with later.

Then you keep only those lines that have your ID “VGSNAG3”.

Then you can now approach your second (or more) JSON (sub-)object and extract the specific infgormation you want from the (presumably) ECUShortNames-block. These again can be single texts or lists or gaian nested onjects. Here you might try your hand at the other JSON nodes.

In the end you have the results in a table

This does just demonstrate some elements you might have to combine and some paths to explore:

1 Like

Hi @Brotfahrer,

is there a special need to use JSON. Could you also use XML directly? If so, here is a solution with XPath:

Component is an array embedded in ComponentList. First you have to split ComponentList into his Component entries:

Now you have access to the entries of the individual components:

Filtered by the key value, the Loop End result is the required list:

I have left the Component XML column for further evaluations and sorted the Columns for a better visualisation

I hope it helps.

Best Regards
Andrew

4 Likes

@mlauber71 , @Andrew_Steel

thank you very much for your solutions! I will try both and give you a reply which fits for me.

Greetings,
Brotfahrer

1 Like

Hello @Andrew_Steel ,

thank you for your screenshots! With your input I’m able to extract the data I need.
While it’s tedious, because of the nested “Additional Services” XML node I have to extract each of the outcoming 70 :pensive: nodes with an XPath node, but it works. And that is the main thing :wink:

Greetings,
Brotfahrer

2 Likes