Process multiple occurance of same field in JSON with different values

Hi, I have a json that looks similar to the record below. In the json, it can have multiple GIVENNAME, MIDDLENAME, and SURNAME for the same record. I need to extract all of the SURNAME, for example, in one field called FirstName with space between them as they all belong to a single record. I tried JSON Path, but not sure how to get what I am looking for here. Any help will be highly appreciated.
[{
“string”: “May”,
“termType”: “GIVENNAME”
}, {
“string”: “Green”,
“termType”: “MIDDLENAME”
}, {
“string”: “Dowman”,
“termType”: “SURNAME”
}, {
“string”: “Hinna”,
“termType”: “SURNAME”
}]

The output I am looking for is:
FirstName, MiddleName, LastName
May, Green, Dowman Hinna

Hi @rumanac , this is quite simple to do actually.

Just to confirm what you said: “I need to extract all of the SURNAME, for example, in one field called FirstName with space between them as they all belong to a single record”. I am guessing that you meant LastName instead of FirstName, and I put something quickly for you based on that assumption.

This is what the workflow looks like:

Input data (same as what you have):
image

Results:
image

FYI, the rule that was applied on SURNAME, that is if the termType repeats, the string values for the same termType will be joined by a space, is applied on all 3 termType.

Here’s my quick workflow:
Process multiple occurance of same field in JSON with different values.knwf (23.5 KB)

BTW, this is quite a weird JSON structure. Normally, the structure is like:
"variable" : "value"

2 Likes

Hi @rumanac , just some additional comments. JSON Path would not be useful here, because of the fact that the JSON data that you have is not following the structure of "variable" : "value" that I have already mentioned.

You use JSON Path to go fetch the value of a variable. In your case:

  1. The variables are “string” and “termType”
  2. It looks like you cannot determine in advance how many “string” and “termType” you can have, since you can have more than 1 GIVENNAME, and/or more than 1 MIDDLENAME, and/or more than 1 SURNAME

Hi Bruno,
Nice to hear from you. Hope you are doing well. Thank you so much for the response. That looks like a good plan of getting the expected output. I will definitely try it out. I actually ended up completing it using a different approach - I wrote some Java snippets to get the expected output. Thanks again! Rumana

1 Like

Hi @rumanac , no problem, happy to help, and happy to hear that you resolved it. As we always say, there are more than 1 way to resolve something in Knime, and this can definitely be done in Java or Python.

It would be interesting if you could share your Java Snippets solution so that the other members of the Community can see another way to do this.

1 Like

Json Path would be a great solution if there was some id attached. I would like to see the full data to try out some possibilites

Hi @Daniel_Weikert , I would be interested to see how we can use JSON Path here.

If the data was like this:

{
	"GIVENNAME": [
		"May"
	],
	"MIDDLENAME": [
		"Green"
	],
	"SURNAME": [
		"Dowman",
		"Hinna"
	]
}

Then for me it would make sense to use JSON Path. You can go to the GIVENNAME, or MIDDLENAME or SURNAME paths, no matter how many GIVENNAME or MIDDLENAME, or in this example, SURNAME you have, it would be ok.

For example:

{
	"GIVENNAME": [
		"May",
		"Mary"
	],
	"MIDDLENAME": [
		"Green",
		"Herbs"
	],
	"SURNAME": [
		"Dowman",
		"Hinna"
	]
}

That would not be a problem using JSON, you still have 3 paths.

But if I translate the above example to the current given format, it would look like:

[ {
  "string" : "May",
  "termType" : "GIVENNAME"
}, {
  "string" : "Mary",
  "termType" : "GIVENNAME"
}, {
  "string" : "Green",
  "termType" : "MIDDLENAME"
}, {
  "string" : "Herbs",
  "termType" : "MIDDLENAME"
}, {
  "string" : "Dowman",
  "termType" : "SURNAME"
}, {
  "string" : "Hinna",
  "termType" : "SURNAME"
} ]

Now, suddenly I got 6 pairs of paths vs 4 pairs from the original data.

If I have to access each path, I have to access them by position/index, for example <some_level>.string[2] or <some_level>.termType[2], and we can also speculate that the amount of pairs will be dynamic per record, so you would need to determine what’s the n position/index will be (<some_level>.string[n]).

Obviously it can be done, but it looks more work to implement, such as figuring out the size of n, looping to retrieve the string[0] and termType[0] to string[n] and termType[n] (I am guessing that’s how it got implemented in the Java snippet), etc.

I would still be interested to see how you implement it with JSON Path though.

Hi @bruno29a, you could use the following JSON Path queries (all as lists):
$..[?(@.termType=="GIVENNAME")].string
$..[?(@.termType=="MIDDLENAME")].string
$..[?(@.termType=="SURNAME")].string

2 Likes

Very nice one @daniela_digles !!! I didn’t know we could do this, it’s a good way to search by values.

I used your JSON Path and it can be done in a few nodes (In the v2 one, where I added both samples):

JSON Path config:

JSON Path results:
image

Final results:
image

FYI: v1 can process both types of sample, but because of the transpose and moving of first row as header, it cannot process both at the same time.

Here’s the workflow:
Process multiple occurance of same field in JSON with different values v2.knwf (30.5 KB)

2 Likes

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.