Excel to JSON: Multilayer nested arrays/objects

TheController · June 14, 2023, 9:56am

Hi Knime-Community!

First time here in this forum

I was searching already the whole forum and could find some helpful pieces that I included in my workflow, but I am still struggling with other parts.

The Situation
For my company I need to create a JSON format (for a POST request). This JSON file will be created out of an Excel file. The structure in JSON needs to look like this:

Yellow area is a simple format where every column from excel is one object in JSON.
Orange area “DIs” will be a nested array that consists of different nested objects (green boxes).
So i basically have a main level, and then different DI categories (DI2, DI3, DI4, DI5) which are like a sublevel within the “DIs” category.

I already learned from some topics in this forum, how to create a nested array.
I did it like this:
JSON - First array

Result is:
JSON - First array result
That is perfect.

But now I need to add the second (and 3rd, and 4th) green box within the array “DIs”.
So I tried to repeat the first steps for the next nested object and then try to combine them:

But result is not as expected. Second green box with DILevel 3 will be created as a seperate object on the main level. But how can I move this object to the sublevel within the “DIs” category, below the DIlevel 2?:

I hope I described myself well enough, otherwise please let me know.
Thanks in advance for your help!

denisfi · June 14, 2023, 10:59am

Hi @TheController ,

For Json, you have some other options to use… as table to Json that you can bring you table rows into a Json sequence and Json combiner (rows/columns).

Just a tip, you have the Containers options too that will convert all into a json scheme and you can use it after with other workflows and requests as you need.

For the last tip, you can use Json reader to build the structure and then make changes/replaces to manipulate the information, so you can make a mask and replace with string manipulation to change the content as you wish and then, you convert again into json type.

I hope that some of them can help you now as a shortcut for you…

BR,

Denis

TheController · June 14, 2023, 11:15am

Thank you Denis!

I will work my way trough your mentioned nodes and try them.
In case I will get stucked at some point, I will ask again

ArjenEX · June 14, 2023, 12:21pm

Hi @TheController

As a general tip, it’s always useful to include the Excel file that you are using with the same dummy info as in your screenshots so people can understand the structure of your source file This avoids assumptions and could make a big difference in the approach that you can take.

In general, in situations like this it is best to start at the lowest array level and gradually work your way up. This is one approach assuming this is the structure you have :

Key points:

For visibility purposes, I split the creation of the JSON’s for the main information and the DI’s.
The DI’s can be created as you already have done. Make sure to use the Unnamed root element as your root key name

Then combine the DI’s while omitting the root key. Name the column DIs

After appending, you’ll have the main information and the DI JSON next to each other.

Then combine the two while using an Unnamed root element.

Final result:

See WF:
Excel to JSON Multilayer nested arrays objects.knwf (41.9 KB)

Hope this helps!

TheController · June 16, 2023, 11:45am

Thank you as well @ArjenEX

That’s funny, I think my workflow from above that I am using currently, is the one I got from one of your older posts a few years ago

I will look into both of your proposals beginning of next week.
Just wanted to let you know I am not ghosting you

Thanks so far!

Ah and btw:
Attached the Excel file. Simply all objects are one column.
JSON tryout for sharing.xlsx (13.5 KB)

TheController · June 19, 2023, 2:20pm

Hi @denisfi I tried the container method, but I just couldn’t find out how they work and how to combine them all unfortunately. But I like the idea of creating a template/mask. If you can show it a bit more detailed, I would be happy to learn.

@ArjenEX That was amazing. I re-built it according to your guidance and it worked! Thanks a lot!

Now PART 2:
Second question. I just didn’t want to overload my fist question. I like to tackle problems one by one.
So here is the next thing:

As you can see, in my Excel I have the columns named like this:
o DILevel DI2
o DILevel DI3
o DILevel DI4
o DILevel DI5

This is important so that users know where to enter the dates for the different DI levels (btw: DI = Degree of Implementation).

But the JSON format requires this:
2023-06-19 16_17_40-OPW API requests.docx - Word

That means, all DILevel columns have the same name.
How can I rename now these objects during the workflow?
I realized when I use some kind of JSON/Table transformation or Column Rename Nodes to give all columns the name “DILevel”, that it won’t work as KNIME automatically makes it “DILevel (#2)”, “DILevel (#3)” and so on.

Any ideas here?

ArjenEX · June 19, 2023, 2:56pm

@TheController

Something like this then?

I’ll elaborate further on how I got there once I know it’s what you’re looking for

TheController · June 19, 2023, 3:47pm

Exactly this, yes

ArjenEX · June 19, 2023, 4:04pm

Allright @TheController , this is the route I took:

Steps:

Since we’re dealing with rows and columns that need to stay separated, I initiated a chuck loop, making the WF process each record individually.
Make a split between DI and non DI information (similar to the first version). For the DI, I initiate an Interval Loop between 2 and 5. This corresponds to your DI numbers.

Due to the common column name issue, I process each DI individually. As such, I need to dynamically filter for the DI that is being processed to later on create the JSON. This can be done with join("*DI") + string($${Iloop_value}$$). Convert it to a flow variable and pass that to a Column Filter node. Make sure the variable is applied properly.

It’s then possible to remove the DI from the column name when using [ ](DI)[0-9]+ in a Column Rename (Regex) node.

Create the JSON as earlier done and, after all DI’s have been processed, combine them with a JSON Row Combiner.

Append that again to the non DI information and create the final JSON (use unnamed root element again).
Exit the outer Chunk loop. If you eventually need to do the POST request, you can technically already put that after the last JSON creation before the exit the outer loop.
Final result is two full JSON’s.

Note that I didn’t cater for the possibility that a DI between 2 and 5 is not present. If could be that the workflow crashes or outputs unwanted results like an empty JSON, but I will leave that up to you to catch

See WF:
Excel to JSON Multilayer nested arrays objects V2.knwf (75.6 KB)

If this helped please mark the post as solution so that other KNIME users can also benefit from this in the future.

TheController · June 20, 2023, 12:35pm

Wow, thanks. This solution was different than I expected
Never worked with loops etc.

As I never want to just copy solutions, but really understand them, I tried to also find an own way.
Inspired by your loop, I thought what if I just split the task in 4 different sequences, rename the columns and then append them again. Some kind of a “manual” loop.
This is what I did:

Seems it worked as well. Result:

I am really happy with that.
Question here is: How do I get rid of all the other columns? Or is that not important if they all appear there, because in the POST Request node I can for example select a specific column for my body, and thus I would just select this last column called “FinalJSON”…?
This is the full picture from the last node (Node 8, Columns to JSON):

Btw: When I mark an answer as solution, will this thread then automatically be closed?

ArjenEX · June 20, 2023, 12:48pm

Many nodes offer to option to remove the source columns. If you look at the Columns to JSON one, it’s at the bottom.

But you’re right, in the POST node you just select the finaljson column as source for the request body. Otherwise just use a Column Filter with enforced inclusion/exclusion.

Btw: When I mark an answer as solution, will this thread then automatically be closed?

No, it won’t.

TheController · June 20, 2023, 12:55pm

That’s what I did.
And still I got all these columns as you can see on the bottom right:

But Column Filter is a great idea. That should make it cleaner.

Thanks a lot @ArjenEX ! You are really quick in your answers, detailed, helpful and truly a win for the KNIME Community

I will finalize this workflow now, might be that I will have one last question in the next days, but first I need to try it by myself.

Have a good day!

ArjenEX · June 20, 2023, 12:57pm

No problem, good to hear that you can proceed with your work! Happy KNIME’ing

denisfi · June 20, 2023, 6:54pm

Hi @TheController ,

Sorry by the later message… many projects takes a lot time from me from these days…

When I suggest a template/model, you can just create a simple/silly json scheme and map some parts to be used as “flags” to be overwritten when you need.

Example:

{
  "ActionID": @1,
  "OrganizationID": @2
...
  "DIDs": [
    {
        @3,
        @4,
       ....
    }
  ]

}

You can use column expression node or string manipulation to substitute the @Number for anything you need, like components to replace it.


regexReplace($A$,"@1",$B$)

Don’t forget to convert it into Json format again to use as body for your request, but you can see that a simple solution to manipulate the data as you need.

I made it for a project that I needed to change some body requests using the same “template”, avoiding me to expand a lot of time and process for each case.

Tks,

Denis

TheController · June 23, 2023, 9:12am

Ah great, that worked. Also a very interessting solution.
Trying to work with both solutions really helps me to improve my KNIME skills and discover different nodes and workarounds

This is how I did it:
2023-06-23 11_16_02-KNIME Analytics Platform

The columns from the excel are transformed to variables so that I can use the excel values for the string manipulation later on.

My JSON Mask from the Container Node looks like this:

The @strings are then replaced by the string manipulation with the variables:

The result is:

But, 2 problems:

How can I manipulate many many more items? String manipulation only allows one expression. I would need maybe 50 replacements in the end
All this was only replacing the first row from the excel file. But how can this be done for all rows?
So I have one JSON mask, but as my final Excel file would have up to 300 rows, I would need this mask to be created hundred times and replaced also respectively.

takbb · June 23, 2023, 11:05am

Hi @TheController,

I was watching this topic a week ago, and was having a play with the original solution with a view to making it more scalable (using a loop instead of a sequence of “Columns to JSON” nodes). The loop adds a little complexity and as you already appeared to have a solution, I didn’t upload it.

Now I haven’t followed closely since to see whether this will assist you (nor indeed if it simply covers the same ground that has already been discussed) but given you are now talking about problems with performing the operations on a much larger set of columns, I wonder whether some of the ideas in here can assist.

Excel to JSON Multilayer nested arrays objects.knwf (70.1 KB)

The top branch is one of the earlier solutions offered. I’ve left it in so you can see where my workflow fitted in with the timeline of this thread. As you can see, the “loop” version increases the nodes and complexity, (but does not have to be modified to handle increasing iterations of the same columns) in order to return, hopefully, the same result. I hope it offers some ideas

ArjenEX · June 23, 2023, 11:33am

We were on the same line of thought, just slightly different ways to get there I had mine here, yours requires less manual intervention.

takbb · June 23, 2023, 11:41am

Hi @ArjenEX, Yes I just looked back up the posts and saw that the thread had continued on without me

I don’t know that my solution offers anything other than an alternative, but sometimes the final answer turns out as a “take the best of both”. I often find looking at other people’s workflows such as yours and seeing how other people approach it allows me to have sudden bursts of (“stolen”) inspiration on an unrelated task a few weeks or months later

TheController · June 26, 2023, 6:01am

Wow, great thanks a lot guys!
Okay, so now I have a few solutions from you with more or less the same idea but different built.
Really interessting.

Today I will receive the full JSON format from a colleague so that I can fully start working on the whole solution. I will work myself through your solutions and try to understand and rebuild the workflows.

Latest end of this week, I will give you my feedback on this
But hopefully I am quicker

system · July 3, 2023, 6:02am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.