Parsing JSON output from CKAN

krabina · February 22, 2022, 3:42pm

I am trying to do a simple thing: convert a JSON usually found in open data portals to further process the metadata with KNIME. However, I cannot get this to work:

GET Request node gets me this JSON:
https://www.data.gv.at/katalog/api/3/action/organization_show?id=stadt-linz

If tried JSON Path, where I want to select the users. The result is a List (collection of JSON). But when I then try to add a JSON to Table node, it just gets me the original “body” node, not the new users node.

When I try to directly connect the JSON to Table node to the GET request, I can manage, but then the resulting table is not organized in order of the JSON file, and only some columns are named with #1, #2, etc. others are not.

The JSON from the above example is not so complex, so this cannot be that hard. Can anybody help me out?

ScottF · February 22, 2022, 3:58pm

Hi @krabina and welcome to the forum.

Can you try an Ungroup node between the JSON Path and JSON to Table, like this?

2022-02-22 09_57_58-KNIME Analytics Platform

I think that should get you closer, based on my quick testing.

krabina · February 22, 2022, 4:21pm

Indeed, the Ungroup node was what I was not aware of. Awesome, thank you!

krabina · February 22, 2022, 4:40pm

Now I tried this with the actual dataset I am interested in: https://www.data.gv.at/katalog/api/3/action/organization_show?id=stadt-linz&include_users=no&include_tags=no&include_extras=no&include_followers=no&include_datasets=yes
This lists 745 rows, so should not be a big problem It runs quickly until the Ungroup. But the JSON to Table node takes forever and will not finish, but gets stuck at 2/745 “Row0_2”

Daniel_Weikert · February 22, 2022, 5:47pm

What are you extracting? The Json path and ungroup should already return a table.
br

ScottF · February 22, 2022, 5:57pm

Can you upload the current state of your workflow? As it is I’m just getting a bunch of missing values using the URL you provided.

krabina · February 22, 2022, 6:07pm

OGD Linz.knwf (17.8 KB)

Yes, the output from Ungroup shows a table (same as in the working users import. But then the JSON to Table converter fails.

ScottF · February 22, 2022, 6:38pm

I have the same issue. Not sure what is going on except to note that for whatever reason the data in the packages collection query seems to be quite large (as opposed to what showed up in the simpler users example above).

For example, if you filter down the packages collection query to a single row after the ungroup node, the JSON to Table node will finish, although it takes a while - eventually producing a table with over 41,000 columns.

Maybe the initial collection query is malformed somehow? There’s a LOT of data there.

Thyme · February 22, 2022, 6:47pm

You’re returning the paths instead of the values in the JSON Path node. Ungrouping that list ungroups it into multiple rows, while also copying the original JSON unmodified into all rows as well. Doing a JSON to Table node on 745 copies of that already somewhat large JSON probably froze your machine (mine as well )

Instead you’d want to return the values, and also return them as type JSON instead of String. Ungrouping that gives you 745 rows with a different package each. JSON to Table then gives a reasonable amount of 67 columns, though I’m not sure that’s exactly what you want? You might want to query your data manually, or chain another JSON Path to get the nested data.

parsing JSON output as CKAN.knwf (26.8 KB)

ScottF · February 22, 2022, 6:53pm

Thanks @Thyme, I was hoping someone would come along that’s more conversant in JSON parsing than I am

Thyme · February 22, 2022, 6:57pm

@ScottF Well, I’m an XML guy, but it translates to JSON fairly easy

krabina · February 22, 2022, 8:46pm

Thank you all for the very fast and successful help! I will try to improve this and share it as a workflow on the KNIME hub. Harvesting metadata from open data portals is something that potentially man other uses can be interested in as well!

The next step for me is to find ways to get a better overview over the data!

krabina · February 22, 2022, 8:55pm

btw. this works in basically all open data portals, because 90% of them use CKAN. So you could use this on govdata.de or other other open data portals:
API guide — CKAN 2.9.5 documentation

system · March 1, 2022, 8:55pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.