KNIME JSON Challenges...

All:

First, let me say that I know this is not a new topic. I have read the multiple other posts on this, downloaded the sample workflows to try to make it work, and haven’t been able to achieve my goal through trying:

  • JSON to Table Node
  • The Ungroup Node
  • JSON Path Node
  • Converting JSON to XML and using XPATH

What I am trying to do seems very simple and common for processing JSON, but I can’t seem to make it work right. I have data in the form below:

I am trying to simply have a flat table output with the following columns:

  • season
  • seasonType
  • week
  • poll
  • rank
  • school
  • conference
  • firstPlaceVotes
  • points

The multiple levels of hierarchy in the JSON are complicating retaining the integrity across my rows as I need to repeat season, seasonType, week, and poll for the more granular rank, school, conference, firstPlaceVotes, and points values. It seems the JSON Path Node was built to do this through collection queries, but when I try to use this I lose integrity between queries.

The answer may be very simple so apologies if I am “not getting it”, but I have spent hours on what should be a basic ETL challenge and remain unsuccessful. Can anyone help guide me as to what I am doing wrong?

The actual data file is attached in case there are incorrect JSON structures I am not seeing which has been the issue in a number of other posts.

Ranking JSON Data File (93.8 KB)

Thanks in advance for any help on this.

Best…MV

Attached you’ll find a workflow which gives you the, hopefully, desired output. I have to admit that the workflow is very checky but it works. I would love to see other, more easy, approaches to get the desired output.

Hope the workflow helps though.

JSON_Reader.knwf (21.7 KB)

3 Likes

laaaarsi,

THANK YOU! This is exactly what I needed. After reviewing the approach, I see that you keep the integrity by getting everything (under the seasons column) and then use the JSON Path nodes to further parse that while the missing values nodes keep the repeated cells. Brilliant. I had it in my head that I should do the collection queries all from the source (separately) which is why I was losing the integrity.

Thanks again!

3 Likes

You’re welcome! But as I already stated this approach ist very complicated and maybe somebody has a less complicated one. But Iam happy that I was able to help you :slight_smile:

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.