Parsing and Pivoting JSON

#1

I am building a workflow to process data from an API, and the JSON it returns is nested with arrays and I am struggling to parse/transform the data to get my desired schema. Here is a sample of the JSON returned.

[
  {
    "id": 401112436,
    "teams": [
      {
        "school": "Louisville",
        "conference": "ACC",
        "homeAway": "home",
        "points": 17,
        "stats": [
          {
            "category": "fumblesRecovered",
            "stat": "1"
          },
          {
            "category": "tacklesForLoss",
            "stat": "6"
          },
          {
            "category": "firstDowns",
            "stat": "21"
          }
        ]
      },
      {
        "school": "Notre Dame",
        "conference": "FBS Independents",
        "homeAway": "away",
        "points": 35,
        "stats": [
          {
            "category": "fumblesRecovered",
            "stat": "3"
          },
          {
            "category": "tacklesForLoss",
            "stat": "11"
          },
          {
            "category": "firstDowns",
            "stat": "18"
          }
        ]
      }
    ]
  }
]

I would like a final data set that has 2 rows, for the above JSON sample. Each row would have the following columns;

  • id
  • teams.school
  • fumblesRecovered
  • tacklesForLoss
  • firstDowns

So, I want the ID and the team name, and then I want to pivot our the array of stats onto separate columns.

My workflow to attempt to achieve this has become a monster of JSON Path, Split Column Collections and Transposes… and is still not working. There has to be a simple way to achieve this?

0 Likes

#2

This is with what I came up. The point is to first define the initial group of variables as JSON path, then ungroup them, and then extract the stats as additional JSON paths and then treat them again with JSON paths and use Pivot to ‘transpose’ them. Finally joining the information together by the school (name).

With the ID I used a missing value node since I was not exactly sure how to handle them.

PS.: welcome to the KNIME forum :slight_smile:

kn_example_json_ungroup_pivot.knwf (53.7 KB)

2 Likes

#3

Hi @cctechllc,

EDIT: here is a (second) small monster for a solution. We need one more JSON Path and one more Ungroup, because there could be more then one competition.

football.knwf (35.1 KB)

Best regards
Andrew

3 Likes

#4

Many thanks Andrew! This Ungrouping and Pivot were the keys! I will have to study this solution more to make sure I understand how it works, as I think more data from this API will require similar handling.

I have this approach integrated into my workflow and working!

Chris

0 Likes

#5

Thanks for the solution! I need the ID, as this is the best key to join back to other data sets in the API as it uniquely identifies each game. I am very new to Knime, but blown away that this sophisticated of a product is available for free!

I am using it to consume this college football data, to cut my teeth on the tool and hopefully put together a model for CFB, but I hope to be able to contribute to the forum as I learn.

Chris

1 Like

closed #6

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

0 Likes