JSON data into KNIME and exported as SD file…

I apologize if this question is confusing or WAY too basic for this forum. I have JSON data that I am reading into Spotfire directly using script controls. However, Spotfire is an analysis tool that is way too pricey. I’m hoping that someone would be able to share a workflow teaching me how to mine JSON data off a webpage, run it through a statistical analysis, and generate a sunburst chart with a multiple-variable analysis. I’ve never had to deal with JSON data before. I can provide sample CSV files that I have obtained in the past, where we mined data off a website and used Python scripting to generate a Spotfire analysis. However, I’m pretty sure this entire process can be done in KNIME. Thanks!

Yes, I hope it can. You can also use Python code within KNIME and combine it with other functions. If you have a working Python file, why not give it a try? Otherwise there are free (Vernalis) and paid (Selenium) tools to harvest data from the web.

Concerning JSON, you will have to define a pattern (JSONPath) for the data in order to transform it into a usable table.
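To illustrate the idea outside KNIME: defining a JSONPath pattern is essentially picking which fields to pull out of each record. In plain Python, with a made-up payload just for illustration, the same flattening step looks roughly like this:

```python
import json

# Hypothetical JSON payload, similar in shape to what an API might return.
payload = '[{"team": "frc254", "rank": 1}, {"team": "frc1678", "rank": 2}]'

# Parse the string into Python objects, then pull out the fields of
# interest -- the same idea as a JSONPath pattern in the JSON Path node.
records = json.loads(payload)
rows = [(r["team"], r["rank"]) for r in records]
print(rows)  # [('frc254', 1), ('frc1678', 2)]
```

Each tuple in `rows` corresponds to one row of the usable table you would get out of a JSON to Table node.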

So if you could provide us with an example this might help. And if it comes to predictive models KNIME also has you covered there.


What I would like to do is to abandon the Python scripting and read the data directly from a website into KNIME. Then choose whatever nodes I need to do the statistical analysis so as to generate a sunburst plot of the data that is flexible for the end user to highlight/manipulate so as to understand the data more thoroughly. Essentially, EVERYTHING from reading in the data, to processing the data, to visualization of the data is all done in KNIME. The data sources are API services provided by The Blue Alliance (https://www.thebluealliance.com/apidocs/v3). We pull data from that site in real time from a number of different "tables" (e.g. …/event/{event key}/rankings and …/event/{event key}/oprs for summary info, …/event/{event key}/matches for detailed match info).


The Vernalis Load Local Text Files node will, despite the name, accept a remote URL via a flow variable, and then a String to JSON node will get you into the JSON format. Then you can process the JSON using the dedicated JSON nodes.


(There is a Vernalis Load Local… Files node for XML - we should probably add a JSON version for completeness, which I don't think would be very difficult to do)

Steve


Looking at your link, it may be that you need to use the REST nodes to handle the authentication keys:

Steve


Steve,

I am beginning to understand what I want to do. I’m still running into errors. I am using the website

and in the website are embedded JSON files for various metrics at the match level and district level of competitions. So, in KNIME I chose the GET Request node and used the above web URL as the URL in the connection settings. However, I do not see where I have the option of handling this requirement: "All endpoints require an Auth Key to be passed in the header X-TBA-Auth-Key." Assuming I learn that, and can actually extract all the JSON data, I have then built a workflow through a JSON Path node to a JSON to Table node. Presumably I can then manipulate the data (concatenate, column filter, row filter if I wanted to) and then have it flow to a visualization node like a Sunburst Chart. Does this sound correct to you?

That sounds correct. I think you should be able to enter `X-TBA-Auth-Key` as the header key in the Request Headers tab, and your actual key in the corresponding header value (I don't have a key to try this to check!):

Click on ‘Add Header Parameter’, and I think you want your key as a constant (unless you somehow have it in a flow variable or workflow credential)
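For comparison, this is what the GET Request node's header configuration amounts to in plain Python. The key below is a placeholder, and the event key in the URL is just an example:

```python
import urllib.request

# Placeholder auth key -- TBA issues a real one per account.
AUTH_KEY = "your-tba-auth-key"

# Example endpoint; the event key "2018casj" is only illustrative.
url = "https://www.thebluealliance.com/api/v3/event/2018casj/rankings"

# Attach the X-TBA-Auth-Key header to the request, the same thing the
# GET Request node does when you add a header parameter.
req = urllib.request.Request(url, headers={"X-TBA-Auth-Key": AUTH_KEY})

# The header is stored on the request object before it is ever sent
# (urllib normalizes the key's capitalization internally).
print(req.get_header("X-tba-auth-key"))
```

Nothing here actually hits the network; it just shows that the header travels with the request rather than in the URL.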

Once you have that working, JSON Path should let you extract the data you want from the result.

Steve

Steve,

So far so good. However, when I connect the JSON Path node I get an error saying that there is no column in spec compatible with “JSON Value”. However, I know that there are JSON tables on that webpage. Any idea what I may be doing wrong?

Will

Does your output contain the JSON in a normal string column? If so, use the String to JSON node to convert it to a JSON column type first.

Steve

Steve, you have been so incredibly helpful. I really appreciate it. I have the following workflow that continues to fail:

(screenshot of the workflow)

The error message I receive on the String to JSON node is: "Execute failed: Unexpected character ('<' (code 60)): expected a valid value (number, String, array, object, 'true', 'false' or 'null') at [Source: java.io.StringReader@7d49b006; line: 1, column: 2] in row: Row0."

Please advise…

Looks like the JSON has a '<' character in a numeric field, which shouldn't be there. I would suggest using a String Manipulation node to handle this. You will have to decide what to do with the values, e.g. make them null or a fixed value. Probably you need the regular expression replace function, with a regex something like `:\s*<\d*\.?\d*` to capture the issue.
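To show what that regex does, here is the same replacement in plain Python (the broken input string is invented for the example), turning a censored value like `<0.5` into `null` so the result parses:

```python
import re

# Hypothetical fragment of the broken data: a numeric field holding a
# censored value "<0.5", which is not valid JSON.
broken = '{"team": "frc254", "opr": <0.5, "rank": 1}'

# Replace the whole ": <number" portion with ": null".
fixed = re.sub(r':\s*<\d*\.?\d*', ': null', broken)
print(fixed)  # {"team": "frc254", "opr": null, "rank": 1}
```

The pattern only fires where a `<` directly follows the colon, so ordinary string and numeric fields are left untouched.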

Steve

Steve,

I found the string manipulation node and inserted it into the workflow. However, I’m not familiar with how to configure this node. This is the page I see when I click on configure. Please advise:

OK, I’ve put together an example workflow for you (see attached).

This takes some very simple JSON I entered myself, and reproduces the error, and fixes it in several different ways:

  • Unchecking the ‘fail on error’ option in the String to JSON node - this is a bit of a blunt tool most likely, because you will lose all the other, valid, information in the JSON String in question
  • Replace the ‘broken’ values with ‘null’ (which will be extracted to a missing value cell)
  • Replace the ‘broken’ values with a default value (in this case, I chose ‘-1’, but you could change that)
  • Replace the ‘broken’ values by just removing the ‘<’, e.g. ‘<3’ becomes ‘3’, ‘<3.75’ becomes ‘3.75’ etc.
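The last two variants in that list can be sketched as two regex replacements; here in plain Python with an invented broken input (in KNIME you'd do the same with the String Manipulation node's regexReplace):

```python
import json
import re

# Invented broken input with two censored numeric fields.
broken = '{"opr": <3.75, "dpr": <3}'

# Variant: drop the '<' so '<3.75' becomes 3.75 and '<3' becomes 3.
stripped = re.sub(r'(:\s*)<(\d*\.?\d*)', r'\g<1>\g<2>', broken)
print(json.loads(stripped))  # {'opr': 3.75, 'dpr': 3}

# Variant: replace the whole broken value with a default of -1.
defaulted = re.sub(r':\s*<\d*\.?\d*', ': -1', broken)
print(json.loads(defaulted))  # {'opr': -1, 'dpr': -1}
```

Which variant is right depends on what a censored reading should mean downstream in the analysis.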

In each case, the output is passed into a JSON Path node configured identically so you can see the various effects.

Steve

Broken_JSON_Parser.knwf (51.5 KB)

Most probably you are getting XML/HTML, not JSON, in the column you are trying to convert to JSON. Have you checked it? It could be an error-message web page.
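That would also explain the "Unexpected character ('<' (code 60))" message: an HTML error page starts with `<`. One quick way to check a cell's value (sketched here in plain Python, with a made-up response body):

```python
import json

def looks_like_json(text: str) -> bool:
    """Return True if the text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except ValueError:
        return False

# Hypothetical body of a failed request -- an HTML error page.
error_page = "<html><body>401 Unauthorized</body></html>"

print(looks_like_json(error_page))    # False -> it's an error page
print(looks_like_json('{"rank": 1}')) # True  -> genuine JSON
```

If the column turns out to hold an error page, the fix is upstream (e.g. the auth header), not in the String to JSON node.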