csv file with a json column into a new table

Hi everyone.

I am starting to use Knime so I apologise if my question is too basic. So here is the thing: I have a csv file and some columns in this file have a json notation, as the following:

column genre [{"id": 12, "name": "Adventure"}, {"id": 14, "name": "Fantasy"}, {"id": 28, "name": "Action"}, {"id": 878, "name": "Science Fiction"}]

I need to find a way to treat these kind of columns properly in order to do some interesting 'whatever' analysis. So, I tried to use json to table node but the console box says 'No column in spec compatible to JSONValue'. I also tried to import the csv file as a json file and was not accepted.

So, I need some help.

I thank you all in advance.

hi there

would it be pissible to have a little sample file?
there are also nodes like file reader  and line reader to test with

Hi,

no matter what kind of reader (e.g. file or csv reader) you used, it does not automatically recognise that some columns might be in JSON format. It just reads it in as a string. So you first have to convert the column and its values with a String to JSON node. Afterwards the column format is JSON and you can use a JSON path, JSON to table or whatever node you like to do your transformations. Hope that helps.

Cheers,
Marten

IMBD_v1.knwf (1.7 MB)

I am trying to convert the columns genres to a JSON and then to parse it in a traditional table with multiple records but I did not manage to do it. Please, could you help me? Please find attached the workflow

Thanks
Regards

do you have the data in any other way? From my perspective the JSON format lacks the quotation marks. It could be possible to manipulate them back in but better to get clean data.

image

Could be that you can use Relaxed JSON for this but I would have to research for an appropriate package in R or Python for that


edit. There seems to be a Python package for that, but I have not tried it: https://pypi.org/project/demjson/

2 Likes

Or alternatively, this approach with a Java Snippet might work (seems like that the screenshot is no longer embedded into the post, probably due to the forum migration, but the link “Bildschirmfoto” will show it):

2 Likes

Dear all,
Thank you for your answer. Please find attached the original file. As you can see the quotation marks exists in the original file (it is also present here https://www.kaggle.com/tmdb/tmdb-movie-metadata). You cannot see the Quote in Knime File Reader because to import it correctly I had to add the Quote as a Quote Character in Advanced Tab of the File Reader node.

So how how can I import and parse correctly this fie?

tmdb_5000_movies.zip (1.5 MB)

Thanks
Regards

If you use the R package readr to import the CSV file it will handle (hopefully most of) the quirks and preserve the Json strings.

These strings can you then convert to collection columns and ungroup them, that gives you the information (note with ungroup there will be one duplicate line for each original entry.)

You could also split the collection column into single columns:

kn_example_imdb_json_import.knwf (2.7 MB)