Python Script node. How to work with output table?

Maya · January 24, 2023, 12:02pm

Hi,
I’m trying to test some script execution in KNIME, but as I’m not too familiar with Python and the Python Script node, I don’t know how to get the JSON output from the console into a table for KNIME.

I’m testing Deepgram’s API (you can get an API key online). For the path I’m using a local file (WAV).

import knime.scripting.io as knio
import pandas as pd


from deepgram import Deepgram
import asyncio, json


# Your Deepgram API Key
DEEPGRAM_API_KEY = 'API KEY' #should be replaced

# Location of the file you want to transcribe. Should include filename and extension.
# Example of a local file: ../../Audio/life-moves-pretty-fast.wav
# Example of a remote file: https://static.deepgram.com/examples/interview_speech-analytics.wav
FILE = knio.flow_variables['path']

# Mimetype for the file you want to transcribe
# Include this line only if transcribing a local file
# Example: audio/wav
MIMETYPE = 'audio/wav'

async def main():

  # Initialize the Deepgram SDK
  deepgram = Deepgram(DEEPGRAM_API_KEY)

  # Check whether requested file is local or remote, and prepare source
  if FILE.startswith('http'):
    # file is remote
    # Set the source
    source = {
      'url': FILE
    }
  else:
    # file is local
    # Open the audio file
    audio = open(FILE, 'rb')

    # Set the source
    source = {
      'buffer': audio,
      'mimetype': MIMETYPE
    }

  # Send the audio to Deepgram and get the response
  response = await asyncio.create_task(
    deepgram.transcription.prerecorded(
      source,
      {
        'punctuate': True
      }
    )
  )

  # Write the response to the console
  print(json.dumps(response, indent=4))
  
import sys
try:
  # If running in a Jupyter notebook, Jupyter is already running an event loop, so run main with this line instead:
  #await main()
  asyncio.run(main())
except Exception as e:
  exception_type, exception_object, exception_traceback = sys.exc_info()
  line_number = exception_traceback.tb_lineno
  print(f'line {line_number}: {exception_type} - {e}')

The console shows the expected output in JSON format, but I get the error: KnimeUserError: Output table '0' must be of type knime.api.Table or knime.api.BatchOutputTable, but got <class 'py4j.java_gateway.JavaObject'>

As I understand it, I need to assign the output I want to knio.output_tables[0] to be able to work further with the response, but I don’t know how to do that correctly, and everything I’ve tried from searching for this issue gives me errors.

Please advise,
Thank you.

DiaAzul · January 24, 2023, 12:26pm

Hi @Maya,

Welcome to the forum.

You will need to extract the information you want to return to KNIME from the JSON response, convert it to a pandas DataFrame, and then convert the DataFrame to a KNIME table, which you then assign to the output port variable (knio.output_tables[0]).

I will assume that you know how to manipulate JSON in Python to get the data you require.

To create the DataFrame, you first need to create a list of dictionaries. Each dictionary in the list becomes a row in the output table: dictionary keys become column headers and the values become the row’s values in those columns. The example assumes that responses is a list of dictionaries containing the data returned in the response (you will need to work out your own iterator over the JSON data).

import knime.scripting.io as knio
import pandas as pd

# Records is the list of dictionaries (records); each dictionary becomes one row
records = list()
for response in responses:  # responses: your own iterable over the JSON data
    record = {
        "reference": response["reference"],
        "transcribed_data": response["transcribed_data"]
    }
    records.append(record)

# Convert list of dictionaries to pandas DataFrame
df = pd.DataFrame(records)

# Write DataFrame as KNIME Table to output port
knio.output_tables[0] = knio.Table.from_pandas(df)
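For a Deepgram response shaped like the one posted later in this thread, one possible iterator produces one record per recognised word. This is a sketch that assumes the nesting results -> channels -> alternatives -> words seen in that response; the function name word_records and the column names are hypothetical choices. It uses only the standard library, so its result can be passed to pd.DataFrame(records) exactly as in the example above.

```python
def word_records(response):
    """Flatten a Deepgram-style response dict into one record (dict) per word.

    Assumes the layout results -> channels -> alternatives -> words.
    Missing levels simply produce no records instead of raising an error.
    """
    records = []
    channels = response.get("results", {}).get("channels", [])
    for channel_index, channel in enumerate(channels):
        for alt_index, alternative in enumerate(channel.get("alternatives", [])):
            for word in alternative.get("words", []):
                records.append({
                    "channel": channel_index,
                    "alternative": alt_index,
                    # Prefer the punctuated form, fall back to the plain word
                    "word": word.get("punctuated_word", word.get("word")),
                    "start": word.get("start"),
                    "end": word.get("end"),
                    "confidence": word.get("confidence"),
                })
    return records


# A trimmed-down sample with the same structure as the posted response
sample = {
    "results": {
        "channels": [
            {"alternatives": [{
                "transcript": "Yep.",
                "confidence": 0.98,
                "words": [
                    {"word": "yep", "start": 5.7, "end": 5.86,
                     "confidence": 0.99, "punctuated_word": "Yep."},
                ],
            }]}
        ]
    }
}

print(word_records(sample))
```

Inside the Python Script node, knio.output_tables[0] = knio.Table.from_pandas(pd.DataFrame(word_records(response))) would then give one row per word.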

Hope that helps.

DiaAzul
LinkedIn | Medium | GitHub

Maya · January 24, 2023, 1:46pm

No, sorry, I don’t know how to manipulate JSON in Python. If you could help, it would be much appreciated.

The response I get is this:

{
    "metadata": {
        "transaction_key": "deprecated",
        "request_id": "5360faf4-aecb-423d-ade9-18ea35a9dd6b",
        "sha256": "2d5b81411de4b5a28b908639f293a56355995dc3bbc1eab2e42d542ca8b0173a",
        "created": "2023-01-24T13:43:46.857Z",
        "duration": 19.0,
        "channels": 1,
        "models": [
            "c12089d0-0766-4ca0-9511-98fd2e443ebd"
        ],
        "model_info": {
            "c12089d0-0766-4ca0-9511-98fd2e443ebd": {
                "name": "general",
                "version": "2022-01-18.1",
                "tier": "base"
            }
        }
    },
    "results": {
        "channels": [
            {
                "alternatives": [
                    {
                        "transcript": "Yep. I said it before, and I'll say it again. Life moves pretty fast. You don't stop and look around once in a while. You could miss it.",
                        "confidence": 0.9848633,
                        "words": [
                            {
                                "word": "yep",
                                "start": 5.7,
                                "end": 5.8599997,
                                "confidence": 0.99560547,
                                "punctuated_word": "Yep."
                            },
                            {
                                "word": "i",
                                "start": 7.274654,
                                "end": 7.4344416,
                                "confidence": 0.88427734,
                                "punctuated_word": "I"
                            },
                            {
                                "word": "said",
                                "start": 7.4344416,
                                "end": 7.5542817,
                                "confidence": 0.79345703,
                                "punctuated_word": "said"
                            },
                            {
                                "word": "it",
                                "start": 7.5542817,
                                "end": 7.913803,
                                "confidence": 0.984375,
                                "punctuated_word": "it"
                            },
                            {
                                "word": "before",
                                "start": 7.913803,
                                "end": 7.9936967,
                                "confidence": 0.99902344,
                                "punctuated_word": "before,"
                            },
                            {
                                "word": "and",
                                "start": 8.07359,
                                "end": 8.113537,
                                "confidence": 0.9946289,
                                "punctuated_word": "and"
                            },
                            {
                                "word": "i'll",
                                "start": 8.193431,
                                "end": 8.393165,
                                "confidence": 0.9848633,
                                "punctuated_word": "I'll"
                            },
                            {
                                "word": "say",
                                "start": 8.393165,
                                "end": 8.473059,
                                "confidence": 0.9995117,
                                "punctuated_word": "say"
                            },
                            {
                                "word": "it",
                                "start": 8.473059,
                                "end": 8.752686,
                                "confidence": 0.9609375,
                                "punctuated_word": "it"
                            },
                            {
                                "word": "again",
                                "start": 8.752686,
                                "end": 9.252686,
                                "confidence": 0.99902344,
                                "punctuated_word": "again."
                            },
                            {
                                "word": "life",
                                "start": 10.150825,
                                "end": 10.470399,
                                "confidence": 0.9980469,
                                "punctuated_word": "Life"
                            },
                            {
                                "word": "moves",
                                "start": 10.470399,
                                "end": 10.710079,
                                "confidence": 0.9975586,
                                "punctuated_word": "moves"
                            },
                            {
                                "word": "pretty",
                                "start": 10.710079,
                                "end": 11.149494,
                                "confidence": 0.99902344,
                                "punctuated_word": "pretty"
                            },
                            {
                                "word": "fast",
                                "start": 11.149494,
                                "end": 11.509016,
                                "confidence": 1.0,
                                "punctuated_word": "fast."
                            },
                            {
                                "word": "you",
                                "start": 12.228058,
                                "end": 12.3478985,
                                "confidence": 0.99560547,
                                "punctuated_word": "You"
                            },
                            {
                                "word": "don't",
                                "start": 12.3478985,
                                "end": 12.627527,
                                "confidence": 0.99316406,
                                "punctuated_word": "don't"
                            },
                            {
                                "word": "stop",
                                "start": 12.627527,
                                "end": 12.827261,
                                "confidence": 0.99853516,
                                "punctuated_word": "stop"
                            },
                            {
                                "word": "and",
                                "start": 12.827261,
                                "end": 12.947102,
                                "confidence": 0.99560547,
                                "punctuated_word": "and"
                            },
                            {
                                "word": "look",
                                "start": 12.947102,
                                "end": 13.226728,
                                "confidence": 0.9941406,
                                "punctuated_word": "look"
                            },
                            {
                                "word": "around",
                                "start": 13.226728,
                                "end": 13.46641,
                                "confidence": 0.99853516,
                                "punctuated_word": "around"
                            },
                            {
                                "word": "once",
                                "start": 13.46641,
                                "end": 13.666143,
                                "confidence": 0.99365234,
                                "punctuated_word": "once"
                            },
                            {
                                "word": "in",
                                "start": 13.666143,
                                "end": 13.7460375,
                                "confidence": 0.93896484,
                                "punctuated_word": "in"
                            },
                            {
                                "word": "a",
                                "start": 13.7460375,
                                "end": 13.865877,
                                "confidence": 0.9135742,
                                "punctuated_word": "a"
                            },
                            {
                                "word": "while",
                                "start": 13.865877,
                                "end": 14.1854515,
                                "confidence": 0.96533203,
                                "punctuated_word": "while."
                            },
                            {
                                "word": "you",
                                "start": 14.7937765,
                                "end": 14.91324,
                                "confidence": 0.99902344,
                                "punctuated_word": "You"
                            },
                            {
                                "word": "could",
                                "start": 14.91324,
                                "end": 15.112348,
                                "confidence": 0.9892578,
                                "punctuated_word": "could"
                            },
                            {
                                "word": "miss",
                                "start": 15.112348,
                                "end": 15.271633,
                                "confidence": 0.99316406,
                                "punctuated_word": "miss"
                            },
                            {
                                "word": "it",
                                "start": 15.271633,
                                "end": 15.771633,
                                "confidence": 0.9819336,
                                "punctuated_word": "it."
                            }
                        ]
                    }
                ]
            }
        ]
    }
}

The response is of type dict.

DiaAzul · January 24, 2023, 2:21pm

@Maya

You may want to Google “python reading json dict”; there are more examples out there than we can cover on the forum.

In general, if you have a Python dictionary, or nested dictionaries, you can access the values using the keys; if there is a list, you can access its elements by index.

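So, if the string you gave is stored in response_json, you can convert it to a dictionary using json.loads. (If your response is already a dict, as in your script, you can skip this step and use it directly.)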

You can then access that dictionary using accessors. To get the results, you can use response_dict.get("results") or response_dict["results"]. The second form works if you can guarantee that the key results is in the dictionary; if it is not, you will get an error (a KeyError). The first form returns None if the key doesn’t exist, which allows you to test that the key exists before continuing.
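To make the difference between the two accessors concrete, the sketch below runs both on a small dictionary. It also adds a helper, get_path (a hypothetical name, not part of Python or KNIME), that walks a mix of dictionary keys and list indexes and returns a default instead of raising when any step is missing, which may help if the response layout varies between audio files.

```python
def get_path(data, *path, default=None):
    """Follow dict keys / list indexes in `path`; return `default` if any step is missing."""
    current = data
    for step in path:
        try:
            current = current[step]
        except (KeyError, IndexError, TypeError):
            return default
    return current


response_dict = {
    "results": {"channels": [{"alternatives": [{"transcript": "Life moves pretty fast."}]}]}
}

# .get() returns None (or a supplied default) for a missing key...
print(response_dict.get("metadata"))      # None
print(response_dict.get("metadata", {}))  # {}

# ...whereas square brackets raise KeyError for a missing key
try:
    response_dict["metadata"]
except KeyError as err:
    print("missing key:", err)

# The helper combines keys and list indexes in one call
print(get_path(response_dict, "results", "channels", 0, "alternatives", 0, "transcript"))
print(get_path(response_dict, "results", "channels", 5, "alternatives"))  # None
```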

The JSON file includes both dictionaries and lists. The channels key returns a list of elements. To access the elements in the list, you need to use the index, e.g. first_channel = channels[0]. Don’t forget that Python is zero-indexed, so counting starts from 0. The example below shows how to get the transcript from your JSON, both as a step-by-step breakdown of the file and as a one-line statement with all the accessors applied sequentially to the original dictionary. You can choose whichever method works for you.

import json

# Convert the JSON string (response_json) to a Python dictionary
response_dict = json.loads(response_json)

# Accessing each element individually
results = response_dict.get("results")
channels = results.get("channels")
first_channel = channels[0]
alternatives = first_channel.get("alternatives")
first_alternative = alternatives[0]
transcript = first_alternative.get("transcript")
print(transcript)

# Or as one combined set of accessors
transcript_in_one = response_dict["results"]["channels"][0]["alternatives"][0]["transcript"]
print(transcript_in_one)
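To put more than one value into the table, the same accessors can fill several keys of a single record; each key becomes a column. The sketch below builds one row per channel alternative with the transcript, its confidence, a word count and two metadata fields (request_id, duration) taken from the posted response. The function name summary_records and the column selection are hypothetical; pick whichever fields you need.

```python
def summary_records(response_dict):
    """One record per channel alternative; each dict key becomes a KNIME column."""
    metadata = response_dict.get("metadata", {})
    records = []
    for channel_index, channel in enumerate(response_dict["results"]["channels"]):
        for alternative in channel["alternatives"]:
            records.append({
                "request_id": metadata.get("request_id"),
                "duration": metadata.get("duration"),
                "channel": channel_index,
                "transcript": alternative.get("transcript"),
                "confidence": alternative.get("confidence"),
                "word_count": len(alternative.get("words", [])),
            })
    return records


# Trimmed-down version of the posted response
example = {
    "metadata": {"request_id": "5360faf4-aecb-423d-ade9-18ea35a9dd6b", "duration": 19.0},
    "results": {"channels": [{"alternatives": [{
        "transcript": "Life moves pretty fast.",
        "confidence": 0.9848633,
        "words": [{"word": "life"}, {"word": "moves"}, {"word": "pretty"}, {"word": "fast"}],
    }]}]},
}

for row in summary_records(example):
    print(row)
```

Passing the resulting list to pd.DataFrame and then knio.Table.from_pandas, as shown earlier in the thread, gives a table with one column per key.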


Maya · January 25, 2023, 2:39pm

The item response is of type dict.
Thanks to you, I understood how to get to the transcript (or one particular “column”) and get it into the KNIME table. But I have issues trying to bring more than one column into the table.

Also, do I need to parse the whole JSON into columns/rows to be able to work with the whole response?
My lack of knowledge there results in errors, which is why I was using KNIME for parsing before.
Also, the response might be different every time, depending on the type of audio file, so is there a way to bring the whole response into a KNIME table and parse it afterwards using the JSON nodes?

DiaAzul · January 25, 2023, 3:10pm

@Maya

Thanks for the additional information. If you want to bring the response into KNIME as JSON, you can convert the dictionary to a JSON string and return that as a cell in the KNIME table, as follows. Note that you will need to import json and pandas as pd. I’ve included a dummy dictionary to demonstrate the workflow. If you already have the response as a JSON string, you do not need to convert it; you could just return it in the DataFrame by changing json_string to response.

import json
import knime.scripting.io as knio
import pandas as pd

response = { 
  "one": 1, 
  "two": 2, 
  "three": 3
}

json_string = json.dumps(response)

df = pd.DataFrame([{"json": json_string}])

knio.output_tables[0] = knio.Table.from_pandas(df)

Once you have exported the JSON string, you can convert it to a JSON object using the String to JSON node. The script and nodes are in the attached workflow.
JSON.knwf (7.3 KB)
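A quick way to check that the single JSON cell keeps the whole response is to round-trip it in plain Python: json.dumps turns the dictionary into a string, and json.loads (the equivalent of the parsing step the String to JSON node performs on the KNIME side) gives back an equal dictionary. The response below is a trimmed stand-in for the Deepgram output, not real API data.

```python
import json

# Stand-in for the Deepgram response dict (trimmed from the posted output)
response = {
    "metadata": {"duration": 19.0, "channels": 1},
    "results": {"channels": [{"alternatives": [{"transcript": "You could miss it."}]}]},
}

# What the Python Script node puts into the single "json" cell of the output row
rows = [{"json": json.dumps(response)}]

# Parsing the cell again recovers the original structure unchanged
restored = json.loads(rows[0]["json"])
print(restored == response)  # True
print(restored["results"]["channels"][0]["alternatives"][0]["transcript"])
```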


Maya · January 25, 2023, 4:35pm

Amazing!
This works for me perfectly.
Thank you!

system · February 1, 2023, 4:36pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.