How to specify relative- and wildcard-paths in JSON Transformer node?

ekadagami · April 19, 2024, 3:35pm

The following example JSON structure is given:

{
  "membership" : [ {
    "number" : "1",
    "date" : "2023-12-01"
  } ],
  "addresses" : [ {
    "id" : "123",
    "company_name" : "xyz",
    "salutation" : "mr"
  },{
    "id" : "456",
    "company_name" : "dfg",
    "salutation" : "ms"
  } ]
}

Using a JSON Transformer Node, I would like to delete all (!) possible fields with “salutation”. The number of list fields is not fixed, so it is possible that “addresses” contains none, one or more elements.

This expression works, but only if addresses contains exactly one element.

[
{ "op": "remove", "path": "/membership" },
{ "op": "remove", "path": "/addresses/0/salutation" }
]

How do I formulate the expression correctly? In any case, this one doesn’t work:

[
{ "op": "remove", "path": "/membership" },
{ "op": "remove", "path": "/addresses/*/salutation" }
]

If I interpret the RFC document correctly, then there are no wildcards and I can only specify absolute paths?

RFC 6902 - JavaScript Object Notation (JSON) Patch (ietf.org)

Should this be the case: How do I reach my goal? Normally I would specify a JSON path as follows, but unfortunately this does not work here…

$.addresses[*].salutation

takbb · April 19, 2024, 6:46pm

Hi @ekadagami , in truth I also cannot see a mechanism for using wildcards or regex within the JSON Patch syntax. Hopefully if such a syntax exists, somebody will chime in with it!

In the meantime, my only suggestion would be to transform the set of paths into a JSON Transformer patch script. That of course is perhaps easier said than done, so here is a (draft) component to try to do just that:

Here is a demo workflow using this component with your example json data:

ekadagami · April 20, 2024, 7:53am

That is an interesting approach. Since it had to be done quickly yesterday, I wrote a small Python script with which I can easily edit the nested JSON nodes. As always, my last resort.

import knime.scripting.io as knio
import json
import re
from pandas import DataFrame

# Convert the input ArrowSourceTable to a Pandas DataFrame
df: DataFrame = knio.input_tables[0].to_pandas()

def process_json(json_list):
    new_json_list = []
    # Regex pattern to clean phone numbers
    phone_pattern = re.compile(r'(?<!^\+)[^0-9]+')
    
    for item in json_list:
        # Check if item is in string format and convert to dictionary
        if isinstance(item, str):
            item = json.loads(item)
        
        # Remove keys
        keys_to_remove = ['salutation', 'title', 'id']
        for key in keys_to_remove:
            item.pop(key, None)
        
        # Rename 
        keys_to_rename = {
            'company_name': 'company',
            'is_main_address': 'default',
            'zip_code': 'zip',
            'phone_number': 'phone'
        }
        for old_key, new_key in keys_to_rename.items():
            if old_key in item:
                item[new_key] = item.pop(old_key)
        
        # Clean the 'phone' entry if it exists
        if 'phone' in item and item['phone'] is not None:
            item['phone'] = phone_pattern.sub('', item['phone'])
        
        # Append the modified dictionary to the new list
        new_json_list.append(item)
    
    # Serialize the list of dictionaries back to a string
    return json.dumps(new_json_list)

# Check if the 'addresses' column exists and apply the function
if 'addresses' in df.columns:
    df['addresses'] = df['addresses'].apply(process_json)

# Output the modified DataFrame
knio.output_tables[0] = knio.Table.from_pandas(df)

I noticed that Python scripts in KNIME cannot return complex data structures, so I had to serialize the list of dictionaries back to a string and use a string2JSON node afterwards. But that’s another story.

system · July 19, 2024, 7:53am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.