Hi KNIME community,
I’m facing an issue after migrating a workflow to KNIME 5.3.2. The String to JSON node fails when processing large JSON strings, showing this error:
ERROR String to JSON 6:6577 Execute failed: String value length (20051112) exceeds the maximum allowed (20000000, from StreamReadConstraints.getMaxStringLength()) in row: Row0
From my research, this limit is likely enforced by the internal JSON parser (possibly Jackson). According to its StreamReadConstraints documentation, the default max string length is 20,000,000 characters, controlled by maxStringLength(int maxStringLen).
Relevant methods from StreamReadConstraints.Builder:
- maxStringLength(int maxStringLen): Sets the max string length.
- maxDocumentLength(long maxDocLen): Sets the max document length.
- maxNestingDepth(int maxNestingDepth): Limits nested objects/arrays.
What I’ve tried
- Conversion using downstream nodes:
  - Python Script node (can return JSON columns for strings under the limit, but returns a null result for string lengths over the limit; see the sketch after this list)
  - Column Expressions node (can convert strings to JSON columns under the limit, but fails with “Error while parsing” for string lengths over the limit)
- Adding a line to the knime.ini file, such as:
  -DStreamReadConstraints.maxStringLength=30000000
  (This had no effect.)
- Looking for ways to configure Jackson settings within KNIME nodes, with no success.
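A minimal sketch of that kind of string-to-JSON conversion in a Python Script node (the column name Large_JSON_String is an assumption):
import knime.scripting.io as knio  # KNIME Python Script node I/O
import json                        # JSON processing
# Read the input table into pandas
df = knio.input_tables[0].to_pandas()
# Parse each JSON string into a Python dict; KNIME maps dict cells to JSON
# cells on output. For strings over the 20,000,000-character limit the
# resulting JSON cell comes back null, as described above.
df["Parsed_JSON"] = df["Large_JSON_String"].apply(json.loads)
knio.output_tables[0] = knio.Table.from_pandas(df)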
Questions:
- Can I adjust these constraints in KNIME (e.g., via knime.ini)?
- Is there another way to configure this, such as a custom node or solution?
- Has anyone encountered and resolved this issue?
Any advice or suggestions would be greatly appreciated! Thanks in advance for your support.
Best regards,
M. Wakileh
I am adding a small Python script that generates a JSON-compatible test string that is over the max limit (or under it, if modified):
import knime.scripting.io as knio # Integration for KNIME data & image handling
import pandas as pd # Data handling & manipulation
import json # JSON processing
# Generate a very long string of repeated characters (e.g., 'A')
#long_string = 'A' * 19999900 # size under the 20 million characters limit
long_string = 'A' * 20000001 # size to ensure it's over 20 million characters
# Create a simple JSON object with the long string
simple_json = {
"id": 1,
"description": "This is a test JSON with a long string.",
"data": long_string
}
# Convert JSON object to a JSON string
large_json_string = json.dumps(simple_json)
# Output the JSON string as a table
output_table = pd.DataFrame({"Large_JSON_String": [large_json_string]})
# Convert the Pandas DataFrame back to a KNIME table
knio.output_tables[0] = knio.Table.from_pandas(output_table)
In addition to the above Python script (which returns a string column), the following script illustrates how automatic JSON handling is lost in the Python Script node. This limitation extends to the other nodes mentioned above (String to JSON, Column Expressions, and Python Script), though each exhibits a different failure mode.
import knime.scripting.io as knio # Integration for KNIME data & image handling
import pandas as pd # Data handling & manipulation
# Generate a very long string of repeated characters
#long_string = 'A' * 20000001 # size to ensure it's over 20 million characters - Json output is null
long_string = 'A' * 19999900 # size to ensure it's under 20 million characters - Json output is as expected
# Create a Python dictionary (JSON-like structure)
simple_json = {
"id": 1,
"description": "This is a test JSON with a long string.",
"data": long_string
}
# Instead of converting the dictionary to a JSON string,
# pass the JSON-compatible Python dictionary directly to the DataFrame
output_table = pd.DataFrame([{"Large_JSON_Data": simple_json}])
# Convert the Pandas DataFrame back to a KNIME table
knio.output_tables[0] = knio.Table.from_pandas(output_table)
Hey @mwakileh,
Let me get in touch with one of the devs to take a look and see if there’s a workaround. I’ll try to get back to you as soon as possible.
@mwakileh,
So there is no easy workaround right now, as the limit is tied into the code itself; as you mention, it comes from Jackson, which is not managed by us. It was put in place mostly to guard against abuse, as can be seen in Add configurable processing limits for JSON parser (`StreamReadConstraints`) · Issue #637 · FasterXML/jackson-core · GitHub.
There would need to be a change to the source code itself, or an optional argument would have to be exposed to allow configuration of larger allowed sizes.
I will keep an eye out to see what they say, but for now I would just chunk your input down so you avoid running into this.
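A minimal sketch of what chunking a long string column could look like in a Python Script node (the column name Large_JSON_String and the chunk size are assumptions; how the chunks are reassembled downstream is application-specific):
import knime.scripting.io as knio  # KNIME Python Script node I/O
import pandas as pd                # Data handling & manipulation
CHUNK_SIZE = 19_000_000  # stay safely under the 20,000,000-character limit
df = knio.input_tables[0].to_pandas()
long_text = df["Large_JSON_String"].iloc[0]  # illustrative column name
# Split the single long string into row-wise chunks; each chunk stays under
# the parser limit, and a later step has to put the pieces back together.
chunks = [long_text[i:i + CHUNK_SIZE] for i in range(0, len(long_text), CHUNK_SIZE)]
out = pd.DataFrame({"Chunk_Index": range(len(chunks)), "Chunk": chunks})
knio.output_tables[0] = knio.Table.from_pandas(out)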
TL
@thor_landstrom
Hi Thor,
Thanks for checking in! I’ve begun chunking the data where feasible, but for some applications, like the transfer to our legacy document and PDF rendering/mailing system, splitting the documents isn’t practical. In one instance, I had to reduce the resolution of images within a base64-encoded XML document to an unreasonable level just to keep a previously working workflow functional. I hope a better solution can be found.
Best regards,
Michael