Table Reader Caches Columns breaking workflow execution

Hi,

I intermittently run into an issue that I struggle to reproduce. Whilst I have a pretty good understanding of the circumstances, I still cannot reproduce it in a workflow to share with you (but might be able to soon).

The issue
When a Table Reader initially reads a table, it seems to cache its columns as well as their data types. If the table being read is later altered (column count or column type changes), the Table Reader fails with an error that breaks the workflow execution, regardless of whether any transformation happens.

Like so:
ERROR Table Reader 4:848:0:1018:0:1007 Execute failed: The column 'Piano Tag' can't be converted to the configured data type 'WebElement'. Change the target type or uncheck the enforce types option to map it to a compatible type.
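To make the suspected behavior concrete, here is a minimal sketch in Python. It is only a model of what I think happens, not KNIME's actual implementation: the function name, the exception class, and the "String" type of the changed column are all assumptions for illustration.

```python
# Hypothetical model of a reader that remembers the column types it saw
# when it was first configured. This is NOT KNIME's implementation.


class SpecMismatchError(Exception):
    """Raised when a column no longer matches its remembered type."""


def read_with_cached_spec(cached_spec, current_spec, enforce_types=True):
    """Return the column spec the reader would output.

    Both specs map a column name to a type name.
    """
    result = {}
    for column, current_type in current_spec.items():
        # Columns unknown to the cache simply take the type found in the file.
        configured_type = cached_spec.get(column, current_type)
        if configured_type != current_type:
            if enforce_types:
                raise SpecMismatchError(
                    f"The column '{column}' can't be converted to the "
                    f"configured data type '{configured_type}'."
                )
            # Without enforcement, fall back to the type found in the file.
            configured_type = current_type
        result[column] = configured_type
    return result


# Assumed example: the column was a WebElement when the node was configured
# and is a String in the altered file.
cached = {"Piano Tag": "WebElement", "Id": "Integer"}
current = {"Piano Tag": "String", "Id": "Integer"}

try:
    read_with_cached_spec(cached, current, enforce_types=True)
except SpecMismatchError as err:
    print(err)

print(read_with_cached_spec(cached, current, enforce_types=False))
```

In this model the failure depends only on the remembered spec and the enforcement flag, which matches what I observe: the data itself is readable, but the node refuses it.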

I wonder if this is a bug or a feature. I can see some advantage in this, but without an option to control the behavior it is more a bug than a feature.

PS: This is also the primary reason why I am working on several tickets around column type interpretation, mapping, and enforcement, like:

Edit
After updating all cached columns I suddenly got this totally new error. No change to column count, type, or order was made.

ERROR Table Reader 4:848:0:1018:0:1007 Execute failed: class org.knime.core.data.columnar.table.ColumnarContainerTableLoader$SavedColumnarContainerTable cannot be cast to class org.knime.core.data.container.BufferedContainerTable (org.knime.core.data.columnar.table.ColumnarContainerTableLoader$SavedColumnarContainerTable is in unnamed module of loader org.eclipse.osgi.internal.loader.EquinoxClassLoader @54610a47; org.knime.core.data.container.BufferedContainerTable is in unnamed module of loader org.eclipse.osgi.internal.loader.EquinoxClassLoader @1c610f)

Best
Mike

I found this feature, but it can only be disabled if the “Files in Folder” option is chosen.

Fail on differing specs
If checked, the node will fail if multiple files are read via the Files in folder option and not all files have the same table structure i.e. the same columns.
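As I understand the description, the check amounts to comparing the column layout of every file against the first one. A rough sketch in Python, assuming each file's spec is a list of (column name, type name) pairs; the function name and the file names are made up for illustration:

```python
def files_with_differing_specs(specs_by_file):
    """Return the names of files whose columns differ from the first file's.

    specs_by_file maps a file name to a list of (column name, type name)
    pairs, in the order the files were read.
    """
    names = list(specs_by_file)
    if not names:
        return []
    reference = specs_by_file[names[0]]
    return [name for name in names[1:] if specs_by_file[name] != reference]


# Hypothetical folder content: the second file has an extra column.
specs = {
    "a.table": [("Id", "Integer"), ("Piano Tag", "String")],
    "b.table": [("Id", "Integer"), ("Piano Tag", "String"), ("Extra", "String")],
    "c.table": [("Id", "Integer"), ("Piano Tag", "String")],
}

print(files_with_differing_specs(specs))
```

With a single file there is nothing to compare against, which would explain why the option is tied to reading several files.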

Now I am constantly getting the following error, but only when writing to the table for the second time:

ERROR Table Reader 3:848:0:1018:0:1025 Execute failed: Writing to table process threw "ClassCastException"

I managed to pin it down to a particular column of type JSON. I cannot share the real data in public as it is customer related, but strangely, upon replacing:

  1. [a-zA-Z] by a
  2. [0-9] by 0

The error is gone. Even stranger: when trying the above anonymization via the String Manipulation (Multiple Column) node, I got the error below, which forced me to use the regular String Replace instead. I do not use a variable reference, though, nor is a $ to be found in the JSON, and I converted the JSON to a String column. The expression was:

regexReplace(regexReplace($$CURRENTCOLUMN$$, "[a-zA-Z]", "a"), "[0-9]", "0")

Invalid settings:
Invalid type identifier for variable in line 1: n

I am trying to identify the possible cause in the data but that JSON is … large.
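For anyone who wants to apply the same anonymization outside KNIME, the two replacements can be sketched in Python like this. The function name and the sample string are made up; the real data is not shown.

```python
import re


def anonymize(text):
    """Replace every ASCII letter by 'a' and every digit by '0'."""
    text = re.sub(r"[a-zA-Z]", "a", text)
    return re.sub(r"[0-9]", "0", text)


# Made-up sample, not the customer data.
sample = '{"name": "Mike", "id": 42}'
print(anonymize(sample))  # prints {"aaaa": "aaaa", "aa": 00}
```

Note that this kind of replacement does not keep the text valid JSON in general: a number like 42 becomes 00, and an escape such as \n becomes \a. That is one reason to run it on the String representation rather than on the JSON column itself.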

Best
Mike

Adding more precision to the two connected errors, ClassCastException and “Invalid type identifier for variable”:

  1. It is caused by the data and is independent of the data type (I converted the JSON to String but the issue persisted)
  2. Replacing numbers by 0 did not resolve it
  3. Replacing a-z and A-Z (to share the data anonymized for reproduction) resolved it
  4. Replacing line breaks by the cached value resolved it (quite ridiculous)
  5. Replacing any character by itself resolved it as well

This is :crazy_face:

Hello @mwiegand ,
if you are reading a single file which might change, please disable the “Enforce types” option in the Transformation tab. For more details on how to work with files that might change, have a look at this section of the File Handling Guide.

We are still looking into the ClassCastException problems and will get back to you as soon as we know more. Sorry for the inconvenience.

Bye
Tobias
