String to XML: Self closing img-Tag not supported - with FALLBACK!

Hi,

This was already reported back in 2020, but no one has picked it up so far:

WARN  String to XML        5:2        Cell in row:"Row0" and column "HTML String" could not be parsed: The element type "img" must be terminated by the matching end-tag "</img>". Add missing value.
Cell in row:"Row1" and column "HTML String" could not be parsed: The element type "img" must be terminated by the matching end-tag "</img>". Add missing value.

Here is the test workflow:

PS: In case anyone finds this post, I took the liberty of including a fallback using simple RegEx replacements:

| RegEx | Replacement | Comment |
| --- | --- | --- |
| <(img)([^>]+>) | <$1$2</$1> | Convert self-closing tags |
| <use[^>]+> | (empty) | Remove non-compliant tags |


The capturing group in the first row, (img), can be extended to cover other tags like so: (img|ANOTHERTAG).
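
For readers outside KNIME, here is a minimal Python sketch of the two replacement rules above. It assumes Python's `re` module, which writes backreferences as `\1` rather than `$1`; the sample string and the `apply_fallback` helper are made up for illustration and are not taken from the workflow.

```python
import re
import xml.etree.ElementTree as ET

# The two rules from the post, with $1-style backreferences rewritten as \1.
RULES = [
    (r"<(img)([^>]+>)", r"<\1\2</\1>"),  # append an explicit end tag after each <img ...>
    (r"<use[^>]+>", ""),                 # remove <use ...> tags entirely
]

def apply_fallback(html: str) -> str:
    """Apply each replacement rule in order (hypothetical helper)."""
    for pattern, replacement in RULES:
        html = re.sub(pattern, replacement, html)
    return html

# Hypothetical input with an unclosed <img> and a <use> tag.
sample = '<div><img src="a.png" alt="A"><use href="#icon"></div>'
fixed = apply_fallback(sample)

# The result should now be accepted by a strict XML parser.
root = ET.fromstring(fixed)
print(fixed)
```

One caveat with the first rule as written: it also matches an already self-closed `<img ... />` and would append a second `</img>` after it, so it is best applied only to input whose img tags are not closed yet.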

Best
Mike

Hi @mwiegand,

I see two errors in the string values you are trying to convert to XML in your example workflow:

  1. In well-formed XML, every element must be closed [1]. So your <img> tags must at least be self-closed like this: <img ... />. The String to XML node will then take care of the closing tag and add </img>. Remember that HTML is not necessarily valid XML [2].

  2. The “xlink” namespace prefix is used but not declared [3]. The missing declaration is: xmlns:xlink="http://www.w3.org/1999/xlink"

By fixing the mentioned issues, I could convert the strings to XML without any problem.
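
Both issues can be reproduced outside KNIME with any strict XML parser. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the sample strings and the `is_well_formed` helper are invented for illustration and are not the strings from the workflow.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts the string (hypothetical helper)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Issue 1: an <img> that is never closed vs. a self-closed one.
unclosed_img = '<p><img src="a.png"></p>'
closed_img = '<p><img src="a.png" /></p>'

# Issue 2: the xlink prefix used without and with a declaration.
undeclared = '<svg><use xlink:href="#icon" /></svg>'
declared = ('<svg xmlns:xlink="http://www.w3.org/1999/xlink">'
            '<use xlink:href="#icon" /></svg>')

for label, text in [("unclosed img", unclosed_img), ("closed img", closed_img),
                    ("undeclared xlink", undeclared), ("declared xlink", declared)]:
    print(label, "->", is_well_formed(text))
```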

[1] XML Validator
[2] html - Is HTML5 valid XML? - Stack Overflow
[3] XML, XLink and XPointer


Hi @armingrudd,

Thanks for your feedback, but if you look at my workflow, you will see that I was well aware of the closing-tag format and played through the <img /> scenario. As for the colons in the attribute name, this is a recognized issue that I reported but which has not been addressed so far:

Best
Mike

I checked your workflow thoroughly.

The first Table Creator has issue number one (missing </img> or <img ... />).

The second Table Creator has the closing tag (<img ... />), but both tables (1 & 2) still lack the declaration of the namespace they use (“xlink”).

As I mentioned back then, colons in XML names mark a namespace prefix. So, when you have a name like “a:b”, “a” is treated as a namespace prefix, which has to be declared.

The first two Table Creator nodes are for illustration purposes, to show the issue. The second one in particular has, as far as I can recall, proper <img ... /> tags, but the issue still persists. Maybe I made a mistake here, but I thought I had ensured that all three img tags were coded properly.

The namespace isn’t the main focus of that workflow as I “grabbed” the code as a fragment from a whole document.

As already mentioned, the second table is missing the namespace declaration.
Here is a modified version of your workflow where I just added two String Manipulation nodes to show what the problem is with the second table. In one, I removed the namespace prefix, and in the other, I declared it. Both results can then be converted to XML.
78954 - String to XML does not support self closing tags.knwf (106.4 KB)
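
As a rough Python analogue of those two String Manipulation steps (not the actual expressions used in the attached workflow, which are not shown here), the sketch below either strips the prefix or adds the declaration to the root element. The helper names and the sample fragment are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"

def remove_prefix(xml_text: str, prefix: str = "xlink") -> str:
    """Option 1: drop the prefix, so 'xlink:href' becomes plain 'href'."""
    return re.sub(rf"\b{re.escape(prefix)}:", "", xml_text)

def declare_prefix(xml_text: str, prefix: str = "xlink", uri: str = XLINK_NS) -> str:
    """Option 2: add an xmlns declaration to the first (root) start tag."""
    return re.sub(r"<([A-Za-z][\w.-]*)", rf'<\1 xmlns:{prefix}="{uri}"',
                  xml_text, count=1)

# Hypothetical fragment that uses the xlink prefix without declaring it.
fragment = '<svg><use xlink:href="#icon" /></svg>'

without_prefix = remove_prefix(fragment)
with_declaration = declare_prefix(fragment)

# Both variants should now parse; the second keeps the namespaced attribute.
ET.fromstring(without_prefix)
root = ET.fromstring(with_declaration)
print(without_prefix)
print(with_declaration)
```

Stripping the prefix is simpler but loses the information that the attribute belongs to XLink, whereas declaring the prefix keeps the fragment semantically intact.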

That explains the missing namespace declaration in the second table then.