This was already reported back in 2020, but no one felt motivated to pick it up:
WARN String to XML 5:2 Cell in row:"Row0" and column "HTML String" could not be parsed: The element type "img" must be terminated by the matching end-tag "</img>". Add missing value.
Cell in row:"Row1" and column "HTML String" could not be parsed: The element type "img" must be terminated by the matching end-tag "</img>". Add missing value.
The test workflow is as follows:
PS: In case anyone finds this post, I took the liberty of including a fallback using simple RegEx:
| RegEx | Replacement | Comment |
|---|---|---|
| <(img)([^>]+>) | <$1$2</$1> | Convert self-closing tags |
| <use[^>]+> | | Remove non-compliant tags |
?
The capturing group in the first row, (img), can be extended with other tags like so: (img|ANOTHERTAG)
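For readers outside KNIME, the fallback above can be sketched in Python. This is only an illustration under a few assumptions: the helper names `close_unclosed_tags` and `remove_use_tags` are hypothetical, Python writes back-references as `\1` rather than `$1`, and the negative lookbehind `(?<!/)` is an addition that is not part of the original pattern, so that tags already written as `<img ... />` are left alone.

```python
import re
import xml.etree.ElementTree as ET


def close_unclosed_tags(text: str, tags: str = "img") -> str:
    """Add an explicit end tag after each opening tag that is not self-closed.

    `tags` is a regex alternation, e.g. "img" or "img|br".
    """
    # Pattern from the table, plus a lookbehind (an addition, not in the
    # original) that skips tags ending in "/>".
    pattern = re.compile(rf"<({tags})([^>]+(?<!/)>)")
    return pattern.sub(r"<\1\2</\1>", text)


def remove_use_tags(text: str) -> str:
    """Drop every <use ...> tag, mirroring the second table row."""
    return re.sub(r"<use[^>]+>", "", text)


fragment = '<div><img src="a.png"><img src="b.png" /><use xlink:href="#x" /></div>'
fixed = remove_use_tags(close_unclosed_tags(fragment))
root = ET.fromstring(fixed)  # raises ParseError if still not well-formed
print(fixed)
print(len(root.findall("img")))
```

Like the original pattern, this requires at least one character after the tag name, so a bare `<img>` without attributes is not touched. Regex-based repair is a stopgap; it will not cope with `>` inside attribute values.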
I see two errors in the string values you are trying to convert to XML in your example workflow:
1. In well-formed XML, every element must be closed [1]. So your <img> tags must at least be self-closed like this: <img ... />. The String to XML node then takes care of the closing tag and adds </img>. Remember that HTML is not necessarily valid XML [2].
2. The “xlink” namespace prefix is used but not declared [3]. The missing declaration is: xmlns:xlink="http://www.w3.org/1999/xlink"
By fixing the mentioned issues, I could convert the strings to XML without any problem.
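Both errors can be reproduced outside KNIME with any namespace-aware XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` rather than the String to XML node, so the exact error wording will differ from the node's warning; the helper `parse_error` and the sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def parse_error(text: str) -> Optional[str]:
    """Return the parser's error message, or None if `text` is well-formed."""
    try:
        ET.fromstring(text)
        return None
    except ET.ParseError as exc:
        return str(exc)


# Hypothetical sample cells, one per case discussed above.
samples = {
    "unclosed img": '<p><img src="a.png"></p>',
    "self-closed img": '<p><img src="a.png" /></p>',
    "undeclared xlink": '<svg><use xlink:href="#icon" /></svg>',
}

for label, text in samples.items():
    print(label, "->", parse_error(text) or "well-formed")
```

Only the self-closed sample parses; the other two are rejected for the closing tag and the undeclared prefix respectively.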
Thanks for your feedback, but if you look at my workflow, I was well aware of the closing-tag format and played through the <img /> scenario. As for the colons in the attribute name, this is a recognized issue that I reported but which has not been addressed so far:
The first Table Creator has issue number one (missing </img> or <img ... />).
The second Table Creator has the closing tag (<img ... />), but both tables (1 & 2) still fail to declare the namespace used (“xlink”).
As I mentioned back then, colons in XML names mark a namespace prefix. So, when you have a name like “a:b”, “a” is treated as a namespace prefix, which has to be declared.
The first two Table Creator nodes are for illustration purposes, to show the issue. The second one in particular has, as far as I can recall, proper <img ... /> tags, but the issue still persists. Maybe I made a mistake here, but I thought I had ensured that all three img tags were coded properly.
The namespace isn’t the main focus of that workflow as I “grabbed” the code as a fragment from a whole document.
As already mentioned, the second table is missing the namespace declaration.
Here is the modified version of your workflow, where I just added 2 String Manipulation nodes to show what the problem is with the second table. In one, I removed the namespace, and in the other, I declared it. Both can then be converted to XML.
78954 - String to XML does not support self closing tags.knwf (106.4 KB)
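The two fixes can be approximated in Python as follows. How the String Manipulation nodes are configured in the attached workflow is not shown in this thread, so both regexes, the function names, and the sample cell are assumptions; the sketch also assumes the prefix is not already declared.

```python
import re
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"


def strip_xlink_prefix(text: str) -> str:
    """Fix A: remove the 'xlink:' prefix so no declaration is needed."""
    return re.sub(r"\bxlink:", "", text)


def declare_xlink(text: str) -> str:
    """Fix B: add the xlink declaration to the first start tag in the string."""
    return re.sub(
        r"<([A-Za-z][\w.-]*)",
        rf'<\1 xmlns:xlink="{XLINK_NS}"',
        text,
        count=1,
    )


cell = '<svg><use xlink:href="#icon" /></svg>'  # hypothetical cell value

stripped = ET.fromstring(strip_xlink_prefix(cell))
declared = ET.fromstring(declare_xlink(cell))

print(stripped[0].attrib)  # attribute is now plain 'href'
print(declared[0].attrib)  # attribute is qualified with the xlink namespace
```

Declaring the prefix preserves the meaning of the attribute, whereas stripping it turns `xlink:href` into a plain `href`, which may or may not matter for later processing.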
That explains the missing namespace declaration in the second table, then.