Dear all,
Hope you can help me with the following.
When I use the HttpRetriever to request information from an API server, I sometimes receive an "empty" (self-closing) XML tag that stands for both the opening and the closing tag, for example <prism:pageRange /> (see point 1 below).
It seems that the HTML parser nodes in KNIME "normalize" these types of "empty" XML tags, but the current HtmlParser does not always do this correctly: it treats the empty tag as the parent of the next tag (see point 2 below). The old, deprecated NekoHtmlParser seems to have no problem "normalizing" these "empty" XML tags correctly (see point 3 below).
Why does the HtmlParser node cause this problem, and how can I best solve it? Should I simply use the NekoHtmlParser instead?
Many thanks in advance,
Ruben
1. Retrieved result via Web Browser (Chrome):
<entry>
<prism:url>***</prism:url>
<dc:title>***</dc:title>
<prism:pageRange />
<prism:doi>***</prism:doi>
</entry>
2. Parsed result via HtmlParser (Palladian for KNIME 1.6.100.v201607071900):
<entry ...>
<prismU00003Aurl>***</prismU00003Aurl>
<dcU00003Atitle>***</dcU00003Atitle>
<prismU00003Apagerange>
<prismU00003Adoi>***</prismU00003Adoi>
</prismU00003Apagerange>
</entry>
3. Parsed result via NekoHtmlParser:
<entry ...>
<prism:url>***</prism:url>
<dc:title>***</dc:title>
<prism:pagerange>
</prism:pagerange>
<prism:doi>***</prism:doi>
</entry>
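
For comparison outside KNIME, here is a minimal Python sketch of how a strict XML parser reads the sample from point 1. It is only an illustration under assumptions: the namespace URIs are placeholders (the real declarations are not shown above), the helper name child_summary is made up for this example, and it uses Python's standard xml.etree.ElementTree rather than the Palladian or NekoHTML parsers.

```python
import xml.etree.ElementTree as ET

# Assumption: placeholder namespace URIs, since the real xmlns declarations
# are elided in the samples above.
SAMPLE = """\
<entry xmlns:prism="http://example.org/prism" xmlns:dc="http://example.org/dc">
  <prism:url>***</prism:url>
  <dc:title>***</dc:title>
  <prism:pageRange />
  <prism:doi>***</prism:doi>
</entry>
"""


def child_summary(xml_text):
    """Return (local tag name, number of child elements) for each child of the root."""
    root = ET.fromstring(xml_text)
    # ElementTree stores tags as "{namespace-uri}local"; keep only the local part.
    return [(child.tag.split("}")[-1], len(child)) for child in root]


summary = child_summary(SAMPLE)
for name, n_children in summary:
    print(name, n_children)
```

With XML parsing rules, the empty tag is read as an element with no children, and prism:doi stays its sibling rather than becoming its child, which is the same nesting as in point 3.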