I am retrieving a URL with the Palladian HttpRetriever, which returns a JSONP response that looks like this (truncated):
jsonp121("<div class=\"skin-box-bd\"> ...
As you can see, the JSONP wraps HTML that gets inserted back into the parent HTML. Note that everything inside the parentheses is a single string, with the quotes and ampersands escaped. This retrieved JSONP contains all of the juicy data that I need to parse; the parent HTML is just a shell.
What I need to do is strip off the "jsonp121" wrapper, un-escape the quotes and ampersands, add back the html, head and body tags, and then parse the remaining HTML.
In other words, what I would like to do is this:
HttpRetriever (output HttpResultCell) --> JavaSnippet (output String) --> HttpParser (output XML)
Unfortunately, if HttpParser is passed a String, it assumes the string is the path to a local file, so this workflow won't work. If I first save the output string from the JavaSnippet to a file, HttpParser works just fine, but this is a very clunky way of doing it.
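For reference, here is a rough sketch of what the JavaSnippet step does, written as plain Java so it can be tried outside KNIME. The class and method names (JsonpUnwrap, unwrap, unescape, toHtml) are made up for illustration and are not part of Palladian or KNIME. It assumes the payload is a single double-quoted string with JavaScript-style backslash escapes; if the ampersands are escaped as HTML entities instead, they can be left for the HTML parser to decode.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: turns callback("...escaped html...") into a full HTML document.
class JsonpUnwrap {
    // Callback name, opening parenthesis, one quoted string, closing parenthesis,
    // optional semicolon. Assumes the payload is exactly one string literal.
    private static final Pattern WRAPPER = Pattern.compile(
        "^\\s*[\\w$.]+\\s*\\(\\s*\"(.*)\"\\s*\\)\\s*;?\\s*$", Pattern.DOTALL);

    // Strip the callback wrapper and the outer quotes.
    static String unwrap(String jsonp) {
        Matcher m = WRAPPER.matcher(jsonp);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a JSONP call with a string payload");
        }
        return m.group(1);
    }

    // Undo backslash escapes (quote, slash, backslash, control characters,
    // and four-digit hex code units).
    static String unescape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '\\' || i + 1 >= s.length()) {
                out.append(c);
                continue;
            }
            char n = s.charAt(++i);
            switch (n) {
                case 'n': out.append('\n'); break;
                case 'r': out.append('\r'); break;
                case 't': out.append('\t'); break;
                case 'b': out.append('\b'); break;
                case 'f': out.append('\f'); break;
                case 'u':
                    if (i + 4 < s.length()) {
                        // Four hex digits follow the 'u'.
                        out.append((char) Integer.parseInt(s.substring(i + 1, i + 5), 16));
                        i += 4;
                    } else {
                        out.append("\\u"); // truncated escape, keep as-is
                    }
                    break;
                default: out.append(n); // escaped quote, slash or backslash
            }
        }
        return out.toString();
    }

    // Wrap the recovered fragment so a parser sees a complete document.
    static String toHtml(String jsonp) {
        return "<html><head></head><body>" + unescape(unwrap(jsonp)) + "</body></html>";
    }
}
```

Calling JsonpUnwrap.toHtml(...) on the retrieved body gives the String I would like to hand to the parser directly.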
I think the solution is to:
1. Have an intermediate node that converts a String back to an HttpResultCell, or
2. Have an HttpParser that will directly parse a string
A super-simple fix might be to add a selector to the HttpParser that tells the node to treat the string as a file or as an HTTP Result. But perhaps you have a better idea?