Here’s a workflow which may help get you started. My XPath skills are pretty limited, and I’ve been unable to get the values to parse correctly. Maybe someone with better XPath knowledge can help. The XML seems to be pretty poorly formed. Simple Web Scraper.knwf (112.8 KB)
Hi @rfeigel , the Date output from this workflow differs from the typical YYYY-MM-DD format. I tried to use the String to Date&Time node with the below options selected:
Hi @tone_n_tune, the String to Date&Time node requires that you specify the input format of the data as it appears in the string to be converted. (Dates themselves have no “format” as such, but when displayed in KNIME they are rendered in the yyyy-MM-dd format that you mentioned.)
In your screenshot, KNIME is telling us that the first cell contains a string “Jul 18, 2024”. Looking back on the earlier response from @rfeigel, this corresponds to the dates shown in the string “Date” column.
Therefore, the format that you need to specify is MMM d, yyyy
A single d is used here because it accepts both one- and two-digit days, whereas dd would require every day value to have two digits, e.g. 01, 02, … 10, …
This will then tell KNIME how to interpret the data, and it should then be able to convert it to a date.
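As far as I know, the pattern letters used by the node follow the conventions of Java’s java.time.DateTimeFormatter, so the difference between d and dd can be sketched in plain Java outside KNIME. The class and method names below are made up for illustration, and the English locale is an assumption based on the month abbreviation “Jul”.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;

class DatePatternSketch {

    // Parses the text with the given pattern.
    // Locale.ENGLISH is an assumption so that "Jul" is recognised as a month.
    static LocalDate parse(String text, String pattern) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern(pattern, Locale.ENGLISH);
        return LocalDate.parse(text, fmt);
    }

    // Returns true if the text can be parsed with the pattern, false otherwise.
    static boolean parses(String text, String pattern) {
        try {
            parse(text, pattern);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The value from the screenshot, read with the suggested pattern.
        System.out.println(parse("Jul 18, 2024", "MMM d, yyyy"));
        // A single-digit day: compare "d" with "dd".
        System.out.println(parses("Jul 8, 2024", "MMM d, yyyy"));
        System.out.println(parses("Jul 8, 2024", "MMM dd, yyyy"));
    }
}
```

In this sketch the single-digit day should parse with d and be rejected with dd, which is why d is the safer choice when the column mixes one- and two-digit days.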
If you want the date to be output in a specific format, other than the default, you would need to convert it back to a string using Date&Time to String, specifying the output format that you require.
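The round trip (string to date, then date back to a string in another format) can be sketched in plain Java as well. This is only an illustration of the idea, not what the KNIME nodes do internally; the output pattern dd/MM/yyyy and the names used are hypothetical, so substitute whatever format you actually need.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

class DateOutputSketch {

    // Reads the text with the input pattern, then writes it back out with the output pattern.
    // Locale.ENGLISH is an assumption, matching month names such as "Jul".
    static String reformat(String text, String inputPattern, String outputPattern) {
        LocalDate date = LocalDate.parse(text, DateTimeFormatter.ofPattern(inputPattern, Locale.ENGLISH));
        return date.format(DateTimeFormatter.ofPattern(outputPattern, Locale.ENGLISH));
    }

    public static void main(String[] args) {
        // "dd/MM/yyyy" is just an example output format.
        System.out.println(reformat("Jul 18, 2024", "MMM d, yyyy", "dd/MM/yyyy"));
    }
}
```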
If this resolves the new question, please leave @rfeigel’s response marked as the solution as that appears to solve the actual question posed on this thread, and only one post may be marked as the solution.
Ideally, once a question has been resolved, it is generally better to open a new, specific question for anything that is not part of the original one. In this case, the original question was only concerned with how to pull data without an API and didn’t specify anything about requiring a particular date format.
This is for several reasons:
People don’t always look at “solved” questions, so you may not get an answer as quickly.
There may be people who could quickly answer the new question (date conversions) who would not know anything about “how to pull data from a website that does not have an API” and so you reduce the potential responders.
Somebody else in the future with a question about date formatting probably won’t be able to find this answer so easily.