Finance data retrieval

mateenraj · February 9, 2021, 11:31pm

Hi,

I am trying to extract this data table from the website: COTY Earnings Date & Report (Coty Inc) - Investing.com. I have encountered a couple of issues:

1. The retrieved webpage is longer than 2,000 rows (it can be retrieved via GET or HTML retriever), and the data table also sits beyond row 2,000, so finding the proper XPath was difficult. With the help of another tool, this was resolved.
2. I couldn’t find an option in the String Manipulation node to append one string to the end of another. The URL of the data table needs the suffix “-earnings”.
3. The attached workflow retrieves only the header of the table, not the complete data. I think the XPath is not correct.

Can anyone please have a look at the attached workflow and suggest what is wrong? I spent some time trying different XPath options but couldn’t manage to retrieve the full data table.

Best,
Mateen

Investing_Experiment.knwf (272.2 KB)
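
For readers following along outside KNIME, here is a minimal Python sketch of the two stumbling blocks above: appending the “-earnings” suffix and selecting the body rows rather than only the header. The base URL, the table markup, and the cell values are all placeholders assumed for illustration, not the real Investing.com page. Inside KNIME, the String Manipulation node's join() function should cover the concatenation part, as far as I know.

```python
import xml.etree.ElementTree as ET

# Hypothetical base URL (placeholder, not the real page address).
base_url = "https://example.com/equities/coty-inc"
# Plain string concatenation appends the suffix.
earnings_url = base_url + "-earnings"

# Tiny stand-in for the earnings table; structure and values are assumed.
SAMPLE = """
<table id="earnings">
  <thead><tr><th>Col A</th><th>Col B</th></tr></thead>
  <tbody>
    <tr><td>A1</td><td>B1</td></tr>
    <tr><td>A2</td><td>B2</td></tr>
  </tbody>
</table>
""".strip()

table = ET.fromstring(SAMPLE)

# A path that stops at thead/th returns only the header cells ...
header = [th.text for th in table.findall("./thead/tr/th")]

# ... whereas the data lives in tbody/tr/td, so the rows need their own path.
rows = [[td.text for td in tr.findall("td")]
        for tr in table.findall("./tbody/tr")]

print(earnings_url)
print(header)
print(rows)
```

If only the header comes back, one thing worth checking is whether the XPath targets the th cells or the thead section instead of the tr/td elements in the table body. Another possibility is that the rows are loaded by JavaScript and are simply not present in the retrieved HTML.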

julian.bunzel · February 12, 2021, 2:41pm

Hi @mateenraj,

I will have a look and get back to you as soon as possible.

Best,

Julian

qqilihq · February 13, 2021, 5:03pm

Hi,

this is a super-perfect use case for the Table Extractor from the Selenium Nodes. The following workflow uses a loop to expand the table until it provides no further data, and then simply converts the HTML into a KNIME table. Add some text cleanup using Palladian’s Regex Extractor, and you’ll get a table like this:

I have just built an example workflow which looks like this:

You can download it from my public NodePit Space here:

Hope this helps!

–Philipp
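
The logic of the workflow described above (keep expanding until nothing new arrives, then clean up the cell text) can be sketched roughly in Python. This is not the Selenium Nodes workflow itself: `expand_until_exhausted`, `clean_cell`, and the fake `load_more` callable are hypothetical names invented for illustration, and the batches stand in for whatever a real "show more" click would return.

```python
import re
from typing import Callable, List


def expand_until_exhausted(load_more: Callable[[], List[str]],
                           max_rounds: int = 100) -> List[str]:
    """Call load_more repeatedly until it returns an empty batch."""
    cells: List[str] = []
    for _ in range(max_rounds):  # upper bound guards against endless loops
        batch = load_more()
        if not batch:            # no further data: stop expanding
            break
        cells.extend(batch)
    return cells


def clean_cell(text: str) -> str:
    """Collapse runs of whitespace and trim the ends (regex cleanup)."""
    return re.sub(r"\s+", " ", text).strip()


# Fake loader: two batches of placeholder cells, then nothing more.
batches = iter([["  A1 \n", "B1"], ["A2\t"], []])
cleaned = [clean_cell(c)
           for c in expand_until_exhausted(lambda: next(batches, []))]
print(cleaned)
```

The `max_rounds` cap is a design choice of this sketch, so that a page which never stops returning rows cannot hang the loop.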

mateenraj · May 26, 2021, 9:50am

Hi,

Sorry for the late reply. I tried your proposed approach, but unfortunately I could not reproduce it because I ran into several issues (e.g. licensing, missing nodes). Please see the attached screenshots.

SeleniumLicense

qqilihq · May 26, 2021, 10:38am

Looks like you were running an old version of the nodes. Make sure to update to the most recent one.

–P