The company I work for started using KNIME, and they are exploring the possibility of getting the server to automate many processes in the organization. I received the following request:
I need to create a flow that downloads a CSV containing the historical price of an economic index (MSCI WORLD INDEX) from businessinsider.com.
I’ve retrieved that type of data from Yahoo Finance in the past via web query, but this site does not let me do the same.
It seems there’s a way to do it via the “Webpage Retriever” and “XPath” nodes, but I am not knowledgeable in HTML. I checked all the examples I could find on this forum, but none of them are precise enough to compensate for my lack of knowledge in the matter.
This would be a huge amount of work. I am able to identify the table in the HTML, but the work goes beyond this.
For example, you would also need to convert the HTML table into a data table. Also, I’m not sure that the data is actually in the HTML; it might instead be generated dynamically via JavaScript (which it probably is, since you can sort the table and change the date range, and the results update dynamically).
And you would face the same issue regardless of whether you are using KNIME or any other webpage retriever application.
You need to check with them whether there is an API on their side that you can use to retrieve the data.
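To give an idea of what the "convert the HTML table into a table" step involves, here is a minimal sketch in Python using only the standard library. The sample markup, the class name `TableRows`, and the function name `html_table_to_rows` are all made up for illustration; they are not taken from the site in question or from any KNIME node. The sketch assumes a simple, non-nested table that is already present in the raw HTML.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_rows(html):
    """Return the table rows found in an HTML string (hypothetical helper)."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up sample markup, not copied from any real page.
SAMPLE = """
<table>
  <tr><th>Date</th><th>Close</th></tr>
  <tr><td>2020-01-02</td><td>100.5</td></tr>
  <tr><td>2020-01-03</td><td>101.0</td></tr>
</table>
"""

rows = html_table_to_rows(SAMPLE)
print(rows)

# A page whose raw HTML contains no table rows yields an empty list.
print(html_table_to_rows("<div id='chart'></div>"))
```

If the same parsing returns an empty list on the raw HTML of a real page, that would suggest the table is filled in by JavaScript after the page loads, in which case a plain page retriever will not see the data at all.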
ALL data is copyrighted, and you will need a license from the data owner before you can use it.
In the case of the website that you are accessing, the data owner is very clear in their terms and conditions that you cannot download and use the data.
@Thierry_Collins, you may want to seek out the data you need from an organisation that will license it to you. There will probably be a fee.
Apologies for seeming like a killjoy, but better to be aware of intellectual property rights than in court and paying a hefty fine.
Hi @DiaAzul, there is a Download button which allows you to download the data as CSV (I used it when I was looking for an alternative, and I was also trying to get the link from the button).