File Reader - Problem with loading the Corona datasets

d4t4v1z · March 21, 2020, 2:28pm

Hi there,

I have never used data from the web before, so this is new to me. I'm trying to load
the Novel Coronavirus (COVID-19) Cases Data Sets
located at
https://data.humdata.org/dataset/novel-coronavirus-2019-ncov-cases

The download links for the three time series look like this:
https://data.humdata.org/hxlproxy/api/data-preview.csv?url=https%3A%2F%2Fraw.githubusercontent.com%2FCSSEGISandData%2FCOVID-19%2Fmaster%2Fcsse_covid_19_data%2Fcsse_covid_19_time_series%2Ftime_series_19-covid-Confirmed.csv&filename=time_series_2019-ncov-Confirmed.csv

I'm trying to use the File Reader node to load those series into KNIME, but I have no clue how to turn the link into something the File Reader can handle.

After 2 hours of reading forum topics and surfing the web, I hope one of you guys
has a hint for me.

Best regards
Mat

Iris · March 21, 2020, 4:05pm

Hi @d4t4v1z

I would read the data directly from the Johns Hopkins GitHub: https://github.com/CSSEGISandData/COVID-19

The file you linked would then be: https://github.com/CSSEGISandData/COVID-19/raw/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv

You can insert this link directly into the File Reader.
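
For readers who want to check the link outside KNIME: the HDX download link appears to be a proxy URL that carries the underlying GitHub address in its `url` query parameter. Below is a minimal Python sketch, using only the standard library, that recovers that address. The function name `extract_source_url` is made up for illustration, and the sketch assumes the proxy keeps using a `url` parameter.

```python
from urllib.parse import urlparse, parse_qs


def extract_source_url(proxy_url: str) -> str:
    """Return the decoded value of the `url` query parameter (hypothetical helper)."""
    query = parse_qs(urlparse(proxy_url).query)
    if "url" not in query:
        raise ValueError("no 'url' parameter found in the link")
    # parse_qs already percent-decodes the values
    return query["url"][0]


# The download link quoted in the first post, split over several lines
hdx_link = (
    "https://data.humdata.org/hxlproxy/api/data-preview.csv"
    "?url=https%3A%2F%2Fraw.githubusercontent.com%2FCSSEGISandData%2FCOVID-19"
    "%2Fmaster%2Fcsse_covid_19_data%2Fcsse_covid_19_time_series"
    "%2Ftime_series_19-covid-Confirmed.csv"
    "&filename=time_series_2019-ncov-Confirmed.csv"
)
print(extract_source_url(hdx_link))
```

The printed address is a plain CSV link on raw.githubusercontent.com, which is the kind of URL a CSV reader can usually open directly.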

We are providing a workflow which parses this file already on the hub. You can find it here:

It actually parses a REST API which uses this data as well.

Let me know if I can help you more

d4t4v1z · March 21, 2020, 5:01pm

Hi Iris,

thanks for helping out, I will try out both workflow options!

Mat

knimediger · March 21, 2020, 5:45pm

First of all: Thank you for that workflow. Great visualisation of the COVID-19 pandemic.

One remark on this workflow: I have seen that, for some reason, the last day is removed from the data in the node “COVID-19 overview” (done by the row filter documented as “remove last day”). So as of today, yesterday's data is not visualised. Any clue why this is done?

Furthermore, I would like to use the visualisation to compare countries with regard to their growth rate. So I would like to align the data to the (typically different) starting dates of these countries. Would it be possible to add that feature?

Iris · March 21, 2020, 6:35pm

All fame goes to @paolotamag, he made this.
We need to ask him.

paolotamag · March 23, 2020, 8:42am

Hi @knimediger,

First Question: …So as of today yesterday's data is not visualized. Any clue why this is done?

This API updates every hour by checking for new rows in yet another source: a GitHub repository maintained by Johns Hopkins University. It often happens that the last day in the dataset is still missing new cases / deaths / recovered for some countries but not for others. When this is the case, I think the visualization of the last day is misleading because it shows only partial data, so we preferred to take it out. If you also want to display the most recent day, feel free to remove the row filter.

Second Question: …So I would like to align the data to (typically different) starting date of these countries. Would it be possible to add that feature?

We are already doing this in the last line plot in the last view. Check out this Twitter post to see what it looks like:

We use the first day with at least 20 cases as the start date. If you want to change that, find the row filter in the last component, on top of the Line Plot node “line shifted”, and change “20” to “1”.
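
To make the idea concrete outside KNIME, here is a minimal Python sketch of the same alignment: each country's series is cut so that it starts on the first day with at least `threshold` cases. This is not the workflow's actual implementation; the function name `align_to_threshold` and the sample numbers are invented for illustration.

```python
def align_to_threshold(cases_by_country, threshold=20):
    """Drop the days before each country first reaches `threshold` cases.

    `cases_by_country` maps a country name to its cumulative case counts,
    one value per day in chronological order. Index 0 of each returned
    list is that country's "day 0".
    """
    aligned = {}
    for country, counts in cases_by_country.items():
        start = next((i for i, c in enumerate(counts) if c >= threshold), None)
        if start is not None:  # countries that never reach the threshold are skipped
            aligned[country] = counts[start:]
    return aligned


# Made-up numbers, for illustration only
sample = {
    "A": [0, 3, 25, 60, 140],
    "B": [0, 0, 0, 21, 50],
    "C": [1, 2, 5, 9, 12],
}
print(align_to_threshold(sample))
```

With the default threshold of 20, country "C" drops out because it never reaches 20 cases. Calling `align_to_threshold(sample, threshold=1)` corresponds to changing “20” to “1” in the row filter.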

We wrote a nice article in Towards Data Science. You can find it here:

https://towardsdatascience.com/following-the-spread-of-coronavirus-23626940c125

All the best,
Paolo

knimediger · March 23, 2020, 6:17pm

@paolotamag Thank you for your quick support in adding this feature to the visualisation.
Now it is much easier to see the pandemic behaviour in the different countries.

But getting requests resolved creates new ideas: what about normalising the cases by country size (population)? Think about China (1.4 billion people) vs. Italy (60 million people). That should make a difference in the total number of cases, but unfortunately it does not.

paolotamag · March 23, 2020, 6:43pm

Hi @knimediger, I did not add anything, it was already there.
Regarding normalization by country population, I do agree it would make things more proportionate.
Feel free to:

1. Download the workflow.
2. Download a table from the internet with the population of each country (I do not think it’s provided by the sources I have been using, but you can easily find a CSV via Google).
3. Blend this new source with the data rows in the workflow using a Joiner node on Country right before the first component.
4. Divide each double column by the value of the new column “Population” using the Math Formula (Multi Column) node.
5. Make sure the new table header is unchanged.
6. Visualize the new normalized data in the line plot by simply re-executing the components.
7. Reshare the enhanced workflow (mentioning Normalization in the title) on your KNIME Hub space and give us the link here!
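
The join-and-divide steps above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the workflow itself: the function name `normalize_by_population`, the `per` scaling factor (cases per million instead of a plain division) and all sample numbers are assumptions.

```python
def normalize_by_population(cases_by_country, population, per=1_000_000):
    """Join case counts with population on the country key and divide.

    Returns the normalized series plus the list of countries that had no
    usable population entry (the rows a join would silently drop).
    """
    normalized = {}
    unmatched = []
    for country, counts in cases_by_country.items():
        pop = population.get(country)
        if not pop:  # missing or zero population: cannot normalize
            unmatched.append(country)
            continue
        normalized[country] = [c * per / pop for c in counts]
    return normalized, unmatched


# Made-up numbers, for illustration only
cases = {"X": [10, 20], "Y": [5, 50], "Z": [1, 2]}
population = {"X": 2_000_000, "Y": 500_000}

normalized, unmatched = normalize_by_population(cases, population)
print(normalized)
print(unmatched)
```

Checking the unmatched list after the join is worthwhile, because country keys that differ between the two tables would otherwise disappear from the plot without any warning.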

That would be super cool!
Cheers
Paolo

knimediger · March 23, 2020, 8:44pm

Hello @paolotamag, thanks for the invite to contribute to this great project.

I did some research and found population data on the UN website: data.un.org/Handlers/DownloadHandler.ashx?DataFilter=variableID:12;timeID:84;varID:3&DataMartId=PopDiv&Format=csv&c=2,4,6,7&s=_crEngNameOrderBy:asc,_timeEngNameOrderBy:desc,_varEngNameOrderBy:asc

But I’ve faced some challenges.

My first problem is that I was not able to understand the license this data is published under. So I’m not sure whether it’s allowed to use this data for this purpose.

Nevertheless I tried to follow your instructions. Thank you very much for guiding a novice.
But because of the data, I’m already struggling with the first step of joining the two tables.
The UN data uses country names which do not appear in the same form in your ISO table (e.g. the UN table uses just “Afghanistan” instead of “Afghanistan, Islamic Republic of”). I’m sure there is a quite easy way to manage this issue, but I’m already at my wit’s end.

paolotamag · March 24, 2020, 11:13am

Regarding the license of this country population dataset, take a look here.

"Terms of Use: All data and metadata provided on UNdata’s website are available free of charge and may be copied freely, duplicated and further distributed provided that UNdata is cited as the reference."

Just add the link to UNdata in the workflow metadata using the description panel in KNIME Analytics Platform. To learn how to do that, go here and scroll to “Workflow Metadata Editor”. This way you reference them on the Hub page and you are on the safe side.

Regarding the joining operation… I had the same issue when finding the continent name for each country. Country names can differ quite a bit. So do not use country names; use the two-letter country codes instead! “IT” stands for “Italy”! Join on such code columns, and find another population-by-country table with such codes if yours does not have any!
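
As a rough illustration of why codes are more robust than names, here is a small Python sketch of a join on a two-letter code column. The column names (`code`, `country`, `cases`, `population`), the helper name `join_on_code` and all numbers are made up; a real table would come from the workflow and from the population source.

```python
def join_on_code(case_rows, population_rows, key="code"):
    """Inner-join two lists of row dicts on a country-code column.

    Codes are trimmed and upper-cased first, so " it" and "IT" match.
    Returns the joined rows and the case rows that found no partner.
    """
    pop_by_code = {
        row[key].strip().upper(): row["population"] for row in population_rows
    }
    joined, unmatched = [], []
    for row in case_rows:
        code = row[key].strip().upper()
        if code in pop_by_code:
            joined.append({**row, key: code, "population": pop_by_code[code]})
        else:
            unmatched.append(row)
    return joined, unmatched


# Made-up rows: the country names differ between the tables, the codes do not
case_rows = [
    {"code": "IT", "country": "Italy", "cases": 100},
    {"code": "AF", "country": "Afghanistan, Islamic Republic of", "cases": 10},
    {"code": "XX", "country": "Unknown", "cases": 1},
]
population_rows = [
    {"code": "it", "country": "Italy", "population": 1000},
    {"code": "AF", "country": "Afghanistan", "population": 2000},
]

joined, unmatched = join_on_code(case_rows, population_rows)
print(joined)
print(unmatched)
```

Both “Afghanistan” spellings end up in the same joined row because only the code is compared.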

Thanks for contributing

Cheers
Paolo

Ellert_van_Koperen · March 26, 2020, 11:48am

The UN uses 3-digit numerical country codes, known as M49.
These can be converted into readable data using a table that can be downloaded here:
https://unstats.un.org/unsd/methodology/m49/overview/
That table also contains the ISO alpha-3 codes, but NOT the alpha-2 codes that most people think of as country codes because of their use in domain extensions.
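
A small Python sketch of such a conversion follows. The three entries are taken from the M49 and ISO 3166-1 lists; the helper name `m49_to_alpha3` is hypothetical, and a real lookup would be loaded from the full table. The zero-padding matters because CSV readers often parse a code like 004 as the number 4.

```python
# Three entries only, for illustration; load the full table in practice.
M49_TO_ALPHA3 = {"004": "AFG", "156": "CHN", "380": "ITA"}


def m49_to_alpha3(code):
    """Map an M49 code (int or str) to ISO alpha-3, or None if unknown."""
    key = str(code).strip().zfill(3)  # restore leading zeros, e.g. 4 -> "004"
    return M49_TO_ALPHA3.get(key)


print(m49_to_alpha3(4))
print(m49_to_alpha3("380"))
print(m49_to_alpha3(999))
```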

Edit: I see now that a similar table is already used in the workflow, this one stemming from datahub.io.

Johnny_Gel · March 29, 2020, 1:01pm

Hello,

Have you updated the workflow with the normalization?

Johnny

paolotamag · April 2, 2020, 9:52am

Hi @Johnny_Gel, I will post in this thread when I do:

https://forum.knime.com/t/covid-19-live-visualization-using-guided-analytics

paolotamag · April 2, 2020, 5:46pm

Don’t forget to join us for the Webinar on this workflow!

Register here:

https://www.knime.com/about/events/visualizing-the-spread-ofcovid-19-pandemic-with-knime-online-apr-7-2020

paolotamag · April 3, 2020, 5:42pm

Hey guys, I added normalization to the workflow, and much more.

More info in this thread!

https://forum.knime.com/t/covid-19-live-visualization-using-guided-analytics

system · October 3, 2020, 5:42am

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.