Do you want to accelerate the R&D process through reproducible data analytics, and better manage and make use of your chemistry data?
Join Daria Goldmann and Alice Krebs (KNIME) in this webinar and learn how KNIME can facilitate collaboration with domain experts, enable you to build quick prototypes, and let you create and deploy machine learning models.
See how KNIME and its Cheminformatics Extensions are used to create an interactive web application that trains different machine learning models based on chemical fingerprints. This pipeline allows us to explore and evaluate the data as well as the resulting models. We'll round up the analysis with a downloadable report.
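Chemical fingerprints of the kind the demo models are trained on are fixed-length bit vectors encoding molecular substructures. As a rough, stdlib-only illustration (outside KNIME, and with made-up bit positions rather than bits derived from real molecules), comparing two fingerprints is typically done with Tanimoto similarity:

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity between two fingerprints,
    each represented as the set of bit positions that are set."""
    if not fp_a and not fp_b:
        return 1.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)

# Hypothetical on-bits for three molecules (illustrative only).
mol_a = {3, 17, 42, 101, 256}
mol_b = {3, 17, 42, 200, 256}
mol_c = {7, 99, 512}

print(tanimoto(mol_a, mol_b))  # 4 shared bits out of 6 total -> ~0.67
print(tanimoto(mol_a, mol_c))  # no shared bits -> 0.0
```

In practice the fingerprints themselves come from a cheminformatics toolkit (e.g. the RDKit nodes in KNIME), and the resulting bit vectors feed directly into the model-training nodes.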
Here are the questions that were asked during the webinar, along with our answers.
Is it possible to embed a KNIME web application in a website?
Yes, it is possible. You can find more information here: KNIME WebPortal User Guide
Regarding the demo database shown in the webinar: how, and with which components, was it built? To be clear, my question is about how the raw data was obtained (web scraping?).
For the workflow in the demo, the data was collected from the ChEMBL database. There are numerous public databases where you can search for bioactivity datasets: ChEMBL, PubChem, GPCRdb, BindingDB, etc.
I have tried an analysis using a machine learning workflow. I'd like to ask about the ROC curve: is it possible/normal to get a ROC curve value of 1, indicating perfect accuracy?
If you get an AUC value of 1 from the ROC curve, I would double-check what is going on. It can be an indicator of overfitting, or it could be a simple mistake such as mixing up the predicted outcome and the real outcome in the settings of the ROC node.
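To make concrete why an AUC of exactly 1 is suspicious, here is a small stdlib-only Python sketch (not a KNIME node; the data is illustrative). It computes AUC by pair counting and shows that accidentally feeding the true class column in as the score column trivially produces a perfect AUC:

```python
def auc(labels, scores):
    """AUC via pair counting: the fraction of (positive, negative)
    pairs where the positive example gets the higher score
    (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy dataset: true labels and predicted probabilities.
y_true = [0, 0, 1, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]

print(auc(y_true, y_score))  # a realistic AUC below 1.0 (8/9 here)

# Misconfiguration: using the actual class column as the "score"
# (i.e. swapping the predicted and real outcome columns) separates
# the classes perfectly by construction.
print(auc(y_true, y_true))   # 1.0
```

A leaked feature that encodes the target has the same effect, which is why a perfect ROC curve usually warrants checking the column configuration and the training data before celebrating.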
I am wondering what the performance expectations are for running certain steps. How does performance change as the data size grows? To what extent does KNIME utilize multi-processor infrastructure? And depending on that, at what point do you recommend running KNIME on a server rather than on our own computers?
In general, you would expect computing time to scale proportionally with data size, but in practice it depends heavily on how you choose to run your workflow. In this blog post we summarize the key tricks for running workflows efficiently: working with .table files, streaming execution, optimizing your .ini file, caching data, working in the database, and more.
Is the ML prototyping workflow shown from the WebPortal (loading and examining the data, building and evaluating the model) specifically built for this demo, and can it be used for other use cases rather than cheminformatics?
Yes, the workflow was built for the demo, but you can use it as a template for building machine learning models for other datasets. Keep in mind that you might want to adapt some of the settings depending on the data you are working with.
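Outside KNIME, the same generic pattern the template follows (load data, split, train, evaluate) can be sketched in a few lines of scikit-learn. This is purely illustrative, not the demo workflow itself, and uses synthetic data in place of a real dataset:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic binary-classification data standing in for any tabular dataset.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Train a model and evaluate it on the held-out split.
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
scores = model.predict_proba(X_te)[:, 1]
print(f"Test AUC: {roc_auc_score(y_te, scores):.3f}")
```

In the KNIME template, each of these steps corresponds to a node or component (reader, partitioning, learner, predictor, scorer), which is what makes it straightforward to swap in a different dataset or model.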
Do you have experience in using this approach for polymers? I am especially interested in polycondensates.
Dealing with polymers in KNIME using SMILES is unfortunately not trivial, and I am afraid close to impossible in practice. There are extensions such as BigSMILES and CurlySMILES, but toolkit support for them is limited, and since neither is actually SMILES, they won't work in KNIME. RDKit can read, for example, ChemAxon's CXSMILES, but you can't really do much with it afterwards.
Could you give some examples of workflows for beginners?
Yes, for example all the workflows from the book "KNIME Beginner's Luck", which you can find here.
Do you need to know a certain programming language to use KNIME?
No. Unless you want to use your own scripts, you don't need to know any programming language to use KNIME. However, knowing one can still turn out to be useful.
Do you have any experience with applying ML in Life Science use cases under GxP compliance conditions? How would that fit together, in your experience?
Internal quality standards play an important role in the context of GxP compliance, and since these differ between organizations, there is no out-of-the-box solution. However, reproducibility and automated validation are always important here, and KNIME offers a lot for both. First of all, a workflow is saved together with the data being processed at each individual step, so you can always retrace what data went in and what came out at each step. From a validation perspective, the data used to train ML models is particularly important. Moreover, the parameters of your model and other nodes, as well as annotations, are saved. In addition, you can validate in an automated way that your workflows are doing what they are supposed to do. For more on this, check out this blog post: Enter the era of automated workflow testing and validation | KNIME. You could also build and automate a workflow that creates a report and alerts you when results deviate from your expectations. If you have further requirements, contact us; the KNIME team is happy to help!
Can you integrate data from AWS S3 buckets?
Yes, you can. We have a connector node for that: https://kni.me/n/HttNHsyhE8i6Srwy
What is the outlook for KNIME compatibility with ARM M1 Mac / MacOS BigSur?
KNIME is following this closely, but relies on the underlying frameworks (mostly Java/Eclipse) to support it. For both components, support is either being worked on or planned (details are available for Java and Eclipse). Until then, KNIME needs to be run via Apple's compatibility layer (Rosetta).