Machine learning to filter relevant articles based on title and abstract


I have a naive question for you all. I would like to know where I can learn how to do the task described below, so any suggestions or pointers in the right direction would be very much appreciated.

I search PubMed by keyword and collect hundreds of articles every week. However, deciding which articles are actually relevant involves more complex factors than keywords alone can capture; for example, I often have to read the title and abstract to understand the context. I usually end up discarding more than 80% of the articles I collect.

So I wonder whether I could implement a machine learning workflow, train it on the thousands of articles I hand-picked in the past as ‘relevant articles’, and then use it to filter each new set of articles I collect from PubMed.

First of all, is this possible? If so, can anyone give me a hint or some guidance on how I can learn and implement such a workflow?

Thanks all in advance for your support.


You could, for instance, frame this as a binary classification problem: use the articles you kept in the past as training data for the “relevant” class (you would also need to collect the irrelevant articles you threw away, as examples of the other class), and then train the algorithm to predict “1” or “0”, i.e. “relevant” or “not relevant”.
This is probably an NLP problem, so you need to convert the words in the titles and abstracts into numbers for the model.
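
To make the idea concrete, here is a minimal sketch in Python using only the standard library. It is an illustration under assumptions, not a finished workflow: the article titles are made up, the function names (`tokenize`, `train`, `predict`) are hypothetical, and Naive Bayes over word counts is just one simple choice of classifier. The "words to numbers" step here is simply counting how often each word appears in each class.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(docs, labels):
    """Count words per class (1 = relevant, 0 = not relevant)."""
    word_counts = {0: Counter(), 1: Counter()}
    doc_counts = Counter(labels)
    for text, label in zip(docs, labels):
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, doc_counts, vocab


def predict(model, text):
    """Return 1 if the text looks relevant, else 0 (multinomial Naive Bayes)."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in (0, 1):
        # Start from the log prior: how common this class was in training.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen during training
            # Add-one (Laplace) smoothing avoids log(0) for unseen word/class pairs.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        scores[label] = score
    return 1 if scores[1] > scores[0] else 0


# Made-up toy data: in practice these would be your kept and discarded articles.
relevant = [
    "Gene therapy trial for inherited retinal disease",
    "CRISPR gene editing improves retinal function in mice",
]
not_relevant = [
    "Survey of hospital parking satisfaction",
    "Cost analysis of hospital cafeteria menus",
]

model = train(relevant + not_relevant, [1, 1, 0, 0])
new_relevant = predict(model, "New gene therapy for retinal degeneration")
new_irrelevant = predict(model, "Hospital parking cost survey")
print(new_relevant, new_irrelevant)
```

On this toy data the first new title is classified as 1 and the second as 0. With thousands of real titles and abstracts you would probably prefer an established library (for example TF-IDF features plus a linear classifier in scikit-learn) and hold back some labelled articles to check how often the model gets it right.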


Hi @Daniel_Weikert, thanks for your reply. I will check out NLP-related nodes and lectures to learn more.
