Semantic Scholar

rfeigel · April 4, 2023, 7:54pm

Several other Knimers have built workflows to extract and manipulate data from Semantic Scholar (SS). I've built a straightforward workflow that searches for papers by keyword, borrowing from some of that earlier work, so thanks! SS is an artificial intelligence–powered research tool for scientific literature, developed at the Allen Institute for AI and publicly released in November 2015. It uses advances in natural language processing to provide summaries of scholarly papers (Wikipedia). SS currently indexes millions of papers from hundreds of journals. SS has an API that can be configured to search for papers by keywords, author names, and other criteria, and this workflow uses that API. SS's search algorithm is proprietary and fairly opaque. They've published several blog posts and papers with some information on how it works, but I haven't spent the time to absorb it all.
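For anyone who wants to try a keyword search outside KNIME, here is a minimal Python sketch that only builds the request URL. It assumes the public Graph API paper-search endpoint (`/graph/v1/paper/search`) and its `query`, `fields`, `offset` and `limit` parameters; the workflow itself may call the API differently, and the helper name `build_search_url` is hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint: Semantic Scholar Graph API keyword search for papers.
SEARCH_URL = "https://api.semanticscholar.org/graph/v1/paper/search"


def build_search_url(keywords, fields=("title", "authors", "abstract", "year"),
                     offset=0, limit=10):
    """Build a GET URL that searches Semantic Scholar papers by keyword."""
    params = {
        "query": " ".join(keywords),  # free-text keyword query
        "fields": ",".join(fields),   # paper fields to return, comma-separated
        "offset": offset,             # paging: index of the first result
        "limit": limit,               # paging: number of results per request
    }
    # urlencode percent-encodes the values (spaces become '+', commas '%2C')
    return SEARCH_URL + "?" + urlencode(params)


print(build_search_url(["semantic", "scholar"]))
```

The resulting URL could then be sent with `urllib.request.urlopen` or with KNIME's GET Request node; the JSON response is expected to hold the papers in a `data` array.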

The output from SS is JSON, and I'm not strong with JSONPath. The particular problem I had was handling the authors and author IDs, because the number of authors per paper varies from one to many. I converted the JSON to XML and used XPath to parse the fields. The best I could do was group the authors and the author IDs separately. I'm open to anyone taking a stab at improvements.
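One way to keep each author paired with their ID is to walk the JSON paper by paper and author by author, instead of extracting the two lists separately. The sketch below assumes the response shape `data[].authors[].{authorId, name}` that the Graph API returns when `authors` is requested; the sample response is made up (placeholder IDs and names), and `papers_to_rows` is a hypothetical helper, not part of the posted workflow.

```python
import json

# Made-up response in the assumed Graph API shape (placeholder data only).
SAMPLE_RESPONSE = """
{
  "total": 2,
  "offset": 0,
  "data": [
    {"paperId": "P1", "title": "First paper",
     "authors": [{"authorId": "A1", "name": "Author One"},
                 {"authorId": "A2", "name": "Author Two"}]},
    {"paperId": "P2", "title": "Second paper",
     "authors": [{"authorId": "A3", "name": "Author Three"}]}
  ]
}
"""


def papers_to_rows(response_text):
    """Return one row per paper with newline-joined author names and IDs.

    Names and IDs are read from the same author objects, so line i of the
    'authors' cell always matches line i of the 'author_ids' cell, however
    many authors a paper has.
    """
    rows = []
    for paper in json.loads(response_text).get("data", []):
        authors = paper.get("authors", [])
        rows.append({
            "paper_id": paper.get("paperId"),
            "title": paper.get("title"),
            # 'or ""' guards against missing or null values
            "authors": "\n".join(a.get("name") or "" for a in authors),
            "author_ids": "\n".join(a.get("authorId") or "" for a in authors),
        })
    return rows


for row in papers_to_rows(SAMPLE_RESPONSE):
    print(row["title"], "|", row["authors"].replace("\n", "; "))
```

If a long table is preferred, the inner loop could instead emit one row per author, which avoids multi-line cells entirely.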

The most troublesome challenge was finding the node(s) that produce a well-formatted table. The XPath node produces an author list like this:

Dinar Anggraeni K. Sugiyanto M. Z. Zam Harry Patria

I used regexReplace to replace each run of multiple spaces with a newline and carriage return, which produces the following (the \n\r stays in the cell):

regexReplace($authors$, " {2,}", "\n\r")

Dinar Anggraeni
K. Sugiyanto
M. Z. Zam
Harry Patria
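The same replacement can be sketched in Python to check the pattern. This assumes the raw XPath output separates names with two or more spaces while single spaces inside a name (such as "M. Z. Zam") are kept; it uses a plain `\n` rather than the `\n\r` from the KNIME expression.

```python
import re

# Assumed raw XPath output: names separated by runs of two or more spaces.
raw_authors = "Dinar Anggraeni  K. Sugiyanto  M. Z. Zam  Harry Patria"

# Like regexReplace($authors$, " {2,}", ...): each run of 2+ spaces becomes
# one line break; single spaces inside a name are left untouched.
one_per_line = re.sub(r" {2,}", "\n", raw_authors.strip())
print(one_per_line)
```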

I did the same for the author ID list. Passing the reformatted data to the viewer nodes required wrapping the author and author ID cells in tags to preserve the \n\r line breaks:

string(“

”+$$CURRENTCOLUMN$$+“

”
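For viewers that render HTML, one common way to keep one entry per line (not necessarily the tags used in the expression above) is to escape the text and turn each line break into a `<br>` tag. This is a hypothetical Python sketch of that idea; the helper name `to_html_lines` is illustrative only.

```python
import html


def to_html_lines(cell_text):
    """Render a newline-separated cell as HTML with one entry per line."""
    # Escape &, <, > first so they are displayed literally, then split on
    # line breaks; dropping empty pieces also handles a "\n\r" separator.
    lines = [line for line in html.escape(cell_text).splitlines() if line.strip()]
    return "<br>".join(lines)


print(to_html_lines("Dinar Anggraeni\n\rK. Sugiyanto\n\rM. Z. Zam\n\rHarry Patria"))
```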

This works with the Tile Viewer and the JavaScript Table Viewer, but not with the Table Viewer (Labs), which displays the tags as literal text. The Tile Viewer node is by far the best, since it word-wraps lengthy text such as titles and abstracts. There may be a way to make this work with the Table Viewers, but I can't figure it out.

I had a similar problem writing xlsx/csv files. The best fix I could manage was the Continental Cell Formatter node. The formatting in the output file still has to be adjusted manually to be easily readable, but the author and author ID lists are vertical, as with the Tile Viewer. I'll probably remove the two Table Viewer nodes, but wanted to post them to see if other Knimers could improve them. I hope the KNIME Community finds this useful, and I'm sure more experienced Knimers can make improvements.
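For the csv side of the problem, a minimal Python sketch (independent of the Continental nodes) shows that standard CSV quoting already keeps a multi-line author list inside a single cell. The row data is a made-up placeholder.

```python
import csv
import io

# Placeholder row: multi-line author and author ID cells.
rows = [
    {"title": "First paper", "authors": "Author One\nAuthor Two", "author_ids": "A1\nA2"},
]

# For a real file, open it with newline="" as the csv module docs recommend.
buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=["title", "authors", "author_ids"])
writer.writeheader()
# With the default QUOTE_MINIMAL, fields containing line breaks are wrapped in
# double quotes, so a CSV reader keeps each list in one cell.
writer.writerows(rows)
print(buffer.getvalue())
```

Whether a given spreadsheet shows the line breaks inside the cell depends on its cell-wrapping settings.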

S2 API XPATH EXCEL 1 – KNIME Community Hub

ScottF · April 12, 2023, 8:41pm

Hi @rfeigel -

Thanks for pulling this together! Hopefully others searching the Hub for this type of analysis will find your workflow and use it as a basis for future work!

system · July 11, 2023, 8:41pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.