So I have a few hundred files with publication records for academic papers, with fields like author name, phone, address, etc. One of the fields is Keywords: (Abstract: has the same problem). In some files the entry is a single-line string, in others it spans multiple lines. I would like to have one row for each field and entity. Any ideas?
In the attached file I put an example of each case.
How do you retrieve the data? One easy way would be to use the Document Grabber node, which you can use to get data for academic papers from PubMed. With a subsequent Document Data Extractor node you can extract fields like the author, title, abstract etc. all in one row for each field and entity.
Hope that helps,
The problem is that these records are not from PubMed, and I am not sure the data I am looking for is indexed by PubMed.
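For records that do not come from PubMed, one generic option is to parse the files directly. The following is only a minimal sketch in Python, and it rests on assumptions, because the attached example is not reproduced here: each field is assumed to start a line as `Name: value`, and any line that does not start with a known field name is assumed to continue the previous field. The field list `FIELD_NAMES` and the function `parse_record` are hypothetical names chosen for illustration and would need adjusting to the real files.

```python
import re

# Hypothetical field list; replace with the field names used in the real files.
FIELD_NAMES = ("Author", "Phone", "Address", "Keywords", "Abstract")

# A line starts a new field only if it begins with one of the known names
# followed by a colon (assumed layout, not confirmed by the original files).
FIELD_RE = re.compile(r"^(%s):\s*(.*)$" % "|".join(map(re.escape, FIELD_NAMES)))


def parse_record(text):
    """Return {field: value}, joining multi-line values into a single line."""
    fields = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        match = FIELD_RE.match(line)
        if match:
            # Start of a new field.
            current = match.group(1)
            fields[current] = match.group(2)
        elif current is not None:
            # Continuation line: append it to the field being read.
            fields[current] = (fields[current] + " " + line).strip()
    return fields


sample = (
    "Author: Jane Doe\n"
    "Keywords: text mining,\n"
    "  parsing,\n"
    "  records\n"
    "Abstract: One line."
)
result = parse_record(sample)
for name, value in result.items():
    print(name, "|", value)
```

Matching against a fixed list of field names, rather than any text followed by a colon, keeps a continuation line that happens to contain a colon from being mistaken for a new field. Each parsed record is then one entity, with one single-line value per field.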