I am working with RIS format files, generated by citation manager software such as Zotero and EndNote (http://en.wikipedia.org/wiki/RIS_%28file_format%29). I have attached a small sample of the RIS file I am dealing with.
I would appreciate any help translating some operations into KNIME. Below are two records from the same file, separated from one another by an empty row. I need to transform this structure into a table (a mock-up of the desired result is below).
From:
TY - JOUR
TI - Get up…
AU - Clegg, Stewart
AU - Carter, Chris
T2 - Management Review

TY - JOUR
TI - Bringing Space…
AU - Kornberger, Martin
AU - Clegg, Stewart
T2 - Organization-Studies
To (mock-up of the table I need to obtain from KNIME):
TY | TI | AU | T2
JOUR | Get up… | Clegg, Stewart; Carter, Chris | Management Review
JOUR | Bringing Space… | Kornberger, Martin; Clegg, Stewart | Organization-Studies
OBSERVATIONS:
a) I tried the ‘File Reader’ node. My first step was to split the fields using the dash sign - as the ‘column delimiter’.
The problem with the dash sign ‘-’ is that some values contain a dash as well (e.g. Organization-Studies), which makes a mess of the whole file.
b) Some fields, such as AU (author), repeat in the RIS file, once for each author of an article. So, in the desired KNIME table, I need all the authors of an article placed in a single AU column, separated from each other by a semicolon ;
Interesting problem... I have tried to solve it, but with only partial success. I guess it would be easier with a custom importer node, or maybe with a node which gets the whole table (Jython or R). Anyway, if you prefer, you can try to fix (enhance) the attached workflow, although it requires the KNIME Utilities from HiTS (which is not yet updated to KNIME 2.8, so it will show some error messages on each start - you have been warned ;) ). I have probably just found a bug with this input. (Well, at least something to improve. :) )
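For what it's worth, here is a minimal sketch of what such a script could do, written as plain Python 3 rather than as a tested KNIME scripting node. It splits each line only on the first "tag - " so that dashes inside values survive, and it joins repeated tags such as AU with "; ". The function names parse_ris and to_table are made up for illustration, and it assumes a record ends at an empty row, at an ER tag, or at the next TY tag.

```python
import re

# Two-character tag, whitespace, a dash, then the value. Matching only this
# leading "TAG - " keeps dashes inside values (e.g. "Organization-Studies").
TAG_LINE = re.compile(r"^([A-Z][A-Z0-9])\s+-\s*(.*)$")


def parse_ris(text):
    """Return a list of records, each a dict mapping a tag to its list of values."""
    records, current = [], {}
    for raw in text.splitlines():
        line = raw.strip()
        match = TAG_LINE.match(line)
        tag = match.group(1) if match else None
        value = match.group(2).strip() if match else None
        # Assumption: an empty row, an ER tag, or a new TY tag closes a record.
        if (not line or tag in ("ER", "TY")) and current:
            records.append(current)
            current = {}
        if tag and tag != "ER":
            current.setdefault(tag, []).append(value)
    if current:
        records.append(current)
    return records


def to_table(records, sep="; "):
    """Build a header row and one row per record; repeated tags are joined with sep."""
    columns = []
    for record in records:
        for tag in record:
            if tag not in columns:
                columns.append(tag)
    rows = [[sep.join(record.get(tag, [])) for tag in columns] for record in records]
    return columns, rows


sample = """TY - JOUR
TI - Get up…
AU - Clegg, Stewart
AU - Carter, Chris
T2 - Management Review

TY - JOUR
TI - Bringing Space…
AU - Kornberger, Martin
AU - Clegg, Stewart
T2 - Organization-Studies
"""

columns, rows = to_table(parse_ris(sample))
print(columns)
for row in rows:
    print(row)
```

Lines that do not look like "TAG - value" are simply skipped here, so multi-line values would need extra handling.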
Thank you very much for sharing the workflow. I am still working my way through learning KNIME, and KNIME Utilities is something I will add to my next steps to go deeper into this amazing tool.
QUESTION=> You mentioned that "(R package to read ris files) might be more help". I read that there are some ways to run R inside KNIME. Would it be possible to use KNIME and R (inside KNIME) to read RIS files, or should I try to do this directly in R, without any interface with KNIME?
There are labs and community extensions for R in KNIME, so you can use R from within KNIME. (Under Windows there are binaries of R that can be installed, though I am not sure whether they support installing packages.) I guess you can install/check R packages from within the R nodes. If you plan to share the workflow, you should specify in its metadata which extensions and which R version are required. (Sometimes I wish I had shared this information with myself for my own workflows.)
The KNIME Utilities are specific to certain tasks I needed, so beyond those they might be less interesting for others. They need some updates, but I am not sure I will get to those in the near future, so it might be worth looking for alternatives. ;)
Cheers, gabor
PS.: Have fun and success working on your PhD dissertation! :)
I managed to do the RIS to BibTeX transformation you suggested earlier (http://www.inside-r.org/packages/cran/ris/docs/read.ris).
Actually, I am using Zotero as my citation manager, and it can export RIS but also BibTeX. So I did the export through Zotero, and R wasn't necessary.
The structure of the BibTeX file (attached) solves the problem the RIS file has with authors, since there is now just one field called 'author'.
So now I need to transform the BibTeX file into a table (the same thing I was trying to do with the RIS file). Is KNIME able to do such a transformation? Which nodes should I use?
OBSERVATION: The BibTeX field names can vary between records of the same file. For instance, in the attached example the 'copyright' field exists in one record but not in the other. This is because in the source (Zotero) the field is empty for one of the records, so it isn't exported to the BibTeX output. The rule should be: if a field doesn't exist in a record, its content is treated as empty, but a 'copyright' column should always exist once the field has appeared in at least one record.
I would use regular expressions or something similar to extract the information you need from the BibTeX files, but probably someone else has already created a node to handle BibTeX files. (Maybe the textprocessing nodes?) If you are familiar with Jython or Groovy (and/or KNIME development), imho RIS is the easier format to parse, but if your input is BibTeX, it might be better to parse that directly.
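To give an idea of the regular expression route, here is a rough sketch in plain Python 3, not a KNIME node. It assumes the layout Zotero typically exports: one "name = {value}" field per line, with the closing brace of each entry on its own line. The names parse_bibtex and bibtex_to_table, the citation keys, and the copyright text in the sample are all made up for illustration. A field that is missing from an entry ends up as an empty cell, while its column still exists because it appeared in another entry.

```python
import re

# Start of an entry, e.g. "@article{some_key,"
ENTRY_START = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")
# One field per line, e.g. "  author = {Clegg, Stewart and Carter, Chris},"
# Assumption: the value is wrapped in braces and does not span several lines.
FIELD = re.compile(r"^[ \t]*(\w+)[ \t]*=[ \t]*\{(.*?)\}[ \t]*,?[ \t]*$", re.MULTILINE)


def parse_bibtex(text):
    """Return one dict per entry, holding only the fields that entry contains."""
    starts = list(ENTRY_START.finditer(text))
    entries = []
    for i, start in enumerate(starts):
        # An entry runs until the next "@type{key," or the end of the text.
        end = starts[i + 1].start() if i + 1 < len(starts) else len(text)
        entry = {"entrytype": start.group(1), "citekey": start.group(2)}
        for field in FIELD.finditer(text, start.end(), end):
            entry[field.group(1).lower()] = field.group(2).strip()
        entries.append(entry)
    return entries


def bibtex_to_table(entries):
    """Use the union of all field names as columns; missing fields become ''."""
    columns = []
    for entry in entries:
        for name in entry:
            if name not in columns:
                columns.append(name)
    rows = [[entry.get(name, "") for name in columns] for entry in entries]
    return columns, rows


# Hypothetical sample: keys and the copyright text are placeholders.
sample = """@article{clegg_example,
    title = {Get up…},
    author = {Clegg, Stewart and Carter, Chris},
    journal = {Management Review},
    copyright = {placeholder text},
}

@article{kornberger_example,
    title = {Bringing Space…},
    author = {Kornberger, Martin and Clegg, Stewart},
    journal = {Organization-Studies},
}
"""

columns, rows = bibtex_to_table(parse_bibtex(sample))
print(columns)
for row in rows:
    print(row)
```

Values in quotes instead of braces, or values that wrap over several lines, are not handled by this pattern and would need a proper BibTeX parser.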