I am a relatively recent user of KNIME and need help with the attached workflow. I am also not an IT/Software professional.
I am trying to extract publication metadata from DBLP. Metadata is available for individual authors and as a test case, I am attempting to extract information for the following author: Albert Zomaya
Being new to KNIME, I have followed a workflow that is available on the hub that I have downloaded.
The RSS feed data in the description column needs to be cleaned. I have done this using the Cell Splitter but am unable to scale it up across the full dataset.
How do I clean up all the "garbage" in the Description column of the RSS output? Are the Column Filter and Cell Splitter the right tools to use?
There's an XPath tool which may make this process easier. I'm not an expert, but I know some of the respondents on here have more XML knowledge, so they may be able to parse this for you.
Thanks for your response. I have tried the XPath tool and didn't get any success. In fact I cannot even configure it and get the message: "The dialogue cannot be opened for the following reason: No column spec compatible to XMLValue".
The thing is that I get a similar error in Alteryx (which I am more familiar with) so I am beginning to wonder if the data structure (for both the RSS feed and XML file) is flawed? Since I am not a computer guy, I can't say if this is the case.
I can get some time one morning this week to give you a hand with this. I have made some progress for you. Is there something specific you need from the data?
I spotted that there are different keys within the file relating to:
article key
book key
incollection key
inproceedings key
proceedings key
These all contain a slightly different XML structure (data). It's a pain for a novice like me but I'm learning a great deal so thank you!
If you can narrow down what you need (specific data headers) it'll be easier to try and help you out with the specific instead of trying to ingest it all. I don't know anything about this data which makes it tough…
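For anyone who wants to check how many records of each type a file holds before deciding what to extract, here is a minimal Python sketch (outside KNIME). It assumes the file is well-formed XML and that each record is an element named after its type (article, book, incollection, inproceedings, proceedings). The `SAMPLE_XML` string and the function name `count_record_types` are made up for illustration and are not real DBLP data.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample shaped roughly like an author export; not real records.
SAMPLE_XML = """<dblpperson name="Example Author">
  <r><article key="journals/x/A1"><title>Paper One</title><year>2019</year></article></r>
  <r><inproceedings key="conf/x/B1"><title>Paper Two</title><year>2020</year></inproceedings></r>
  <r><article key="journals/x/A2"><title>Paper Three</title><year>2021</year></article></r>
</dblpperson>"""

# The record types listed in the thread.
RECORD_TYPES = {"article", "book", "incollection", "inproceedings", "proceedings"}


def count_record_types(xml_text):
    """Count how many elements of each known record type appear in the XML."""
    root = ET.fromstring(xml_text)
    return Counter(el.tag for el in root.iter() if el.tag in RECORD_TYPES)


counts = count_record_types(SAMPLE_XML)
print(counts)
```

Knowing the counts up front makes it easier to pick one record type to start with rather than trying to ingest everything at once.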
Hey Matt,
Good to be able to talk to someone who has also used Alteryx (and so can feel some of the pain I am feeling).
This is an extract of research publishing data and the different keys that you mention are the different "platforms" (if you will). For instance, the article key relates to all research publications in journals. Likewise, book relates to published books and proceedings relates to publications presented at various conferences.
For a start, if I could extract journal publications that would be great. One thing I have done (and I am building a parallel workflow in Alteryx) is to download the RSS feed and use the XML parser to read the file. Doing this, I am able to separate the different publication types using a filter (I get a GUID field in Alteryx which I cannot see in KNIME).
I am glad that this is helping you learn, as I am learning too. I have a meeting with a colleague at work who knows a bit more about XML than I do and hope to find out more. I will let you know if I get ahead or find something that would help with this workflow.
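As a rough illustration of pulling out only the journal publications, here is a short Python sketch (outside KNIME). It assumes each journal paper is an `article` element carrying a `key` attribute and child elements such as `author`, `title`, `journal` and `year`. That layout, the `SAMPLE_XML` string and the function name `extract_articles` are assumptions made for the example, so the field names should be checked against the actual file.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the element names are assumed, the records are invented.
SAMPLE_XML = """<dblpperson name="Example Author">
  <r><article key="journals/x/A1">
    <author>A. Author</author><author>B. Author</author>
    <title>Paper One</title><journal>J. Ex.</journal><year>2019</year>
  </article></r>
  <r><inproceedings key="conf/x/B1">
    <author>A. Author</author>
    <title>Paper Two</title><booktitle>ExConf</booktitle><year>2020</year>
  </inproceedings></r>
  <r><article key="journals/x/A2">
    <author>A. Author</author>
    <title>Paper <i>Three</i></title><journal>J. Ex.</journal><year>2021</year>
  </article></r>
</dblpperson>"""


def full_text(element):
    """Return all text inside an element, including text in nested tags."""
    return "".join(element.itertext()) if element is not None else None


def extract_articles(xml_text):
    """Return one dict per <article> element, ignoring other record types."""
    root = ET.fromstring(xml_text)
    rows = []
    for art in root.iter("article"):
        rows.append({
            "key": art.get("key"),
            "title": full_text(art.find("title")),
            "journal": art.findtext("journal"),
            "year": art.findtext("year"),
            "authors": [a.text for a in art.findall("author")],
        })
    return rows


rows = extract_articles(SAMPLE_XML)
for row in rows:
    print(row)
```

Filtering on the element name does the same job as filtering on a type field after parsing, and the resulting list of dicts maps directly onto table rows.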
I am also trying to replicate all of my Alteryx workflows into KNIME so that will be a challenge in itself.
Thanks again for your help, it is greatly appreciated.