Help with RSS parsing

Hello KNIME users,

I am a relatively recent user of KNIME and need help with the attached workflow. I am also not an IT/Software professional.

I am trying to extract publication metadata from DBLP. Metadata is available for individual authors and as a test case, I am attempting to extract information for the following author:
Albert Zomaya
Being new to KNIME, I downloaded a workflow that is available on the hub and followed it.

My workflow is also attached: KNIME_project_DBLP.knwf (14.1 KB)

The RSS feed data in the Description column needs to be cleaned. I have done this using the Cell Splitter, but I am unable to scale it up across the full dataset.

How do I clean up all the 'garbage' in the Description column of the RSS output? Are the Column Filter and Cell Splitter the right tools to use?

Any help will be greatly appreciated.


Hi @RPS,

I've not had a chance to look at this properly, but I noticed you can return the data as XML too.

Albert Y Zomaya

There's an XPath tool which may make this process easier. I'm not an expert, but I know some of the respondents on here have more XML knowledge, so they may be able to parse this for you.
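To illustrate the XPath idea outside KNIME, here is a minimal Python sketch. The sample XML is made up and only assumes that the author export wraps each publication in an `<r>` element under a `<dblpperson>` root; the real file may be structured differently, so treat the element names and the path as assumptions to check against your own download.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like an author export; not real DBLP data.
SAMPLE_XML = """<dblpperson name="Albert Y. Zomaya">
  <r><article key="journals/example/1">
    <author>Albert Y. Zomaya</author>
    <title>Example journal paper.</title>
    <journal>Example Journal</journal>
    <year>2020</year>
  </article></r>
  <r><inproceedings key="conf/example/2">
    <author>Albert Y. Zomaya</author>
    <title>Example conference paper.</title>
    <booktitle>Example Conf</booktitle>
    <year>2019</year>
  </inproceedings></r>
</dblpperson>"""

root = ET.fromstring(SAMPLE_XML)

# ElementTree supports a limited XPath subset: "*" matches any record type
# (article, inproceedings, ...) sitting inside an <r> wrapper.
titles = [node.text for node in root.findall("./r/*/title")]
print(titles)
```

The same kind of path expression is what an XPath tool would be configured with, once the data is available as XML rather than as plain text.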

Hi @Matt_D,

Thanks for your response. I have tried the XPath tool and didn't have any success. In fact, I cannot even configure it and get the message: "The dialogue cannot be opened for the following reason: No column spec compatible to XMLValue".

The thing is that I get a similar error in Alteryx (which I am more familiar with), so I am beginning to wonder if the data structure (for both the RSS feed and the XML file) is flawed. Since I am not a computer guy, I can't say if this is the case.


Hi @RPS,

I used to use Alteryx, loved that software.

I can get some time one morning this week to give you a hand with this, and I have already made some progress for you. Is there something specific you need from the data?

I spotted that there are different keys within the file relating to:

article key
book key
incollection key
inproceedings key
proceedings key

These all contain a slightly different XML structure (data). It's a pain for a novice like me, but I'm learning a great deal, so thank you!
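One way to see how the record types differ is to list which child fields each type carries. This is only a sketch in Python on made-up sample data: the `<r>` wrapper and the field names are assumptions about the export, and `fields_by_record_type` is a hypothetical helper, not something from KNIME or DBLP.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample shaped like an author export; not real DBLP data.
SAMPLE_XML = """<dblpperson name="Albert Y. Zomaya">
  <r><article key="journals/example/1">
    <author>Albert Y. Zomaya</author>
    <title>Example journal paper.</title>
    <journal>Example Journal</journal>
    <year>2020</year>
  </article></r>
  <r><inproceedings key="conf/example/2">
    <author>Albert Y. Zomaya</author>
    <title>Example conference paper.</title>
    <booktitle>Example Conf</booktitle>
    <year>2019</year>
  </inproceedings></r>
</dblpperson>"""


def fields_by_record_type(xml_text):
    """Map each record type (article, book, ...) to the child tags it uses."""
    root = ET.fromstring(xml_text)
    fields = defaultdict(set)
    for wrapper in root.findall("r"):      # assumed wrapper element
        for record in wrapper:             # article, inproceedings, ...
            fields[record.tag].update(child.tag for child in record)
    return {tag: sorted(names) for tag, names in fields.items()}


structure = fields_by_record_type(SAMPLE_XML)
for record_type, names in structure.items():
    print(record_type, names)
```

Running something like this over the full file would show, for example, that journal articles carry a `journal` field while conference papers carry `booktitle` instead, which is why one fixed set of columns does not fit every record.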

If you can narrow down what you need (specific data headers), it'll be easier to help you with those specifics instead of trying to ingest it all. I don't know anything about this data, which makes it tough... :slightly_smiling_face:

Let me know, I'm sure we can solve this.



Hey Matt,
Good to be able to talk to someone who has also used Alteryx (and so can feel some of the pain I am feeling :grin: )

This is an extract of research publishing data, and the different keys that you mention are the different 'platforms' (if you will). For instance, the article key relates to all research publications in journals. Likewise, book relates to published books, and proceedings relates to publications presented at various conferences.

For a start, if I could extract journal publications, that would be great. One thing I have done (and I am building a parallel workflow in Alteryx) is to download the RSS feed and use the XML parser to read the file. Doing this, I am able to separate the different publication types using a filter (I get a GUID field in Alteryx which I cannot see in KNIME).
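As a rough illustration of pulling out only the journal publications, here is a Python sketch that keeps the `article` records and writes them as CSV rows. The sample data is made up, the field names (`title`, `journal`, `year`, `author`) are assumptions about the XML export, and `journal_rows` and `to_csv` are hypothetical helper names.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like an author export; not real DBLP data.
SAMPLE_XML = """<dblpperson name="Albert Y. Zomaya">
  <r><article key="journals/example/1">
    <author>Albert Y. Zomaya</author>
    <author>Example Coauthor</author>
    <title>Example journal paper.</title>
    <journal>Example Journal</journal>
    <year>2020</year>
  </article></r>
  <r><inproceedings key="conf/example/2">
    <author>Albert Y. Zomaya</author>
    <title>Example conference paper.</title>
    <booktitle>Example Conf</booktitle>
    <year>2019</year>
  </inproceedings></r>
</dblpperson>"""

COLUMNS = ["key", "title", "journal", "year", "authors"]


def text_of(record, tag):
    """Return the full text of the first <tag> child, or "" if it is missing."""
    node = record.find(tag)
    return "".join(node.itertext()).strip() if node is not None else ""


def journal_rows(xml_text):
    """Collect one row per <article> record, skipping every other record type."""
    root = ET.fromstring(xml_text)
    rows = []
    for article in root.iter("article"):
        authors = ["".join(a.itertext()).strip() for a in article.findall("author")]
        rows.append({
            "key": article.get("key", ""),
            "title": text_of(article, "title"),
            "journal": text_of(article, "journal"),
            "year": text_of(article, "year"),
            "authors": "; ".join(authors),
        })
    return rows


def to_csv(rows):
    """Serialise the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = journal_rows(SAMPLE_XML)
print(to_csv(rows))
```

Filtering on the element name plays the same role as the GUID-based filter: the record type itself says whether a publication is a journal article.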

I am glad that this is helping you learn, as it is helping me. I have a meeting with a colleague at work who knows a bit more about XML than I do, and I hope to find out more. I will let you know if I get ahead or find something that would help with this workflow.

I am also trying to replicate all of my Alteryx workflows into KNIME so that will be a challenge in itself.

Thanks again for your help, it is greatly appreciated.


Hello there!

Just to add that, for an easier transition, there is a free guide/book: From Alteryx to KNIME.

@RPS did you have some progress on your workflow?

@Matt_D welcome to KNIME Community!


@RPS hello!

I had an unexpected absence so wasn't able to work on this for you. Did you find a solution?

Happy to continue to help if you need me to.



Hi @RPS,

I'm also coming from Alteryx... hope this helps you:
