Working with multiple XML files

Hi, I'm brand new to KNIME and struggling with some of the basics. I'm trying to read in a bunch of separate XML files and combine them into one data table, separating the text fields (title, summary and body text) from the surrounding tags, so that I can do some keyword analysis.

My (incomplete) workflow is along the lines of: List Files > Iterate List of Files metanode (using XML Reader) > XPath > Strings To Document > Sentence Extractor > BoW Creator

The metanode outputs all my files in a two-column table (Row ID/XML). But how can I then separate the various elements of the XML column (title, etc.) into their own table columns? XPath only seems to let you specify one element to pull out.

Also, the main text I want to analyse is interspersed with tags, which I need to strip out to leave just the plain language. Any tips for doing this? Much appreciated!


To extract several elements, just chain multiple XPath nodes one after another, one per element. In the XPath node you can also specify the output type. If you set it to string and use an appropriate XPath expression, you get a column without XML tags. Stripping the tags from a larger block is more difficult. You can probably use an XSLT that strips all tags and leaves only the text.
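To make the idea concrete outside KNIME, here is a minimal Python sketch of the same two steps: pulling several elements into separate columns, and flattening a tagged block to plain text. It is only an illustration, not what the XPath node does internally. The element names (article, title, summary, body) and the sample document are assumptions based on the fields mentioned in the question, and extract_fields is a hypothetical helper name. If your files use XML namespaces, the element names would need to be namespace-qualified.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the element names are assumptions
# based on the fields mentioned in the question.
SAMPLE = """<article>
  <title>Example title</title>
  <summary>Short summary.</summary>
  <body><p>First <b>bold</b> paragraph.</p><p>Second paragraph.</p></body>
</article>"""


def extract_fields(xml_text):
    """Return one table row (a dict) with a column per element."""
    root = ET.fromstring(xml_text)
    row = {}

    # One simple path lookup per output column, similar in spirit to
    # using one XPath node per element.
    for name in ("title", "summary"):
        row[name] = root.findtext(name, default="")

    # Strip tags from the larger block: itertext() walks every text
    # fragment below <body> and skips the markup itself.
    body = root.find("body")
    raw = " ".join(body.itertext()) if body is not None else ""

    # Collapse the whitespace left behind by the removed tags.
    row["body"] = " ".join(raw.split())
    return row


row = extract_fields(SAMPLE)
print(row)
```

Joining the fragments with a space keeps neighbouring paragraphs from running together, at the cost of splitting a word that has a tag in the middle of it. In XPath 1.0 terms, the string value of an element is the concatenation of all text nodes below it, which is the same idea as itertext() here.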

Thanks for the tip, Thor. I got that to work, so I now have an output table with the various attributes I need separated out. I haven't got anywhere with stripping the tags out (I tried XSLT but it didn't seem to work with my data), but I'm ignoring that attribute for now so that I can make progress elsewhere. Much appreciated!