Hi, I'm brand new to KNIME and struggling with some of the basics. I'm trying to read in a bunch of separate XML files and combine them into one data table, separating the text fields (title, summary and body text) from their surrounding tags, so that I can do some keyword analysis.
My (incomplete) workflow is along the lines of: List Files > Iterate List of Files metanode (using XML Reader) > XPath > Strings To Document > Sentence Extractor > BoW Creator
The metanode outputs all my files in a two-column table (Row ID/XML). But how can I then split the various elements in the XML column (title etc.) into separate table columns? XPath only seems to let me specify one element to pull out.
Also, the main text I want to analyse is interspersed with tags, which I need to strip out to leave just the plain text. Any tips for doing this? Much appreciated!
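P.S. In case it helps to show the result I'm after, here is a rough Python sketch (outside KNIME, so not a workflow solution). It assumes each file has `title`, `summary` and `body` elements; those names, the sample document and the helper functions are all made up for illustration, and my real files may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the element names are assumptions.
SAMPLE = """<article>
  <title>Example title</title>
  <summary>Short summary.</summary>
  <body><p>First <b>bold</b> paragraph.</p><p>Second paragraph.</p></body>
</article>"""

# One output column per field (assumed names).
FIELDS = ("title", "summary", "body")


def plain_text(element):
    """Return all text inside an element with nested tags dropped."""
    if element is None:
        return ""
    # itertext() walks the element and its descendants in document order.
    # Joining with a space keeps adjacent paragraphs apart; split()/join()
    # then collapses any repeated whitespace.
    return " ".join(" ".join(element.itertext()).split())


def xml_to_row(xml_string):
    """Turn one XML document into a dict with one entry per field."""
    root = ET.fromstring(xml_string)
    return {name: plain_text(root.find(f".//{name}")) for name in FIELDS}


row = xml_to_row(SAMPLE)
print(row)
```

So each file would become one row with a column per field, and the body column would hold only the plain text with the inline tags gone. Note that joining with a space would also split a word that has a tag in the middle of it, which I think is acceptable for keyword analysis.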