Is it possible to have a larger array of Document Parsers? I'm finding the DML/SDML and plain ASCII parsers very limited. I cannot even find a way to convert documents into DML/SDML.
Ideally it would be good to have a PDF parser. I appreciate that not all PDFs are text readable, but such a parser would be great for those that are, and PDF is increasingly becoming the common format.
Additionally, a Rich Text Format (RTF) parser would be good, since RTF retains more features than a flat ASCII file.
It's a shame that KNIME's text processing facilities are so powerful but are let down a little by the limited set of Document Parsers.
Some other Document Parsers that would be really handy are ones that can read text out of MS Word (.doc, .docx), MS PowerPoint (.ppt, .pptx) and MS Excel (.xls, .xlsx) files. This would make it quicker to gather key points and terms from presentations, reports and the like.
Sorry for the inconvenience with the currently provided parsers. The Textprocessing plugin is so far still a labs project, but it is growing. More parsers will come, and we have already thought about PDF, Word, or RTF parsers.