PowerPoint parser?


Is there a way to parse a PowerPoint document? I could not find an equivalent of the Word and PDF parsers.




Files from PowerPoint 2007 or later (PPTX) are just a compressed folder structure with assets (such as images, videos, etc.) and XML files for the content and properties. Check this link.

Since KNIME can unzip files, traverse folders and load/manipulate XML files, I would assume it is possible to create a PPTX parser out of commonly available KNIME nodes.
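
Outside of KNIME, the same unzip / traverse / load-XML idea can be sketched in a few lines of Python using only the standard library. This is a minimal sketch, not a complete parser: the function name `extract_slide_texts` is made up for illustration, and it assumes the usual (transitional) PPTX layout where slides live under `ppt/slides/slideN.xml` and text runs are `<a:t>` elements in the DrawingML namespace. The number in the file name is not guaranteed to match the order in which slides are shown.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# DrawingML namespace (assumed: transitional PPTX, not the "Strict" variant).
A_NS = "{http://schemas.openxmlformats.org/drawingml/2006/main}"


def extract_slide_texts(pptx_path):
    """Return {slide file number: [text runs]} for a .pptx path or file object.

    Hypothetical helper for illustration only.
    """
    texts = {}
    with zipfile.ZipFile(pptx_path) as archive:
        for name in archive.namelist():
            # Only the slide parts themselves, not layouts, masters or rels.
            match = re.fullmatch(r"ppt/slides/slide(\d+)\.xml", name)
            if match is None:
                continue
            root = ET.fromstring(archive.read(name))
            # Collect every non-empty <a:t> text run on the slide.
            runs = [t.text for t in root.iter(A_NS + "t") if t.text]
            texts[int(match.group(1))] = runs
    # Sort by the number in the file name for stable output.
    return dict(sorted(texts.items()))
```

Calling `extract_slide_texts("talk.pptx")` (with `talk.pptx` standing in for your own file) would give one list of strings per slide, which maps fairly directly onto what a chain of unzip and XPath nodes would produce in a workflow.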

As an alternative, you could use a Java Snippet node with the docx4j library here.

Not a full solution, but it might get you on the right track.



Would saving the PowerPoint(s) as a PDF be another option?

In Office 2016 I see an option to save as "Strict Open XML Presentation (*.pptx)", which might be another avenue to consider. Alternatively, use "Create Handouts" to make Word documents that include the slides with notes or blank lines, then use the Word parser.

I'm very new to KNIME, but hopefully those suggestions might work for you.

PDF is a thankless solution compared to an already available XML file structure. In a non-KNIME fashion: simply rename the file extension from pptx to zip, then unzip and look for the files that contain the textual content. I don't think one even needs to rename the file extension.
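
As a quick check of the "no rename needed" point, here is a small Python sketch that opens a `.pptx` directly as a zip archive and lists the parts most likely to hold text. It assumes the common layout with slides under `ppt/slides/` and speaker notes under `ppt/notesSlides/`; the helper name `find_text_parts` is invented for this example.

```python
import zipfile


def find_text_parts(pptx_path):
    """List XML parts that usually carry slide text and speaker notes.

    Hypothetical helper; the .pptx is opened as-is, without renaming to .zip.
    """
    with zipfile.ZipFile(pptx_path) as archive:
        return sorted(
            name
            for name in archive.namelist()
            # Relationship files end in ".rels", so they are skipped here.
            if name.endswith(".xml")
            and name.startswith(("ppt/slides/", "ppt/notesSlides/"))
        )
```

The zip reader only looks at the file contents, not the extension, which is why the rename step is optional.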

I would like to convert a PowerPoint presentation into a series of images, specifically one per slide, so they can be uploaded as an image gallery to a blog. Does anyone know of any libraries that can convert a .ppt into images? Any language is fine as long as it can run on a *nix server, so no C# or .NET dependent libraries.

A bit off-topic, but you can use the Apache POI library (Java based). There are also ready-made command line tools for that which are based on the same library. Have a look here:




With the next major release, 3.3, coming in December, we will release new parser nodes that are based on the Tika parsing library. With these nodes it will be possible to parse PowerPoint files as well.

Cheers, Kilian