PowerPoint parser?

Claire · June 9, 2016, 1:54pm

Hi,

Is there a way to parse PowerPoint documents? I could not find an equivalent of the Word and PDF parsers.

Thanks,

Regards,

Claire

marco_ghislanzoni · June 10, 2016, 2:07pm

PowerPoint files from version 2007 onwards (PPTX) are just a compressed folder structure containing assets (such as images and videos) and XML files that describe the slides' content and properties. Check this link.

Since KNIME can unzip files, traverse folders and load/manipulate XML files, I would assume it is possible to create a PPTX parser out of commonly available KNIME nodes.
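To illustrate the "compressed folder structure" point, here is a minimal Python sketch (Python is our choice, not something the thread prescribes) that opens a .pptx as an ordinary ZIP archive and separates the per-slide XML parts from the embedded media. It assumes the conventional part names `ppt/slides/slideN.xml` and `ppt/media/...`; the function name `describe_pptx` is hypothetical.

```python
import re
import zipfile

# Assumption: slide parts follow the usual naming ppt/slides/slide<N>.xml.
SLIDE_RE = re.compile(r"^ppt/slides/slide(\d+)\.xml$")


def describe_pptx(source):
    """Open a .pptx (a ZIP archive) and list its slide XML parts and media assets.

    `source` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        names = archive.namelist()

    # Sort numerically so slide10.xml comes after slide2.xml.
    slides = sorted(
        (name for name in names if SLIDE_RE.match(name)),
        key=lambda name: int(SLIDE_RE.match(name).group(1)),
    )
    # Embedded images, videos, etc. are stored under ppt/media/.
    media = sorted(name for name in names if name.startswith("ppt/media/"))
    return {"slides": slides, "media": media}
```

For example, `describe_pptx("deck.pptx")["slides"]` would list the slide parts of a hypothetical `deck.pptx`. The numeric file order normally matches the deck, but the authoritative slide order is the slide list in `ppt/presentation.xml`.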

As an alternative, you could use a Java Snippet node with the docx4j library here.

Not a full solution, but it might get you on the right track.

Cheers,
Marco.

mobcdi · June 23, 2016, 10:09am

Would saving the PowerPoint file(s) as PDF be another option?

In Office 2016 I see an option to save as "Strict Open XML Presentation (*.pptx)", which might be another avenue to consider. Alternatively, use "Create Handouts" to make Word documents containing the slides with notes or blank lines, then use the Word parser.

I'm very new to KNIME, but hopefully these suggestions work for you.

Geo · June 23, 2016, 7:07pm

PDF is a cumbersome route compared to the XML file structure that is already there. In a non-KNIME fashion: simply rename the file extension from pptx to zip, then unzip it and look for the files that contain the text (each slide is stored as ppt/slides/slideN.xml). I don't think one even needs to rename the file extension.
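The unzip-and-look approach can be sketched in Python without renaming anything, since `zipfile` does not care about the extension. This is a minimal sketch under stated assumptions: visible slide text sits in DrawingML `<a:t>` elements inside `ppt/slides/slideN.xml`, and speaker notes (stored separately under `ppt/notesSlides/`) are ignored. The function name `extract_slide_text` is hypothetical.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# DrawingML namespace; text runs in slide XML are <a:t> elements.
A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
# Assumption: slide parts are named ppt/slides/slide<N>.xml.
SLIDE_RE = re.compile(r"^ppt/slides/slide(\d+)\.xml$")


def extract_slide_text(source):
    """Return one list of text runs per slide, ordered by slide file number.

    `source` may be a file path or a binary file-like object.
    """
    slides = []
    with zipfile.ZipFile(source) as archive:
        numbered = []
        for name in archive.namelist():
            match = SLIDE_RE.match(name)
            if match:
                numbered.append((int(match.group(1)), name))
        for _, name in sorted(numbered):
            root = ET.fromstring(archive.read(name))
            # Collect every non-empty <a:t> node anywhere in the slide tree
            # (shapes, text boxes, tables).
            runs = [node.text for node in root.iter(f"{{{A_NS}}}t") if node.text]
            slides.append(runs)
    return slides
```

Each slide yields its text runs in document order; joining them with spaces or newlines gives plain text per slide.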

Adamawaisu · August 2, 2016, 8:23am

I would like to convert a PowerPoint presentation into a series of images, one per slide, so they can be uploaded as an image gallery to a blog. Does anyone know of any libraries that can convert a .ppt into images? Any language is fine as long as it can run on a *nix server, so no C# or .NET-dependent libraries.

marco_ghislanzoni · August 2, 2016, 9:47am

A bit off-topic, but you can use the Java-based Apache POI library. There are also ready-made command-line tools for this, based on the same library. Have a look here:

https://poi.apache.org/slideshow/xslf-cookbook.html#PPTX2PNG

Cheers,
Marco.

kilian.thiel · August 17, 2016, 6:06pm

Hi,

With the next major release, 3.3, coming in December, we will release new parser nodes based on the Tika parsing library. With these nodes it will be possible to parse PowerPoint files as well.

Cheers, Kilian

thor_landstrom · January 6, 2025, 8:23pm

Reviving an old topic since I came across it and figured that anyone with a similar problem could use one of the components I made:

On a side note, the component uses Python.

TL