How to extract embedded Excel sheets from .doc files

Ma5asak · July 6, 2024, 2:13pm

I am trying to extract all the embedded images and attachments from .doc files using the Tika parser. Everything is extracted except the embedded files, mainly Excel objects: the files extracted for them are empty and have the extension .unknown.
Any suggestions on how to resolve this issue?

thor_landstrom · July 15, 2024, 2:37pm

Hello @Ma5asak,

Are you able to provide your workflow and a test data file (doc)?

I tried this myself with a .doc file that contains an embedded Excel table and several images, as below:

Here are the results after running the Tika parser on it to extract the content:

I was not able to reproduce the issue. If you can post a test file that exhibits it, that would be great!
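
While narrowing this down, it may help to check whether the .unknown files are truly empty (0 bytes) or contain data that simply was not recognized. Below is a minimal Python sketch, not part of Tika or of any workflow in this thread, that reads the first bytes of each file and compares them with a few well-known file signatures. The function names `sniff` and `report`, and the idea of pointing it at an output folder, are assumptions made for illustration only.

```python
from pathlib import Path

# Well-known leading byte signatures ("magic numbers").
SIGNATURES = [
    (bytes.fromhex("D0CF11E0A1B11AE1"), "ole2"),  # legacy Office container (.doc, .xls)
    (b"PK\x03\x04", "zip"),                       # OOXML container (.docx, .xlsx)
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
]


def sniff(data: bytes) -> str:
    """Return a coarse type label for the leading bytes of a file."""
    if not data:
        return "empty"
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"


def report(folder: str) -> dict[str, str]:
    """Map each *.unknown file name in `folder` to its sniffed type."""
    result = {}
    for path in sorted(Path(folder).glob("*.unknown")):
        with path.open("rb") as fh:
            # Eight bytes are enough for every signature listed above.
            result[path.name] = sniff(fh.read(8))
    return result
```

Calling `report()` on the folder that holds the extracted files gives one label per file. A label of "empty" means the extraction wrote no bytes at all, while "ole2" or "zip" would suggest the embedded workbook was extracted but not given a proper extension.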

Thanks,
TL
