I am trying to extract all the embedded images and attachments from .doc files using the Tika Parser. Everything is extracted except the embedded files. I am mainly trying to extract Excel objects from .doc files, but the extracted files are empty and have the extension .unknown.
Any suggestions on how to resolve this issue?
Hello @Ma5asak,
Are you able to provide your workflow and a test data file (doc)?
I tried this on my own with a .doc file that contains an embedded Excel table and several images.
After running the Tika Parser on it to extract the content, the embedded items came out as expected.
I was not able to reproduce the issue. If you are able to post test data that exhibits the same problem, that would be great!
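In the meantime, it may help to check whether the extracted .unknown files are truly empty or just unlabeled. Below is a minimal Python sketch, not part of the Tika Parser itself, that looks at the first bytes of each file and compares them with a few well-known file signatures. The names `sniff` and `report`, and the idea of pointing it at your output folder, are my own assumptions for illustration.

```python
from pathlib import Path

# Leading "magic" bytes for a few common formats (a deliberately short list).
SIGNATURES = [
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2"),  # legacy Office container: .doc/.xls/.ppt
    (b"PK\x03\x04", "zip"),                          # ZIP container: .docx/.xlsx and plain zip
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
]

def sniff(data: bytes) -> str:
    """Return a rough label for the given leading bytes (hypothetical helper)."""
    if not data:
        return "empty"
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unrecognized"

def report(folder: str) -> dict:
    """Map each *.unknown file in `folder` to its sniffed label."""
    return {
        p.name: sniff(p.read_bytes()[:16])
        for p in sorted(Path(folder).glob("*.unknown"))
    }
```

If a file comes back as `ole2`, the embedded object was probably written out and only the extension is missing. If it comes back as `empty`, nothing was written at all, which would be useful to mention together with the test file.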
Thanks,
TL