Our company has deployed Azure Information Protection. Essentially, all documents that are only for internal consumption have to be classified as such on save. Unfortunately, the KNIME Excel Reader is unable to read such files. The only solution I have found so far is to re-save them as ‘External’ one by one. This not only risks breaching information management policies, it also becomes very burdensome quickly.
Has anyone faced the same issue and potentially found a way of dealing with it?
Currently, the Excel Reader throws the error: no such entry: "EncryptionInfo", had: [EncryptedPackage, DataSpaces]
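For anyone triaging a folder of files: the entries named in the error (EncryptedPackage, DataSpaces) are streams of an OLE2 compound file, which suggests the protected workbook is an encrypted wrapper rather than the usual ZIP-based .xlsx. Below is a minimal sketch, assuming that observation holds for your files, that checks the first bytes of a file to tell the two containers apart. The function name `container_kind` is made up for illustration, and legacy .xls files are also OLE2, so the check is only meaningful for .xlsx/.xlsm files.

```python
# Well-known file signatures: OLE2 compound file vs. ZIP archive.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
ZIP_MAGIC = b"PK\x03\x04"


def container_kind(path: str) -> str:
    """Return 'ole2', 'zip' or 'unknown' based on the file's leading bytes.

    Hypothetical helper: for an .xlsx file, 'ole2' hints at an encrypted
    wrapper, while 'zip' is what a plain workbook normally looks like.
    """
    with open(path, "rb") as fh:
        head = fh.read(8)
    if head.startswith(OLE2_MAGIC):
        return "ole2"
    if head.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"
```

This does not read any data; it only helps to list which files the Excel Reader is likely to reject before a workflow runs.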
Hey @pzkor, my org just implemented this as well. The way that we addressed it for the time being is by just changing the Sensitivity level via the Excel dropdown (assuming you own the document).
Once the level has been changed (we changed it to Public), you should be able to write with KNIME. This seems to be a more complex issue with Microsoft’s Azure Information Protection, but in the meantime this could be a quick fix.
Thanks for the comment @TardisPilot. Unfortunately, this isn’t a solution for us, because the company-wide rationale for implementing an information security framework is still valid. And as the number of classified documents in our document storage is growing rapidly, manually reclassifying each document we want to process with KNIME is not even feasible.
I believe the best solution is to enable Microsoft Authentication in the relevant nodes (including when reading local files), so that we can both maintain information security and enable NLP in KNIME.
We have also implemented AIP and it makes life more difficult. One possible way forward is to make use of the xlwings Python library (or similar). It drives Excel itself instead of reading the file directly, so it is able to read data from protected files and could be used to get data into KNIME. I’m not sure how well the authentication credentials will pass into a Python node to allow the file to be accessed on SharePoint, however.
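To make the xlwings idea concrete, here is a rough sketch, not a tested integration. It assumes Excel and xlwings are installed on the machine that runs the Python node, and that the logged-in user is allowed to open the protected file in Excel. The names `read_protected_sheet` and `rows_to_records` are made up for illustration.

```python
from typing import Any, Dict, List


def rows_to_records(rows: List[List[Any]]) -> List[Dict[str, Any]]:
    """Turn [[header...], [row...], ...] into a list of dicts keyed by header."""
    if not rows:
        return []
    header = [str(h) for h in rows[0]]
    return [dict(zip(header, row)) for row in rows[1:]]


def read_protected_sheet(path: str, sheet: int = 0) -> List[Dict[str, Any]]:
    """Open the workbook through Excel and return the used range as records.

    Assumption: Excel itself handles the AIP decryption for the current user.
    """
    # Imported here so rows_to_records works without Excel/xlwings installed.
    import xlwings as xw

    app = xw.App(visible=False)  # start a hidden Excel instance
    try:
        book = app.books.open(path, read_only=True)
        try:
            # ndim=2 always yields a list of rows, even for a single row.
            rows = book.sheets[sheet].used_range.options(ndim=2).value
        finally:
            book.close()
    finally:
        app.quit()
    return rows_to_records(rows)
```

Inside a KNIME Python node, the returned records could be turned into a pandas DataFrame and passed on as the output table. Whether this works for files on SharePoint depends on how Excel authenticates on that machine, which this sketch does not address.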
I tried to look into this and did not find a simple solution. The Java SDK from Microsoft is still in preview and requires a deep integration if you want to read the files without reclassifying them. I created an internal ticket on this, but right now there is no plan to add this functionality.
You might do the reclassification step in the explorer or try to automate it using the unified client. Another option might be to exchange the data through a database or something other than Excel, using a protection layer other than AIP.