What you can do is convert the strings to numbers, using one-hot encoding or feature hashing or something. That will obviously produce very large (numerical) datasets that also might not be very stable (if additional strings show up later), but that is exactly the point of t-SNE: it can handle such very large numerical datasets.
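Outside of KNIME, the idea can be sketched with scikit-learn (the column names and rows below are invented for illustration; this is just one way to do it):

```python
# Sketch: turn string columns into numbers via feature hashing,
# then embed the numeric matrix with t-SNE.
from sklearn.feature_extraction import FeatureHasher
from sklearn.manifold import TSNE

rows = [
    {"color": "red", "shape": "round"},
    {"color": "blue", "shape": "square"},
    {"color": "red", "shape": "square"},
    {"color": "green", "shape": "round"},
]

# Feature hashing keeps the output width fixed (n_features) even if
# new, previously unseen strings show up later.
hasher = FeatureHasher(n_features=16, input_type="dict")
X = hasher.transform(rows).toarray()

# t-SNE then compresses the (potentially very wide) numeric matrix into 2D.
tsne = TSNE(n_components=2, perplexity=2, init="random", random_state=0)
embedding = tsne.fit_transform(X)
print(embedding.shape)  # (4, 2)
```

One-hot encoding would work the same way, but its width grows with every new string, which is why hashing can be the more stable choice here.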
That indeed is not that easy. There are some approaches using KNIME (one for H2O Isolation Forests).
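If you want to experiment with the isolation-forest idea outside of KNIME first, scikit-learn's `IsolationForest` can serve as a stand-in for the H2O node (the data below is invented):

```python
# Sketch: flag anomalous rows with an Isolation Forest.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)
normal = rng.normal(loc=0.0, scale=1.0, size=(100, 2))   # ordinary points
outliers = np.array([[8.0, 8.0], [-9.0, 7.0]])           # two obvious anomalies
X = np.vstack([normal, outliers])

# contamination = expected share of outliers in the data
clf = IsolationForest(contamination=0.02, random_state=42).fit(X)
labels = clf.predict(X)  # +1 = inlier, -1 = outlier
print("flagged as outliers:", np.where(labels == -1)[0])
```

The two injected extreme points end up flagged with `-1`; in practice you would feed the model the hashed/encoded string features from above.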
I do not have much experience with these concepts, and their results might need some (well) interpretation in order to really be of any help.
I have limited experience with H2O "Driverless AI" (there is also a KNIME integration for that) - they have some features that help with interpretation, and you can select models according to their "interpretability". In some tests the models were quite superior - but you will have to see whether you are ready to spend that amount of money; a test installation might help you decide. Also set aside some resources for deployment - when I tested it, deploying the results was quite complicated. But as is often the case, KNIME might (now) help with that.
Another approach you could try is to explore the area of dimensionality reduction, in order to distill your strings (or key words/parts of strings) into factors, and then use those factors and see which ones come up.
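A minimal sketch of that route, assuming your strings are free text: TF-IDF followed by `TruncatedSVD` (i.e. latent semantic analysis) in scikit-learn. The example texts are made up:

```python
# Sketch: distill strings into a few latent factors.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

texts = [
    "connection timeout error on server",
    "server connection refused",
    "invoice payment received",
    "payment overdue invoice reminder",
]

# Turn the texts into a sparse TF-IDF matrix (one column per word)...
tfidf = TfidfVectorizer().fit_transform(texts)

# ...and compress it into 2 factors; inspecting svd.components_
# shows which words drive each factor.
svd = TruncatedSVD(n_components=2, random_state=0)
factors = svd.fit_transform(tfidf)
print(factors.shape)  # (4, 2)
```

The resulting factor columns can then go into whatever downstream model or visualization you use in KNIME.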
It might very well be that you need to collect some manually annotated samples in order to train a model with a good target/label.