I would like to assign topics to unstructured text submitted in the form of Software Support Tickets. The overall goal is to determine what each ticket is "about."
We receive about 3,500 tickets each month. The field we want to analyze averages about 50 characters of unstructured text. Because the text is free-form, customers describe similar topics using different words and phrases. The challenge (obviously) is grouping this text into one or more topics so that we can rank topics by frequency.
In this situation, could KNIME suggest common topics by evaluating the text on its own (without a training set)? Is this realistic? Which node(s) would we use in this case?
If we wanted to create our own categorization rules, is there a node (or nodes) that would allow us to assign custom "topics" or categories based on the presence of words or phrases we define? For example, tickets containing string 1, string 2, or string 3 would be assigned to Topic A, where both the strings and the topics are defined by us. If so, which nodes would we use to do this?
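To make the rule-based idea concrete, here is a rough sketch in plain Python (not KNIME) of the behavior I have in mind. The topic names, phrases, and sample tickets are made-up placeholders, and the function names are my own; I am only assuming simple case-insensitive substring matching.

```python
from collections import Counter

# Hypothetical rules: topic name -> words or phrases that signal it.
RULES = {
    "Login": ["password", "log in", "locked out"],
    "Printing": ["printer", "print job"],
}


def assign_topics(text, rules=RULES):
    """Return every topic with at least one phrase found in the text."""
    lowered = text.lower()
    matched = [
        topic
        for topic, phrases in rules.items()
        if any(phrase in lowered for phrase in phrases)
    ]
    # Tickets that match no rule go to a catch-all bucket.
    return matched or ["Uncategorized"]


def rank_topics(tickets, rules=RULES):
    """Count topic assignments across tickets, most frequent first."""
    counts = Counter()
    for ticket in tickets:
        counts.update(assign_topics(ticket, rules))
    return counts.most_common()


# Made-up sample tickets, roughly the length of our real field.
tickets = [
    "Cannot log in after password reset",
    "Printer shows offline",
    "Locked out and print job stuck",
    "App crashes on startup",
]
ranking = rank_topics(tickets)
for topic, count in ranking:
    print(topic, count)
```

A ticket can land in more than one topic here, which is what we want. I am looking for the KNIME nodes that would give us the equivalent of this.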
Finally, are there books containing sample workflows and results that explain, in more detail, the analysis and categorization of unstructured text to identify patterns (not just sentiment)? I've purchased KNIME Beginner's Luck by Dr. Rosaria Silipo and am working my way through it, but I'd also like to read books or papers that focus specifically on unstructured text analysis.
Thanks in advance for any guidance and suggestions.