KNIME offers a number of clustering algorithms (see the attached image).
In the nodeguide you can find a couple of examples: https://www.knime.org/nodeguide/analytics/clustering
These examples are also available on the EXAMPLES Server (knime://EXAMPLES/04_Analytics/03_Clustering). Which clustering algorithm to use really depends on the data that you have.
You said that your data consists of strings, so you may find the following example on document clustering interesting:
I am dealing with a similar issue, but the problem is that our dataset contains about 10 million rows with string-type fields (name + last name; address; gender).
We intend to cluster this dataset in order to detect duplicates (the same person written with errors, in a non-normalized format, etc.).
For example, we want to detect that the following two records are probably the same person by grouping them into the same cluster:
Santiago Gonzalez; Road Street 1241 NY
Santiago Gonsales; Road St. 1241 NY
Is cluster analysis the right approach for this purpose? In the example posted above, KNIME would need to generate a distance matrix of 10 million x 10 million entries, and I don't think that is feasible for a dataset of this size. Do you have any suggestions?
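To put rough numbers on that concern, here is a minimal Python sketch (not a KNIME workflow). It assumes 8 bytes per stored distance, which is my assumption and not necessarily how KNIME stores a distance matrix, and it uses Python's standard-library `difflib.SequenceMatcher` as a stand-in similarity measure for the two example records. The helper names are made up for illustration.

```python
from difflib import SequenceMatcher


def pairwise_count(n: int) -> int:
    """Number of unordered record pairs among n records."""
    return n * (n - 1) // 2


def full_matrix_bytes(n: int, bytes_per_value: int = 8) -> int:
    """Size of a dense n x n distance matrix (8 bytes per value is an assumption)."""
    return n * n * bytes_per_value


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1] using difflib (a stand-in measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


n_rows = 10_000_000
print("pairs to compare:", pairwise_count(n_rows))
print("dense matrix size in TB:", full_matrix_bytes(n_rows) / 1e12)

rec_a = "Santiago Gonzalez; Road Street 1241 NY"
rec_b = "Santiago Gonsales; Road St. 1241 NY"
print("similarity of the two example records:", round(similarity(rec_a, rec_b), 2))
```

Under these assumptions the full matrix would take hundreds of terabytes, while the two example records still score as highly similar when compared directly as strings.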