Normalizing Incomplete Text Fields and gathering counts


I'm having a challenge to process and perform counts on text fields. I have about ~100k rows, Below is a snapshot of the data set

DocNo          Section Reference                              Description

111                                                          This is case1

124                Chap 4 Section                      This is case2

100                Chapter 2 Section 2                            This is case3

1985              S2.7.2.2                                                 This is case4

The objective is the find the count of Section References for each document, Presently I'm using Regex Expression to capture valid refereces (like ) and using cell splitter to put them in seperate columns and doing the counts with drill down (using Tableau Hierarchy feature), but the I'm not sure how do I handle case 2,3 and 4 with all the text fields in it.

Is there a good(or naive) solution in KNIME I can implement, like fuzzy match etc. and get counts for all section references, it doesnt have to be efficient or optimized.

Appreciate any leads, Thanks!

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.