Normalizing Incomplete Text Fields and gathering counts

mohammedayub · September 29, 2016, 4:56pm

Hi,

I'm having trouble processing and counting values in a text field. I have about 100k rows; below is a snapshot of the data set:

DocNo | Section Reference | Description
111 | 2.2.3.4 | This is case1
124 | Chap 4 Section 4.4.4.2 | This is case2
100 | Chapter 2 Section 2 | This is case3
1985 | S2.7.2.2 | This is case4

The objective is to find the count of section references for each document. Presently I'm using a regular expression to capture valid references (like 2.2.3.4), using the Cell Splitter to put the parts in separate columns, and doing the counts with drill-down (using Tableau's hierarchy feature). However, I'm not sure how to handle cases 2, 3 and 4, where the field contains extra text around the number.

Is there a good (or naive) solution I can implement in KNIME, such as fuzzy matching, to get counts for all section references? It doesn't have to be efficient or optimized.

Appreciate any leads, Thanks!
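
To make the question concrete, here is a minimal Python sketch of the naive approach described above: pull every dotted number out of the Section Reference field, keep the most specific one, and count per document. It is only an illustration under assumptions, not a KNIME workflow. The function names (normalize_reference, count_references) are made up for this example, and the rule "the number with the most dots is the section" is a guess about the data. Under that rule "Chapter 2 Section 2" becomes "2", which may or may not be what is wanted (it could arguably be "2.2").

```python
import re
from collections import Counter

# Matches a dotted section number such as "2.2.3.4", or a bare "2".
NUMBER = re.compile(r"\d+(?:\.\d+)*")


def normalize_reference(raw):
    """Return a canonical dotted section number, or None if the text has no number.

    Assumption (not from the original data owner): when several numbers
    appear, the one with the most dots is the actual section reference.
    """
    numbers = NUMBER.findall(raw)
    if not numbers:
        return None
    # max() keeps the first candidate when two have the same number of dots.
    return max(numbers, key=lambda n: n.count("."))


def count_references(rows):
    """Count normalized section references per (DocNo, section) pair."""
    counts = Counter()
    for doc_no, raw in rows:
        ref = normalize_reference(raw)
        if ref is not None:
            counts[(doc_no, ref)] += 1
    return counts


# The four sample rows from the snapshot: (DocNo, Section Reference).
rows = [
    (111, "2.2.3.4"),
    (124, "Chap 4 Section 4.4.4.2"),
    (100, "Chapter 2 Section 2"),
    (1985, "S2.7.2.2"),
]

for (doc_no, ref), n in sorted(count_references(rows).items()):
    print(doc_no, ref, n)
```

Once every reference is in the plain dotted form, splitting on "." gives the same per-level columns the existing approach already produces for case 1, so the drill-down counts should not need to change. Rows where no number is found return None and are skipped here; in practice those are probably worth listing separately for manual review.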
