Count how many times each character appears in the whole dataset

RoyBatty296 · October 3, 2022, 2:21pm

I have a table with twenty columns and thousands of rows.

Just for example purposes, I will say I have this table:

ColumnA   ColumnB
Testing      This
1231          1231

I want to count how many times each single character appears in the whole dataset.

So in our toy example we would have

character   nºoftimes
T                3
e                1
s                 2
i                  2
n                 1
g                 1
h                 1
1                  4
2                  2
3                  2

I was thinking of using some kind of string manipulation, but now sure how can I do this.

takbb · October 4, 2022, 5:32pm

Hi @RoyBatty296, an interesting challenge!

One of the problems you have is breaking down the strings into individual characters so that you can count them. Fortunately I have just such a component on the hub.

I don’t often have a need to use it, but here I believe is just such an occasion!
Count the characters across multiple columns.knwf (43.5 KB)

Looping through all of the columns, and putting that component inside the loop allows us to turn the set of individual characters within the column into a single column with a row for every individual character found.

At the end of the loop, you have a single column that is a concatenated list of all characters found across all the columns/rows.

This list can then be grouped to provide the count you need.
It will be interesting to see how well this performs across your thousands of rows but hopefully it will be of assistance.

If I run it with your test data I get this, which differs slightly as mine counts T and t separately. If you wanted them to be counted the same, you could for example use String Manipulation in the loop to make the column all upper or lower case.

bruno29a · October 6, 2022, 5:38am

Hi @takbb , I think this could be done without a Loop, but at the cost of Memory, though the Loop could actually be using as much memory, since it needs to keep the results of each iteration while the Loop is running.

The idea would be to concatenate all the cells into 1 cell. This can be done with GroupBy + Column Combiner:

The GroupBy would concatenate all the rows into a single row, and the Column Combiner would concatenate all columns into 1 column.

For example, if I have this input:

The GroupBy will concatenate all the rows into 1 like this:

And then the Column Combiner will concatenate all the columns into 1, like this:

We can now split this string into array of characters and then do an Ungroup

EDIT: @RoyBatty296 I completed the whole workflow, and that’s how it looks like:

Input:

Results:

Here’s the workflow: Count each character in dataset.knwf (15.1 KB)

takbb · October 6, 2022, 6:09am

Ah yes @bruno29a. Good idea using the column combiner. Why didn’t I think of that?

system · October 13, 2022, 6:10am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.