Hello guys, I have converted my terms to strings and filtered out the other unwanted columns. Why am I seeing different numbers for the row count (as depicted in this pic)? One is 4k+ and another is almost 20k.
I also noticed that when I write an Excel file after this node, the file has the 4k+ number of rows. Can someone help explain this to me? Thanks!
(FYI, the term column was extracted from my BoW node, after which the other unwanted columns were discarded. I'm not sure this background info helps you understand my problem, but I just want to point it out to paint a picture of my workflow.)
Based on this info, I'd say the number of rows in your table is 4137.
You are seeing different numbers for RowID because the Bag of Words Creator has created new RowIDs. With the RowID node you can create new RowIDs that run from 0 to 4136.
The Bag of Words Creator produces one row per document and term, so each input document is multiplied by the number of words it contains. After the Bag of Words Creator you probably did some filtering.
Another way to find the number of rows in your table is to use the Extract Table Dimension node.
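To make the point above concrete, here is a rough sketch in plain Python. It is not KNIME's API, only a toy imitation of the behaviour described: the documents, the stop-word filter, and the helper names `bag_of_words` and `renumber` are all made up for illustration.

```python
# Toy stand-in for a KNIME table: each row is (row_id, document_index, term).
# This only mimics the behaviour discussed in the thread; it is not KNIME code.

def bag_of_words(documents):
    """Create one row per (document, distinct term) pair with sequential row IDs."""
    rows = []
    for doc_index, text in enumerate(documents):
        for term in sorted(set(text.lower().split())):
            rows.append((f"Row{len(rows)}", doc_index, term))
    return rows


def renumber(rows):
    """Assign fresh sequential IDs, similar in spirit to what the RowID node does."""
    return [(f"Row{i}", doc, term) for i, (_, doc, term) in enumerate(rows)]


documents = ["the cat sat on the mat", "the dog ate the bone"]  # made-up input
bow = bag_of_words(documents)

# Hypothetical filter step: drop two stop words. Surviving rows keep their old IDs.
stop_words = {"the", "on"}
filtered = [row for row in bow if row[2] not in stop_words]

row_count = len(filtered)        # the real number of rows (6 here)
last_row_id = filtered[-1][0]    # still an ID from before filtering ("Row7")
renumbered = renumber(filtered)  # IDs now run from Row0 to Row5

print(row_count, last_row_id, renumbered[-1][0])
```

The row count comes from counting the rows that are left, never from reading the last ID, which is the same idea as checking the table with Extract Table Dimension.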
Thanks @HansS !
I did filter the rows in the previous nodes. It's weird that this particular node still shows the unfiltered Row IDs, even though the “real” row count is correct (4k+).
I’m not sure I understand why this is an issue.
You created a lot of rows in a previous node. You then filtered out some rows, and as a result you can clearly see that some RowIDs are missing (IDs jump from 19577 to 19581, from 19595 to 19600, from 19793 to 19814, etc.).
Simply filtering never renames rows.
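As a small illustration of why the gaps appear, here is a sketch in plain Python. The numbers and the filter condition are made up, and `find_jumps` is a hypothetical helper, not something KNIME provides.

```python
def find_jumps(row_numbers):
    """Return (previous, next) pairs where consecutive surviving IDs are not adjacent."""
    return [(a, b) for a, b in zip(row_numbers, row_numbers[1:]) if b - a > 1]


original = list(range(10))                  # IDs 0..9 before filtering (made up)
kept = [n for n in original if n % 3 != 0]  # hypothetical filter removes 0, 3, 6, 9

print(kept)              # surviving IDs are unchanged: [1, 2, 4, 5, 7, 8]
print(find_jumps(kept))  # gaps left by the removed rows: [(2, 4), (5, 7)]
print(len(kept))         # the actual row count: 6
```

The largest surviving ID (8) is bigger than the row count (6), which is the same pattern as an ID near 20k on a table with about 4k rows.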
Just to complete what @HansS and @elsamuel have explained:
This is not the row count; it is the Row ID. This field does not count your rows. It contains unique values that identify each row, so when you filter out some rows, the IDs of the remaining rows stay exactly as they were before filtering. That’s why some row IDs are missing in your screenshot.
Thanks, now it makes sense! I didn't notice the term “id” in Hans’ answer at first. Having read everything again, I now see that I had mistaken row IDs for row counts.