I have, say, 10 String columns. These columns can contain the same or different values. I want to find the value(s) that occur most often across these 10 columns. If several values have the same number of occurrences, I want to return all of them.
I can do that with a rather complex workflow that includes Column Aggregation, cell splitting and a Java Snippet. My question is whether there is a node I don't know about that could achieve this easily.
It sounds like a good case for unpivot followed by a groupby. Can you post an example workflow?
This would be for ranking based on occurrence.
See attachment. This works but looks overly complicated.
I have a solution using unpivot followed by an R snippet, but there doesn't appear to be an elegant way to handle ties.
After completely unpivoting the table, an R snippet with the following code will give you the most frequent entry for each row. Unfortunately, "which.max" returns only the first entry in a tie. Alternatively, "which.is.max" from the nnet package will return a random winner, but if you need all 3, I think your current method is best.
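library(plyr)

# For each row, tabulate the unpivoted values and keep the most frequent one.
myFun <- function(x) {
  tbl <- table(x$ColumnValues)
  # which.max returns only the first maximum, so ties are not reported
  x$freq <- rep(names(tbl)[which.max(tbl)], nrow(x))
  x
}
knime.out <- ddply(knime.in, .(RowIDs), .fun = myFun)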
Regards,
Aaron
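For the tie problem, here is a minimal Java sketch of the counting step that keeps every tied winner. It is only an illustration, not the code from the attached workflow: the class and method names (MostFrequentValues, mostFrequent) are made up, and it assumes the values of one row have already been collected into a list, for example inside a Java Snippet.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of KNIME: returns every value that
// reaches the highest count, in order of first appearance.
final class MostFrequentValues {

    static List<String> mostFrequent(List<String> rowValues) {
        // LinkedHashMap keeps first-seen order so the result is deterministic.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String value : rowValues) {
            if (value != null) {            // skip missing cells
                counts.merge(value, 1, Integer::sum);
            }
        }

        // Find the highest count across all distinct values.
        int highest = 0;
        for (int count : counts.values()) {
            highest = Math.max(highest, count);
        }

        // Collect all values that share that count (this is the tie handling).
        List<String> winners = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() == highest) {
                winners.add(entry.getKey());
            }
        }
        return winners;
    }
}
```

Calling mostFrequent on the values a, b, a, b, c would give both a and b, since each occurs twice. An empty row gives an empty list.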
Hi colleagues, in particular @beginner_
I know it's an old thread, but I have taken a look at the example workflow, in particular the Java Snippet node in which you rank the occurrences.
int highestCount = -1;
for (String occurence : c_Uniqueconcatenatewithcount_SplitResultList) {
    final Matcher matcher = pattern.matcher(occurence);
    if (matcher.matches()) {
        final String compound = matcher.group(1);
        final int count = Integer.parseInt(matcher.group(2));
        if (count > highestCount) {
            out_MostCommon = compound;
            highestCount = count;
        } else if (count == highestCount) {
            out_MostCommon += ", " + compound;
        }
    }
}
My question is:
What if I want the result to be the 5 most frequent unique values instead of only the most frequent one? How should the Java code be structured in this example?
Thanks in advance.
-Giulio
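One possible way to get the top N instead of only the maximum is to parse all entries first, sort them by count, and then join the first N names. The sketch below is untested inside KNIME and rests on some assumptions: each entry is assumed to look like "value(count)", the pattern is my guess at what the snippet's own pattern does, and the class and method names (TopOccurrences, topN) are invented for illustration.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not taken from the original workflow.
final class TopOccurrences {

    // Assumed entry format: "value(count)", e.g. "aspirin(3)".
    private static final Pattern ENTRY = Pattern.compile("(.*)\\((\\d+)\\)");

    static String topN(List<String> entries, int n) {
        // Parse every entry into a (value, count) pair; skip malformed ones.
        List<Map.Entry<String, Integer>> parsed = new ArrayList<>();
        for (String entry : entries) {
            Matcher matcher = ENTRY.matcher(entry.trim());
            if (matcher.matches()) {
                parsed.add(new AbstractMap.SimpleEntry<>(
                        matcher.group(1), Integer.parseInt(matcher.group(2))));
            }
        }

        // Sort by count, highest first. List.sort is stable, so entries
        // with equal counts keep their original relative order.
        parsed.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));

        // Join the names of the first n entries (or fewer if not available).
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < Math.min(n, parsed.size()); i++) {
            if (i > 0) {
                out.append(", ");
            }
            out.append(parsed.get(i).getKey());
        }
        return out.toString();
    }
}
```

In the Java Snippet this would roughly correspond to out_MostCommon = topN(c_Uniqueconcatenatewithcount_SplitResultList, 5), provided the input field is available as a list of strings. Note that this cuts off after exactly N entries, so a tie at position N is broken by input order. If all values tied with the Nth one should be kept, the loop would need to continue while the count equals that of entry N.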