Yes, let's talk about it.
First, what do you mean by granularity? And, how do I test it?
So, the 11K columns could have already been downsampled from 40K, for example, so there could already be significant and purposeful dimension reduction.
Also, PCA masks relationships between specific columns. For example, the relationship between PCA component 1 and PCA component 2 is not the same as the relationship between Gene 1 and Gene 2, although there are dimension reduction techniques that do preserve such information (e.g. CUR matrix decomposition).
Yes, agreed: how accurate are the correlations? Who knows? But if the point is correlation filtering as a first pass, then it could be a useful dimension reduction step. Let's say I have a gigantic matrix of 11K by 11K (or much more), but I am only interested in correlations beyond r = +-0.7. Because the matrix is so large, it would be great to get rid of a few thousand columns before I calculate correlation p-values or whatever else for (hopefully) more accurate numbers. So I could begin by filtering at r = +-0.4 and then work further with the resulting matrix.
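To make the PCA point concrete, here is a minimal sketch on simulated data (the "genes" and their values are made up for illustration, not taken from any real data set). It uses base R's prcomp and shows that two correlated columns keep their correlation in the original data, while the scores of the first two principal components are uncorrelated by construction:

```r
# Toy data: gene1 and gene2 are correlated, gene3 is independent noise.
# All names and values here are hypothetical.
set.seed(1)
n <- 200
gene1 <- rnorm(n)
gene2 <- 0.8 * gene1 + rnorm(n, sd = 0.5)
gene3 <- rnorm(n)
expr <- cbind(gene1, gene2, gene3)

# PCA on the centered and scaled columns
pca <- prcomp(expr, center = TRUE, scale. = TRUE)

# Correlation between the two original columns
gene_cor <- cor(expr[, "gene1"], expr[, "gene2"])

# Correlation between the scores of the first two components
pc_cor <- cor(pca$x[, "PC1"], pca$x[, "PC2"])

print(gene_cor)  # clearly positive
print(pc_cor)    # zero up to floating-point error
```

So the gene-to-gene relationship cannot be read off the relationship between the components.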
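The two-stage filtering idea can be sketched on a small in-memory data frame. This is only an illustration under stated assumptions: the data are simulated, keep_columns is a hypothetical helper I made up for this post, and plain cor() stands in for bigcor, which you would need for a matrix that does not fit in memory:

```r
# Toy data: a and b are strongly correlated, c only weakly, d and e are noise.
set.seed(42)
n <- 100
base <- rnorm(n)
dat <- data.frame(
  a = base,
  b = base + rnorm(n, sd = 0.3),
  c = base + rnorm(n, sd = 2),
  d = rnorm(n),
  e = rnorm(n)
)

# Hypothetical helper: names of columns that have at least one
# off-diagonal correlation with |r| >= cutoff.
keep_columns <- function(cmat, cutoff) {
  diag(cmat) <- 0  # ignore self-correlations
  colnames(cmat)[apply(abs(cmat) >= cutoff, 2, any)]
}

cmat <- cor(dat)

# Stage 1: loose cutoff to shrink the matrix
loose <- keep_columns(cmat, 0.4)

# Stage 2: strict cutoff, applied only to the surviving columns
strict <- keep_columns(cmat[loose, loose, drop = FALSE], 0.7)

print(loose)
print(strict)
```

Any column with a partner beyond +-0.7 also passes the +-0.4 cutoff together with that partner, so the loose first pass does not lose any of the pairs you care about.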
Also, of course, Pearson correlation has assumptions and is only appropriate for certain kinds of data distributions, which needs to be taken into account.
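As a small illustration of why the distribution matters (made-up numbers, base R only): for a relationship that is perfectly monotonic but not linear, Pearson's r understates the association, while a rank-based measure such as Spearman's does not:

```r
# y increases with x, but exponentially rather than linearly
x <- 1:20
y <- exp(x)

pearson  <- cor(x, y, method = "pearson")
spearman <- cor(x, y, method = "spearman")

print(pearson)   # noticeably below 1
print(spearman)  # 1, since the ranks agree perfectly
```

Whether Spearman (or a transformation of the data) is the right choice depends on the data, so treat this as a reminder to check rather than as a recommendation.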
Finally, here is some R code for filtering using bigcor (from the propagate package), and then bringing the reduced matrix back into a "normal" data frame. Not extensively tested, and not all of the libraries are needed, but I think it works:
library(propagate)
library(caret)
library(corrplot)
library(ffbase)
##Build huge correlation matrix
result <- bigcor(YOUR.DATA, fun = "cor", size = 2000, verbose = TRUE)
dff <- as.ffdf(result)
names(dff) <- names(YOUR.DATA)
rownames(dff) <- names(YOUR.DATA)
namelist <- list()
for (i in seq_len(ncol(dff))) {
  ##Set the correlation cutoff values
  ##any() is needed because the comparison returns one value per row, and if() expects a single TRUE/FALSE
  if (any((dff[,i] > 0.7 & dff[,i] < 1) | (dff[,i] < -0.7 & dff[,i] > -1), na.rm = TRUE)) {
    ##Create a list of columns that satisfy the cutoff values
    namelist[i] <- names(dff[i])
  }
}
##Remove empty (NULL) entries from your column list
namelist <- namelist[!sapply(namelist, is.null)]
##Create a normal R data frame
results <- dff[c(unlist(namelist)), c(unlist(namelist))]
-Brian Muchmore