Color scatter-plot by density

bmuchmore · May 15, 2015, 3:51pm

Fellow users,

I would like to create a scatterplot in KNIME that is colored by point density. I know these kinds of plots are easily done in R or Python, but that is not exactly what I want. What I want is to create a column that somehow represents the density around each x, y (or x, y, z) coordinate point. For example:

Hypothetical before dataset:

x y
1 2
2 1
2 2

***APPLY FUNCTION(S)***

Hypothetical after dataset:

x y density
1 2 0
2 1 0
2 2 1

I can now use the values of this density column to color a 2D x,y scatterplot using the Erlwood 2D/3D scatterplot node. I have come close to successfully creating a representative density column using various R packages, but I haven't found exactly what I am looking for. Also, some functions (e.g. exhaustive KNN searches) take too long for the number of x,y points I am dealing with (~100,000).

Anyone willing and able to help?

-Brian Muchmore

Aaron_Hart · May 15, 2015, 8:49pm

Hi Brian,

I don't think there is a way to set alpha specifically by a column, but you can specify a static alpha using the Color Manager. If you want to keep everything the same color, you can create a constant value column to use for the color model, or set the range start and end on a numeric column to be the same, and just set the alpha to a lower value.

Hope that helps,

aborg · May 15, 2015, 11:49pm

Hi Brian,

Isn't that what you would get if you just counted the rows with a GroupBy on x and y, subtracted 1 from the count, and joined the result back with a Joiner? (Probably with the x/y values normalized to a grid or, better, to hexagons. Both can be computed with Math Formula or Java Snippet nodes.)
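A minimal sketch of the grid variant of this idea in R rather than as KNIME nodes. The function name `grid_density`, the `cell` bin width, and the column names `x` and `y` are assumptions made for illustration. It snaps each point to a square grid cell, counts the points per cell, subtracts 1 for the point itself, and writes the count back onto every row.

```r
# Sketch only: grid_density and cell are hypothetical names, not part of KNIME
# or of any package mentioned in this thread.
grid_density <- function(tbl, x_col = "x", y_col = "y", cell = 1) {
  # Snap each coordinate to the index of its grid cell
  gx <- floor(tbl[[x_col]] / cell)
  gy <- floor(tbl[[y_col]] / cell)

  # One key per cell, equivalent to grouping by the binned x and y
  key <- paste(gx, gy, sep = "_")

  # Count the points in each cell (the GroupBy step)
  counts <- table(key)

  # Look up each row's cell count (the Joiner step) and exclude the point itself
  tbl$density <- as.integer(counts[key]) - 1L
  tbl
}

pts <- data.frame(x = c(0.1, 0.2, 0.9, 1.5), y = c(0.1, 0.3, 0.8, 1.5))
result <- grid_density(pts, cell = 1)
```

A larger `cell` gives a smoother density but blurs local structure, so it would likely need tuning to the scale of the data.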

(BTW, with the RapidMiner extension you can also create density plots, though without HiLiting. Not sure whether the image would help you achieve what you want.)

Cheers, gabor

bmuchmore · May 16, 2015, 12:10am

Thanks both of you.

Aaron, I've got to be honest and say that I don't really understand what you were suggesting. Alpha? Anyway, reminding me about the color nodes was useful in itself.

Gabor, a straight-up density plot defeats the purpose: It is the density column I want, which can then be used to color a scatterplot. I must admit, however, my use-case is a little obtuse. Either way, I'll check out your suggestion to see what you were thinking.

To anyone else, I figured out the answer to my question, and it works beautifully. However, it is a flow-cytometry-specific solution involving an R package. If anyone is curious though, email me at bmuchmore@rocketmail.com and I'll share code/let you know what I did.

-Brian

bmuchmore · May 16, 2015, 12:16am

Actually, I should say it is not *only* a flow-cytometry-specific solution, and it would work for any x, y (or x, y, z) coordinate dataset. However, it would take some rather convoluted workarounds to adapt it to a non-flow cytometry dataset. But it should work.

-Brian

bmuchmore · May 19, 2015, 9:24pm

So, after a little more searching, the code is not so convoluted after all. If anyone is curious:

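# One-time install of the spade package from Bioconductor:
#source("http://bioconductor.org/biocLite.R")
#biocLite("spade")
library(spade)
# Number of OpenMP threads to use
Sys.setenv("OMP_NUM_THREADS"=8)
# knime.in is the input table of the KNIME R node
density <- SPADE.density(knime.in, kernel_mult = 15, apprx_mult = 1, med_samples = 2000)
knime.out <- as.data.frame(density)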

The output is a density column computed from your input columns. So, if you want to calculate the density of just two columns (e.g. X,Y coordinate columns), be sure to filter out your other columns first. This density column can now be used downstream to color your scatterplot (e.g. Erlwood 2D/3D plot), filter by density, etc.
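As a rough sketch of the column-filtering step, the helper below picks the coordinate columns inside R and appends the resulting density to the full table, so the other columns survive. The helper name `add_density_column`, its arguments, and the column names `X` and `Y` are assumptions for illustration. The density function is passed in, so the example runs with a stand-in function and does not need spade installed.

```r
# Sketch only: add_density_column is a hypothetical helper, not part of spade
# or KNIME.
add_density_column <- function(tbl, cols, density_fun) {
  # Keep only the coordinate columns for the density calculation
  coords <- tbl[, cols, drop = FALSE]

  # density_fun must return one value per row
  dens <- density_fun(coords)
  stopifnot(length(dens) == nrow(tbl))

  # Append the density to the original table, keeping all other columns
  cbind(tbl, density = as.numeric(dens))
}

# Stand-in density function used only to show the data flow
toy <- data.frame(X = c(1, 2, 2), Y = c(2, 1, 2), label = c("a", "b", "c"))
toy_out <- add_density_column(toy, c("X", "Y"), function(d) rowSums(d))

# Inside a KNIME R node, one might instead call (assuming columns "X" and "Y"):
# knime.out <- add_density_column(knime.in, c("X", "Y"), function(d)
#   SPADE.density(d, kernel_mult = 15, apprx_mult = 1, med_samples = 2000))
```

Keeping the non-coordinate columns in the output means the density can be used for coloring or filtering without a separate join.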

-Brian