Perform DBSCAN with Spark nodes

Hi,
I see that the only Spark node for clustering is the Spark k-Means node.
Is there any way to also perform DBSCAN (or, for example, k-medoids) with the Spark nodes?

Any suggestion would be appreciated.
~G

Hi @gujodm

we currently do not have k-medoids or dbscan for Spark, because Spark MLlib does not offer these at the moment:
https://spark.apache.org/docs/2.2.0/ml-clustering.html

If you feel adventurous (*) you could try one of these using Spark RDD/DataFrame Java Snippets:


If you search on Google, there are a couple of other implementations floating around.

Björn

(*) We haven’t tried or tested any k-medoids/dbscan implementations for Spark. We cannot even guarantee that the above work.
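For readers who want to see what such an implementation has to do, here is a minimal sketch of DBSCAN in plain Java. It is an assumption-laden illustration, not one of the implementations referred to above and not part of KNIME or Spark MLlib: the class name `DbscanSketch` and its methods are hypothetical, it runs on a single JVM over an in-memory array, and it uses a brute-force O(n²) neighbour search with Euclidean distance. A distributed version for Spark would need to partition the data and merge clusters across partitions, which this sketch does not attempt.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical, single-machine DBSCAN sketch (not a KNIME or Spark API).
class DbscanSketch {
    static final int UNVISITED = -2; // point has not been looked at yet
    static final int NOISE = -1;     // point belongs to no cluster

    // Returns one label per point: a cluster id (0, 1, ...) or NOISE.
    static int[] cluster(double[][] points, double eps, int minPts) {
        int[] labels = new int[points.length];
        Arrays.fill(labels, UNVISITED);
        int clusterId = 0;

        for (int i = 0; i < points.length; i++) {
            if (labels[i] != UNVISITED) {
                continue;
            }
            List<Integer> neighbors = regionQuery(points, i, eps);
            if (neighbors.size() < minPts) {
                labels[i] = NOISE; // may later be relabelled as a border point
                continue;
            }
            // i is a core point: start a new cluster and grow it.
            labels[i] = clusterId;
            Deque<Integer> queue = new ArrayDeque<>(neighbors);
            while (!queue.isEmpty()) {
                int j = queue.poll();
                if (labels[j] == NOISE) {
                    labels[j] = clusterId; // border point reached from a core point
                }
                if (labels[j] != UNVISITED) {
                    continue; // already assigned to a cluster
                }
                labels[j] = clusterId;
                List<Integer> jNeighbors = regionQuery(points, j, eps);
                if (jNeighbors.size() >= minPts) {
                    queue.addAll(jNeighbors); // j is a core point too, keep expanding
                }
            }
            clusterId++;
        }
        return labels;
    }

    // Brute-force search: indices of all points within eps of points[idx] (including idx).
    static List<Integer> regionQuery(double[][] points, int idx, double eps) {
        List<Integer> result = new ArrayList<>();
        for (int k = 0; k < points.length; k++) {
            if (distance(points[idx], points[k]) <= eps) {
                result.add(k);
            }
        }
        return result;
    }

    // Euclidean distance between two points of equal dimension.
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }
}
```

Calling `DbscanSketch.cluster(points, eps, minPts)` returns an `int[]` with one label per input row. Note that `minPts` counts the point itself, and the result depends heavily on `eps`, so both need tuning for real data.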

I suppose this DBSCAN implementation is written in Scala. Is it possible to integrate a Scala snippet into KNIME?
~G

As long as the libraries can be called from Java (which is true in most cases), then yes. You have to get or build the jars for both projects and add them to the Spark Jobserver classpath as well as to the Spark Java Snippet configuration.

Best,
Björn
