Perform DBSCAN with Spark nodes

gujodm · April 11, 2018, 1:05pm

Hi,
I see that the only Spark node for performing clustering is the Spark k-Means node.
Is there any way to also perform DBSCAN (or, for example, k-medoids) with Spark nodes?

Any suggestion would be appreciated.
~G

bjoern.lohrmann · April 12, 2018, 9:22am

Hi @gujodm

We currently do not have k-medoids or DBSCAN for Spark, because Spark MLlib does not offer these at the moment:
https://spark.apache.org/docs/2.2.0/ml-clustering.html

If you feel adventurous (*), you could try one of these using the Spark RDD/DataFrame Java Snippets:

If you search on Google, there are a couple of other implementations floating around.

Björn

(*) We haven’t tried or tested any k-medoids/DBSCAN implementations for Spark, so we cannot guarantee that the ones above work.
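For anyone who wants to see what such an implementation has to do before wiring it into a snippet, here is a minimal single-machine DBSCAN sketch in plain Java. It is not distributed and uses neither Spark nor any KNIME API; the class and method names (`LocalDbscan`, `dbscan`, `regionQuery`) are hypothetical. It assumes Euclidean distance and a brute-force O(n²) neighbourhood search, so it only suits small data, for example a sample collected onto one machine.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

/** Hypothetical single-machine DBSCAN sketch (not Spark, not distributed). */
class LocalDbscan {
    static final int UNVISITED = -2;
    static final int NOISE = -1;

    // Brute-force epsilon-neighbourhood query (Euclidean distance); the result includes p itself.
    static List<Integer> regionQuery(double[][] pts, int p, double eps) {
        List<Integer> neighbours = new ArrayList<>();
        for (int q = 0; q < pts.length; q++) {
            double sum = 0.0;
            for (int d = 0; d < pts[p].length; d++) {
                double diff = pts[p][d] - pts[q][d];
                sum += diff * diff;
            }
            if (Math.sqrt(sum) <= eps) {
                neighbours.add(q);
            }
        }
        return neighbours;
    }

    // Returns one label per point: -1 = noise, 0..k-1 = cluster id.
    static int[] dbscan(double[][] pts, double eps, int minPts) {
        int[] labels = new int[pts.length];
        Arrays.fill(labels, UNVISITED);
        int clusterId = 0;
        for (int p = 0; p < pts.length; p++) {
            if (labels[p] != UNVISITED) continue;
            List<Integer> neighbours = regionQuery(pts, p, eps);
            if (neighbours.size() < minPts) {
                labels[p] = NOISE; // may later be relabelled as a border point
                continue;
            }
            // p is a core point: start a new cluster and expand it.
            labels[p] = clusterId;
            Deque<Integer> seeds = new ArrayDeque<>(neighbours);
            while (!seeds.isEmpty()) {
                int q = seeds.poll();
                if (labels[q] == NOISE) labels[q] = clusterId; // border point, not expanded
                if (labels[q] != UNVISITED) continue;
                labels[q] = clusterId;
                List<Integer> qNeighbours = regionQuery(pts, q, eps);
                if (qNeighbours.size() >= minPts) {
                    seeds.addAll(qNeighbours); // q is also a core point: keep expanding
                }
            }
            clusterId++;
        }
        return labels;
    }

    public static void main(String[] args) {
        double[][] pts = {
            {0.0, 0.0}, {0.1, 0.0}, {0.0, 0.1},
            {5.0, 5.0}, {5.1, 5.0}, {5.0, 5.1},
            {10.0, 0.0}
        };
        System.out.println(Arrays.toString(dbscan(pts, 0.5, 3)));
    }
}
```

Running `main` prints `[0, 0, 0, 1, 1, 1, -1]`: two clusters of three points each, with the isolated point at (10, 0) labelled as noise. A distributed version typically needs extra work on top of this, such as partitioning the data and merging clusters that span partition boundaries.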

gujodm · April 12, 2018, 10:02am

I suppose this DBSCAN implementation is written in Scala. Is it possible to integrate a Scala snippet in KNIME?
~G

bjoern.lohrmann · April 15, 2018, 11:46am

As long as they can be called from Java (which is true in most cases), yes. You have to get or build the JARs for both projects and add them to the Spark Jobserver classpath as well as to the Spark Java Snippet configuration.
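After adding the JARs, a quick way to check that a class is actually visible is to try loading it by name from Java. The sketch below is a hypothetical helper, not part of KNIME or Spark, and `com.example.dbscan.DBSCAN` is a made-up placeholder for a class from one of the JARs. It only inspects the classpath of the JVM it runs in, so it would need to run where the JAR is expected (for example inside the snippet). Note that a top-level Scala `object` shows up on the JVM as a class whose name ends in `$`.

```java
/** Hypothetical helper for checking whether a class can be loaded from the current classpath. */
class ClasspathCheck {
    static boolean isOnClasspath(String className) {
        try {
            // initialize = false: only locate and load the class, do not run its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError (e.g. NoClassDefFoundError) can mean the class exists
            // but one of its dependencies, such as the Scala runtime library, is missing.
            return false;
        }
    }

    public static void main(String[] args) {
        // A JDK class is always available.
        System.out.println(isOnClasspath("java.lang.String"));
        // Hypothetical placeholder for a class from a DBSCAN/k-medoids JAR.
        System.out.println(isOnClasspath("com.example.dbscan.DBSCAN"));
        // A Scala `object` compiles to a class with a trailing '$'; true only if scala-library is present.
        System.out.println(isOnClasspath("scala.Predef$"));
    }
}
```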

Best,
Björn

system · June 2, 2023, 9:03pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.