How to replace the Column Expressions node in Spark

Hi all,
I want to compute a cumulative sum that resets under certain conditions.
I implemented this with a Column Expressions node, as in the example below.

But I need to implement this task in a Spark environment.
I need to use a Spark node, but this is difficult to express as a Spark SQL query. Is there a good way to do it?

Hello @jjlee

Thank you so much for posting your query here. To compute a cumulative sum, you might want to look at the “Moving Aggregation” node in KNIME. However, since you need to work in a Spark environment, I am afraid the Spark SQL Query node is the one I can recommend in this case.

If more complexity is involved and the Spark SQL Query node is not an option, you might want to check out the “PySpark Script” node. There are a couple of dedicated nodes for executing PySpark scripts. Hope it helps.



Thank you for your reply. :laughing:
As you mentioned, I found the answer in the PySpark node.
The aggregation was possible using a window function and a conditional statement in the PySpark node.
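The poster did not share their script, but the usual window-function-plus-conditional pattern for a resetting cumulative sum is: flag the rows where the reset condition holds, take a running sum of the flags to get a group id, then take the cumulative sum of the value within each group. A minimal plain-Python sketch of that logic (all data and names are made up for illustration; the rough PySpark equivalents are noted in the comments):

```python
from itertools import accumulate

# Sample data: (value, reset_flag) pairs; reset_flag = 1 starts a new run.
rows = [(10, 0), (20, 0), (5, 1), (7, 0), (3, 1), (4, 0)]

# Step 1: a running sum of the reset flags assigns each row a group id.
# Rough PySpark equivalent:
#   F.sum(F.when(F.col("reset_flag") == 1, 1).otherwise(0))
#    .over(Window.orderBy("ts"))
grp = list(accumulate(flag for _, flag in rows))

# Step 2: cumulative sum of the value within each group.
# Rough PySpark equivalent:
#   F.sum("value").over(Window.partitionBy("grp").orderBy("ts"))
cum = []
running = {}
for (value, _), g in zip(rows, grp):
    running[g] = running.get(g, 0) + value
    cum.append(running[g])

print(cum)  # the running total restarts at every reset_flag = 1 row
```

Note that in real PySpark an unpartitioned `Window.orderBy(...)` pulls all rows into a single partition, so for large data the ordering column and any natural partition key matter.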


This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.