Database Nodes Datamanipulation w/o SQL

Hey Knimers,

We are currently evaluating if we can leverage KNIME for some of our ETL Processes. The constraint: This solution should be low code only for our business users. SQL should only play a role in rare cases.

The DB Nodes do a neat job and fulfill many requirements (groupby, join, pivot etc.). Yet, what we are missing are nodes, that allow string manipulations, calculations, or if else logic in the DB Node regime.

Is there maybe an extension, providing such options w/o SQL? Or are there some developments planned in this direction? In my opinion, what could be a really cool feature, would be flexible ports, where input output ports can be changed by few clicks. E.g. I can drag a Rule Engine into the WF and then decide which ports it has (DB ports, Variable Ports, Data ports and so on).

Looking forward to more information on this topic.

Best regards,
Stiefel

1 Like

Hi @Residentstiefel . The difference between the DB nodes (DB GroupBy, DB Joiner, etc) and “regular” nodes is that the DB nodes execute their operations on the DB side (server), while the regular nodes will perform their operations on the Knime side (client).

If the business users are only going to read the data and not write, it’s ok to perform the operations on the client side, so you can still use Rule Engine, String Manipulation on the fetched data.

While the flexible/dynamic ports can be a good feature, this would be applicable only for nodes that can run on both the server side and client side. For example, the Joiner, GroupBy, Pivot, Sorter nodes could support that dynamic ports. For Rule Engine for example, you can’t have a DB port to it as there are no DB Rule Engine. I’m not even sure how a Rule Engine should behave on the server side. Would the result be a virtual column under an alias?

Personally I prefer it the way it is as it is easier and faster to use the nodes. For example, double click on a node and it will automatically add it to the highlighted node in the workflow, provided that the highlighted node’s output port type is the same as the clicked node’s input type. If the node has dynamic ports, I can’t do that. I’ll have to drag the node into the workflow, choose a port, and then attach it to the workflow. Similarly when dragging a node between 2 nodes. It will automatically linked between the 2 if the ports are compatible. With dynamic ports, I can’t do that. It will take additional steps.

EDIT: Another thing to mention regarding your comment about “SQL should only play a role in rare cases”, I’m guessing it is because these users might have limited SQL knowledge, which most likely means they have even less knowledge on how the DB system works. So you don’t want to have these users doing operations on the server side that can create table locking for long period or performance issues to the DB system. They should just run basic queries via the DB nodes and eventually retrieve the data via some DB Query nodes, and then apply Rule Engines via Knime on their side without impacting the DB server.

1 Like

Hi @bruno29a,

Thanks for your reply and ideas.

Your explanation wrt DB nodes only execute on DB side is exactly the reason, why we are looking for such a solution.
Concerning the datamanipulation within SQL: In an (Presto) SQL Statement for example, I can easily attach a new column with a string manipulation logic, e.g.“SELECT LOWER(CONCAT(‘Word1’,’,’,‘Word2’)) as NEW_COLUMN”. Yet, the DB nodes don’t offer such a solution with low coding standards (like string manipulation node), I would have to use SQL (at least what I am aware of).

Maybe a bit more background: Currently we are using a database which offers our business a software to develop calculation views visually and this without any line of SQL. Since we are updating our data landscape, it would be neat if KNIME could be an alternative here, as we have already alot of KNIME knowledge in the organisation.

Wrt to the dynamic ports, I totally agree. I was rather thinking of something like: right click on node and you can switch the ports (similar to components). Or maybe something like, you select a DB node, double click a string manipulation node in the repo and it gets added to the WF with DB ports. But not sure if this is feasible or bringing too much value. As you said, it might impact the user experience maybe negatively. So this idea can be discarded :smiley:

Best regards,
Stiefel

1 Like

That would be a great feature for nodes with dynamic ports. For example if the Joiner node does become one with dynamic ports (DB port or Knime data port), it could be that if you are using DB nodes, the Joiner node could automatically change the port to a DB port if you are appending to a DB node, and automatically change to a Knime data port if you are appending to a Knime data node. That way the user experience would not be impacted since the node would “adjust” itself.

Indeed, you have to use SQL for this kind of manipulation.

I just want to re-iterate what I said before with a bit more detail. Let’s suppose that somehow you are able to do these on the server side. Can you explain what happens after? If I understand correctly, you eventually are executing a SELECT in the background (well, Knime will send a SELECT statement to your db server) and retrieving the data, is that correct? The data is coming from the DB server to your (or other users’) local Knime, correct? If all of these are correct, then why not do what you can on the DB side (DB Pivot, DB Row Filter, etc), bring the data to Knime, and then complete the rest of the operations in Knime with String Manipulation, Rule Engine, etc?

I mean, unless you are writing the results to another table in your DB server, where you don’t need the data to come to your Knime, you can apply the data manipulation on the Knime side if you are going to bring the data over anyways. Unless I misunderstood…

1 Like

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.