Cyclomatic Complexity

holmes · August 7, 2021, 3:57pm

Has anyone done anything with cyclomatic complexity? In particular, I’m looking at complexity of SQL and other programmatic interfaces to DBMS platforms (macros, functions, stored procs, etc.). But generally I’d be interested in any work in this subject on the Knime platform.

AlexanderFillbrunn · August 9, 2021, 9:42am

Hi @holmes,
Can you clarify “complexity of SQL and other programmatic interfaces to DBMS platforms”? SQL is a declarative language where you only specify what you want, not how it should be retrieved, so measuring cyclomatic complexity does not make much sense here. Imagine doing a join:

SELECT * FROM A INNER JOIN B ON A.id = B.id

You send this to the database and depending on things like existing indexes, column statistics, and table sizes it decides to use a nested loop join, hash join, sort-merge join, or some other kind of join algorithm. Each of those can have its own cyclomatic complexity. Additionally, often what you write is not what the DB actually executes. Queries are often rewritten by the query optimizer to be more efficient. A useful tool for finding out more about a query is the EXPLAIN keyword and in most cases the output you get from that is more relevant than cyclomatic complexity.
Kind regards,
Alexander

holmes · August 9, 2021, 1:11pm

Views, macros, functions, procedures, and DQL/DML submitted by users and ETL/BI applications. I’m attempting to classify the complexity of code components in a DBMS environment for the purpose of scoping a cloud migration using a refactor/rewrite strategy (code will be rewritten to suite the target environment). These code components include declarative and procedural code, and there are typically thousands that need to be sized.

Perhaps cyclomatic complexity isn’t applicable for declarative languages. I was thinking maybe sentiment analysis would be a better tool. Or is there another tool I’m missing that could help here?

AlexanderFillbrunn · August 9, 2021, 1:59pm

Hi @holmes,
Sentiment analysis usually refers to the classification of pieces of text into “emotional” sentiment, e.g. telling whether a tweet is positive, neutral, or negative towards a topic.
So am I understanding it correctly that you want to estimate the necessary effort for rewriting different types of code and therefore you are looking for a way to know how easy to understand the code is? In that case you probably need different metrics for procedural and declarative languages, but I am also not sure if a metric based on code alone is enough. Sometimes a piece of code uses a library that is not available in another language and that can increase the rewriting time tremendously.
Kind regards,
Alexander

holmes · August 9, 2021, 2:30pm

@AlexanderFillbrunn - thanks for taking time with this…it’s much appreciated.

Spot on: complexity of the code is correlated to time required to reverse engineer (design), the results of which feed into the rewrite (build).

I was expecting to have different models and/or tools for different types of code. But not sure what tool would best suite the task. I was thinking I could repurpose a sentimenter, where tweet=code and positive/neutral/negative=simple/moderate/complex. Sure there will be other complexities (e.g. platform-specific libraries) but these are just bridges to cross.

SamirAbida · August 9, 2021, 2:34pm

Hello @holmes,

Which DB are you on ? Oracle ? SQL server ?
I’m not aware of anyone that have done something regarding “Cyclomatic complexity” in Knime but I remember about something nice I’ve seen on internet :
Better PL/SQL (salvis.com)

Because cyclomatic complexity is not enough, they have combined multiple “metrics” (McCabe’s cyclomatic Complexity, Halstead Volume, Maintainability Index). I think they have a free add-on, but I can’t remember. Everything is explained in the pdf.
Br,
Samir

holmes · August 9, 2021, 2:53pm

@SamirAbida this is great…thanks!

My specialty is Teradata, and as that ship sinks we’re seeing more and more of these migrations…can’t keep up with demand. We get mired in scoping/sizing efforts and it would be nice to have an automated/consistent technique in the toolbox. Complexity is just one dimension, but one I think I can automate.

holmes · August 11, 2021, 1:43pm

I’m testing Palladian text classification for view DDL. Promising results so far.

SamirAbida · August 11, 2021, 2:10pm

Hello @holmes,

So basically, you are using Knime to read your queries and extract the operators (If, Loop, For, Select etc…) to weight them and create a cyclomatic complexity notation, right ?

If so, that sound interesting. I would be interested to see a screenshot of the result, for my knowledge.
Br,
Samir

system · February 10, 2022, 2:10am

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.