How to insert formula for data type that has ">" / "<" symbols

Hi and good day KNIME community :slight_smile:
I am working on the cleaning of numerous river water data analysis and I am really new to KNIME. I am happy if anyone can help me on how to insert math formula for the data type that has less than or more than symbols.

The symbols indicates that the value is below detection limit or above detection limit. Thus, they have their own formula to determine the exact value for that. I am really clueless on how to use the Math Rule node. I hope anyone could assist me to solve my problem.

Attached below is my example of data.

@nuraqlima Welcome to the Forum.
What are you trying to do? How were these columns originally produced? If the data to produce these columns is in the table you can use the Math Formula node to produce the raw number and then the Rule Engine node to set these flags. Can you share some data? That would make it easier to help.

Hi rfeigel, thank you for responding.

I am trying to convert the value that has the < symbols according to its respective formula. Thus, i do not know how to insert the formula in the Math Formula node. The one that i share before is the raw data that was read from the excel reader actually. I am happy if you would enlighten me more on what type of data example that i can share so that it would be easier for you to help :sneezing_face:

Below is the example of the formula that I want to insert:-

I’m sorry, we’re not communicating. Describe step by step with the screen shot data you posted exactly what you want to do. For example where does the /2 come from in your last post?

Hi @nuraqlima and @rfeigel , ok I’ve read and now re-read this and I think I understand now what is being asked.


Firstly, with the aid of OCR provided by chatGPT, I’v reproduced here the data from the screenshot. It might not be entirely accurate, but it’s good enough for our purposes:

BOD (mg/L) BOD INDEX Ca (mg/L) Cd (mg/L) Cl (mg/L) COD (mg/L) COD INDEX Cr (mg/L) CN (mg/L) E. coli (cfu/100ml) Fe (mg/L)
5 79 19.8 <0.001 8 24 70 <0.001 <0.050 540,000 0.74
2 92 14.0 <0.001 3 26 78 <0.001 <0.050 14,000 0.26
4 83 12.3 <0.001 5 16 32 <0.001 <0.050 460,000 0.61
1 61 27.8 <0.001 6 27 37 <0.001 <0.050 900,000 0.82
7 73 18.4 <0.001 9 36 61 <0.001 <0.050 230,000 0.69
8 69 18.9 <0.001 21 29 62 <0.001 <0.050 250,000 0.87
1 45 27.8 <0.001 27 28 61 <0.001 <0.050 280,000 0.87
6 57 30.3 <0.001 6 18 54 <0.001 <0.050 46,000 0.55
9 29 32.7 <0.001 14 28 64 <0.001 <0.050 24,000 0.48
3 73 28.7 <0.001 15 32 63 <0.001 <0.050 140,000 1.24

(@nuraqlima, this is what @rfeigel meant by actually sharing some data, so either upload a sample from Excel, or upload in text form so that it can be copied/pasted and people can then experiment with possible solutions without having to do all the prep work which takes time and effort

see also Easy creation of formatted "table" on forum posts on how to easily make use of text data posted as above)


So, the columns Cd (mg/L) , Cr (mg/L) and CN (mg/L) may contain a numeric value but they may also contain a literal value <n or >n. The problem faced is that for this reason, these columns are represented as strings and not numerics, and so cannot be directly used within Math Formula node, and require conversion first.

My take on this ā€œconversionā€ is as follows, (except that I’m not entirely sure if I got the >n correct)

For each of those columns, derive a value as below

where the cell contains use the formula value to determine the value notes
n n
<n n / 2
>n (n / 2) +1 (is this correct, it wasn’t clear to me from the handwritten note)

So for example

cell value formula calculated value
10 10
<10 10 / 2 5
>10 (10 / 2) +1 6

The calculated value could then be used in any subsequent calculations, but I believe that what is being asked, is how to convert the <n and >n values to actual numeric values based on the above.

@nuraqlima , please can you confirm understanding. I’m sure that we can then give you a variety of methods to achieve this.

1 Like

Hi @takbb,

i guess for operator ā€œ>ā€ data the formula doesn’t make sense. The final result for a >10 value can’t be bellow 10. I guess the formula should be n+1.

@nuraqlima:
the general question is what kind of calculation is done afterwards. Is it only for binning data i think it is ok. Is it for further math calculation i would be very cautious handle data in such a way.

BR

1 Like

Hi @morpheus,

I think you are right about the formula for >n, (ie that it can’t be below n, and I should have considered that), but I think the error is possibly just in my usage example of the formula value rather than the formula itself (albeit we don’t know what the formula actually is :wink:).

ā€œn=10ā€ was probably a poor example on my part, used just to demo the formula rather than to demonstrate the correct range of values.

If I had been using sample such as >0.001, then adding 1 would have put the result above ā€˜n’, so it is conceivable that within the expected range it is OK.

Your points are well made though, and yes further explanation is needed.

Let me re-ask my first question. Are the columns with the </> signs produced from other columns in the dataset? if so, which ones?

1 Like

Hi @takbb, yup correct, that is what I meant , which to convert the non-numerical value to numerical value. The formula for "> "is correct which is (n/2) + 1. Also, thankyou for enlighten me on how i should share my data , I appreciate that a lot :sneezing_face:

Hi @rfeigel, the columns with the </> produced are not from other columns, it is the value itself in the respective column for the respective parameter which in the non-numerical form, that is the data which given by my supervisor.
I am sorry if my question didn’t clear enough to you :pray:
I could just use excel actually to convert the non-numerical value to numerical value which is just insert its formula right, but for my final thesis, i have to include knime workflow , therefore, I am asking . Thank you for trying to help me :slightly_smiling_face:

Hi @morpheus, for the formula , actually I am also not really sure where it is come from as that is given by supervisor, and I haven’t research on the reason behind the formula yet, but I am sure that is certainly only just for simple calculation and not more than that :smile: Thankyou for sharing your knowlegde :slightly_smiling_face:

Thanks for the reply. The idea I had would not work unless these values were calculated. As usual @takbb is very helpful.

alright, no problem on that, i would just use excel for now i guess, then, i would ask you again from the calculated data that i have obtained :laughing:

Glad to help if I can.

1 Like

Hi @nuraqlima , I’ve uploaded a workflow here

As you could have an undefined number of columns that need to be calculated in this way, It uses String Manipulation (Multi-Column) to act on all the String columns. As it is not possible to automatically have String Man (mult) act only on String columns, some additional nodes are needed to pass just the String Columns to that node, and then re-build the table afterwards, so the workflow is more involved than it ought to be!

However, the primary work for the calculations is this rather horrible looking script in String Manipulation (Multi Column):

string(	
$$CURRENTCOLUMN$$==null  
?$$CURRENTCOLUMN$$ 
:substr($$CURRENTCOLUMN$$,0,1).equals("<")
	?string(toDouble(substr($$CURRENTCOLUMN$$,1)) / 2)
	:substr($$CURRENTCOLUMN$$,0,1).equals("<")
		?string(toDouble(substr($$CURRENTCOLUMN$$,1)) / 2 + 1)
		:$$CURRENTCOLUMN$$
)

In the above ā€œ?ā€ means ā€œthenā€ and ā€œ:ā€ means ā€œelseā€
So what this says is:

return a string such that:

if currentcolumn value is null (i.e. missing)
--then: return the currentcolumn value (i.e. leave it as missing)
--else: 
----if the first character (position 0) of current column value is "<"
------then: convert everything from the second character (position 1) onwards to numeric and divide by 2
------else: 
--------if the first character (position 0) of current column value is ">"
----------then: convert everything from the second character (position 1) onwards to numeric and divide by 2 and add 1
----------else: return the current column value unchanged

(further info on this syntax - see ā€œconditional expressionsā€ section of String Manipulation’s ā€œundocumentedā€ features)
After this node, it passes the result to String to Double to convert the values to Double.

btw, this node could have also returned Double by wrapping the whole script in the toDouble() function except that this won’t work if there is a chance you have missing values, since the toDouble() function breaks if it is passed ā€œnullā€. So it is safer to do the string to number conversion separately.

2 Likes

hi @takbb , sorry for the late reply, thankyou so much and I appreciate your help, I would definitely give a try for this :hugs: :+1:

1 Like