Convert a string to a list to extract the maximum

What I noticed in the string is that after the first " 1:moetmp_1", there are 8 spaces. They correspond to the diagonal cell of the matrix.
For example, it is


You see that after “2:moetmp_1”, these 8 spaces are absent but present just after 1.10.
This is the second row of the matrix and the 8 spaces are in the second position.
The same thing happens in the third one: after “3:moetmp_1”, the 8 spaces appear after the number 6.67, and so on.
In conclusion, the 8 spaces are moving a step forward for each row of the matrix.

I have noticed two problems in your examples:

  • There are some lists in which the first member of the list comes right after the starting tag. E.g:
    1:moetmp_13.16 which is actually 1:moetmp_1 3.16

  • In the last example:
    Convert a string to a list to extract the maximum - #17 by zizoo
    The lists contain 9 members but you have more than 9 lists (columns). So how do you split the matrix into two parts? And the last lists in the same example have only one member.

Regarding your examples, the best expression I can provide is this:
regexReplace(regexReplace(regexReplace($column1$, "^.*?(?=[\\d]+:moetmp_[\\d])", ""), "[\\d]+:moetmp_[\\d]" ,""),"\\\\012", "_")

  • This expression will remove everything before the first list tag.
  • It supports only this format: <one or more digits>:moetmp_<one digit>
    Example: 25:moetmp_8
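
For anyone who wants to try the same three steps outside KNIME, here is a rough Python sketch of the nested regexReplace calls above. The function name `strip_list_tags` and the sample string are made up for illustration, and the sketch assumes the row terminator appears in the data as the literal four characters `\012`.

```python
import re

def strip_list_tags(text):
    # Step 1: drop everything before the first tag such as "1:moetmp_1".
    text = re.sub(r"^.*?(?=\d+:moetmp_\d)", "", text)
    # Step 2: remove every tag of the form <digits>:moetmp_<one digit>.
    text = re.sub(r"\d+:moetmp_\d", "", text)
    # Step 3: replace the literal characters \012 with an underscore.
    return re.sub(r"\\012", "_", text)

# Hypothetical sample string, not taken from the real data.
sample = "header 1:moetmp_1 3.16 2.20\\012 2:moetmp_1 1.10 5.00\\012"
rows = [chunk.split() for chunk in strip_list_tags(sample).split("_") if chunk.strip()]
print(rows)
```

The sample keeps a space between `\012` and the next tag. Without that space, the `\d+` in step 2 would also swallow the digits `012`, and step 3 would no longer find anything to replace.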

The workflow works on the assumption that the matrix has an equal number of rows and columns, with zeros on the diagonal, like this:

0     2.2  5      13.25
11.3  0    25.23  44.5
2.5   3.3  0      6.7
87    52   13.13  0

If you have a different case, let me know the new conditions.

:blush:

P.S. Dear @zizoo, I was assuming that the matrix from your friend is following the same conditions as your own. Sorry for my misunderstanding. Please explain it to me how you split this matrix and I will provide you with a new workflow.


Hello @armingrudd,
Here is how the matrix should look:


In the string, these lines are concatenated except for the headers.

And what do you want to do with this matrix?
What do you want to find?

Hi @armingrudd,
I am trying to find the maximum above and under the diagonal of the matrix.
I am thinking of a trick: to extract all the numbers that are between “moetmp_1” and 8 spaces. Then find the maximum among this list.
This may simplify the search.
Thanks,
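
The trick described above could look roughly like this in Python. It is only a sketch under several assumptions: each matrix row is a single chain ending in the literal characters `\012`, the blank diagonal cell is exactly 8 spaces, and values are separated by fewer than 8 spaces. The function name `lower_triangle_max` and the sample string are hypothetical.

```python
import re

def lower_triangle_max(text):
    values = []
    # Assumption: chains are separated by the literal characters \012.
    for chain in text.split("\\012"):
        # Capture whatever sits between the tag and the first run of 8 spaces.
        match = re.search(r"\d+:moetmp_\d(.*?) {8}", chain)
        if match:
            values += [float(v) for v in re.findall(r"\d+(?:\.\d+)?", match.group(1))]
    return max(values) if values else None

gap = " " * 8  # the blank diagonal cell
# Hypothetical 3x3 sample; the numbers after each gap belong to the upper triangle.
sample = (
    " 1:moetmp_1" + gap + " 2.20 9.99\\012"
    " 2:moetmp_1 1.10" + gap + " 8.50\\012"
    " 3:moetmp_1 3.30 6.67" + gap + "\\012"
)
print(lower_triangle_max(sample))  # prints 6.67
```

Everything after the gap is ignored, so only the values under the diagonal take part in the maximum. A chain that contains no 8-space gap is skipped entirely.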

Your first example was a matrix with an equal number of rows and columns and a diagonal of zero values.

Here we have something different. Would you please explain further how you divide the matrix into two parts?
An example output would help as well.

Thanks.
:blush:

Hi @armingrudd,

To avoid inconsistency in the string,
I generated more than 600 strings in the attached CSV file.
Each string is divided into chains that start with 1:moetmp and finish with \12.
When all the chains are aligned, you notice that there are a few spaces that move from row to row; they represent the diagonal of the matrix.
This diagonal divides the matrix into two triangles (the number of rows equals the number of columns).
I am looking for the maximum value in the lower triangle, i.e. under the diagonal of the matrix.

RMSD-string2.zip (947.0 KB)

Thanks,

Dear @zizoo,

One of us is missing something here. Let’s take the first row of your example dataset.
There are 31 lists inside the row (1:moetmp_2 to 31:moetmp_2).
At first, each list has 10 members (including the white space which marks the zero on the diagonal).
In 10:moetmp_2 the space is the last member, so I would expect the matrix to end here, but it does not.
It continues with lists that contain no space for the diagonal, then there are some with spaces, and finally the lists start again from 1:moetmp_2 (to 31:moetmp_2), each containing only 1 member.

I cannot understand how I should transform this format into a matrix with equal rows and columns and a diagonal splitting the matrix into two triangles.

If you make this clear, I will be able to provide you with the workflow.

:blush:


Dear all,
Having read all the discussion about the differences between the strings, I propose tackling the problem at its root. In my view, the best option is for the person who provides the data to use a dedicated column delimiter, which would solve all of these issues.


Hi @armingrudd,
In my previous string example, the matrix is 31 columns by 31 rows. The rows are divided into chunks of 10 values. So you can think of it as snapshots of 10 columns at a time until all columns are shown. This is why you see chunks with no spaces for the diagonal, i.e. a chunk whose 10 columns lie away from the diagonal.
My friend who is providing the data is aware of this complication.
Parsing the string seems doable, but extremely complex and time-consuming.
Thanks @armingrudd. I will ask my friend again to provide a better representation of the matrix. https://forum.knime.com/uploads/default/original/2X/2/2fda5dd3b151d3a03f278615bfe635833ce30763.png
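
To make the chunked layout concrete, here is a small Python sketch of how such blocks could be stitched back into full rows. It assumes the chains have already been parsed into lists of numbers (with the blank diagonal cell turned into 0.0) and that they arrive in the order described: all rows for the first block of columns, then all rows for the next block, and so on. The name `reassemble` and the 3x3 sample with chunks of 2 columns are made up for illustration.

```python
def reassemble(chains, n_rows):
    # Chain k belongs to row k % n_rows; each later block extends that row to the right.
    rows = [[] for _ in range(n_rows)]
    for k, chain in enumerate(chains):
        rows[k % n_rows].extend(chain)
    return rows

# Hypothetical 3x3 matrix delivered in chunks of 2 columns.
chains = [
    [0.0, 1.0], [3.0, 0.0], [5.0, 6.0],  # columns 1-2 of rows 1..3
    [2.0], [4.0], [0.0],                 # column 3 of rows 1..3
]
matrix = reassemble(chains, n_rows=3)
print(matrix)
```

For the 31 by 31 case this would mean four blocks: three with 10 columns each and a last one with a single column.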


Dear @armingrudd,
I managed to get a table with zeros in its diagonal.
Now, I would like to find the maximum and the minimum for the upper and lower triangle from the diagonal in the table. It is possible to get zeros outside the diagonal.

Is there any node that can make this task easy for me?
I attach the table here. RMSD-log.zip (8.1 KB)


Hi @zizoo

I saw this discussion, and it's quite a challenge. I managed to create a workflow, min max diagonaal.knwf (418.3 KB), that computes the min/max values above and below the diagonal.


It starts with a loop that creates a diagonal (because a 0 is not always an indication of the diagonal). Then, via Column Combiner and Splitter, it is possible to extract the min and max values on both sides of the diagonal.

gr. Hans


Hi @zizoo,

Here you are:

RMSD-max.knwf (69.0 KB)

I just modified (simplified) my last workflow.

Since your matrix is already extracted, you just need to loop over the columns and find the max value in the top and bottom triangles.
To find the diagonal and separate the triangles, you can use the loop iteration number and split the rows.
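
Outside KNIME, the same idea can be sketched in a few lines of Python. This is not the workflow itself, just an illustration of the logic: the column index plays the role of the loop iteration number, and the rows are split at that index. The function name `triangle_extremes` is hypothetical, and the sample is the 4x4 matrix from earlier in this thread.

```python
def triangle_extremes(matrix):
    upper, lower = [], []
    for col in range(len(matrix)):       # col acts as the loop iteration number
        column = [row[col] for row in matrix]
        upper.extend(column[:col])       # rows above the diagonal cell
        lower.extend(column[col + 1:])   # rows below the diagonal cell
    # Assumes a square matrix of at least 2x2, so neither list is empty.
    return {"upper": (min(upper), max(upper)), "lower": (min(lower), max(lower))}

matrix = [
    [0, 2.2, 5, 13.25],
    [11.3, 0, 25.23, 44.5],
    [2.5, 3.3, 0, 6.7],
    [87, 52, 13.13, 0],
]
result = triangle_extremes(matrix)
print(result)  # {'upper': (2.2, 44.5), 'lower': (2.5, 87)}
```

Because the diagonal is located by position rather than by value, zeros that appear outside the diagonal are treated like any other number.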

Feel free to ask more questions.

:blush:


Thanks @armingrudd and @HansS
It works for the initial tests.
I am wondering whether it is possible to track the x,y ID of these min and max.
I mean for example if I have the max in the third row and fifth column I can keep that information joined with the values for min and max.
Maybe there is already a node that searches for a particular value and records its index in the table.
Thanks,
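
As a plain illustration of what is being asked for (not a KNIME node), the following Python sketch keeps the row and column of each min and max together with the value. The function name `triangle_extremes_with_position` is hypothetical, positions are 1-based, and the sample is the 4x4 matrix from earlier in this thread.

```python
def triangle_extremes_with_position(matrix):
    cells = {"upper": [], "lower": []}
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if i != j:  # skip the diagonal itself
                side = "upper" if i < j else "lower"
                cells[side].append((value, i + 1, j + 1))  # (value, row, column)
    # Tuples compare by value first; ties go to the smallest row, then column.
    # Assumes a square matrix of at least 2x2, so both lists are non-empty.
    return {side: {"min": min(items), "max": max(items)} for side, items in cells.items()}

matrix = [
    [0, 2.2, 5, 13.25],
    [11.3, 0, 25.23, 44.5],
    [2.5, 3.3, 0, 6.7],
    [87, 52, 13.13, 0],
]
found = triangle_extremes_with_position(matrix)
print(found["upper"]["max"])  # (44.5, 2, 4)
print(found["lower"]["max"])  # (87, 4, 1)
```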

Hi @zizoo

I don’t know if there is already a node for identifying the x,y IDs of these min and max values. It took quite a few nodes to find those IDs. I hope someone can surprise me and tackle this question using fewer nodes than I did. But anyway, it works. min_max_diagonaal_cell.knwf (115.5 KB)

gr. Hans


Thanks @HansS and @armingrudd,
I tried your last workflow with a new matrix that is very similar. Just the values are different but the workflow failed to finish.
I couldn’t find out what is wrong even with this simple matrix.
RMSD-log-test.zip (203 Bytes)

I took a quick look at min_max_diagonaal_cell.knwf (129.4 KB)

The problem occurred where the min and max values are in the same row (and column). In the Extract Table Spec there are 2 possible outcomes, and then the row filter does not act as expected. So I created a solution (live life to the max :grinning:) and uploaded it again (same flow name). It uses a switch for cases where there are min/max values in the same row/column.

gr. Hans



Hi @HansS Hans,
I am not sure how you managed to generate the table from the third file reader.
Did you use another workflow?
Thanks


Hi @zizoo

There are 2 workflows: one that finds the min and max values above and below the diagonal, and a second one in my last post. It's possible to combine both flows.
I am no longer sure how I generated the table from the third file reader. You provided the data in the zip file. I think I created it with the first workflow.

Is your problem solved?

gr. Hans

Hi @HansS,
I tried to combine both workflows as you described, but the workflow failed in the middle.
I attach the workflow and the input CSV data.
RMSD-extraction 1.zip (232.8 KB)