how to order the rows based on the row index

Subramanyam · September 19, 2022, 8:55am

Hi Team,

I want to re-order the rows based on the row index. Please suggest an idea to achieve it.

current output is

Expected result is

E2EDP19 430 000018E2EDP19 00000003 002000000000081720418
E2EDP01002430 000017E2EDP0100200000002 81720418 5000 IT 5038
E2EDP19 430 000018E2EDP19 00000003 002000000000081725022
E2EDP01002430 000017E2EDP0100200000002 81725022 12368 IT 5038

Please find the attached workflow for further information.
Generating Idoc for Philippines–Main one.knwf (356.9 KB)

I have tried sorter node to sort the rows based on rowID but it working weird as below

Thanks in advance,
Subramanyam Kinthada.

iCFO · September 19, 2022, 11:04am

Use this node to create a new column with your row IDs as a value, then sort by it.

Subramanyam · September 19, 2022, 4:23pm

Hi @iCFO,

It is creating a new Row ID with continuous numbers. If I sorted with Row ID column I will not get the expected result.

Expected result is

Please suggest me if any other solution.

Thanks,
Subramanyam Kinthada.

ArjenEX · September 19, 2022, 4:30pm

If you disable the ReplaceRowID section and only set the Append RowID column and you will get the rowID in a seperate column to work with like any other column.

The output is of datatype string therefore it applies a string sort and not an interger sort that you might have expected. If you actually need to sort on the numbers in the rowID you need to apply some cleansing operations first and convert it to a number.

Subramanyam · September 19, 2022, 4:54pm

Hi @ArjenEX,

I am not sure whether I am presenting my query in a correct form.
My issue is not with Row ID numerical or string. I want the output like I mentioned in the above screenshot. Do I need to apply any logic if Yes, could you please kindly help me in designing it. I have attached the workflow also.

Thanks in advance,
Subramanyam Kinthada.

iCFO · September 19, 2022, 5:04pm

I think you are definitely going to need to clean up those strings in order to get a clean sort as @ArjenEX mentioned. After looking at your workflow there are a lot of Row ID’s that read along the lines of Row#0#0_dup_dup_dup_dup_dup. There are tons of ways to go about it cleaning up the Row IDs, but I will throw out a quick option that comes to mind. You could change the Duplicate Row ID handling in your Concatenate (Optional in) (deprecated) from “_dup” to something like “,1”. Then create your new column with the RowIDs as shown above. Then you can use the Cell Splitter with a “,” delimitator to split the ID into multiple columns, them sum them up the split columns with column expression node, and then add the sum back to your cleaned up Row IDs as a decimal. At that point you should have clean column that can be sorted as a number to give you a desired result.

ArjenEX · September 19, 2022, 6:04pm

It actually is the source of the problem You are expecting:

Row0
Row0_dup
Row1
Row1_dup

based on the number Row(x). But since the output is of type string, that won’t work. A string sort where numbers are involved is dangerous because it leads to unwanted/misleading situations like you encounter now because Row1_dup, according to the string sort logic, is actually in the right place.

That in regards to the image of the large table that you show in the opening post. For that smaller table, depending on how many records that table will contain, the method propsed here does actually work.

After the sort

But again, it comes with a heavy disclaimer that it depends on how clean you manage to keep your rowID’s.

As for the _dup_dup, @iCFO gives some pointers

Subramanyam · September 19, 2022, 6:45pm

Hi @ArjenEX,

Anyway the row ID will not be added in the final output. Here sorting itself is major criteria. So need suggestion for it. I observed one common factor between the two lines that has to come one after another and the factor. there is a eight digit ( starting with 8 for reference I am mentioning) number common in two lines.
could you please help me in building the workflow as per that logic if that works if not any other method.

Thanks in Advance,
Subramanyam Kinthada.

takbb · September 19, 2022, 8:35pm

Hi @Subramanyam

The rowid in KNIME is not designed to be sortable; it is merely designed to represent a unique identifier. When rows are joined through joining or concatenation, the resultant rowid may be transformed in a variety of ways such as prefixing one rowid with another or the addition of multiple “dup” suffixes, as you have found. Sorting by rowid is therefore essentially meaningless as the algorithm by which such IDs could be “sorted” is almost impossible to define for all circumstances.

Generally sorting of data follows one of two patterns. Either you want the data sorted according to a combination of one or more data columns such as name, or date, or value, or you want the data sorted according to the sequence in which it arrives, or is presented in a file.

If you are wanting to sort the data in sequential order of its appearance, which is typically what you might be wanting to do given that you are considering rowid as a possible sort sequence, then probably what you actually want to do is provide a sequential numeric identifier for each row.

If that is your aim, then use the Counter Generation node early in your workflow which will create a sequential numeric value for each row. Use this for sorting rather than rowid.At a later point in your workflow, if your data becomes unsorted, you can then re-sort it’s using the numbers from counter generation. This is a technique I use often to resort data after multiple transformations especially after joining or concatenating data. I hope that helps.

iCFO · September 20, 2022, 1:11am

Good call @takbb. I use Counter Generation node on pretty much all of my workflows in this way. That way I have my own unique and sortable “Process IDs”, and I am not dependent on the uniqueness or sortability of another system / exports Row IDs. It also helps avoid a ton of potential issues for things like Joins and Column Appends in a complex workflow.

Subramanyam · September 20, 2022, 1:27am

Hi @iCFO @takbb,

Now can I go with the Counter generation node to achieve my output. I am pretty much new to KNIME. Please kindly help me in designing the workflow using Counter Generation.

Thanks in Advance,
Subramanyam Kinthada.

iCFO · September 20, 2022, 2:18am

It looks like you are periodically retrieving small batches of data and then doing some loop process / concatenate that results in row duplications that you want to sort…

The uploaded workflow would need embedded data in order to help you directly, and even then it isn’t clear to me what you are after. I would try using the counter generation node after the file reader nodes or immediately before the concatenate nodes depending on which one provides a better sort point for your aim.

If you want one tables’s rows to fall below another then you can at .1 to the result of the counter generation node for that table.

Subramanyam · September 20, 2022, 2:44am

Hi @iCFO,

I am giving you the input files as well as the workflow. and my aim is as below
current output

expected output
row0
row0a
row1
row1a
row2
row2a
… so on.
pod.txt (2.1 KB)
poh.txt (270 Bytes)

Please find the attached input files and the workflow.
Generating Idoc for Singapore.knwf (160.2 KB)
If I get the above mentioned expected output then it will create my final output.

Please let me know for further information.

Thanks,
Subramanyam Kinthada.

iCFO · September 20, 2022, 2:03pm

Generating Idoc for Singapore.knwf (233.1 KB)
I added the Counter Generator nodes as a “process id” before one of your concatenate nodes. You will notice that I used the Column Expression node to add .01 to each counter on one of the streams so that it fell into the correct row sort order vs the other.

I can’t really tell what is going on use case wise with that second concatenate node, but you can filter out the counter column after the sort that I did and then use the same approach a second time before the next concatenate if you need to apply another row order based sort at the end.

Subramanyam · September 20, 2022, 4:17pm

Hi @iCFO,

Thanks a ton,

Its working as expected and the output is perfect.
Thanks a lot for your support @ArjenEX @takbb.

Thanks,
Subramanyam Kinthada.

takbb · September 20, 2022, 6:19pm

Nice one @iCFO!

@Subramanyam You are welcome although I did very little here. Sometimes standing back a little from the problem is all that is necessary. I’m pleased to see it is working for you.

system · September 27, 2022, 6:20pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.