Generate sequence with first and last value

Pippobaudo89 · June 30, 2023, 2:04pm

hi all

i got a problem.

i tried to build a workflow with table creator and row generator to automatically generate a sequence by three parameters.

First value is the first row from with begin the sequence

Last value is the last row which end the sequence

Number of rows is the number of objcets generate by the sequence

Example
Input:

First value x001
Last value x010
Number of rows 10

Output:
x001
x002
x003
x004
…
x010

I tried in many ways but i can t find any nodes that can help me (someone suggested me java snippet node but i don t know how to write java code)

Anyone can help me??

takbb · June 30, 2023, 2:49pm

Hi @Pippobaudo89 , the majority of nodes are unable to create additional rows, so java snippet for example will not be able to help you here.

For the situation you describe you, the “One Row to Many” node is your friend. It requires that there be an integer column present on the rows which states how many duplicates of each row to create. In your example, I’d suggest that the # of luggages column is probably an integer column which can be used for this purpose.

Typically, if you want to then create a sequence between two points, you could then use the Ranking node or similar to generate a sequence number which can then be used in some form of addition.

Here is a relatively generic example. (Workflow supplied)

Suppose I have a table as follows

In this table, each row has a start number and the number of rows that are required. What we want at the end is for there to be extended ranges based on “Number of Rows”, so we would end up with Start Numbers 100 to 105, 200 to 204 and 300 to 306

To achieve this, the following pattern can be used:

Firstly, we will need to know (later) for each row, which “group” it belongs to, so we can use a Counter Generation to give each row a number, or we can, as shown here, use the RowID node to make a copy of the current rowid in a new column “OriginalRowId”

After that we tell it to duplicate the rows using “One Row to Many”, and specify that the “Number of Rows” column (which must be an Integer column for it to be seen by the node) is to be used to determine how many duplicates of each to make. A value of 1 would mean “do not duplicate” as it identifies the number of rows for each that will be present at the end of the operation.

Next is the ranking node. It is also possible to use Moving Aggregation here, and many people find it preferable. I am demonstrating the Rank node here. The warning with the Rank node is that I find it buggy to configure. You can sometimes (often) find that it doesn’t properly detect the available columns during configuration, and no end of reset/execute fixes it. Even during writing this demo, I found I had to delete the node, execute the node prior to it so that the columns were present, and then re-add the Rank node before it allowed me to choose the grouping column. However, once configured, it works well.

The rows are to be grouped by OriginalRowId, and the Start Number can be chosen as the “ranking column”. It doesn’t matter which ranking column is chosen, because at this point within a group they all contain the same value. You must specify a column, though, so this will do!

Make sure the Mode is set to Ordinal which will give you an ordered sequence beginning at 1 for each group, and I have told it to retain row order (although that probably doesn’t matter in this case).

The result of this is the following table:

You can now do a calculation for new values of the Start Number, using the rank column on each row. So the new start number for each row will be Start Number + rank -1, to keep the first row in each group unchanged. This can be performed with a Math Formula.

After a column filter tidies up the required columns, the result is the required output as shown above.

Dealing with Alphanumeric codes such as yours follows a similar principle, but you will need to perform additional manipulations on the string because you cannot just add to it with a Math Formula. There are a number of ways of achieving this, but I have included here a sequence of nodes to demonstrate one way of performing this:

The upper branch is the “numeric” version detailed above.
The lower branch contains additional manipulation of the string, which if you read through the nodes will hopefully be reasonably explanatory.

In this example, it was important to “retain row order” within the Rank node, because otherwise it reordered by “Start Number” in alphabetical order, resulting in “banana” codes appearing first.

I hope that helps give you some pointers

Forum - build a sequence of rows.knwf (34.5 KB)

gonhaddock · June 30, 2023, 3:05pm

Hello @Pippobaudo89
@takbb 's solution just came up when working in mine one.
Just for the records, almost the same but few nodes less.

20230630_generate_sequence_v0.knwf (32.7 KB)

BR

PS.- To be honest I didn’t include rank hierarchy, used ROWINDEX instead

takbb · June 30, 2023, 3:26pm

Hi @gonhaddock, I think the difference here is that yours is a better specific answer to the question, dealing with a single starting row, whereas mine is more general in terms of handling multiple initial starting rows hence my additional nodes to ensure there is “grouping”, and also multiple initial rows meant I couldn’t use rowindex.

I was tempted to use String Manipulation to cut down on the need for math formula and some of the other nodes in my “alphanumeric” example:

That one String Manipulation node doing the job of five…

join(regexReplace($Start Number$, 
	"[0-9]*$","")
	,
	string(toInt(regexReplace($Start Number$,"[^0-9]*",""))+$rank$ - 1)
	)

because you know how I think String Manipulation should be used for everything… even when it shouldn’t be

gonhaddock · June 30, 2023, 3:36pm

Hello @takbb , excellent as always!
I knew your solution was complete already, and can be easily escale up to a dataset; but I decided to share for two details, besides it was already done.
The first as you mentioned, it is because more specific. The second because it’s included the dealing with the leading zeroes in the name. So it can be upgraded in deployment if needed.

BR

takbb · June 30, 2023, 3:58pm

And perfectly right that you did @gonhaddock !

As always I was keen to look at your solution too, and there’s always room for a variety of approaches, and you never know when that “I never thought of doing it that way” moment will occur.

Pippobaudo89 · July 3, 2023, 8:12am

thanks… but in my version don t exist one row to many, i substitute with one to many but the result aren t as yours…

gonhaddock · July 3, 2023, 9:10am

Hello @Pippobaudo89
‘One to Many’ node work with columns, so the function is different.

Ok. While ‘One to Many’ is part of the KNIME base nodes; ‘One Rows to Many’ node is part of the KNIME data generation extension.

Didn’t you receive the pop up option to install the required extension, when opening the workflow? This is the easiest approach to install required extensions.

BR

Pippobaudo89 · July 3, 2023, 9:27am

o gosh

my bad

i m trying to install it, but probably i select too much extension (once for all) so it taking times… let u know. thanks a lot, i ll give u update

Pippobaudo89 · July 5, 2023, 8:28am

update:

the nodes work (thanks so much) but there s still some problems we can resolve…

in the math formula, you add +1E that in some ways makes error with my series so i put a -1E in the formula to correct…

But with the first range (i have several range in different rows) it s all ok, but with the others it makes error…

I ll give you some real examples for my real table, i tried to use your advices, but seems that must be modified

These are my starting table… i applied your advice only on the first row, and with the edit i wrote above it works.

My next steps are:

make the same work for all the other rows (but in reality with the method above it works for all the rows but something make error (for example, the final output delete the 0 after META, i solved with a constant value and add for the final column “META0”)

2)join for all the series the values from column C in the example above to column I (from “n° luggages” to “description”)

Please if i am not clear feel free to ask

and overall thanks in advance to all!!!

gonhaddock · July 5, 2023, 8:47am

Hello @Pippobaudo89
Let’s go back to the beginning of the post and subsequent discussion. This current workflow is not prepared to be connected to your data, as it is working on a single row basis. A simple deploy based in a row based loop is doable.
Remember this:

@takbb 's workflow is a better example about how to deploy it in your real data (remember about use of hierarchical rank comments…) My answer was focused about to solve your description’s example -this is one row-. You can combine features from both.

I can work trying to create an upgrade of the solution, or you can adapt @takbb 's one. And the last but probably less efficient would be to loop this one. Being aware that now it is not ‘x’ anymore but ‘META’ (if different strings may be worth to implement leading string as variable)

This is why, providing more challenge details, helps into deliver better fitting solutions to your use case.

Please share your thoughts

BR

PS.- the 1E3 is equivalent to 1000. So the ‘3’ is the number of digits in your numeric sequence. Aiming to fill leading zeroes of the sequence. In your picture example you have one leading 0 out of 7 other integers… so you may want to replace it by 8 (it may be worth to automate as variable as well)

PS2.- I’ll work on workflow upgrade, just for the records.

gonhaddock · July 5, 2023, 10:15am

Hello @Pippobaudo89
I had a look to the upgrade, and based in all possible casuistic… the best implementation would be within a loop.

Be aware about your n_luggage estimated column, as it’s underestimating all the way down in the table (based in ‘From’ and ‘To’ columns).

20230630_generate_sequence_v3.knwf (65.7 KB)

Now leading string and numeric length is automated via variable, so you don’t have to take care on workflow customization.

I hope this deploy of the workflow, fits better with your use case.

BR

Pippobaudo89 · July 5, 2023, 10:44am

it seems work perfectly…

OMG i dont know where you are but i wish to you all the luck of this world you re such a great person

thanks very much, i ll propose you as genius of data analysis

thanks again

gonhaddock · July 5, 2023, 10:50am

Thank you @Pippobaudo89 for validating my #100 solution in forum

Just for the records, I did a latest update to variable formula (v3). It will be uploaded into Hub as soon as I get access to it.

BestRegards

gonhaddock · July 5, 2023, 11:40am

Hello @Pippobaudo89

Just for the records, I did a latest update into Hub; applying hierarchical ordinal rank. So now Looping is avoided, resulting in an optimized workflow in terms of efficiency.

BR

Pippobaudo89 · July 5, 2023, 11:44am

you re great

thanks again

takbb · July 5, 2023, 5:26pm

Congrats @gonhaddock … drinks all round!

system · July 12, 2023, 5:27pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.