Repeating a Moving Aggregation (What are the options)

Tyler · June 22, 2019, 12:06pm

Hello, thanks for the help & patience.

While parsing a PDF, I’ve run into a data problem I’m not familiar with solving in KNIME. Maybe there’s a tool or set of tools I’m missing. I want to parse the PDF a certain way, and I feel I’m overlooking something.

The goal: the incoming data is a PDF. We get it to this point, and I noticed a pattern in the data that I want to capture into debit/credit/balance columns for a bank account.
(can’t share data, only examples)
4/4 Recurring Payment authorized on 04/03 Netflix_Com Netflix_Com 10.81

Splitting per space generates:

See the row id on the far right: if I could restart it for each value of the [new] column, like a group by, that would be perfect.

4/4 Recurring Payment authorized on 04/03 Netflix_Com Netflix_Com 10.81
5/4 Recurring Payment authorized on 04/03 Netflix_Com Netflix_Com 10.81
6/4 Recurring Payment authorized on 04/03 Netflix_Com Netflix_Com 10.81

At the end of every purchase, there’s always 1 or 2 sets of numbers.

Splitting on spaces lets me ask “what’s my max value” of a running sum of 1. I want to understand other options so I can solve these problems faster. It seems I’m getting warmer, but I’m probably googling the wrong keywords to find the right node for the solution.

4/4
Recurring
Payment
authorized
on
04/03
Netflix_Com
Netflix_Com
10.81

Sometimes there’s two numbers… (fake sample)

4/4
Recurring
Payment
authorized
on
04/03
Whatever_Com
Some
random
text
Whatever_Com
10.81
20.20

Sometimes the number is a credit and sometimes it is a debit. If the last two values are numbers, it’s always:

10.81 (negative from balance)
20.20 (your balance after the purchase)
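
Outside KNIME, the "one or two trailing numbers" rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `AMOUNT` and `parse_line` are made up for this example, and it assumes every amount contains a decimal point (so dates like 04/03 are never mistaken for amounts).

```python
import re

# Assumption: an amount is digits, a decimal point, then more digits (e.g. 10.81).
AMOUNT = re.compile(r"\d+\.\d+")


def parse_line(line):
    """Split a statement line on whitespace and peel numbers off the right end."""
    tokens = line.split()
    trailing = []
    # Take at most two numeric tokens, looking only at the end of the line.
    while tokens and len(trailing) < 2 and AMOUNT.fullmatch(tokens[-1]):
        trailing.insert(0, tokens.pop())
    return {
        "description": " ".join(tokens),
        # With one number we only know it is the transaction amount.
        "amount": trailing[0] if trailing else None,
        # With two numbers the second is the balance after the purchase.
        "balance": trailing[1] if len(trailing) == 2 else None,
    }


one = parse_line("4/4 Recurring Payment authorized on 04/03 Netflix_Com Netflix_Com 10.81")
two = parse_line("4/4 Recurring Payment authorized on 04/03 Whatever_Com Some random text Whatever_Com 10.81 20.20")
```

Here `one["balance"]` stays `None` because only a single number ends the line, while `two` carries both the amount and the balance.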

My question:
When trying to find the max value of a running sum(1), or the last two values, I find myself wanting to do a running sum of 1, grouped by the duplicated rowid column (shown in the screenshot). The rowid is duplicated as in the sample data below:

id	text
0	4-20
0	text
0	text
0	24.5
1	5-21
1	text
1	text
1	5-20
1	text
1	55.5
1	200.20
2	6-1
2	text
2	text
2	text
2	6-3
2	text
2	text
2	60.40
2	520.10

The goal would be to learn how to do the repeating row id, or restarting moving aggregation, or whatever we want to call it. Maybe there’s a tool I don’t know about. I feel the loop tool can handle it, but I’m curious what the options are.

End objective.

id	text	desire
0	4-20	1
0	text	2
0	text	3
0	24.5	4
1	5-21	1
1	text	2
1	text	3
1	5-20	4
1	text	5
1	55.5	6
1	200.20	7
2	6-1	1
2	text	2
2	text	3
2	text	4
2	6-3	5
2	text	6
2	text	7
2	60.40	8
2	520.10	9

Now I can find these max values easier. Maybe there’s a better solution. Would love to know.
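
For comparison, the restarting counter and the per-id maximum can be sketched in plain Python. This is a rough illustration, not a KNIME recipe: `add_position` and `last_per_id` are hypothetical helper names, and the rows are a subset of the sample data above.

```python
from collections import defaultdict

# Sample rows taken from the table above: (id, text)
rows = [
    (0, "4-20"), (0, "text"), (0, "text"), (0, "24.5"),
    (1, "5-21"), (1, "text"), (1, "text"), (1, "5-20"),
    (1, "text"), (1, "55.5"), (1, "200.20"),
]


def add_position(rows):
    """Append a per-id running count (1, 2, 3, ...), i.e. a counter that restarts for each id."""
    seen = defaultdict(int)
    out = []
    for row_id, text in rows:
        seen[row_id] += 1
        out.append((row_id, text, seen[row_id]))
    return out


def last_per_id(positioned):
    """Return {id: (text, position)} for the highest position within each id."""
    last = {}
    for row_id, text, position in positioned:
        if row_id not in last or position > last[row_id][1]:
            last[row_id] = (text, position)
    return last


positioned = add_position(rows)
last = last_per_id(positioned)
```

The third element of each tuple in `positioned` corresponds to the desire column, and `last` holds the final value for each id together with its position.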

Thanks for your time.

Thanks
Tyler

edits - fixing links to have underscore VS netflix dot com.

HansS · June 22, 2019, 1:14pm

Hi @Tyler

I created a workflow repeat_moving_aggregation.knwf (20.7 KB). It doesn’t use a Loop node or a Moving Aggregation node. I used the Rank node to rank your rows within every id. With the Counter Generation node I keep the original input order as is.
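
For readers without KNIME at hand, the same idea (a global counter to remember input order, then a rank within every id) might look roughly like this in Python. It is a sketch of the concept only, not an export of the workflow; `rank_within_id` is a hypothetical name and the node configuration is assumed, not confirmed.

```python
def rank_within_id(rows):
    """Rank rows inside each id by their original position, keeping input order."""
    # Step 1: a global counter records the original row order
    # (the role Counter Generation is assumed to play).
    indexed = [(counter, row_id, text) for counter, (row_id, text) in enumerate(rows)]

    # Step 2: collect the counters that belong to each id.
    by_id = {}
    for counter, row_id, _ in indexed:
        by_id.setdefault(row_id, []).append(counter)

    # Step 3: rank each counter within its id, starting at 1
    # (the role Rank grouped by id is assumed to play).
    rank = {}
    for counters in by_id.values():
        for position, counter in enumerate(sorted(counters), start=1):
            rank[counter] = position

    # Step 4: emit the rows in their original order with the rank attached.
    return [(row_id, text, rank[counter]) for counter, row_id, text in indexed]


ranked = rank_within_id([(0, "4-20"), (0, "text"), (0, "24.5"), (1, "5-21"), (1, "55.5")])
```

Because the rank is computed from the stored counter rather than from a running total, the result does not depend on the rows of one id being adjacent.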

gr
Hans

Tyler · June 22, 2019, 1:23pm

Hello @HansS!
This is exactly what I needed
Good bye data problems.

I know kungfu now, thank you!

Tyler · June 22, 2019, 2:24pm

@HansS thanks again. I wanted to mention that Counter Generation is a lot easier than a running sum of a constant value 1… WAY easier, haha. Thank you very much. I love learning new tools.
