SOLVED/ beginner level question/ how to interpret Weka generalized sequential patterns output?


My example data (product purchase sequence):


customer1 a a b e d
customer2 a b e a d
customer3 b e a a d
customer4 c a b a a
customer5 c d c a a


And Weka GSP output:


Number of cycles performed: 3
Total number of frequent sequences: 9
Frequent Sequences Details (filtered):
- 1-sequences
[1] <{a}> (2)
[2] <{b}> (2)
[3] <{a}> (3)
[4] <{d}> (2)
- 2-sequences
[1] <{a}{a}> (2)
[2] <{a,b}> (2)
[3] <{b}{a}> (2)
[4] <{a,d}> (2)
- 3-sequences
[1] <{a,b}{a}> (2)

Does "<{a,b}{a}> (2)" mean that the sequence <{a,b}{a}> appears two times in the data?
- If so, why is the sequence {a,a} not shown? It appears four times in the data.
Thank you!


Hello Markus,

I'm not completely sure about this, as I have never used GSP from Weka.

But to answer your question: {a,a} is shown in the result, written as <{a}{a}>. The result is grouped by the size of the pattern. {a,a} has size 2, so it appears in the section "2-sequences" ;). And GSP found only one frequent 3-sequence, namely <{a,b}{a}>.

Another question: is the input table complete and correct? Usually {a, b} denotes an itemset, which in your case would be one order. But if I look at your example, every order contains only one item... So the result you got does not actually fit the table.
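
To make the item / itemset distinction concrete, here is a minimal Python sketch (not Weka code, and not how Weka implements GSP). It assumes that every purchase in the example table is its own order, and it uses the usual GSP notion of support: the number of customers whose sequence contains the pattern, not the number of occurrences. The helper names `contains` and `support` are made up for this illustration.

```python
# Example table from the question; assumption: each purchase is a separate order.
rows = [
    "a a b e d",  # customer1
    "a b e a d",  # customer2
    "b e a a d",  # customer3
    "c a b a a",  # customer4
    "c d c a a",  # customer5
]

# A customer sequence is a list of orders, and each order is a set of items.
data = [[{item} for item in row.split()] for row in rows]


def contains(sequence, pattern):
    """True if the pattern's itemsets occur, in order, inside distinct orders."""
    pos = 0
    for itemset in pattern:
        # Skip orders until one contains every item of the current itemset.
        while pos < len(sequence) and not itemset <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next itemset must match a later order
    return True


def support(pattern):
    """Number of customers whose sequence contains the pattern."""
    return sum(contains(sequence, pattern) for sequence in data)


print(support([{"a"}, {"a"}]))  # <{a}{a}>: a, then a in a later order -> 5
print(support([{"a"}, {"b"}]))  # <{a}{b}>: a, then b in a later order -> 3
print(support([{"a", "b"}]))    # <{a,b}>: a and b in the same order   -> 0
```

With one item per order, a pattern like <{a,b}> can never be supported, which is why the output above does not seem to fit the table as posted.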

I hope I could help ;)

Greets, Sebastian






Thank you. I did not understand the item / itemset distinction. This clarified the output.