SOLVED/ beginner level question/ how to interpret Weka generalized sequential patterns output?

Hi!

My example data (product purchase sequence):

 

  I II III IV
customer1 a a b e d
customer2 a b e a d
customer3 b e a a d
customer4 c a b a a
customer5 c d c a a

 

And Weka GSP output:

 

GeneralizedSequentialPatterns
=============================
 
Number of cycles performed: 3
Total number of frequent sequences: 9
 
Frequent Sequences Details (filtered):
 
- 1-sequences
 
[1] <{a}> (2)
[2] <{b}> (2)
[3] <{a}> (3)
[4] <{d}> (2)
 
- 2-sequences
 
[1] <{a}{a}> (2)
[2] <{a,b}> (2)
[3] <{b}{a}> (2)
[4] <{a,d}> (2)
 
- 3-sequences
 
[1] <{a,b}{a}> (2)
 
Question:
 
Does "<{a,b}{a}> (2)" mean that the sequence {a,b}{a} appears two times in the data?
- If so, why is the sequence {a,a} not shown? It appears four times in the data.
 
Thank you!
 
Markus

Hello Markus,

I'm not completely sure about this, as I have never used the GSP implementation from Weka.

But to answer your question: {a,a} is shown in the result. The result is grouped by the size of the pattern. {a,a} has size 2, so it appears in the section "2-sequences" ;). And GSP found only one frequent 3-sequence, namely <{a,b}{a}>.

Another question: is the input table complete and correct? Usually {a,b} denotes an item set, in your case one order. But if I look at your example, every order contains only one item... So the result you got does not actually fit the table.
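
To make the item / item set distinction concrete, here is a minimal Python sketch (not Weka code). It assumes that every purchase in the table is its own one-item order, and that support is counted as the number of customers whose sequence contains the pattern, which is the usual GSP definition; whether Weka counts exactly this way is not confirmed here. The helper names contains and support are made up for illustration.

```python
def contains(sequence, pattern):
    """Return True if every item set of `pattern` occurs, in order,
    as a subset of some order in `sequence`."""
    pos = 0
    for element in pattern:
        # Skip orders until one contains all items of this pattern element.
        while pos < len(sequence) and not element <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next pattern element must match a later order
    return True


def support(database, pattern):
    """Number of customer sequences that contain `pattern`."""
    return sum(contains(seq, pattern) for seq in database)


# Assumption: each letter in the example table is a separate one-item order.
rows = ["aabed", "abead", "beaad", "cabaa", "cdcaa"]
database = [[frozenset(item) for item in row] for row in rows]

print(support(database, [{"a"}, {"a"}]))       # 5: "a" is bought twice by every customer
print(support(database, [{"a", "b"}, {"a"}]))  # 0: no single order holds both a and b
```

Under this reading of the table, <{a,b}{a}> can never be frequent, because {a,b} requires a and b to be in the same order. That is why the output only makes sense if some orders in the real input contain more than one item.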

I hope I could help ;)

Greets, Sebastian


Thank you. I did not understand the item / item set distinction. This clarified the output.

Markus