Feature selection and data correlation problem

Hi all, I am really new to big data and would really appreciate some suggestions from the community.

I am trying to analyse the service requests (SR) made by customers and predict when the next request will be made.

My plan is to compute the time difference between consecutive SRs, take it as the dependent variable (Y), and figure out which independent variables (X) affect Y.
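To make the time-difference step concrete, here is a minimal Python sketch, assuming the gap is measured in whole calendar days between consecutive SRs of the same ID. The function name `time_diffs` and the row layout are only illustrative; the rows themselves are copied from table 1.

```python
from collections import defaultdict
from datetime import datetime

# (ID, SR type, SR time) rows copied from table 1.
rows = [
    (1, "Bill", "22/6/2014"),
    (1, "fault", "24/6/2014"),
    (2, "Bill", "10/7/2014"),
    (2, "fault", "15/7/2014"),
    (2, "enquiries", "22/7/2014"),
    (2, "feedback", "25/7/2014"),
]


def time_diffs(rows):
    """Return (ID, SR type, days since the previous SR of the same ID).

    The first SR of each ID has no predecessor, so its gap is None
    (shown as "?" in table 1).
    """
    by_id = defaultdict(list)
    for cid, sr_type, sr_time in rows:
        when = datetime.strptime(sr_time, "%d/%m/%Y")
        by_id[cid].append((when, sr_type))

    result = []
    for cid, events in by_id.items():
        events.sort()  # chronological order within one customer
        prev = None
        for when, sr_type in events:
            gap = None if prev is None else (when - prev).days
            result.append((cid, sr_type, gap))
            prev = when
    return result


for row in time_diffs(rows):
    print(row)
```

Plain date subtraction gives 5 days between 10/7/2014 and 15/7/2014, while table 1 lists 6 for that row, so the sketch does not reproduce whatever counting rule produced that value.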

I am trying to use linear correlation to identify the independent variables, and after that use linear regression and data mining techniques to predict the next SR.
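As a rough illustration of the correlation and regression steps, the sketch below computes the Pearson correlation and a one-variable least-squares line in plain Python. It assumes Y is `time_dif` and takes `pay_due` as a single candidate X, using only the four rows of table 1 that have a time difference. The helper names `pearson` and `fit_line` are made up for this example, and four rows are far too few to draw any conclusion.

```python
from math import sqrt

# Rows of table 1 that have a time_dif value (the "?" rows are skipped).
pay_due = [200.00, 152.00, 152.00, 152.00]   # candidate X
time_dif = [2, 6, 7, 3]                      # Y, in days


def _centered_sums(x, y):
    """Return the sums of squares and cross-products around the means."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return mx, my, sxx, syy, sxy


def pearson(x, y):
    """Pearson linear correlation coefficient between x and y."""
    _, _, sxx, syy, sxy = _centered_sums(x, y)
    return sxy / sqrt(sxx * syy)


def fit_line(x, y):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    mx, my, sxx, _, sxy = _centered_sums(x, y)
    slope = sxy / sxx
    return slope, my - slope * mx


r = pearson(pay_due, time_dif)
slope, intercept = fit_line(pay_due, time_dif)
print(f"r = {r:.3f}, time_dif = {slope:.4f} * pay_due + {intercept:.2f}")
```

Categorical columns such as SR type or area cannot go into a Pearson correlation directly; they would first need to be turned into 0/1 indicator columns.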

The problem is that the ID is not unique, since each customer can have several SRs. I tried to build a new table (table 2) with one row per ID, containing the average time difference (the sum of all time differences divided by the number of SRs). The SR type columns I computed using the One to Many node in KNIME.
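The aggregation described above can be sketched in Python as follows. This assumes the same rule as in the post (sum of time differences divided by the number of SRs, with the missing first difference counted as 0) and builds one 0/1 indicator per SR type, similar in spirit to what the One to Many node produces. The function name `aggregate` and the dictionary keys are illustrative only.

```python
# (ID, SR type, time_dif, area, pay_due) rows copied from table 1;
# None stands for the "?" entries.
rows = [
    (1, "Bill", None, "Oga", 200.00),
    (1, "fault", 2, "Oga", 200.00),
    (2, "Bill", None, "Seto", 152.00),
    (2, "fault", 6, "Seto", 152.00),
    (2, "enquiries", 7, "Seto", 152.00),
    (2, "feedback", 3, "Seto", 152.00),
]

SR_TYPES = ["Bill", "fault", "enquiries", "feedback"]


def aggregate(rows):
    """Collapse table 1 into one record per ID (the layout of table 2)."""
    table2 = {}
    for cid, sr_type, time_dif, area, pay_due in rows:
        rec = table2.setdefault(cid, {
            "no_sr": 0,
            "total_dif": 0,
            "area": area,
            "pay_due": pay_due,
            **{t: 0 for t in SR_TYPES},   # one indicator per SR type
        })
        rec["no_sr"] += 1
        rec["total_dif"] += time_dif or 0  # missing gap counts as 0
        rec[sr_type] = 1                   # this customer made this SR type
    for rec in table2.values():
        rec["average"] = rec["total_dif"] / rec["no_sr"]
    return table2


for cid, rec in aggregate(rows).items():
    print(cid, rec)
```

Dividing by the number of gaps (the number of SRs minus 1) instead of the number of SRs would be an alternative definition of the average; the sketch keeps the rule stated in the post so that it matches table 2.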

Can anyone give me suggestions on this workflow? How do I find the independent variables?

Currently I have a table that looks like this (table 1):

 

ID   No SR   SR type     SR time     time_dif   area   pay_due
1    2       Bill        22/6/2014   ?          Oga    200.00
1    2       fault       24/6/2014   2          Oga    200.00
2    4       Bill        10/7/2014   ?          Seto   152.00
2    4       fault       15/7/2014   6          Seto   152.00
2    4       enquiries   22/7/2014   7          Seto   152.00
2    4       feedback    25/7/2014   3          Seto   152.00

Table 1

ID   No SR   avg_time_dif   area   Bill   fault   enquiries   feedback   pay_due
1    2       1              Oga    1      1       0           0          200.00
2    4       4              Seto   1      1       1           1          152.00

Table 2