I have a dataset of transactions from a single cinema. I would like to build a decision tree learner to predict which seats are attractive to customers. To do that, I would like to create a separate entry for each row and seat that has been ordered.
The problem is that in most transactions the customer bought more than one seat, so the table looks like this:
11/12-18 00:00:00 Row 1, Seat 1, 2
11/12-18 00:00:00 Row 2, Seat, 2, 4
I would like to create a table that looks like this, so I can see how many times each seat has been occupied:
11/12-18 00:00:00 Row 1, seat 1
11/12-18 00:00:00 Row 1, seat 2
11/12-18 00:00:00 Row 2, seat 2
11/12-18 00:00:00 Row 2, seat 4
Here is a very clunky workflow that produces the requested results. It basically does a bunch of text manipulation steps so that an Unpivot node can be brought to bear on the data. If you are good with RegEx - unlike me! - you could do this in a much more straightforward way. There are a few reasons this is so clunky:
- I assumed the input rows are a single string that must be processed
- the delimiters in such a string are inconsistent (mix of spaces and commas)
- the number of seats can vary
- I avoided use of lists and collections
All of the above add certain complications. I also assumed that each transaction only contains seats from a single row, which is probably not always correct. Still, bearing all that in mind… it works.
UnpivotingExample.knwf (26.3 KB)
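For anyone who would rather try the RegEx route mentioned above, here is a rough sketch of the same idea in Python. It is not what the attached workflow does internally, just one possible way to do it. It assumes each input line is a single string shaped like the examples in the question (date, time, "Row n", then "Seat" followed by seat numbers with messy delimiters), and it keeps the same assumption that one line only covers one row. The names `LINE_RE`, `explode_seats`, `records` and `seat_counts` are made up for this example.

```python
import re
from collections import Counter

# Assumed line shape: "<date> <time> Row <n>, Seat <a>, <b>, ..."
# Everything after the word "Seat" is captured loosely because the
# delimiters there are inconsistent (spaces and commas).
LINE_RE = re.compile(
    r"^(?P<stamp>\S+\s+\S+)\s+Row\s*(?P<row>\d+)\s*,\s*Seat\b(?P<seats>.*)$",
    re.IGNORECASE,
)


def explode_seats(line):
    """Return one (timestamp, row, seat) tuple per seat in a transaction line."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised line: {line!r}")
    # Pull out every number after "Seat", whatever separates them.
    seats = re.findall(r"\d+", match.group("seats"))
    return [(match.group("stamp"), int(match.group("row")), int(s)) for s in seats]


# The two sample lines from the question, copied as posted.
lines = [
    "11/12-18 00:00:00 Row 1, Seat 1, 2",
    "11/12-18 00:00:00 Row 2, Seat, 2, 4",
]

# One record per seat, in the order the seats appear.
records = [rec for line in lines for rec in explode_seats(line)]
for stamp, row, seat in records:
    print(f"{stamp} Row {row}, seat {seat}")

# How many times each (row, seat) pair has been occupied.
seat_counts = Counter((row, seat) for _, row, seat in records)
```

The `seat_counts` counter is keyed by `(row, seat)`, so `seat_counts[(1, 2)]` gives the number of times row 1, seat 2 was sold. If a transaction can span several rows, the pattern would need to be extended.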
I also looked at RegEx briefly, but like you I could not figure out how to use it.
But clunky or not, the workflow that you created did EXACTLY what I was looking for!
Thank you very much for your time and help with this. Now I can finish what I started.
I hope you have a very nice day.
Great! Glad it helped you.