Time Series Data

Hello,

I would like to analyse some time series data.

My timestamps have the format dd.MM.yyyy;HH.mm.ss.S, so each one consists of a date and a time. Each timestamp belongs to a string value, either "work" or "waiting". I would like to display the data in the following form:

The time on the x-axis, and on the y-axis the percentage of work packages (so-called "workflows") that were in work at a given time. The problem is that each string value belongs to a discrete point in time.

I think for this task I need to "cluster" the time into reasonable periods and display the percentage of "workflows" that were in work relative to the number of existing work packages.

So my first question is how to figure out which time range I should use, i.e. which periods are most reasonable for my data.

Can anyone provide a solution or give me a hint?

Regards

Hi dinoso,

can you post a small example data set together with your expected outcome?

Best, Iris

Hi dinoso,

it sounds like you're trying to make something like a histogram of your data. If that is correct, there's good news and bad news.

Bad news first: The "perfect" size of a bucket can be highly dependent on the data. There's a lot of research going on about that, and a lot of confusing results to wade through.

The good news: Most of the time, taking the square root of the number of data points as the number of buckets is a good bet. But that's only for actual histograms, so I'm not sure how applicable that is to your use case.

In any case, you could try to experiment with the interactive histogram view and its bin count, maybe it'll show you something about your data that can help you find an answer.
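
For illustration, here is a minimal sketch outside of KNIME, in plain Python. It assumes the common reading of the square-root rule, where the number of bins is the square root of the number of data points, rounded up. The function name `sqrt_bin_edges` and the 40 daily timestamps are made up for the example.

```python
import math
from datetime import datetime, timedelta

def sqrt_bin_edges(timestamps):
    """Split the span of the timestamps into ceil(sqrt(n)) equal-width bins."""
    n = len(timestamps)
    k = max(1, math.ceil(math.sqrt(n)))   # number of bins
    start, end = min(timestamps), max(timestamps)
    width = (end - start) / k             # a timedelta
    return [start + i * width for i in range(k + 1)]

# Hypothetical data: 40 timestamps, one per day.
stamps = [datetime(2014, 1, 1) + timedelta(days=d) for d in range(40)]
edges = sqrt_bin_edges(stamps)
print(len(edges) - 1, "bins")
```

Note that equal-width bins like these can end up nearly empty when the data is clustered in time, so this is a starting point for experiments rather than an answer.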

Hi,

@ Iris:

My data looks like this:

FlowID Date/Time Status
4416784 2014-01-04T12:18:03.0 Work
4416785 2014-01-06T14:30:42.0 Waiting
4416785 2014-01-06T14:31:07.0 Work
4417433 2014-01-08T12:11:37.0 Waiting
4423203 2014-01-09T16:10:50.0 Waiting
4423203 2014-01-10T10:54:03.0 Work
4418777 2014-01-13T09:39:00.0 Work
4417495 2014-01-13T10:39:09.0 Waiting
4417491 2014-01-13T10:41:23.0 Waiting
4428256 2014-01-13T15:02:17.0 Work
4428256 2014-01-13T15:04:52.0 Waiting
4428321 2014-01-13T15:49:11.0 Work
4428329 2014-01-13T15:57:27.0 Work
4435811 2014-01-16T13:36:15.0 Work
4434430 2014-01-16T15:42:15.0 Work
4417486 2014-01-16T15:45:29.0 Work
5729288 2014-07-23T09:54:48.0 Work
5729968 2014-07-23T10:39:52.0 Work
4708755 2014-07-23T11:34:30.0 Work
5748266 2014-07-23T14:14:46.0 Work
5729974 2014-07-23T14:47:38.0 Work
4734484 2014-07-23T14:53:53.0 Work
5748278 2014-07-23T15:01:57.0 Work
4705468 2014-07-23T15:09:12.0 Work
4722731 2014-07-23T15:28:35.0 Waiting
5751852 2014-07-24T07:55:32.0 Work
5751963 2014-07-24T10:08:17.0 Work
4476872 2014-07-24T10:32:14.0 Waiting
4682440 2014-07-24T11:30:35.0 Waiting
5857535 2014-12-19T11:32:52.0 Work
6224752 2014-12-19T12:10:11.0 Work
4524250 2014-12-19T13:00:34.0 Waiting
5921008 2014-12-19T13:00:48.0 Waiting
5995204 2014-12-19T13:24:34.0 Waiting
6224971 2014-12-19T14:00:51.0 Work
5977692 2014-12-19T14:40:04.0 Waiting
5950606 2014-12-19T14:40:32.0 Waiting
4441488 2014-12-19T14:46:15.0 Waiting
6159885 2015-01-05T08:20:41.0 Waiting
4805659 2015-01-05T11:35:51.0 Work

Now I want to count the number of occurrences of the word "waiting" and put it in relation to the total number of rows. In this case that would be 16/40 for the whole period from January 2014 to January 2015.

As Marlin suggested, I could use the square root of the total number of rows. In the case above this would be the square root of 40, which is about 6.32. So I could use approximately 6 dates as one period and, for each period, calculate the ratio of waiting rows to the total number of rows. But I don't know how to actually do this:

1. I would like to make the analysis as generic as possible, so that I can use it for larger or smaller data sets of a similar kind.

2. I don't know how to group 6 of the dates into one period, and there is another problem. If I do as Marlin suggested, there is a risk that some periods are much longer than others. For example, one bin could cover dates from January to March, while another could range from March to December. My data is not regular, because it reflects work tasks that employees are working on. These work packages are called "flows" and are identified in the first column of the table above. "Work" means that an employee is working on a task but does not complete it, because a different employee also has to work on it. The work package then pauses in the state "waiting" until the next employee has finished their other work and picks up the waiting flow. The flow then changes back to "work" until the task is solved and the work package is entirely done.

3. So my task is to show the percentage of flows waiting, measured against the total number of flows (work packages) during a certain period. (Conversely, this also shows how many flows were in work during the same period.)

As I'm not that experienced with KNIME and still learning, it would be very kind if one of you two could help me with this :)

Regards
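
One way to avoid periods of very different length is to use fixed calendar periods instead of a fixed number of rows per period. The following plain-Python sketch (not a KNIME workflow) assumes calendar months as the period and uses a few rows from the table above. The function name `waiting_share_per_month` is made up for the example.

```python
from collections import defaultdict
from datetime import datetime

# A few rows taken from the table above: (FlowID, Date/Time, Status).
rows = [
    ("4416784", "2014-01-04T12:18:03.0", "Work"),
    ("4416785", "2014-01-06T14:30:42.0", "Waiting"),
    ("4416785", "2014-01-06T14:31:07.0", "Work"),
    ("4417433", "2014-01-08T12:11:37.0", "Waiting"),
    ("5729288", "2014-07-23T09:54:48.0", "Work"),
    ("4722731", "2014-07-23T15:28:35.0", "Waiting"),
    ("5751852", "2014-07-24T07:55:32.0", "Work"),
    ("5751963", "2014-07-24T10:08:17.0", "Work"),
]

def waiting_share_per_month(rows):
    """Return {"YYYY-MM": share of rows with status Waiting} per calendar month."""
    counts = defaultdict(lambda: [0, 0])  # month -> [waiting rows, all rows]
    for _flow_id, stamp, status in rows:
        ts = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%f")
        month = ts.strftime("%Y-%m")
        counts[month][1] += 1
        if status == "Waiting":
            counts[month][0] += 1
    return {m: w / t for m, (w, t) in sorted(counts.items())}

shares = waiting_share_per_month(rows)
print(shares)
```

Months without any rows simply do not appear in the result, and the choice of month versus week or day is an assumption that has to be checked against the real data.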

Hi,

I've posted a message with data, but it was queued before being posted here. Does it usually take this long until the webmaster releases it to the forum?

Regards

The delay is due to spam filtering. Send a message to the webmaster if you experience the problem again.

My data looks like this:

FlowID Date/Time Status
4416781 2014-02-24T09:17:05.0 Waiting
4416781 2014-02-24T09:17:17.0 Work
4416781 2014-09-17T10:49:03.0 Waiting
4416782 2014-06-02T12:30:03.0 Work
4416784 2014-01-04T12:13:22.0 Waiting
4416784 2014-01-04T12:18:03.0 Work
4416785 2014-01-06T14:30:42.0 Waiting
4416785 2014-01-06T14:31:07.0 Work
4416786 2014-01-23T16:07:47.0 Work
4417433 2014-01-08T12:11:37.0 Waiting
4417435 2014-03-18T08:28:06.0 Work
4417482 2014-01-21T11:35:54.0 Waiting
4417482 2014-09-02T09:32:16.0 Work
4417483 2014-01-21T11:14:13.0 Waiting
4417483 2014-05-27T11:05:23.0 Work
4417484 2014-01-21T09:59:13.0 Work
4417485 2014-01-20T15:45:58.0 Waiting
4417485 2014-01-28T10:13:39.0 Work
4417486 2014-01-16T15:45:29.0 Work
4417487 2014-01-31T15:40:32.0 Work
4417491 2014-01-13T10:41:23.0 Waiting
4417491 2014-02-04T16:25:25.0 Waiting
4417495 2014-01-13T10:39:09.0 Waiting
4417495 2014-01-24T08:12:38.0 Waiting
4417495 2014-02-07T11:41:41.0 Waiting
4417495 2014-02-19T12:08:56.0 Work
4417499 2014-01-16T17:22:37.0 Work
4417501 2014-05-06T10:05:08.0 Waiting
4417501 2014-05-22T11:15:58.0 Work
4417502 2014-02-26T11:39:50.0 Waiting
4417502 2014-03-03T08:50:09.0 Work
4417503 2014-02-27T14:33:00.0 Work
4417503 2014-03-25T14:26:47.0 Waiting
4417503 2014-04-17T08:58:05.0 Waiting
4417503 2014-06-05T14:58:47.0 Work
4417503 2014-06-17T15:06:08.0 Work
4417504 2014-02-07T11:40:51.0 Work
4417504 2014-02-17T13:10:24.0 Waiting
4417506 2014-01-17T10:06:39.0 Waiting

The data describes workflow information from the working processes of a company. The number in the column "FlowID" represents a certain project. A database records whether an employee is currently working on the project or whether the project is paused. The paused state is called "waiting" in the example data above.

I would like to get something like a chart which shows, for each point in time, the percentage of projects that are in work. I don't know how to approach this problem, because I'm still a beginner with KNIME. Maybe one of you can help me out or give me a hint on this issue.

Regards
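
To make the goal concrete, here is a minimal plain-Python sketch (not a KNIME workflow) of one possible interpretation. It assumes that a flow keeps its last recorded status until its next record, and that a flow counts towards the total from its first record onwards. The data contains no "done" status, so in this sketch flows never leave the total. The name `work_share_over_time` is made up for the example.

```python
from datetime import datetime

FMT = "%Y-%m-%dT%H:%M:%S.%f"

# A few rows taken from the table above: (FlowID, Date/Time, Status).
rows = [
    ("4416784", "2014-01-04T12:13:22.0", "Waiting"),
    ("4416784", "2014-01-04T12:18:03.0", "Work"),
    ("4416785", "2014-01-06T14:30:42.0", "Waiting"),
    ("4416785", "2014-01-06T14:31:07.0", "Work"),
    ("4417433", "2014-01-08T12:11:37.0", "Waiting"),
]

def work_share_over_time(rows):
    """For every record, in time order, return (timestamp, share of known flows in Work)."""
    events = sorted(rows, key=lambda r: datetime.strptime(r[1], FMT))
    latest = {}   # FlowID -> most recent status seen so far (assumed to persist)
    series = []
    for flow_id, stamp, status in events:
        latest[flow_id] = status
        in_work = sum(1 for s in latest.values() if s == "Work")
        series.append((stamp, in_work / len(latest)))
    return series

for stamp, share in work_share_over_time(rows):
    print(stamp, f"{share:.0%}")
```

The resulting series has one value per record, so it can be plotted directly with the time on the x-axis and the share on the y-axis.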

There are probably a few ways to do this.

One way is to use a One2Many node on the Status column.

Then use a GroupBy node and choose to aggregate on the two new columns, using Percent or Count as your aggregation method.

You could then do a bar chart on this.

You may even be able to skip the GroupBy step, chart the two new columns directly, and do the aggregation in the chart itself, for example by count.

simon.
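
The idea behind these two steps can be sketched in plain Python. This only mimics the concept (one 0/1 indicator column per distinct status, then an aggregation over those columns) and is not how the KNIME nodes are implemented. The function names `one_to_many` and `column_shares` are made up, and the statuses are the first five rows of the first table.

```python
# Status column of the first five rows of the first table.
statuses = ["Work", "Waiting", "Work", "Waiting", "Waiting"]

def one_to_many(values):
    """Turn one categorical column into one 0/1 indicator column per category."""
    categories = sorted(set(values))
    return [{c: int(v == c) for c in categories} for v in values]

def column_shares(indicator_rows):
    """Aggregate each indicator column to its share of all rows."""
    n = len(indicator_rows)
    return {c: sum(r[c] for r in indicator_rows) / n for c in indicator_rows[0]}

indicators = one_to_many(statuses)
shares = column_shares(indicators)
print(shares)
```

Without a grouping column this yields a single share for the whole table. Grouping by a period derived from the timestamp would give one share per period.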

Hi Simon,

thank you for your support. But I would like to display a chart which shows me the percentage at each point in time. The GroupBy node aggregates the percentage over the whole data table.

I would like to see the aggregation at every time point in the data, starting with the earliest time.

I need something that iterates the aggregation as many times as my data has rows.

I think I have to do something with the loop nodes in KNIME, but I have no idea how, because I've never done it before. Are there webinars which show how to use the loop nodes, or can you suggest a good example on the KNIME server which I can study to learn from?

Regards

Hi Dinoso,

there are loop examples on the example server under 011_FlowVarsAndLoops. But it's actually pretty easy. I was intimidated at first, too, but now I use them all the time. The most important parts are these:

  • A loop consists of a start node, an end node, and stuff in between. Feel free to combine them (mostly).
  • The stuff in between can be connected to nodes outside of the loop, but only if they take input from there; the only way "out" is through the single loop end.
  • Many more complicated tasks can only be achieved with flow variables, so try to learn both things (maybe separately).
  • There are more specialised loop nodes, like the parameter estimator or the recursive loops, that only work in pairs, not together with the other ones. Maybe ignore those at first. The description is helpful here. Once you feel comfortable with the basics, maybe experiment with the recursive ones, because they are the Mjolnir of KNIME (heavy and slow, but oh so useful if KNIME resists your other efforts).
  • Iterating over rows is simplest with a Chunk Loop Start node set to a chunk size of one.
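
As a rough analogy in plain Python (not KNIME itself), row-wise iteration with a chunk size of one could look like the sketch below. The generator `chunk_loop` stands in for the loop start, the list `collected` for the loop end, and the statuses are the first five rows of the first table. All names are made up. In KNIME, carrying a running count from one iteration to the next would need flow variables or a recursive loop, which this sketch glosses over.

```python
def chunk_loop(rows, chunk_size=1):
    """Yield consecutive chunks of rows, like a loop start handing out slices."""
    for i in range(0, len(rows), chunk_size):
        yield rows[i:i + chunk_size]

# Status column of the first five rows of the first table, in time order.
statuses = ["Work", "Waiting", "Work", "Waiting", "Waiting"]

collected = []       # plays the role of the loop end collecting one result per iteration
seen = waiting = 0   # running counts carried across iterations
for chunk in chunk_loop(statuses, chunk_size=1):
    for status in chunk:
        seen += 1
        waiting += status == "Waiting"
    collected.append(waiting / seen)  # share of Waiting rows seen so far

print(collected)
```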

Hi Simon,

Maybe you could provide a little example with the data I posted?

Regards