My time stamps look like this: dd.MM.yyyy;HH.mm.ss.S, so each one consists of a date and a time. Each time stamp belongs to a string value, either "work" or "waiting". I would like to display the data in the following form:
The time on the x-axis and, on the y-axis, the percentage of work packages (so-called "workflows") that were in work at that time. The problem is that each string value belongs to a discrete point in time.
I think for this task I need to "cluster" the time into reasonable periods and display, for each period, the percentage of "workflows" that were in work relative to the number of existing work packages.
So my first question is how to figure out which time periods are most reasonable for my data.
It sounds like you're trying to make something like a histogram of your data. If that is correct, there's good news and bad news.
Bad news first: the "perfect" size of a bucket can be highly dependent on the data. There's a lot of research going on about that, and a lot of confusing results to wade through.
The good news: most of the time, taking the square root of the number of data points as the number of buckets is a good bet. But that's only for actual histograms, so I'm not sure how applicable it is to your use case.
In any case, you could experiment with the interactive histogram view and its bin count. Maybe it'll show you something about your data that helps you find an answer.
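For anyone who wants to try the square-root rule outside KNIME first, here is a minimal Python sketch. It assumes the time stamp format from the first post, rounds the square root up (one common convention; rounding down is just as defensible), and uses equal-width time bins. The sample stamps and the helper name `sqrt_time_bins` are made up for illustration.

```python
import math
from datetime import datetime

# Python equivalent of the pattern dd.MM.yyyy;HH.mm.ss.S (assumption: %f for the fraction)
FMT = "%d.%m.%Y;%H.%M.%S.%f"

def sqrt_time_bins(stamps):
    """Return (bin_count, edges) for equal-width bins over the observed time range."""
    times = sorted(datetime.strptime(s, FMT) for s in stamps)
    # Square-root rule: number of bins is about sqrt(number of rows), rounded up here
    k = max(1, math.ceil(math.sqrt(len(times))))
    start, end = times[0], times[-1]
    width = (end - start) / k
    # k + 1 edges; the last edge is pinned to the latest time stamp
    edges = [start + i * width for i in range(k)] + [end]
    return k, edges

# Made-up sample data, not taken from the thread
stamps = [
    "03.01.2014;08.15.00.0",
    "10.01.2014;09.00.00.0",
    "21.01.2014;14.30.00.0",
    "05.02.2014;10.00.00.0",
    "17.02.2014;11.45.00.0",
    "02.03.2014;16.20.00.0",
]

k, edges = sqrt_time_bins(stamps)
for left, right in zip(edges, edges[1:]):
    print(left, "to", right)
```

Equal-width bins keep every period the same length, but with irregular data some bins can end up nearly empty, so the bin count is worth varying by hand.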
Now I want to count the number of appearances of the word "waiting" and put it in relation to the total number of rows. In this case that would be 16/40 for the whole period from January 2014 to January 2015.
As Marlin suggested, I could use the square root of the total number of rows. In the case above that is the square root of 40, which is about 6.32. So I could use approximately 6 dates as one period and compute, for each period, the ratio of "waiting" rows to the total number of rows. But I don't know how to actually do this:
1. I would like to make the analysis as generic as possible, so I can use it for larger or smaller data sets of a similar kind.
2. I don't know how to put 6 of the dates into one period, and there is another problem. If I do as Marlin suggested, there is a risk that some periods are much longer than others. For example, one bin could include dates from January until March while another ranges from March to December. My data is not regular because it reflects work tasks that employees are working on. These work packages are called "Flows" in the first column of the table above. "work" means that an employee is working on a certain task but does not complete it, because a different employee also has to work on it. The work package then pauses in the state "waiting" until the next employee has finished all his other work and starts on the waiting flow. The flow then changes back to "work" until the task is solved and the work package is entirely done.
3. So my task is now to show the percentage of flows waiting, measured against the total number of flows (work packages) during a certain period. (Conversely, this also shows how many flows were in work during the same period.)
As I'm not that experienced in KNIME and still learning, it would be very kind if one of you two could give me some help on this :)
The data describes workflow information from the working processes of a company. The number in the column "FlowID" represents a certain project. A database records whether an employee is currently working on the project or whether the project is pausing. The paused condition is called "waiting" in the example data shown at the top.
I would like to get something like a chart that shows the percentage of projects in work at each point in time over the whole period. I don't know how to manage this, mostly because I'm still a beginner at KNIME. Maybe one of you can help me out or give me a hint on this issue.
One way is to use a One2Many node on the Status column.
Then use a GroupBy node and choose to aggregate on the two new columns, using Percent or Count as the aggregation method.
You could then do a bar chart on this.
You may even be able to skip the GroupBy step, chart the two new columns directly, and do the aggregation in the chart itself, for example by count.
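To make the idea concrete, the same two steps can be sketched in plain Python: split the status into one counter per value (roughly what One2Many produces) and then aggregate per group. This is only an illustration, not how the nodes work internally. Grouping by calendar month is an assumption made here so that the result is one percentage per period rather than one for the whole table; the sample rows and the function name `waiting_share_per_month` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Python equivalent of the pattern dd.MM.yyyy;HH.mm.ss.S (assumption: %f for the fraction)
FMT = "%d.%m.%Y;%H.%M.%S.%f"

# Made-up rows: (FlowID, time stamp, status)
rows = [
    (1, "03.01.2014;08.15.00.0", "work"),
    (2, "10.01.2014;09.00.00.0", "waiting"),
    (1, "21.01.2014;14.30.00.0", "waiting"),
    (3, "05.02.2014;10.00.00.0", "work"),
    (2, "17.02.2014;11.45.00.0", "work"),
    (3, "02.03.2014;16.20.00.0", "waiting"),
]

def waiting_share_per_month(rows):
    """Percentage of 'waiting' rows per (year, month) group."""
    # One counter per status value, similar to the columns One2Many creates
    counts = defaultdict(lambda: {"work": 0, "waiting": 0})
    for _flow_id, stamp, status in rows:
        t = datetime.strptime(stamp, FMT)
        counts[(t.year, t.month)][status] += 1  # any other status raises KeyError
    result = {}
    for period in sorted(counts):
        c = counts[period]
        total = c["work"] + c["waiting"]
        result[period] = 100.0 * c["waiting"] / total
    return result

shares = waiting_share_per_month(rows)
for period, pct in shares.items():
    print(period, round(pct, 1))
```

The grouping key is the only part that decides the period length, so swapping the month for a week number or a computed bin index changes the resolution without touching the rest.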
Thank you for your support. But I would like to display a chart that shows me the percentage at each point in time. The GroupBy node aggregates the percentage over the whole data table.
I would like to see the aggregation at every time point in the data, starting with the earliest time.
I need something that repeats the aggregation as many times as my data has rows.
I think I have to do something with the loop nodes in KNIME, but I have no idea how because I've never used them before. Are there webinars that show how to use loops, or can you suggest a good example on the KNIME server that I can study?
There are loop examples on the example server under 011_FlowVarsAndLoops. But it's actually pretty easy. I was intimidated at first, too, but now I use them all the time. The most important parts are these:
A loop consists of a start node, an end node, and stuff in between. Feel free to combine the different start and end nodes (mostly).
The stuff in between can be connected to nodes outside of the loop, but only to take input from them; the only way "out" is through the single loop end.
Many more complicated tasks can only be achieved with flow variables, so try to learn both things (maybe separately).
There are more specialised loop nodes, like the parameter estimator or the recursive loops, that only work in pairs and not together with the other ones. Maybe ignore those at first. The node description is helpful here. Once you feel comfortable with the basics, maybe experiment with the recursive ones, because they are the Mjolnir of KNIME (heavy and slow, but oh so useful if KNIME resists your other efforts).
Iterating over rows is simplest with a Chunk Loop Start node set to a chunk size of one.
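If it helps to see the logic before building the loop, here is a rough Python sketch of what a row-by-row iteration could compute. It rests on one interpretation of the question: at each time stamp, every flow counts with its most recent status, and the percentage is taken over the flows seen so far. Finished flows are not removed, since the data as described has no "done" status. The sample rows and the function name `percent_in_work_over_time` are invented for this example.

```python
from datetime import datetime

# Python equivalent of the pattern dd.MM.yyyy;HH.mm.ss.S (assumption: %f for the fraction)
FMT = "%d.%m.%Y;%H.%M.%S.%f"

# Made-up rows: (FlowID, time stamp, status)
rows = [
    (1, "03.01.2014;08.15.00.0", "work"),
    (2, "10.01.2014;09.00.00.0", "waiting"),
    (1, "21.01.2014;14.30.00.0", "waiting"),
    (3, "05.02.2014;10.00.00.0", "work"),
    (2, "17.02.2014;11.45.00.0", "work"),
    (3, "02.03.2014;16.20.00.0", "waiting"),
]

def percent_in_work_over_time(rows):
    """One (time stamp, percent of known flows in 'work') pair per input row."""
    ordered = sorted(rows, key=lambda r: datetime.strptime(r[1], FMT))
    latest = {}   # FlowID -> most recent status seen so far
    series = []
    for flow_id, stamp, status in ordered:   # one iteration per row
        latest[flow_id] = status
        in_work = sum(1 for s in latest.values() if s == "work")
        series.append((stamp, 100.0 * in_work / len(latest)))
    return series

series = percent_in_work_over_time(rows)
for stamp, pct in series:
    print(stamp, round(pct, 1))
```

The resulting pairs can be plotted directly as a line chart with time on the x-axis. The dictionary of latest statuses is the state that has to be carried from one iteration to the next, which is where flow variables or a recursive loop would come in on the KNIME side.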