Documentation and features

Hi,
I work as a market analyst and am looking for a statistics program for myself and my colleagues. Until now they have used SPSS. As far as I can see, KNIME has a lot of functions and a very good design, so we can probably use it. But I ran into some problems when I began working with KNIME; maybe some of them are "childish":
I couldn't find anything like a cross table or a correlation table. Are they really absent?
When I use a histogram or pie chart, can I change the font or position of the labels in any way?
When I use a pie chart, can I use, for example, the sum of another field instead of the count?
And finally, I can't find a full user guide.
Can you help me with these questions?
Thank you very much!

Hi Max,

With respect to the correlation table: We plan to release an add-on package very soon (soon = a few weeks) that also contains a node to compute the correlation table. This node will have an accompanying view that shows the correlation measures by means of color grades, so it should be easy to visually identify correlated variables (columns).
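
For readers who have not met the term, here is a minimal Python sketch of what a correlation table contains, independent of KNIME. It assumes the Pearson (linear) coefficient; the function names `pearson` and `correlation_table` and the sample columns are invented for illustration and say nothing about how the planned node is implemented.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Note: a constant column has zero variance and would divide by zero here.
    return cov / math.sqrt(var_x * var_y)


def correlation_table(columns):
    """Return a nested dict: table[a][b] is the correlation of columns a and b."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical sample data, made up for this example
columns = {"x": [1, 2, 3, 4], "y": [2, 4, 6, 8], "z": [4, 3, 2, 1]}
table = correlation_table(columns)
print(table["x"]["y"], table["x"]["z"])
```

Values close to 1 or -1 mark strongly correlated columns, which is what a color-graded view would highlight.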

With respect to "crosstable": I'm not sure what a crosstable is. A Google search yields http://www.spsslog.com/2006/05/27/how-to-make-a-cross-table/. That looks a bit like what is called "pivoting" in Excel? If that's what it is: This add-on package will also contain a node to do pivoting, i.e. select a pivot, an aggregation, and a data column, then select the aggregation type (count, sum, average, ...) and let it run; the output table will then contain the pivot table.
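
To make the idea concrete, here is a small Python sketch of pivoting outside of KNIME. The function `pivot`, its parameter names, and the sample rows are all hypothetical; it only illustrates the general technique (group rows by two columns, aggregate a third) and is not the node's implementation.

```python
from collections import defaultdict


def pivot(rows, group_col, pivot_col, value_col, agg="sum"):
    """Build a pivot (cross) table from a list of dict rows.

    group_col: values become the row labels
    pivot_col: distinct values become the column labels
    value_col: values aggregated inside each cell
    agg:       "count", "sum" or "mean"
    """
    cells = defaultdict(list)
    for row in rows:
        cells[(row[group_col], row[pivot_col])].append(row[value_col])

    aggregate = {
        "count": len,
        "sum": sum,
        "mean": lambda values: sum(values) / len(values),
    }[agg]

    table = defaultdict(dict)
    for (group, pivot_value), values in cells.items():
        table[group][pivot_value] = aggregate(values)
    return {group: dict(cols) for group, cols in table.items()}


# Hypothetical sample data, invented for illustration only
sales = [
    {"region": "North", "product": "A", "revenue": 10},
    {"region": "North", "product": "B", "revenue": 5},
    {"region": "South", "product": "A", "revenue": 7},
    {"region": "North", "product": "A", "revenue": 3},
]
print(pivot(sales, "region", "product", "revenue", agg="sum"))
# {'North': {'A': 13, 'B': 5}, 'South': {'A': 7}}
```

Cells with no matching rows (here South/B) are simply absent from the result; a real tool would typically show them as missing values.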

So, these nodes are currently absent but will be available shortly. Stay tuned!

Regards,
Bernd

Thank you for such a detailed answer!
Sorry, I used the uncommon term "cross table". Of course, a cross table and a pivot table are the same thing.

Hi Max,

When you say market analysis are you doing financial, marketing or some other type of work?

I'm interested in seeing what applications people are using Knime for.

Best Regards,

Jay

Hi Bernd,

What else is coming in the package? There was some discussion previously about aggregations and variable creation: http://tech.knime.org/node/20166. Will the group-by node include aggregations like first and last or median in addition to the regular stuff? This can be quite useful.

Another useful ability to have would be to create a rank variable with the possibility of using group variables to create "intra-group" rankings.

A node for derived variable creation would also be very useful. It was briefly mentioned in that link's post above.

I’ll keep my eye out for your new node pack.

Best Regards,

Jay

Hi Jay,

the package will probably contain these nodes:

    - Polynomial Regression ("Learner" & Applier),
    - Joiner (more advanced, not only row ID based but also on user selected columns),
    - (Linear) Correlation & Correlation Filter,
    - Low Variance Filter,
    - Multi-Dimensional-Scaling,
    - Pivoting,
    - Enrichment Plotter,
    - Naive Bayes Learner & Predictor and
    - DB-Reader supporting Blobs and Clobs (making the currently available one obsolete in the future).
All of these nodes are currently under review (which delays things a bit).

Not sure if the Group-By node will make it in time since we are still working on it. We plan to provide the following aggregations for numeric columns: "Mean", "Min", "Max", "Sum", "Variance". For categorical columns we will offer "Mode" (most frequent value in column). "First" and "Last" seem to be useful (thanks for the hint!), so we will also include those for both numeric and categorical columns. We decided not to include "Median" for now since that makes things complicated: computing the median requires either having the data in main memory or sorting it beforehand (correct me if I'm wrong), whereas all of the above can easily be determined within a single scan of the table. We will certainly add a "Median" option in future versions, though.
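
The single-scan argument can be sketched in Python as follows. This is only an illustration under my own assumptions: the function names are made up, the variance is the sample variance (divide by n - 1) computed with Welford's running update, and nothing here reflects the actual Group-By code.

```python
def single_pass_stats(values):
    """Compute first, last, min, max, sum, mean and variance in one scan.

    Works on any iterable (including a stream), keeping only a few numbers
    in memory. Variance is the sample variance via Welford's update.
    """
    count = 0
    total = 0.0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the mean
    first = last = minimum = maximum = None
    for x in values:
        count += 1
        if count == 1:
            first = minimum = maximum = x
        last = x
        minimum = min(minimum, x)
        maximum = max(maximum, x)
        total += x
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    variance = m2 / (count - 1) if count > 1 else 0.0
    return {"first": first, "last": last, "min": minimum, "max": maximum,
            "sum": total, "mean": mean, "variance": variance}


def median_by_sorting(values):
    """Exact median of a non-empty collection; needs all values at once to sort."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical column values
print(single_pass_stats(iter(data)))  # iter() mimics a one-time table scan
print(median_by_sorting(data))        # 4.5
```

The first function never looks at a value twice, while the second has to hold and sort the whole column, which is the complication mentioned above.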

Quote:
Another useful ability to have would be to create a rank variable with the possibility of using group variables to create "intra-group" rankings.

You lost me somewhere between "create rank variable" and "group variables". Can you elaborate a bit? Is that still referring to the group-by node? Help.

Quote:
A node for derived variable creation would also be very useful. It was briefly mentioned in that link's post above.

That I didn't get either. I guess you don't mean the functionality that's provided in the java snippet node (append a column as function of the input columns).

Regards
Bernd

Hi Max,

I just read your first posting in this thread once more and figured that I didn't comment on all of your questions.

Quote:
When I use a histogram or pie chart, can I change the font or position of the labels in any way?

To be honest: No! There is currently no way of changing font or label positions in either of these nodes. I was always tempted to think that our histogram implementation is actually too feature rich to be useful. But your question leads me to think differently. 8)

Maybe this is somewhat unrelated to your question but people have also asked for more powerful reporting capabilities within KNIME (take the histogram view from node A + the output table from node B + scatterplot from node C and create a nice looking PDF of it). We don't have the resources to implement that in the very near future but once we get to this point we may also have those view settings (such as font size etc). Let's see.

Quote:
When I use a pie chart, can I use, for example, the sum of another field instead of the count?

Our pie chart is currently based on the JFreeChart library and by far not as customizable as the histogram. But hopefully the Group-By node will give you those statistics?!
Quote:
And finally, I can't find a full user guide.

There is the quickstart guide that should make the user comfortable with the KNIME workbench, and help for individual nodes is available in the "Node Description" window (select a node and it will display all sorts of helpful information to this node). If you accidentally closed the node description window, you can get it back by selecting the entry in the "View" menu.

- Bernd

Jay wrote:
Hi Max,

When you say market analysis are you doing financial, marketing or some other type of work?

I'm interested in seeing what applications people are using Knime for.

Best Regards,

Jay

Hi, Jay,

We work in the B2B sector, as industrial market researchers.

Best Regards,
Max

Hi Bernd,

Really, I'm very glad to get such complete answers to my questions.
I am waiting impatiently for the new features and will be happy to test them.
Of course, pie charts and histograms from KNIME look more beautiful than those from many other applications. So I agree with you that the question about font editing is not too fundamental at the moment.
And about reporting tools: some days ago I tried to use the KNIME Reporting (BIRT) feature, and as far as I saw it can't work properly with Russian. Is it a bug, or can I change the language parameters to correct this?

Best regards,
Max

Quote:
And about reporting tools: some days ago I tried to use the KNIME Reporting (BIRT) feature, and as far as I saw it can't work properly with Russian. Is it a bug, or can I change the language parameters to correct this?

That's a bug. We haven't tested KNIME with respect to different languages/encodings (one of the benefits of Java is that it (typically) doesn't have a problem with different encodings ... I thought).

I assume you have been reading a file containing Cyrillic characters (or got that from a database connection ...) and then connected the output to the PDF writer? I can try to reproduce it.

Thanks for the problem report!
Bernd

Quote:

I assume you have been reading a file containing Cyrillic characters (or got that from a database connection ...) and then connected the output to the PDF writer?

Yes, that's right. The HTML report has the same problem. One interesting detail: Cyrillic characters in column names are read correctly. All other Cyrillic characters in the data show up with code #xfffd in the HTML version.
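
For reference, #xfffd is the Unicode replacement character (U+FFFD), which decoders emit when bytes cannot be interpreted in the expected encoding. The Python snippet below shows one common way it appears: text stored in one encoding and decoded as another. The choice of cp1251 and UTF-8 is purely an assumption for the demonstration; whether this is what actually happens inside the reporting feature is not established here.

```python
# Assumed scenario: Cyrillic text saved as Windows-1251, then read as UTF-8.
text = "Привет"
raw = text.encode("cp1251")

# Decoding with the wrong charset: invalid byte sequences become U+FFFD.
garbled = raw.decode("utf-8", errors="replace")
print([hex(ord(ch)) for ch in garbled])

# Decoding with the matching charset restores the original text.
restored = raw.decode("cp1251")
print(restored == text)  # True
```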

Hi Bernd,

I'm glad to hear about the joiner node getting upgraded. I assume it will support INNER JOINS. Will outer joins also be possible?

Is there anything else on the horizon (not in this release) for data manipulation, such as a node to filter for unique (distinct) values? Stratified sampling as well as discretization nodes like equi-count and equi-range would be good additions.

The PIVOT node is a great addition! Dataset preparation nodes around time-oriented datasets would be a really strong addition to the platform. A lot of data in industry is structured this way.

For ranking I was thinking of a rank function to apply to a column. The option to supply multiple "grouping/subset" columns would be excellent as well. This leads to the variable creation node. Just like the Java node you mentioned but with some kind of GUI interface (perhaps exposing Java or Knime functions) for creating derived variables (both transforms and calculated variables).
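
To show what I mean by intra-group ranking, here is a rough Python sketch. The function name `rank_within_groups`, the column names, and the rows are invented for the example; ties simply get consecutive ranks in their original order, which is only one of several possible conventions.

```python
from collections import defaultdict


def rank_within_groups(rows, group_cols, value_col, descending=True):
    """Return one rank per row, restarting at 1 inside every group.

    rows:       list of dicts
    group_cols: list of column names that define a group
    value_col:  column whose values decide the ranking
    """
    groups = defaultdict(list)
    for index, row in enumerate(rows):
        key = tuple(row[col] for col in group_cols)
        groups[key].append(index)

    ranks = [None] * len(rows)
    for indices in groups.values():
        ordered = sorted(indices, key=lambda i: rows[i][value_col],
                         reverse=descending)
        for position, i in enumerate(ordered, start=1):
            ranks[i] = position
    return ranks


# Hypothetical data: rank revenue separately inside each region
rows = [
    {"region": "North", "revenue": 10},
    {"region": "South", "revenue": 7},
    {"region": "North", "revenue": 30},
    {"region": "South", "revenue": 9},
    {"region": "North", "revenue": 20},
]
print(rank_within_groups(rows, ["region"], "revenue"))
# [3, 2, 1, 1, 2]
```

Passing more than one name in `group_cols` gives the "multiple grouping columns" case.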

I really wish my Java skills were already there. A guide to adding nodes for beginners in Java would be great!!!

Best Regards,

Jay

Quote:

I really wish my Java skills were already there. A guide to adding nodes for beginners in Java would be great!!!

We were hoping the guide already available online does serve this purpose? It is not supposed to teach how to program in Java, of course, but once someone knows the Java basics it should be easy to get a new node working.

http://www.knime.org/extension.html#section2

But do let us know if you feel it needs improvement (and where). Thanks!

Hi Jay!

Quote:
I'm glad to hear about the joiner node getting upgraded. I assume it will support INNER JOINS. Will outer joins also be possible?

We decided to make the join based on the row key column of the first table and a user-selected column (or the row key) of the second table. There is going to be an option as to how to handle missing counterparts (basically decide between inner and outer join).
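
As a rough illustration of that behaviour (not the node's code), here is a Python sketch. It assumes that "outer" means keeping rows of the first table that have no counterpart and filling the missing cells with `None`; the function `join_on_row_key`, the `keep_unmatched` flag, and the sample tables are all hypothetical.

```python
from collections import defaultdict


def join_on_row_key(left, right, right_col, keep_unmatched=False):
    """Join the row keys of `left` against column `right_col` of `right`.

    left:  dict mapping row key -> dict of cells (first table)
    right: list of dict rows (second table)
    keep_unmatched: False gives an inner join; True also keeps left rows
                    without a counterpart (assumed left outer join)
    """
    by_value = defaultdict(list)
    for row in right:
        by_value[row[right_col]].append(row)
    right_columns = [c for c in (right[0] if right else {}) if c != right_col]

    result = []
    for key, cells in left.items():
        matches = by_value.get(key, [])
        for match in matches:
            merged = {"row_key": key, **cells}
            merged.update({c: match[c] for c in right_columns})
            result.append(merged)
        if not matches and keep_unmatched:
            merged = {"row_key": key, **cells}
            merged.update({c: None for c in right_columns})
            result.append(merged)
    return result


# Hypothetical tables, invented for the example
customers = {"c1": {"name": "Acme"}, "c2": {"name": "Bolt"}}
orders = [
    {"customer": "c1", "amount": 100},
    {"customer": "c1", "amount": 50},
    {"customer": "c3", "amount": 8},
]
print(join_on_row_key(customers, orders, "customer"))
print(join_on_row_key(customers, orders, "customer", keep_unmatched=True))
```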

Quote:
Is there anything else on the horizon (not in this release) for data manipulation, such as a node to filter for unique (distinct) values? Stratified sampling as well as discretization nodes like equi-count and equi-range would be good additions.

The group-by node will allow you to filter for unique values. Stratified sampling has been added as an option in the "Row Sampling" node (available in the next major release). For binning/discretization: Apart from the (manual) binner node, we also have a CAIM (class-attribute interdependence maximization) binner implementation. Look for "CAIM" in the node repository and refer to the node description for details. I'm not aware of anyone in the group working on another binner implementation (but equi-count/range binning isn't complicated - maybe that's a good excuse to start learning Java and KNIME node development?)

Quote:
This leads to the variable creation node. Just like the Java node you mentioned but with some kind of GUI interface (perhaps exposing Java or Knime functions) for creating derived variables (both transforms and calculated variables).

We are going to have a "Math Formula" node in future versions, which allows the user to derive a new variable (similar to the java snippet node just without the requirement to know java). It also enables access to column based constants (mean, min, max) - actually you can get the equi-range binning implemented without coding :)
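
The formula behind that last remark is simple enough to sketch in Python. This is just the generic equi-range (equal-width) binning calculation from a column's min and max; the function names and sample values are made up, and it is not the syntax of the Math Formula node.

```python
def equi_range_bin(value, minimum, maximum, bins):
    """Return the 0-based index of the equal-width bin that `value` falls into."""
    if maximum == minimum:
        return 0  # constant column: everything lands in the first bin
    index = int((value - minimum) / (maximum - minimum) * bins)
    # The column maximum would compute to `bins`, so clamp it into the last bin.
    return min(index, bins - 1)


def equi_range_bins(values, bins):
    """Bin a whole column using its own minimum and maximum."""
    minimum, maximum = min(values), max(values)
    return [equi_range_bin(v, minimum, maximum, bins) for v in values]


# Hypothetical column values split into 4 bins of equal width
print(equi_range_bins([0, 2.5, 5, 7.5, 10], 4))
# [0, 1, 2, 3, 3]
```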

Regards
Bernd

Quote:

as well as discretization nodes like equi-count and equi-range would be good additions.

By the way, the extension howto exactly describes how to write a simple equi-distant binner. If you follow the extension howto http://www.knime.org/extension.html#section3
you will learn how to write your own node in KNIME and you will get exactly what you want :D

If you don't have the time to learn Java you may download the source code from the tutorial
http://www.knime.org/extension.html#code and install it into your KNIME installation.

Hi Bernd,

Thanks for the reply. It sounds like there will be many useful additions to the platform.

The Math Formula node sounds useful (like the Math formula filter in Weka?). One that exposes string functions from Java in an easy to use way would also be useful. Stuff like toLowerCase, toUpperCase, concatenate, conditional actions with equals, etc... I know it's not too hard to do that in the Java node but it would definitely go along with your Math Filter node and it would ease the use of the platform for beginners.

I am actually using a Java based application for something else now so it looks like I've been thrust into learning Java. In the next few months I'll see if I can put together my first node.

As for the Joiner node, it would be very useful to have both sides of the join be on user-selected columns from both tables. This would reduce the number of steps required for data preparation outside of the environment.

Any ideas on how to ease the production of aggregate data sets for learners from time-oriented data?

Best Regards,

Jay

Hi Jay and Max,

I don't know if you are on the email distribution list and received the notification that the add-on and math-formula packages are out. That should also answer your question, Jay, as to what's the functionality of the math node.

Regarding the time-oriented data. We plan to share a time-series node package with the community (leave the community to drive the development - Jay, you'd better learn java :wink:)

Best regards,
Bernd

Jay,

Bernd forgot to mention that a few of those "toUppercase" type nodes will show up in v1.3 (thanks to Kudos Pharmaceuticals for sponsoring their development!) and so will an outer and inner join node. The Math node really only does math - no string manipulations.

- Michael

Hi,

The new add-ons look great! I experimented a little with them this morning and I'll continue to try them out.

Bernd, I have been working with Java a little over the last while and am becoming a little more familiar with the language. I definitely have a long way to go though. I'll take a look at the time series package when it comes out.

Michael, that's excellent! These kinds of utilities in Knime will definitely make the platform even easier for new users and people new to Java. The join and possibly aggregation nodes will make it possible to do more and more data prep in the platform "out-of-the-box".

Knime is getting better and better with each release. The platform is very impressive. I saw that the conference is now posted on the site. It looks excellent! Everyone should take a look at that.

Best Regards,

Jay

Hi Jay,

small correction on the joiner node that Michael (erroneously) promised for v1.3. The new joiner is already available in the add-on package, there won't be yet another joiner implementation in v1.3 (instead we will make the joiner, which originally came with v1.2, obsolete). The one in the add-on package allows the user to specify inner and outer join (though the join will be performed on the row key column of the top input table).

Bernd