Help needed on Bitvector for item set mining

Hi all,

I'm trying to use the AssociationRuleLearner node to search for frequent itemsets. This node takes a Bitvector as input, but I don't know the data structure or the semantics of this class.

My data is a simple table with two columns (TransactionID, ItemID) representing an n:m relationship, and I can't find a manipulation node that can transform it into the appropriate Bitvector representation.

So, is there a node that can perform this operation (the Bitvector Generator, for example, but I didn't succeed with it)? Or do I have to implement it myself? Any suggestions?

Thanks for your help.

lamboringo wrote:

... I can't find a manipulation node that can transform it into the appropriate Bitvector representation.

The BitvectorGenerator generates bitvectors out of your data and its output is used as the input for the Association Rule Learner.

But you are right: There isn't any node in KNIME to transform the kind of data you described into the input format the BitvectorGenerator expects.
One possible input format for the BitvectorGenerator is a table where the Transaction ID is the RowID and the Item IDs are space-separated in a single String column. For example:

  1. 1 9 12 55 102
  2. 2 12 34 108
  3. 66 102

and so on... As far as I know this is the standard input format for association rule mining
(e.g. see http://fimi.cs.helsinki.fi/).
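
To make the semantics concrete: a bit vector for a transaction simply has bit i set when item i occurs in that transaction. The following is a minimal sketch in plain Java using java.util.BitSet, not KNIME's own bit vector class; the names `TransactionBits` and `parse` are made up for illustration, and it assumes the item IDs are non-negative integers.

```java
import java.util.BitSet;

/** Illustration only: uses java.util.BitSet, not KNIME's bit vector type. */
final class TransactionBits {

    /** Sets bit i for every non-negative integer item ID i in a space-separated line. */
    static BitSet parse(String line) {
        BitSet bits = new BitSet();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                bits.set(Integer.parseInt(token));
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        BitSet first = parse("1 9 12 55 102");
        BitSet other = parse("66 102");
        System.out.println(first);   // {1, 9, 12, 55, 102}

        // Items shared by both transactions: bitwise AND on a copy.
        BitSet common = (BitSet) first.clone();
        common.and(other);
        System.out.println(common);  // {102}
    }
}
```

Representing transactions this way is what makes support counting cheap: checking whether an itemset is contained in a transaction becomes a handful of bitwise operations.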

The other possible input format would be a table where the TransactionID is the RowID and each item is a column with value = 1 if the item was present in this transaction and 0 otherwise. But this would generate very wide tables (one column per distinct item), most of whose cells are 0.

lamboringo wrote:

So, is there a node that can perform this operation (the Bitvector Generator, for example, but I didn't succeed with it)? Or do I have to implement it myself? Any suggestions?

If you have no means to transform your data into one of the formats described above, you can write your own node (as described in http://www.knime.org/extension.html). If you make sure the rows are sorted by Transaction ID, you only have to iterate over the rows with the same Transaction ID, put the Item ID values into a space-separated String, and provide this as the output. Then connect your node to the BitVectorGenerator to convert the strings into bitvectors.
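
The grouping step described above can be tried outside KNIME first. Below is a rough sketch in plain Java, assuming the pairs are already available as int arrays of the form {transactionId, itemId}; `TransactionGrouper` and `group` are hypothetical names. Note that it swaps the sorted iteration for a LinkedHashMap, which costs more memory but does not depend on the row order.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class TransactionGrouper {

    /** Groups {transactionId, itemId} pairs into one space-separated item string per transaction. */
    static Map<Integer, String> group(int[][] pairs) {
        // LinkedHashMap keeps transactions in order of first appearance.
        Map<Integer, StringJoiner> joiners = new LinkedHashMap<>();
        for (int[] pair : pairs) {
            joiners.computeIfAbsent(pair[0], tid -> new StringJoiner(" "))
                   .add(Integer.toString(pair[1]));
        }
        Map<Integer, String> result = new LinkedHashMap<>();
        joiners.forEach((tid, joiner) -> result.put(tid, joiner.toString()));
        return result;
    }

    public static void main(String[] args) {
        // Deliberately unsorted input.
        int[][] pairs = { {1, 9}, {1, 12}, {2, 12}, {1, 55}, {2, 34} };
        group(pairs).forEach((tid, items) -> System.out.println(tid + ": " + items));
        // prints "1: 9 12 55" and then "2: 12 34"
    }
}
```

For tables too large to hold in memory, the sorted, streaming approach is the better fit, since it only ever keeps one transaction's items at a time.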

If you try this don't hesitate to ask further questions if you encounter any problems.

Hope that helped a bit.


Thanks for the quick, clear answer.

I was impatient, so I eventually achieved my goal with a Perl script that:
1. translates the ItemIDs associated with a transaction into integers,
2. runs an apriori program from a publication,
3. translates the frequent itemsets back to my ItemIDs (plus a more detailed description).
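
Steps 1 and 3 amount to a dictionary encoding. As a rough sketch in Java rather than Perl (this is not the actual script, and the names `ItemDictionary`, `encode` and `decode` as well as the sample item names are hypothetical): each distinct ItemID gets the next free integer, and a list indexed by that integer maps the mined itemsets back.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ItemDictionary {
    private final Map<String, Integer> codes = new HashMap<>();
    private final List<String> items = new ArrayList<>();

    /** Returns the integer code of an ItemID, assigning the next free code on first sight. */
    int encode(String itemId) {
        Integer code = codes.get(itemId);
        if (code == null) {
            code = items.size();
            codes.put(itemId, code);
            items.add(itemId);
        }
        return code;
    }

    /** Maps an integer code back to the original ItemID. */
    String decode(int code) {
        return items.get(code);
    }

    public static void main(String[] args) {
        ItemDictionary dict = new ItemDictionary();
        int milk = dict.encode("milk");    // 0
        int bread = dict.encode("bread");  // 1
        int again = dict.encode("milk");   // 0, already known
        System.out.println(milk + " " + bread + " " + again);
        System.out.println(dict.decode(1)); // bread
    }
}
```

The dictionary has to be kept around (or written out) until the frequent itemsets come back, otherwise the integer codes cannot be resolved.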

But I'm willing to implement a node that does this (I'm just starting to use KNIME and it seems a really nice platform). I don't know when I will have time to do that, but when I do, I will need your input, I think.

I was thinking of a node for data manipulation as such:
input: a table
configure: select transID and itemID
output: usual transaction representation as a set of integers per line (String? I have to check the Bitvector Generator's input).

And then, use the Bitvector Generator node.

However, I'm wondering about the best way to implement the translation back to original ItemIDs.

lamboringo wrote:

But I'm willing to implement a node that does this (I'm just starting to use KNIME and it seems a really nice platform). I don't know when I will have time to do that, but when I do, I will need your input, I think.

I was thinking of a node for data manipulation as such:
input: a table
configure: select transID and itemID
output: usual transaction representation as a set of integers per line (String? I have to check the Bitvector Generator's input).

And then, use the Bitvector Generator node.

Well, to save you time, here is a sketchy implementation of what your node's execute method could look like (paste it into an editor to see it better formatted):

    import org.knime.core.data.DataColumnSpecCreator;
    import org.knime.core.data.DataRow;
    import org.knime.core.data.DataTableSpec;
    import org.knime.core.data.IntValue;
    import org.knime.core.data.RowKey;
    import org.knime.core.data.def.DefaultRow;
    import org.knime.core.data.def.IntCell;
    import org.knime.core.data.def.StringCell;
    import org.knime.core.node.BufferedDataContainer;
    import org.knime.core.node.BufferedDataTable;
    import org.knime.core.node.ExecutionContext;

    // ... inside your NodeModel subclass:

    /**
     * @see org.knime.core.node.NodeModel#execute(
     * org.knime.core.node.BufferedDataTable[],
     * org.knime.core.node.ExecutionContext)
     */
    @Override
    protected BufferedDataTable[] execute(BufferedDataTable[] inData,
            ExecutionContext exec) throws Exception {
        // first of all you have to create the spec of your node's output;
        // since this also has to be provided in the configure method,
        // it is recommended to put this into a separate method
        // DataColumnSpecs are created with the builder DataColumnSpecCreator

        // the column spec of the column containing the TransactionIDs
        DataColumnSpecCreator tidCreator = new DataColumnSpecCreator(
                inData[0].getDataTableSpec().getColumnSpec(0));
        // the column spec of the item ids ->
        // results in a space separated string
        DataColumnSpecCreator iidCreator = new DataColumnSpecCreator("ItemID",
                StringCell.TYPE);
        // the output spec contains the column specs
        DataTableSpec outputSpec = new DataTableSpec(tidCreator.createSpec(),
                iidCreator.createSpec());
        // a data structure to iteratively add rows (DataTables are read-only)
        BufferedDataContainer container = exec.createDataContainer(
                outputSpec);

        // now iterate over the input table
        // (assumes sorted, non-negative TransactionIDs; -1 means "none seen yet")
        int oldTID = -1;
        StringBuilder builder = new StringBuilder();
        int rowNr = 0;
        for (DataRow row : inData[0]) {
            // validate in the configure method that the first column
            // is indeed of type int
            // for more convenience you can make
            // the columns adjustable in a dialog
            int tid = ((IntValue)row.getCell(0)).getIntValue();
            // first transactionID
            if (oldTID < 0) {
                oldTID = tid;
            }
            // same transaction ID as in the row before
            if (tid == oldTID) {
                builder.append(row.getCell(1) + " ");
            } else {
                // sets the progress message shown when mousing over the node
                exec.setMessage("processed row no. " + rowNr);
                // checks if you cancelled the execution
                exec.checkCanceled();
                // found new tid ->
                // flush the string builder into the StringCell
                DataRow newRow = new DefaultRow(
                        // the row ID of the row
                        new RowKey("Row" + rowNr++),
                        // the transaction ID
                        new IntCell(oldTID),
                        // the newly created space separated string
                        new StringCell(builder.toString())
                    );
                container.addRowToTable(newRow);
                // reset the string builder
                builder = new StringBuilder();
                // and append the item belonging to the new TID
                builder.append(row.getCell(1) + " ");
            }
            oldTID = tid;
        }
        // add the last row (skipped if the input table was empty)
        if (builder.length() > 0) {
            DataRow newRow = new DefaultRow(
                    // the row ID of the row
                    new RowKey("Row" + rowNr++),
                    // the transaction ID
                    new IntCell(oldTID),
                    // the newly created space separated string
                    new StringCell(builder.toString())
                );
            container.addRowToTable(newRow);
        }
        // that's it, close the container and create a BufferedDataTable out
        // of it; it will be provided at the outport of your node
        container.close();
        return new BufferedDataTable[] {
                exec.createBufferedDataTable(container.getTable(), exec)};
    }

Of course, the indices of the columns should not be hard-coded, and so on...

Quote:

However, I'm wondering about the best way to implement the translation back to original ItemIDs.

Good point. I never thought about how this data could be processed further. Maybe it would be helpful to have the ItemIDs as columns, or at least without the "item" at the beginning. Next release :wink:

Hi,

I have been trying to implement MBA (market basket analysis) in KNIME and I am trying to find the frequent itemsets. When I use the itemset finder and the association rule learner, I get frequent itemsets for the attributes, but I want to find the frequent itemsets for the data. Can you please help me with this?