Matching presence of particular columns and a row value

InsilicoConsulting · April 25, 2011, 9:52am

Hi,

I have a table with m columns and n rows. All the values are 1. This indicates that the entities representing the rows are positive for all columns (representing descriptors). All columns from the original table that had 0's for all rows or a mix of 0's and 1's have been filtered out.

The names of the remaining columns thus represent a consensus signature, e.g.

     C1  C12  C23  C24  C51
R1   1   1    1    1    1
R2   1   1    1    1    1
R3   1   1    1    1    1

Test table

     C1  C2  C3  C4  .........
R1   1   1   0   0
R2   1   1   1   1
R3   1   0   0   1

A second "test" table contains a superset of the columns of the consensus table and a certain number of rows. Its values are both 1 and 0.

I want to filter those rows from the second (test) table in which every column whose name also appears in the train/consensus table has the value 1. So in the above example the output table would contain R1 and R2, provided the other column names and values match. Columns absent from the train/consensus table are not used for matching.
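To make the intended filter concrete, here is a minimal sketch in plain Python rather than in KNIME nodes. It assumes each table can be held as a mapping from row name to a mapping of column name to 0/1. The names `filter_rows`, `consensus_columns` and `test_table` are made up for illustration, and the values for C12, C23, C24 and C51 in the test rows are invented, since the test table above is truncated.

```python
def filter_rows(test_table, consensus_columns):
    """Return the names of rows that have a 1 in every consensus column."""
    kept = []
    for row_name, values in test_table.items():
        # A consensus column missing from the row counts as 0, so the row is rejected.
        if all(values.get(col, 0) == 1 for col in consensus_columns):
            kept.append(row_name)
    return kept


# Column names of the consensus table from the example.
consensus_columns = ["C1", "C12", "C23", "C24", "C51"]

# Illustrative test data only: C1..C4 follow the example,
# the remaining values are invented for this sketch.
test_table = {
    "R1": {"C1": 1, "C2": 1, "C3": 0, "C4": 0, "C12": 1, "C23": 1, "C24": 1, "C51": 1},
    "R2": {"C1": 1, "C2": 1, "C3": 1, "C4": 1, "C12": 1, "C23": 1, "C24": 1, "C51": 1},
    "R3": {"C1": 1, "C2": 0, "C3": 0, "C4": 1, "C12": 0, "C23": 1, "C24": 1, "C51": 1},
}

matching_rows = filter_rows(test_table, consensus_columns)
print(matching_rows)  # ['R1', 'R2']
```

Columns such as C2, C3 and C4 are never looked at, because only the consensus column names are iterated over.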

Doing this with the Reference Row Filter for each column in turn seems tedious! Thanks in advance!

gabriel · April 27, 2011, 4:07pm

I assume you can use the Subset Matcher if you translate your rows into itemsets (R1: C1, C2; R2: C1, C2, C3, C4; R3: C1, C4) and check them against your original (itemset) table.
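The itemset idea can be sketched in plain Python as follows. This only illustrates the subset check and is not the Subset Matcher node's actual interface. The helper `to_itemset` and the consensus itemset `{"C1", "C2"}` are assumptions chosen for illustration, restricted to the four columns visible in the test table.

```python
def to_itemset(row):
    """Collect the names of the columns whose value is 1."""
    return {col for col, value in row.items() if value == 1}


# The visible part of the test table from the question.
test_rows = {
    "R1": {"C1": 1, "C2": 1, "C3": 0, "C4": 0},
    "R2": {"C1": 1, "C2": 1, "C3": 1, "C4": 1},
    "R3": {"C1": 1, "C2": 0, "C3": 0, "C4": 1},
}

# R1: {C1, C2}; R2: {C1, C2, C3, C4}; R3: {C1, C4}
itemsets = {name: to_itemset(row) for name, row in test_rows.items()}

# Hypothetical consensus signature, used only for this sketch.
consensus = {"C1", "C2"}

# A row matches when the consensus itemset is a subset of the row's itemset.
matches = [name for name, items in itemsets.items() if consensus <= items]
print(matches)  # ['R1', 'R2']
```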

InsilicoConsulting · May 7, 2011, 1:22pm

Thanks gabriel,

I will try that. I was able to get a result by using the Column Combiner to join all the columns into a single string column and then the Reference Row Filter to match on that combined string.

But what you suggest could allow a more detailed comparison.

regards