I am attempting to write a JPython Script 1:1 that will take an input set of compounds with generated fingerprints and produce a Tanimoto similarity matrix as an output table. Conceptually, this is quite simple, but I'm having a very hard time sorting out the necessary details from the KNIME API. Any tips?
As I've gotten a teeny bit deeper, I've begun to wonder if this is even possible. I notice that the JPython components require the number of incoming table columns plus the number of new columns (specified in the Configure dialog) to equal the number of output table columns. Making a Tanimoto similarity matrix will result in a square table with the number of rows (coming in from the input table) equal to the number of columns. Can I generate a completely new table that has nothing to do with the incoming table in a JPython component? Or do I need to develop a new component? I may be learning Java soon, eh?
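For reference, the matrix computation itself can be sketched in plain Python, independent of any KNIME API. This is only a minimal sketch under stated assumptions: each fingerprint is represented as a set of on-bit indices, the names `tanimoto` and `similarity_matrix` are made up for illustration, and two empty fingerprints are arbitrarily scored as 0.0. It does not show how to read or write KNIME tables.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    if union == 0:
        # Assumption: two empty fingerprints are treated as dissimilar.
        return 0.0
    return len(a & b) / float(union)


def similarity_matrix(fingerprints):
    """Return an n x n list of lists holding all pairwise Tanimoto values."""
    n = len(fingerprints)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        matrix[i][i] = tanimoto(fingerprints[i], fingerprints[i])
        # The coefficient is symmetric, so compute each pair only once.
        for j in range(i + 1, n):
            value = tanimoto(fingerprints[i], fingerprints[j])
            matrix[i][j] = value
            matrix[j][i] = value
    return matrix


# Hypothetical toy fingerprints, one per compound.
fps = [set([1, 2, 3]), set([2, 3, 4]), set([7])]
matrix = similarity_matrix(fps)
```

The result has as many rows as columns, which is exactly why it does not fit a node whose output column count is fixed in the Configure dialog.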
You are right, the current Jython node always adds columns to an existing table, which is a bit odd, I must confess. We hardly ever use this node ourselves here since we tend to simply write new Java-based nodes. After all, there is this cute wizard ;-)
Anyway, for your most immediate need (producing a similarity matrix) maybe you can simply append to the original matrix and throw out the useless columns via the column filter later?
But you will need to change the output spec every time the number of compounds (rows) changes since the number of columns will depend on this... So the 1:1 Scripting Node is not really well suited to generate tables with variable numbers of output columns that depend on the length of the input matrix, for example.
Stuff like that can be done sooooo much more easily directly in Java! And you let KNIME control execution and the data handling, too....
Thanks for the response, especially considering you are likely preparing for next week's conference.
Yes, Java would be an exceptional option. Unfortunately, I don't know Java well enough. It is, however, now on my list of things to learn. Sun can thank you for that. 8^) I will most certainly be learning it in order to develop new nodes to do my bidding.
In the meantime, I found a workaround. My current workflow consists of an SDF Reader->SDF to CDK. From here, I link to a Transposer, then link the SDF to CDK output and the Transposer output to a Joiner. The result is that I end up with a square matrix having the dimensions of the number of molecules in my original SDF. I can easily filter out the extra molecule column/row either before or after the Transpose/Joiner step. Then I merely need to fill in the content of the matrix, and I can access each column by the name of the original molecule, which is also reflected in the row titles. It works out quite ideally.
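The fill-in step on such a square, name-labelled table could look roughly like the following in plain Python. It is a sketch only: the table is modelled as a dictionary of dictionaries rather than a KNIME table, the molecule names and the helpers `fill_square_table` and `tanimoto_on_sets` are hypothetical, and fingerprints are assumed to be sets of on-bit indices.

```python
def tanimoto_on_sets(a, b):
    """Tanimoto coefficient for fingerprints stored as sets of on-bit indices."""
    union = a | b
    if not union:
        # Assumption: two empty fingerprints score 0.0.
        return 0.0
    return len(a & b) / float(len(union))


def fill_square_table(fingerprints_by_name, score):
    """Return {row_name: {column_name: value}} for every pair of molecules."""
    names = sorted(fingerprints_by_name)
    table = {}
    for row_name in names:
        table[row_name] = {}
        for column_name in names:
            # Rows and columns share the same molecule names.
            table[row_name][column_name] = score(
                fingerprints_by_name[row_name],
                fingerprints_by_name[column_name])
    return table


# Hypothetical molecules and fingerprints.
fps = {"mol_A": set([1, 2]), "mol_B": set([2, 3]), "mol_C": set([9])}
table = fill_square_table(fps, tanimoto_on_sets)
```

Each cell can then be looked up by row and column name, for example `table["mol_A"]["mol_B"]`, mirroring how the joined table exposes columns by molecule name.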
I have to say that I'm impressed at what can be done in KNIME with just a little creativity. Nice work, gentlemen.