Ruby Scripting

Preliminary Ruby scripting support: https://github.com/rssdev10/ruby4knime

The main goal of the project is to make the clear, concise Ruby style available for interactive data analysis. The first step is integrating Ruby itself. The second is building a good wrapper over the Java classes so that they can be used in idiomatic Ruby style.

For simple examples, see: https://github.com/rssdev10/ruby4knime/blob/master/RubyScript/rb/README.rdoc

Waiting for feedback and suggestions.

The following node types are implemented so far:

  • Ruby Generator generates arbitrary string or numeric data, or processes any external source using Ruby.
  • Ruby Script processes an input DataTable into an output DataTable.
  • Ruby Script 2x2 processes 2 input DataTables into 2 output DataTables.
  • Ruby Snippet processes an input DataTable into an output DataTable with code written for a single row only.

A binary feature archive and a sample workflow are also available.

https://github.com/rssdev10/ruby4knime

Simple examples of the Ruby code

Ruby Script nodes

Copy all input rows into output:

$inData0.each { |row|  $outContainer0 << row }

 

Create a new output table with new unique row keys. The first column of the output table contains a copy of the input column with index 0. The second column contains the difference between the input columns with indexes 1 and 2.

$inData0.each do |row|
  $outContainer << Cells.new.int(row[0].to_i).double(row[1].to_f - row[2].to_f)
end

Cells is a utility class of the Ruby wrapper. It was created to simplify adding new table columns. The special methods int, double and string each add a new column of the corresponding type.

row[0] gets the value of the column with index 0. It is equivalent to row.getCell(0).

This way of adding new cells is appropriate for the Ruby Data Generator node.
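
To illustrate the chaining style described above, here is a minimal plain-Ruby sketch of a builder whose typed methods return self. SketchCells and its internal [type, value] pairs are hypothetical stand-ins for illustration only; the real Cells class creates KNIME cells and may be implemented quite differently.

```ruby
# Hypothetical stand-in for the wrapper's Cells class; not the real implementation.
class SketchCells
  attr_reader :cells

  def initialize
    @cells = []
  end

  # Each typed method appends one [type, value] pair and returns self,
  # which is what makes the calls chainable.
  def int(value)
    add(:int, Integer(value))
  end

  def double(value)
    add(:double, Float(value))
  end

  def string(value)
    add(:string, value.to_s)
  end

  private

  def add(type, value)
    @cells << [type, value]
    self
  end
end

built = SketchCells.new.int(7).double(2.5 - 1.0).string("ok")
p built.cells
```

Returning self from every typed method is the design choice that lets one expression describe a whole row.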

 

It is also possible to add a new column to an existing row. For example, add a new column that contains the length of the first column's value, taken as a string.

$inData0.each_with_index do |row, i|
  $outContainer0 << (row << Cells.new.int(row[0].to_s.length))
end

The special << methods append new cells to a row and append the row to the output DataTable, replacing the explicit creation of an AppendedColumnRow instance and the addRowToTable call.
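
As a rough illustration of how such << operators can be defined, here is a plain-Ruby sketch. SketchRow and SketchTable are hypothetical classes working on plain arrays; the real wrapper operates on KNIME Java objects and is not reproduced here.

```ruby
# Hypothetical stand-ins; the real wrapper works on KNIME Java objects.
class SketchRow
  attr_reader :values

  def initialize(values)
    @values = values
  end

  def [](index)
    @values[index]
  end

  # Returns a new row with the extra cells appended; the input row is untouched.
  def <<(extra_cells)
    SketchRow.new(@values + extra_cells)
  end
end

class SketchTable
  attr_reader :rows

  def initialize
    @rows = []
  end

  # Appends a row and returns self so that calls can be chained.
  def <<(row)
    @rows << row
    self
  end
end

input = [SketchRow.new(["apple"]), SketchRow.new(["kiwi"])]
output = SketchTable.new
input.each { |row| output << (row << [row[0].to_s.length]) }
p output.rows.map(&:values)
```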

 

The same example, with progress indication added:

count = $inData0.length

$inData0.each_with_index do |row, i|
  $outContainer << (row << Cells.new.int(row[0].to_s.length))

  setProgress "#{i*100/count}%" if i%100 == 0
end

The construction setProgress "#{i*100/count}%" if i%100 == 0 updates the progress only on every 100th row, which reduces the time spent redrawing the progress indicator for tables with a large number of rows.
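
The effect of reporting progress only on every 100th row can be checked outside KNIME with a small sketch. The set_progress lambda below is a hypothetical stand-in that only records messages; it is not the node's setProgress.

```ruby
# Hypothetical stand-in for the node's setProgress; it only records messages.
updates = []
set_progress = ->(message) { updates << message }

count = 1000
count.times do |i|
  # ... per-row work would go here ...
  set_progress.call("#{i * 100 / count}%") if i % 100 == 0
end

puts updates.length
puts updates.first, updates.last
```

With 1000 rows this records 10 updates instead of 1000, running from "0%" to "90%".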

 

The following example rearranges input rows into columns, turning every 3 rows into one row of 3 columns. That is,

0
1
2
3
4
5

is converted into:

0          1          2

3          4          5

 

$inData0.map{|row| row[0].to_i}.each_slice(3) do |vec|
  $outContainer << Cells.new.
                     int(vec[0]).
                     int(vec[1]).
                     int(vec[2])
end

In this code, map{|row| row[0].to_i} produces an array of integers from the column with index 0 across all rows. each_slice(3) splits that array into groups of three consecutive items and passes each group to the block as the array vec.
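
The grouping done by each_slice can be tried in plain Ruby without KNIME. The sketch below uses ordinary arrays in place of the input table; the variable names are chosen for illustration.

```ruby
values = [0, 1, 2, 3, 4, 5]

# each_slice(3) yields consecutive groups of three items.
rows = values.each_slice(3).to_a
rows.each { |vec| puts vec.join("\t") }

# The last group is shorter when the item count is not a multiple of 3,
# so vec[1] or vec[2] can be nil for the final group.
ragged = [0, 1, 2, 3].each_slice(3).to_a
p ragged
```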

 

Example of custom sorting: sort the input rows by the reversed form of each string.

$inData0.map{|row| row[0].to_s}.sort_by{|s| s.reverse}.each do |str|
    $outContainer << Cells.new.string(str)
end

In this code, map{|row| row[0].to_s} produces an array of Ruby strings from the column with index 0 across all rows. sort_by{|s| s.reverse} returns a sorted array in which the order is determined by each string with its characters reversed.
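
A small plain-Ruby sketch shows how sort_by orders strings by their reversed form. The sample words are made up for illustration.

```ruby
words = ["abc", "bca", "cab"]

# The sort keys are "cba", "acb" and "bac"; the original strings are returned.
sorted = words.sort_by { |s| s.reverse }
p sorted
```

The strings come back unchanged; only their order follows the reversed keys.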

 

Example for Ruby Snippet node

The Ruby Snippet node operates at the row level, so the code is slightly shorter.

Example: parse a string of formatted numbers into separate columns of type double. The column contains data in a format such as [1.123, 2.234, 3.345]. Real data may be more complex.

row[0].to_s.delete('[]').split(',').map(&:to_f).
  reduce(Cells.new){|cells, item| cells.double(item)}

Now step by step. The method to_s converts column 0 to a string. The method delete('[]') removes the unneeded bracket characters from that string. The method split(',') produces an array of strings that contain the numbers “1.123”, “2.234”, “3.345”. The method map(&:to_f) converts each element of that array to a double and produces a new array of doubles. Cells.new prepares the container for the new cells. Finally, reduce(Cells.new) returns that cells container with as many columns as there are items in the input array.
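
The same steps can be followed one at a time in plain Ruby. In this sketch an ordinary array of [type, value] pairs stands in for Cells.new, which is an assumption made only so the code runs outside KNIME.

```ruby
text = "[1.123, 2.234, 3.345]"

stripped = text.delete('[]')   # "1.123, 2.234, 3.345"
parts    = stripped.split(',') # ["1.123", " 2.234", " 3.345"]
numbers  = parts.map(&:to_f)   # to_f ignores the leading spaces

# A plain array stands in for the Cells container here.
cells = numbers.reduce([]) { |acc, item| acc << [:double, item] }

p numbers
p cells.length
```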

 

If you want to add the new columns to the existing table, the only change needed is the following:

row << row[0].to_s.delete('[]').split(',').map(&:to_f).
  reduce(Cells.new){|cells, item| cells.double(item)}

 

Waiting for feedback.

Updated Ruby script nodes. Some changes:

  • Added Ruby syntax highlighting.
  • Added script execution error processing and highlighting in the source.
  • Added the ability to create an output table with any KNIME data type. Simply enter the type in the table. Note: only fully qualified Java class names are supported!
  • Refactored Ruby mediator over Java.

New Ruby features:

  • Added unified global variables: $num_inputs, $input_datatable_arr, $num_outputs, $output_datatable_arr and $in_data_0, $in_data_1, $out_data_0, $out_data_1.

Example (copy all data from all inputs to the outputs; applies to Ruby Script 2x2):

(0..1).each do |i|
  out = $output_datatable_arr[i]
  $input_datatable_arr[i].each do |row|
    out << row
  end
end
  • Added dynamic generation of methods for accessing cells by column name.

Example code for Ruby Snippet. The source table contains columns named 'x', 'Y1', 'Y2', 'y1(1)', 'y2(1)':

row << Cells.new.double(row[0].to_f).
                 double(row.y1.to_f - row.y2.to_f)
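
One way such accessors could be generated is sketched below in plain Ruby. SketchNamedRow is hypothetical, and the naming rule (lowercase, other characters replaced by underscores) is an assumption for illustration; the wrapper's actual rule is not described in this thread and may differ.

```ruby
# Hypothetical sketch; the naming rule is an assumption, not the wrapper's documented behaviour.
class SketchNamedRow
  def initialize(column_names, values)
    @values = values
    column_names.each_with_index do |name, index|
      method_name = name.downcase.gsub(/[^a-z0-9_]+/, '_').sub(/_+\z/, '')
      # Skip names that are empty or collide with an already defined method.
      next if method_name.empty? || respond_to?(method_name)
      define_singleton_method(method_name) { @values[index] }
    end
  end

  def [](index)
    @values[index]
  end
end

names = ['x', 'Y1', 'Y2', 'y1(1)', 'y2(1)']
sample = SketchNamedRow.new(names, [1.0, 5.5, 2.0, 0.1, 0.2])
p sample.y1 - sample.y2
p sample.y1_1
```

Defining the methods per row object with define_singleton_method keeps rows from different tables independent of each other.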

 

For the sources and a ready-to-use binary build, see https://github.com/rssdev10/ruby4knime

Updated Ruby4KNIME

  • Added flow variable support. It is now possible to read, modify and create flow variables.
  • Added visual selection of flow variables.
  • Added visual selection of input columns directly in the script code tab.
  • Updated JRuby to 1.7.18.
  • Removed the previous synchronization workaround for JRuby 1.7.13.
  • Built with KNIME 2.11.

See https://github.com/rssdev10/ruby4knime
