Using knext.MultiColumnParameter

Hi there,

I am currently working on a rather simple node that streamlines date formats across multiple columns.

The node works as intended for now, except for two points.

To illustrate the matter, here is a picture of the test table:

Here is a picture of the config panel:

The first one is not that important, but I mention it in case someone has a brilliant, simple idea. There is an option to convert the dates back to String format; if it is not ticked, the output format is Local Date Time. When that option is unchecked, I get this warning: DataSpec generated by configure does not match spec after execution.
This is logical, as the column types are different and my configure just returns the input table schema.

The second and more important issue is that I can’t get the MultiColumnParameter to work, and the documentation doesn’t provide much help here.
Here is my variable definition in the DateNode class:

class DateNode:  
...
  selected_columns = knext.MultiColumnParameter(
      label="Specify Dates Columns",
      description="Select the columns that should be filtered out."
  )
  
  def configure(self, config_context, input_table_schema):
      # the multiple column selection parameter needs to be provided the list of columns of an input table
      self.selected_columns = input_table_schema.column_names
      return input_table_schema
  
  def execute(self, exec_context, input_table):
  
      df = input_table.to_pandas()
  
      selected_cols = self.selected_columns

      df_processed = streamline_dates_processing_v7(df.copy(), selected_cols, to_string=to_string, to_replace=to_replace, date_formats_list=date_formats_list, string_output_format=string_output_format)

      return knext.Table.from_pandas(df_processed)

Obviously nothing happens here, but I added some logger calls to find out at which point, and under which variable or attribute, my column selection list appears. I couldn’t find it anywhere, neither in the configure nor in the execute method.

I only need the list of selected columns, in order to pass it as an argument to my function (the function takes a dataframe plus several arguments, including the list of columns to which it will apply the script). For the life of me, I can’t find how to retrieve it from the attribute (or I’m dumb and/or blind, which is also very possible).
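To make the shape of such a function concrete, here is a minimal sketch using only the standard library. It is not the streamline_dates_processing_v7 function from the post, whose body is not shown; the name normalize_dates, its parameters, and the use of plain dictionaries instead of a pandas DataFrame are all assumptions made for illustration.

```python
from datetime import datetime


def normalize_dates(rows, columns, input_formats, output_format="%Y-%m-%d %H:%M:%S"):
    """Return copies of `rows` in which the listed columns are re-formatted as dates.

    Hypothetical helper: rows are plain dicts standing in for DataFrame rows.
    """

    def parse(value):
        # Try each candidate format in order; give up if none matches.
        if not isinstance(value, str):
            return None
        for fmt in input_formats:
            try:
                return datetime.strptime(value, fmt)
            except ValueError:
                continue
        return None

    result = []
    for row in rows:
        new_row = dict(row)  # leave the input untouched
        for col in columns:
            parsed = parse(row[col])
            if parsed is not None:
                new_row[col] = parsed.strftime(output_format)
        result.append(new_row)
    return result


rows = [{"ETD": "31/01/2023", "ATD": "2023-02-01", "note": "01/02/2023"}]
normalized = normalize_dates(rows, ["ETD", "ATD"], ["%d/%m/%Y", "%Y-%m-%d"])
```

Only the columns named in the list are touched, which is why the node needs the user's selection rather than every column of the input table.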

Any help would be greatly appreciated :slight_smile:

Regarding the first issue, if someone has a brilliant idea to remove that warning when the column types differ, I would be glad to hear it (I can’t really specify them manually in the returned schema, as I can’t know in advance which columns and which types will be returned).

Thanks!

Dear @Vonwen,

very good questions indeed. I internally asked about the first one (dynamically change the configure output depending on the config dialog settings).
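As a rough illustration of that first point, the output column types could be derived in configure from the same settings that execute uses. The sketch below is an assumption, not the knext API: it uses plain (name, type name) pairs as a stand-in for a table schema, and the function name output_column_types is hypothetical. How this would map onto the real schema objects is not covered in this thread.

```python
def output_column_types(input_columns, selected, to_string):
    """Compute the expected output column types.

    input_columns: list of (name, type_name) pairs, a stand-in for a table schema.
    selected: names of the columns the node will convert.
    to_string: mirrors the dialog option that converts dates back to strings.
    """
    target = "String" if to_string else "Local Date Time"
    selected_set = set(selected)
    return [
        (name, target if name in selected_set else type_name)
        for name, type_name in input_columns
    ]


schema_in = [("ETD", "String"), ("ATD", "String"), ("id", "Number (integer)")]
schema_out = output_column_types(schema_in, ["ETD", "ATD"], to_string=False)
```

If configure reported the types computed this way, they would agree with what execute produces, which is the condition the DataSpec warning checks.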

About the second one (knext.MultiColumnParameter): did you also find the corresponding documentation in the following subsection? It should give you a minimal example:

The resulting question should then be How to use the column_filter parameter? A lot of examples are in the geospatial extension repository.

Does that help already?

Best regards
Steffen

Hey Steffen, thanks for the reply :slight_smile:

That page is indeed what I used to build my node, and this is exactly where I’m stuck, as the example provided is quite minimal here.

Here is the configuration panel on a different test dataset that doesn’t include non date strings (to avoid causing an error):

Here is the code with a logger added:

  def configure(self, config_context, input_table_schema):
      # the multiple column selection parameter needs to be provided the list of columns of an input table
      LOGGER.warning(f"config: selected_cols: {self.selected_columns}")
      self.selected_columns = input_table_schema.column_names
      return input_table_schema

  def execute(self, exec_context, input_table):

      df = input_table.to_pandas()

      date_formats_list = [i.strip() for i in self.selection_date_formats_list.split(',')]
      to_replace = [i.strip() for i in self.selection_to_replace.split(',')]
      to_string = self.selection_to_string
      string_output_format = self.selection_string_output_format
      LOGGER.warning(f"execute: selected_cols: {self.selected_columns}")
      selected_cols = self.selected_columns
      LOGGER.warning(f"execute: selected_cols: {selected_cols}")

Here I am just trying to figure out what list is returned from the column selection panel, and see if I can use it. As you can see, I have left one column out of the selection in this example.

I have put a logger call in the configure method to see what the value of self.selected_columns is (the attribute definition is unchanged from earlier) before setting it to the column names of the input table schema, and two more in the execute method.

The console output for this execution is this:

WARN  Wakeo Date Parsing   8:1607     my_extension:config: selected_cols: ['ETD', 'ATD', 'ETA', 'final ETA', 'ATA']
WARN  Wakeo Date Parsing   8:1607     my_extension:execute: selected_cols: ['ETD', 'ATD', 'ETA', 'final ETA', 'ATA']
WARN  Wakeo Date Parsing   8:1607     my_extension:execute: selected_cols: ['ETD', 'ATD', 'ETA', 'final ETA', 'ATA']
WARN  Wakeo Date Parsing   8:1607     DataSpec generated by configure does not match spec after execution.

So this is the part I really don’t understand. The way I see it, the attribute definition does absolutely nothing: I get the full list of table columns and not the ones selected. I’m sure I’m missing something, but I can’t figure out what.

Hey @Vonwen,

interesting.

I tried to reproduce your issue and wrote a minimal example node (see code below); you will need to adjust the icon and category in the knext.node annotation.
When I execute it with the columns selected as in the screenshot below, I get the following log output:

WARN  Test Node            3:2316     my_extension:['id_1']

This behaves as it should, whereas yours does not. Would you mind sharing a complete code example containing all of the necessary code, so that I can try to reproduce it? And maybe more interesting: can you have a look at my example and verify that it works for you as it should? If that already fails for you, we will need to look deeper into the issue.

I hope that we will resolve that issue together soon :slight_smile:

Best regards
Steffen

Minimal node

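import logging

import knime.extension as knext

LOGGER = logging.getLogger(__name__)


# Adjust the icon path and the category to your own extension;
# utils.category refers to a helper module that is not shown here.
@knext.node("Test Node", knext.NodeType.LEARNER, "icons/icon.png", utils.category)
@knext.input_table("name of input", "desc of input")
@knext.output_table("name of output", "desc of output")
class TestNodeMultiColumn(knext.PythonNode):
    # Column selection shown in the node dialog.
    multiCols = knext.MultiColumnParameter()

    def configure(self, config_context: knext.ConfigurationContext, *inputs):
        # Nothing is assigned to self.multiCols here; its value comes from the dialog.
        return super().configure(config_context, *inputs)

    def execute(self, exec_context: knext.ExecutionContext, input_1):
        cols = self.multiCols  # the columns selected in the dialog
        df = input_1.to_pandas()
        LOGGER.warning(cols)
        return knext.Table.from_pandas(df)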

Screenshot

Thank you very much Steffen, it works like a charm now. The issue was clearly in my configure method; doing it your way works fine and, as a bonus, it also solves the DataSpec warning :slightly_smiling_face:
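For anyone landing here with the same symptom, the difference between the two configure methods can be sketched in plain Python. The SelectionParameter class below is a hypothetical stand-in that simply stores a value per node instance; it is not how knext implements its parameters, and the column names are arbitrary. It only illustrates why assigning to the parameter attribute in configure replaces whatever was chosen in the dialog.

```python
class SelectionParameter:
    """Hypothetical stand-in for a dialog parameter; NOT the knext implementation."""

    def __set_name__(self, owner, name):
        self._attr = "_" + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return getattr(obj, self._attr, None)

    def __set__(self, obj, value):
        setattr(obj, self._attr, list(value))


class OverwritingNode:
    selected_columns = SelectionParameter()

    def configure(self, column_names):
        # Mirrors the original node: the dialog value is replaced by all columns.
        self.selected_columns = column_names
        return column_names

    def execute(self):
        return self.selected_columns


class PassiveNode:
    selected_columns = SelectionParameter()

    def configure(self, column_names):
        # Mirrors the working node: the parameter is only read, never assigned.
        return column_names

    def execute(self):
        return self.selected_columns


all_columns = ["a", "b", "c"]
dialog_choice = ["a", "c"]  # the user left "b" out

bad = OverwritingNode()
bad.selected_columns = dialog_choice  # simulates the framework storing the dialog value
bad.configure(all_columns)
bad_result = bad.execute()

good = PassiveNode()
good.selected_columns = dialog_choice
good.configure(all_columns)
good_result = good.execute()
```

In this sketch bad_result contains every column while good_result keeps the dialog selection, which matches the log output earlier in the thread.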

Thanks again!
