Extract rows based on variable position

Hi All,

I have an Excel file that I only want to extract specific rows from. More specifically, I want to be able to extract the rows where column A contains a specific word until column A contains another specific word.

For example,
Col A | Col B | Col C
Name | John | Smith
Country | UK
Occupation | Student

So in the example above, I want to extract the rows from Name to Occupation. However, the position of this row can vary between different Excel files (i.e. in one file, it could be row 20, and in another, could be row 100), and there could be multiple rows in between Name and Occupation.

Is there a way to do this please? Thank you for any help you can provide on this.

Best wishes,
Allan

Hi @ZaKNaFeiN

Welcome to the KNIME community!

This is definitely possible, but a question that comes to mind: How should the start and stop point be determined? Are those fixed values, should the user input this manually or is it determined somewhere in your script? The answers to these questions will determine what a suitable solution could be for your use case.

1 Like

Thank you for your response. The start and stop values can be fixed (i.e. it always be Name and Occupation in that example).

Is it always Name Country Occupation? Does row filter node not work for you? (Rule based row filter as another alternative)
br

@ZaKNaFeiN

An option is to use a variable controlled row filter based on row number.

First, determine where the start and end values are located within the table. You can do this by looking for the RowIndex of “Name” and “Occupation”. I opt to use a Java Snippet for this since this attribute is included in there.

if (c_column1.equals("Name")) {
	out_start = ROWINDEX;
} else {
	out_start = null;
}

if (c_column1.equals("Occupation")) {
	out_end =  ROWINDEX;
} else {
	out_end =  null;
}

Note: make sure that the output is of type integer. Now I know it’s between row index 2 and 4. To equalize all values of the start and end, a Missing Value node can be used.

To use these values as variable, use the Table Row to Variable node and select start and end.

You now have the start and end determined dynamically which you can subsequently use in a Row Filter. Connect the variable output to the row filter.
image

In here, there is an option to filter based on rows by number.

image

To use the variables created earlier, navigate to the Flow Variables tabsheet and select the start and end variable for the RowRangeStart and RowRangeEnd

image

With input:
image

The output is:
image

Another input in a different order:

image

Output:
image

Note1: this approach assumes that the “Name” and “Occupation” only appear once.
Note2: this assumes that “Name” and “Occupation” will appear in this particular order.

Therefore, depending on your actual dataset this has approach may has to be tweaked a bit!

See WF:
Extract rows based on variable position.knwf (27.4 KB)

Hope this provides some inspiration!

4 Likes

Very clean solution @ArjenEX!

I wish the Row Splitter node had similar options to the Row Filter node… I nearly always split instead of filter. The Row Filter node has some interesting options built into it for situations like this, while the Row Splitter node seems to basically just be a duplicate of the Rule Based Row Splitter with a different presentation.

1 Like

Thank you so much @ArjenEX ! The Table Row to Variable and using those as inputs to the Row Filter is an elegant solution.

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.