Delete rows and continue with new table

Hello everybody,

I use a Rule-based Row Filter to filter specific data in a large file. I compare two values from different tables. If they match, the filtered rows are aggregated and written to a CSV file.
I repeat this several times within a loop.

However, the Rule-based Row Filter is very slow. Is it possible to use a Rule-based Row Splitter and use its second output (the non-matching rows) as the new input for the Rule-based Row Filter? This would reduce the amount of data after each loop iteration.

Thank you in advance :slight_smile:

Hi,

Would you please provide an example to better explain your goal?

:blush:

Hey @armingrudd,

this is my workflow. It works and I receive the file I want to have.


Date Recording is the date preparation of the two files. File one (connected to the Sorter node) has 7.9 million rows, file two (connected to the Chunk Loop Start) has 1,000 rows.
I configured the Rule-based Row Filter as follows:
$${SStartTime}$$ <= $TimeStamp$ AND $TimeStamp$ <= $${SEndTime}$$ => TRUE
Afterwards I aggregate the columns with the GroupBy node.
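In plain terms, each loop iteration does something like the following. This is only a rough Python/pandas sketch, not a KNIME node; the file names, the TimeStamp/StartTime/EndTime columns taken from the rule above, and the mean aggregation are assumptions for illustration.

```python
import pandas as pd

# Assumed file and column names, for illustration only.
data = pd.read_csv("Testsample1.csv", parse_dates=["TimeStamp"])
windows = pd.read_csv("StartAndEndtime.csv", parse_dates=["StartTime", "EndTime"])

aggregated = []
for _, w in windows.iterrows():
    # Rule-based Row Filter equivalent:
    # $${SStartTime}$$ <= $TimeStamp$ AND $TimeStamp$ <= $${SEndTime}$$ => TRUE
    in_window = data[(data["TimeStamp"] >= w["StartTime"])
                     & (data["TimeStamp"] <= w["EndTime"])]

    # GroupBy node equivalent: aggregate the matching rows (mean is just a placeholder).
    aggregated.append(in_window.mean(numeric_only=True))

pd.DataFrame(aggregated).to_csv("aggregated.csv", index=False)
```

Note that in this form every iteration scans the full table again, which is presumably why the filtering is slow.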

I want to speed up the filtering. Maybe I can delete all the rows I have already aggregated and use the reduced file as the new input for the Rule-based Row Filter? Or do you have a better idea?

Best regards!

Which output port of the Metanode has the “TimeStamp” column?

The first file (connected to the Sorter) with 7.9 million rows.

Ok,
The main approach you are following is the same as what I would do, except for these points:

:blush:

2 Likes

@armingrudd

Thanks for your help. I will adjust the workflow next week and post the result.

It is a great community and a nice tool :).

2 Likes

Hey @armingrudd,

one more question. I configured the Rule-based Row Filter as follows:


So I have a different StartTime and EndTime in every iteration of the loop.

How can I configure the Date&Time-based Row Filter the same way?


Best regards!

Go to the Flow Variables tab:

:blush:

2 Likes

Hey,
it works and I also improved my workflow with your ideas.

Thank you!

2 Likes

Hello,

is there a function to delete rows in a data table? I don't mean filtering; I really mean deleting rows from or updating the table in my workflow.

Best regards.

Hi,

What’s the problem with filtering?

In some cases you may need to use Domain Calculator. Check this topic as an example:

If you explain your case further, it would be possible to help you better.

:blush:

1 Like

Hello,

I still use a Date&Time-based Row Filter to filter my data. I use this filter in a loop with a Table Row To Variable Loop Start, as you previously recommended.

However, the Date&Time-based Row Filter takes a long time to filter a large amount of data. Once I have filtered my data table, I write the rows into a CSV file. Afterwards, I don't need the already filtered data anymore. To improve the running time, I would like to reduce/update the data table by removing the already filtered rows.

I don't have to update the data table in every iteration, but at least every 50 iterations. Is there a way to do this, or do you have another idea?


Best regards!

To have only the remaining rows in the next iterations, you can use a Recursive Loop.

Use the Recursive Loop Start node after the Sorter node. Use the current iteration number to filter the output of the Row Filter node and convert the output to a flow variable. Split the main table based on the dates using a Rule-based Row Splitter node. Use a Variable to Table Row node after the CSV Writer node and close the loop with the Recursive Loop End node. Pass the second port of the splitter to the second port of the Recursive Loop End node.
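To make this concrete, here is a rough sketch of the same idea outside KNIME (Python/pandas; the file names, column names and the per-window output are assumptions). The key point is that the second splitter output, i.e. the rows that have not matched yet, becomes the input of the next iteration, so the table shrinks as the loop runs:

```python
import pandas as pd

# Assumed file and column names, for illustration only.
remaining = pd.read_csv("Testsample1.csv", parse_dates=["TimeStamp"])
windows = pd.read_csv("StartAndEndtime.csv", parse_dates=["StartTime", "EndTime"])

for i, w in windows.iterrows():
    # Rule-based Row Splitter equivalent: first port = matching rows,
    # second port = everything else, which feeds the next iteration.
    mask = ((remaining["TimeStamp"] >= w["StartTime"])
            & (remaining["TimeStamp"] <= w["EndTime"]))
    matched = remaining[mask]
    remaining = remaining[~mask]

    # CSV Writer equivalent: write (or aggregate and then write) the matched rows.
    matched.to_csv(f"window_{i}.csv", index=False)
```

Here `remaining` gets smaller with every iteration, which is essentially what passing the splitter's second port into the Recursive Loop End achieves in KNIME.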

If you provide a sample dataset, I would build an example workflow for you.

:blush:

2 Likes

Hey @armingrudd

that would be awesome! Here are the two (smaller) sample datasets.
I used the data from "Start and Endtime" to filter the rows in "Testsample 1" (starttime < timestamp < endtime).
Instead of joining the filtered rows with another document (as in the screenshot in the last post), you can also aggregate the filtered values and write them into a CSV.

Start and Endtime.txt (59.3 KB)
Testsampel 1.txt (3.8 MB)

Note that “Testsample1” contains only 90,000 rows instead of millions; I had to reduce the size significantly to upload an example.

Best regards!

1 Like

Hi there @Phibu,

apart from optimizing the workflow with different nodes, here are some things you can also try.

For faster execution you can try the Streaming Extension in KNIME:
https://www.knime.com/blog/streaming-data-in-knime

For general tips & tricks on optimizing KNIME workflows, check this blog post:
https://www.knime.com/blog/optimizing-knime-workflows-for-performance

Additionally, if you are not using the latest KNIME version, 4.0.0, I highly recommend upgrading, as performance has been a major focus of this release:
https://www.knime.com/whats-new-in-knime-40#performance

Br,
Ivan

4 Likes

Sorry for the delay @Phibu,

Here is the example workflow:

recursive_filter.knwf (1.3 MB)

I hope this helps.

:blush:

2 Likes

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.