Rule engine node expression

NabilEnn · April 21, 2024, 11:09pm

Can anyone help me figure out an expression for the Rule Engine node? I’m trying to capture the data in the five rows under the number 500402, which is basically my keyword. If anyone knows of better nodes for this, that would be great as well!
My column that has the data is also named “Content_SplitResultList”
Thanks in advance

rfeigel · April 21, 2024, 11:30pm

Your description isn’t very clear. What do you mean by “capture data five rows under the number 500402”? Can you share your data? That would be very helpful.

NabilEnn · April 21, 2024, 11:44pm

So I have one column with about 100 rows, and one of the rows contains a keyword, which in my case is the number 500402. The text I’m trying to extract is in the 5 rows below the number 500402. Everything else I don’t need. I only need the text that is in the 5 rows below the row containing 500402.

rfeigel · April 21, 2024, 11:47pm

Is the “keyword” arbitrarily located? As I said earlier, please share your data. If it’s proprietary, create masked data.

NabilEnn · April 21, 2024, 11:51pm

[Image removed per user request - ScottF]

NabilEnn · April 21, 2024, 11:51pm

I need the text from lines 26-33.
That’s not 5 rows, but it is what I need to create an expression for.

rfeigel · April 22, 2024, 12:00am

You apparently want to exclude blank rows from your count of five?

NabilEnn · April 22, 2024, 12:02am

Not only blank rows but any other rows with data. The expression can cover 10 rows if that is simpler. The 10 rows below 500402 are all the data I want. Everything else I don’t want. Thanks in advance!!

rfeigel · April 22, 2024, 12:05am

This is getting pretty painful. Do you want the 5 or 10 rows following your keyword, or the first 5 non-blank rows?

NabilEnn · April 22, 2024, 12:07am

Sorry about the confusion. 10 rows would be better.

rfeigel · April 22, 2024, 12:08am

Including blank rows?

NabilEnn · April 22, 2024, 12:10am

Yes, anything that is not in the 10 rows under that keyword I don’t want.
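For reference, the requirement as finally stated (keep the 10 rows directly below the first row containing the keyword, blank rows included, and drop everything else) can be sketched in plain Python. This is only an illustration of the logic, not a KNIME node configuration; the function name `rows_after_keyword` and the sample column are hypothetical, and every cell is assumed to be a string.

```python
def rows_after_keyword(rows, keyword="500402", n=10):
    """Return the n rows that follow the first row containing keyword.

    Blank rows count toward n. Returns an empty list if the keyword
    never appears. Assumes every entry in rows is a string.
    """
    for i, value in enumerate(rows):
        if keyword in value:
            # Slice starts one row below the keyword row.
            return rows[i + 1 : i + 1 + n]
    return []


# Hypothetical sample column: a header, the keyword row, then more rows.
column = ["header", "Part 500402", "line A", "", "line B"]
column += [f"extra {k}" for k in range(20)]

selected = rows_after_keyword(column)
print(selected)
```

Because the selection depends on a row's position relative to another row, it is a positional filter rather than a pattern match on a single cell, which is why a per-row regex on its own is unlikely to express it.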

rfeigel · April 22, 2024, 1:34am

Try this. You can add a writer node to write out the output.

NabilEnn · April 22, 2024, 3:49pm

Can you send me the configurations, please? What I was stuck on is the regex expression.

NabilEnn · April 22, 2024, 3:51pm

I have an existing workflow extracting all the data, but I’m not using the String Manipulation node or the Rule Engine node because of the regex expressions. Every time I create an expression it isn’t accepted, probably because I’m used to writing Python syntax. If anyone can write me a String Manipulation or Rule Engine expression that extracts the 10 rows below the keyword 500402, that would be great. Then I can keep my existing workflow, add a String Manipulation or Rule Engine node, and apply the regex expression.

mlauber71 · April 22, 2024, 4:01pm

@NabilEnn maybe you should try to write down what you actually want and provide some sample data for it. If you want, you could write it out in your preferred language and then translate it with the help of ChatGPT or DeepL, or just leave it here and we can do it.

One concept you might want to explore is identifying blocks within your data and then continuing to work with them, maybe with the help of loops.
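The block idea can be sketched in Python as follows. It is a minimal illustration under the assumption that a new block starts at every row containing the keyword; the helper names `label_blocks` and `group_blocks` and the sample rows are made up for the example and do not correspond to specific KNIME nodes.

```python
from collections import defaultdict


def label_blocks(rows, keyword="500402"):
    """Pair every row with a block id that increases at each keyword row.

    Rows before the first keyword get block id 0.
    """
    block_id = 0
    labelled = []
    for value in rows:
        if keyword in value:
            block_id += 1
        labelled.append((block_id, value))
    return labelled


def group_blocks(labelled):
    """Collect the labelled rows into one list per block id."""
    blocks = defaultdict(list)
    for block_id, value in labelled:
        blocks[block_id].append(value)
    return dict(blocks)


# Hypothetical sample data with two keyword rows.
rows = ["intro", "500402", "a", "b", "500402", "c"]
blocks = group_blocks(label_blocks(rows))

# Loop over the blocks, skipping block 0 (everything before the first keyword).
for block_id, values in blocks.items():
    if block_id > 0:
        print(block_id, values[1:])  # rows below the keyword row
```

Labelling first and grouping second keeps the approach usable when the keyword occurs more than once, since each occurrence gets its own block to loop over.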

Daniel_Weikert · April 22, 2024, 4:15pm

@NabilEnn
Split the table with Table Splitter and then use Row Sampling to keep the top 10 entries
br
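As a rough Python analogue of this two-step idea (split at the keyword row, then keep the first 10 rows of what follows), here is a small sketch. It mirrors the logic only and says nothing about how the nodes themselves are configured; `split_then_top` and the sample rows are hypothetical names and data.

```python
from itertools import dropwhile, islice


def split_then_top(rows, keyword="500402", n=10):
    """Drop everything before the keyword row, then keep the next n rows."""
    # Step 1 (split): discard rows until the first one containing the keyword.
    from_keyword = dropwhile(lambda value: keyword not in value, rows)
    # Step 2 (top n): skip the keyword row itself and take the n rows after it.
    return list(islice(from_keyword, 1, n + 1))


# Hypothetical sample data.
rows = ["x", "500402", "1", "2", "3"]
top_two = split_then_top(rows, n=2)
print(top_two)
```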

rfeigel · April 22, 2024, 4:25pm

Did you try my workflow? It seems to work OK without any regex. All you need to do is feed it your data table.

NabilEnn · April 22, 2024, 4:44pm

Well, the picture I posted is my Excel sheet after I extracted the data from a PDF. It extracted too much data, which is why I think I need to run a regex with the keyword 500402.

NabilEnn · April 22, 2024, 4:45pm

I will try this and let you know. So I should add the Table Splitter node before the Excel node?