Can anyone help me figure out an expression for the Rule Engine node? I’m trying to capture the data in the five rows under the number 500402, which is my keyword. If anyone knows of better nodes for this, that would be great as well!
The column that holds the data is named “Content_SplitResultList”.
Thanks in advance
Your description isn’t very clear. What do you mean by “capture data five rows under the number 500402”? Can you share your data? That would be very helpful.
So I have one column with about 100 rows, and one of the rows contains a keyword, which in my case is the number 500402. The text I’m trying to extract is in the 5 rows below the number 500402. Everything else I don’t need. I only need the text that is in the 5 rows below the row containing 500402.
Is the “keyword” arbitrarily located? As I said earlier, please share your data. If it’s proprietary, create masked data.
[Image removed per user request - ScottF]
I need the text from lines 26-33.
That’s not 5 rows, but that is what I need to create an expression for.
You apparently want to exclude blank rows from your count of five?
Not only blank rows but any other rows with data. The expression can cover 10 rows if that is simpler. The 10 rows below 500402 are all the data I want. Everything else I don’t want. Thanks in advance!!
This is getting pretty painful. Do you want the 5 or 10 rows following your keyword, or the first 5 non-blank rows?
Sorry about the confusion, 10 rows would be better.
Including blank rows?
Yes, anything that is not in the 10 rows under that keyword I don’t want.
Try this. You can add a writer node to write out the output.
Can you send me the configurations, please? What I was stuck on is the regex expression.
I have an existing workflow that extracts all the data, but I’m not using the String Manipulation node or the Rule Engine node because of the regex expressions. Every time I create an expression it isn’t accepted, probably because I’m used to writing Python syntax. If you or anyone else can give me a String Manipulation or Rule Engine expression that extracts the 10 rows below the keyword 500402, that would be great. Then I can keep my existing workflow, add one of those nodes, and apply the regex expression.
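For what it’s worth, the selection described here is positional (the N rows after the row that matches), and as far as I know an expression in the String Manipulation or Rule Engine node only sees one row at a time, so a regex on its own can’t express it. Below is a minimal Python sketch of the logic, assuming the column has already been read into a plain list of strings. `rows_below_keyword` and its parameters are made-up names for illustration, not part of any KNIME API.

```python
import re

def rows_below_keyword(rows, keyword="500402", n=10):
    """Return up to n rows that follow the first row containing keyword."""
    pattern = re.compile(re.escape(keyword))
    for i, row in enumerate(rows):
        # Missing cells are treated as non-matching rows.
        if row is not None and pattern.search(row):
            # Slice by position: blank rows inside the window are kept.
            return rows[i + 1 : i + 1 + n]
    return []  # keyword not found

# Hypothetical sample data standing in for Content_SplitResultList.
sample = ["header", "ref 500402", "a", "", "b", "c"]
print(rows_below_keyword(sample, n=3))
```

The regex is only used to find the keyword row; everything after that is a plain slice.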
@NabilEnn maybe you should try to write down what you actually want and provide some sample data for it. If you want, you could write it out in your preferred language and then translate it with the help of ChatGPT or DeepL, or just leave it here and we can do it.
One concept you might want to explore is identifying blocks within your data and then continuing to work with them, perhaps with the help of loops.
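To illustrate the block idea, here is a rough Python sketch under the assumption that the keyword can occur more than once in the column. Every row gets a block id that increases at each keyword row, and the first rows of each block are then collected. The function names are hypothetical and only show the logic, not a specific KNIME node.

```python
from itertools import groupby

def label_blocks(rows, keyword="500402"):
    """Pair each row with a block id that increases at every keyword row."""
    block_id = 0
    labelled = []
    for row in rows:
        if row is not None and keyword in row:
            block_id += 1
        labelled.append((block_id, row))
    return labelled

def first_rows_per_block(rows, keyword="500402", n=10):
    """Map each block id to the first n rows that follow its keyword row."""
    result = {}
    for block_id, group in groupby(label_blocks(rows, keyword), key=lambda p: p[0]):
        if block_id == 0:
            continue  # rows before the first keyword belong to no block
        body = [row for _, row in group][1:]  # drop the keyword row itself
        result[block_id] = body[:n]
    return result
```

In a workflow, the block id would play the role of a grouping column that a loop can iterate over.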
@NabilEnn
Split the table at the keyword row with the Table Splitter node, and then use the Row Sampling node to take the top 10 entries.
br
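As I understand the suggestion, it boils down to two steps: cut the table at the keyword row, then keep the first 10 rows of the lower part. A small Python sketch of that same two-step idea, assuming the column is available as a list of strings (`split_then_top_n` is an illustrative name, and the comments only map the steps loosely onto the nodes):

```python
from itertools import dropwhile, islice

def split_then_top_n(rows, keyword="500402", n=10):
    """Skip to the first keyword row, then take the n rows after it."""
    # Splitting step: discard everything before the first keyword row.
    bottom = dropwhile(lambda row: row is None or keyword not in row, rows)
    # Skip the keyword row itself, then take the top n of what remains.
    return list(islice(bottom, 1, 1 + n))
```

If the keyword never appears, the result is an empty list.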
Did you try my workflow? It seems to work ok without any regex. All you need to do is feed it your data table.
Well, the picture I posted is my Excel sheet after I extracted the data from the PDF. It extracted too much data, which is why I’m thinking I need to run a regex.
With the keyword 500402
I will try this and let you know. So I should add the Table Splitter node before the Excel node?