Extract Data using Column Type (String, Integer) as Pattern

SOESCHEN · May 16, 2023, 8:59am

Hello Community,

After extracting my data into a single row in a KNIME table, my next step is to combine/transform it into a new table.
I checked/tried “Unpivoting, Cell Splitter, Column Filter, Rule Engine”, but I cannot find how to use a “regex” on the type of the column.

The Pattern for each Dataset is:

Integer Column - String Column [n-1] - Integer Column

The string columns have no pattern, neither in count nor in content, that I could filter on.

I am struggling to find a node (or nodes) in KNIME to resolve this.
Do you have any ideas how to do it?

Kind regards
Sven
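
A minimal sketch of the grouping described above, written in plain Python rather than as a KNIME node configuration. It assumes the single row has already been read into a list of typed values, and that each dataset is an opening integer, any number of strings, and a closing integer. The names `split_records` and `Cell` are hypothetical and only serve the illustration.

```python
from typing import List, Union

Cell = Union[int, str]  # hypothetical alias: one cell of the single input row


def split_records(cells: List[Cell]) -> List[List[Cell]]:
    """Split a flat row into records shaped: int, str..., int.

    Assumption: the cell type (not the column name or content) marks
    where a record starts and ends.
    """
    records: List[List[Cell]] = []
    current: List[Cell] = []
    for cell in cells:
        if not current and not isinstance(cell, int):
            # A record must open with an integer cell.
            raise ValueError(f"record must start with an integer, got {cell!r}")
        current.append(cell)
        # An integer that is not the opening cell closes the record.
        if isinstance(cell, int) and len(current) > 1:
            records.append(current)
            current = []
    if current:
        raise ValueError("trailing cells do not form a complete record")
    return records


row = [1, "a", "b", 2, 3, "c", 4]
print(split_records(row))
```

Each inner list then corresponds to one row of the new table. Because the number of string cells varies, the rows would still need padding or unpivoting before they fit a fixed set of columns.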

ArjenEX · May 16, 2023, 9:39am

Hi @SOESCHEN

Welcome to the KNIME Community!

Can you supplement your question with an example of your current input (preferably in a workable format) and draft the expected output? At the moment it’s a bit hard to imagine what you’re looking for.

In general, some nodes support regex based processing whenever a valid expression is used.

SOESCHEN · May 16, 2023, 11:36am

Extract_PDF_Content.knwf (35.5 KB)
Content_pdf_rename_it.txt (400.6 KB)

SOESCHEN · May 16, 2023, 12:04pm

@ArjenEX,

Please find an example attached. While preparing the workflow, I confirmed for myself that the structure is 100% as I described. I rebuilt the workflow without the internal documents.

Is that sufficient for understanding?

Daniel_Weikert · May 16, 2023, 5:40pm

Is that really a consistent pattern you get when you read the PDF? I tried the Tika Parser with your PDF and got inconsistent results back.
br

SOESCHEN · May 17, 2023, 6:19am

I re-ran the workflow with this PDF several times. My output is the same every time.

The PDF is based on a template.
Otherwise the extraction would not follow a consistent pattern.

But back to my questions about how to use the regex.
Are there any examples?
Can it be combined with flow variables?

Daniel_Weikert · May 17, 2023, 4:27pm

Sorry for being imprecise. I meant: are your columns following a consistent pattern like topic1, 1_1, 1_2, 1_3, topic2, 2_1, 2_2, 2_3, …
That was not the case on my end
br
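
A rough way to test the naming pattern asked about above, as a sketch only. It assumes the column names are available as a list of strings and that "consistent" means every `topicN` header is followed by items `N_1`, `N_2`, … in order. The function `follows_pattern` and both regular expressions are hypothetical, not part of KNIME.

```python
import re
from typing import Iterable

TOPIC = re.compile(r"^topic(\d+)$")  # e.g. "topic1"
ITEM = re.compile(r"^(\d+)_(\d+)$")  # e.g. "1_2"


def follows_pattern(names: Iterable[str]) -> bool:
    """Return True if names look like topic1, 1_1, 1_2, ..., topic2, 2_1, ..."""
    current_topic = None
    expected_item = 1
    for name in names:
        topic = TOPIC.match(name)
        if topic:
            # A new topic header resets the item counter.
            current_topic = int(topic.group(1))
            expected_item = 1
            continue
        item = ITEM.match(name)
        if item is None or current_topic is None:
            return False
        # The item must belong to the current topic and be next in sequence.
        if int(item.group(1)) != current_topic or int(item.group(2)) != expected_item:
            return False
        expected_item += 1
    return True


print(follows_pattern(["topic1", "1_1", "1_2", "1_3", "topic2", "2_1", "2_2", "2_3"]))
```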
