Content field to lines

I read a PDF file with the Tika Parser node. In the Content column, my data is presented as multiple lines. How can I split the Content column so that every line becomes a separate row in the table?

Thank you

Hi @izaychik63 -

Maybe you can try this? It was inspired in part by an example I found from @armingrudd. Not sure if it’s exactly what you need but at least maybe it gives you a starting point:

TikaLineBreakSplittingExample.knwf (32.0 KB)
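
For readers who can’t open the workflow, the general idea of turning a multi-line Content cell into one row per line can be sketched outside KNIME in plain Python. This is only an illustration and not the contents of the attached workflow; the function name `split_content_to_rows` and the `Filepath` column are assumptions made for the example.

```python
def split_content_to_rows(rows, column="Content"):
    """Return one output row per non-empty line found in row[column].

    `rows` is a list of dicts standing in for table rows (hypothetical layout).
    """
    out = []
    for row in rows:
        text = row.get(column) or ""
        # str.splitlines() splits on \n, \r\n, \r and other line boundaries
        for line in text.splitlines():
            line = line.strip()
            if not line:
                continue  # skip blank lines
            new_row = dict(row)      # copy the other columns unchanged
            new_row[column] = line   # one line per output row
            out.append(new_row)
    return out


# Made-up sample input: one record whose Content spans several lines
docs = [{"Filepath": "a.pdf", "Content": "first line\r\nsecond line\n\nthird line"}]
rows = split_content_to_rows(docs)
for r in rows:
    print(r)
```

Each output row keeps the original columns, so the lines can still be traced back to the file they came from.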


Thank you, @ScottF. Does KNIME have plans to add a PDF table extractor or something similar?

Hi @ScottF, yes, I think that’s what @izaychik63 meant.

And that is the way I was going to do it. But it was taking me a long time just to come up with the sample input data, that is, data with multiple lines in one record.

I know I can replicate the sample input by concatenating lines with join() using “\n” as the separator, but that can’t be done in the Table Creator itself. I would have had to either combine columns into multiple rows, or start with a Column Expressions node and do the join() there.

Thank you, @bruno29a, for your attempt.

There is a ticket for this (AP-14910). I’ve added a +1 on it from you. :)


Hello there,

If I’m not mistaken, when you use the As list option in the Cell Splitter node, you don’t need the Column Aggregator and Missing Value nodes.

@bruno29a indeed, creating such sample data is not possible within the Table Creator node (I will add it to the list). One (simple) way to do it in KNIME is a Table Creator (where one column holds one line per row and a second column contains a groupID) followed by a GroupBy node with \n as the delimiter.
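
The same grouping idea, sketched in plain Python rather than KNIME nodes: rows holding one line each are concatenated per group with a newline delimiter, producing multi-line sample records. The helper name `join_lines_by_group` and the column names `groupID`, `line` and `Content` are assumptions made for this illustration.

```python
def join_lines_by_group(rows, group_key="groupID", line_key="line", delimiter="\n"):
    """Concatenate the lines of each group into one multi-line record."""
    grouped = {}
    for row in rows:
        # dicts keep insertion order, so groups appear in first-seen order
        grouped.setdefault(row[group_key], []).append(row[line_key])
    return [
        {group_key: group, "Content": delimiter.join(lines)}
        for group, lines in grouped.items()
    ]


# Made-up input: one line per row, with a group id per record
table = [
    {"groupID": 1, "line": "a"},
    {"groupID": 1, "line": "b"},
    {"groupID": 2, "line": "c"},
]
records = join_lines_by_group(table)
print(records)
```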


