Document Vector Error

Hi.

I have a text mining workflow with two Flat File Document Parser nodes -> Concatenate -> BoW Creator -> Punctuation Erasure -> Number Filter -> Case Converter -> Stop Word Filter -> Porter Stemmer -> N Chars Filter -> Document Vector.

For some documents the workflow runs well, but for one document the Document Vector node fails to execute with the error: duplicate column name "suspend" at positions...

I don't understand why this happens, because the Document Vector node should create columns only for distinct terms.

What should I do to solve this?
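
To make clear what I expect, here is a minimal Python sketch of the idea (not KNIME code; the function name and the sample data are made up for illustration). Each distinct term gets exactly one column, so a duplicate column name should not be possible:

```python
def document_vectors(bow):
    """Build bit vectors from a bag of words.

    bow: list of (document_id, term) pairs, similar to the rows of a
    bag-of-words table. This is an illustration, not KNIME's implementation.
    """
    # One column per distinct term, in order of first appearance.
    columns = list(dict.fromkeys(term for _, term in bow))
    index = {term: i for i, term in enumerate(columns)}

    vectors = {}
    for doc, term in bow:
        row = vectors.setdefault(doc, [0] * len(columns))
        row[index[term]] = 1  # 1 if the term occurs in the document
    return columns, vectors


# Hypothetical sample data.
bow = [("doc1", "suspend"), ("doc1", "account"), ("doc2", "suspend")]
columns, vectors = document_vectors(bow)
```

Here "suspend" occurs in both documents but still yields a single column.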

Thanks for the help.

Hi Rui,

yes, you are right. The node should create columns only for different terms. Which KNIME Textprocessing version are you using? Could you please post/attach a small workflow with a little data that reproduces the error if possible, and/or post the full stack trace? Thank you.

Cheers, Kilian

Hi Rui, hi Kilian,

I'm experiencing the same problem. I used the workflow you provided here in the forum for a quick similarity trial with a custom set of documents, and I'm observing the same error (in my case: Duplicate column name "measur" at positions 545 and 897).

Is there any further information I can provide to help (except the texts themselves)?

edit: If "ignore tags" is unchecked in the Document Vector node, the node runs perfectly fine (although there are no tags assigned in this workflow, see above).
Rui: does this option change anything in your workflow as well?
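
For illustration only, this is how I picture the role of tags: a term is its text plus its tags, so two terms with the same text can still count as different terms. The Python sketch below uses hypothetical names and made-up example tags; it is not the node's actual code, and it is only a guess at how equal column names could arise when columns are named by the text alone.

```python
from collections import defaultdict


def duplicate_column_names(terms):
    """Report column names that would occur more than once.

    terms: list of (text, tag) pairs that are distinct as pairs.
    Assumption for this sketch: the column name is the text only.
    """
    positions = defaultdict(list)
    for pos, (text, _tag) in enumerate(terms):
        positions[text].append(pos)
    # Keep only names that appear at more than one position.
    return {name: pos for name, pos in positions.items() if len(pos) > 1}


# Hypothetical terms: same text, different tags.
terms = [("measur", "NN"), ("text", "NN"), ("measur", "VB")]
dups = duplicate_column_names(terms)
```

In this toy example the three terms are all different, yet "measur" would be used as a column name at positions 0 and 2.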

Hi Tim,

thank you for pointing this out. Rui, can you confirm that the option "ignore tags" solves the problem?

I will have a look at the implementation.

edit: Which KNIME version are you using?

Cheers, Kilian

Hi KNIME team,

I just started using KNIME, version 2.12.1 (Windows), and was building a simple workflow based on the “Text Mining” webinar on the KnimeTV channel on YouTube. When I execute a ‘Document Vector’ node with ‘Ignore tags’ and ‘Bitvector’ checked and ‘As collection cell’ unchecked, I get the following error. I searched your forums but couldn’t find a proper resolution for it. Can you please help me with this?

ERROR Document vector      0:52       Execute failed: Duplicate column name "proprietary" at positions 1984 and 3541.

Could you please share your workflow, including data, so that I can reproduce the problem?

Cheers, Kilian

Hi Kilian,

I was able to use the Document Vector node without errors. The problem was in my filters: I had set one of them with deep preprocessing and append to both as 'Document'. When I set the append to... 'Original Document', the problem vanished.

Thanks,

Sudha

Hello kilian.thiel,

I am working on text mining with 50 000 rows. I want to create document vectors, but there is a problem:

duplicate columns are in position at...

How can I solve this problem? Thanks for your help.

Soner

Hi Soner,

please make sure that the terms are properly filtered before creating the document vectors. See my last post in this thread: https://tech.knime.org/forum/knime-textprocessing/there-is-not-enough-space-on-the-disk

Did you check the checkbox "Ignore tags"?
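
As a rough idea of what filtering terms beforehand can look like, here is a minimal Python sketch of a document-frequency filter. This is only one common approach and an assumption on my side for illustration; the function name, the threshold and the sample data are hypothetical, and it is not KNIME code.

```python
from collections import defaultdict


def filter_by_document_frequency(bow, min_docs=2):
    """Keep only terms that occur in at least min_docs documents.

    bow: list of (document_id, term) pairs.
    min_docs: hypothetical threshold, to be tuned for the data at hand.
    """
    docs_per_term = defaultdict(set)
    for doc, term in bow:
        docs_per_term[term].add(doc)
    keep = {t for t, docs in docs_per_term.items() if len(docs) >= min_docs}
    # Preserve the original row order, dropping rows with rare terms.
    return [(doc, term) for doc, term in bow if term in keep]


# Hypothetical sample data.
bow = [("d1", "suspend"), ("d2", "suspend"), ("d1", "rare")]
filtered = filter_by_document_frequency(bow)
```

With 50 000 rows, dropping rare terms like this reduces the number of columns the document vectors need.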

Cheers, Kilian