Document Vector Error

Rui · April 1, 2014, 2:43pm

Hi.

I have a text mining workflow with two Flat File Document Parser nodes -> Concatenate -> BoW Creator -> Punctuation Erasure -> Number Filter -> Case Converter -> Stop Word Filter -> Porter Stemmer -> N Chars Filter -> Document Vector.

For some documents the workflow runs fine, but for one document the Document Vector node fails to execute with the error: duplicate column name "suspend" at positions...

I don't understand why this happens, because Document Vector should create one column per distinct term.

What should I do to solve this?

Thanks for the help.

kilian.thiel · April 2, 2014, 8:54am

Hi Rui,

yes, you are right. The node should create columns only for different terms. Which KNIME Textprocessing version are you using? Could you please post/attach a small workflow with a little data that reproduces the error if possible, and/or post the full stack trace? Thank you.

Cheers, Kilian

TimB · July 10, 2014, 2:43pm

Hi Rui, hi kilian,

I'm experiencing the same problem. I used the workflow you provided here in the forum for a quick similarity trial with a custom set of documents, and I'm observing the same error (in my case: Duplicate column name "measur" at positions 545 and 897).

Is there any further information I can provide to help (apart from the texts themselves)?

edit: If "ignore tags" is unchecked in the Document Vector node, the node runs perfectly fine (although there are no tags assigned in this workflow, see above).
Rui: does this option change something in your workflow as well?
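
The observation above suggests that terms which are distinct internally can end up with the same column name once tags are ignored. The following is a minimal Python sketch of that idea only. It is not KNIME's implementation: the `(text, tag)` term model, the sample tags, and the helper names `column_names` and `duplicates` are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical term model: (text, tag). This is NOT the KNIME data model.
terms = [("measur", "NN"), ("suspend", "VB"), ("measur", "VB")]


def column_names(terms, ignore_tags):
    """Build one column name per distinct term, as a naive vectorizer might."""
    # Distinct (text, tag) pairs, keeping first-seen order.
    distinct = list(dict.fromkeys(terms))
    if ignore_tags:
        # The tag is dropped from the name, but the terms were
        # de-duplicated with the tag included, so names can collide.
        return [text for text, _tag in distinct]
    return [f"{text}[{tag}]" for text, tag in distinct]


def duplicates(names):
    """Return the sorted names that occur more than once."""
    return sorted(name for name, count in Counter(names).items() if count > 1)


with_tags = column_names(terms, ignore_tags=False)
without_tags = column_names(terms, ignore_tags=True)

print(duplicates(with_tags))
print(duplicates(without_tags))
```

In this sketch the collision only appears when the tag is left out of the column name while the terms are still treated as different. Whether that is what happens inside the Document Vector node is not confirmed in this thread.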

kilian.thiel · July 16, 2014, 7:10pm

Hi Tim,

thank you for pointing this out. Rui, can you confirm that unchecking the option "ignore tags" solves the problem?

I will have a look at the implementation.

edit: Which KNIME version are you using?

Cheers, Kilian

sballa · October 8, 2015, 3:31pm

Hi KNIME team,

I just started using KNIME, version 2.12.1 (Windows), and was building a simple workflow based on the “Text Mining” webinar on the KnimeTV channel on YouTube. When I execute a ‘Document Vector’ node with ‘Ignore tags’ and ‘Bitvector’ checked and ‘As collection cell’ unchecked, I get the following error. I searched the forums but couldn’t find a resolution for it. Could you please help me with this?

ERROR Document vector 0:52 Execute failed: Duplicate column name "proprietary" at positions 1984 and 3541.

kilian.thiel · October 26, 2015, 5:25pm

Could you please share your workflow, including data, with me so that I can reproduce the problem?

Cheers, Kilian

sballa · October 26, 2015, 10:32pm

Hi Kilian,

I was able to use the Document Vector node without errors. The problem was in my filters: in one of them I had set both the deep preprocessing and the append option to 'Document'. When I set the append to... 'Original Document', the problem vanished.

Thanks,

Sudha

soner_cakal · April 21, 2016, 8:36am

Hello kilian.thiel,

I am working on a text mining task with 50,000 rows. I want to create document vectors, but I run into a problem:

duplicate columns are in position at....

How can I solve this problem? Thanks for your help.

Soner

kilian.thiel · April 21, 2016, 1:26pm

Hi Soner,

please make sure that the terms are properly filtered before creating the document vectors. See my last post in this thread: https://tech.knime.org/forum/knime-textprocessing/there-is-not-enough-space-on-the-disk

Did you check the checkbox "Ignore tags"?
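
To illustrate why unique term strings matter before vectorizing, here is a small Python sketch that merges bag-of-words rows by term text, so that every column name occurs exactly once. It runs outside KNIME and makes no claim about how the Document Vector node works. The row layout `(doc_id, term_text, tag)` and the function name `document_vectors` are assumptions made for this example.

```python
from collections import defaultdict


def document_vectors(bow_rows):
    """Build count vectors from (doc_id, term_text, tag) rows.

    Rows are merged by term text alone, so two rows that differ only
    in their tag contribute to the same column.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for doc_id, text, _tag in bow_rows:
        counts[doc_id][text] += 1

    # One column per distinct term text, in sorted order.
    columns = sorted({text for per_doc in counts.values() for text in per_doc})
    vectors = {
        doc_id: [per_doc.get(col, 0) for col in columns]
        for doc_id, per_doc in counts.items()
    }
    return columns, vectors


# Hypothetical sample rows, not taken from any workflow in this thread.
rows = [
    ("d1", "measur", "NN"),
    ("d1", "measur", "VB"),
    ("d2", "suspend", "VB"),
]
columns, vectors = document_vectors(rows)
print(columns)
print(vectors)
```

Because the columns come from a set of term strings, a duplicate column name cannot occur in this sketch, whatever tags the input rows carry.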

Cheers, Kilian
