I have a text mining workflow with two Flat File Document Parser nodes->Concatenate->BoW Creator->Punctuation Erasure->Number Filter->Case Converter->Stop Word Filter->Porter Stemmer->N Chars Filter->Document Vector.
With some documents the workflow runs fine, but with one document the Document Vector node fails with the error: Duplicate column name "suspend" at positions...
I don't understand why this happens, because Document Vector should create one column per distinct term.
What should I do to solve this?
Thanks for the help.
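For readers unfamiliar with the idea, here is a minimal Python sketch of what a document vector is expected to produce: one column per distinct term and one row per document, with 1 where the document contains the term. This is a hypothetical illustration of the concept, not KNIME's implementation; the function name `document_vectors` and the sample terms are made up.

```python
# Hypothetical illustration: each document is a list of already-preprocessed terms.
def document_vectors(docs):
    # dict.fromkeys keeps first-seen order and drops repeats,
    # so each distinct term becomes exactly one column.
    columns = list(dict.fromkeys(term for doc in docs for term in doc))
    rows = []
    for doc in docs:
        present = set(doc)
        # Bit vector: 1 if the document contains the term, else 0.
        rows.append([1 if term in present else 0 for term in columns])
    return columns, rows

docs = [["suspend", "account"], ["suspend", "user"], ["user"]]
columns, rows = document_vectors(docs)
print(columns)  # ['suspend', 'account', 'user']
print(rows)     # [[1, 1, 0], [1, 0, 1], [0, 0, 1]]
```

Because the column list is deduplicated before any rows are built, a term like "suspend" that occurs in several documents still maps to a single column.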
Yes, you are right. The node should create columns only for distinct terms. Which KNIME Textprocessing version are you using? Could you please post/attach a small workflow with a little data that reproduces the error, if possible, and/or post the full stack trace? Thank you.
Hi Rui, hi kilian,
I'm experiencing the same problem. I used the workflow you provided here in the forum for a quick similarity test with a custom set of documents, and I'm observing the same error (in my case: Duplicate column name "measur" at positions 545 and 897).
Is there any further information I can provide to help (apart from the texts themselves)?
Edit: If "Ignore tags" is unchecked in the Document Vector node, the node runs perfectly fine (although no tags are assigned in this workflow; see above).
Rui: does this option change anything in your workflow as well?
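One plausible way this class of error can arise (an assumption; the thread does not establish the actual cause inside KNIME) is that terms are deduplicated on their full identity, for example text plus tags or other per-term metadata, while the column name is derived from the text alone when tags are ignored. Two terms that print the same but compare unequal then yield two columns with the same name. The sketch below models this in Python; `Term`, `column_key`, and the `text[tags]` naming scheme are hypothetical and not KNIME's API or its real column names.

```python
from collections import namedtuple

# Hypothetical model: a term is its text plus a tuple of tags (e.g. POS tags).
Term = namedtuple("Term", ["text", "tags"])

def column_key(term, ignore_tags):
    # Hypothetical naming scheme; real KNIME column names may differ.
    return term.text if ignore_tags else f"{term.text}{list(term.tags)}"

def column_names_buggy(terms, ignore_tags):
    # Deduplicates on the full (text, tags) term ...
    unique_terms = list(dict.fromkeys(terms))
    # ... but names columns by text alone when tags are ignored,
    # so terms that differ only in tags produce the same name.
    return [column_key(t, ignore_tags) for t in unique_terms]

def column_names_fixed(terms, ignore_tags):
    # Derive the column key first, then deduplicate on that same key.
    return list(dict.fromkeys(column_key(t, ignore_tags) for t in terms))

terms = [Term("measur", ()), Term("measur", ("NN",)), Term("suspend", ())]
print(column_names_buggy(terms, ignore_tags=True))   # ['measur', 'measur', 'suspend']
print(column_names_fixed(terms, ignore_tags=True))   # ['measur', 'suspend']
print(column_names_fixed(terms, ignore_tags=False))  # ['measur[]', "measur['NN']", 'suspend[]']
```

In this model, unchecking "Ignore tags" keeps the names distinct, which would be consistent with the observation above; the general lesson is to deduplicate on the same key that is used to build the column name.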
Thank you for pointing this out. Rui, can you confirm that unchecking the "Ignore tags" option solves the problem?
I will have a look at the implementation.
Edit: Which KNIME version are you using?
Hi KNIME team,
I just started using KNIME, version 2.12.1 (Windows), and was building a simple workflow based on the "Text Mining" webinar on the KNIMETV channel on YouTube. When I executed a Document Vector node with "Ignore tags" and "Bitvector" checked and "As collection cell" unchecked, I got the following error. I searched your forums but couldn't find a proper resolution. Can you please help me with this?
ERROR Document vector 0:52 Execute failed: Duplicate column name "proprietary" at positions 1984 and 3541.
Could you please share your workflow, including data, so that I can reproduce the problem?
I am now able to use the Document Vector node without errors. The problem was in my filter nodes: I had set one of them to deep preprocessing with the append option set to 'Document' for both; when I changed the append setting to 'Original Document', the problem vanished.
I am working on text mining with 50,000 rows. I want to create document vectors, but I get an error saying there are duplicate columns at positions....
How can I solve this problem? Thanks for your help.
Please make sure that the terms are properly filtered before creating the document vectors. See my last post in this thread: https://tech.knime.org/forum/knime-textprocessing/there-is-not-enough-space-on-the-disk
Did you check the checkbox "Ignore tags"?