Trouble with reading subjectivity corpus

Perceptive_Analytics · November 13, 2013, 9:59am

Hi,

I'm just experimenting with the text processing extension. I am trying to read the subjectivity corpus (subjclueslen1-HLTEMNLP05.tff) using the File Reader node, but the file is not being split into columns properly.

i.e. the data is being read as follows:

Col0 Col1 ..........................

type=weaksubj len=1 .............................

But I guess the right result would look like this:

type len ..........................

weaksubj 1 ..................................

Can anyone please point out what I am missing here?

Thanks!

kilian.thiel · November 15, 2013, 12:37pm

Hi,

Unfortunately, the subjectivity corpus format is not a real CSV format. You have to use the String Manipulation node or one (or more) Java Snippet nodes to strip the "type=", "len=", ... prefixes so that only the values remain.

Attached you will find a workflow that does this (the configured nodes were taken from the Social Media example workflow on the web).

When you configure the File Reader node of the attached workflow to point to your local copy of the corpus, make sure to check the "Preserve user settings for new location" checkbox.
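For anyone who wants to see the prefix-stripping idea outside KNIME, here is a minimal sketch in Python. It is not what the attached workflow does internally; it only illustrates the same step of turning "key=value" tokens into named columns. The function name parse_tff_line is made up for this example, and the sample line uses only the two fields quoted above; it assumes the remaining fields follow the same whitespace-separated key=value pattern.

```python
def parse_tff_line(line):
    """Split one 'key=value key=value ...' line into a dict of column -> value."""
    record = {}
    for token in line.split():
        # partition splits on the first '=' only, so values containing '=' survive
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens that contain no '=' at all
            record[key] = value
    return record


# Sample line built from the two fields shown in the question (assumption:
# real lines carry more key=value pairs in the same layout).
sample = "type=weaksubj len=1"
row = parse_tff_line(sample)
print(row["type"], row["len"])
```

All values stay strings here; converting "len" to a number would be a separate step.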

Cheers, Kilian

subjectivitycorpus.zip

Quratulain · June 3, 2014, 9:30am

I am facing the same problem when loading a dictionary as CSV. Please help me if you solved it.

kilian.thiel · June 10, 2014, 2:50pm

Hi Quratulain,

What kind of dictionary do you want to read? Is your dictionary CSV formatted? In general, the File Reader node is more powerful than the CSV Reader node, so I suggest using the File Reader node. Your dictionary file needs to have one or more columns separated by a fixed character, e.g. "," or ";".

If you have problems reading the dictionary with the File Reader node, you can send it to me and I will try to build a workflow for you.
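To check quickly whether a dictionary file really is delimiter-separated before loading it into KNIME, a small Python sketch like the following can help. The function name read_dictionary and the two sample entries are invented for illustration; the ";" delimiter is just one of the two examples mentioned above.

```python
import csv
import io


def read_dictionary(text, delimiter=";"):
    """Parse delimiter-separated text into a list of rows, skipping blank lines."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [row for row in reader if row]


# Hypothetical two-column dictionary: term and label.
sample = "good;positive\nbad;negative\n"
for row in read_dictionary(sample):
    print(row)
```

If every row comes back with the same number of fields, the file should also split cleanly into columns in a reader node configured with the same delimiter.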

Cheers, Kilian
