Trouble with reading subjectivity corpus

Hi,

I'm just experimenting with the text processing extension. I am trying to read the subjectivity corpus (subjclueslen1-HLTEMNLP05.tff) using the File Reader node, but I'm not able to read the file properly into columns.

i.e. the data is being read as follows:

Col0                       Col1 ..........................

type=weaksubj      len=1 .............................

But I guess the correct result should look like this:

type                        len ..........................

weaksubj                1  ..................................

 

Can anyone please point out what I am missing here?

 

Thanks!

Hi,

Unfortunately, the subjectivity corpus format is not a real CSV format. Each cell is read as a "key=value" string, so you have to use the String Manipulation node or one (or many) Java Snippet nodes to get rid of the "type=", "len=", ... substrings in order to get the values.
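To illustrate the idea outside of KNIME, here is a minimal Python sketch of the same clean-up. It is not what the workflow nodes do internally; the function name `parse_tff_line` is made up for this example, and the sample line only uses the two fields shown in the question.

```python
def parse_tff_line(line):
    """Split one 'key=value key=value ...' line into a dict of column -> value."""
    record = {}
    for token in line.split():
        # partition() splits at the first '=' only, so values containing '=' stay intact
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens that contain no '=' at all
            record[key] = value
    return record


# Sample line restricted to the fields shown in the question above
sample = "type=weaksubj len=1"
row = parse_tff_line(sample)
print(row["type"], row["len"])
```

The keys become the column names and the values become the cell contents, which is the table layout asked for in the question.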

Attached you will find a workflow that does this (the configured nodes have been taken from the Social Media example workflow on the web).

When you configure the File Reader node of the attached workflow to point to the local location of your corpus file, make sure to check the "Preserve user settings for new location" checkbox.

Cheers, Kilian

I am facing the same problem when loading a dictionary as CSV. Please help me if you have solved it.

Hi Quratulain,

What kind of dictionary do you want to read? Is your dictionary CSV formatted? In general, the File Reader node is more powerful than the CSV Reader node, so I suggest using that node. Your dictionary file needs to have one or more columns separated by a certain character, e.g. "," or ";".
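Just to show what I mean by "columns separated by a certain character", here is a small Python sketch (outside of KNIME, with made-up sample entries and a hypothetical helper name `read_dictionary`) that reads such a file with ";" as the separator:

```python
import csv
import io


def read_dictionary(text, delimiter=";"):
    """Parse delimiter-separated dictionary text into a list of rows."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    # Skip completely empty lines, keep everything else as a list of columns
    return [row for row in reader if row]


# Made-up two-column dictionary: term;label
sample = "good;positive\nbad;negative\n"
for term, label in read_dictionary(sample):
    print(term, label)
```

If your file parses cleanly like this with one fixed separator, the File Reader node should be able to read it with the same separator set as column delimiter.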

If you have problems reading the dictionary with the File Reader node, you can send me the file and I will try to build a workflow for you.

Cheers, Kilian