I am working on text classification. I have data in an Excel file, and each row contains multiple sentences. I want to classify the sentences into three types: assertive, directive, and declaration. I have trained the model, but now I want to know how many sentences of each type are in each row/document.
My workflow is below.
Kindly suggest a good approach.
Hi Majid and welcome to the KNIME community forum,
I am not sure exactly what you have done, but regarding your last question:
You need an ID for each row before splitting it into sentences, for example by using a RowID node and keeping the new column. At the end of the workflow, you can use a GroupBy node to aggregate on this ID column and count the values. It is usually also possible to modify the current row IDs to reproduce the previous IDs.
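For readers who think better in code, here is the same idea sketched in pandas rather than KNIME. It is only an illustration: the sample texts, the naive regex sentence split (a stand-in for the Sentence Extractor), and the hard-coded labels (a stand-in for the trained model's predictions) are all hypothetical. The key point is that `doc_id` is created before the split, so every sentence still knows which document it came from, and the final group-and-count mirrors the GroupBy node.

```python
import pandas as pd

# Hypothetical input: one row per document, as read from the Excel file.
docs = pd.DataFrame({
    "text": [
        "Close the door. The sky is blue.",
        "I hereby resign. Please sign here. It is raining.",
    ]
})

# Like the RowID node: keep an explicit document ID *before* splitting.
docs["doc_id"] = docs.index

# Stand-in for the Sentence Extractor: naive split after each period.
sentences = (
    docs.assign(sentence=docs["text"].str.split(r"(?<=\.)\s+"))
    .explode("sentence")[["doc_id", "sentence"]]
    .reset_index(drop=True)
)

# Hypothetical model predictions, one label per sentence (same order).
sentences["label"] = ["directive", "assertive", "declaration", "directive", "assertive"]

# Like the GroupBy node: group by document ID and label, then count rows.
counts = sentences.groupby(["doc_id", "label"]).size().reset_index(name="count")
print(counts)
```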
If you need more detailed help, I suggest you provide (upload) your workflow including some sample data.
Thank you for your help.
Each row of the Excel file contains multiple sentences (like an article or a tweet). I split them into sentences using the “Sentence Extractor” node. I built my model and checked the accuracy. But now I want to know how many sentences in each article/tweet are assertive, how many are directive, and how many are declarations.
I’m sharing my workflow. Kindly have a look at it: model on sample data(Text Classification).knwf (52.6 KB)
Unfortunately the workflow you have shared contains no data.
To export the workflow with data, execute the workflow and save it, then export it with the option “Reset workflow before export” unchecked.
Sorry. The workflow can be found here. https://mega.nz/#!Wy5zxSRB!Sh6S8yQPK9ssudWRwLfTxJ4V1ZGfVtTY7jkIM3eWnHE
The data is labeled with five different categories. I wanted to train the model on this data and then run it on the articles in the Excel file. To do that, I need to extract sentences from the articles. I have extracted the sentences and run the model on them. Now I want to find how many sentences in each article (for example, article one in row 1 of the Excel file) belong to category 1, category 2, category 3, etc.
First of all, the Decision Tree Learner needs nominal or numerical data, and your input has neither, so it does not work.
Since you are trying to classify text, have you tried Text Classifier Learner and Text Classifier Predictor?
As for counting, see my first reply in this topic.
Thank you very much for your solution.
However, I couldn’t understand your first reply to the post. Could you please provide me a sample workflow? I would be very grateful to you.
Your initial dataset in the workflow consists of single sentences with the corresponding label in each row.
So, if you want to classify the sentences, you can use the initial table as the input for the Text Classifier nodes.
The Text Classifier nodes are part of the Palladian nodes, so @qqilihq may be able to help more here.
I’m not sure what assertive, declarative, etc. actually are. However, this reads to me like a sentiment analysis application.
I have tried the RowID and GroupBy nodes, but I couldn’t make it work.
The workflow can be downloaded from here. There are two workflows: one for modeling and one for applying the model to unlabeled data.
Any help would be appreciated. I have been working on this for the last two weeks.
@armingrudd, @Geo, and @kilian.thiel I would really appreciate it if you could help me out.
I didn’t go into the details, but if I understood you correctly and you have an identifier telling which paragraph each sentence belongs to, you can try the Pivoting node. That should do the trick.
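As a rough analogy (pandas, not KNIME), the Pivoting node with a count aggregation turns a per-sentence table into one row per paragraph and one column per label. The table below is hypothetical; it assumes each sentence already carries a `paragraph_id` and a predicted `label`.

```python
import pandas as pd

# Hypothetical per-sentence table: paragraph identifier + predicted label.
predictions = pd.DataFrame({
    "paragraph_id": [0, 0, 1, 1, 1],
    "label": ["directive", "assertive", "declaration", "directive", "assertive"],
})

# Like the Pivoting node with a count: paragraph IDs become rows, labels
# become columns, and each cell is the number of sentences with that label
# (missing combinations are filled with 0).
pivot = pd.crosstab(predictions["paragraph_id"], predictions["label"])
print(pivot)
```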
Thanks @ipazin. However, what criterion should I use to put the extracted sentences back into their original paragraphs? That is, I have a column of sentences that I extracted from some paragraphs; how do I join them back to the paragraphs they came from?
You don’t need to. In KNIME you can work with parallel flows instead of sequential flows. In other words, with the identifier you can join back and forth.
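A minimal pandas sketch of the "parallel flows" idea, assuming both tables share a hypothetical `doc_id` column: the sentences are never concatenated back into paragraphs; instead, the per-sentence table is joined to the original documents on the shared ID (roughly what a Joiner node does in KNIME).

```python
import pandas as pd

# Flow 1 (hypothetical): the original documents, each with an ID.
documents = pd.DataFrame({
    "doc_id": [0, 1],
    "text": [
        "Close the door. The sky is blue.",
        "I hereby resign. Please sign here. It is raining.",
    ],
})

# Flow 2 (hypothetical): per-sentence predictions that kept the same ID.
predictions = pd.DataFrame({
    "doc_id": [0, 0, 1, 1, 1],
    "label": ["directive", "assertive", "declaration", "directive", "assertive"],
})

# Join the flows on the shared ID: every sentence row gets its document's
# text back. validate= raises if a doc_id appears twice in `documents`.
joined = predictions.merge(documents, on="doc_id", how="left", validate="many_to_one")
print(joined)
```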
Thanks Geo, but I have tried everything and couldn’t make it work. Could you please solve this in the workflow I’ve provided above? I would be really glad.
I’m not really into text processing, and I’m not sure what you are doing or what a paragraph, a sentence, and a document are in your case.
Anyway, try joining your predictions (9509 rows) with the output of the Sentence Extractor node, for example, because both have the same number of rows. From there, define your paragraphs (if I got this right, you have 500 paragraphs) by extracting the first word from each document (better, the first few words, to make sure the key is unique). Then the Pivoting node will give you the number of sentences per paragraph.
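The workaround above can be sketched in pandas as follows. Assumptions, all hypothetical: the sentence-extraction output still carries the full source document text in a `document` column, the predictions table has the same rows in the same order, and `N_WORDS` is an arbitrary choice for how many leading words form the paragraph key.

```python
import pandas as pd

# Hypothetical extractor output: one row per sentence plus its source document.
extracted = pd.DataFrame({
    "document": ["Close the door. The sky is blue."] * 2
    + ["I hereby resign. Please sign here. It is raining."] * 3,
    "sentence": ["Close the door.", "The sky is blue.",
                 "I hereby resign.", "Please sign here.", "It is raining."],
})
# Hypothetical predictions: same number of rows, same order.
predictions = pd.DataFrame({
    "label": ["directive", "assertive", "declaration", "directive", "assertive"],
})

# Same row count, so join row by row (like joining on RowID in KNIME).
assert len(extracted) == len(predictions)
combined = pd.concat(
    [extracted.reset_index(drop=True), predictions.reset_index(drop=True)], axis=1
)

# Paragraph key from the first few words; more words make collisions less likely.
N_WORDS = 5
combined["paragraph_key"] = combined["document"].str.split().str[:N_WORDS].str.join(" ")

# Pivot: one row per paragraph key, one column per label, cells are counts.
per_paragraph = pd.crosstab(combined["paragraph_key"], combined["label"])
print(per_paragraph)
```

Deriving the key from leading words is fragile if two documents start the same way; an explicit ID created before the split avoids that problem.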