Classifying new data based on old data

Hi Team,

I have two sets of data.
Classified data: around 800 rows that have already been classified.
Each row has been assigned to an account type based on its sign (positive or negative) and its description.


This is the data that needs to be classified using the same logic. I have around 200 rows. I don’t need 100% of the data to be classified; even if 40% of it is classified, I will consider that a win.

Data to be classified

Similarly, I have around 100 more files on which I need to do the same exercise, but the classifications vary industry by industry and can also vary client by client.

As I am a noob in machine learning and AI, I am not sure where to start from. Can someone tell me what I need to look at so that I can start working on this workflow?

Any references to books will also be appreciated.

I understand this problem can be solved by creating a list of classification keywords, searching for them in each line item, and classifying on the matches. But as I want to learn more about machine learning and artificial intelligence, I want to use these to solve this problem.
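For reference, the keyword approach mentioned above could look roughly like the sketch below. It is only an illustration: the rule table, the account type names, and the function `classify_by_keyword` are hypothetical, and a zero amount is assumed to count as positive. The real rules would have to be derived from the 800 classified rows.

```python
import re

# Hypothetical rules: (keyword, sign) -> account type.
# These names are made up for illustration, not taken from the real data.
RULES = {
    ("rent", "negative"): "Rent Expense",
    ("rent", "positive"): "Rental Income",
    ("printer", "negative"): "Office Equipment",
    ("interest", "positive"): "Interest Income",
}

def classify_by_keyword(description, amount):
    """Return the account type of the first matching rule, or None."""
    # Assumption: an amount of zero is treated as positive.
    sign = "negative" if amount < 0 else "positive"
    # Keep alphabetic words only, so dates and reference numbers are ignored.
    for word in re.findall(r"[a-z]+", description.lower()):
        account_type = RULES.get((word, sign))
        if account_type is not None:
            return account_type
    return None  # no rule matched, so the row stays unclassified

rows = [
    ("Office RENT March 2023", -1200.0),
    ("Rent received unit 4", 800.0),
    ("Misc adjustment", -15.0),
]
for description, amount in rows:
    print(description, "->", classify_by_keyword(description, amount))
```

Rows that match no rule come back as `None`, which fits the goal of classifying only part of the data rather than forcing a label onto every line.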

My background
I am an accountant. I know most of the data wrangling and dashboard stuff, but I don’t have much knowledge of machine learning and AI. I want to learn this so that I can further automate some of the boring tasks.

@Ankit_smart a few points about your case and some general remarks.

This seems to be a text analytics and classification question. I have a small collection of articles and examples:

Maybe you could start with the [L4-TP] Introduction to Text Processing course.

In general, in a case like this you might have to strip the numbers out of the descriptions and then define special words that carry meaning, like “printer” or “storage” (or let an algorithm find them). And if your classification differs between clients, you might have to construct several models (or one model per industry).
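To make the "let an algorithm find them" idea concrete, here is a minimal sketch in plain Python, outside of KNIME. It assumes a simple multinomial Naive Bayes model with Laplace smoothing, which is only one of many possible techniques. The class `NaiveBayesTextClassifier`, the training rows, and the account type names are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(description):
    """Lowercase the text and keep alphabetic words only, which drops numbers."""
    return re.findall(r"[a-z]+", description.lower())

class NaiveBayesTextClassifier:
    """Hypothetical minimal multinomial Naive Bayes over description words."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.label_counts = Counter()            # label -> number of training rows
        self.vocabulary = set()

    def fit(self, descriptions, labels):
        for text, label in zip(descriptions, labels):
            words = tokenize(text)
            self.word_counts[label].update(words)
            self.label_counts[label] += 1
            self.vocabulary.update(words)
        return self

    def predict(self, description):
        # Ignore words never seen in training.
        words = [w for w in tokenize(description) if w in self.vocabulary]
        if not words:
            return None  # nothing recognisable, leave the row unclassified
        total_rows = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, rows in self.label_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(rows / total_rows)  # log prior of the label
            for w in words:
                # Laplace smoothing avoids log(0) for words unseen under this label.
                count = self.word_counts[label][w] + 1
                score += math.log(count / (total_words + len(self.vocabulary)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Made-up training rows standing in for the already classified data.
train_texts = ["Printer toner 4500", "HP printer purchase",
               "Cloud storage fee 2023", "Storage subscription"]
train_labels = ["Office Supplies", "Office Supplies", "IT Costs", "IT Costs"]

model = NaiveBayesTextClassifier().fit(train_texts, train_labels)
print(model.predict("New printer 99"))
print(model.predict("Storage 12 GB"))
print(model.predict("12345"))
```

If the classification differs by industry or client, one such model could be fitted per industry and kept in a dictionary keyed by industry name, so each file is scored by the model trained on matching data.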

If you could provide some sample data (without spilling any secrets), someone from the forum might be able to explore further.

A final workflow for labeling might look something like this, though you will have to make some adaptations:

Also, there are some books to be explored: From Data Collection to Text Mining and Interpretation | KNIME

If you want to learn about KNIME and machine learning in general, there are a lot of resources:

Thanks a lot for your response. I will go through these.
