Hi,
Having looked at your data and at your original workflow, it is now clear to me that you are missing some crucial data-preparation steps for what you are trying to achieve.
You cannot simply submit a bunch of comma-separated keywords as strings to the learner/predictor and hope they will work correctly. The learner treats each whole string as a single feature with many possible values, and there are not enough variations or distinct features to base any discrimination on. It is therefore not surprising that your classifier (predictor) has low accuracy and behaves almost randomly: it doesn't have much to work with!
The right way to do it is to first give each keyword its own feature, then pass the resulting feature vector to the learner/predictor. Let me explain in detail with an example.
Assume you have only 3 articles, each with the following associated keywords and class:
1: AAA, BBB, CCC --> class 1
2: BBB, DDD --> class 2
3: AAA, BBB, EEE --> class 3
You have 5 possible distinct values for the keywords (AAA, BBB, CCC, DDD, EEE), so the feature vector has dimension 5 and looks like this for each article:
1: 1,1,1,0,0 --> class 1
2: 0,1,0,1,0 --> class 2
3: 1,1,0,0,1 --> class 3
where a 1 indicates the presence of that keyword in the article and a 0 its absence.
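To make the transformation concrete, here is a minimal sketch in plain Python (no external libraries; the article IDs and keyword sets are the ones from the example above) that turns each article's keyword set into a binary presence/absence vector:

```python
# Each article's set of keywords, taken from the example above.
articles = {
    1: {"AAA", "BBB", "CCC"},
    2: {"BBB", "DDD"},
    3: {"AAA", "BBB", "EEE"},
}

# Collect all distinct keywords and fix their order: one feature per keyword.
vocabulary = sorted(set().union(*articles.values()))
# vocabulary == ['AAA', 'BBB', 'CCC', 'DDD', 'EEE']

# Binary vector per article: 1 if the keyword is present, 0 if absent.
vectors = {
    art_id: [1 if kw in keywords else 0 for kw in vocabulary]
    for art_id, keywords in articles.items()
}

for art_id, vec in sorted(vectors.items()):
    print(art_id, vec)
# 1 [1, 1, 1, 0, 0]
# 2 [0, 1, 0, 1, 0]
# 3 [1, 1, 0, 0, 1]
```

The same transformation can of course be done inside your workflow with the tools you already use; the point is only the shape of the output.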
Now you can use the feature vector, together with the class, as input to a learner node. The node "learns" the association between specific feature-vector values and each class. You can then apply that classification model in a classifier (predictor) node to a new data set, whose articles' keywords are also expressed as feature vectors, to predict the class of each new article.
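Just to illustrate the learn/predict idea (this is NOT what the learner node does internally; it is a toy 1-nearest-neighbour model over the example vectors above, with a hypothetical new article as input):

```python
# Training data: the binary feature vectors and classes from the example.
training = [
    ([1, 1, 1, 0, 0], "class 1"),
    ([0, 1, 0, 1, 0], "class 2"),
    ([1, 1, 0, 0, 1], "class 3"),
]

def hamming(a, b):
    # Number of positions where the two binary vectors differ.
    return sum(x != y for x, y in zip(a, b))

def predict(vector):
    # Return the class of the closest training vector.
    return min(training, key=lambda item: hamming(vector, item[0]))[1]

# Hypothetical new article with keywords AAA and CCC -> vector [1, 0, 1, 0, 0]
print(predict([1, 0, 1, 0, 0]))
# -> class 1
```

Any real learner node will use a more sophisticated model, but the input/output contract is the same: feature vectors plus classes in, a model that maps new feature vectors to classes out.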
It is important to note that the feature vector has to be built on the whole data set (training + test set); otherwise it may be incomplete, because the test set can contain keywords that never appear in the training set.
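A small sketch of that pitfall, with a made-up keyword FFF that occurs only in the test set:

```python
# Training and test articles; FFF appears only in the test set.
train = {1: {"AAA", "BBB", "CCC"}, 2: {"BBB", "DDD"}}
test = {4: {"AAA", "FFF"}}

# Build the vocabulary on ALL data (training + test), not on train alone.
vocabulary = sorted(set().union(*train.values(), *test.values()))
# A vocabulary built from train alone would have no feature for 'FFF'.

def to_vector(keywords):
    return [1 if kw in keywords else 0 for kw in vocabulary]

print(vocabulary)          # includes 'FFF'
print(to_vector(test[4]))  # test article encoded over the full vocabulary
```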
With this in mind you should now know how to modify your workflow to get higher prediction accuracy. Even with such a limited data set you should be able to reach around 90%.
Feel free to post here again if you get stuck.
Cheers,
Marco.