BBC Documents classification with BERT extension

This workflow demonstrates how to conduct multiclass classification using the Redfield BERT Nodes. After 2 epochs of training, the classifier should reach more than 54% test accuracy without fine-tuning and more than 97% test accuracy with fine-tuning. Increasing the number of training epochs can improve performance significantly.

The BBC Full Text Document Classification data set used here consists of 2225 documents in 5 categories and is taken from D. Greene and P. Cunningham, "Practical Solutions to the Problem of Diagonal Dominance in Kernel Document Clustering", Proc. ICML 2006. It can be found on Kaggle: https://www.kaggle.com/shivamkushwaha/bbc-full-text-document-classification

If you wish to track your training progress, go to File -> Preferences -> KNIME -> KNIME GUI and set the console log level to Info. You can then monitor the status of the training in the console view (typically at the bottom right of the KNIME workbench).

Required Python packages (they need to be available in your TensorFlow 2 Python environment; a version-check sketch follows the list):

bert==2.2.0
bert-for-tf2==0.14.4
Keras-Preprocessing==1.1.2
numpy==1.19.1
pandas==0.23.4
pyarrow==0.11.1
tensorboard==2.2.2
tensorboard-plugin-wit==1.7.0
tensorflow==2.2.0
tensorflow-estimator==2.2.0
tensorflow-hub==0.8.0
tokenizers==0.7.0
tqdm==4.48.0
transformers==3.0.2
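To confirm that the Python environment KNIME points to actually provides these versions, you can run a quick check inside that environment. This is a minimal sketch, assuming setuptools (pkg_resources) is installed; the package names simply mirror the list above.

```python
# Minimal sketch: check that the packages listed above are installed (with the
# expected versions) in the Python environment used by KNIME's TF2 integration.
# Assumes setuptools (pkg_resources) is available.
import pkg_resources

REQUIRED = {
    "bert": "2.2.0",
    "bert-for-tf2": "0.14.4",
    "Keras-Preprocessing": "1.1.2",
    "numpy": "1.19.1",
    "pandas": "0.23.4",
    "pyarrow": "0.11.1",
    "tensorboard": "2.2.2",
    "tensorboard-plugin-wit": "1.7.0",
    "tensorflow": "2.2.0",
    "tensorflow-estimator": "2.2.0",
    "tensorflow-hub": "0.8.0",
    "tokenizers": "0.7.0",
    "tqdm": "4.48.0",
    "transformers": "3.0.2",
}

for name, expected in REQUIRED.items():
    try:
        installed = pkg_resources.get_distribution(name).version
        status = "OK" if installed == expected else f"version mismatch (found {installed})"
    except pkg_resources.DistributionNotFound:
        status = "missing"
    print(f"{name}=={expected}: {status}")
```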

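For readers curious what the fine-tuning step performed by the BERT nodes roughly corresponds to in plain Python, here is a minimal sketch using the pinned transformers and TensorFlow versions. It is not the workflow's actual implementation: the model name (bert-base-uncased), sequence length, learning rate, and batch size are illustrative assumptions, and `texts`/`labels` are placeholders for the BBC documents and their 5 integer category labels.

```python
# Illustrative sketch only (not the Redfield BERT nodes' internals):
# fine-tune a BERT classifier on 5 categories with transformers + TF2.
import tensorflow as tf
from transformers import BertTokenizer, TFBertForSequenceClassification

# Placeholders for the BBC documents and their category indices (0..4).
texts = ["Sample business article text ...", "Sample sport article text ..."]
labels = [0, 1]

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
encodings = tokenizer(texts, truncation=True, padding=True, max_length=128,
                      return_tensors="tf")

model = TFBertForSequenceClassification.from_pretrained("bert-base-uncased",
                                                        num_labels=5)
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)

# 2 epochs, mirroring the workflow description; more epochs usually help.
train_dataset = tf.data.Dataset.from_tensor_slices((dict(encodings), labels)).batch(8)
model.fit(train_dataset, epochs=2)
```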

This is a companion discussion topic for the original entry at https://kni.me/w/5Tkx-hpCjjXVPYP3