Final Graduate Project
Leveraging Human Feedback for Improved Classification Performance
Introduction
Using pre-trained machine learning models in robotics can be advantageous when the computing resources are limited or the task is stationary.
However, a pre-trained model may not perform optimally due to factors such as the distribution shift between training and deployment settings, and the complexity of real-world data compared to curated training datasets and benchmarks.
For instance, we trained a classification model on the MNIST dataset (for Exercise 4), and we observed that although it performed well on the MNIST training and validation sets, it performed poorly when used to detect digits in Duckietown.
In this report, we propose a solution to this problem that leverages human feedback to collect new data from the real world and improve the performance of the machine learning model.
We aim to update the classifier's knowledge and improve its accuracy so that the robot can learn offline from data collected in the real world and keep improving in a changing environment.
The idea, in summary, is as follows:
The Duckiebot, equipped with a pre-trained classification model for digit recognition, moves through Duckietown, which contains the digit labels, while a human observer monitors the classification results.
Each time the model misclassifies a digit, the human observer provides feedback through a keyboard input to the running program. This triggers the Duckiebot to stop, collect new images of the digit from different angles, and then resume its track.
We address the issue of limited data and aim to partially automate the process of collecting new data for retraining the pre-trained classifier and improving its performance.
Problem Statement
We address the poor performance of a classifier pre-trained on the MNIST dataset when it is deployed on the Duckiebot in the real world.
Due to differences in resolution, camera angle, lighting conditions, and other factors, the images collected by the Duckiebot can be hard to classify with a model trained solely on MNIST.
Proposed Solution
The proposed solution leverages human feedback to collect more data in the real world and later retrain the model on it.
The Duckiebot moves through Duckietown, which has digit labels on the road. A human observer monitors the classification results in real time; when the pre-trained classifier makes an incorrect classification, the observer presses a key on the keyboard to indicate the mistake. This feedback triggers the Duckiebot to pause its movement, rotate slowly back and forth, and collect new images of the digit from slightly different angles using its camera. After a short period, the Duckiebot resumes its preset track around the room.
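The feedback-triggered behavior switch above can be sketched as a small state machine. This is an illustrative sketch with hypothetical names, not our exact ROS node: the robot follows the lane until the observer's keypress flips it into a sweep state for a fixed number of control cycles, after which it returns to lane following.

```python
class FeedbackStateMachine:
    """Illustrative sketch of the Duckiebot's behavior switch.

    States:
      FOLLOW - normal PID lane following
      SWEEP  - pause and rotate back and forth to collect images
    """
    FOLLOW, SWEEP = "follow", "sweep"

    def __init__(self, sweep_steps=20):
        self.state = self.FOLLOW
        self.sweep_steps = sweep_steps  # control cycles spent sweeping
        self._remaining = 0

    def on_feedback(self):
        # The human pressed the "wrong classification" key.
        self.state = self.SWEEP
        self._remaining = self.sweep_steps

    def step(self):
        # Called once per control cycle; returns the behavior to run now.
        current = self.state
        if self.state == self.SWEEP:
            self._remaining -= 1
            if self._remaining <= 0:
                self.state = self.FOLLOW
        return current
```

In the real system, `on_feedback` would be invoked from the keyboard-input callback and `step` from the main control loop.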
Implementation
To implement this solution, we take the following steps:
Pre-training the Classifier
We define and train a convolutional neural network (CNN) classifier on the MNIST dataset using PyTorch. This network serves as the pre-trained classifier for the Duckiebot. (We used the implementation in exercise 5 from this repository.)
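A minimal CNN of the kind described above looks as follows. This is a sketch for 28x28 grayscale MNIST inputs; the exact architecture in the exercise-5 repository may differ.

```python
import torch
import torch.nn as nn

class DigitCNN(nn.Module):
    """Small CNN for 28x28 grayscale digit images (illustrative sketch)."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),  # 28x28 -> 14x14
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),  # 14x14 -> 7x7
        )
        self.classifier = nn.Linear(32 * 7 * 7, num_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))
```

Training on MNIST then proceeds with a standard cross-entropy loss and optimizer loop.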
Lane Following:
The Duckiebot performs lane following autonomously using a PID controller designed in previous exercises. In the general case, the robot follows the lane under PID control, but when human feedback indicates that the classifier was wrong, it triggers a hardcoded behavior: the Duckiebot pauses, turns slightly left and right to capture images from different angles, and saves the collected images in a rosbag for later training. Once the data is collected, it returns to PID lane following.
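The lane-following control law can be sketched as a discrete PID controller on the lateral offset from the lane center. The gains below are illustrative, not the tuned values from the earlier exercise.

```python
class PID:
    """Minimal discrete PID controller (illustrative gains and units)."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = None

    def update(self, error):
        # error: lateral offset from the lane center (e.g., meters)
        self.integral += error * self.dt
        derivative = (0.0 if self.prev_error is None
                      else (error - self.prev_error) / self.dt)
        self.prev_error = error
        # Returned value is the steering (angular velocity) command.
        return (self.kp * error
                + self.ki * self.integral
                + self.kd * derivative)
```

Each control cycle, the perception stack estimates the lane offset, `update` computes the steering command, and the command is published to the wheels; the feedback trigger simply suspends this loop while the sweep behavior runs.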
Data Collection:
When the Duckiebot receives feedback from the human observer, it pauses its movement and collects new images of the digit from slightly different angles by rotating slowly back and forth. We save the images in a rosbag for later use.
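The back-and-forth sweep can be described by a sequence of target heading offsets. The helper below is a hypothetical sketch: it enumerates the offsets (in radians) the robot steps through, sweeping left to a maximum angle, back through center, and right, repeated a few times; on the real robot these would be turned into wheel-velocity commands, with one camera frame saved to the rosbag at each offset.

```python
def sweep_angles(n_steps=3, step=0.1, cycles=2):
    """Heading offsets (radians) for a back-and-forth sweep:
    0 -> +n_steps*step -> 0 -> -n_steps*step -> 0, repeated `cycles`
    times. Values are illustrative."""
    half = [i * step for i in range(1, n_steps + 1)]
    one_cycle = (half + half[::-1]
                 + [-a for a in half] + [-a for a in half[::-1]])
    return one_cycle * cycles
```

Iterating over `sweep_angles()` and capturing an image at each offset yields frames of the same digit from slightly different viewpoints, which is exactly the variation missing from MNIST.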
Retraining the Classifier:
After collecting a sufficient amount of new data, we retrain the pre-trained classifier on the collected images. We do this step offline, after driving the Duckiebot around the town, because the Duckiebot does not have sufficient computing resources to train the model onboard. Alternatively, we could have defined a package on the local computer to train the model online and sent the updated model to the Duckiebot through publishers and subscribers, but we chose the offline strategy for its convenience.
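The offline retraining step can be sketched as a short fine-tuning loop. This is an assumption-laden sketch: `images` stands for a tensor of frames extracted from the rosbag and preprocessed to MNIST format (grayscale, 28x28), and `labels` for the human-corrected digits; a small learning rate is used so the original MNIST knowledge is not overwritten.

```python
import torch
import torch.nn as nn

def finetune(model, images, labels, lr=1e-3, epochs=1):
    """Fine-tune a pre-trained classifier on newly collected data.

    images: float tensor (N, 1, 28, 28) built from rosbag frames
    labels: long tensor (N,) of human-corrected digit labels
    """
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        opt.step()
    return loss.item()
```

After fine-tuning, the updated weights are copied back onto the Duckiebot for the next run.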
Resuming the Track:
Once the classifier is retrained, the Duckiebot resumes its track around the room and continues the image classification task. After training on the rosbag data collected from Duckietown, the classification performance of the robot improved.
Conclusion
In conclusion, this project addresses the challenge of data collection in robotics and real-world environments, where variations in lighting conditions, camera angles, and other environmental factors make real-world data much more challenging than pre-defined datasets.
We propose using human feedback to change the normal behavior of the Duckiebot and prompt it to collect data whenever there is a sign of a drop in the accuracy of the image classifier.
This approach provides a solution to overcome the limitations of pre-trained classifiers when deployed in dynamic environments.
Resources
My code is in this repository. The baseline was taken from:
https://github.com/jihoonog/CMPUT-503-Exercise-5
Acknowledgments
I discussed with my supervisor, Adam White, how to define and implement a project related to my research interests, which include keeping neural networks plastic and learning stability. We came up with the idea together.