Crowdsourcing & Machine Learning


In this project, we investigate how adding subclass information affects the performance of traditional machine learning classifiers. To illustrate: some state-of-the-art ML algorithms, powered by deep learning and substantial hardware resources, can already classify images of cats and dogs with an accuracy of up to 94%. We aim to significantly reduce the size of the training dataset and the resources required, and to integrate subclass information for each image (for example, whether a cat is a tabby cat or an Egyptian cat) into the algorithm, while achieving the same classification accuracy.


I contributed extensively to the following two parts of this project: the development of the fast labeling system and the crowd pattern detection ability test.

Fast Labeling System

Image labeling has always been a challenging, time-consuming task that requires a significant amount of human effort. My expertise in system building and crowdsourcing came into play when we needed an efficient image labeling tool that gathers superclass and subclass information about the images in the dataset from the crowd, while also letting us evaluate the crowd workers' performance and fine-tune the system accordingly. Drawing on my previous research experience and technical skills, I iterated through several versions of the system and examined the effectiveness of each (quality of the subclasses the crowd provides, accuracy of the superclasses they provide, ease of use, time consumed, etc.) when tested with the crowd.
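
As a rough illustration of how two of these measures (superclass accuracy and time consumed) could be compared across system versions, here is a minimal Python sketch. It is not our actual analysis code: the record fields (`version`, `superclass`, `true_superclass`, `seconds`) and the toy records are hypothetical, chosen only for this example.

```python
from collections import defaultdict
from statistics import median


def summarize_versions(records):
    """Group label records by interface version and report, per version,
    the superclass accuracy and the median labeling time in seconds."""
    by_version = defaultdict(list)
    for record in records:
        by_version[record["version"]].append(record)

    summary = {}
    for version, rows in by_version.items():
        # A label counts as correct when the worker's superclass
        # matches the known ground-truth superclass.
        correct = sum(
            1 for r in rows if r["superclass"] == r["true_superclass"]
        )
        summary[version] = {
            "n_labels": len(rows),
            "superclass_accuracy": correct / len(rows),
            "median_seconds": median(r["seconds"] for r in rows),
        }
    return summary


# Hypothetical toy records, for illustration only.
records = [
    {"version": "v1", "superclass": "cat", "true_superclass": "cat", "seconds": 12.0},
    {"version": "v1", "superclass": "dog", "true_superclass": "cat", "seconds": 15.0},
    {"version": "v2", "superclass": "cat", "true_superclass": "cat", "seconds": 8.0},
    {"version": "v2", "superclass": "dog", "true_superclass": "dog", "seconds": 9.0},
]

summary = summarize_versions(records)
for version, stats in sorted(summary.items()):
    print(version, stats)
```

The median is used for time rather than the mean because a few workers who step away mid-task can otherwise dominate the average. Measures such as subclass quality and ease of use need human judgment and are not captured by a summary like this one.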

Below is our current labeling interface. It is still under active development as the project unfolds.


Crowd Pattern Detection Ability Test

The success of this project depends heavily on how well crowd workers can provide valuable subclass information. Therefore, we carefully designed and ran hundreds of MTurk experiments that tested the crowd workers' ability to recognize patterns in images (patterns that could be used as subclasses) and the quality of the subclasses they provide.

Below are the instructions we gave to the crowd workers in one of the experiments:

