In semi-supervised learning for classification, it is assumed that every ground-truth class of data is present in the small labelled dataset. In many real-world sparsely-labelled datasets, however, not all ground-truth classes may be captured in the labelled dataset: a biased data collection process could result in some classes being found only in the unlabelled dataset. We call this regime 'semi-unsupervised learning', an extreme case of semi-supervised learning in which some classes have no labelled exemplars. First, we outline the pitfalls associated with trying to apply deep generative model (DGM)-based semi-supervised learning algorithms to datasets of this type. We then show how a combination of clustering and semi-supervised learning, using DGMs, can be brought to bear on this problem. We study several different datasets, showing how one can still learn effectively when half of the ground-truth classes are entirely unlabelled and the other half are sparsely labelled.

Original publication

DOI

10.1109/BigData50022.2020.9378265

Type

Conference paper

Publication Date

10/12/2020

Pages

5286 - 5295