ALLStars: Overcoming Multi-Survey Selection Bias using Crowd-Sourced Active Learning

Abstract

Developing a multi-survey time-series classifier presents several challenges. One problem is overcoming the sample selection bias that arises when the instruments or observing cadences differ between the training and testing datasets. In this case, the probabilistic distributions characterizing the sources in the training survey differ from the source distributions in the other survey, resulting in poor performance when a classifier is naively applied. To resolve this, we have developed the ALLStars active learning framework, which allows us to bootstrap a classifier onto a new survey using a small set of optimally chosen sources that are then presented to users for manual classification. Several iterations of this crowd-sourcing process result in a significantly improved classifier. Using this procedure, we have built a variable star light-curve classifier using OGLE, Hipparcos, and ASAS survey data.
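
The abstract describes a pool-based active learning loop: train on the labeled survey, query a small batch of informative sources from the new survey, have humans label them, and retrain. The sketch below is only a minimal illustration of that general shape, not the ALLStars implementation; the uncertainty-sampling query strategy, scikit-learn classifier, and simulated data are all assumptions standing in for details the abstract does not specify.

    # Illustrative sketch of a crowd-sourced active learning loop (assumed
    # uncertainty sampling; NOT the actual ALLStars selection criterion).
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(0)

    # Labeled sources from the training survey (random stand-ins for
    # light-curve features, e.g. periods, amplitudes, colors).
    X_labeled = rng.normal(size=(500, 10))
    y_labeled = rng.integers(0, 3, size=500)

    # Unlabeled pool from the new survey, drawn from a shifted distribution
    # to mimic the sample selection bias between surveys.
    X_pool = rng.normal(loc=0.5, size=(2000, 10))

    n_iterations, query_batch = 5, 20
    for it in range(n_iterations):
        clf = RandomForestClassifier(n_estimators=200, random_state=0)
        clf.fit(X_labeled, y_labeled)

        # Query the pool sources the classifier is least certain about
        # (lowest maximum class probability).
        proba = clf.predict_proba(X_pool)
        uncertainty = 1.0 - proba.max(axis=1)
        query_idx = np.argsort(uncertainty)[-query_batch:]

        # In ALLStars these sources would go to human classifiers; random
        # labels are used here purely to keep the sketch runnable.
        new_labels = rng.integers(0, 3, size=query_batch)

        X_labeled = np.vstack([X_labeled, X_pool[query_idx]])
        y_labeled = np.concatenate([y_labeled, new_labels])
        X_pool = np.delete(X_pool, query_idx, axis=0)

After each iteration the classifier is retrained on the augmented labeled set, so the training distribution gradually shifts toward that of the new survey.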

Publication
Astronomical Data Analysis Software and Systems XXI. Proceedings of a conference held at the Marriott Rive Gauche Conference Center, Paris, France, 6-10 November 2011. ASP Conference Series, Vol. 461. Edited by P. Ballester, D. Egret, and N. P. F. Lorente. San Francisco: Astronomical Society of the Pacific, 2012, p. 581.