A Map/Reduce Parallelized Framework for Rapidly Classifying Astrophysical Transients

Abstract

The Berkeley Transients Classification Pipeline (TCP) is a source identification, classification, and broadcast pipeline that federates data streams from multiple surveys. The TCP identifies scientifically interesting variable sources by making probabilistic statements about the classification of newly discovered sources observed by the Palomar Transient Factory's (PTF) all-sky survey. The primary purpose of PTF is to systematically map the available sky in order to discover a variety of Galactic and extragalactic transient sources and events. The TCP alerts follow-up telescopes, such as PAIRITEL (Bloom et al. 2005), and end users to these newly discovered transients. Here we discuss software used within the TCP to generate science classifiers when little or no data have yet been acquired by the survey of interest. This case is more challenging than generating classifiers for a well-populated survey. We present some of the difficulties encountered and the parallelized Hadoop/MapReduce-based technique we use to resolve them.
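The abstract does not describe the TCP's Hadoop jobs in detail, so the following is only a generic, in-process sketch of the map, shuffle, and reduce pattern applied to classifier generation. Nothing in it is the TCP's code or the Hadoop API. The subsampling stride (a stand-in for a sparser survey cadence), the two features (amplitude and scatter), and the nearest-centroid classifier are all hypothetical choices made for illustration.

```python
from itertools import groupby
from operator import itemgetter
from statistics import mean, pstdev


def mapper(record, stride=2):
    """Map step (hypothetical): subsample one light curve, emit (class, features)."""
    source_id, label, mags = record
    sparse = mags[::stride]  # crude stand-in for a sparser target-survey cadence
    amplitude = max(sparse) - min(sparse)
    scatter = pstdev(sparse)
    yield label, (amplitude, scatter)


def reducer(label, feature_list):
    """Reduce step: average the features of all sources sharing one class label."""
    amps, scatters = zip(*feature_list)
    return label, (mean(amps), mean(scatters))


def run_mapreduce(records, stride=2):
    # Map: each record is processed independently, so on a cluster
    # these calls could run on separate nodes.
    pairs = [kv for rec in records for kv in mapper(rec, stride)]
    # Shuffle: sort by key so that all pairs with the same class are adjacent.
    pairs.sort(key=itemgetter(0))
    # Reduce: one reducer call per distinct class label.
    return dict(
        reducer(label, [feats for _, feats in group])
        for label, group in groupby(pairs, key=itemgetter(0))
    )


def classify(centroids, mags, stride=2):
    """Assign a new light curve to the class with the nearest feature centroid."""
    _, feats = next(mapper(("new", None, mags), stride))
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(centroids[c], feats)),
    )
```

Calling `run_mapreduce` on labeled light curves yields one feature centroid per class, which `classify` then uses for new sources. The map step is where the parallelism lies: each light curve is handled independently, so the work could be distributed.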

Publication
Astronomical Data Analysis Software and Systems XIX. Proceedings of a conference held October 4-8, 2009 in Sapporo, Japan. Edited by Yoshihiko Mizumoto, Koh-Ichiro Morita, and Masatoshi Ohishi. ASP Conference Series, Vol. 434. San Francisco: Astronomical Society of the Pacific, 2010, p. 406.