October 5-9, 2014

Abstract

O1.3 Rapid Large Scale Reprocessing of the ODI archive using the QuickReduce Pipeline

Arvind Gopu (Indiana University)

Ralf Kotulla (University of Wisconsin, Milwaukee); Michael D. Young (Indiana University); Soichi Hayashi (Indiana University); Daniel Harbeck (WIYN); Wilson Liu (WIYN); Robert Henschel (Indiana University)

The traditional model of astronomers collecting their observations as raw instrument data is increasingly being replaced by astronomical observatories serving standard calibrated data products. In general, most such data also become available to the public at large after proprietary restrictions are lifted. For this model to work effectively, observatories need the ability to re-calibrate archival data products as improved master calibration products or pipeline improvements become available, and also to rapidly calibrate the data on-the-fly when requested by the user. Traditional astronomy pipelines are heavily I/O dependent and do not scale favorably on most systems as data volumes increase over time. In this paper, we present the One Degree Imager - Portal, Pipeline and Archive (ODI-PPA) calibration pipeline framework, which offers a modern approach to this long-standing problem. PPA leverages existing cyberinfrastructure at Indiana University, mostly the BigRed II supercomputer and the Data Capacitor filesystem, to allow a large number of simultaneous parallel data reduction jobs initiated by operators and/or users. We combine this with an efficient pipeline (QuickReduce), written in Python and utilizing parallel computing, to ensure short processing times as well as full data provenance. We present an overview of this integrated pipeline system, including recent updates to the pipeline and data processing system that allow us to re-process and re-calibrate the entire ODI archive (~26,000 frames with a raw data volume of ~16 TB) within 1 day using 12 compute nodes. This flexible, fast, highly scalable and yet easy-to-operate framework will improve access to data from WIYN and ODI, in particular once data rates double with the upgraded focal plane (in 2015). It will also serve as a template for future data processing infrastructure across the astronomical community and beyond.
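
To put the quoted reprocessing figures in perspective, the short Python sketch below works out the minimum sustained rates they imply. It uses only the approximate numbers given in the abstract; the function and variable names are illustrative, the terabyte is assumed to be decimal (10^12 bytes), and "within 1 day" is treated as an upper bound of 24 hours, so the actual rates may be higher.

```python
# Figures quoted in the abstract (all approximate, marked "~" in the text).
FRAMES = 26_000
RAW_BYTES = 16e12          # ~16 TB, assuming decimal terabytes
NODES = 12
WALL_SECONDS = 24 * 3600   # "within 1 day", taken as an upper bound


def throughput(frames, raw_bytes, nodes, wall_seconds):
    """Return the minimum sustained rates implied by the quoted figures."""
    return {
        # Frames each node must finish per hour of wall-clock time.
        "frames_per_node_hour": frames / nodes / (wall_seconds / 3600),
        # Node-seconds available per frame across the whole run.
        "node_seconds_per_frame": nodes * wall_seconds / frames,
        # Average raw size of one frame, in megabytes.
        "mb_per_frame": raw_bytes / frames / 1e6,
        # Raw data that must be read per second across all nodes.
        "aggregate_mb_per_s": raw_bytes / wall_seconds / 1e6,
    }


rates = throughput(FRAMES, RAW_BYTES, NODES, WALL_SECONDS)
for name, value in rates.items():
    print(f"{name}: {value:.1f}")
```

Under these assumptions each node has to complete roughly 90 frames per hour, or about 40 node-seconds per frame of roughly 615 MB, which illustrates why the I/O behaviour of the pipeline matters as much as its compute cost.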

Mode of presentation: Oral (needs to be confirmed by the SOC)

Applicable ADASS XXIV theme category: Big Data Challenges