October 5-9, 2014


O1.1 Rule-based Cross-matching of Very Large Catalogs

Patrick Ogle (IPAC)

The NASA/IPAC Extragalactic Database (NED) has deployed a new rule-based cross-matching algorithm, MatchEx, capable of cross-matching very large catalogs (VLCs) with more than 10 million objects. MatchEx goes beyond traditional position-based cross-matching by combining other available data with expert logic to decide which candidate match is best. The local background density of sources is also used to quantify and minimize the false-positive match rate and to estimate match completeness. The logical outcome and statistical probability of each match decision are stored in the database and can be used to tune the algorithm and adjust match parameter thresholds. For our first production run, we cross-matched the GALEX All Sky Survey Catalog (GASC), containing nearly 40 million NUV-detected sources, to NED. Candidate matches were identified for each GASC source within a 7.5" radius. These candidates were filtered on position-based match probability and on other criteria, including object type and object name. We estimate a match completeness of 97.7% and a match accuracy of 99.8%. Over the next year, we will cross-match nearly 1 billion new catalog sources to NED, including the 2MASS Point Source Catalog, AllWISE, SDSS DR12, and the Spitzer Source List. We expect to add new capabilities to filter candidate matches on source diameters, redshifts, refined object classifications, and spectral energy distributions. We will also extend MatchEx to handle more heterogeneous datasets federated from smaller catalogs through NED's literature pipeline.
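The combination of a search radius, a background-density-based false-positive estimate, and a non-positional filter can be illustrated with a small sketch. This is not MatchEx itself. It assumes a simple Poisson model for chance coincidences, P_chance = 1 - exp(-pi r^2 Sigma), where Sigma is the local source density measured in an annulus around each source. Only the 7.5" radius comes from the abstract. The names `Source`, `local_density`, `best_match`, the annulus radii, the `max_p_chance` threshold, and the `allowed_types` object-type rule are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Optional, Set

ARCSEC_PER_DEG = 3600.0


@dataclass
class Source:
    # Hypothetical minimal record: coordinates in degrees plus an object type.
    name: str
    ra: float
    dec: float
    obj_type: str = ""


def angular_sep_arcsec(ra1: float, dec1: float, ra2: float, dec2: float) -> float:
    """Great-circle separation (haversine formula), returned in arcseconds."""
    r1, d1, r2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((d2 - d1) / 2) ** 2
         + math.cos(d1) * math.cos(d2) * math.sin((r2 - r1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h)))) * ARCSEC_PER_DEG


def local_density(src: Source, neighbors: Iterable[Source],
                  inner_arcsec: float = 30.0, outer_arcsec: float = 120.0) -> float:
    """Background density (sources per arcsec^2) counted in an annulus.

    The inner radius excludes the likely counterpart; the annulus radii are
    assumed values, and the area uses a flat-sky approximation.
    """
    n = sum(1 for s in neighbors
            if inner_arcsec <= angular_sep_arcsec(src.ra, src.dec, s.ra, s.dec) < outer_arcsec)
    return n / (math.pi * (outer_arcsec ** 2 - inner_arcsec ** 2))


def chance_probability(sep_arcsec: float, density_per_arcsec2: float) -> float:
    """Poisson probability of at least one unrelated source within sep_arcsec."""
    return 1.0 - math.exp(-math.pi * sep_arcsec ** 2 * density_per_arcsec2)


def best_match(src: Source, candidates: Iterable[Source], density_per_arcsec2: float,
               radius_arcsec: float = 7.5, max_p_chance: float = 0.01,
               allowed_types: Optional[Set[str]] = None):
    """Return (p_chance, sep_arcsec, candidate) for the best candidate, or None.

    Candidates are rejected if they lie outside the search radius, exceed the
    (hypothetical) chance-coincidence threshold, or fail the optional
    object-type rule. Survivors are ranked by chance probability, then separation.
    """
    scored = []
    for c in candidates:
        sep = angular_sep_arcsec(src.ra, src.dec, c.ra, c.dec)
        if sep > radius_arcsec:
            continue
        if allowed_types is not None and c.obj_type not in allowed_types:
            continue
        p = chance_probability(sep, density_per_arcsec2)
        if p <= max_p_chance:
            scored.append((p, sep, c))
    if not scored:
        return None
    scored.sort(key=lambda t: (t[0], t[1]))
    return scored[0]
```

For example, a GASC-like source with one candidate 1" away typed as a star and another 5" away typed as a galaxy would pick the nearer candidate by position alone. It would pick the galaxy if `allowed_types={"G"}` were passed. Keeping the returned chance probability alongside each decision mirrors, in spirit, the abstract's point that stored match probabilities can later be used to tune thresholds.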

