# Final stage of HELP data processing

This notebook collates the final output files, ready to be written to CSV for ingestion into a VO server. At the bottom of the notebook we also summarise the pipeline products that have been processed on each field. These summaries are generated from the dmu32 meta_main.yml files, which contain links to the XID+, CIGALE and photo-z catalogues that feed into the final catalogues for publishing.

Summary of notebook:

- Take DR1 masterlist suffixes from overview table
- Find dmu32 full table names and write to a file
- Create summary of all the data products per field using the dmu32 meta_main.yml files

In [1]:
from  herschelhelp_internal  import git_version
print("This notebook was run with herschelhelp_internal version: \n{}".format(git_version()))
import datetime
print("This notebook was executed on: \n{}".format(datetime.datetime.now()))

This notebook was run with herschelhelp_internal version: 
017bb1e (Mon Jun 18 14:58:59 2018 +0100)
This notebook was executed on: 
2018-07-12 17:19:01.808314


In [2]:
from astropy.table import Table, Column
from astropy import units as u
import numpy as np

from pymoc import MOC

from herschelhelp_internal.masterlist import find_last_ml_suffix

import yaml

import os
import time

import humanfriendly



In [3]:
TODAY = os.environ.get('SUFFIX', time.strftime("_%Y%m%d"))

## The definition of HELP DR1
Here we take the DR1 definition from dmu1. We then propagate those suffixes through to dmu32. Note that this does not include a specification of how the masterlist made its way through the whole HELP process.
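
As a minimal sketch of how a DR1 suffix propagates into the dmu32 file names, the path pattern used later in this notebook can be written as a small helper. The function name `dmu32_paths` is hypothetical and is not part of herschelhelp_internal; the two path templates are the ones used in the cell that writes `help_to_vo.sh`.

```python
def dmu32_paths(field, suffix):
    """Return the (full catalogue, CIGALE input) dmu32 paths for a field.

    `dmu32_paths` is an illustrative helper, not an existing HELP function.
    """
    base = './dmu32_{0}/data/{0}_{1}'.format(field, suffix)
    return base + '.fits', base + '_cigale.fits'


# ELAIS-N1 is the one field shown where dr1_suffix differs from last_suffix,
# so the DR1 file name uses 20171016 rather than 20180216.
full, cigale = dmu32_paths('ELAIS-N1', '20171016')
print(full)
print(cigale)
```

Because the helper only formats strings, it says nothing about whether either file exists; the notebook checks that separately with `os.path.exists`.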

In [4]:
dr1 = Table.read("../dmu1/dr1_overview.fits")

In [5]:
dr1.show_in_notebook()

idx,field,dr1_suffix,last_suffix,objects,area_sq_degrees,file_size_bytes,file_size_readable
0,AKARI-NEP,20180215,20180215,531746,9.194732358779468,348848640,348.85 MB
1,AKARI-SEP,20180221,20180221,844172,8.713306475131118,538606080,538.61 MB
2,Bootes,20180520,20180520,3481661,11.42815299095486,5002044480,5 GB
3,CDFS-SWIRE,20180613,20180613,2171051,12.971246403717068,5705593920,5.71 GB
4,COSMOS,20180619,20180619,2599374,5.083863478496816,10517175360,10.52 GB
5,EGS,20180501,20180501,1412613,3.566383275122158,3808480320,3.81 GB
6,ELAIS-N1,20171016,20180216,4026292,13.507484555454765,5242279680,5.24 GB
7,ELAIS-N2,20180218,20180218,1783240,9.167479903991111,2007964800,2.01 GB
8,ELAIS-S1,20180416,20180416,1655564,9.002940646885508,1970156160,1.97 GB
9,GAMA-09,20180601,20180601,12937982,62.01393417284915,19911611520,19.91 GB


In [6]:
GAVO_FOLDER = '/mnt/hedam/data_vo/'
stilts_command = 'stilts tpipe {in_file} omode=out ofmt=csv out={GAVO_FOLDER}{out_file}'

with open('help_to_vo.sh', 'w') as final_data:
    for field in dr1:
        name, suffix = field['field'], field['dr1_suffix']
        final_help_product = './dmu32_{0}/data/{0}_{1}.fits'.format(name, suffix)
        cigale_input = './dmu32_{0}/data/{0}_{1}_cigale.fits'.format(name, suffix)

        if os.path.exists(final_help_product):
            # The full merged catalogue exists: export it as the complete product.
            print(final_help_product)
            final_data.write(stilts_command.format(
                in_file=final_help_product,
                GAVO_FOLDER=GAVO_FOLDER,
                out_file='herschelhelp/main/{}_{}_all.csv \n'.format(name, suffix)))
        elif os.path.exists(cigale_input):
            # Only the CIGALE input file exists: export it and flag it as incomplete.
            print(cigale_input)
            final_data.write(stilts_command.format(
                in_file=cigale_input,
                GAVO_FOLDER=GAVO_FOLDER,
                out_file='herschelhelp/main/{}_{}_incomplete.csv \n'.format(name, suffix)))
        else:
            # Leave a comment in the script so that missing fields are visible.
            final_data.write('# No data for {} \n'.format(name))

./dmu32_AKARI-NEP/data/AKARI-NEP_20180215_cigale.fits
./dmu32_AKARI-SEP/data/AKARI-SEP_20180221_cigale.fits
./dmu32_Bootes/data/Bootes_20180520_cigale.fits
./dmu32_CDFS-SWIRE/data/CDFS-SWIRE_20180613.fits
./dmu32_COSMOS/data/COSMOS_20180619.fits
./dmu32_EGS/data/EGS_20180501_cigale.fits
./dmu32_ELAIS-N1/data/ELAIS-N1_20171016.fits
./dmu32_ELAIS-N2/data/ELAIS-N2_20180218_cigale.fits
./dmu32_ELAIS-S1/data/ELAIS-S1_20180416.fits
./dmu32_GAMA-09/data/GAMA-09_20180601.fits
./dmu32_GAMA-12/data/GAMA-12_20180218.fits
./dmu32_GAMA-15/data/GAMA-15_20180213.fits
./dmu32_HDF-N/data/HDF-N_20180427_cigale.fits
./dmu32_Herschel-Stripe-82/data/Herschel-Stripe-82_20180307_cigale.fits
./dmu32_Lockman-SWIRE/data/Lockman-SWIRE_20180219.fits
./dmu32_NGP/data/NGP_20180219_cigale.fits
./dmu32_SA13/data/SA13_20180501_cigale.fits
./dmu32_SGP/data/SGP_20180221.fits
./dmu32_SPIRE-NEP/data/SPIRE-NEP_20180220_cigale.fits
./dmu32_SSDF/data/SSDF_20180221_cigale.fits
./dmu32_xFLS/data/xFLS_20180501_cigale.fits
./dmu

The output of the cell above is a shell script, help_to_vo.sh, that writes each FITS file to a CSV file in the VO folder when it is run.
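
As an illustration of what a single line of that shell script contains, the sketch below formats the `stilts_command` template from this notebook for one field. The template and `GAVO_FOLDER` are copied from the cell above; the COSMOS paths are taken from the printed output. Nothing is executed here, and the sketch assumes `stilts` would be available on the machine where the script is eventually run.

```python
GAVO_FOLDER = '/mnt/hedam/data_vo/'
stilts_command = 'stilts tpipe {in_file} omode=out ofmt=csv out={GAVO_FOLDER}{out_file}'

# Build the command for one field without running it.
line = stilts_command.format(
    in_file='./dmu32_COSMOS/data/COSMOS_20180619.fits',
    GAVO_FOLDER=GAVO_FOLDER,
    out_file='herschelhelp/main/COSMOS_20180619_all.csv')
print(line)
```

Writing the commands to a script rather than running them from the notebook keeps the slow FITS to CSV conversion as a separate step that can be inspected first.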

In [7]:
with open('depths_to_vo.sh', 'w') as depths_to_vo:
    for field in dr1:
        name, suffix = field['field'], field['dr1_suffix']
        # Depth maps sit in the dmu1 masterlist folder; the file name uses the lower-case field name.
        final_depth_product = '../dmu1/dmu1_ml_{}/data/depths_{}_{}.fits'.format(name, name.lower(), suffix)

        if os.path.exists(final_depth_product):
            print(final_depth_product)
            depths_to_vo.write(stilts_command.format(
                in_file=final_depth_product,
                GAVO_FOLDER=GAVO_FOLDER,
                out_file='depth/{}_{}.csv \n'.format(name, suffix)))
        else:
            # Leave a comment in the script so that missing fields are visible.
            depths_to_vo.write('# No depths for {} \n'.format(name))

../dmu1/dmu1_ml_AKARI-NEP/data/depths_akari-nep_20180215.fits
../dmu1/dmu1_ml_AKARI-SEP/data/depths_akari-sep_20180221.fits
../dmu1/dmu1_ml_Bootes/data/depths_bootes_20180520.fits
../dmu1/dmu1_ml_CDFS-SWIRE/data/depths_cdfs-swire_20180613.fits
../dmu1/dmu1_ml_COSMOS/data/depths_cosmos_20180619.fits
../dmu1/dmu1_ml_EGS/data/depths_egs_20180501.fits
../dmu1/dmu1_ml_ELAIS-N1/data/depths_elais-n1_20171016.fits
../dmu1/dmu1_ml_ELAIS-N2/data/depths_elais-n2_20180218.fits
../dmu1/dmu1_ml_ELAIS-S1/data/depths_elais-s1_20180416.fits
../dmu1/dmu1_ml_GAMA-09/data/depths_gama-09_20180601.fits
../dmu1/dmu1_ml_GAMA-12/data/depths_gama-12_20180218.fits
../dmu1/dmu1_ml_GAMA-15/data/depths_gama-15_20180213.fits
../dmu1/dmu1_ml_HDF-N/data/depths_hdf-n_20180427.fits
../dmu1/dmu1_ml_Herschel-Stripe-82/data/depths_herschel-stripe-82_20180307.fits
../dmu1/dmu1_ml_Lockman-SWIRE/data/depths_lockman-swire_20180219.fits
../dmu1/dmu1_ml_NGP/data/depths_ngp_20180219.fits
../dmu1/dmu1_ml_SA13/data/depths_sa13_2018

## Summarise completeness of HELP data sets

Here we gather information about which data products are available on each field. We take the CIGALE, XID+ and photo-z file names from the per-field meta_main.yml files, check that the files exist and count the objects in each. This then gives a summary of all the data present.
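
The lookup described above can be sketched with the standard library alone. This is an assumption-laden illustration: `local_path` and `product_status` are hypothetical helper names, and the dictionary mimics the `xid` (list), `photoz` and `cigale` (strings) keys that the notebook reads from a dmu32 meta_main.yml file. The example paths are the CDFS-SWIRE entries printed further down. The sketch only checks for existence; the notebook itself reads each table with astropy to count rows.

```python
import os


def local_path(product_path, root='..'):
    """Map a 'dmu_products/...' path from meta_main.yml onto the local tree."""
    return product_path.replace('dmu_products', root, 1)


def product_status(info, root='..'):
    """Return {key: [(local path, exists), ...]} for the three product types."""
    status = {}
    for key in ('xid', 'photoz', 'cigale'):
        value = info.get(key, [])
        paths = value if isinstance(value, list) else [value]
        status[key] = [(local_path(p, root), os.path.exists(local_path(p, root)))
                       for p in paths]
    return status


# Hypothetical stand-in for a parsed dmu32 meta_main.yml file.
info = {
    'xid': [
        'dmu_products/dmu26/dmu26_XID+SPIRE_CDFS-SWIRE/data/dmu26_XID+SPIRE_CDFS-SWIRE_cat_20170919.fits',
        'dmu_products/dmu26/dmu26_XID+MIPS_CDFS-SWIRE/data/dmu26_XID+MIPS_CDFS-SWIRE_cat_20170901.fits',
        'dmu_products/dmu26/dmu26_XID+PACS_CDFS-SWIRE/data/dmu26_XID+PACS_CDFS-SWIRE_cat_20171019.fits',
    ],
    'photoz': 'dmu_products/dmu24/dmu24_CDFS-SWIRE/data/master_catalogue_cdfs-swire_20171103_photoz_20180210.fits',
    'cigale': 'dmu_products/dmu28/dmu28_CDFS-SWIRE/data/zphot/HELP_final_results.fits',
}

status = product_status(info)
for key, entries in status.items():
    for path, exists in entries:
        print(key, path, 'present' if exists else 'missing')
```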

In [8]:
dr1_data_products = dr1.copy()

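# safe_load avoids the Loader argument that yaml.load requires in recent PyYAML versions.
with open("../dmu2/meta_main.yml", 'r') as meta_file:
    fields_info = yaml.safe_load(meta_file)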

In [9]:
dr1_data_products.add_column(Column(data =np.full(len(dr1_data_products), 0, dtype=int), name = 'xid_objects'))
dr1_data_products.add_column(Column(data =np.full(len(dr1_data_products), 0, dtype=int), name = 'photoz_objects'))
dr1_data_products.add_column(Column(data =np.full(len(dr1_data_products), 0, dtype=int), name = 'cigale_objects'))

In [10]:
for field in fields_info['fields']:
    print(field['name'] + ':')
    xid_objects = 0
    photoz_objects = 0
    cigale_objects = 0
    with open('./dmu32_{}/meta_main.yml'.format(field['name']), 'r') as meta_file:
        dmu32_info = yaml.safe_load(meta_file)

    # Paths in meta_main.yml start with 'dmu_products'; map them onto the local tree.
    try:
        # Several XID+ catalogues may be listed; the count from the last one read is kept.
        for xid_file in dmu32_info['xid']:
            xid_objects = len(Table.read(xid_file.replace('dmu_products', '..')))
            print("  - xid: {}".format(xid_file))
    except FileNotFoundError:
        print("  - xid not done.")

    try:
        photoz_objects = len(Table.read(dmu32_info['photoz'].replace('dmu_products', '..')))
        print("  - photoz: {}".format(dmu32_info['photoz']))
    except FileNotFoundError:
        print("  - photoz not done.")

    try:
        cigale_objects = len(Table.read(dmu32_info['cigale'].replace('dmu_products', '..')))
        print("  - cigale: {}".format(dmu32_info['cigale']))
    except FileNotFoundError:
        print("  - cigale not done.")

    # Record the counts on the matching row of the overview table.
    in_field = dr1_data_products['field'] == field['name']
    dr1_data_products['xid_objects'][in_field] = xid_objects
    dr1_data_products['photoz_objects'][in_field] = photoz_objects
    dr1_data_products['cigale_objects'][in_field] = cigale_objects

AKARI-NEP:
  - xid not done.
  - photoz not done.
  - cigale not done.
AKARI-SEP:
  - xid not done.
  - photoz not done.
  - cigale not done.
Bootes:
  - xid not done.
  - photoz not done.
  - cigale not done.
CDFS-SWIRE:
  - xid: dmu_products/dmu26/dmu26_XID+SPIRE_CDFS-SWIRE/data/dmu26_XID+SPIRE_CDFS-SWIRE_cat_20170919.fits
  - xid: dmu_products/dmu26/dmu26_XID+MIPS_CDFS-SWIRE/data/dmu26_XID+MIPS_CDFS-SWIRE_cat_20170901.fits
  - xid: dmu_products/dmu26/dmu26_XID+PACS_CDFS-SWIRE/data/dmu26_XID+PACS_CDFS-SWIRE_cat_20171019.fits
  - photoz: dmu_products/dmu24/dmu24_CDFS-SWIRE/data/master_catalogue_cdfs-swire_20171103_photoz_20180210.fits
  - cigale: dmu_products/dmu28/dmu28_CDFS-SWIRE/data/zphot/HELP_final_results.fits
COSMOS:
  - xid: dmu_products/dmu26/dmu26_XID+MIPS_COSMOS/data/dmu26_XID+MIPS_COSMOS_20170213.fits
  - xid: dmu_products/dmu26/dmu26_XID+PACS_COSMOS/data/dmu26_XID+PACS_COSMOS_20170303.fits
  - xid: dmu_products/dmu26/dmu26_XID+SPIRE_COSMOS/data/dmu26_XID+SPIRE_COSMOS_2016

In [11]:
dr1_data_products.show_in_notebook()

idx,field,dr1_suffix,last_suffix,objects,area_sq_degrees,file_size_bytes,file_size_readable,xid_objects,photoz_objects,cigale_objects
0,AKARI-NEP,20180215,20180215,531746,9.194732358779468,348848640,348.85 MB,0,0,0
1,AKARI-SEP,20180221,20180221,844172,8.713306475131118,538606080,538.61 MB,0,0,0
2,Bootes,20180520,20180520,3481661,11.42815299095486,5002044480,5 GB,0,0,0
3,CDFS-SWIRE,20180613,20180613,2171051,12.971246403717068,5705593920,5.71 GB,240489,139532,9308
4,COSMOS,20180619,20180619,2599374,5.083863478496816,10517175360,10.52 GB,44310,694478,15747
5,EGS,20180501,20180501,1412613,3.566383275122158,3808480320,3.81 GB,0,1183490,0
6,ELAIS-N1,20171016,20180216,4026292,13.507484555454765,5242279680,5.24 GB,120282,2885116,50129
7,ELAIS-N2,20180218,20180218,1783240,9.167479903991111,2007964800,2.01 GB,0,0,0
8,ELAIS-S1,20180416,20180416,1655564,9.002940646885508,1970156160,1.97 GB,194276,1102319,25393
9,GAMA-09,20180601,20180601,12937982,62.01393417284915,19911611520,19.91 GB,1386659,8834023,130293


In [12]:
has_xid = dr1_data_products['xid_objects'] != 0
has_photoz = dr1_data_products['photoz_objects'] != 0
has_cigale = dr1_data_products['cigale_objects'] != 0

print("XID+: There are {} fields with XID+ fluxes computed totalling {area:.2f} square degrees.".format(
    np.sum(has_xid),
    area=np.sum(dr1_data_products['area_sq_degrees'][has_xid])))
print("      This is a total of {} objects with an average of {perc:.0f}% of objects on processed areas.".format(
    np.sum(dr1_data_products['xid_objects'][has_xid]),
    perc=100 * np.sum(dr1_data_products['xid_objects'][has_xid])/np.sum(dr1_data_products['objects'][has_xid])))

print("Photo-zs: There are {} fields with photozs computed totalling {area:.2f} square degrees.".format(
    np.sum(has_photoz),
    area=np.sum(dr1_data_products['area_sq_degrees'][has_photoz])))
print("      This is a total of {} objects with an average of {perc:.0f}% of objects on processed areas.".format(
    np.sum(dr1_data_products['photoz_objects'][has_photoz]),
    perc=100 * np.sum(dr1_data_products['photoz_objects'][has_photoz])/np.sum(dr1_data_products['objects'][has_photoz])))

print("CIGALE SEDs: There are {} fields with SEDs computed totalling {area:.2f} square degrees.".format(
    np.sum(has_cigale),
    area=np.sum(dr1_data_products['area_sq_degrees'][has_cigale])))
print("      This is a total of {} objects with an average of {perc:.0f}% of objects on processed areas.".format(
    np.sum(dr1_data_products['cigale_objects'][has_cigale]),
    perc=100 * np.sum(dr1_data_products['cigale_objects'][has_cigale])/np.sum(dr1_data_products['objects'][has_cigale])))

XID+: There are 9 fields with XID+ fluxes computed totalling 543.97 square degrees.
      This is a total of 8075547 objects with an average of 10% of objects on processed areas.
Photo-zs: There are 14 fields with photozs computed totalling 1221.34 square degrees.
      This is a total of 92190306 objects with an average of 56% of objects on processed areas.
CIGALE SEDs: There are 9 fields with SEDs computed totalling 543.97 square degrees.
      This is a total of 855766 objects with an average of 1% of objects on processed areas.
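
To make the percentage in these summary lines concrete, here is a small worked sketch in plain Python. It uses only three rows copied from the table displayed above (AKARI-NEP, CDFS-SWIRE and COSMOS), so its numbers are illustrative and will not match the full-survey totals printed by the notebook. As in the cell above, a field counts as processed when its product count is non-zero, and the percentage is the summed product objects divided by the summed masterlist objects on those fields.

```python
# (field, masterlist objects, CIGALE objects) for three rows of the displayed table.
rows = [
    ('AKARI-NEP', 531746, 0),
    ('CDFS-SWIRE', 2171051, 9308),
    ('COSMOS', 2599374, 15747),
]

# Keep only the fields where the product has been computed.
processed = [row for row in rows if row[2] != 0]

n_fields = len(processed)
n_products = sum(row[2] for row in processed)
n_objects = sum(row[1] for row in processed)
perc = 100 * n_products / n_objects

print("{} fields, {} CIGALE objects, {:.2f}% of objects on processed areas".format(
    n_fields, n_products, perc))
```

Note that this is a ratio of sums, so large fields dominate it; it is not the mean of the per-field percentages.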


In [13]:
dr1_data_products.write('dr1_data_products_overview{}.csv'.format(TODAY), overwrite=True)