Introduction

This tutorial shows how to understand and use the carpyncho Python client.

First we need to import the module and instantiate the client.

[1]:
# import the module
import carpyncho

# instantiate the client
client = carpyncho.Carpyncho()

First, let's check which tiles have catalogs available for download.

[2]:
client.list_tiles()
[2]:
('b206',
 'b214',
 'b216',
 'b220',
 'b228',
 'b234',
 'b247',
 'b248',
 'b261',
 'b262',
 'b263',
 'b264',
 'b277',
 'b278',
 'b356',
 'b360',
 'b396')

Let's assume we are interested in tile b216. We can check which catalogs are available for this tile.

[3]:
client.list_catalogs("b216")
[3]:
('features', 'lc')

We see that two catalogs are available: the light curves (lc) and the features of those curves (features).

Now we can retrieve more information about any of these catalogs. For simplicity, let's check the b216 lc catalog.
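The two listing calls can be combined to see the catalogs of every tile at once. The following is a minimal sketch, not part of the library: `catalogs_by_tile` and `StubClient` are hypothetical names, and the stub only mimics `list_tiles()` and `list_catalogs()` so the example runs offline. Only the b216 entry comes from the output above; the b206 entry is an assumption for illustration.

```python
class StubClient:
    """Hypothetical offline stand-in for carpyncho.Carpyncho()."""

    # b216 matches the output above; the b206 entry is assumed for the demo.
    _catalogs = {"b206": ("features", "lc"), "b216": ("features", "lc")}

    def list_tiles(self):
        return tuple(self._catalogs)

    def list_catalogs(self, tile):
        return self._catalogs[tile]


def catalogs_by_tile(client):
    """Map every available tile to the tuple of catalogs it offers."""
    return {tile: client.list_catalogs(tile) for tile in client.list_tiles()}


available = catalogs_by_tile(StubClient())
for tile, catalogs in available.items():
    print(tile, "->", ", ".join(catalogs))
```

With the real client, passing `client` instead of `StubClient()` should give the same kind of mapping, at the cost of one request per tile.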

[4]:
client.catalog_info("b216", "lc")
[4]:
{'hname': 'Time-Serie',
 'format': 'BZIP2-Parquet',
 'extension': '.parquet.bz2',
 'date': '2020-04-14',
 'md5sum': '236e126f82e80684f29247220470b831  lc_obs_b216.parquet.bz2',
 'filename': 'lc_obs_b216.parquet.bz2',
 'driveid': '1C-_3A6almD42ewASe8n74Y355mYn9tZG',
 'size': 369866999,
 'records': 37839384}

The hname key is a human-readable version of the catalog name. The next two keys, format and extension, describe how the catalog is stored in the cloud. After those come the publication date of the file (date), its checksum (md5sum), its file name (filename), and its cloud ID (driveid); these are mostly for internal use.

Finally, we have the two most important pieces of information: size is the size of the file in bytes (352.7 MiB), and records is the number of records stored in the file (more than 37 million).
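The md5sum entry has the usual "hash, two spaces, file name" layout, so a downloaded file can be checked against it by hand. The sketch below is an illustration only and is not claimed to be how the client verifies files internally; `md5_matches` is a hypothetical helper, and the demo uses a throwaway temporary file instead of a real catalog.

```python
import hashlib
import os
import tempfile


def md5_matches(path, md5sum_field, chunk_size=1 << 20):
    """Compare a local file against an md5sum-style entry ('<hex>  <name>')."""
    expected = md5sum_field.split()[0].lower()
    digest = hashlib.md5()
    with open(path, "rb") as fp:
        # Read in chunks so that large catalogs do not need to fit in memory.
        for chunk in iter(lambda: fp.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected


# Self-contained demo with a throwaway file instead of a real catalog.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"carpyncho")
demo_path = tmp.name

good_field = hashlib.md5(b"carpyncho").hexdigest() + "  demo.bin"
bad_field = "0" * 32 + "  demo.bin"

ok = md5_matches(demo_path, good_field)
bad = md5_matches(demo_path, bad_field)
os.remove(demo_path)

print(ok, bad)
```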
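The size in MiB quoted here is just the byte count divided by 1024 * 1024. A small sketch of that arithmetic, where `to_mib` is a hypothetical helper and the byte count is copied from the output above:

```python
def to_mib(size_in_bytes):
    """Convert a size in bytes to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return size_in_bytes / (1024 * 1024)


lc_size = 369866999  # 'size' of the b216 "lc" catalog shown above
lc_mib = to_mib(lc_size)
print(f"{lc_mib:.1f} MiB")  # 352.7 MiB
```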

OK, that is too big. Let's check the b216 features catalog instead.

[5]:
client.catalog_info("b216", "features")
[5]:
{'hname': 'Features',
 'format': 'BZIP2-Parquet',
 'extension': '.parquet.bz2',
 'date': '2020-04-14',
 'md5sum': '433aae05541a2f5b191aa95d717fa83c  features_b216.parquet.bz2',
 'filename': 'features_b216.parquet.bz2',
 'driveid': '1-t165sLjn0k507SFeW-A4p9wYVL9rP4B',
 'size': 149073679,
 'records': 334773}

In this case the file is only 142.2 MiB. Let's retrieve a features catalog into a dataframe (note that the cell below downloads the one for tile b278).

[6]:
# the first time this can be slow
df = client.get_catalog("b278", "features")
df
[6]:
                    id  cnt        ra_k       dec_k  vs_type  vs_catalog  Amplitude  Autocor_length  Beyond1Std  Con  ...  c89_jk_color     c89_m2     c89_m4    n09_c3  n09_hk_color  n09_jh_color  n09_jk_color     n09_m2     n09_m4      ppmb
     0  32780000001647   33  270.675437  -30.833556                          0.4205             1.0    0.303030  0.0  ...      0.254668  14.690558  14.666523  0.183141      0.026877      0.229526      0.256404  14.820497  14.825871  0.000044
     1  32780000001722   32  270.601058  -30.797561                          0.2815             1.0    0.250000  0.0  ...      0.714901  14.020039  13.975850  0.344830      0.136610      0.580695      0.717305  14.243584  14.253669  5.275137
     2  32780000001725   31  270.586525  -30.790697                          0.7770             1.0    0.225806  0.0  ...      0.737901  12.123443  12.090208  0.228780      0.187609      0.552696      0.740305  12.351745  12.358436  3.785030
     3  32780000001764   35  270.533529  -30.764936                          0.6025             1.0    0.200000  0.0  ...      0.708924  12.578053  12.528972  0.398169      0.114875      0.596295      0.711170  12.798067  12.809809  6.580914
     4  32780000001766   49  270.575246  -30.785039                          0.4850             1.0    0.224490  0.0  ...      0.587902  13.324888  13.287657  0.285994      0.111611      0.478696      0.590306  13.522170  13.530534  5.746016
   ...             ...  ...         ...         ...      ...         ...        ...             ...         ...  ...  ...           ...        ...        ...       ...           ...           ...           ...        ...        ...       ...
866878  32780000865946   56  270.961067  -29.417511                          0.0230             1.0    0.357143  0.0  ...      0.687494  12.213607  12.171067  0.347818      0.125109      0.563573      0.688682  12.398301  12.408567  2.734307
866879  32780000879797   56  270.962108  -29.392694                          0.0260             1.0    0.375000  0.0  ...      0.655494  12.358619  12.329497  0.212287      0.163109      0.493573      0.656682  12.536672  12.542938  6.877899
866880  32780000894698   57  270.987062  -29.378722                          0.0310             1.0    0.350877  0.0  ...      0.743494  10.892837  10.854592  0.297559      0.164110      0.580572      0.744682  11.089152  11.097934  5.883032
866881  32780000894881   55  272.064012  -29.893197                          0.0285             1.0    0.381818  0.0  ...      0.464515  13.531632  13.505489  0.203314      0.096373      0.369172      0.465545  13.667806  13.673903  5.310308
866882  32780000900244   56  272.123942  -29.912258                          0.0280             1.0    0.267857  0.0  ...      0.706600  12.044150  12.008327  0.273388      0.158705      0.549095      0.707800  12.229126  12.236734  0.922049

866883 rows × 73 columns

We have a lot of information to play with here. Let's check which types of sources are present.

[7]:
df.groupby("vs_type").id.count()
[7]:
vs_type
               857450
BLAP                1
CV-DN               5
Cep-1               1
Cep-F               1
ECL-C             729
ECL-ELL           486
ECL-NC           3246
LPV-Mira          104
LPV-OSARG        3820
LPV-SRV           592
RRLyr-RRab        289
RRLyr-RRc         145
RRLyr-RRd           3
SP_ECL-C            1
SP_ECL-NC           1
T2Cep-BLHer         6
T2Cep-RVTau         2
T2Cep-WVir          1
Name: id, dtype: int64

So there are 289 RRab stars (and more than 857K sources with no assigned type).
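Once the counts are known, the same column can be used to select sources. The sketch below runs on a toy dataframe with made-up ids and types, not on the real catalog. It assumes that the unlabeled group is stored as an empty string, which the blank label in the output suggests but does not confirm.

```python
import pandas as pd

# Toy stand-in for the real catalog: the ids and types are made up.
toy = pd.DataFrame(
    {
        "id": [1, 2, 3, 4, 5, 6],
        "vs_type": ["", "RRLyr-RRab", "", "ECL-C", "RRLyr-RRab", "RRLyr-RRc"],
    }
)

# Same aggregation as in the cell above.
counts = toy.groupby("vs_type").id.count()

# Keep only the sources that carry a type (assumes "" marks the untyped ones).
classified = toy[toy.vs_type != ""]

# Select every RR Lyrae subtype by its common prefix.
rrlyr = toy[toy.vs_type.str.startswith("RRLyr")]

print(counts)
print(len(classified), len(rrlyr))
```

If the real column holds missing values or whitespace instead of empty strings, the comparison with `""` would need to be adapted.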

There is a lot to work with here, for example to make some plots.

From now on, you simply have a big pandas dataframe to manipulate.

All the methods of the carpyncho.Carpyncho client are documented, and you can access the documentation with the '?' command in Jupyter.

[8]:
client.get_catalog?
Signature: client.get_catalog(tile, catalog, force=False)
Docstring:
Retrieve a catalog from the carpyncho dataset.

Parameters
----------
tile: str
    The name of the tile.
catalog:
    The name of the catalog.
force: bool (default=False)
    If its True, the cached version of the catalog is ignored and
    redownloaded. Try to always set force to False.

Returns
-------
pandas.DataFrame:
    The columns of the DataFrame changes between the different catalog.

Raises
------
ValueError:
    If the tile or the catalog is not found.
IOError:
    If the checksum not match.
File:      ~/proyectos/carpyncho-py/src/carpyncho.py
Type:      method
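The docstring above names two exceptions and a force flag, which suggests a simple recovery pattern: if the checksum check fails, ask once for a fresh download. The following is a sketch under that reading of the docstring, not a documented recipe; `fetch_catalog` and `FlakyStub` are hypothetical names, and the stub replaces the real client so the example runs offline.

```python
def fetch_catalog(client, tile, catalog):
    """Fetch a catalog, forcing one redownload if the checksum check fails."""
    try:
        return client.get_catalog(tile, catalog)
    except IOError:
        # Documented as the checksum-mismatch error: ignore the cache and retry.
        # A ValueError (unknown tile or catalog) is deliberately not caught.
        return client.get_catalog(tile, catalog, force=True)


class FlakyStub:
    """Hypothetical stand-in: fails unless the download is forced."""

    def __init__(self):
        self.calls = []

    def get_catalog(self, tile, catalog, force=False):
        self.calls.append(force)
        if not force:
            raise IOError("checksum mismatch")
        return f"{catalog} catalog of {tile}"


stub = FlakyStub()
result = fetch_catalog(stub, "b216", "features")
print(result, stub.calls)
```

Retrying only once keeps with the docstring's advice to leave force set to False in normal use.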

[9]:
import datetime as dt
dt.datetime.now()
[9]:
datetime.datetime(2020, 4, 23, 23, 58, 1, 645410)