Introduction

This tutorial shows how to use the carpyncho Python client.

First we need to import the module and instantiate the client.

[1]:

# import the module
import carpyncho

# instantiate the client
client = carpyncho.Carpyncho()

First, let's check which tiles have catalogs available for download.

[2]:

client.list_tiles()

[2]:

('others',
 'b206',
 'b214',
 'b216',
 'b220',
 'b228',
 'b234',
 'b247',
 'b248',
 'b261',
 'b262',
 'b263',
 'b264',
 'b277',
 'b278',
 'b356',
 'b360',
 'b396')

Let's assume we are interested in tile b216. We can check which catalogs are available for this tile:

[3]:

client.list_catalogs("b216")

[3]:

('features', 'lc')

Two catalogs are available: the light curves (lc) and the features of those curves (features).

We can now retrieve more information about any of these catalogs. For simplicity, let's check the b216 lc catalog.
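To get an overview of everything that is available, the two listing calls can be combined. The sketch below relies only on the `list_tiles()` and `list_catalogs(tile)` methods shown above; the helper name `catalogs_by_tile` is hypothetical and not part of carpyncho.

```python
def catalogs_by_tile(client):
    """Map every tile name to the tuple of catalogs available for it.

    `client` is any object exposing list_tiles() and list_catalogs(tile),
    such as the carpyncho.Carpyncho() instance created above.
    This helper is illustrative and not part of the carpyncho package.
    """
    return {
        tile: tuple(client.list_catalogs(tile))
        for tile in client.list_tiles()
    }


# Usage with the real client:
# for tile, catalogs in catalogs_by_tile(client).items():
#     print(tile, catalogs)
```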

[4]:

client.catalog_info("b216", "lc")

[4]:

{'hname': 'Time-Series',
 'format': 'BZIP2-Parquet',
 'extension': '.parquet.bz2',
 'date': '2020-04-14',
 'md5sum': '236e126f82e80684f29247220470b831  lc_obs_b216.parquet.bz2',
 'filename': 'lc_obs_b216.parquet.bz2',
 'url': 'https://catalogs.iate.conicet.unc.edu.ar/carpyncho/lcurves/b216/lc_obs_b216.parquet.bz2',
 'size': 369866999,
 'records': 37839384}

The hname key is a human-readable version of the catalog name. The next two keys describe the format of the catalog (how it is stored in the cloud). After that come the publication date of the file, its checksum, and the cloud ID (all of this is mostly for internal use).
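The md5sum value follows the layout printed by the md5sum command line tool: the hexadecimal digest, some whitespace, then the file name. As a minimal sketch, assuming you downloaded the file yourself from the url key, the digest could be checked with the standard library. The function name `matches_md5sum` is hypothetical and not part of carpyncho.

```python
import hashlib


def matches_md5sum(path, md5sum_field, chunk_size=1 << 20):
    """Return True if the file at `path` has the digest stored in `md5sum_field`.

    `md5sum_field` is expected to look like '<hex digest>  <file name>',
    as in the 'md5sum' key returned by catalog_info.
    """
    expected = md5sum_field.split()[0].lower()
    digest = hashlib.md5()
    with open(path, "rb") as fp:
        # Read in chunks so a large catalog is never fully loaded in memory.
        for chunk in iter(lambda: fp.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected


# Usage (assumes the file was saved locally under its original name):
# info = client.catalog_info("b216", "lc")
# matches_md5sum(info["filename"], info["md5sum"])
```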

Finally, we have the two most important pieces of information: size is the size of the file in bytes (352.7 MiB), and records is the number of records stored in the file (more than 37 million).

OK, too big. Let's check the b216 features catalog instead.
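The MiB figure quoted above comes from dividing the byte count by 1024 * 1024. A small sketch of that arithmetic, using the size value printed by catalog_info (the helper name `bytes_to_mib` is only illustrative):

```python
def bytes_to_mib(n_bytes):
    """Convert a size in bytes to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return n_bytes / (1024 * 1024)


lc_size = 369866999  # 'size' reported for the b216 lc catalog

print(f"{bytes_to_mib(lc_size):.1f} MiB")  # prints "352.7 MiB"
```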

[5]:

client.catalog_info("b216", "features")

[5]:

{'hname': 'Features',
 'format': 'BZIP2-Parquet',
 'extension': '.parquet.bz2',
 'date': '2020-04-14',
 'md5sum': '433aae05541a2f5b191aa95d717fa83c  features_b216.parquet.bz2',
 'filename': 'features_b216.parquet.bz2',
 'url': 'https://catalogs.iate.conicet.unc.edu.ar/carpyncho/lcurves/b216/features_b216.parquet.bz2',
 'size': 149073679,
 'records': 334773}

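This file is only 142.2 MiB. Let's retrieve a features catalog into a dataframe. Note that the cell below downloads the catalog for tile b278, which is larger than the b216 one inspected above (the progress bar reports 349MB).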

[6]:

# the first call can be slow, because the catalog has to be downloaded
df = client.get_catalog("b278", "features")
df

b278-features: 349MB [02:23, 2.42MB/s]

[6]:

	id	cnt	ra_k	dec_k	vs_type	vs_catalog	Amplitude	Autocor_length	Beyond1Std	Con	...	c89_jk_color	c89_m2	c89_m4	n09_c3	n09_hk_color	n09_jh_color	n09_jk_color	n09_m2	n09_m4	ppmb
0	32780000001647	33	270.675437	-30.833556			0.4205	1.0	0.303030	0.0	...	0.254668	14.690558	14.666523	0.183141	0.026877	0.229526	0.256404	14.820497	14.825871	0.000044
1	32780000001722	32	270.601058	-30.797561			0.2815	1.0	0.250000	0.0	...	0.714901	14.020039	13.975850	0.344830	0.136610	0.580695	0.717305	14.243584	14.253669	5.275137
2	32780000001725	31	270.586525	-30.790697			0.7770	1.0	0.225806	0.0	...	0.737901	12.123443	12.090208	0.228780	0.187609	0.552696	0.740305	12.351745	12.358436	3.785030
3	32780000001764	35	270.533529	-30.764936			0.6025	1.0	0.200000	0.0	...	0.708924	12.578053	12.528972	0.398169	0.114875	0.596295	0.711170	12.798067	12.809809	6.580914
4	32780000001766	49	270.575246	-30.785039			0.4850	1.0	0.224490	0.0	...	0.587902	13.324888	13.287657	0.285994	0.111611	0.478696	0.590306	13.522170	13.530534	5.746016
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
866878	32780000865946	56	270.961067	-29.417511			0.0230	1.0	0.357143	0.0	...	0.687494	12.213607	12.171067	0.347818	0.125109	0.563573	0.688682	12.398301	12.408567	2.734307
866879	32780000879797	56	270.962108	-29.392694			0.0260	1.0	0.375000	0.0	...	0.655494	12.358619	12.329497	0.212287	0.163109	0.493573	0.656682	12.536672	12.542938	6.877899
866880	32780000894698	57	270.987062	-29.378722			0.0310	1.0	0.350877	0.0	...	0.743494	10.892837	10.854592	0.297559	0.164110	0.580572	0.744682	11.089152	11.097934	5.883032
866881	32780000894881	55	272.064012	-29.893197			0.0285	1.0	0.381818	0.0	...	0.464515	13.531632	13.505489	0.203314	0.096373	0.369172	0.465545	13.667806	13.673903	5.310308
866882	32780000900244	56	272.123942	-29.912258			0.0280	1.0	0.267857	0.0	...	0.706600	12.044150	12.008327	0.273388	0.158705	0.549095	0.707800	12.229126	12.236734	0.922049

866883 rows × 73 columns

We have a lot of information to play with here. Let's check which types of sources are present.

[7]:

df.groupby("vs_type").id.count()

[7]:

vs_type
               857450
BLAP                1
CV-DN               5
Cep-1               1
Cep-F               1
ECL-C             729
ECL-ELL           486
ECL-NC           3246
LPV-Mira          104
LPV-OSARG        3820
LPV-SRV           592
RRLyr-RRab        289
RRLyr-RRc         145
RRLyr-RRd           3
SP_ECL-C            1
SP_ECL-NC           1
T2Cep-BLHer         6
T2Cep-RVTau         2
T2Cep-WVir          1
Name: id, dtype: int64

So there are 289 RRab stars (RRLyr-RRab) and more than 857K sources with no type assigned.

There is a lot to work with here. From now on, you simply have a big pandas dataframe to manipulate, for example to make some plots.
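Selecting one of these groups is ordinary pandas filtering. The sketch below uses a tiny synthetic dataframe (the values are made up) so it runs without downloading anything, and it assumes that untyped sources carry an empty string in vs_type, as the blank label in the output above suggests.

```python
import pandas as pd

# Tiny synthetic stand-in for the features catalog (values are made up).
df_demo = pd.DataFrame(
    {
        "id": [1, 2, 3, 4, 5],
        "vs_type": ["", "RRLyr-RRab", "", "ECL-C", "RRLyr-RRab"],
        "Amplitude": [0.42, 0.28, 0.78, 0.60, 0.49],
    }
)

# Number of sources per type, equivalent to the groupby/count above.
counts = df_demo["vs_type"].value_counts()

# Assumption: an empty string marks a source with no type assigned.
untyped = df_demo[df_demo["vs_type"] == ""]

# Keep only the RRab stars.
rrab = df_demo[df_demo["vs_type"] == "RRLyr-RRab"]

print(len(untyped), len(rrab))
```

The same two filters can be applied to the real `df` by replacing `df_demo`.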

All the methods of the carpyncho.Carpyncho client are documented, and you can access their documentation with the ‘?’ command in Jupyter.

[8]:

client.get_catalog?

Signature: client.get_catalog(tile, catalog, force=False)
Docstring:
Retrieve a catalog from the carpyncho dataset.

Parameters
----------
tile: str
    The name of the tile.
catalog:
    The name of the catalog.
force: bool (default=False)
    If its True, the cached version of the catalog is ignored and
    redownloaded. Try to always set force to False.

Returns
-------
pandas.DataFrame:
    The columns of the DataFrame changes between the different catalog.

Raises
------
ValueError:
    If the tile or the catalog is not found.
IOError:
    If the checksum not match.
File:      ~/proyectos/carpyncho-py/src/carpyncho.py
Type:      method
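The docstring states that a checksum mismatch raises IOError and that force=True ignores the cached file. One possible way to combine the two, as a hedged sketch rather than a recommended pattern, is to retry once with force=True when the first attempt fails its checksum. The helper `get_catalog_with_retry` is hypothetical and not part of carpyncho.

```python
def get_catalog_with_retry(client, tile, catalog):
    """Fetch a catalog, downloading it again once if the checksum fails.

    Illustrative helper, not part of carpyncho. It relies only on the
    behaviour documented in the get_catalog docstring: IOError on a
    checksum mismatch, and force=True to ignore the cached file.
    A ValueError (unknown tile or catalog) is left to propagate.
    """
    try:
        return client.get_catalog(tile, catalog)
    except IOError:
        # The file failed its checksum: bypass the cache a single time.
        return client.get_catalog(tile, catalog, force=True)


# Usage with the real client:
# df = get_catalog_with_retry(client, "b278", "features")
```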

[9]:

import datetime as dt
dt.datetime.now()

[9]:

datetime.datetime(2022, 6, 13, 0, 1, 52, 183122)