Introduction
This tutorial shows how to understand and use the carpyncho Python client.
First we need to import the module and instantiate the client:
[1]:
# import the module
import carpyncho
# instantiate the client
client = carpyncho.Carpyncho()
First, let's check which tiles have catalogs available for download.
[2]:
client.list_tiles()
[2]:
('others',
'b206',
'b214',
'b216',
'b220',
'b228',
'b234',
'b247',
'b248',
'b261',
'b262',
'b263',
'b264',
'b277',
'b278',
'b356',
'b360',
'b396')
Let's assume we are interested in the tile b216, so we can check which catalogs are available for this tile.
[3]:
client.list_catalogs("b216")
[3]:
('features', 'lc')
We see that two catalogs are available: the one with the light curves (lc) and the one with the features of those curves (features).
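To see at a glance which catalogs exist for every tile, the two listing calls used so far can be combined in a loop. The sketch below is only an illustration: `catalogs_by_tile` is a hypothetical helper, not part of carpyncho, and `FakeClient` is a stand-in with made-up responses so the snippet runs offline. With the real client you would pass the `client` object created earlier instead.

```python
def catalogs_by_tile(client):
    """Map every tile name to the tuple of catalogs available for it.

    Only list_tiles() and list_catalogs() are used, the two methods
    already shown in this tutorial.
    """
    return {tile: tuple(client.list_catalogs(tile)) for tile in client.list_tiles()}


class FakeClient:
    """Offline stand-in that mimics the two listing methods (assumed data)."""

    _data = {"b216": ("features", "lc"), "b278": ("features", "lc")}

    def list_tiles(self):
        return tuple(self._data)

    def list_catalogs(self, tile):
        return self._data[tile]


available = catalogs_by_tile(FakeClient())
for tile, catalogs in available.items():
    print(tile, "->", ", ".join(catalogs))
```

Because the helper relies only on the two listing methods, it makes one `list_catalogs` call per tile and downloads no catalog data.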
Now we can retrieve more information about any of these catalogs. For simplicity, let's check the b216 lc catalog.
[4]:
client.catalog_info("b216", "lc")
[4]:
{'hname': 'Time-Series',
'format': 'BZIP2-Parquet',
'extension': '.parquet.bz2',
'date': '2020-04-14',
'md5sum': '236e126f82e80684f29247220470b831 lc_obs_b216.parquet.bz2',
'filename': 'lc_obs_b216.parquet.bz2',
'url': 'https://catalogs.iate.conicet.unc.edu.ar/carpyncho/lcurves/b216/lc_obs_b216.parquet.bz2',
'size': 369866999,
'records': 37839384}
The key hname is a human-readable version of the catalog name. The next two keys (format and extension) describe how the catalog is stored in the cloud. After those come the publication date of the file, its checksum, its file name and its URL (all of this is mostly for internal use).
Finally, we have the two most important pieces of information: size is the size of the file in bytes (352.7 MiB), and records is the number of records stored in the file (more than 37 million).
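The MiB figure comes from dividing the size in bytes by 2**20. A minimal sketch of that arithmetic follows; `bytes_to_mib` is a hypothetical helper written for this tutorial, and the dictionary simply copies two values from the `catalog_info` output shown earlier.

```python
def bytes_to_mib(size_bytes):
    """Convert a size in bytes to mebibytes (1 MiB = 2**20 bytes)."""
    return size_bytes / 2**20


# Values copied from the catalog_info("b216", "lc") output.
info = {"size": 369866999, "records": 37839384}

size_mib = bytes_to_mib(info["size"])
print(f"{size_mib:.1f} MiB for {info['records']:,} records")
```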
OK, that is too big. Let's check the b216 features catalog instead.
[5]:
client.catalog_info("b216", "features")
[5]:
{'hname': 'Features',
'format': 'BZIP2-Parquet',
'extension': '.parquet.bz2',
'date': '2020-04-14',
'md5sum': '433aae05541a2f5b191aa95d717fa83c features_b216.parquet.bz2',
'filename': 'features_b216.parquet.bz2',
'url': 'https://catalogs.iate.conicet.unc.edu.ar/carpyncho/lcurves/b216/features_b216.parquet.bz2',
'size': 149073679,
'records': 334773}
In this case the file is only 142.2 MiB. Let's retrieve a features catalog into a dataframe; note that the cell below downloads the one for tile b278 rather than b216.
[6]:
# the first time this can be slow
df = client.get_catalog("b278", "features")
df
b278-features: 349MB [02:23, 2.42MB/s]
[6]:
| | id | cnt | ra_k | dec_k | vs_type | vs_catalog | Amplitude | Autocor_length | Beyond1Std | Con | ... | c89_jk_color | c89_m2 | c89_m4 | n09_c3 | n09_hk_color | n09_jh_color | n09_jk_color | n09_m2 | n09_m4 | ppmb |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 32780000001647 | 33 | 270.675437 | -30.833556 | | | 0.4205 | 1.0 | 0.303030 | 0.0 | ... | 0.254668 | 14.690558 | 14.666523 | 0.183141 | 0.026877 | 0.229526 | 0.256404 | 14.820497 | 14.825871 | 0.000044 |
| 1 | 32780000001722 | 32 | 270.601058 | -30.797561 | | | 0.2815 | 1.0 | 0.250000 | 0.0 | ... | 0.714901 | 14.020039 | 13.975850 | 0.344830 | 0.136610 | 0.580695 | 0.717305 | 14.243584 | 14.253669 | 5.275137 |
| 2 | 32780000001725 | 31 | 270.586525 | -30.790697 | | | 0.7770 | 1.0 | 0.225806 | 0.0 | ... | 0.737901 | 12.123443 | 12.090208 | 0.228780 | 0.187609 | 0.552696 | 0.740305 | 12.351745 | 12.358436 | 3.785030 |
| 3 | 32780000001764 | 35 | 270.533529 | -30.764936 | | | 0.6025 | 1.0 | 0.200000 | 0.0 | ... | 0.708924 | 12.578053 | 12.528972 | 0.398169 | 0.114875 | 0.596295 | 0.711170 | 12.798067 | 12.809809 | 6.580914 |
| 4 | 32780000001766 | 49 | 270.575246 | -30.785039 | | | 0.4850 | 1.0 | 0.224490 | 0.0 | ... | 0.587902 | 13.324888 | 13.287657 | 0.285994 | 0.111611 | 0.478696 | 0.590306 | 13.522170 | 13.530534 | 5.746016 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 866878 | 32780000865946 | 56 | 270.961067 | -29.417511 | | | 0.0230 | 1.0 | 0.357143 | 0.0 | ... | 0.687494 | 12.213607 | 12.171067 | 0.347818 | 0.125109 | 0.563573 | 0.688682 | 12.398301 | 12.408567 | 2.734307 |
| 866879 | 32780000879797 | 56 | 270.962108 | -29.392694 | | | 0.0260 | 1.0 | 0.375000 | 0.0 | ... | 0.655494 | 12.358619 | 12.329497 | 0.212287 | 0.163109 | 0.493573 | 0.656682 | 12.536672 | 12.542938 | 6.877899 |
| 866880 | 32780000894698 | 57 | 270.987062 | -29.378722 | | | 0.0310 | 1.0 | 0.350877 | 0.0 | ... | 0.743494 | 10.892837 | 10.854592 | 0.297559 | 0.164110 | 0.580572 | 0.744682 | 11.089152 | 11.097934 | 5.883032 |
| 866881 | 32780000894881 | 55 | 272.064012 | -29.893197 | | | 0.0285 | 1.0 | 0.381818 | 0.0 | ... | 0.464515 | 13.531632 | 13.505489 | 0.203314 | 0.096373 | 0.369172 | 0.465545 | 13.667806 | 13.673903 | 5.310308 |
| 866882 | 32780000900244 | 56 | 272.123942 | -29.912258 | | | 0.0280 | 1.0 | 0.267857 | 0.0 | ... | 0.706600 | 12.044150 | 12.008327 | 0.273388 | 0.158705 | 0.549095 | 0.707800 | 12.229126 | 12.236734 | 0.922049 |
866883 rows × 73 columns
We have a lot of information to play with here. Let's check whether the catalog contains more than one type of source.
[7]:
df.groupby("vs_type").id.count()
[7]:
vs_type
857450
BLAP 1
CV-DN 5
Cep-1 1
Cep-F 1
ECL-C 729
ECL-ELL 486
ECL-NC 3246
LPV-Mira 104
LPV-OSARG 3820
LPV-SRV 592
RRLyr-RRab 289
RRLyr-RRc 145
RRLyr-RRd 3
SP_ECL-C 1
SP_ECL-NC 1
T2Cep-BLHer 6
T2Cep-RVTau 2
T2Cep-WVir 1
Name: id, dtype: int64
So there are 289 RRab stars (and more than 857K sources of unknown type).
We have a lot to work with here, for example to make some plots.
From now on, you simply have a big pandas dataframe to manipulate.
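As an illustration of that kind of manipulation, the sketch below filters by `vs_type` and summarizes one feature. It runs on a tiny stand-in dataframe with made-up values rather than on the downloaded catalog, and it assumes that unclassified sources carry an empty string in `vs_type` (the blank label in the group counts suggests this, but it is not confirmed here).

```python
import pandas as pd

# Tiny stand-in for the features dataframe; the values are made up.
df_demo = pd.DataFrame(
    {
        "id": [1, 2, 3, 4, 5],
        "vs_type": ["", "RRLyr-RRab", "", "ECL-C", "RRLyr-RRab"],
        "Amplitude": [0.42, 0.28, 0.78, 0.60, 0.49],
    }
)

# Keep only the sources that carry a variable-star classification
# (assumption: unclassified rows hold an empty string).
classified = df_demo[df_demo["vs_type"] != ""]

# Select a single class and summarize one feature.
rrab = df_demo[df_demo["vs_type"] == "RRLyr-RRab"]
mean_amplitude = rrab["Amplitude"].mean()

print(len(classified), "classified sources,", len(rrab), "of them RRab")
```

The same two boolean masks should work unchanged on the real `df`, provided the empty-string assumption holds for its `vs_type` column.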
All the methods of the carpyncho.Carpyncho client are well documented, and you can access the documentation with the ‘?’ command in Jupyter.
[8]:
client.get_catalog?
Signature: client.get_catalog(tile, catalog, force=False)
Docstring:
Retrieve a catalog from the carpyncho dataset.
Parameters
----------
tile: str
The name of the tile.
catalog:
The name of the catalog.
force: bool (default=False)
If its True, the cached version of the catalog is ignored and
redownloaded. Try to always set force to False.
Returns
-------
pandas.DataFrame:
The columns of the DataFrame changes between the different catalog.
Raises
------
ValueError:
If the tile or the catalog is not found.
IOError:
If the checksum not match.
File: ~/proyectos/carpyncho-py/src/carpyncho.py
Type: method
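The docstring mentions a checksum, and `catalog_info` returns an `md5sum` entry shaped like `<hash> <filename>`. The sketch below shows how such an entry could be checked by hand against a local file using the standard library. It is not the client's own implementation: `md5_of_file` and `matches_md5sum` are hypothetical helpers, and the demo uses a throwaway file instead of a real catalog.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fp:
        for chunk in iter(lambda: fp.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5sum(path, md5sum_entry):
    """Compare a file against an entry shaped like '<hash> <filename>'."""
    expected = md5sum_entry.split()[0]
    return md5_of_file(path) == expected


# Demo on a throwaway file, so nothing has to be downloaded.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.bin")
    with open(path, "wb") as fp:
        fp.write(b"carpyncho")
    good_entry = hashlib.md5(b"carpyncho").hexdigest() + " demo.bin"
    ok = matches_md5sum(path, good_entry)
    bad = matches_md5sum(path, "0" * 32 + " demo.bin")

print(ok, bad)
```

Reading in chunks keeps memory use flat, which matters for files of several hundred MiB such as the light-curve catalogs.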
[9]:
import datetime as dt
dt.datetime.now()
[9]:
datetime.datetime(2022, 6, 13, 0, 1, 52, 183122)