carpyncho module

Python client for Carpyncho VVV dataset collection.

This client exposes all the data of the web version of Carpyncho (https://carpyncho.github.io/) as pandas DataFrames.

class carpyncho.Carpyncho(cache_path: str = PosixPath('/home/docs/carpyncho_py_data/_cache_'), cache_expire: float = None, parquet_engine: str = 'auto', index_url: str = 'https://raw.githubusercontent.com/carpyncho/carpyncho-py/master/data/index.json')

Bases: object

Client to access the Carpyncho VVV dataset collection.

It exposes all the data of the web version of Carpyncho (https://carpyncho.github.io/) as pandas DataFrames.

Parameters:

cache (diskcache.Cache, diskcache.Fanout, or None (default: None)) – Any instance of diskcache.Cache or diskcache.Fanout, or None. If None, a diskcache.Cache instance is created with directory = carpyncho.DEFAULT_CACHE_DIR. More information: http://www.grantjenks.com/docs/diskcache
cache_expire (float or None (default: None)) – Seconds until a cached item expires. With None, items never expire. More information: http://www.grantjenks.com/docs/diskcache
parquet_engine (str (default: 'auto')) – Parquet library to use by default. Carpyncho stores all the data remotely as compressed Parquet files, so every downloaded file must be parsed with a Parquet engine. If 'auto', the pandas option io.parquet.engine is used. The default io.parquet.engine behavior is to try 'pyarrow', falling back to 'fastparquet' if 'pyarrow' is unavailable.
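The 'auto' fallback described for parquet_engine can be illustrated with a small standard-library sketch. This is not the code pandas or carpyncho actually runs; the function name resolve_parquet_engine and its is_available argument are hypothetical and exist only to make the fallback order visible and testable.

```python
import importlib.util


def resolve_parquet_engine(engine="auto", is_available=None):
    """Return a concrete engine name, mimicking the 'auto' fallback order."""
    if is_available is None:
        # Default check: can the module be found in this environment?
        def is_available(name):
            return importlib.util.find_spec(name) is not None

    if engine != "auto":
        # An explicit choice is passed through unchanged.
        return engine

    # 'auto': prefer pyarrow, then fall back to fastparquet.
    for candidate in ("pyarrow", "fastparquet"):
        if is_available(candidate):
            return candidate
    raise ImportError("no Parquet engine found: install pyarrow or fastparquet")
```

Passing is_available explicitly makes it possible to exercise both branches without installing either library.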

cache: The internal cache of the client.

cache_expire = None: Default timeout of the catalog cache. Prefer leaving it as None (the default): the catalogs are large and rarely change.

cache_path = None: Location of the catalog cache.

catalog_info(tile, catalog)

Retrieve the information about a given catalog.

Parameters:	tile (str) – The name of the tile. catalog – The name of the catalog.
Returns:	The full information about the given catalog file. This includes the URL, MD5 checksum, size in bytes, total number of records, etc.
Return type:	dict
Raises:	ValueError – If the tile or the catalog is not found.
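The lookup and its error behavior can be sketched offline as follows. The index layout ({tile: {catalog: info}}), the key names, and the tile and catalog names are all assumptions made for illustration; the real index may be organized differently.

```python
# Hypothetical index layout (an assumption): {tile: {catalog: info_dict}}.
# Key names and values are placeholders, not real Carpyncho entries.
EXAMPLE_INDEX = {
    "tile_a": {
        "catalog_x": {
            "url": "https://example.org/tile_a/catalog_x.parquet",
            "md5sum": "0" * 32,
            "size": 1024,
            "records": 10,
        },
    },
}


def catalog_info(index, tile, catalog):
    """Return the info dict for tile+catalog, or raise ValueError."""
    if tile not in index:
        raise ValueError(f"Tile {tile!r} not found")
    if catalog not in index[tile]:
        raise ValueError(f"Catalog {catalog!r} not found in tile {tile!r}")
    # Return a copy so callers cannot mutate the stored index.
    return dict(index[tile][catalog])
```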

get_catalog(tile, catalog, force=False)

Retrieve a catalog from the carpyncho dataset.

Parameters:	tile (str) – The name of the tile. catalog – The name of the catalog. force (bool (default=False)) – If True, the cached version of the catalog is ignored and the catalog is downloaded again. Prefer leaving force as False.
Returns:	The requested catalog. The columns of the DataFrame vary from one catalog to another.
Return type:	pandas.DataFrame
Raises:	ValueError – If the tile or the catalog is not found. IOError – If the checksum does not match.
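The checksum failure can be illustrated with a minimal sketch based on hashlib. The helper name verify_md5 is hypothetical and is not part of the carpyncho API; it only shows one way a downloaded payload could be compared against an MD5 digest taken from the catalog information.

```python
import hashlib


def verify_md5(payload: bytes, expected_md5: str) -> bytes:
    """Return payload unchanged if its MD5 matches, else raise IOError."""
    actual = hashlib.md5(payload).hexdigest()
    # hexdigest() is lowercase, so normalize the expected value too.
    if actual != expected_md5.lower():
        raise IOError(f"MD5 mismatch: expected {expected_md5}, got {actual}")
    return payload
```

In Python 3, IOError is an alias of OSError, so either name can be used in an except clause.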

has_catalog(tile, catalog)

Check whether a given tile and catalog combination exists.

Parameters:	tile (str) – The name of the tile. catalog – The name of the catalog.
Returns:	True if the combination tile+catalog exists.
Return type:	bool

index_: Structure of the Carpyncho dataset information as a Python dict.

index_url = None: Location of the carpyncho index (useful for development).

list_catalogs(tile)

Retrieve the available catalogs for a given tile.

Parameters:	tile (str) – The name of the tile whose catalogs are retrieved.
Returns:	The names of available catalogs in the given tile.
Return type:	tuple of str
Raises:	ValueError: – If the tile is not found.

list_tiles(): Retrieve the available tiles with catalogs as a tuple of str.
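The two listing methods combine naturally into an overview of the whole collection. The sketch below relies only on list_tiles() and list_catalogs(tile) as documented here; the helper catalogs_by_tile and the FakeClient stand-in, with its made-up tile and catalog names, are hypothetical and exist so the example runs without network access.

```python
def catalogs_by_tile(client):
    """Map every available tile to the tuple of catalogs it offers."""
    return {tile: tuple(client.list_catalogs(tile)) for tile in client.list_tiles()}


class FakeClient:
    """Offline stand-in for the client, with made-up tile and catalog names."""

    _data = {"tile_a": ("catalog_x", "catalog_y"), "tile_b": ("catalog_x",)}

    def list_tiles(self):
        return tuple(self._data)

    def list_catalogs(self, tile):
        if tile not in self._data:
            raise ValueError(f"Tile {tile!r} not found")
        return self._data[tile]


print(catalogs_by_tile(FakeClient()))
```

With the package installed, passing a carpyncho.Carpyncho() instance instead of FakeClient() should work the same way, at the cost of fetching the remote index.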

parquet_engine = None: Default Parquet library to use.

retrieve_index(reset)

Access the remote index of the Carpyncho-Dataset.

The index is stored internally for 1 hour.

Parameters:	reset (bool) – If True, the entire cache is ignored and a new index is downloaded and cached.
Returns:	The index structure.
Return type:	dict
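The one-hour expiry and the reset flag can be sketched with a small in-memory cache. This is an illustration only: the actual client keeps its index in a diskcache cache, and the class IndexCache, its fetch callable, and its injectable clock are hypothetical names introduced for this example.

```python
import time

INDEX_TTL_SECONDS = 3600  # one hour, matching the description above


class IndexCache:
    """Keep one fetched index in memory and refresh it when stale."""

    def __init__(self, fetch, clock=time.monotonic):
        self._fetch = fetch        # callable that downloads the index
        self._clock = clock        # injectable so tests can control time
        self._value = None
        self._stored_at = None

    def retrieve(self, reset=False):
        now = self._clock()
        expired = (
            self._stored_at is None
            or now - self._stored_at >= INDEX_TTL_SECONDS
        )
        if reset or expired:
            # Ignore whatever is stored and fetch a fresh copy.
            self._value = self._fetch()
            self._stored_at = now
        return self._value
```

Injecting the clock keeps the expiry logic testable without waiting an hour.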

carpyncho.CARPYNCHOPY_DATA_PATH = PosixPath('/home/docs/carpyncho_py_data'): Location where carpyncho stores all of its data.