carpyncho module
Python client for the Carpyncho VVV dataset collection.
This client exposes all the data of the web version of Carpyncho (https://carpyncho.github.io/) as Pandas DataFrames.
- class carpyncho.Carpyncho(cache_path: str = PosixPath('/home/docs/carpyncho_py_data/_cache_'), cache_expire: float = None, parquet_engine: str = 'auto', index_url: str = 'https://raw.githubusercontent.com/carpyncho/carpyncho-py/master/data/index.json')
Bases: object
Client to access the Carpyncho VVV dataset collection.
It exposes all the data of the web version of Carpyncho (https://carpyncho.github.io/) as Pandas DataFrames.
Parameters: - cache (diskcache.Cache, diskcache.Fanout, or None (default: None)) – Any instance of diskcache.Cache or diskcache.Fanout, or None (default). If it is None, a diskcache.Cache instance is created with the parameter directory = carpyncho.DEFAULT_CACHE_DIR. More information: http://www.grantjenks.com/docs/diskcache
- cache_expire (float or None (default: None)) – Seconds until an item expires (default None, no expiry). More information: http://www.grantjenks.com/docs/diskcache
- parquet_engine (str (default: "auto")) – Default Parquet library to use. Carpyncho stores all its remote data as compressed Parquet files, which must be parsed once they are downloaded. If "auto", the option io.parquet.engine is used. The default io.parquet.engine behavior is to try "pyarrow", falling back to "fastparquet" if "pyarrow" is unavailable.
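As a rough illustration of the parameters above, the sketch below assembles keyword arguments for the constructor. The keyword names are taken from the class signature shown above; `client_options` is a hypothetical helper and not part of carpyncho, and the list of engine names assumes the values pandas accepts for io.parquet.engine.

```python
from pathlib import Path

# Engine names accepted by pandas for Parquet I/O (assumption stated above).
PARQUET_ENGINES = ("auto", "pyarrow", "fastparquet")


def client_options(cache_dir, parquet_engine="auto", cache_expire=None):
    """Build keyword arguments for carpyncho.Carpyncho (hypothetical helper)."""
    if parquet_engine not in PARQUET_ENGINES:
        raise ValueError(f"unknown parquet engine: {parquet_engine!r}")
    return {
        "cache_path": str(Path(cache_dir).expanduser()),
        "cache_expire": cache_expire,  # None means cached items never expire
        "parquet_engine": parquet_engine,
    }


# Intended use, assuming the carpyncho package is installed:
#     import carpyncho
#     client = carpyncho.Carpyncho(**client_options("~/carpyncho_cache"))
```

Validating the engine name up front is a design choice of this sketch: a typo fails immediately instead of surfacing later, when the first downloaded file is parsed.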
- cache
The internal cache of the client.
- cache_expire = None
Default timeout of the catalog cache. Prefer leaving it as None (the default): the catalogs are large and rarely change.
- cache_path = None
Location of the catalog cache.
- catalog_info(tile, catalog)
Retrieve the information about a given catalog.
Parameters: - tile (str) – The name of the tile.
- catalog (str) – The name of the catalog.
Returns: All the information about the given catalog file. This includes the URL, MD5 checksum, size in bytes, total number of records, etc.
Return type: dict
Raises: ValueError – If the tile or the catalog is not found.
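A minimal sketch of how catalog_info might be used to summarize a catalog before downloading it. The function `describe_catalog` is hypothetical, and the dictionary keys "size" and "records" are assumptions made for illustration: the text above only says the dict includes the size and record count, not what the keys are called. The tile and catalog names in the usage comment are placeholders.

```python
def describe_catalog(client, tile, catalog):
    """Return a one-line summary, or None if the tile/catalog is unknown."""
    try:
        info = client.catalog_info(tile, catalog)
    except ValueError:
        # Documented behavior: raised when the tile or catalog is not found.
        return None
    # "size" and "records" are assumed key names; .get() avoids a KeyError
    # if the real index uses different ones.
    size = info.get("size")
    records = info.get("records")
    return f"{tile}/{catalog}: size={size}, records={records}"


# Intended use, assuming a carpyncho.Carpyncho instance named `client`:
#     print(describe_catalog(client, "b278", "lc"))
```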
- get_catalog(tile, catalog, force=False)
Retrieve a catalog from the carpyncho dataset.
Parameters: - tile (str) – The name of the tile.
- catalog (str) – The name of the catalog.
- force (bool (default=False)) – If True, the cached version of the catalog is ignored and the catalog is downloaded again. Prefer leaving force as False.
Returns: The requested catalog. The columns of the DataFrame vary from one catalog to another.
Return type: pandas.DataFrame
Raises: - ValueError – If the tile or the catalog is not found.
- IOError – If the checksum does not match.
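One possible way to combine the force flag with the documented IOError is sketched below: try the normal cached path first and, only if the checksum check fails, retry once while bypassing the cache. The helper `load_catalog` is hypothetical, and whether a single forced retry is the right policy depends on why the checksum failed; this is an assumption of the sketch, not behavior documented by carpyncho.

```python
def load_catalog(client, tile, catalog):
    """Fetch a catalog, retrying once with force=True on a checksum error."""
    try:
        # Normal path: use the cached copy when one exists.
        return client.get_catalog(tile, catalog)
    except IOError:
        # Documented as raised when the checksum does not match.
        # Ignore the cache and download again; a second failure propagates.
        return client.get_catalog(tile, catalog, force=True)


# Intended use, assuming a carpyncho.Carpyncho instance named `client`:
#     df = load_catalog(client, "b278", "lc")
```

A ValueError for an unknown tile or catalog is deliberately not caught here, since retrying cannot fix a wrong name.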
- has_catalog(tile, catalog)
Check if a given catalog and tile exists.
Parameters: - tile (str) – The name of the tile.
- catalog (str) – The name of the catalog.
Returns: True if the combination tile+catalog exists.
Return type: bool
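Since has_catalog returns a plain bool, it can serve as a guard that avoids the ValueError raised by get_catalog for unknown names. The sketch below shows that pattern; `get_catalog_or_none` is a hypothetical helper, not part of the library.

```python
def get_catalog_or_none(client, tile, catalog):
    """Return the catalog DataFrame, or None if the pair does not exist."""
    if not client.has_catalog(tile, catalog):
        return None
    return client.get_catalog(tile, catalog)


# Intended use, assuming a carpyncho.Carpyncho instance named `client`:
#     df = get_catalog_or_none(client, "b278", "features")
#     if df is not None:
#         print(df.columns)
```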
- index_
Structure of the Carpyncho dataset information as a Python dict.
- index_url = None
Location of the carpyncho index (useful for development).
- list_catalogs(tile)
Retrieve the available catalogs for a given tile.
Parameters: tile (str) – The name of the tile whose catalogs are retrieved.
Returns: The names of the available catalogs in the given tile.
Return type: tuple of str
Raises: ValueError – If the tile is not found.
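To see which catalogs several tiles offer, list_catalogs can be called in a loop. The sketch below is a hypothetical helper that takes the tile names as input and maps each to its catalogs; unknown tiles map to an empty tuple instead of aborting the loop. That fallback is a choice made here, not library behavior.

```python
def catalogs_by_tile(client, tiles):
    """Map each tile name to the tuple of catalogs available for it."""
    found = {}
    for tile in tiles:
        try:
            found[tile] = tuple(client.list_catalogs(tile))
        except ValueError:
            # Documented behavior: raised when the tile is not found.
            found[tile] = ()
    return found


# Intended use, assuming a carpyncho.Carpyncho instance named `client`:
#     print(catalogs_by_tile(client, ["b278", "b261"]))
```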
- parquet_engine = None
Default Parquet library to use.
- carpyncho.CARPYNCHOPY_DATA_PATH = PosixPath('/home/docs/carpyncho_py_data')
Where carpyncho stores all its data.