End User
These are reference class and function definitions likely to be useful to everyone.
- intake.open_catalog — Create a Catalog object
- intake.registry — Dict of driver: DataSource class
- intake.register_driver — Add runtime driver definition to the list of registered drivers
- intake.unregister_driver — Remove a runtime-registered driver
- intake.source.csv.CSVSource — Read CSV files into dataframes
- intake.source.textfiles.TextFilesSource — Read textfiles as sequence of lines
- intake.source.jsonfiles.JSONFileSource — Read JSON files as a single dictionary or list
- intake.source.jsonfiles.JSONLinesFileSource — Read a JSONL (https://jsonlines.org/) file and return a list of objects, each a valid JSON object (e.g. a dictionary or list)
- intake.source.npy.NPySource — Read numpy binary files into an array
- intake.source.zarr.ZarrArraySource — Read Zarr format files into an array
- intake.catalog.local.YAMLFileCatalog — Catalog as described by a single YAML file
- intake.catalog.local.YAMLFilesCatalog — Catalog as described by multiple YAML files
- intake.catalog.zarr.ZarrGroupCatalog — A catalog of the members of a Zarr group
- intake.interface.gui.GUI — Top level GUI panel
- intake.open_catalog(uri=None, **kwargs)
Create a Catalog object
New in V2: if the URL is a single file, and loading it as a V1 catalog fails because of the stated version, it will be opened again as a V2 catalog. This means reading the file twice, so calling from_yaml_file directly is better.
Can load YAML catalog files, connect to an Intake server, or create any arbitrary Catalog subclass instance. In the general case, the user should supply driver= with a value from the plugins registry which has a container type of catalog. File locations can generally be remote, if specifying a URL protocol.
The default behaviour if the driver is not specified is as follows:
- if uri is a single string ending in "yml" or "yaml", open it as a catalog file
- if uri is a list of strings, a string containing a glob character ("*"), or a string not ending in "y(a)ml", open as a set of catalog files; in the latter case, assume it is a directory
- if uri begins with the protocol "intake:", connect to a remote Intake server
- if uri is None or missing, create a base Catalog object without entries
- Parameters
- uri: str or pathlib.Path
Designator for the location of the catalog.
- kwargs:
Passed to the subclass instance; see documentation of the individual catalog classes. For example, yaml_files_cat (when specifying multiple URIs or a glob string) takes the additional parameter flatten=True|False, specifying whether all data sources are merged in a single namespace, or each file becomes a sub-catalog.
See also
intake.open_yaml_files_cat, intake.open_yaml_file_cat, intake.open_intake_remote
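The default driver selection described above can be sketched in plain Python; resolve_uri below is a hypothetical helper for illustration, not part of the intake API, and the returned labels are invented names for the four outcomes:

```python
def resolve_uri(uri=None):
    """Mimic open_catalog's default driver selection (simplified sketch)."""
    if uri is None:
        return "base"            # empty base Catalog, no entries
    if isinstance(uri, list):
        return "yaml_files_cat"  # a set of catalog files
    if uri.startswith("intake:"):
        return "intake_remote"   # remote Intake server
    if "*" in uri or not uri.endswith(("yml", "yaml")):
        return "yaml_files_cat"  # glob, or directory of catalog files
    return "yaml_file_cat"       # single YAML catalog file
```

Note that the "intake:" test must come before the suffix test, since a server URL does not end in "y(a)ml" either.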
- intake.registry
Mapping from plugin names to the DataSource classes that implement them. These are the names that should appear in the
driver:
key of each source definition in a catalog. See Plugin Directory for more details.
- intake.open_*
Set of functions, one for each plugin, for direct opening of a data source. The names are derived from the names of the plugins in the registry at import time.
- class intake.interface.gui.GUI(cats=None)
Top level GUI panel
This class is responsible for coordinating the inputs and outputs of various sup-panels and their effects on each other.
- Parameters
- cats: dict of catalogs
catalogs used to initialize the cat panel, {display_name: cat_object}
- property cats
Cats that have been selected from the cat sub-panel
- property sources
Sources that have been selected from the source sub-panel
Source classes
- class intake.source.csv.CSVSource(*args, **kwargs)
Read CSV files into dataframes
Backward compatibility for V1 catalogs.
- __init__(urlpath, storage_options=None, metadata=None, **kwargs)
- discover()
Open resource and populate the source attributes.
- read()
Load entire dataset into a container and return it
- read_partition(i)
Return a part of the data corresponding to i-th partition.
By default, assumes i should be an integer between zero and npartitions; override for more complex indexing schemes.
- to_dask()
Return a dask container for this data source
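A minimal usage sketch for the methods above. The CSV content is invented for illustration; the executable part below uses only the standard library, with the equivalent intake calls shown in comments (they assume intake is installed):

```python
import csv
import os
import tempfile

# Create a small CSV file to stand in for real data (invented content).
path = os.path.join(tempfile.mkdtemp(), "trips.csv")
with open(path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["city", "count"])
    writer.writerow(["Oslo", "3"])
    writer.writerow(["Lima", "5"])

# With intake installed, the source would be used like this:
#   import intake
#   source = intake.source.csv.CSVSource(urlpath=path)
#   source.discover()        # populate source attributes/metadata
#   df = source.read()       # whole dataset as a pandas DataFrame
#   ddf = source.to_dask()   # lazy dask DataFrame

# Stdlib stand-in, to show what the file contains:
with open(path) as f:
    rows = list(csv.DictReader(f))
```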
- class intake.source.zarr.ZarrArraySource(*args, **kwargs)
Read Zarr format files into an array
Zarr is a numerical array storage format which works particularly well with remote and parallel access. For specifics of the format, see https://zarr.readthedocs.io/en/stable/
- __init__(urlpath, storage_options=None, component=None, metadata=None)
- Parameters
- urlpath: str
Location of data file(s), possibly including protocol information
- storage_options: dict
Passed on to storage backend for remote files
- component: str or None
If None, assume the URL points to an array. If given, assume the URL points to a group, and descend the group to find the array at this location in the hierarchy; components are separated by the "/" character.
- discover()
Open resource and populate the source attributes.
- read()
Load entire dataset into a container and return it
- read_partition(i)
Return a part of the data corresponding to i-th partition.
By default, assumes i should be an integer between zero and npartitions; override for more complex indexing schemes.
- to_dask()
Return a dask container for this data source
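The component lookup, descending a group hierarchy by "/"-separated names, can be illustrated with plain dicts; descend is a hypothetical helper and the tree is toy data, not the Zarr API:

```python
def descend(group, component):
    """Walk a nested mapping by '/'-separated keys, mirroring how a
    component path addresses an array inside a group hierarchy."""
    node = group
    for part in component.split("/"):
        node = node[part]
    return node

# A toy stand-in for a Zarr group tree.
tree = {"climate": {"temp": [1.5, 2.0], "precip": [0.1]}}
```

Here `descend(tree, "climate/temp")` returns the nested array, just as component="climate/temp" would select that array within a group.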
- class intake.source.textfiles.TextFilesSource(*args, **kwargs)
Read textfiles as sequence of lines
Prototype of sources reading sequential data.
Takes a set of files, and returns an iterator over the text in each of them. The files can be local or remote. Extra parameters for encoding, etc., go into storage_options.
- __init__(urlpath, text_mode=True, text_encoding='utf8', compression=None, decoder=None, metadata=None, storage_options=None)
- Parameters
- urlpath: str or list(str)
Target files. Can be a glob-path (with "*") and include protocol specified (e.g., "s3://"). Can also be a list of absolute paths.
- text_mode: bool
Whether to open the file in text mode, recoding binary characters on the fly
- text_encoding: str
If text_mode is True, apply this encoding. UTF8 is by far the most common
- compression: str or None
If given, decompress the file with the given codec on load. Can be something like "gzip" or "bz2", or 'infer' to guess the codec from the filename
- decoder: function, str or None
Use this to decode the contents of files. If None, you will get a list of lines of text/bytes. If a function, it must operate on an open file-like object or a bytes/str instance, and return a list
- storage_options: dict
Options to pass to the file reader backend, including text-specific encoding arguments, and parameters specific to the remote file-system driver, if using.
- discover()
Open resource and populate the source attributes.
- read()
Load entire dataset into a container and return it
- read_partition(i)
Return a part of the data corresponding to i-th partition.
By default, assumes i should be an integer between zero and npartitions; override for more complex indexing schemes.
- to_dask()
Return a dask container for this data source
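A sketch of what reading a set of text files yields: one combined sequence of lines across all matched files. The file names and contents are invented; the intake call is shown in a comment (it assumes intake is installed), while the executable part uses only the standard library:

```python
import os
import tempfile

d = tempfile.mkdtemp()
for name, text in [("a.txt", "alpha\nbeta\n"), ("b.txt", "gamma\n")]:
    with open(os.path.join(d, name), "w") as f:
        f.write(text)

# With intake installed, the equivalent source would be:
#   source = intake.source.textfiles.TextFilesSource(
#       urlpath=os.path.join(d, "*.txt"))
#   lines = source.read()

# Stdlib stand-in, iterating the lines of each file in turn:
lines = []
for name in sorted(os.listdir(d)):
    with open(os.path.join(d, name)) as f:
        lines.extend(f.read().splitlines())
```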
- class intake.source.jsonfiles.JSONFileSource(*args, **kwargs)
Read JSON files as a single dictionary or list
The files can be local or remote. Extra parameters for encoding, etc., go into storage_options.
- __init__(urlpath: str, text_mode: bool = True, text_encoding: str = 'utf8', compression: Optional[str] = None, read: bool = True, metadata: Optional[dict] = None, storage_options: Optional[dict] = None)
- Parameters
- urlpath: str
Target file. Can include protocol specified (e.g., "s3://").
- text_mode: bool
Whether to open the file in text mode, recoding binary characters on the fly
- text_encoding: str
If text_mode is True, apply this encoding. UTF8 is by far the most common
- compression: str or None
If given, decompress the file with the given codec on load. Can be something like "zip", "gzip" or "bz2", or 'infer' to guess the codec from the filename
- storage_options: dict
Options to pass to the file reader backend, including text-specific encoding arguments, and parameters specific to the remote file-system driver, if using.
- discover()
Open resource and populate the source attributes.
- read()
Load entire dataset into a container and return it
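The "single dictionary or list" behaviour corresponds to parsing the whole file as one JSON document. The file content here is invented; the intake call is in a comment (assuming intake is installed), and the executable part uses only the standard library:

```python
import json
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "config.json")
with open(path, "w") as f:
    json.dump({"name": "demo", "values": [1, 2, 3]}, f)

# With intake installed:
#   source = intake.source.jsonfiles.JSONFileSource(urlpath=path)
#   data = source.read()   # the whole document as one dict (or list)

# Stdlib equivalent of the single-document parse:
with open(path) as f:
    data = json.load(f)
```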
- class intake.source.jsonfiles.JSONLinesFileSource(*args, **kwargs)
Read a JSONL (https://jsonlines.org/) file and return a list of objects, each a valid JSON object (e.g. a dictionary or list)
- __init__(urlpath: str, text_mode: bool = True, text_encoding: str = 'utf8', compression: Optional[str] = None, read: bool = True, metadata: Optional[dict] = None, storage_options: Optional[dict] = None)
- Parameters
- urlpath: str
Target file. Can include protocol specified (e.g., "s3://").
- text_mode: bool
Whether to open the file in text mode, recoding binary characters on the fly
- text_encoding: str
If text_mode is True, apply this encoding. UTF8 is by far the most common
- compression: str or None
If given, decompress the file with the given codec on load. Can be something like "zip", "gzip" or "bz2", or 'infer' to guess the codec from the filename.
- storage_options: dict
Options to pass to the file reader backend, including text-specific encoding arguments, and parameters specific to the remote file-system driver, if using.
- discover()
Open resource and populate the source attributes.
- read()
Load entire dataset into a container and return it
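JSONL stores one JSON document per line, so reading the file amounts to parsing each line independently. The records below are invented; the intake call is in a comment (assuming intake is installed), and the executable part uses only the standard library:

```python
import json
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "events.jsonl")
records = [{"id": 1, "ok": True}, {"id": 2, "ok": False}]
with open(path, "w") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")  # one JSON object per line

# With intake installed:
#   source = intake.source.jsonfiles.JSONLinesFileSource(urlpath=path)
#   data = source.read()   # list of objects, one per line

# Stdlib equivalent of the per-line parse:
with open(path) as f:
    data = [json.loads(line) for line in f]
```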
- class intake.source.npy.NPySource(*args, **kwargs)
Read numpy binary files into an array
Prototype source showing example of working with arrays
Each file becomes one or more partitions, but partitioning within a file is only along the largest dimension, to ensure contiguous data.
- __init__(path, storage_options=None, metadata=None)
The parameters dtype and shape will be determined from the first file, if not given.
- Parameters
- path: str or list of str
Location of data file(s), possibly including glob and protocol information
- storage_options: dict
Passed to file-system backend.
- discover()
Open resource and populate the source attributes.
- read()
Load entire dataset into a container and return it
- read_partition(i)
Return a part of the data corresponding to i-th partition.
By default, assumes i should be an integer between zero and npartitions; override for more complex indexing schemes.
- to_dask()
Return a dask container for this data source
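The partitioning rule above, splitting only along the largest dimension so each chunk stays contiguous, can be sketched with numpy (a sketch of the idea, not NPySource's actual code; the sample array is invented):

```python
import numpy as np

# Invented sample array; NPySource would read this from a .npy file.
arr = np.arange(24).reshape(2, 4, 3)

# Choose the largest dimension and split along it only, so that each
# partition remains a contiguous block of the underlying data.
axis = int(np.argmax(arr.shape))          # here axis 1 (size 4)
parts = np.array_split(arr, 2, axis=axis)
```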
- class intake.catalog.local.YAMLFileCatalog(*args, **kwargs)
Catalog as described by a single YAML file
- __init__(path=None, text=None, autoreload=True, **kwargs)
- Parameters
- path: str
Location of the file to parse (can be remote)
- text: str (DEPRECATED)
YAML contents of catalog, takes precedence over path
- autoreload: bool
Whether to watch the source file for changes; set False if you want an editable Catalog
- reload()
Reload catalog if sufficient time has passed
- walk(sofar=None, prefix=None, depth=2)
Get all entries in this catalog and sub-catalogs
- Parameters
- sofar: dict or None
Within recursion, use this dict for output
- prefix: list of str or None
Names of levels already visited
- depth: int
Number of levels to descend; needed to truncate circular references and for cleaner output
- Returns
- Dict where the keys are the entry names in dotted syntax, and the values are entry instances.
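The dotted-key output of walk can be illustrated with a small stdlib recursion; this walk is a hypothetical stand-in operating on plain dicts, not the YAMLFileCatalog implementation, and the nested data is invented:

```python
def walk(cat, prefix=None, depth=2):
    """Flatten nested catalogs into {dotted.name: entry}, descending at
    most `depth` levels (which also truncates circular references)."""
    prefix = prefix or []
    out = {}
    for name, entry in cat.items():
        key = ".".join(prefix + [name])
        if isinstance(entry, dict) and depth > 1:
            out.update(walk(entry, prefix + [name], depth - 1))
        else:
            out[key] = entry
    return out

# A catalog containing one sub-catalog and one plain entry.
nested = {"weather": {"temp": "source-A"}, "prices": "source-B"}
```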
- class intake.catalog.local.YAMLFilesCatalog(*args, **kwargs)
Catalog as described by multiple YAML files
- __init__(path, flatten=True, **kwargs)
- Parameters
- path: str
Location of the files to parse (can be remote), including possible glob (*) character(s). Can also be list of paths, without glob characters.
- flatten: bool (True)
Whether to list all entries in the cats at the top level (True) or create sub-cats from each file (False).
- reload()
Reload catalog if sufficient time has passed
- walk(sofar=None, prefix=None, depth=2)
Get all entries in this catalog and sub-catalogs
- Parameters
- sofar: dict or None
Within recursion, use this dict for output
- prefix: list of str or None
Names of levels already visited
- depth: int
Number of levels to descend; needed to truncate circular references and for cleaner output
- Returns
- Dict where the keys are the entry names in dotted syntax, and the values are entry instances.
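The effect of the flatten option can be sketched with plain dicts; combine is a hypothetical helper mimicking the behaviour described for YAMLFilesCatalog, not its implementation, and the entries are invented:

```python
def combine(per_file_entries, flatten=True):
    """Merge per-file entry dicts into one namespace (flatten=True),
    or keep one sub-catalog per file (flatten=False)."""
    if flatten:
        merged = {}
        for entries in per_file_entries.values():
            merged.update(entries)   # all sources in a single namespace
        return merged
    return per_file_entries          # one sub-catalog per file

cats = {"a.yaml": {"s1": "src1"}, "b.yaml": {"s2": "src2"}}
```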
- class intake.catalog.zarr.ZarrGroupCatalog(*args, **kwargs)
A catalog of the members of a Zarr group.
- __init__(urlpath, storage_options=None, component=None, metadata=None, consolidated=False, name=None)
- Parameters
- urlpath: str
Location of data file(s), possibly including protocol information
- storage_options: dict, optional
Passed on to storage backend for remote files
- component: str, optional
If None, build a catalog from the root group. If given, build the catalog from the group at this location in the hierarchy.
- metadata: dict, optional
Catalog metadata. If not provided, will be populated from Zarr group attributes.
- consolidated: bool, optional
If True, assume Zarr metadata has been consolidated.
- reload()
Reload catalog if sufficient time has passed
- walk(sofar=None, prefix=None, depth=2)
Get all entries in this catalog and sub-catalogs
- Parameters
- sofar: dict or None
Within recursion, use this dict for output
- prefix: list of str or None
Names of levels already visited
- depth: int
Number of levels to descend; needed to truncate circular references and for cleaner output
- Returns
- Dict where the keys are the entry names in dotted syntax, and the values are entry instances.