End User
These are reference class and function definitions likely to be useful to everyone.
- intake.open_catalog: Create a Catalog object
- intake.registry: Dict of driver: DataSource class
- intake.register_driver: Add runtime driver definition to the list of registered drivers
- intake.unregister_driver: Remove runtime registered driver
- intake.upload: Given a concrete data object, store it at a given location and return a Source
- intake.source.csv.CSVSource: Read CSV files into dataframes
- intake.source.textfiles.TextFilesSource: Read text files as a sequence of lines
- intake.source.jsonfiles.JSONFileSource: Read JSON files as a single dictionary or list
- intake.source.jsonfiles.JSONLinesFileSource: Read a JSONL (https://jsonlines.org/) file and return a list of objects, each being a valid JSON object (e.g. a dictionary or list)
- intake.source.npy.NPySource: Read numpy binary files into an array
- intake.source.zarr.ZarrArraySource: Read Zarr format files into an array
- intake.catalog.local.YAMLFileCatalog: Catalog as described by a single YAML file
- intake.catalog.local.YAMLFilesCatalog: Catalog as described by multiple YAML files
- intake.catalog.zarr.ZarrGroupCatalog: A catalog of the members of a Zarr group
- intake.interface.gui.GUI: Top level GUI panel that contains controls and all visible sub-panels
- intake.open_catalog(uri=None, **kwargs)
Create a Catalog object
Can load YAML catalog files, connect to an intake server, or create any arbitrary Catalog subclass instance. In the general case, the user should supply driver= with a value from the plugins registry which has a container type of catalog. File locations can generally be remote, if specifying a URL protocol. The default behaviour if not specifying the driver is as follows:
- if uri is a single string ending in “yml” or “yaml”, open it as a catalog file
- if uri is a list of strings, a string containing a glob character (“*”) or a string not ending in “y(a)ml”, open as a set of catalog files. In the latter case, assume it is a directory.
- if uri begins with protocol "intake:", connect to a remote Intake server
- if uri is None or missing, create a base Catalog object without entries.
- Parameters
- uri: str or pathlib.Path
Designator for the location of the catalog.
- kwargs:
passed to subclass instance, see documentation of the individual catalog classes. For example, yaml_files_cat (when specifying multiple URIs or a glob string) takes the additional parameter flatten=True|False, specifying whether all data sources are merged in a single namespace, or each file becomes a sub-catalog.
See also
intake.open_yaml_files_cat, intake.open_yaml_file_cat, intake.open_intake_remote
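For example, a minimal sketch of the default behaviours described above (the file names and glob are hypothetical):

```python
import intake

# A single "yml"/"yaml" path opens one catalog file
cat = intake.open_catalog("catalog.yml")

# A glob opens a set of catalog files; flatten=False keeps one
# sub-catalog per file instead of merging entries into one namespace
multi = intake.open_catalog("cats/*.yml", flatten=False)

print(list(cat))  # entry names defined in the catalog
```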
- intake.registry
Mapping from plugin names to the DataSource classes that implement them. These are the names that should appear in the driver: key of each source definition in a catalog. See the Plugin Directory for more details.
- intake.open_*
Set of functions, one for each plugin, for direct opening of a data source. The names are derived from the names of the plugins in the registry at import time.
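A quick sketch of both of these together; the glob path is hypothetical:

```python
import intake

# Driver names registered at import time, each mapping to a DataSource class
print(sorted(intake.registry))

# Each registered driver gets a matching intake.open_<driver> function,
# e.g. the built-in csv driver:
source = intake.open_csv("data/*.csv")
```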
- intake.upload(data, path, **kwargs)
Given a concrete data object, store it at the given location and return a Source
Use this function to publicly share data which you have created in your python session. Intake will try each of the container types, to see if one of them can handle the input data, and write the data to the path given, in the format most appropriate for the data type, e.g., parquet for pandas or dask data-frames.
With the DataSource instance you get back, you can add this to a catalog, or just get the YAML representation for editing (.yaml()) and sharing.
- Parameters
- data: instance
The object to upload and store. In many cases, the dask or in-memory variants are handled equivalently.
- path: str
Location of the output files; can be, for instance, a network drive for sharing over a VPC, or a bucket on a cloud storage service
- kwargs: passed to the writer for fine control
- Returns
- DataSource instance
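A minimal sketch of uploading an in-memory dataframe; the output location is hypothetical:

```python
import intake
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [4.0, 5.0, 6.0]})

# Intake matches the data to the dataframe container and writes it
# in a suitable format (e.g., parquet) at the given path
source = intake.upload(df, "./shared/mydata")

print(source.yaml())  # YAML representation, ready to paste into a catalog
```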
- class intake.interface.gui.GUI(cats=None)
Top level GUI panel that contains controls and all visible sub-panels
This class is responsible for coordinating the inputs and outputs of various sub-panels and their effects on each other.
- Parameters
- cats: list of catalogs
catalogs used to initialize the cat panel
- Attributes
- children: list of panel objects
children that will be used to populate the panel when visible
- panel: panel layout object
instance of a panel layout (row or column) that contains children when visible
- watchers: list of param watchers
watchers that are set on children - cleaned up when visible is set to false.
- add(*args, **kwargs)
Add to list of cats
- property cats
Cats that have been selected from the cat sub-panel
- classmethod from_state(state)
Create a new object from a serialized existing object.
- property item
Item that is selected
- property source_instance
DataSource instance for the current selection and any parameters
- property sources
Sources that have been selected from the source sub-panel
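A minimal sketch of building the GUI around an existing catalog; this assumes the optional panel dependency is installed, and the catalog path is hypothetical:

```python
import intake
from intake.interface.gui import GUI

cat = intake.open_catalog("catalog.yml")

# Pre-populate the cat sub-panel with one catalog
gui = GUI(cats=[cat])

# The .panel attribute is a panel layout; display it in a notebook,
# or mark it servable for `panel serve`
gui.panel.servable()
```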
Source classes
- class intake.source.csv.CSVSource(*args, **kwargs)
Read CSV files into dataframes
Prototype of sources reading dataframe data
- __init__(urlpath, csv_kwargs=None, metadata=None, storage_options=None, path_as_pattern=True)
- Parameters
- urlpath: str or iterable, location of data
May be a local path, or remote path if including a protocol specifier such as 's3://'. May include glob wildcards or format pattern strings. Some examples:
{{ CATALOG_DIR }}data/precipitation.csv
s3://data/*.csv
s3://data/precipitation_{state}_{zip}.csv
s3://data/{year}/{month}/{day}/precipitation.csv
{{ CATALOG_DIR }}data/precipitation_{date:%Y-%m-%d}.csv
- csv_kwargs: dict
Any further arguments to pass to Dask’s read_csv (such as block size) or to the CSV parser in pandas (such as which columns to use, encoding, data-types)
- storage_options: dict
Any parameters that need to be passed to the remote data backend, such as credentials.
- path_as_pattern: bool or str, optional
Whether to treat the path as a pattern (i.e., data_{field}.csv) and create new columns in the output corresponding to pattern fields. If str, is treated as pattern to match on. Default is True.
- discover()
Open resource and populate the source attributes.
- export(path, **kwargs)
Save this data for sharing with other people
Creates a copy of the data in a format appropriate for its container, in the location specified (which can be remote, e.g., s3).
Returns the resultant source object, so that you can, for instance, add it to a catalog (catalog.add(source)) or get its YAML representation (.yaml()).
- persist(ttl=None, **kwargs)
Save data from this source to local persistent storage
- Parameters
- ttl: numeric, optional
Time to live in seconds. If provided, the original source will be accessed and a new persisted version written transparently when more than ttl seconds have passed since the old persisted version was written.
- kwargs: passed to the _persist method on the base container.
- read()
Load entire dataset into a container and return it
- read_partition(i)
Return a part of the data corresponding to the i-th partition.
By default, assumes i should be an integer between zero and npartitions; override for more complex indexing schemes.
- to_dask()
Return a dask container for this data source
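A minimal sketch of using CSVSource directly; the S3 pattern and column names are hypothetical:

```python
from intake.source.csv import CSVSource

# The {state} field in the pattern becomes an extra column in the
# output because path_as_pattern defaults to True
source = CSVSource(
    "s3://data/precipitation_{state}.csv",
    csv_kwargs={"parse_dates": ["date"]},
    storage_options={"anon": True},
)

print(source.discover())  # schema: dtype, shape, npartitions, ...
df = source.read()        # pandas DataFrame from all partitions
ddf = source.to_dask()    # or a lazy dask DataFrame
```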
- class intake.source.zarr.ZarrArraySource(*args, **kwargs)
Read Zarr format files into an array
Zarr is a numerical array storage format which works particularly well with remote and parallel access. For specifics of the format, see https://zarr.readthedocs.io/en/stable/
- __init__(urlpath, storage_options=None, component=None, metadata=None, **kwargs)
The parameters dtype and shape will be determined from the first file, if not given.
- Parameters
- urlpath: str
Location of data file(s), possibly including protocol information
- storage_options: dict
Passed on to storage backend for remote files
- component: str or None
If None, assume the URL points to an array. If given, assume the URL points to a group, and descend the group to find the array at this location in the hierarchy; components are separated by the “/” character.
- kwargs: passed on to dask.array.from_zarr
- discover()
Open resource and populate the source attributes.
- export(path, **kwargs)
Save this data for sharing with other people
Creates a copy of the data in a format appropriate for its container, in the location specified (which can be remote, e.g., s3).
Returns the resultant source object, so that you can, for instance, add it to a catalog (catalog.add(source)) or get its YAML representation (.yaml()).
- persist(ttl=None, **kwargs)
Save data from this source to local persistent storage
- Parameters
- ttl: numeric, optional
Time to live in seconds. If provided, the original source will be accessed and a new persisted version written transparently when more than ttl seconds have passed since the old persisted version was written.
- kwargs: passed to the _persist method on the base container.
- read()
Load entire dataset into a container and return it
- read_partition(i)
Return a part of the data corresponding to the i-th partition.
By default, assumes i should be an integer between zero and npartitions; override for more complex indexing schemes.
- to_dask()
Return a dask container for this data source
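A minimal sketch of reading an array nested inside a Zarr group; the bucket and component names are hypothetical:

```python
from intake.source.zarr import ZarrArraySource

source = ZarrArraySource(
    "s3://bucket/data.zarr",
    storage_options={"anon": True},
    component="temperature",  # "/"-separated path within the group
)

darr = source.to_dask()  # lazy dask array
arr = source.read()      # fully-loaded numpy array
```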
- class intake.source.textfiles.TextFilesSource(*args, **kwargs)
Read textfiles as sequence of lines
Prototype of sources reading sequential data.
Takes a set of files, and returns an iterator over the text in each of them. The files can be local or remote. Extra parameters for encoding, etc., go into storage_options.
- __init__(urlpath, text_mode=True, text_encoding='utf8', compression=None, decoder=None, read=True, metadata=None, storage_options=None)
- Parameters
- urlpath: str or list(str)
Target files. Can be a glob-path (with “*”) and include protocol specified (e.g., “s3://”). Can also be a list of absolute paths.
- text_mode: bool
Whether to open the file in text mode, recoding binary characters on the fly
- text_encoding: str
If text_mode is True, apply this encoding. UTF* is by far the most common
- compression: str or None
If given, decompress the file with the given codec on load. Can be something like “gzip”, “bz2”, or to try to guess from the filename, ‘infer’
- decoder: function, str or None
Use this to decode the contents of files. If None, you will get a list of lines of text/bytes. If a function, it must operate on an open file-like object or a bytes/str instance, and return a list
- read: bool
If decoder is not None, this flag controls whether bytes/str get passed to the function indicated (True) or the open file-like object (False)
- storage_options: dict
Options to pass to the file reader backend, including text-specific encoding arguments, and parameters specific to the remote file-system driver, if using.
- discover()
Open resource and populate the source attributes.
- export(path, **kwargs)
Save this data for sharing with other people
Creates a copy of the data in a format appropriate for its container, in the location specified (which can be remote, e.g., s3).
Returns the resultant source object, so that you can, for instance, add it to a catalog (catalog.add(source)) or get its YAML representation (.yaml()).
- persist(ttl=None, **kwargs)
Save data from this source to local persistent storage
- Parameters
- ttl: numeric, optional
Time to live in seconds. If provided, the original source will be accessed and a new persisted version written transparently when more than ttl seconds have passed since the old persisted version was written.
- kwargs: passed to the _persist method on the base container.
- read()
Load entire dataset into a container and return it
- read_partition(i)
Return a part of the data corresponding to the i-th partition.
By default, assumes i should be an integer between zero and npartitions; override for more complex indexing schemes.
- to_dask()
Return a dask container for this data source
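A minimal sketch of reading compressed text files as lines; the glob path is hypothetical:

```python
from intake.source.textfiles import TextFilesSource

# Each file is decompressed on load and split into lines of text
source = TextFilesSource("logs/*.txt.gz", compression="gzip")

lines = source.read()   # list of text lines from all files
bag = source.to_dask()  # dask bag for parallel processing
```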
- class intake.source.jsonfiles.JSONFileSource(*args, **kwargs)
Read JSON files as a single dictionary or list
The files can be local or remote. Extra parameters for encoding, etc., go into storage_options.
- __init__(urlpath: str, text_mode: bool = True, text_encoding: str = 'utf8', compression: Optional[str] = None, read: bool = True, metadata: Optional[dict] = None, storage_options: Optional[dict] = None)
- Parameters
- urlpath: str
Target file. Can include protocol specified (e.g., “s3://”).
- text_mode: bool
Whether to open the file in text mode, recoding binary characters on the fly
- text_encoding: str
If text_mode is True, apply this encoding. UTF* is by far the most common
- compression: str or None
If given, decompress the file with the given codec on load. Can be something like “zip”, “gzip”, “bz2”, or to try to guess from the filename, ‘infer’
- storage_options: dict
Options to pass to the file reader backend, including text-specific encoding arguments, and parameters specific to the remote file-system driver, if using.
- discover()
Open resource and populate the source attributes.
- export(path, **kwargs)
Save this data for sharing with other people
Creates a copy of the data in a format appropriate for its container, in the location specified (which can be remote, e.g., s3).
Returns the resultant source object, so that you can, for instance, add it to a catalog (catalog.add(source)) or get its YAML representation (.yaml()).
- persist(ttl=None, **kwargs)
Save data from this source to local persistent storage
- Parameters
- ttl: numeric, optional
Time to live in seconds. If provided, the original source will be accessed and a new persisted version written transparently when more than ttl seconds have passed since the old persisted version was written.
- kwargs: passed to the _persist method on the base container.
- read()
Load entire dataset into a container and return it
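A minimal sketch; the file path is hypothetical:

```python
from intake.source.jsonfiles import JSONFileSource

# The whole file is parsed as a single JSON document
source = JSONFileSource("config.json")
obj = source.read()  # the resulting dict or list
```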
- class intake.source.jsonfiles.JSONLinesFileSource(*args, **kwargs)
Read a JSONL (https://jsonlines.org/) file and return a list of objects, each being a valid JSON object (e.g. a dictionary or list)
- __init__(urlpath: str, text_mode: bool = True, text_encoding: str = 'utf8', compression: Optional[str] = None, read: bool = True, metadata: Optional[dict] = None, storage_options: Optional[dict] = None)
- Parameters
- urlpath: str
Target file. Can include protocol specified (e.g., “s3://”).
- text_mode: bool
Whether to open the file in text mode, recoding binary characters on the fly
- text_encoding: str
If text_mode is True, apply this encoding. UTF* is by far the most common
- compression: str or None
If given, decompress the file with the given codec on load. Can be something like “zip”, “gzip”, “bz2”, or to try to guess from the filename, ‘infer’.
- storage_options: dict
Options to pass to the file reader backend, including text-specific encoding arguments, and parameters specific to the remote file-system driver, if using.
- discover()
Open resource and populate the source attributes.
- export(path, **kwargs)
Save this data for sharing with other people
Creates a copy of the data in a format appropriate for its container, in the location specified (which can be remote, e.g., s3).
Returns the resultant source object, so that you can, for instance, add it to a catalog (catalog.add(source)) or get its YAML representation (.yaml()).
- persist(ttl=None, **kwargs)
Save data from this source to local persistent storage
- Parameters
- ttl: numeric, optional
Time to live in seconds. If provided, the original source will be accessed and a new persisted version written transparently when more than ttl seconds have passed since the old persisted version was written.
- kwargs: passed to the _persist method on the base container.
- read()
Load entire dataset into a container and return it
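A minimal sketch; the file path is hypothetical:

```python
from intake.source.jsonfiles import JSONLinesFileSource

# Each line of the file parses to one JSON object
source = JSONLinesFileSource("events.jsonl")
records = source.read()  # list of parsed objects, one per line
```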
- class intake.source.npy.NPySource(*args, **kwargs)
Read numpy binary files into an array
Prototype source showing example of working with arrays
Each file becomes one or more partitions, but partitioning within a file is only along the largest dimension, to ensure contiguous data.
- __init__(path, dtype=None, shape=None, chunks=None, storage_options=None, metadata=None)
The parameters dtype and shape will be determined from the first file, if not given.
- Parameters
- path: str or list of str
Location of data file(s), possibly including glob and protocol information
- dtype: str dtype spec
If known, the dtype (e.g., “int64” or “f4”).
- shape: tuple of int
If known, the length of each axis
- chunks: int
Size of chunks within a file along biggest dimension - must exactly divide each file, or None for one partition per file.
- storage_options: dict
Passed to file-system backend.
- discover()
Open resource and populate the source attributes.
- export(path, **kwargs)
Save this data for sharing with other people
Creates a copy of the data in a format appropriate for its container, in the location specified (which can be remote, e.g., s3).
Returns the resultant source object, so that you can, for instance, add it to a catalog (catalog.add(source)) or get its YAML representation (.yaml()).
- persist(ttl=None, **kwargs)
Save data from this source to local persistent storage
- Parameters
- ttl: numeric, optional
Time to live in seconds. If provided, the original source will be accessed and a new persisted version written transparently when more than ttl seconds have passed since the old persisted version was written.
- kwargs: passed to the _persist method on the base container.
- read()
Load entire dataset into a container and return it
- read_partition(i)
Return a part of the data corresponding to the i-th partition.
By default, assumes i should be an integer between zero and npartitions; override for more complex indexing schemes.
- to_dask()
Return a dask container for this data source
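A minimal sketch of round-tripping a small array; the file path is hypothetical:

```python
import numpy as np
from intake.source.npy import NPySource

np.save("data.npy", np.arange(12).reshape(4, 3))

# chunks must exactly divide the largest dimension: 2 rows per partition
source = NPySource("data.npy", chunks=2)

arr = source.read()      # numpy array (dtype/shape inferred from the file)
darr = source.to_dask()  # chunked dask array
```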
- class intake.catalog.local.YAMLFileCatalog(*args, **kwargs)
Catalog as described by a single YAML file
- __init__(path=None, text=None, autoreload=True, **kwargs)
- Parameters
- path: str
Location of the file to parse (can be remote)
- text: str (DEPRECATED)
YAML contents of catalog, takes precedence over path
- autoreload: bool
Whether to watch the source file for changes; set to False if you want an editable Catalog
- export(path, **kwargs)
Save this data for sharing with other people
Creates a copy of the data in a format appropriate for its container, in the location specified (which can be remote, e.g., s3).
Returns the resultant source object, so that you can, for instance, add it to a catalog (catalog.add(source)) or get its YAML representation (.yaml()).
- persist(ttl=None, **kwargs)
Save data from this source to local persistent storage
- Parameters
- ttl: numeric, optional
Time to live in seconds. If provided, the original source will be accessed and a new persisted version written transparently when more than ttl seconds have passed since the old persisted version was written.
- kwargs: passed to the _persist method on the base container.
- reload()
Reload catalog if sufficient time has passed
- walk(sofar=None, prefix=None, depth=2)
Get all entries in this catalog and sub-catalogs
- Parameters
- sofar: dict or None
Within recursion, use this dict for output
- prefix: list of str or None
Names of levels already visited
- depth: int
Number of levels to descend; needed to truncate circular references and for cleaner output
- Returns
- Dict where the keys are the entry names in dotted syntax, and the values are entry instances.
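A minimal sketch of loading and walking a single catalog file; the path is hypothetical:

```python
from intake.catalog.local import YAMLFileCatalog

cat = YAMLFileCatalog("catalog.yml")

# walk() flattens nested catalogs into dotted entry names
print(list(cat.walk(depth=2)))
```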
- class intake.catalog.local.YAMLFilesCatalog(*args, **kwargs)
Catalog as described by multiple YAML files
- __init__(path, flatten=True, **kwargs)
- Parameters
- path: str
Location of the files to parse (can be remote), including possible glob (*) character(s). Can also be list of paths, without glob characters.
- flatten: bool (True)
Whether to list all entries in the cats at the top level (True) or create sub-cats from each file (False).
- export(path, **kwargs)
Save this data for sharing with other people
Creates a copy of the data in a format appropriate for its container, in the location specified (which can be remote, e.g., s3).
Returns the resultant source object, so that you can, for instance, add it to a catalog (catalog.add(source)) or get its YAML representation (.yaml()).
- persist(ttl=None, **kwargs)
Save data from this source to local persistent storage
- Parameters
- ttl: numeric, optional
Time to live in seconds. If provided, the original source will be accessed and a new persisted version written transparently when more than ttl seconds have passed since the old persisted version was written.
- kwargs: passed to the _persist method on the base container.
- reload()
Reload catalog if sufficient time has passed
- walk(sofar=None, prefix=None, depth=2)
Get all entries in this catalog and sub-catalogs
- Parameters
- sofar: dict or None
Within recursion, use this dict for output
- prefix: list of str or None
Names of levels already visited
- depth: int
Number of levels to descend; needed to truncate circular references and for cleaner output
- Returns
- Dict where the keys are the entry names in dotted syntax, and the values are entry instances.
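A minimal sketch; the glob path is hypothetical:

```python
from intake.catalog.local import YAMLFilesCatalog

# flatten=True merges entries from all files into one namespace;
# flatten=False would make each file a sub-catalog instead
cat = YAMLFilesCatalog("cats/*.yml", flatten=True)
print(list(cat))
```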
- class intake.catalog.zarr.ZarrGroupCatalog(*args, **kwargs)
A catalog of the members of a Zarr group.
- __init__(urlpath, storage_options=None, component=None, metadata=None, consolidated=False, name=None)
- Parameters
- urlpath: str
Location of data file(s), possibly including protocol information
- storage_options: dict, optional
Passed on to storage backend for remote files
- component: str, optional
If None, build a catalog from the root group. If given, build the catalog from the group at this location in the hierarchy.
- metadata: dict, optional
Catalog metadata. If not provided, will be populated from Zarr group attributes.
- consolidated: bool, optional
If True, assume Zarr metadata has been consolidated.
- export(path, **kwargs)
Save this data for sharing with other people
Creates a copy of the data in a format appropriate for its container, in the location specified (which can be remote, e.g., s3).
Returns the resultant source object, so that you can, for instance, add it to a catalog (catalog.add(source)) or get its YAML representation (.yaml()).
- persist(ttl=None, **kwargs)
Save data from this source to local persistent storage
- Parameters
- ttl: numeric, optional
Time to live in seconds. If provided, the original source will be accessed and a new persisted version written transparently when more than ttl seconds have passed since the old persisted version was written.
- kwargs: passed to the _persist method on the base container.
- reload()
Reload catalog if sufficient time has passed
- walk(sofar=None, prefix=None, depth=2)
Get all entries in this catalog and sub-catalogs
- Parameters
- sofar: dict or None
Within recursion, use this dict for output
- prefix: list of str or None
Names of levels already visited
- depth: int
Number of levels to descend; needed to truncate circular references and for cleaner output
- Returns
- Dict where the keys are the entry names in dotted syntax, and the values are entry instances.
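A minimal sketch of cataloging the members of a Zarr group; the bucket location is hypothetical:

```python
from intake.catalog.zarr import ZarrGroupCatalog

# Each array in the group becomes a catalog entry
cat = ZarrGroupCatalog(
    "s3://bucket/group.zarr",
    storage_options={"anon": True},
    consolidated=True,  # use consolidated metadata if available
)
print(list(cat))  # member names of the group
```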