End User

These are reference class and function definitions likely to be useful to everyone.

intake.open_catalog([uri])

Create a Catalog object

intake.registry

Dict of driver: DataSource class

intake.register_driver(name, value[, ...])

Add runtime driver definition to list of registered drivers

intake.unregister_driver(name)

Remove runtime registered driver

intake.source.csv.CSVSource(*args, **kwargs)

Read CSV files into dataframes

intake.source.textfiles.TextFilesSource(...)

Read textfiles as sequence of lines

intake.source.jsonfiles.JSONFileSource(...)

Read JSON files as a single dictionary or list

intake.source.jsonfiles.JSONLinesFileSource(...)

Read a JSONL (https://jsonlines.org/) file and return a list of objects, each being a valid JSON object (e.g. a dictionary or list)

intake.source.npy.NPySource(*args, **kwargs)

Read numpy binary files into an array

intake.source.zarr.ZarrArraySource(*args, ...)

Read Zarr format files into an array

intake.catalog.local.YAMLFileCatalog(*args, ...)

Catalog as described by a single YAML file

intake.catalog.local.YAMLFilesCatalog(*args, ...)

Catalog as described by multiple YAML files

intake.catalog.zarr.ZarrGroupCatalog(*args, ...)

A catalog of the members of a Zarr group.

intake.interface.gui.GUI([cats])

Top level GUI panel

intake.open_catalog(uri=None, **kwargs)

Create a Catalog object

New in V2: if the URI is a single file and loading it as a V1 catalog fails because of the stated version, it will be opened again as a V2 catalog. This means reading the file twice, so calling from_yaml_file directly is better.

Can load YAML catalog files, connect to an intake server, or create any arbitrary Catalog subclass instance. In the general case, the user should supply driver= with a value from the plugins registry which has a container type of catalog. File locations can generally be remote, if specifying a URL protocol.

The default behaviour if not specifying the driver is as follows:

  • if uri is a single string ending in “yml” or “yaml”, open it as a catalog file

  • if uri is a list of strings, a string containing a glob character (“*”) or a string not ending in “y(a)ml”, open as a set of catalog files. In the latter case, assume it is a directory.

  • if uri begins with protocol "intake:", connect to a remote Intake server

  • if uri is None or missing, create a base Catalog object without entries.

Parameters
uri: str or pathlib.Path

Designator for the location of the catalog.

kwargs:

passed to subclass instance, see documentation of the individual catalog classes. For example, yaml_files_cat (when specifying multiple uris or a glob string) takes the additional parameter flatten=True|False, specifying whether all data sources are merged in a single namespace, or each file becomes a sub-catalog.

See also

intake.open_yaml_files_cat, intake.open_yaml_file_cat
intake.open_intake_remote

intake.registry

Mapping from plugin names to the DataSource classes that implement them. These are the names that should appear in the driver: key of each source definition in a catalog. See Plugin Directory for more details.

intake.open_*

Set of functions, one for each plugin, for direct opening of a data source. The names are derived from the names of the plugins in the registry at import time.

class intake.interface.gui.GUI(cats=None)

Top level GUI panel

This class is responsible for coordinating the inputs and outputs of various sup-panels and their effects on each other.

Parameters
cats: dict of catalogs

catalogs used to initialize the cat panel, {display_name: cat_object}

property cats

Cats that have been selected from the cat sub-panel

property sources

Sources that have been selected from the source sub-panel

Source classes

class intake.source.csv.CSVSource(*args, **kwargs)

Read CSV files into dataframes

Backward compatibility for V1 catalogs.

__init__(urlpath, storage_options=None, metadata=None, **kwargs)
discover()

Open resource and populate the source attributes.

read()

Load entire dataset into a container and return it

read_partition(i)

Return a part of the data corresponding to i-th partition.

By default, assumes i should be an integer between zero and npartitions; override for more complex indexing schemes.

to_dask()

Return a dask container for this data source

class intake.source.zarr.ZarrArraySource(*args, **kwargs)

Read Zarr format files into an array

Zarr is a numerical array storage format which works particularly well with remote and parallel access. For specifics of the format, see https://zarr.readthedocs.io/en/stable/

__init__(urlpath, storage_options=None, component=None, metadata=None)
Parameters
urlpath: str

Location of data file(s), possibly including protocol information

storage_options: dict

Passed on to storage backend for remote files

component: str or None

If None, assume the URL points to an array. If given, assume the URL points to a group, and descend the group to find the array at this location in the hierarchy; components are separated by the “/” character.

discover()

Open resource and populate the source attributes.

read()

Load entire dataset into a container and return it

read_partition(i)

Return a part of the data corresponding to i-th partition.

By default, assumes i should be an integer between zero and npartitions; override for more complex indexing schemes.

to_dask()

Return a dask container for this data source

class intake.source.textfiles.TextFilesSource(*args, **kwargs)

Read textfiles as sequence of lines

Prototype of sources reading sequential data.

Takes a set of files, and returns an iterator over the text in each of them. The files can be local or remote. Extra parameters for encoding, etc., go into storage_options.

__init__(urlpath, text_mode=True, text_encoding='utf8', compression=None, decoder=None, metadata=None, storage_options=None)
Parameters
urlpath: str or list(str)

Target files. Can be a glob-path (with “*”) and include protocol specified (e.g., “s3://”). Can also be a list of absolute paths.

text_mode: bool

Whether to open the file in text mode, recoding binary characters on the fly

text_encoding: str

If text_mode is True, apply this encoding. UTF* is by far the most common

compression: str or None

If given, decompress the file with the given codec on load. Can be something like “gzip”, “bz2”, or to try to guess from the filename, ‘infer’

decoder: function, str or None

Use this to decode the contents of files. If None, you will get a list of lines of text/bytes. If a function, it must operate on an open file-like object or a bytes/str instance, and return a list

storage_options: dict

Options to pass to the file reader backend, including text-specific encoding arguments, and parameters specific to the remote file-system driver, if using.

discover()

Open resource and populate the source attributes.

read()

Load entire dataset into a container and return it

read_partition(i)

Return a part of the data corresponding to i-th partition.

By default, assumes i should be an integer between zero and npartitions; override for more complex indexing schemes.

to_dask()

Return a dask container for this data source
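For a local, uncompressed file with the default decoder, the result is roughly equivalent to this stdlib-only sketch (an illustration of the output shape, not the actual implementation, which uses fsspec to open local or remote files):

```python
import io

# Stand-in for an opened text file; TextFilesSource would open it via fsspec.
f = io.StringIO("first line\nsecond line\n")
lines = list(f)  # one str per line, newlines included
print(lines)     # ['first line\n', 'second line\n']
```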

class intake.source.jsonfiles.JSONFileSource(*args, **kwargs)

Read JSON files as a single dictionary or list

The files can be local or remote. Extra parameters for encoding, etc., go into storage_options.

__init__(urlpath: str, text_mode: bool = True, text_encoding: str = 'utf8', compression: Optional[str] = None, read: bool = True, metadata: Optional[dict] = None, storage_options: Optional[dict] = None)
Parameters
urlpath: str

Target file. Can include protocol specified (e.g., “s3://”).

text_mode: bool

Whether to open the file in text mode, recoding binary characters on the fly

text_encoding: str

If text_mode is True, apply this encoding. UTF* is by far the most common

compression: str or None

If given, decompress the file with the given codec on load. Can be something like “zip”, “gzip”, “bz2”, or to try to guess from the filename, ‘infer’

storage_options: dict

Options to pass to the file reader backend, including text-specific encoding arguments, and parameters specific to the remote file-system driver, if using.

discover()

Open resource and populate the source attributes.

read()

Load entire dataset into a container and return it
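For a local, uncompressed file, read() behaves roughly like json.load on the opened file: the whole file becomes one dictionary or list (stdlib-only sketch, not the actual implementation):

```python
import io
import json

# Stand-in for an opened JSON file.
f = io.StringIO('{"name": "example", "values": [1, 2, 3]}')
obj = json.load(f)   # the entire file as a single dict (or list)
print(obj["values"])  # [1, 2, 3]
```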

class intake.source.jsonfiles.JSONLinesFileSource(*args, **kwargs)

Read a JSONL (https://jsonlines.org/) file and return a list of objects, each being a valid JSON object (e.g. a dictionary or list)

__init__(urlpath: str, text_mode: bool = True, text_encoding: str = 'utf8', compression: Optional[str] = None, read: bool = True, metadata: Optional[dict] = None, storage_options: Optional[dict] = None)
Parameters
urlpath: str

Target file. Can include protocol specified (e.g., “s3://”).

text_mode: bool

Whether to open the file in text mode, recoding binary characters on the fly

text_encoding: str

If text_mode is True, apply this encoding. UTF* is by far the most common

compression: str or None

If given, decompress the file with the given codec on load. Can be something like “zip”, “gzip”, “bz2”, or to try to guess from the filename, ‘infer’.

storage_options: dict

Options to pass to the file reader backend, including text-specific encoding arguments, and parameters specific to the remote file-system driver, if using.

discover()

Open resource and populate the source attributes.

head(nrows: int = 100)

Return the first nrows lines from the file

read()

Load entire dataset into a container and return it
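The line-by-line semantics of read() and head() can be sketched with the stdlib alone (an illustration, not the actual implementation):

```python
import io
import json
from itertools import islice

# Stand-in for an opened JSONL file: one JSON object per line.
f = io.StringIO('{"a": 1}\n{"a": 2}\n{"a": 3}\n')

# read(): parse every non-empty line into a Python object
objects = [json.loads(line) for line in f if line.strip()]
print(objects)  # [{'a': 1}, {'a': 2}, {'a': 3}]

# head(nrows) is similar but stops after the first nrows lines:
f.seek(0)
first_two = [json.loads(line) for line in islice(f, 2)]
print(first_two)  # [{'a': 1}, {'a': 2}]
```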

class intake.source.npy.NPySource(*args, **kwargs)

Read numpy binary files into an array

Prototype source showing example of working with arrays

Each file becomes one or more partitions, but partitioning within a file is only along the largest dimension, to ensure contiguous data.

__init__(path, storage_options=None, metadata=None)

The parameters dtype and shape will be determined from the first file, if not given.

Parameters
path: str or list of str

Location of data file(s), possibly including glob and protocol information

storage_options: dict

Passed to file-system backend.

discover()

Open resource and populate the source attributes.

read()

Load entire dataset into a container and return it

read_partition(i)

Return a part of the data corresponding to i-th partition.

By default, assumes i should be an integer between zero and npartitions; override for more complex indexing schemes.

to_dask()

Return a dask container for this data source
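The partitioning rule stated above (split only along the largest dimension, keeping data contiguous) can be sketched with numpy (assumes numpy is installed; this illustrates the layout, not the actual implementation):

```python
import numpy as np

arr = np.arange(24).reshape(6, 4)       # largest dimension is axis 0 (size 6)
parts = np.array_split(arr, 3, axis=0)  # 3 partitions of contiguous rows
print([p.shape for p in parts])         # [(2, 4), (2, 4), (2, 4)]

# Stacking the partitions back along axis 0 recovers the original array.
assert (np.vstack(parts) == arr).all()
```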

class intake.catalog.local.YAMLFileCatalog(*args, **kwargs)

Catalog as described by a single YAML file

__init__(path=None, text=None, autoreload=True, **kwargs)
Parameters
path: str

Location of the file to parse (can be remote)

text: str (DEPRECATED)

YAML contents of catalog, takes precedence over path

autoreload: bool

Whether to watch the source file for changes; set to False if you want an editable Catalog

reload()

Reload catalog if sufficient time has passed

walk(sofar=None, prefix=None, depth=2)

Get all entries in this catalog and sub-catalogs

Parameters
sofar: dict or None

Within recursion, use this dict for output

prefix: list of str or None

Names of levels already visited

depth: int

Number of levels to descend; needed to truncate circular references and for cleaner output

Returns
Dict where the keys are the entry names in dotted syntax, and the values are entry instances.

class intake.catalog.local.YAMLFilesCatalog(*args, **kwargs)

Catalog as described by multiple YAML files

__init__(path, flatten=True, **kwargs)
Parameters
path: str

Location of the files to parse (can be remote), including possible glob (*) character(s). Can also be list of paths, without glob characters.

flatten: bool (True)

Whether to list all entries in the cats at the top level (True) or create sub-cats from each file (False).

reload()

Reload catalog if sufficient time has passed

walk(sofar=None, prefix=None, depth=2)

Get all entries in this catalog and sub-catalogs

Parameters
sofar: dict or None

Within recursion, use this dict for output

prefix: list of str or None

Names of levels already visited

depth: int

Number of levels to descend; needed to truncate circular references and for cleaner output

Returns
Dict where the keys are the entry names in dotted syntax, and the values are entry instances.

class intake.catalog.zarr.ZarrGroupCatalog(*args, **kwargs)

A catalog of the members of a Zarr group.

__init__(urlpath, storage_options=None, component=None, metadata=None, consolidated=False, name=None)
Parameters
urlpath: str

Location of data file(s), possibly including protocol information

storage_options: dict, optional

Passed on to storage backend for remote files

component: str, optional

If None, build a catalog from the root group. If given, build the catalog from the group at this location in the hierarchy.

metadata: dict, optional

Catalog metadata. If not provided, will be populated from Zarr group attributes.

consolidated: bool, optional

If True, assume Zarr metadata has been consolidated.

reload()

Reload catalog if sufficient time has passed

walk(sofar=None, prefix=None, depth=2)

Get all entries in this catalog and sub-catalogs

Parameters
sofar: dict or None

Within recursion, use this dict for output

prefix: list of str or None

Names of levels already visited

depth: int

Number of levels to descend; needed to truncate circular references and for cleaner output

Returns
Dict where the keys are the entry names in dotted syntax, and the values are entry instances.