Base Classes

This is a reference listing of the base API classes, intended mainly for developers.

intake.source.base.DataSourceBase(*args, ...)

An object which can produce data

intake.source.base.DataSource(*args, **kwargs)

A Data Source with all optional functionality

intake.source.base.PatternMixin()

Helper class to provide file-name parsing abilities to a driver class

intake.container.base.RemoteSource(*args, ...)

Base class for all DataSources living on an Intake server

intake.catalog.Catalog(*args, **kwargs)

Manages a hierarchy of data sources as a collective unit.

intake.catalog.entry.CatalogEntry(*args, ...)

A single item appearing in a catalog

intake.catalog.local.UserParameter(*args, ...)

A user-settable item that is passed to a DataSource upon instantiation.

intake.auth.base.BaseAuth(*args, **kwargs)

Base class for authorization

intake.source.cache.BaseCache(driver, spec)

Provides utilities for managing cached data files.

intake.source.derived.AliasSource(*args, ...)

Refer to another named source, unmodified

intake.source.base.Schema(**kwargs)

Holds details of data description for any type of data-source

intake.container.persist.PersistStore(*args, ...)

Specialised catalog for persisted data-sources

class intake.source.base.DataSource(*args, **kwargs)

A Data Source with all optional functionality

When subclassed, child classes will have the base data source functionality, plus caching, plotting and persistence abilities.

plot

Accessor for HVPlot methods. See Plotting for more details.
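
A minimal sketch of a driver subclass is shown below; the class, its parameters and the data it produces are hypothetical, but the pattern of setting the class attributes and overriding _get_schema, _get_partition and _close follows the usual intake driver convention.

    import numpy as np
    import pandas as pd

    from intake.source.base import DataSource, Schema


    class RandomSource(DataSource):
        """Hypothetical driver producing random dataframes (illustration only)."""
        container = 'dataframe'
        name = 'random'
        version = '0.0.1'
        partition_access = True

        def __init__(self, nrows=10, nparts=2, metadata=None):
            self.nrows = nrows
            self.nparts = nparts
            super().__init__(metadata=metadata)

        def _get_schema(self):
            # Describe the data without loading all of it
            return Schema(dtype={'x': 'float64'},
                          shape=(self.nrows * self.nparts, 1),
                          npartitions=self.nparts,
                          extra_metadata={})

        def _get_partition(self, i):
            # Produce one partition of data
            return pd.DataFrame({'x': np.random.random(self.nrows)})

        def read(self):
            self._load_metadata()
            return pd.concat([self.read_partition(i)
                              for i in range(self.npartitions)])

        def _close(self):
            pass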

class intake.catalog.Catalog(*args, **kwargs)

Manages a hierarchy of data sources as a collective unit.

A catalog is a set of available data sources for an individual entity (remote server, local file, or a local directory of files). This can be expanded to include a collection of subcatalogs, which are then managed as a single unit.

A catalog is created with a single URI or a group of URIs. A URI can either be a URL or a file path.

Each catalog in the hierarchy is responsible for caching the most recent refresh time to prevent overeager queries.
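
For example, a catalog can be opened from a local YAML file, a list of files, or a remote URL (all paths below are placeholders):

    import intake

    cat = intake.open_catalog('catalog.yaml')             # local file
    cat2 = intake.open_catalog(['one.yaml', 'two.yaml'])  # several files
    cat3 = intake.open_catalog('https://example.com/cat.yaml')  # remote

    list(cat)   # names of the entries in the catalog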

Attributes
metadata: dict

Arbitrary information to carry along with the data source specs.

configure_new(**kwargs)

Create a new instance of this source with altered arguments

Enables the picking of options and re-evaluating templates from any user-parameters associated with this source, or overriding any of the init arguments.

Returns a new data source instance. The instance will be recreated from the original entry definition in a catalog if this source was originally created from a catalog.
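
Catalogs are themselves data sources, so this behaves the same way for any source obtained from a catalog. A short sketch, assuming the hypothetical entry below defines a user parameter named year:

    src = cat.some_source                    # hypothetical entry
    src2020 = src.configure_new(year=2020)   # new instance, templates re-evaluated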

discover()

Open resource and populate the source attributes.

filter(func)

Create a Catalog of a subset of entries based on a condition

Warning

This function operates on CatalogEntry objects not DataSource objects.

Note

Note that, whatever specific class this is performed on, the return instance is a Catalog. The entries are passed unmodified, so they will still reference the original catalog instance and include its details, such as the directory.

Parameters
func: function

This should take a CatalogEntry and return True or False. Those items returning True will be included in the new Catalog, with the same entry names

Returns
Catalog

New catalog with Entries that still refer to their parents
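
For instance, to keep only entries whose description mentions a given word (the predicate receives CatalogEntry objects, so use describe() rather than opening the source):

    sub = cat.filter(
        lambda entry: 'temperature' in (entry.describe().get('description') or ''))
    list(sub)   # matching entry names, unchanged from the original catalog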

force_reload()

Imperatively reload the data now

classmethod from_dict(entries, **kwargs)

Create Catalog from the given set of entries

Parameters
entries: dict-like

A mapping of name:entry which supports dict-like functionality, e.g., is derived from collections.abc.Mapping.

kwargs: passed on to the constructor

Things like metadata, name; see __init__.

Returns
Catalog instance
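
A minimal sketch building a catalog in memory from a LocalCatalogEntry (the driver arguments and file path are placeholders):

    from intake.catalog import Catalog
    from intake.catalog.local import LocalCatalogEntry

    entry = LocalCatalogEntry(name='example',
                              description='Example CSV data',
                              driver='csv',
                              args={'urlpath': 'data/example.csv'})
    cat = Catalog.from_dict({'example': entry}, name='in_memory_cat')
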
get(**kwargs)

Create a new instance of this source with altered arguments

Enables the picking of options and re-evaluating templates from any user-parameters associated with this source, or overriding any of the init arguments.

Returns a new data source instance. The instance will be recreated from the original entry definition in a catalog if this source was originally created from a catalog.

property gui

Source GUI, with parameter selection and plotting

items()

Get an iterator over (key, source) tuples for the catalog entries.

keys()

Entry names in this catalog as an iterator (alias for __iter__)

pop(key)

Remove entry from catalog and return it

This relies on the _entries attribute being mutable, which it normally is. Note that if a catalog automatically reloads, any entry removed here may soon reappear

Parameters
key: str

Key of the entry in the catalog

reload()

Reload catalog if sufficient time has passed

save(url, storage_options=None)

Output this catalog to a file as YAML

Parameters
url: str

Location to save to, perhaps remote

storage_options: dict

Extra arguments for the file-system
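
For example, to write the catalog (including any entries added at runtime) back out as YAML:

    cat.save('my_catalog.yaml')                      # local file
    cat.save('s3://bucket/my_catalog.yaml',          # hypothetical remote location
             storage_options={'anon': False})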

serialize()

Produce YAML version of this catalog.

Note that this is not the same as .yaml(), which produces a YAML block referring to this catalog.

values()

Get an iterator over the sources for catalog entries.
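
Together with keys() and items(), this lets a catalog be used much like a dictionary of sources:

    for name in cat:                  # same as cat.keys()
        print(name)

    for name, source in cat.items():
        print(name, source.container)

    sources = list(cat.values())      # instantiate every entry's source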

walk(sofar=None, prefix=None, depth=2)

Get all entries in this catalog and sub-catalogs

Parameters
sofar: dict or None

Within recursion, use this dict for output

prefix: list of str or None

Names of levels already visited

depth: int

Number of levels to descend; needed to truncate circular references and for cleaner output

Returns
Dict where the keys are the entry names in dotted syntax, and the values are entry instances.
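
A sketch of its use on a catalog containing sub-catalogs (entry names are placeholders):

    flat = cat.walk(depth=2)
    # e.g. {'subcat.csv_source': <entry>, 'top_level_source': <entry>}
    for dotted_name, entry in flat.items():
        print(dotted_name)
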
class intake.catalog.entry.CatalogEntry(*args, **kwargs)

A single item appearing in a catalog

This is the base class, used by local entries (i.e., read from a YAML file) and by remote entries (read from a server).

describe()

Get a dictionary of attributes of this entry.

Returns: dict with keys
name: str

The name of the catalog entry.

container: str

kind of container used by this data source

description: str

Markdown-friendly description of data source

direct_access: str

Mode of remote access: forbid, allow, force

user_parameters: list[dict]

List of user parameters defined by this entry
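
An illustrative (made-up) return value:

    entry = cat._entries['example']   # hypothetical entry name
    entry.describe()
    # {'name': 'example',
    #  'container': 'dataframe',
    #  'description': 'Example CSV data',
    #  'direct_access': 'forbid',
    #  'user_parameters': []}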

get(**user_parameters)

Open the data source.

Equivalent to calling the catalog entry like a function.

Note: entry(), entry.attr, entry[item] check for persisted sources, but directly calling .get() will always ignore the persisted store (equivalent to self._pmode=='never').

Parameters
user_parameters: dict

Values for user-configurable parameters for this data source

Returns
DataSource
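
A sketch, where the entry name and the year parameter are placeholders:

    entry = cat._entries['example']   # the underlying CatalogEntry
    src = entry(year=2020)            # call the entry like a function
    src = entry.get(year=2020)        # same, but always ignores the persist store
    df = src.read()
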
property has_been_persisted

For the source created with the given args, has it been persisted?

property plots

List custom associated quick-plots

class intake.container.base.RemoteSource(*args, **kwargs)

Base class for all DataSources living on an Intake server

to_dask()

Return a dask container for this data source

class intake.catalog.local.UserParameter(*args, **kwargs)

A user-settable item that is passed to a DataSource upon instantiation.

For string parameters, the default may include special functions of the form func(args), which may be expanded from environment variables or by executing a shell command.

Parameters
name: str

the key that appears in the DataSource argument strings

description: str

narrative text

type: str

one of the type names in COERCION_RULES

default: type value

Same type as type. If a str, it may include the special functions env, shell, client_env, client_shell.

min, max: type value

for validation of user input

allowed: list of type

for validation of user input

describe()

Information about this parameter

expand_defaults(client=False, getenv=True, getshell=True)

Compile env, client_env, shell and client_shell commands

validate(value)

Does value meet parameter requirements?
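
A sketch constructing a parameter directly and validating input (the values are illustrative):

    from intake.catalog.local import UserParameter

    par = UserParameter(name='year',
                        description='Year of data to load',
                        type='int',
                        default=2019,
                        min=2000,
                        max=2030)
    par.describe()       # dict of the settings above
    par.validate(2025)   # returns the coerced value, or raises if invalid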

class intake.auth.base.BaseAuth(*args, **kwargs)

Base class for authorization

Subclass this and override the methods to implement a new type of auth.

This basic class allows all access.

allow_access(header, source, catalog)

Is the request with the given HTTP header allowed to access the given data source?

Parameters
header: dict

The HTTP header from the incoming request

source: CatalogEntry

The data source the user wants to access.

catalog: Catalog

The catalog object containing this data source.

allow_connect(header)

Is the request with the given header allowed to talk to the server?

Parameters
header: dict

The HTTP header from the incoming request

get_case_insensitive(dictionary, key, default=None)

Case-insensitive search of a dictionary for key.

Returns the value if a key match is found, otherwise default.
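
A minimal sketch of a custom auth class; the shared-secret scheme and the header name used are hypothetical:

    from intake.auth.base import BaseAuth


    class SharedSecretAuth(BaseAuth):
        """Hypothetical auth: require a fixed token in the request headers."""

        def __init__(self, secret):
            self.secret = secret

        def allow_connect(self, header):
            return self.get_case_insensitive(header, 'intake-secret', '') == self.secret

        def allow_access(self, header, source, catalog):
            # Every authenticated client may see every source here; a real
            # implementation could inspect the source or catalog metadata.
            return self.allow_connect(header)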

class intake.source.cache.BaseCache(driver, spec, catdir=None, cache_dir=None, storage_options={})

Provides utilities for managing cached data files.

Providers of caching functionality should derive from this, and appear as entries in registry. The principal methods to override are _make_files(), _load() and _from_metadata().
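
A bare skeleton of such a subclass; the private-method signatures are not documented in this listing, so the argument lists shown are assumptions rather than the actual API:

    from intake.source.cache import BaseCache


    class MyCache(BaseCache):
        """Skeleton only; method signatures are assumed, not authoritative."""

        def _make_files(self, urlpath, **kwargs):
            # Decide which local files will hold the cached copies
            raise NotImplementedError

        def _load(self, files_in, files_out, urlpath, meta=True):
            # Copy/transform the remote data into the local cache files
            raise NotImplementedError

        def _from_metadata(self, metadata):
            # Reconstruct cache paths from previously stored metadata
            raise NotImplementedError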

clear_all()

Clears all cache and metadata.

clear_cache(urlpath)

Clears cache and metadata for a given urlpath.

Parameters
urlpath: str, location of data

May be a local path, or remote path if including a protocol specifier such as 's3://'. May include glob wildcards.

get_metadata(urlpath)

Parameters
urlpath: str, location of data

May be a local path, or remote path if including a protocol specifier such as 's3://'. May include glob wildcards.

Returns
Metadata (dict) about a given urlpath.

load(urlpath, output=None, **kwargs)

Downloads data from a given url, generates a hashed filename, logs metadata, and caches it locally.

Parameters
urlpath: str, location of data

May be a local path, or remote path if including a protocol specifier such as 's3://'. May include glob wildcards.

output: bool

Whether to show progress bars; turn off for testing

Returns
List of local cache_paths to be opened instead of the remote file(s). If caching is disabled, the urlpath is returned.

class intake.source.derived.AliasSource(*args, **kwargs)

Refer to another named source, unmodified

The purpose of an Alias is to be able to refer to other source(s) in the same catalog or an external catalog, perhaps leaving the choice of which target to load up to the user. This source makes no sense outside of a catalog.

The “target” for an aliased data source will normally be a string. In the simple case, it is the name of a data source in the same catalog. However, we use the syntax “catalog:source” to refer to sources in other catalogs, where the part before “:” will be passed to intake.open_catalog, together with any keyword arguments from cat_kwargs.

In this case, the output of the target source is not modified, but this class acts as a prototype ‘derived’ source for processing the output of some standard driver.

After initial discovery, the source’s container and other details will be updated from the target; initially, the AliasSource container is not any of the standard types.

discover()

Open resource and populate the source attributes.

read()

Load entire dataset into a container and return it

read_chunked()

Return iterator over container fragments of data source

read_partition(i)

Return a part of the data corresponding to i-th partition.

By default, assumes i should be an integer between zero and npartitions; override for more complex indexing schemes.

to_dask()

Return a dask container for this data source
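
These are the standard DataSource reading methods; for an alias (or any other source) typical use looks like this, where the entry name is a placeholder:

    src = cat.alias_to_weather     # hypothetical aliased entry
    src.discover()                 # populate dtype, shape, npartitions, metadata
    df = src.read()                # whole dataset in one container
    first = src.read_partition(0)  # just the first partition
    ddf = src.to_dask()            # lazy dask container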

class intake.source.base.PatternMixin

Helper class to provide file-name parsing abilities to a driver class

class intake.source.base.Schema(**kwargs)

Holds details of data description for any type of data-source

This should always be pickleable, so that it can be sent from a server to a client, and contain all information needed to recreate a RemoteSource on the client.
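
Drivers usually construct one of these in their _get_schema method; the field values below are illustrative only:

    from intake.source.base import Schema

    schema = Schema(dtype={'x': 'float64', 'y': 'int64'},
                    shape=(1000, 2),
                    npartitions=4,
                    extra_metadata={'note': 'illustrative only'})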

class intake.container.persist.PersistStore(*args, **kwargs)

Specialised catalog for persisted data-sources

add(key, source)

Add the persisted source to the store under the given key

key: str

The unique token of the un-persisted, original source

source: DataSource instance

The thing to add to the persisted catalogue, referring to persisted data

backtrack(source)

Given a unique key in the store, recreate the original source

get_tok(source)

Get string token from object

Strings are assumed to already be a token; for a source or entry, check whether it is a persisted thing (“original_tok” is in its metadata), otherwise generate its own token.

needs_refresh(source)

Has the (persisted) source expired in the store?

Will return True if the source is not in the store at all, if its TTL is set to None, or if more seconds than the TTL have passed.

refresh(key)

Recreate and re-persist the source for the given unique ID

remove(source, delfiles=True)

Remove a dataset from the persist store

source: str or DataSource

If a str, this is the unique ID of the original source, which is the key of the persisted dataset within the store. If a source, can be either the original or the persisted source.

delfiles: bool

Whether to remove the on-disc artifact
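
A sketch of the round trip through the store, using only the methods listed above; the CSV path is a placeholder, and normally persisting is driven by DataSource.persist() rather than by calling the store directly:

    import intake
    from intake.container.persist import PersistStore

    src = intake.open_csv('data/example.csv')   # placeholder path
    persisted = src.persist()                   # write a persisted copy

    store = PersistStore()
    tok = store.get_tok(src)                    # unique token for the original
    orig = store.backtrack(persisted)           # recover the original source
    if store.needs_refresh(persisted):
        store.refresh(tok)                      # re-persist from the original
    store.remove(tok, delfiles=True)            # drop entry and on-disc artifacts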