Base Classes
This is a reference API class listing, useful mainly for developers.
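intake.source.base.DataSourceBase | An object which can produce data
intake.source.base.DataSource | A Data Source with all optional functionality
intake.catalog.Catalog | Manages a hierarchy of data sources as a collective unit.
intake.catalog.entry.CatalogEntry | A single item appearing in a catalog
intake.catalog.local.UserParameter | A user-settable item that is passed to a DataSource upon instantiation.
intake.source.derived.AliasSource | Refer to another named source, unmodified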
- class intake.source.base.DataSource(*args, **kwargs)
A Data Source with all optional functionality
When subclassed, child classes will have the base data source functionality, plus caching, plotting and persistence abilities.
- class intake.catalog.Catalog(*args, **kwargs)
Manages a hierarchy of data sources as a collective unit.
A catalog is a set of available data sources for an individual entity (remote server, local file, or a local directory of files). This can be expanded to include a collection of subcatalogs, which are then managed as a single unit.
A catalog is created with a single URI or a group of URIs. A URI can either be a URL or a file path.
Each catalog in the hierarchy is responsible for caching the most recent refresh time to prevent overeager queries.
- Attributes
- metadata: dict
Arbitrary information to carry along with the data source specs.
- configure_new(**kwargs)
Create a new instance of this source with altered arguments
Enables the picking of options and re-evaluating templates from any user-parameters associated with this source, or overriding any of the init arguments.
Returns a new data source instance. The instance will be recreated from the original entry definition in a catalog if this source was originally created from a catalog.
- discover()
Open resource and populate the source attributes.
- filter(func)
Create a Catalog of a subset of entries based on a condition
Warning
This function operates on CatalogEntry objects, not DataSource objects.
Note
Whatever specific class this is called on, the returned instance is a Catalog. The entries are passed through unmodified, so they still reference the original catalog instance and include its details, such as its directory.
- Parameters
- func: function
This should take a CatalogEntry and return True or False. Those items returning True will be included in the new Catalog, with the same entry names.
- Returns
- Catalog
New catalog with Entries that still refer to their parents
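The filtering semantics described above can be sketched in plain Python. This is only an illustration, not Intake's implementation: `Entry` and `filter_entries` are hypothetical stand-ins for CatalogEntry and Catalog.filter, and a dict plays the role of the catalog.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical stand-in for a CatalogEntry."""
    name: str
    container: str


def filter_entries(entries, func):
    """Keep the entries for which func(entry) is true, under the same names."""
    return {name: entry for name, entry in entries.items() if func(entry)}


entries = {
    "trips": Entry("trips", "dataframe"),
    "images": Entry("images", "ndarray"),
    "zones": Entry("zones", "dataframe"),
}

# The predicate receives the entry itself, not an opened data source.
tables = filter_entries(entries, lambda e: e.container == "dataframe")
print(sorted(tables))  # ['trips', 'zones']

# Entries are passed through unmodified: the same objects, not copies.
print(tables["trips"] is entries["trips"])  # True
```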
- force_reload()
Reload the data immediately, regardless of how recently it was last refreshed.
- classmethod from_dict(entries, **kwargs)
Create Catalog from the given set of entries
- Parameters
- entries: dict-like
A mapping of name:entry which supports dict-like functionality, e.g., is derived from collections.abc.Mapping.
- kwargs: passed on to the constructor
Things like metadata, name; see __init__.
- Returns
- Catalog instance
- get(**kwargs)
Create a new instance of this source with altered arguments
Enables the picking of options and re-evaluating templates from any user-parameters associated with this source, or overriding any of the init arguments.
Returns a new data source instance. The instance will be recreated from the original entry definition in a catalog if this source was originally created from a catalog.
- items()
Get an iterator over (key, source) tuples for the catalog entries.
- keys()
Entry names in this catalog as an iterator (alias for __iter__)
- pop(key)
Remove entry from catalog and return it
This relies on the _entries attribute being mutable, which it normally is. Note that if a catalog automatically reloads, any entry removed here may soon reappear.
- Parameters
- key: str
Key of the entry to remove from the catalog
- reload()
Reload catalog if sufficient time has passed
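The difference between reload() and force_reload(), and the idea of caching the most recent refresh time, can be illustrated with a small time-to-live sketch. Everything here is an assumption made for illustration: `TTLCatalog`, its `ttl` argument and the injectable `clock` are hypothetical and do not describe Intake's internals.

```python
import time


class TTLCatalog:
    """Hypothetical catalog-like object that reloads at most once per ttl seconds."""

    def __init__(self, loader, ttl=60.0, clock=time.monotonic):
        self._loader = loader      # callable returning a dict of entries
        self._ttl = ttl
        self._clock = clock
        self._entries = {}
        self._last_refresh = None  # None means "never loaded"

    def force_reload(self):
        """Reload immediately and record the refresh time."""
        self._entries = self._loader()
        self._last_refresh = self._clock()

    def reload(self):
        """Reload only if the entries are at least ttl seconds old."""
        now = self._clock()
        if self._last_refresh is None or now - self._last_refresh >= self._ttl:
            self.force_reload()


calls = []
fake_now = [0.0]  # a fake clock so the example does not need to sleep


def loader():
    calls.append(len(calls))
    return {"source": len(calls)}


cat = TTLCatalog(loader, ttl=10.0, clock=lambda: fake_now[0])
cat.reload()       # first call always loads
cat.reload()       # too soon, skipped
fake_now[0] = 11.0
cat.reload()       # ttl elapsed, loads again
print(len(calls))  # 2
```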
- save(url, storage_options=None)
Output this catalog to a file as YAML
- Parameters
- url: str
Location to save to, perhaps remote
- storage_options: dict
Extra arguments for the file-system
- serialize()
Produce YAML version of this catalog.
Note that this is not the same as .yaml(), which produces a YAML block referring to this catalog.
- values()
Get an iterator over the sources for catalog entries.
- walk(sofar=None, prefix=None, depth=2)
Get all entries in this catalog and sub-catalogs
- Parameters
- sofar: dict or None
Within recursion, use this dict for output
- prefix: list of str or None
Names of levels already visited
- depth: int
Number of levels to descend; needed to truncate circular references and for cleaner output
- Returns
- Dict where the keys are the entry names in dotted syntax, and the values are entry instances.
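The dotted-name output and the role of depth can be shown with a minimal sketch over nested dicts. This is an illustration under stated assumptions rather than Intake's code: a nested dict stands in for a sub-catalog, and the function mirrors only the parameter names listed above.

```python
def walk(entries, sofar=None, prefix=None, depth=2):
    """Flatten nested dicts into {dotted.name: value}, descending `depth` levels."""
    out = {} if sofar is None else sofar
    prefix = [] if prefix is None else prefix
    for name, item in entries.items():
        path = prefix + [name]
        out[".".join(path)] = item
        # A nested dict plays the role of a sub-catalog here.
        if isinstance(item, dict) and depth > 1:
            walk(item, out, path, depth - 1)
    return out


nested = {
    "raw": {"trips": "source-a", "archive": {"old": "source-b"}},
    "clean": "source-c",
}

flat = walk(nested, depth=2)
print(sorted(flat))
# ['clean', 'raw', 'raw.archive', 'raw.trips']
```

With depth=2 the entry "raw.archive" is listed but not descended into, which is also what stops a catalog that refers to itself from recursing forever.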
- class intake.catalog.entry.CatalogEntry(*args, **kwargs)
A single item appearing in a catalog
This is the base class, used by local entries (i.e., read from a YAML file) and by remote entries (read from a server).
- describe()
Get a dictionary of attributes of this entry.
- Returns: dict with keys
- name: str
The name of the catalog entry.
- container: str
Kind of container used by this data source
- description: str
Markdown-friendly description of data source
- direct_access: str
Mode of remote access: forbid, allow, force
- user_parameters: list[dict]
List of user parameters defined by this entry
- get(**user_parameters)
Open the data source.
Equivalent to calling the catalog entry like a function.
- Parameters
- user_parameters: dict
Values for user-configurable parameters for this data source
- Returns
- DataSource
- property plots
List custom associated quick-plots
- class intake.catalog.local.UserParameter(*args, **kwargs)
A user-settable item that is passed to a DataSource upon instantiation.
For string parameters, default may include special functions func(args), which may be expanded from environment variables or by executing a shell command.
- Parameters
- name: str
the key that appears in the DataSource argument strings
- description: str
narrative text
- type: str
one of the keys of COERCION_RULES
- default: type value
same type as type. If a str, may include special functions env, shell, client_env, client_shell.
- min, max: type value
for validation of user input
- allowed: list of type
for validation of user input
- describe()
Information about this parameter
- expand_defaults(client=False, getenv=True, getshell=True)
Compile env, client_env, shell and client_shell commands
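As a rough illustration of what expanding an env function in a string default might look like, the sketch below substitutes env(NAME) with a value from a mapping. The helper `expand_env`, the regular expression and the choice to replace unset variables with an empty string are all assumptions for this example; shell expansion is deliberately left out.

```python
import os
import re

# Assumed syntax for this sketch: env(NAME) embedded in a string default.
ENV_PATTERN = re.compile(r"env\((\w+)\)")


def expand_env(default, environ=None):
    """Replace each env(NAME) with the value of NAME, or '' if it is unset."""
    environ = os.environ if environ is None else environ
    return ENV_PATTERN.sub(lambda m: environ.get(m.group(1), ""), default)


print(expand_env("/data/env(PROJECT)/raw", {"PROJECT": "demo"}))
# /data/demo/raw
```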
- validate(value)
Does value meet parameter requirements?
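The kind of check implied by the min, max and allowed parameters can be sketched as follows. This is a hypothetical stand-alone function, not the method's actual code: the argument names `coerce`, `minimum` and `maximum` are chosen for the example, and raising ValueError on failure is an assumption.

```python
def validate(value, coerce=str, minimum=None, maximum=None, allowed=None):
    """Coerce value, then check it against the optional bounds and allowed list."""
    value = coerce(value)
    if minimum is not None and value < minimum:
        raise ValueError(f"{value!r} is less than the minimum {minimum!r}")
    if maximum is not None and value > maximum:
        raise ValueError(f"{value!r} is greater than the maximum {maximum!r}")
    if allowed is not None and value not in allowed:
        raise ValueError(f"{value!r} is not one of {allowed!r}")
    return value


print(validate("7", coerce=int, minimum=0, maximum=10))  # 7

try:
    validate("12", coerce=int, minimum=0, maximum=10)
except ValueError as exc:
    print(exc)  # 12 is greater than the maximum 10
```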
- class intake.source.derived.AliasSource(*args, **kwargs)
Refer to another named source, unmodified
The purpose of an Alias is to be able to refer to other source(s) in the same catalog or an external catalog, perhaps leaving the choice of which target to load up to the user. This source makes no sense outside of a catalog.
The “target” for an aliased data source will normally be a string. In the simple case, it is the name of a data source in the same catalog. However, we use the syntax “catalog:source” to refer to sources in other catalogs, where the part before “:” will be passed to intake.open_catalog, together with any keyword arguments from cat_kwargs.
In this case, the output of the target source is not modified, but this class acts as a prototype ‘derived’ source for processing the output of some standard driver.
After initial discovery, the source’s container and other details will be updated from the target; until then, the AliasSource container is not one of the standard container types.
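The “catalog:source” target syntax can be illustrated with a small parsing sketch. The helper `split_target` is hypothetical, and splitting on the last colon is an assumption made here so that catalog URLs containing their own colons stay intact; the actual class may resolve targets differently.

```python
def split_target(target):
    """Return (catalog, source); catalog is None for a same-catalog target."""
    if ":" not in target:
        return None, target
    # Assumption: split on the last colon so catalog URLs keep their own colons.
    catalog, source = target.rsplit(":", 1)
    return catalog, source


print(split_target("trips"))
# (None, 'trips')
print(split_target("s3://bucket/catalog.yml:trips"))
# ('s3://bucket/catalog.yml', 'trips')
```

In the second case the first element is the part that would be handed to intake.open_catalog, together with any cat_kwargs.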
- discover()
Open resource and populate the source attributes.
- read()
Load entire dataset into a container and return it
- read_chunked()
Return iterator over container fragments of data source
- read_partition(i)
Return a part of the data corresponding to i-th partition.
By default, assumes i should be an integer between zero and npartitions; override for more complex indexing schemes.
- to_dask()
Return a dask container for this data source
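How read, read_chunked and read_partition relate to one another can be shown with a toy in-memory source. `ListSource` is hypothetical and does not subclass anything from Intake; it also assumes that valid partition indices run from zero up to, but not including, npartitions.

```python
class ListSource:
    """Hypothetical in-memory source whose partitions are sub-lists."""

    def __init__(self, partitions):
        self._partitions = [list(p) for p in partitions]
        self.npartitions = len(self._partitions)

    def read_partition(self, i):
        """Return the i-th partition, with 0 <= i < npartitions."""
        if not 0 <= i < self.npartitions:
            raise IndexError(f"partition {i} out of range")
        return self._partitions[i]

    def read_chunked(self):
        """Yield the partitions one at a time."""
        for i in range(self.npartitions):
            yield self.read_partition(i)

    def read(self):
        """Concatenate all partitions into one container."""
        return [row for chunk in self.read_chunked() for row in chunk]


source = ListSource([[1, 2], [3], [4, 5]])
print(source.read_partition(1))     # [3]
print(list(source.read_chunked()))  # [[1, 2], [3], [4, 5]]
print(source.read())                # [1, 2, 3, 4, 5]
```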
- class intake.source.base.Schema