API Reference
User Functions
- intake.config.Config: Intake's dict-like config system
- intake.readers.datatypes.recommend: Show which Intake data types can apply to the given details
- intake.readers.convert.auto_pipeline: Create pipeline from given URL to desired output type
- Find possible conversion paths from start to end types
- intake.readers.entry.Catalog: A collection of data and reader descriptions
- intake.readers.entry.DataDescription: Defines some data: class and arguments
- intake.readers.entry.ReaderDescription: A serialisable description of a reader or pipeline
- intake.readers.readers.recommend: Show which readers claim to support the given data instance or a superclass
- intake.readers.readers.reader_from_call: Attempt to construct a reader instance by finding one that matches the function call
- class intake.config.Config(filename=None, **kwargs)
Intake's dict-like config system
The instance intake.conf is used globally throughout the package.
- get(key, default=None)
Return the value for key if key is in the dictionary, else default.
- load(fn=None)
Update global config from YAML file
If fn is None, looks in the global config directory, which is either defined by the INTAKE_CONF_DIR env-var or is ~/.intake/.
- load_env()
Analyse environment variables and update conf accordingly
- reset()
Set conf values back to defaults
- save(fn=None)
Save current configuration to file as YAML
Uses self.filename for the target location
- set(update_dict=None, **kw)
Change config values within a context or for the session
- values: dict
This can be deeply nested to set only leaf values
See also: intake.readers.utils.nested_keys_to_dict
Examples
Value resets after context ends:
>>> with intake.conf.set(myval=5):
...     ...
Set for whole session:
>>> intake.conf.set(myval=5)
Set only a single leaf value within a nested dict:
>>> intake.conf.set(intake.readers.utils.nested_keys_to_dict({"deep.2.key": True}))
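To illustrate how a dotted key expands into a nested structure, here is a toy re-implementation (not Intake's actual code; the real intake.readers.utils.nested_keys_to_dict may, for example, treat integer segments as list indices rather than dict keys):

```python
def nested_keys_to_dict(flat: dict) -> dict:
    """Toy sketch: expand {"a.b.c": v} into {"a": {"b": {"c": v}}}."""
    out: dict = {}
    for dotted, value in flat.items():
        node = out
        *parents, leaf = dotted.split(".")
        for key in parents:
            # create intermediate dicts as needed
            node = node.setdefault(key, {})
        node[leaf] = value
    return out

print(nested_keys_to_dict({"deep.2.key": True}))
# {'deep': {'2': {'key': True}}}
```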
- intake.readers.datatypes.recommend(url: Optional[str] = None, mime: Optional[str] = None, head: bool = True, storage_options=None, ignore: Optional[set[str]] = None) → set[intake.readers.datatypes.BaseData]
Show which Intake data types can apply to the given details
- Parameters
- url: str
Location of data
- mime: str
MIME type, usually “x/y” form
- head: bytes | bool | None
A small number of bytes from the file head, for seeking magic bytes. If True, fetch these bytes from the given URL/storage_options and use them. If None, only fetch bytes if there is no match by MIME type or path; if False, don't fetch at all.
- storage_options: dict | None
If passing a URL which might be a remote file, storage_options can be used by fsspec.
- ignore: set | None
Don’t include these in the output
- Returns
- set of matching datatype classes.
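The magic-byte matching this function performs on the file head can be sketched as follows (a toy illustration; the patterns and data-type names here are examples, not Intake's actual registry):

```python
# Hypothetical registry mapping magic-byte prefixes to data-type names
MAGIC = {
    b"\x89PNG": "PNG",
    b"PAR1": "Parquet",
    b"\x50\x4b\x03\x04": "Zip",
}

def recommend_from_head(head: bytes) -> set:
    """Return the names of all data types whose magic prefix matches the head bytes."""
    return {name for prefix, name in MAGIC.items() if head.startswith(prefix)}

print(recommend_from_head(b"\x89PNG\r\n\x1a\n"))
# {'PNG'}
```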
- intake.readers.convert.auto_pipeline(url: str | intake.readers.datatypes.BaseData, outtype: str | tuple[str], storage_options: Optional[dict] = None, avoid: Optional[list[str]] = None) → Pipeline
Create pipeline from given URL to desired output type
Will search for the shortest conversion path from the inferred data-type to the output.
- Parameters
- url: input data, usually a location/URL, but may be a data instance
- outtype: pattern to match to possible output types
- storage_options: if url is a remote str, these are kwargs that fsspec may need to access it
- avoid: don't consider readers whose names match any of these strings
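The shortest-conversion-path search can be pictured as breadth-first search over a graph of known conversions (a sketch with made-up type names, not Intake's actual conversion registry):

```python
from collections import deque

# Hypothetical conversion graph: type -> types it can be converted to
CONVERSIONS = {
    "CSV": ["pandas:DataFrame"],
    "pandas:DataFrame": ["dask:DataFrame", "matplotlib:Figure"],
    "dask:DataFrame": ["pandas:DataFrame"],
}

def shortest_path(start: str, end: str):
    """BFS over the conversion graph: the first path reaching `end` is shortest."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == end:
            return path
        for nxt in CONVERSIONS.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no conversion path exists

print(shortest_path("CSV", "matplotlib:Figure"))
# ['CSV', 'pandas:DataFrame', 'matplotlib:Figure']
```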
- class intake.readers.entry.Catalog(entries: Optional[Union[Iterable[ReaderDescription], Mapping]] = None, aliases: Optional[dict[str, int]] = None, data: Optional[Union[Iterable[DataDescription], Mapping]] = None, user_parameters: Optional[dict[str, intake.readers.user_parameters.BaseUserParameter]] = None, parameter_overrides: Optional[dict[str, Any]] = None, metadata: Optional[dict] = None)
A collection of data and reader descriptions.
- add_entry(entry, name=None, clobber=True)
Add entry/reader (and its requirements) in-place, with optional alias
- delete(name, recursive=False)
Remove named entity (data/entry) from catalog
We do not check whether any other entity in the catalog refers to what is being deleted, so you can break other entries this way.
- Parameters
- recursive: bool
Also remove data/entries referenced by the given one, and those they refer to in turn.
- extract_parameter(item: str, name: str, path: ~typing.Optional[str] = None, value: ~typing.Optional[~typing.Any] = None, cls=<class 'intake.readers.user_parameters.SimpleUserParameter'>, store_to: ~typing.Optional[str] = None, **kw)
Descend into data & reader descriptions to create a user_parameter
There are two ways to find and replace values by a template:
- if path is given, the kwargs will be walked to this location, e.g., "field.0.special_value" -> kwargs["field"][0]["special_value"]
- if value is given, all kwargs will be recursively walked, looking for values equal to the one given.
Matched values will be replaced by a template string like "{name}", and a user_parameter of class cls will be placed in the location given by store_to (could be "data", "catalog").
- classmethod from_dict(data)
Assemble catalog from dict representation
- static from_yaml_file(path: str, **kwargs)
Load YAML representation into a new Catalog instance
- storage_options:
kwargs to pass to fsspec for opening the file to read; can pass as storage_options= or will pick up any unused kwargs for simplicity
- get_entity(item: str)
Get the objects by reference
Use this method if you want to change the catalog in-place
item can be an entry in .aliases, in which case the original will be returned, or a key in .entries, .user_parameters or .data. The entity in question is returned without processing.
- give_name(tok: str, name: str, clobber=True)
Give an alias to a dataset
- tok:
a key in the .entries dict
- move_parameter(from_entity: str, to_entity: str, parameter_name: str) → Catalog
Move a user-parameter between entry/data descriptions
Each entity is an alias name or an entry/data token
- promote_parameter_name(parameter_name: str, level: str = 'cat') → Catalog
Find and promote given named parameter, assuming they are all identical
- parameter_name:
the key string referring to the parameter
- level: cat | data
If the parameter is found in a reader, it can be promoted to the data it depends on. Parameters in a data description can only be promoted to a catalog global.
- search(expr) → Catalog
Make new catalog with a subset of this catalog
The new catalog will have those entries which pass the filter expr, which is an instance of intake.readers.search.BaseSearch (i.e., has a method like filter(entry) -> bool).
In the special case that expr is just a string, the Text search expression will be used.
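The filter protocol that search relies on can be sketched like this (a toy stand-in for intake.readers.search.BaseSearch; the entry structure shown here is simplified, not the real ReaderDescription):

```python
class Text:
    """Toy search term: match if the text appears anywhere in an entry's metadata."""
    def __init__(self, text: str):
        self.text = text.lower()

    def filter(self, entry: dict) -> bool:
        # case-insensitive substring match over the stringified metadata
        return self.text in str(entry.get("metadata", "")).lower()

entries = {
    "temps": {"metadata": {"description": "Ocean temperature grids"}},
    "trades": {"metadata": {"description": "Stock trade records"}},
}
# a search keeps only the entries that pass the filter
subset = {k: v for k, v in entries.items() if Text("ocean").filter(v)}
print(sorted(subset))
# ['temps']
```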
- class intake.readers.entry.DataDescription(datatype: str, kwargs: Optional[dict] = None, metadata: Optional[dict] = None, user_parameters: Optional[dict] = None)
Defines some data: class and arguments. This may be loaded in a number of ways
A DataDescription normally resides in a Catalog, and can contain templated arguments. When there are user_parameters, these will also be applied to any reader that depends on this data.
- get_kwargs(user_parameters: Optional[dict[str | intake.readers.user_parameters.BaseUserParameter]] = None, **kwargs) → dict[str, Any]
Get set of kwargs for given reader, based on prescription, new args and user parameters
Here, user_parameters is intended to come from the containing catalog. To provide values for a user parameter, include it by name in kwargs
- class intake.readers.entry.ReaderDescription(reader: str, kwargs: Optional[dict[str, Any]] = None, user_parameters: Optional[dict[str | intake.readers.user_parameters.BaseUserParameter]] = None, metadata: Optional[dict] = None, output_instance: Optional[str] = None)
A serialisable description of a reader or pipeline
This class is typically stored inside Catalogs, and can contain templated arguments which get evaluated at the time that it is accessed from a Catalog.
- check_imports()
Are the packages listed in the “imports” key of the metadata available?
- extract_parameter(name: str, path=None, value=None, cls=<class 'intake.readers.user_parameters.SimpleUserParameter'>, **kw)
Creates new version of the description
Creates new instance, since the token will in general change
- classmethod from_dict(data)
Recreate instance from the results of to_dict()
- get_kwargs(user_parameters=None, **kwargs) → dict[str, Any]
Get set of kwargs for given reader, based on prescription, new args and user parameters
Here, user_parameters is intended to come from the containing catalog. To provide values for a user parameter, include it by name in kwargs
- to_cat(name=None)
Create a Catalog containing only this entry
- intake.readers.readers.recommend(data)
Show which readers claim to support the given data instance or a superclass
The ordering puts more specific readers first
- intake.readers.readers.reader_from_call(func: str, *args, join_lines=False, **kwargs) → BaseReader
Attempt to construct a reader instance by finding one that matches the function call
Fails for readers that don't define a func, usually because the function depends on the file type or must be a method of a dynamically created instance.
- Parameters
- func: callable | str
If a callable, pass args and kwargs as you would have done to execute the function. If a string, it should look like "func(arg1, arg2, kwarg1, **kw)", i.e., a normal Python call but as a string. In the latter case, args and kwargs are ignored.
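Parsing such a call string can be sketched with the standard-library ast module (an illustration of the idea, not Intake's implementation; requires Python 3.9+ for ast.unparse):

```python
import ast

def parse_call(call: str):
    """Split a call string into (function name, positional args, keyword args)."""
    node = ast.parse(call, mode="eval").body
    assert isinstance(node, ast.Call), "expected a function call"
    func = ast.unparse(node.func)
    args = [ast.unparse(a) for a in node.args]
    kwargs = {kw.arg: ast.unparse(kw.value) for kw in node.keywords if kw.arg}
    return func, args, kwargs

print(parse_call("pd.read_csv('data.csv', sep=',')"))
# ('pd.read_csv', ["'data.csv'"], {'sep': "','"})
```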
Base Classes
These may be subclassed by developers
- intake.readers.datatypes.BaseData: Prototype dataset definition
- intake.readers.readers.BaseReader
- intake.readers.convert.BaseConverter: Converts from one object type to another
- intake.readers.namespaces.Namespace: A set of functions as an accessor on a Reader, producing a Pipeline
- intake.readers.search.SearchBase: Prototype for a single term in a search expression
- intake.readers.user_parameters.BaseUserParameter: The base class allows for any default without checking/coercing
- class intake.readers.datatypes.BaseData(metadata: Optional[dict[str, Any]] = None)
Prototype dataset definition
- auto_pipeline(outtype: str | tuple[str])
Find a pipeline to transform from this to the given output type
- magic: set[bytes | tuple] = {}
binary patterns, usually at the file head; each item identifies this data type
- property possible_outputs
Map of importable readers to the expected output class of each
- property possible_readers
List of reader classes for this type, grouped by importability
- to_entry()
Create DataDescription version of this, for placing in a Catalog
- to_reader(outtype: Optional[str] = None, reader: Optional[str] = None, **kw)
Find an appropriate reader for this data
If neither outtype nor reader is passed, the first importable reader will be picked.
See also: .possible_outputs
- Parameters
- outtype: string to match against the output classes of potential readers
- reader: string to match against the class names of the readers
- class intake.readers.readers.BaseReader(*args, metadata: Optional[dict] = None, output_instance: Optional[str] = None, **kwargs)
- property data
The BaseData this reader depends on, if it has one
- discover(**kwargs)
Part of the data
The intent is to return a minimal dataset, but for some readers and conditions this may be up to the whole of the data. Output type is the same as for read().
- classmethod doc()
Doc associated with loading function
- implements: set[intake.readers.datatypes.BaseData] = {}
datatype(s) this applies to
- read(*args, **kwargs)
Produce data artefact
Any of the arguments encoded in the data instance can be overridden.
Output type is given by the .output_instance attribute
- to_cat(name=None)
Create a Catalog containing only this reader
- to_entry()
Create an entry version of this, ready to be inserted into a Catalog
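The read/discover relationship can be sketched with a toy reader (illustrative only; real readers subclass intake.readers.readers.BaseReader and declare implements and an output_instance):

```python
class ToyReader:
    """Toy reader: read() produces the whole artefact, discover() a minimal sample."""
    output_instance = "builtins:list"

    def __init__(self, data, **kwargs):
        self.data = list(data)
        self.kwargs = kwargs

    def read(self, **kwargs):
        # kwargs given at read time override those stored on the instance
        kw = {**self.kwargs, **kwargs}
        n = kw.get("nrows")
        return self.data[:n] if n else list(self.data)

    def discover(self, **kwargs):
        # minimal subset of the data; same output type as read()
        return self.read(nrows=2)

r = ToyReader(range(100))
print(r.discover(), len(r.read()))
# [0, 1] 100
```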
- class intake.readers.convert.BaseConverter(*args, metadata: Optional[dict] = None, output_instance: Optional[str] = None, **kwargs)
Converts from one object type to another
Most often, subclasses call a single function on the data, but arbitrary complex transforms are possible. This is designed to be one step in a Pipeline.
.run() will be called on the output object from the previous stage; subclasses will either override that, or just provide a func=.
- run(x, *args, **kwargs)
Execute a conversion stage on the output object from another stage
Subclasses may override this
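The pattern of either overriding run() or supplying a func can be sketched like this (a toy skeleton, not the real BaseConverter):

```python
class Converter:
    """Toy converter: run() applies func to the previous stage's output."""
    func = None

    def run(self, x, *args, **kwargs):
        return self.func(x, *args, **kwargs)

class ToUpper(Converter):
    # simplest case: just supply a func=
    func = staticmethod(str.upper)

class Exclaim(Converter):
    # or override run() for arbitrary logic
    def run(self, x, *args, **kwargs):
        return x + "!"

# stages chain: each run() consumes the previous stage's output
print(Exclaim().run(ToUpper().run("hello")))
# HELLO!
```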
- class intake.readers.namespaces.Namespace(reader)
A set of functions as an accessor on a Reader, producing a Pipeline
- class intake.readers.search.SearchBase
Prototype for a single term in a search expression
The method filter() is meant to be overridden in subclasses.
- filter(entry: ReaderDescription) → bool
Does the given ReaderDescription entry match the query?
- class intake.readers.user_parameters.BaseUserParameter(default, description='')
The base class allows for any default without checking/coercing
- coerce(value)
Change given type to one that matches this parameter’s intent
- default
the value to use without user input
- description
what is the function of this parameter
- set_default(value)
Change the default, if it validates
- to_dict()
Dictionary representation of the instance's contents
- validate(value) → bool
Is the given value allowed by this parameter?
Exceptions are treated as False
- with_default(value)
A new instance with different default, if it validates
(original object is left unchanged)
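A subclass that does coerce and validate its values can be sketched as follows (toy code mirroring the interface described above; everything beyond coerce/validate/default is illustrative):

```python
class BaseUserParameter:
    """Toy base: accepts any default, no checking or coercing."""
    def __init__(self, default, description=""):
        self.default = default
        self.description = description

    def coerce(self, value):
        return value

    def validate(self, value) -> bool:
        # exceptions are treated as False
        try:
            self.coerce(value)
            return True
        except Exception:
            return False

class IntParameter(BaseUserParameter):
    """Coerces to int; validation fails for non-numeric input."""
    def coerce(self, value):
        return int(value)

p = IntParameter(0, description="number of rows")
print(p.coerce("5"), p.validate("5"), p.validate("five"))
# 5 True False
```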
Data Classes
- Advanced Scientific Data Format
- Structured record passing file format
- Human-readable tabular format, Comma Separated Values
- Datatypes that are groupings of other data
- An API endpoint capable of describing Intake catalogs
- Intake catalog expressed as YAML
- Imaging data usually from medical scans
- Indexed set of parquet files with versioning and diffs
- The well-known spreadsheet app's file format
- Tabular or array data in text/binary format common in astronomy
- Deprecated tabular format from the Arrow project
- Tabular format based on Arrow IPC
- Datatypes loaded from files, local or remote
- One of the filetypes at https://gdal.org/drivers/raster/index.html
- One of the filetypes at https://gdal.org/drivers/vector/index.html
- "Gridded" file format commonly used in meteo forecasting
- Geo data (position and geometries) within JSON
- Geo data (position and geometries) in a SQLite DB file
- Hierarchical tree of ND-arrays, widely used scientific file format
- An identifier registered on a handle registry
- Indexed set of parquet files with versioning and diffs
- Image format with good compression for the internet
- Nested record format as readable text, very common over HTTP
- Keras model parameter set
- A value that can be embedded directly to YAML (text, dict, list)
- A single array in a .mat file
- Text format for sparse arrays
- Collection of ND-arrays with coordinates, scientific file format
- Medical imaging or volume data file
- Simple array format
- Columnar-optimized tabular binary file format
- Earth-science oriented searchable HTTP API
- Portable Network Graphics, common image format
- Column-optimized binary format
- Python pickle, arbitrary serialized object
- Monitoring metric query service
- Source code file
- A C or FORTRAN N-dimensional array buffer without metadata
- Serialized model made by sklearn
- Query on a database-like service
- Database data stored in files
- Data assets related to geo data, either as static JSON or a searchable API
- Datatypes loaded from some service
- Geo data (position and geometries) in a set of related binary files
- Tensorflow record file, ready for machine learning
- Datasets on a THREDDS server
- Image format commonly used for large data
- Any text file
- Service exposing versioned, chunked and potentially sparse arrays
- Data access service for data-aware portals and data science tools
- Waveform/sound file
- Extensible Markup Language file
- Human-readable JSON/object-like format
- Cloud optimised, chunked N-dimensional file format
Reader Classes
Includes readers, transformers, converters and output classes.
- Finds the earthdata datasets that contain some data in the given query bounds
- Read particular earthdata dataset by ID and parameter bounds
- Datasets from HuggingfaceHub
- Example datasets from sklearn.datasets
- Uses SQLAlchemy to get the list of tables at some SQL URL
- Searches stacindex.org for known public STAC data sources
- Create a Catalog from a STAC endpoint or file
- Get stac objects matching a search spec from a STAC endpoint
- Reimplementation of "StackBandsSource" from intake-stac
- Read from THREDDS endpoint
- Datasets from the TensorFlow public registry
- Creates a catalog of Tiled datasets from a root URL
- Standard example PyTorch datasets
- Converts from one object type to another
- Call given arbitrary function
- Holds a list of transforms/conversions to be enacted in sequence
- Implemented only if an attribute was not already chosen
- Creates one of several output file types
- Take a matplotlib figure and save to PNG file
- Save a single array into a single binary file
- Good for including a "peek" at data in entries' metadata
- Requires a directory with .npy files and an "info" pickle file
- The contents of file(s) as bytes
- Convenience superclass for readers of files
- Dereference handle (hdl:) identifiers
- Retry (part of) a pipeline until it returns without exception
- Equivalent of x[item]
- Call named method on object