Other Classes¶
Cache Types¶
- FileCache: Cache a specific set of files
- DirCache: Cache a complete directory tree
- CompressedCache: Cache files extracted from downloaded compressed source
- DATCache: Use the DAT protocol to replicate data
- CacheMetadata: Utility class for managing persistent metadata stored in the Intake config directory.
- class intake.source.cache.FileCache(driver, spec, catdir=None, cache_dir=None, storage_options={})¶
Cache a specific set of files
Input is a single file URL, a URL with glob characters, or a list of URLs. Output is the corresponding set of local files.
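A file cache must map each remote URL to a stable local path. The sketch below shows one common scheme (hashing the URL into a subdirectory of the cache directory); this is an illustrative assumption, not intake's actual on-disk layout, and `local_cache_path` is a hypothetical helper.

```python
import hashlib
import os

def local_cache_path(urlpath, cache_dir):
    """Map a remote URL to a deterministic local path inside cache_dir.

    Illustrative only: intake's FileCache has its own layout; this just
    shows the hash-the-URL idea behind such caches. The same URL always
    maps to the same local path, so repeat reads hit the cache.
    """
    digest = hashlib.md5(urlpath.encode()).hexdigest()
    filename = os.path.basename(urlpath)
    return os.path.join(cache_dir, digest, filename)
```

Because the mapping is deterministic, two sessions caching the same URL agree on where the local copy lives.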
- class intake.source.cache.DirCache(driver, spec, catdir=None, cache_dir=None, storage_options={})¶
Cache a complete directory tree
Input is a directory root URL, plus a depth parameter for how many levels of subdirectories to search. All regular files will be copied. Output is the resultant local directory tree.
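What a depth limit means in practice can be sketched with a stdlib walk. This is an assumption about the semantics (depth=0 meaning only files directly under the root), not intake's implementation; `files_to_depth` is a hypothetical helper.

```python
import os

def files_to_depth(root, depth):
    """Collect regular files under root, descending at most `depth`
    levels of subdirectories (depth=0: only files directly in root).

    A sketch of the DirCache `depth` idea, not intake's code.
    """
    out = []
    root = root.rstrip(os.sep)
    base_level = root.count(os.sep)
    for dirpath, dirnames, filenames in os.walk(root):
        level = dirpath.rstrip(os.sep).count(os.sep) - base_level
        if level >= depth:
            dirnames[:] = []  # prune: do not descend further
        out.extend(os.path.join(dirpath, f) for f in filenames)
    return out
```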
- class intake.source.cache.CompressedCache(driver, spec, catdir=None, cache_dir=None, storage_options={})¶
Cache files extracted from downloaded compressed source
For one or more remote compressed files, downloads to a local temporary directory and extracts all contained files to the local cache. Input is URL(s) (including globs) pointing to remote compressed files, plus an optional decomp parameter, which is "infer" by default (guess from the file extension) or one of the key strings in intake.source.decompress.decomp. An optional regex_filter parameter loads only the extracted files that match the pattern. Output is the list of extracted files.
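The extension-inference and regex-filter steps can be sketched with the stdlib. The `DECOMP` table and `extract_and_filter` helper below are illustrative stand-ins; the real mapping of formats to decompressors lives in intake.source.decompress.decomp.

```python
import gzip
import os
import re
import shutil

# Illustrative extension -> decompressor table (assumption; the real
# supported set is intake.source.decompress.decomp).
DECOMP = {".gz": gzip.open}

def extract_and_filter(paths, target_dir, regex_filter=None):
    """Decompress each file (format inferred from its extension) into
    target_dir, keeping only outputs matching regex_filter.

    A sketch of the CompressedCache flow, not intake's actual code.
    """
    kept = []
    for path in paths:
        base, ext = os.path.splitext(path)
        opener = DECOMP.get(ext)
        if opener is None:
            continue  # unknown compression; real code would raise or infer
        out = os.path.join(target_dir, os.path.basename(base))
        with opener(path, "rb") as src, open(out, "wb") as dst:
            shutil.copyfileobj(src, dst)
        if regex_filter is None or re.search(regex_filter, out):
            kept.append(out)
    return kept
```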
- class intake.source.cache.DATCache(driver, spec, catdir=None, cache_dir=None, storage_options={})¶
Use the DAT protocol to replicate data
For details of the protocol, see https://docs.datproject.org/. The executable dat must be available. Since in this case it is not possible to access the remote files directly, this cache mechanism takes no parameters. The expectation is that the URL passed by the driver is of the form:
dat://<dat hash>/file_pattern
where the file pattern will typically be a glob string like "*.json".
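Splitting such a URL into the dat hash and the glob pattern, and applying the pattern to replicated files, can be sketched with the stdlib. `parse_dat_url` and `matching_files` are hypothetical helpers, not intake functions.

```python
import fnmatch
from urllib.parse import urlparse

def parse_dat_url(url):
    """Split a dat://<dat hash>/file_pattern URL into (hash, pattern).

    Sketch only; how DATCache parses the URL internally is not shown
    in the documentation above.
    """
    parsed = urlparse(url)
    if parsed.scheme != "dat":
        raise ValueError("expected a dat:// URL")
    return parsed.netloc, parsed.path.lstrip("/")

def matching_files(filenames, pattern):
    """Select replicated files matching the glob pattern, e.g. '*.json'."""
    return fnmatch.filter(filenames, pattern)
```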
- class intake.source.cache.CacheMetadata(*args, **kwargs)¶
Utility class for managing persistent metadata stored in the Intake config directory.
- keys() → a set-like object providing a view on D's keys ¶
- pop(k[, d]) → v, remove specified key and return the corresponding value. ¶
If key is not found, d is returned if given, otherwise KeyError is raised.
- update([E, ]**F) → None. Update D from mapping/iterable E and F. ¶
If E is present and has a .keys() method, does: for k in E: D[k] = E[k]. If E is present and lacks a .keys() method, does: for (k, v) in E: D[k] = v. In either case, this is followed by: for k, v in F.items(): D[k] = v.
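A dict-like store persisted to disk can be sketched with the stdlib. The `JsonMetadata` class below is a hypothetical stand-in for what CacheMetadata does (the real class manages a file in the Intake config directory); subclassing MutableMapping gives keys(), pop() and update() for free, with the semantics documented above.

```python
import json
from collections.abc import MutableMapping

class JsonMetadata(MutableMapping):
    """Dict-like metadata store persisted to a JSON file.

    A stdlib sketch of a class like CacheMetadata, not intake's code.
    MutableMapping supplies keys/pop/update on top of the five methods
    defined here; every mutation is written straight back to disk.
    """

    def __init__(self, path):
        self.path = path
        try:
            with open(path) as f:
                self._data = json.load(f)
        except FileNotFoundError:
            self._data = {}

    def _save(self):
        with open(self.path, "w") as f:
            json.dump(self._data, f)

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value
        self._save()

    def __delitem__(self, key):
        del self._data[key]
        self._save()

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)
```

Because state is flushed on every write, a second process (or a later session) reconstructing the object from the same path sees the current metadata.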
Auth¶
- SecretAuth: A very simple auth mechanism using a shared secret
- SecretClientAuth: Matching client auth plugin to SecretAuth
- class intake.auth.secret.SecretAuth(*args, **kwargs)¶
A very simple auth mechanism using a shared secret
- Parameters
- secret: str
The string that must be matched in the requests. If None, a random UUID is generated and logged.
- key: str
Header entry in which to seek the secret
- allow_access(header, source, catalog)¶
Is the given HTTP header allowed to access the given data source?
- Parameters
- header: dict
The HTTP header from the incoming request
- source: CatalogEntry
The data source the user wants to access.
- catalog: Catalog
The catalog object containing this data source.
- allow_connect(header)¶
Is the given request header allowed to talk to the server?
- Parameters
- header: dict
The HTTP header from the incoming request
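The server-side check amounts to comparing a header value against the shared secret. The `SharedSecretChecker` class below is a stdlib sketch of that idea (intake's actual SecretAuth also receives the source and catalog, so it can make per-entry decisions); the class name and `allow` method are assumptions for illustration.

```python
import uuid

class SharedSecretChecker:
    """Server-side check of a shared secret carried in a request header.

    Sketch of the SecretAuth idea, not intake's class. If no secret is
    supplied, a random one is generated, as the docs above describe.
    """

    def __init__(self, secret=None, key="intake-secret"):
        if secret is None:
            secret = uuid.uuid4().hex  # random secret for this server
        self.secret = secret
        self.key = key

    def allow(self, header):
        # A request is allowed only if the expected header entry
        # exactly matches the shared secret.
        return header.get(self.key) == self.secret
```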
- class intake.auth.secret.SecretClientAuth(secret, key='intake-secret')¶
Matching client auth plugin to SecretAuth
- Parameters
- secret: str
The string that must be included in requests.
- key: str
HTTP Header key for the shared secret
- get_headers()¶
Returns a dictionary of HTTP headers for the remote catalog request.
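The client side is the mirror image: produce a headers dict carrying the secret under the agreed key. The `SharedSecretClient` class below is an illustrative sketch of what a get_headers implementation provides, not intake's actual SecretClientAuth.

```python
class SharedSecretClient:
    """Client counterpart to a shared-secret server check.

    Sketch only. The header key defaults to 'intake-secret' to match
    the server-side default documented above.
    """

    def __init__(self, secret, key="intake-secret"):
        self.secret = secret
        self.key = key

    def get_headers(self):
        # Merged into every outgoing request to the intake server
        return {self.key: self.secret}
```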
Containers¶
- RemoteDataFrame: Dataframe on an Intake server
- RemoteArray: nd-array on an Intake server
- RemoteSequenceSource: Sequence-of-things source on an Intake server
- class intake.container.dataframe.RemoteDataFrame(*args, **kwargs)¶
Dataframe on an Intake server
- read()¶
Load entire dataset into a container and return it
- to_dask()¶
Return a dask container for this data source
- class intake.container.ndarray.RemoteArray(*args, **kwargs)¶
nd-array on an Intake server
- read()¶
Load entire dataset into a container and return it
- read_partition(i)¶
Return the part of the data corresponding to the i-th partition.
By default, assumes i should be an integer between zero and npartitions; override for more complex indexing schemes.
- to_dask()¶
Return a dask container for this data source
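The read()/read_partition(i) contract described above (i an integer in [0, npartitions), with read() equivalent to concatenating all partitions) can be sketched with a toy source. `PartitionedSource` is hypothetical, not intake's RemoteArray.

```python
class PartitionedSource:
    """Toy source illustrating the read()/read_partition(i) contract
    described above; not an intake class.
    """

    def __init__(self, partitions):
        self._partitions = list(partitions)
        self.npartitions = len(self._partitions)

    def read_partition(self, i):
        # Default contract: i is an integer in [0, npartitions)
        if not 0 <= i < self.npartitions:
            raise IndexError(i)
        return self._partitions[i]

    def read(self):
        # The whole dataset is the concatenation of all partitions
        out = []
        for i in range(self.npartitions):
            out.extend(self.read_partition(i))
        return out
```

Sources with more complex indexing schemes override read_partition, as the note above says.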
Server¶
- IntakeServer: Main intake-server tornado application
- ServerInfoHandler: Basic info about the server
- SourceCache: Stores DataSources requested by some user
- ServerSourceHandler: Open or stream data source
- class intake.cli.server.server.IntakeServer(catalog)¶
Main intake-server tornado application
- class intake.cli.server.server.ServerInfoHandler(application: Application, request: HTTPServerRequest, **kwargs: Any)¶
Basic info about the server
- initialize(cache, catalog, auth)¶
- class intake.cli.server.server.SourceCache¶
Stores DataSources requested by some user
- peek(uuid)¶
Get the source but do not change the last access time
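The peek-versus-get distinction is about last-access bookkeeping: a normal lookup refreshes the access time (keeping the entry alive), while peek inspects without touching it. The `TimedSourceCache` class below is a stdlib sketch of that idea, not intake's SourceCache.

```python
import time

class TimedSourceCache:
    """Store opened sources by uuid, tracking last access time.

    Illustrative sketch: get() refreshes the access timestamp,
    peek() looks up the source without changing it.
    """

    def __init__(self):
        self._sources = {}   # uuid -> source
        self._accessed = {}  # uuid -> last access timestamp

    def add(self, uuid, source):
        self._sources[uuid] = source
        self._accessed[uuid] = time.time()

    def get(self, uuid):
        self._accessed[uuid] = time.time()
        return self._sources[uuid]

    def peek(self, uuid):
        # Same lookup, but the last access time is left untouched
        return self._sources[uuid]
```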
- class intake.cli.server.server.ServerSourceHandler(application: Application, request: HTTPServerRequest, **kwargs: Any)¶
Open or stream data source
The request's “action” field (open|read) specifies what the request wants to do. Open caches the source and creates an ID for it; read uses that ID to reference the source and read a partition.
- get()¶
Access one source’s info.
This is for direct access to an entry by name for random access, which is useful to the client when the whole catalog has not first been listed and pulled locally (e.g., in the case of pagination).
- initialize(catalog, cache, auth)¶
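The open/read flow described for ServerSourceHandler can be sketched as a small dispatcher: "open" registers a source and hands back an ID, "read" resolves that ID and fetches one partition. `SourceActionDispatcher` is a hypothetical illustration; the real tornado handler additionally deals with auth, serialization and streaming.

```python
import uuid

class SourceActionDispatcher:
    """Sketch of the open/read actions described above; not intake's
    ServerSourceHandler.
    """

    def __init__(self):
        self._open_sources = {}  # id -> opened source

    def handle(self, action, **kwargs):
        if action == "open":
            # Cache the source and mint an ID the client will reuse
            source_id = uuid.uuid4().hex
            self._open_sources[source_id] = kwargs["source"]
            return source_id
        elif action == "read":
            # Resolve the ID back to the source and read one partition
            source = self._open_sources[kwargs["source_id"]]
            return source.read_partition(kwargs["partition"])
        raise ValueError("unknown action: %r" % action)
```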