Taking the pain out of data access and distribution
Intake is a lightweight package for finding, investigating, loading and disseminating data. It will appeal to different groups for some of the reasons below, but is useful for all and acts as a common platform that everyone can use to smooth the progression of data from developers and providers to users.
- Intake loads the data for a range of formats and types (see Plugin Directory) into containers you already use, like Pandas dataframes, Python lists, NumPy arrays, and more
- Intake loads, and gets out of your way
- Search and introspect data-sets in Catalogs through a GUI: quickly find what you need to do your work
- Install data-sets and automatically get requirements
- Leverage cloud resources and distributed computing.
- Simple spec to define data sources
- Single point of truth, no more copy and paste
- Distribute data using packages, shared files or a server
- Update definitions in-place
- Parametrise user options
- Make use of additional functionality like filename parsing and caching.
- Create catalogs out of established departmental practices
- Provide data access credentials via Intake parameters
- Use server-client architecture as gatekeeper:
  - add authentication methods
  - add monitoring point; track the data-sets being accessed.
- Hook Intake into proprietary data access systems.
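To make the points above about a simple spec, user parameters and loading into familiar containers more concrete, here is a minimal sketch. It assumes the classic YAML catalog layout (`sources`, `driver`, `args`, `parameters`) and the built-in `csv` driver; the source name `measurements`, the `region` parameter, the file names and the CSV contents are all hypothetical and invented for illustration. The helper `load_measurements` is not part of Intake; it only wraps `intake.open_catalog`.

```python
import tempfile
from pathlib import Path

# Hypothetical catalog text. The source name "measurements" and the
# "region" parameter are illustrative assumptions, not Intake defaults.
CATALOG_YAML = """\
sources:
  measurements:
    description: Made-up example table, one CSV file per region
    driver: csv
    parameters:
      region:
        description: Which regional file to load
        type: str
        default: west
        allowed: [west, east]
    args:
      urlpath: "{{ CATALOG_DIR }}/measurements_{{ region }}.csv"
"""


def write_example_catalog(directory):
    """Write two tiny CSV files plus a catalog describing them.

    Returns the path of the catalog file. Uses only the standard library.
    """
    directory = Path(directory)
    (directory / "measurements_west.csv").write_text("name,value\na,1\nb,2\n")
    (directory / "measurements_east.csv").write_text("name,value\nc,3\nd,4\n")
    catalog_path = directory / "catalog.yml"
    catalog_path.write_text(CATALOG_YAML)
    return catalog_path


def load_measurements(catalog_path, region="west"):
    """Open the catalog with Intake and read one region into memory.

    Intake is imported here so that the file-writing part of this sketch
    still works on a machine where Intake is not installed.
    """
    import intake

    cat = intake.open_catalog(str(catalog_path))
    # Calling the entry with keyword arguments sets its user parameters.
    return cat.measurements(region=region).read()


# Create the example files in a fresh temporary directory.
workdir = Path(tempfile.mkdtemp())
catalog_path = write_example_catalog(workdir)
```

With Intake and the dependencies of its `csv` driver installed, `load_measurements(catalog_path, region="east")` would be expected to return the second file as a dataframe. The data user only names the source and the parameter; the file location lives in the catalog, which the provider can update in place.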
For a brief demonstration and tutorial, which you can execute locally, go to Quickstart. For a general description of all of the components of Intake and how they fit together, go to Overview. Finally, for some notebooks using Intake and articles about Intake, go to Examples. These and other documentation pages will make reference to concepts that are defined in the Glossary.