Taking the pain out of data access and distribution
Intake is a lightweight package for finding, investigating, loading and disseminating data. It will appeal to different groups for some of the reasons below, but is useful for all and acts as a common platform that everyone can use to smooth the progression of data from developers and providers to users.
Intake contains the following main components. You do not need to use them all! The library is modular, so use only the parts you need:
A set of data loaders (Drivers) with a common interface, so that you can investigate or load anything, local or remote, with exactly the same call and get back data structures that you already know how to manipulate, such as arrays and data-frames.
A cataloging system (Catalogs) for listing data sources, their metadata and parameters, and for referencing which of the Drivers should load each one. Catalogs form a hierarchical, searchable structure, which can be backed by files, Intake servers or third-party data services.
Sets of convenience functions to apply to various data sources, such as data-set persistence, automatic concatenation and metadata inference, and the ability to distribute catalogs and data sources using simple packaging abstractions.
A GUI layer, accessible in the Jupyter notebook or as a standalone webserver, which allows you to find and navigate catalogs, investigate data sources, and either plot predefined visualisations or interactively find the right view yourself.
A client-server protocol to allow for arbitrary data cataloging services or to serve the data itself, with a pluggable auth model.
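The relationship between the first two components can be pictured with a toy model. The sketch below is not Intake's implementation or API: the `CSVDriver`, `JSONLinesDriver`, `DRIVERS` and `Catalog` names are hypothetical, and it uses only the Python standard library. It only illustrates the idea that every driver exposes the same `read()` call and that a catalog entry records which driver loads each source.

```python
import csv
import io
import json


class CSVDriver:
    """Hypothetical driver: parses CSV text into a list of dict records."""

    def __init__(self, text):
        self.text = text

    def read(self):
        return list(csv.DictReader(io.StringIO(self.text)))


class JSONLinesDriver:
    """Hypothetical driver: parses one JSON object per line."""

    def __init__(self, text):
        self.text = text

    def read(self):
        return [json.loads(line) for line in self.text.splitlines() if line.strip()]


# Registry mapping a driver name to the class that implements it.
DRIVERS = {"csv": CSVDriver, "jsonl": JSONLinesDriver}


class Catalog:
    """Hypothetical catalog: named entries with a driver name, arguments and metadata."""

    def __init__(self, entries):
        self.entries = entries

    def __iter__(self):
        return iter(self.entries)

    def __getitem__(self, name):
        entry = self.entries[name]
        return DRIVERS[entry["driver"]](**entry["args"])


catalog = Catalog({
    "cities_csv": {
        "driver": "csv",
        "args": {"text": "name,country\nOslo,NO\nLima,PE\n"},
        "metadata": {"description": "toy inline CSV"},
    },
    "cities_jsonl": {
        "driver": "jsonl",
        "args": {"text": '{"name": "Oslo", "country": "NO"}\n{"name": "Lima", "country": "PE"}\n'},
        "metadata": {"description": "toy inline JSON lines"},
    },
})

# The same call loads either source, whatever the underlying format.
for name in catalog:
    print(name, catalog[name].read())
```

Both entries yield the same list of records, even though one is stored as CSV and the other as JSON lines. The calling code never needs to know which format sits behind a catalog name.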
Intake loads data in a range of formats and types (see Plugin Directory) into containers you already use, such as Pandas dataframes, Python lists, NumPy arrays, and more.
Intake loads, then gets out of your way
Search and introspect data-sets in Catalogs with the GUI: quickly find what you need to do your work
Install data-sets and automatically get requirements
Leverage cloud resources and distributed computing.
See the executable tutorial:
Simple spec to define data sources
Single point of truth, no more copy and paste
Distribute data using packages, shared files or a server
Update definitions in-place
Parametrise user options
Make use of additional functionality like filename parsing and caching.
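The idea of a single source definition with parametrised user options can be sketched in a few lines. This is an illustration only, not Intake's catalog syntax: the `SPEC` layout, the `resolve` helper and the `data/{region}/{year}.csv` path are all hypothetical. The point is that the provider writes the template and the allowed values once, and users pick options instead of copying paths.

```python
# Hypothetical source definition: one path template plus the options a user may set.
SPEC = {
    "description": "Daily measurements for one region",
    "driver": "csv",
    "urlpath": "data/{region}/{year}.csv",
    "parameters": {
        "region": {"default": "north", "allowed": ["north", "south"]},
        "year": {"default": 2020, "allowed": [2019, 2020, 2021]},
    },
}


def resolve(spec, **user_params):
    """Fill the path template from defaults, overridden by validated user choices."""
    unknown = set(user_params) - set(spec["parameters"])
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    values = {}
    for name, rule in spec["parameters"].items():
        value = user_params.get(name, rule["default"])
        if value not in rule["allowed"]:
            raise ValueError(f"{name}={value!r} is not one of {rule['allowed']}")
        values[name] = value
    return spec["urlpath"].format(**values)


print(resolve(SPEC))                              # data/north/2020.csv
print(resolve(SPEC, region="south", year=2021))   # data/south/2021.csv
```

Because the definition lives in one place, changing the template or the allowed values updates every user's view of the data source without anyone editing their own code.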
See the executable tutorial:
Create catalogs out of established departmental practices
Provide data access credentials via Intake parameters
Use server-client architecture as gatekeeper:
add authentication methods
add a monitoring point to track the data-sets being accessed.
Hook Intake into proprietary data access systems.
The Start here document contains the sections that all users new to Intake should read through. Use Cases - I want to… shows specific problems that Intake solves. For a brief demonstration, which you can execute locally, go to Quickstart. For a general description of all of the components of Intake and how they fit together, go to Overview. Finally, for some notebooks using Intake and articles about Intake, go to Examples and intake-examples. These and other documentation pages will make reference to concepts that are defined in the Glossary.