Resource

Intake

Project for creating and connecting data catalogues

Intake solves a related set of problems:

Python API standards for loading data (such as DB-API 2.0) are optimized for transactional databases and query results that are processed one row at a time.
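To make the row-at-a-time point concrete, here is a minimal sketch using Python's built-in sqlite3 module, which implements DB-API 2.0 (PEP 249). The table name and data are made up for illustration. The cursor yields one tuple per row, so turning a query result into a column-oriented structure for bulk analysis is left to the caller.

```python
import sqlite3

# In-memory SQLite database; sqlite3 follows the DB-API 2.0 interface.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (station TEXT, temp REAL)")  # hypothetical table
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("a", 20.5), ("b", 21.0), ("a", 19.5)],
)

cur = conn.execute("SELECT station, temp FROM readings ORDER BY rowid")

# DB-API cursors hand back rows as tuples, one at a time.
first_row = cur.fetchone()

# Building columns for bulk processing has to be done by hand.
columns = [d[0] for d in cur.description]
table = {name: [] for name in columns}
for row in [first_row, *cur]:
    for name, value in zip(columns, row):
        table[name].append(value)

conn.close()
print(table)  # {'station': ['a', 'b', 'a'], 'temp': [20.5, 21.0, 19.5]}
```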

Libraries that do load data in bulk tend to each have their own API for doing so, which adds friction when switching data formats.
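As a small illustration of that friction, even the standard library reads CSV and JSON through different APIs. The sketch below hides them behind one hypothetical load() function with a format registry; READERS and load() are illustrative names, not part of Intake or any library.

```python
import csv
import io
import json

# Hypothetical registry: format name -> function that reads a file object.
READERS = {
    "csv": lambda f: list(csv.DictReader(f)),
    "json": json.load,
}


def load(fileobj, fmt):
    """One call signature for every format; dispatch happens internally."""
    try:
        reader = READERS[fmt]
    except KeyError:
        raise ValueError(f"unsupported format: {fmt!r}") from None
    return reader(fileobj)


csv_rows = load(io.StringIO("name,value\nx,1\ny,2\n"), "csv")
json_rows = load(
    io.StringIO('[{"name": "x", "value": "1"}, {"name": "y", "value": "2"}]'),
    "json",
)
print(csv_rows == json_rows)  # True
```

Callers only learn one function; supporting a new format means adding one registry entry rather than changing every call site.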

Loading data into a distributed data structure (like those found in Dask and Spark) often requires writing a separate loader.
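One way to avoid a separate loader is to have a single source expose independent partitions. The sketch below is a hypothetical PartitionedSource class (not an Intake, Dask, or Spark API); a standard-library thread pool stands in for a distributed scheduler, which is an assumption made only to keep the example self-contained.

```python
from concurrent.futures import ThreadPoolExecutor


class PartitionedSource:
    """Hypothetical source that splits a dataset into independent partitions."""

    def __init__(self, records, npartitions):
        self.records = list(records)
        self.npartitions = npartitions

    def read_partition(self, i):
        # Each partition is a contiguous slice that can be loaded on its own.
        n = len(self.records)
        start = i * n // self.npartitions
        stop = (i + 1) * n // self.npartitions
        return self.records[start:stop]

    def read(self):
        # Serial path: concatenate all partitions in order.
        return [r for i in range(self.npartitions) for r in self.read_partition(i)]


source = PartitionedSource(range(10), npartitions=3)

# Parallel path: the same loader feeds a pool of workers, one task per partition.
with ThreadPoolExecutor(max_workers=3) as pool:
    parts = list(pool.map(source.read_partition, range(source.npartitions)))

print(parts)  # [[0, 1, 2], [3, 4, 5], [6, 7, 8, 9]]
```

The serial read() and the parallel map return the same records, so one loader serves both cases.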

Abstractions often focus on just one data model (tabular, n-dimensional array, or semi-structured), when many projects need to work with multiple kinds of data.

Intake explicitly aims not to define a computational expression system. Intake plugins load data into containers (e.g., arrays or dataframes) that already provide their own data-processing features. As a result, a new Intake plugin can be written with a relatively small amount of Python code.
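The sketch below shows why such a plugin can stay small: it only loads data into a container and leaves computation to ordinary Python tools. It does not use Intake; Source, register, PLUGINS, and CSVColumnSource are hypothetical names, and real Intake plugins build on Intake's own base classes instead.

```python
import csv
import io
import statistics

PLUGINS = {}  # hypothetical registry: plugin name -> plugin class


def register(cls):
    """Record a plugin class under its name so it can be looked up later."""
    PLUGINS[cls.name] = cls
    return cls


class Source:
    """Hypothetical minimal base class for loader plugins."""

    name = None       # key used to look the plugin up
    container = None  # kind of object read() returns

    def read(self):
        raise NotImplementedError


@register
class CSVColumnSource(Source):
    """Loads one numeric column of CSV text into a list of floats."""

    name = "csv_column"
    container = "list"

    def __init__(self, text, column):
        self.text = text
        self.column = column

    def read(self):
        rows = csv.DictReader(io.StringIO(self.text))
        return [float(row[self.column]) for row in rows]


# Look the plugin up by name and load the data; any computation is then
# done with existing tools on the returned container, not by the plugin.
source = PLUGINS["csv_column"]("t,temp\n1,20.5\n2,21.5\n", column="temp")
temps = source.read()
print(statistics.mean(temps))  # 21.0
```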

Recommended by loleg
