Project for creating and connecting data catalogues
Intake solves a related set of problems:
Python API standards for loading data (such as DB-API 2.0) are optimized for transactional databases and query results that are processed one row at a time.
Libraries that do load data in bulk each tend to have their own API for doing so, which adds friction when switching data formats.
Loading data into a distributed data structure (like those found in Dask and Spark) often requires writing a separate loader.
Abstractions often focus on just one data model (tabular, n-dimensional array, or semi-structured), when many projects need to work with multiple kinds of data.
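The first point can be seen with Python's built-in sqlite3 module, which follows DB-API 2.0. This is only a minimal sketch: the `readings` table and its values are made up for illustration. It shows that results arrive as one tuple per row, so code that wants whole columns has to pivot them by hand.

```python
import sqlite3

# In-memory database with a small made-up table, for illustration only.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
cur.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("a", 1.5), ("b", 2.5), ("a", 3.0)],
)

# DB-API 2.0 hands results back one row (a tuple) at a time.
cur.execute("SELECT sensor, value FROM readings ORDER BY rowid")
rows = []
while True:
    row = cur.fetchone()
    if row is None:
        break
    rows.append(row)

# Bulk analysis usually wants columns, so the caller pivots the rows manually.
sensors, values = (list(column) for column in zip(*rows))
total = sum(values)

conn.close()
```

The row loop and the manual pivot are the kind of per-format glue code that the list above describes.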
Intake has the explicit goal of not defining a computational expression system. Intake plugins load data into containers (e.g., arrays or dataframes) that provide their own data processing features. As a result, a plugin only has to handle loading, so a new Intake plugin can be written with a relatively small amount of Python.
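To give a feel for how small a loader can be when it does not define any computation of its own, here is a rough sketch using only the standard library. The class `CSVTextSource`, its `read` method, and the sample data are hypothetical and are not Intake's actual plugin interface; the real base class and required methods are described in the Intake documentation.

```python
import csv
import io


class CSVTextSource:
    """Hypothetical loader for illustration; not Intake's real plugin base class."""

    def __init__(self, text):
        # CSV content held as a string to keep the sketch self-contained.
        self._text = text

    def read(self):
        # Load everything into a plain container (a list of dicts) and stop there.
        return list(csv.DictReader(io.StringIO(self._text)))


# Made-up sample data.
source = CSVTextSource("name,value\nx,1\ny,2\n")
records = source.read()

# Processing is left to the container's own features, here plain Python.
largest = max(records, key=lambda record: int(record["value"]))["name"]
```

The loader knows nothing about filtering or aggregation; that work happens on the returned container, which is the division of labour the paragraph above describes.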
Recommended by loleg