Access OffsetsDB Data#

OffsetsDB provides a detailed view of carbon offset credits and projects. You can access the data in various formats or directly through Python using our data package.

Important

By downloading or accessing the OffsetsDB data archives, you agree to the Terms of Data Access.

CSV & Parquet Zipped Files#

Download the latest version of OffsetsDB in CSV:

Download the latest version of OffsetsDB in Parquet:

Citation#

Please cite OffsetsDB as:

CarbonPlan (2024) “OffsetsDB” https://carbonplan.org/research/offsets-db

Accessing The Full Data Archive Through Python#

For programmatic access to OffsetsDB, you can use our Python data package, which lets you load and work with the data directly in your Python environment. The package exposes the data in several formats, including CSV (for raw data) and Parquet (for processed data).

Installation#

To get started, install the offsets-db-data package (imported in Python as offsets_db_data). Make sure Python is installed on your system, then run:

python -m pip install offsets-db-data
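To confirm that the install landed in the active environment, one option is to look up the installed distribution with the standard library. This is a minimal sketch; `installed_version` is a hypothetical helper name and not part of offsets-db-data.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str):
    """Return the installed version string for a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# The distribution name is the one passed to pip above.
print(installed_version("offsets-db-data"))
```

A result of `None` means the package is not visible to the interpreter that ran the check, which usually points to a different virtual environment.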

Using the Data Catalog#

Once installed, you can access the data through an Intake catalog. This catalog provides a high-level interface to the OffsetsDB datasets.

Loading the Catalog

import pandas as pd
from offsets_db_data.data import catalog

# Limit how many columns pandas prints, to keep the output below compact
pd.options.display.max_columns = 5

# Display the catalog
print(catalog)
<Intake catalog: offsets_db_data>

Available Data#

The catalog includes several datasets, such as credits and projects.

Getting Descriptive Information About a Dataset#

You can get information about a dataset using the describe() method. For example, to get information about the ‘credits’ dataset:

catalog['credits'].describe()
{'name': 'credits',
 'container': 'dataframe',
 'plugin': ['parquet'],
 'driver': ['parquet'],
 'description': 'OffsetsDB processed and transformed data',
 'direct_access': 'forbid',
 'user_parameters': [{'name': 'date',
   'description': 'date of the data to load',
   'type': 'str',
   'default': '2024-02-13'}],
 'metadata': {},
 'args': {'urlpath': 's3://carbonplan-offsets-db/final/{{ date }}/credits-augmented.parquet',
  'storage_options': {'anon': True},
  'engine': 'fastparquet'}}
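The `urlpath` in this output contains a `{{ date }}` placeholder that is filled from the `date` user parameter. The sketch below imitates that substitution with plain string replacement, only to show which object a given date points to. The catalog performs its own templating when you load data, and `resolve_urlpath` is a hypothetical helper introduced here for illustration.

```python
# Template copied from the describe() output above.
URLPATH_TEMPLATE = "s3://carbonplan-offsets-db/final/{{ date }}/credits-augmented.parquet"


def resolve_urlpath(template: str, date: str) -> str:
    """Fill the '{{ date }}' placeholder with a concrete date string."""
    return template.replace("{{ date }}", date)


print(resolve_urlpath(URLPATH_TEMPLATE, "2024-02-13"))
# s3://carbonplan-offsets-db/final/2024-02-13/credits-augmented.parquet
```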

Accessing Specific Datasets#

You can access individual datasets within the catalog. For example, to access the ‘credits’ dataset:

# Access the 'credits' dataset
credits = catalog['credits']

# Read the data into a pandas DataFrame
credits_df = credits.read()
credits_df.head()
  project_id  quantity           transaction_date transaction_type  vintage
0       VCS1     12630  2009-03-26 00:00:00+00:00         issuance     2007
1       VCS1      9074  2014-01-21 00:00:00+00:00         issuance     2006
2      VCS10    153460  2009-04-22 00:00:00+00:00         issuance     2006
3      VCS10    368968  2009-04-22 00:00:00+00:00         issuance     2007
4      VCS10    505908  2009-04-22 00:00:00+00:00         issuance     2008
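Once the data is in a DataFrame, ordinary pandas operations apply. As a small illustration, the sketch below totals issued credits per project. It uses only the five rows shown above, re-entered by hand so that it runs without network access; on the real data you would apply the same groupby to `credits_df`.

```python
import pandas as pd

# The five rows printed above, re-entered by hand (not a fresh download).
sample = pd.DataFrame(
    {
        "project_id": ["VCS1", "VCS1", "VCS10", "VCS10", "VCS10"],
        "quantity": [12630, 9074, 153460, 368968, 505908],
        "transaction_type": ["issuance"] * 5,
        "vintage": [2007, 2006, 2006, 2007, 2008],
    }
)

# Keep issuance transactions and sum the quantity for each project.
issued = (
    sample[sample["transaction_type"] == "issuance"]
    .groupby("project_id")["quantity"]
    .sum()
)

print({project: int(total) for project, total in issued.items()})
# {'VCS1': 21704, 'VCS10': 1028336}
```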

Similarly, to access the ‘projects’ dataset:

# Access the 'projects' dataset
projects = catalog['projects']

# Read the data into a pandas DataFrame
projects_df = projects.read()
projects_df.head()
              category     country  ...  retired   status
0  [energy-efficiency]  Madagascar  ...        0  unknown
1  [energy-efficiency]  Madagascar  ...        0  unknown
2  [energy-efficiency]  Madagascar  ...        0  unknown
3  [energy-efficiency]  Madagascar  ...        0  unknown
4  [energy-efficiency]  Madagascar  ...        0  unknown

5 rows × 15 columns

Calling projects.read() or credits.read() without specifying a date returns the data downloaded and processed on 2024-02-13, which is the default value of the catalog's date parameter.

To load data for a specific date, pass the date as a string in the format YYYY-MM-DD. For example:

projects_df = catalog['projects'](date='2024-02-07').read()
projects_df.head()
                             category   country  ...  retired  status
0                            [forest]      Peru  ...        0  listed
1  [renewable-energy, ghg-management]     China  ...        0  listed
2                           [unknown]  Cameroon  ...        0  listed
3                            [forest]     Kenya  ...        0  listed
4                       [agriculture]    Brazil  ...        0  listed

5 rows × 15 columns
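Because the date is passed as a plain string, a malformed value is easy to send by accident. One way to catch that early is to check the string locally before handing it to the catalog. This is a sketch under the assumption that dates must be zero-padded YYYY-MM-DD, as in the example above; `validate_date` is a hypothetical helper, not part of the package.

```python
from datetime import datetime


def validate_date(date: str) -> str:
    """Return the date unchanged if it is a zero-padded YYYY-MM-DD string."""
    # strptime raises ValueError for strings that are not a real calendar date.
    parsed = datetime.strptime(date, "%Y-%m-%d")
    # strptime also accepts unpadded values such as '2024-2-7', so round-trip
    # the parsed date and compare to reject them.
    if parsed.strftime("%Y-%m-%d") != date:
        raise ValueError(f"expected zero-padded YYYY-MM-DD, got {date!r}")
    return date


print(validate_date("2024-02-07"))
# 2024-02-07
```

The checked value can then be passed on, for instance as `catalog['projects'](date=validate_date('2024-02-07')).read()`.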

Note

If you request a date for which no data is available, the package raises a PermissionError: Access Denied.
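If you are unsure which dates have been published, one approach is to try several candidates in order and keep the first that loads. The sketch below relies only on the behavior described above: calling a catalog entry with `date=...` and then `.read()`, with `PermissionError` signalling a missing date. `read_first_available` is a hypothetical helper, not part of offsets-db-data.

```python
def read_first_available(source, dates):
    """Return (date, data) for the first candidate date that can be read.

    `source` is a catalog entry such as catalog['projects'].
    """
    candidates = list(dates)
    for date in candidates:
        try:
            return date, source(date=date).read()
        except PermissionError:
            # Nothing is published for this date; move on to the next one.
            continue
    raise LookupError(f"no data available for any of: {candidates}")
```

With the real catalog, a call might look like `read_first_available(catalog['projects'], ['2024-02-08', '2024-02-07'])`. Listing the most recent date first returns the newest available snapshot among the candidates.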