API Reference
This page provides an autogenerated summary of offsets-db-data’s API. For more details and examples, refer to the relevant chapters in the main part of the documentation.
Registry Specific Functions
The following functions are specific to a given registry and are grouped under each registry’s module. We currently support the following registries:
Verra
Gold Standard
APX registries
Verra
- offsets_db_data.vcs.add_vcs_compliance_projects(df: DataFrame) → DataFrame
Add details about two compliance projects to the projects database.
- Parameters:
df (pd.DataFrame) – A pandas DataFrame containing project data with a ‘project_id’ column.
- Returns:
df – A pandas DataFrame with two additional rows, describing two projects from the mostly unused Verra compliance registry portal.
- Return type:
pd.DataFrame
- offsets_db_data.vcs.add_vcs_project_id(df: DataFrame) → DataFrame
Add a prefix ‘VCS’ to each project ID in the DataFrame.
- Parameters:
df (pd.DataFrame) – Input DataFrame with Verra project data.
- Returns:
DataFrame with updated ‘project_id’ column, containing the prefixed project IDs.
- Return type:
pd.DataFrame
- offsets_db_data.vcs.add_vcs_project_url(df: DataFrame) → DataFrame
Create a URL for each project based on its Verra project ID.
- Parameters:
df (pd.DataFrame) – Input DataFrame with Verra project data.
- Returns:
DataFrame with a new ‘project_url’ column, containing the generated URLs for each project.
- Return type:
pd.DataFrame
- offsets_db_data.vcs.calculate_vcs_issuances(df: DataFrame) → DataFrame
Calculate Verra issuance transactions from preprocessed transaction data.
Verra allows rolling/partial issuances, so vintage issuance must be inferred from Total Vintage Quantity.
- Parameters:
df (pd.DataFrame) – Input DataFrame with preprocessed transaction data.
- Returns:
DataFrame containing only issuance transactions with deduplicated and renamed columns.
- Return type:
pd.DataFrame
- offsets_db_data.vcs.calculate_vcs_retirements(df: DataFrame) → DataFrame
Calculate retirements and cancellations for Verra transactions. The data does not allow distinguishing between retirements and cancellations.
- Parameters:
df (pd.DataFrame) – Input DataFrame with Verra transaction data.
- Returns:
DataFrame containing only retirement transactions with renamed columns.
- Return type:
pd.DataFrame
- offsets_db_data.vcs.determine_vcs_transaction_type(df: DataFrame, *, date_column: str) → DataFrame
Determine the transaction type for Verra transactions based on a specified date column. Transactions with non-null date values are labeled as ‘retirement’, else as ‘issuance’.
- Parameters:
df (pd.DataFrame) – Input DataFrame with transaction data.
date_column (str) – Name of the column in the DataFrame used to determine the transaction type.
- Returns:
DataFrame with a new ‘transaction_type’ column indicating the type of each transaction.
- Return type:
pd.DataFrame
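The labeling rule described above can be illustrated with a minimal, dependency-free sketch that uses a list of dicts in place of a DataFrame. The helper name `label_transaction_types`, the column name `retirement_date`, and the sample values are hypothetical; the library itself operates on pandas DataFrames, where "null" also covers NaN and NaT.

```python
def label_transaction_types(records, *, date_column):
    """Label a record 'retirement' when date_column is set, otherwise 'issuance'."""
    labeled = []
    for record in records:
        has_date = record.get(date_column) is not None
        transaction_type = "retirement" if has_date else "issuance"
        labeled.append({**record, "transaction_type": transaction_type})
    return labeled


# Made-up rows; 'retirement_date' is a hypothetical column name.
rows = [
    {"quantity": 100, "retirement_date": None},
    {"quantity": 40, "retirement_date": "2021-03-01"},
]
labeled_rows = label_transaction_types(rows, date_column="retirement_date")
print([row["transaction_type"] for row in labeled_rows])
```

The sketch returns new records rather than mutating the input, which keeps the raw rows available for later checks.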
- offsets_db_data.vcs.generate_vcs_project_ids(df: DataFrame, *, prefix: str) → DataFrame
Generate Verra project IDs by concatenating a specified prefix with the ‘ID’ column of the DataFrame.
- Parameters:
df (pd.DataFrame) – Input DataFrame containing Verra project data.
prefix (str) – Prefix string to prepend to each project ID.
- Returns:
DataFrame with a new ‘project_id’ column, containing the generated project IDs.
- Return type:
pd.DataFrame
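As a rough illustration of the concatenation described above, the following sketch uses a list of dicts in place of a DataFrame. The helper name `prefix_project_ids` and the sample IDs are hypothetical, and the sketch assumes plain string concatenation with no separator; the library's own behavior may differ in detail.

```python
def prefix_project_ids(records, *, prefix):
    """Return copies of the records with 'project_id' set to prefix + ID."""
    return [{**record, "project_id": f"{prefix}{record['ID']}"} for record in records]


# Sample rows standing in for raw Verra project data (values are made up).
raw = [{"ID": 1234}, {"ID": 77}]
with_ids = prefix_project_ids(raw, prefix="VCS")
print([row["project_id"] for row in with_ids])
```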
- offsets_db_data.vcs.process_vcs_credits(df: DataFrame, *, download_type: str = 'transactions', registry_name: str = 'verra', prefix: str = 'VCS', arb: DataFrame | None = None) → DataFrame
Process Verra credits data, including generation of project IDs, determination of transaction types, setting transaction dates, and various data transformations and validations.
- Parameters:
df (pd.DataFrame) – Input DataFrame with raw credits data.
download_type (str, optional) – Type of download operation performed (default is ‘transactions’).
registry_name (str, optional) – Name of the registry (default is ‘verra’).
prefix (str, optional) – Prefix for generating project IDs (default is ‘VCS’).
arb (pd.DataFrame | None, optional) – DataFrame for additional data merging (default is None).
- Returns:
Processed DataFrame with Verra credits data.
- Return type:
pd.DataFrame
- offsets_db_data.vcs.process_vcs_projects(df: DataFrame, *, credits: DataFrame, registry_name: str = 'verra', download_type: str = 'projects') → DataFrame
Process Verra projects data, including renaming, adding, and validating columns, and merging with credits data.
- Parameters:
df (pd.DataFrame) – Input DataFrame with raw projects data.
credits (pd.DataFrame) – DataFrame containing credits data for merging.
registry_name (str, optional) – Name of the registry (default is ‘verra’).
download_type (str, optional) – Type of download operation performed (default is ‘projects’).
- Returns:
Processed DataFrame with harmonized and validated Verra projects data.
- Return type:
pd.DataFrame
- offsets_db_data.vcs.set_vcs_transaction_dates(df: DataFrame, *, date_column: str, fallback_column: str) → DataFrame
Set the transaction dates in a DataFrame, using a primary date column and a fallback column.
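- Parameters:
df (pd.DataFrame) – Input DataFrame with transaction data.
date_column (str) – Name of the primary date column.
fallback_column (str) – Name of the fallback date column.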
- Returns:
DataFrame with a new ‘transaction_date’ column, containing the determined dates.
- Return type:
pd.DataFrame
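One plausible reading of the primary/fallback behavior is sketched below, assuming the fallback column is consulted only when the primary value is missing. The helper name `pick_transaction_dates` and both column names are hypothetical, and a list of dicts stands in for the DataFrame.

```python
def pick_transaction_dates(records, *, date_column, fallback_column):
    """Use date_column when it is set; otherwise take the value from fallback_column."""
    result = []
    for record in records:
        primary = record.get(date_column)
        chosen = primary if primary is not None else record.get(fallback_column)
        result.append({**record, "transaction_date": chosen})
    return result


# Made-up rows; both column names are hypothetical.
rows = [
    {"retired_on": "2021-03-01", "issued_on": "2020-01-15"},
    {"retired_on": None, "issued_on": "2020-06-30"},
]
dated_rows = pick_transaction_dates(
    rows, date_column="retired_on", fallback_column="issued_on"
)
print([row["transaction_date"] for row in dated_rows])
```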
- offsets_db_data.vcs.set_vcs_vintage_year(df: DataFrame, *, date_column: str) → DataFrame
Set the vintage year for Verra transactions based on a date column formatted as ‘%d/%m/%Y’.
- Parameters:
df (pd.DataFrame) – Input DataFrame with transaction data.
date_column (str) – Name of the column containing date information to extract the vintage year from.
- Returns:
DataFrame with a new ‘vintage’ column, containing the vintage year of each transaction.
- Return type:
pd.DataFrame
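The ‘%d/%m/%Y’ format is day-first, which matters for ambiguous dates. The following standard-library sketch shows how a single date string maps to a vintage year under that format; the helper name `extract_vintage_year` is hypothetical and the library applies the equivalent logic to a whole DataFrame column.

```python
from datetime import datetime


def extract_vintage_year(date_string):
    """Parse a day/month/year string and return its year as an int."""
    return datetime.strptime(date_string, "%d/%m/%Y").year


# '05/01/2018' is 5 January 2018 under this format, not 1 May 2018.
parsed = datetime.strptime("05/01/2018", "%d/%m/%Y")
print(parsed.month, extract_vintage_year("05/01/2018"))
print(extract_vintage_year("31/12/2019"))
```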
Gold Standard
- offsets_db_data.gld.add_gld_project_id(df: DataFrame, *, prefix: str) → DataFrame
Add Gold Standard project IDs to the DataFrame.
- Parameters:
df (pd.DataFrame) – Input DataFrame containing credits data.
prefix (str) – Prefix string to prepend to each project ID.
- Returns:
DataFrame with a new ‘project_id’ column, containing the generated project IDs.
- Return type:
pd.DataFrame
- offsets_db_data.gld.add_gld_project_url(df: DataFrame) → DataFrame
Add a URL for each Gold Standard project.
Gold Standard project IDs differ from the IDs used in Gold Standard URLs.
- Parameters:
df (pd.DataFrame) – Input DataFrame containing Gold Standard project data.
- Returns:
DataFrame with a new ‘project_url’ column, containing URLs for each project.
- Return type:
pd.DataFrame
- offsets_db_data.gld.determine_gld_transaction_type(df: DataFrame, *, download_type: str) → DataFrame
Assign a transaction type to each record in the DataFrame based on the download type for Gold Standard transactions.
- Parameters:
df (pd.DataFrame) – Input DataFrame containing transaction data.
download_type (str) – Type of transaction (‘issuances’, ‘retirements’) to determine the transaction type.
- Returns:
DataFrame with a new ‘transaction_type’ column, containing assigned transaction types based on download_type.
- Return type:
pd.DataFrame
- offsets_db_data.gld.process_gld_credits(df: DataFrame, *, download_type: str, registry_name: str = 'gold-standard', prefix: str = 'GLD', arb: DataFrame | None = None) → DataFrame
Process Gold Standard credits data by renaming columns, setting registry, determining transaction types, adding project IDs, converting date columns, aggregating issuances (if applicable), and validating the schema.
- Parameters:
df (pd.DataFrame) – Input DataFrame with raw Gold Standard credits data.
download_type (str) – Type of download (‘issuances’ or ‘retirements’).
registry_name (str, optional) – Name of the registry for setting and mapping columns (default is ‘gold-standard’).
prefix (str, optional) – Prefix for generating project IDs (default is ‘GLD’).
arb (pd.DataFrame | None, optional) – Additional DataFrame for data merging (default is None).
- Returns:
Processed DataFrame with Gold Standard credits data.
- Return type:
pd.DataFrame
- offsets_db_data.gld.process_gld_projects(df: DataFrame, *, credits: DataFrame, registry_name: str = 'gold-standard', prefix: str = 'GLD') → DataFrame
Process Gold Standard projects data, including renaming, adding, and validating columns, harmonizing statuses, and merging with credits data.
- Parameters:
df (pd.DataFrame) – Input DataFrame with raw Gold Standard projects data.
credits (pd.DataFrame) – DataFrame containing credits data for merging.
registry_name (str, optional) – Name of the registry for specific processing steps (default is ‘gold-standard’).
prefix (str, optional) – Prefix for generating project IDs (default is ‘GLD’).
- Returns:
Processed DataFrame with harmonized and validated Gold Standard projects data.
- Return type:
pd.DataFrame
APX Registries
Functionality for APX registries is currently grouped under the `apx` module.
- offsets_db_data.apx.add_project_url(df: DataFrame, *, registry_name: str) → DataFrame
Add a project URL to each record in the DataFrame based on the registry name and project ID.
- Parameters:
df (pd.DataFrame) – Input DataFrame containing project data.
registry_name (str) – Name of the registry (‘american-carbon-registry’, ‘climate-action-reserve’, ‘art-trees’).
- Returns:
DataFrame with a new ‘project_url’ column, containing URLs for each project.
- Return type:
pd.DataFrame
- offsets_db_data.apx.determine_transaction_type(df: DataFrame, *, download_type: str) → DataFrame
Assign a transaction type to each record in the DataFrame based on the download type.
- Parameters:
df (pd.DataFrame) – Input DataFrame containing transaction data.
download_type (str) – Type of transaction (‘issuances’, ‘retirements’, ‘cancellations’) to determine the transaction type.
- Returns:
DataFrame with a new ‘transaction_type’ column, containing assigned transaction types based on download_type.
- Return type:
pd.DataFrame
- offsets_db_data.apx.harmonize_acr_status(row: Series) → str
Derive a single project status for CAR and ACR projects.
Raw CAR and ACR data have two status columns: one for compliance status and one for voluntary status. This function harmonizes them into a single value.
- Parameters:
row (pd.Series) – A row from a pandas DataFrame
- Returns:
value – The status of the project
- Return type:
str
- offsets_db_data.apx.process_apx_credits(df: DataFrame, *, download_type: str, registry_name: str, arb: DataFrame | None = None) → DataFrame
Process APX credits data by setting registry, determining transaction types, renaming columns, converting date columns, aggregating issuances (if applicable), and validating the schema.
- Parameters:
df (pd.DataFrame) – Input DataFrame with raw APX credits data.
download_type (str) – Type of download (‘issuances’, ‘retirements’, etc.).
registry_name (str) – Name of the registry for setting and mapping columns.
arb (pd.DataFrame | None, optional) – Additional DataFrame for data merging (default is None).
- Returns:
Processed DataFrame with APX credits data.
- Return type:
pd.DataFrame
- offsets_db_data.apx.process_apx_projects(df: DataFrame, *, credits: DataFrame, registry_name: str) → DataFrame
Process APX projects data, including renaming, adding, and validating columns, harmonizing statuses, and merging with credits data.
- Parameters:
df (pd.DataFrame) – Input DataFrame with raw projects data.
credits (pd.DataFrame) – DataFrame containing credits data for merging.
registry_name (str) – Name of the registry for specific processing steps.
- Returns:
Processed DataFrame with harmonized and validated APX projects data.
- Return type:
pd.DataFrame
ARB Data Functions
The following functions are specific to the ARB data.
- offsets_db_data.arb.process_arb(df: DataFrame) → DataFrame
Process ARB (Air Resources Board) data by renaming columns, handling nulls, interpolating vintages, and transforming the data structure for transactions.
- Parameters:
df (pd.DataFrame) – Input DataFrame containing raw ARB data.
- Returns:
data – Processed DataFrame with ARB data. Columns include ‘opr_id’, ‘vintage’, ‘issued_at’ (interpolated), various credit transaction types, and quantities. The DataFrame is also validated against a predefined schema for credit data.
- Return type:
pd.DataFrame
Notes
The function renames columns for readability and standardization.
It interpolates missing vintage values and handles NaNs in the ‘issuance’ column.
Retirement transactions are derived based on compliance period dates.
The DataFrame is melted to restructure credit data.
Zero retirement events are dropped as they are considered artifacts.
A prefix is added to ‘project_id’ to indicate the source.
The ‘registry’ column is derived based on the project_id prefix.
The ‘vintage’ column is converted to integer type.
Finally, the data is converted to datetime where necessary and validated against a predefined schema.
Common Functions
The following functions are common to all registries.
- offsets_db_data.common.add_missing_columns(df: DataFrame, *, schema: DataFrameSchema) → DataFrame
Add any missing columns to the DataFrame and initialize them with None.
- Parameters:
df (pd.DataFrame) – Input DataFrame.
schema (pa.DataFrameSchema) – Pandera schema to validate against.
- Returns:
DataFrame with all specified columns, adding missing ones initialized to None.
- Return type:
pd.DataFrame
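The idea of filling in absent columns with None can be sketched without pandas or Pandera. In this illustration a plain list of column names stands in for the schema and a list of dicts stands in for the DataFrame; the helper name `fill_missing_columns` and the column names are hypothetical.

```python
def fill_missing_columns(records, *, schema_columns):
    """Return copies of the records that contain every schema column."""
    filled = []
    for record in records:
        defaults = {column: None for column in schema_columns}
        filled.append({**defaults, **record})  # existing values win over None
    return filled


# Hypothetical column names standing in for a schema's column list.
schema_columns = ["project_id", "name", "country"]
rows = [{"project_id": "VCS1", "name": "Example project"}]
filled_rows = fill_missing_columns(rows, schema_columns=schema_columns)
print(filled_rows)
```

Adding the columns before validation means a schema check can focus on types and values rather than failing on absent columns.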
- offsets_db_data.common.clean_and_convert_numeric_columns(df: DataFrame, *, columns: list[str]) → DataFrame
Clean and convert specified columns to numeric format in the DataFrame.
- offsets_db_data.common.convert_to_datetime(df: DataFrame, *, columns: list, utc: bool = True, **kwargs: Any) → DataFrame
Convert specified columns in the DataFrame to datetime format.
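- Parameters:
df (pd.DataFrame) – Input DataFrame.
columns (list) – Columns to convert to datetime format.
utc (bool, optional) – Whether to convert to UTC (default is True).
**kwargs (Any) – Additional keyword arguments.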
- Returns:
DataFrame with specified columns converted to datetime format.
- Return type:
pd.DataFrame
- offsets_db_data.common.load_column_mapping(*, registry_name: str, download_type: str, mapping_path: str) → dict
- offsets_db_data.common.load_protocol_mapping(path: UPath = PosixUPath('/home/docs/checkouts/readthedocs.org/user_builds/offsets-db-data/envs/latest/lib/python3.12/site-packages/offsets_db_data/configs/all-protocol-mapping.json')) → dict
- offsets_db_data.common.load_registry_project_column_mapping(*, registry_name: str, file_path: UPath = PosixUPath('/home/docs/checkouts/readthedocs.org/user_builds/offsets-db-data/envs/latest/lib/python3.12/site-packages/offsets_db_data/configs/projects-raw-columns-mapping.json')) → dict
- offsets_db_data.common.set_registry(df: DataFrame, registry_name: str) → DataFrame
Set the registry name for each record in the DataFrame.
- Parameters:
df (pd.DataFrame) – Input DataFrame.
registry_name (str) – Name of the registry to set.
- Returns:
DataFrame with a new ‘registry’ column set to the specified registry name.
- Return type:
pd.DataFrame
- offsets_db_data.common.validate(df: DataFrame, schema: DataFrameSchema) → DataFrame
Validate the DataFrame against a given Pandera schema.
- Parameters:
df (pd.DataFrame) – Input DataFrame.
schema (pa.DataFrameSchema) – Pandera schema to validate against.
- Returns:
DataFrame with columns sorted according to the schema and validated against it.
- Return type:
pd.DataFrame
- offsets_db_data.credits.aggregate_issuance_transactions(df: DataFrame) → DataFrame
Aggregate issuance transactions by summing the quantity for each combination of project ID, transaction date, and vintage.
- Parameters:
df (pd.DataFrame) – Input DataFrame containing issuance transaction data.
- Returns:
DataFrame with aggregated issuance transactions, filtered to include only those with a positive quantity.
- Return type:
pd.DataFrame
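The aggregation described above amounts to a group-and-sum followed by a filter. The following sketch reproduces that logic with the standard library on a list of dicts; the helper name `sum_issuances` and the sample rows are hypothetical, and the library presumably performs the equivalent operation on a DataFrame.

```python
from collections import defaultdict


def sum_issuances(records):
    """Sum quantity per (project_id, transaction_date, vintage); keep positive totals."""
    totals = defaultdict(int)
    for record in records:
        key = (record["project_id"], record["transaction_date"], record["vintage"])
        totals[key] += record["quantity"]
    return [
        {"project_id": pid, "transaction_date": day, "vintage": vintage, "quantity": qty}
        for (pid, day, vintage), qty in totals.items()
        if qty > 0
    ]


# Made-up issuance rows: two partial issuances for one vintage, plus a zero row.
rows = [
    {"project_id": "VCS1", "transaction_date": "2020-01-01", "vintage": 2019, "quantity": 100},
    {"project_id": "VCS1", "transaction_date": "2020-01-01", "vintage": 2019, "quantity": 50},
    {"project_id": "VCS2", "transaction_date": "2020-02-01", "vintage": 2018, "quantity": 0},
]
aggregated = sum_issuances(rows)
print(aggregated)
```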
- offsets_db_data.credits.filter_and_merge_transactions(df: DataFrame, arb_data: DataFrame, project_id_column: str = 'project_id') → DataFrame
Filter transactions based on project ID intersection with ARB data and merge the filtered transactions.
- Parameters:
df (pd.DataFrame) – Input DataFrame with transaction data.
arb_data (pd.DataFrame) – DataFrame containing ARB issuance data.
project_id_column (str, optional) – The name of the column containing project IDs (default is ‘project_id’).
- Returns:
DataFrame with transactions from the input DataFrame, excluding those present in ARB data, merged with relevant ARB transactions.
- Return type:
pd.DataFrame
- offsets_db_data.credits.handle_non_issuance_transactions(df: DataFrame) → DataFrame
Filter the DataFrame to include only non-issuance transactions.
- Parameters:
df (pd.DataFrame) – Input DataFrame containing transaction data.
- Returns:
DataFrame containing only transactions where ‘transaction_type’ is not ‘issuance’.
- Return type:
pd.DataFrame
- offsets_db_data.credits.merge_with_arb(credits: DataFrame, *, arb: DataFrame) → DataFrame
The ARB issuance table contains the authoritative version of all credit transactions for ARB projects. This function drops all registry crediting data for those projects and, instead, patches in data from the ARB issuance table.
- Parameters:
credits (pd.DataFrame) – Pandas dataframe containing registry credit data
arb (pd.DataFrame) – Pandas dataframe containing ARB issuance data
- Returns:
Pandas dataframe containing merged credit and ARB data
- Return type:
pd.DataFrame
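A simplified view of this replace-with-ARB step is sketched below, assuming that only projects present in both tables are affected. The helper name `patch_with_arb`, the project IDs, and the quantities are hypothetical, and a list of dicts stands in for each DataFrame.

```python
def patch_with_arb(credit_rows, *, arb_rows):
    """Drop registry rows for projects found in the ARB data and add the ARB rows instead."""
    registry_ids = {row["project_id"] for row in credit_rows}
    arb_ids = {row["project_id"] for row in arb_rows}
    shared_ids = registry_ids & arb_ids

    kept = [row for row in credit_rows if row["project_id"] not in shared_ids]
    patched = [row for row in arb_rows if row["project_id"] in shared_ids]
    return kept + patched


# Made-up rows; the project IDs are placeholders, not real registry IDs.
credit_rows = [
    {"project_id": "P1", "quantity": 10},
    {"project_id": "P2", "quantity": 5},
]
arb_rows = [
    {"project_id": "P1", "quantity": 12},
    {"project_id": "P9", "quantity": 7},
]
merged = patch_with_arb(credit_rows, arb_rows=arb_rows)
print(merged)
```

In this sketch the registry's row for P1 is replaced by the ARB row, P2 is untouched, and P9 is ignored because the registry data does not contain it.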
- offsets_db_data.projects.add_category(df: DataFrame, *, protocol_mapping: dict) → DataFrame
Add a category to each record in the DataFrame based on its protocol.
- Parameters:
df (pd.DataFrame) – Input DataFrame containing protocol data.
protocol_mapping (dict) – Dictionary mapping protocol strings to categories.
- Returns:
DataFrame with a new ‘category’ column, derived from the protocol information.
- Return type:
pd.DataFrame
- offsets_db_data.projects.add_first_issuance_and_retirement_dates(projects: DataFrame, *, credits: DataFrame) → DataFrame
Add the first issuance date and the first retirement date of carbon credits to each project in the projects DataFrame.
- Parameters:
credits (pd.DataFrame) – A pandas DataFrame containing credit issuance data with columns ‘project_id’, ‘transaction_date’, and ‘transaction_type’.
projects (pd.DataFrame) – A pandas DataFrame containing project data with a ‘project_id’ column.
- Returns:
projects – A pandas DataFrame which is the original projects DataFrame with two additional columns ‘first_issuance_at’ representing the first issuance date of each project and ‘first_retirement_at’ representing the first retirement date of each project.
- Return type:
pd.DataFrame
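Finding the earliest date per project and transaction type can be sketched as follows, using lists of dicts in place of the two DataFrames. The helper name `add_first_dates` and the sample data are hypothetical, and the sketch assumes that projects without a matching transaction receive None.

```python
from datetime import date


def add_first_dates(projects, *, credit_rows):
    """Attach the earliest issuance and retirement dates to each project."""
    earliest = {}
    for row in credit_rows:
        key = (row["project_id"], row["transaction_type"])
        if key not in earliest or row["transaction_date"] < earliest[key]:
            earliest[key] = row["transaction_date"]
    return [
        {
            **project,
            "first_issuance_at": earliest.get((project["project_id"], "issuance")),
            "first_retirement_at": earliest.get((project["project_id"], "retirement")),
        }
        for project in projects
    ]


# Made-up projects and transactions.
projects = [{"project_id": "VCS1"}, {"project_id": "VCS2"}]
credit_rows = [
    {"project_id": "VCS1", "transaction_type": "issuance", "transaction_date": date(2020, 5, 1)},
    {"project_id": "VCS1", "transaction_type": "issuance", "transaction_date": date(2019, 3, 1)},
    {"project_id": "VCS1", "transaction_type": "retirement", "transaction_date": date(2021, 1, 1)},
]
enriched = add_first_dates(projects, credit_rows=credit_rows)
print(enriched)
```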
- offsets_db_data.projects.add_is_compliance_flag(df: DataFrame) → DataFrame
Add a compliance flag to the DataFrame based on the protocol.
- Parameters:
df (pd.DataFrame) – Input DataFrame containing protocol data.
- Returns:
DataFrame with a new ‘is_compliance’ column, indicating if the protocol starts with ‘arb-’.
- Return type:
pd.DataFrame
- offsets_db_data.projects.add_retired_and_issued_totals(projects: DataFrame, *, credits: DataFrame) → DataFrame
Add total quantities of issued and retired credits to each project.
- Parameters:
projects (pd.DataFrame) – DataFrame containing project data.
credits (pd.DataFrame) – DataFrame containing credit transaction data.
- Returns:
DataFrame with two new columns: ‘issued’ and ‘retired’, representing the total quantities of issued and retired credits.
- Return type:
pd.DataFrame
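The per-project totals can be illustrated with a small standard-library sketch. It assumes that only rows labeled ‘issuance’ and ‘retirement’ contribute and that projects without transactions get totals of 0; the helper name `add_totals` and the sample rows are hypothetical.

```python
from collections import defaultdict


def add_totals(projects, *, credit_rows):
    """Attach summed 'issued' and 'retired' quantities to each project."""
    issued = defaultdict(int)
    retired = defaultdict(int)
    for row in credit_rows:
        if row["transaction_type"] == "issuance":
            issued[row["project_id"]] += row["quantity"]
        elif row["transaction_type"] == "retirement":
            retired[row["project_id"]] += row["quantity"]
    return [
        {
            **project,
            "issued": issued.get(project["project_id"], 0),
            "retired": retired.get(project["project_id"], 0),
        }
        for project in projects
    ]


# Made-up projects and transactions.
projects = [{"project_id": "VCS1"}, {"project_id": "VCS2"}]
credit_rows = [
    {"project_id": "VCS1", "transaction_type": "issuance", "quantity": 100},
    {"project_id": "VCS1", "transaction_type": "issuance", "quantity": 50},
    {"project_id": "VCS1", "transaction_type": "retirement", "quantity": 30},
]
with_totals = add_totals(projects, credit_rows=credit_rows)
print(with_totals)
```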
- offsets_db_data.projects.find_protocol(*, search_string: str, inverted_protocol_mapping: dict[str, list[str]]) → list[str]
Match known strings of project methodologies to internal topology
Unmatched strings are passed through to the database, until such time that we update mapping data.
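The pass-through behavior for unmatched strings can be sketched as a dictionary lookup with a fallback. This is an assumption-laden illustration: it uses an exact-match lookup, returns an unmatched string as a single-element list, and the helper name `lookup_protocol` and the mapping entries are hypothetical. The library's actual matching rules may be more involved.

```python
def lookup_protocol(*, search_string, inverted_protocol_mapping):
    """Return the known protocol names for a string, or the string itself if unknown."""
    known = inverted_protocol_mapping.get(search_string)
    if known is None:
        return [search_string]  # unmatched strings pass through unchanged
    return known


# Hypothetical mapping from raw methodology strings to internal protocol names.
inverted_mapping = {"Example Methodology v1.0": ["example-protocol"]}

matched = lookup_protocol(
    search_string="Example Methodology v1.0",
    inverted_protocol_mapping=inverted_mapping,
)
unmatched = lookup_protocol(
    search_string="Unknown Methodology",
    inverted_protocol_mapping=inverted_mapping,
)
print(matched, unmatched)
```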
- offsets_db_data.projects.get_protocol_category(*, protocol_strs: list[str] | str, protocol_mapping: dict) → list[str]
Get category based on protocol string
- offsets_db_data.projects.harmonize_country_names(df: DataFrame, *, country_column: str = 'country') → DataFrame
Harmonize country names in the DataFrame to standardized country names.
- Parameters:
df (pd.DataFrame) – Input DataFrame with country data.
country_column (str, optional) – The name of the column containing country names to be harmonized (default is ‘country’).
- Returns:
DataFrame with harmonized country names in the specified column.
- Return type:
pd.DataFrame
- offsets_db_data.projects.harmonize_status_codes(df: DataFrame, *, status_column: str = 'status') → DataFrame
Harmonize project status codes across registries
Excludes ACR, as it requires special treatment across two columns
- Parameters:
df (pd.DataFrame) – Input DataFrame with project status data.
status_column (str, optional) – Name of the column containing status codes to harmonize (default is ‘status’).
- Returns:
DataFrame with harmonized project status codes.
- Return type:
pd.DataFrame
- offsets_db_data.projects.map_protocol(df: DataFrame, *, inverted_protocol_mapping: dict, original_protocol_column: str = 'original_protocol') → DataFrame
Map protocols in the DataFrame to standardized names based on an inverted protocol mapping.
- Parameters:
df (pd.DataFrame) – Input DataFrame containing protocol data.
inverted_protocol_mapping (dict) – Dictionary mapping protocol strings to standardized protocol names.
original_protocol_column (str, optional) – Name of the column containing original protocol information (default is ‘original_protocol’).
- Returns:
DataFrame with a new ‘protocol’ column, containing mapped protocol names.
- Return type:
pd.DataFrame
- offsets_db_data.registry.get_registry_from_project_id(project_id: str) → str
Retrieve the full registry name from a project ID using a predefined abbreviation mapping.
- Parameters:
project_id (str) – The project ID whose registry needs to be identified.
- Returns:
The full name of the registry corresponding to the abbreviation in the project ID.
- Return type:
str
Notes
The function expects the first three characters of the project ID to be the abbreviation of the registry.
It uses a predefined mapping (REGISTRY_ABBR_MAP) to convert the abbreviation to the full registry name.
The project ID is converted to lowercase to ensure case-insensitive matching.
The function raises a KeyError if the abbreviation is not found in REGISTRY_ABBR_MAP.
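The notes above can be condensed into a short sketch. The mapping below is an illustrative two-entry stand-in for REGISTRY_ABBR_MAP, built from the ‘VCS’/‘verra’ and ‘GLD’/‘gold-standard’ defaults listed elsewhere on this page; the real mapping's contents are not shown here, and the helper name `registry_from_project_id` is hypothetical.

```python
# Illustrative stand-in for REGISTRY_ABBR_MAP; the real mapping may contain more entries.
ABBR_MAP_SKETCH = {"vcs": "verra", "gld": "gold-standard"}


def registry_from_project_id(project_id):
    """Look up the registry from the lowercased first three characters of the ID."""
    abbreviation = project_id[:3].lower()
    return ABBR_MAP_SKETCH[abbreviation]  # raises KeyError for unknown abbreviations


print(registry_from_project_id("VCS1234"))
print(registry_from_project_id("gld42"))
```

Because the lookup uses plain dictionary indexing, an unknown prefix surfaces immediately as a KeyError instead of silently producing a wrong registry name.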