Environmental Metadata

Get environmental metadata for geographic coordinates.

What It Does

Get environmental metadata values for slots like annual_precpt, annual_temp, elev, depth, etc.

Pass in GPS coordinates and get back values in submission-schema units, together with provenance metadata. The output is compatible with the NMDC submission-schema and can be used in other applications.

Quick Example

from biosample_enricher.environmental_metadata import get_environmental_metadata

# Get climate data for San Francisco
result = get_environmental_metadata(
    lat=37.7749,
    lon=-122.4194,
    slots=["annual_precpt", "annual_temp"]
)

# Use the values in your NMDC submission
print(result["values"])
# {'annual_precpt': 519.3, 'annual_temp': 14.1}

# Check which data sources were used
print(result["metadata"]["climate_normals"]["providers_used"])
# ['meteostat', 'nasa_power']

What Values Can You Get?

Currently Supported (✅ Ready to Use)

Climate Data (30-year averages, no datetime needed):

Slot Name      Description                             Units             Type
-------------  --------------------------------------  ----------------  -----
annual_precpt  Annual precipitation (30-year average)  millimeters/year  float
annual_temp    Annual temperature (30-year average)    degrees Celsius   float

Providers: meteostat, nasa_power (automatically queries both and averages results)

Elevation Data (no datetime needed):

Slot Name  Description                Units   Type
---------  -------------------------  ------  -----
elev       Elevation above sea level  meters  float

Providers: USGS, Google Maps, Open Topo Data, OSM (tries multiple sources)

Partially Implemented (⚠️ Use with Caution)

Weather Data (requires datetime - collection date/time):

Slot Name         Description                       Units            Type
----------------  --------------------------------  ---------------  ------
temp              Temperature at collection time    degrees Celsius  float
air_temp          Air temperature (alias for temp)  degrees Celsius  float
humidity          Relative humidity                 g/m³             string
wind_speed        Wind speed                        m/s              string
wind_direction    Wind direction                    degrees          string
solar_irradiance  Solar radiation                   W/m²             string

Warning

Weather slots require datetime_obj parameter. Data availability depends on location and date. Not all slots may return values for all locations/times.
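As a minimal sketch of the rule above, a caller can check up front whether a slot list needs datetime_obj. The set below is copied by hand from the weather table on this page (the library exposes its own WEATHER_SLOTS constant), and the helper name needs_datetime is hypothetical, not part of the library.

```python
# Weather slot names copied from the table above. This local copy is an
# assumption of the sketch; prefer the library's WEATHER_SLOTS constant.
WEATHER_SLOTS = {
    "temp", "air_temp", "humidity",
    "wind_speed", "wind_direction", "solar_irradiance",
}


def needs_datetime(slots):
    """Return True if any requested slot is a weather slot (hypothetical helper)."""
    return any(slot in WEATHER_SLOTS for slot in slots)


print(needs_datetime(["annual_precpt", "elev"]))  # False
print(needs_datetime(["elev", "wind_speed"]))     # True
```

Running this check before the call makes it easy to fail early with a clear message when a collection date is missing.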

Marine Data (no datetime needed):

Slot Name  Description                            Units   Type
---------  -------------------------------------  ------  ------
depth      Water depth (negative for underwater)  meters  string

Warning

Marine providers (GEBCO, ESA CCI, NOAA) are marked as unreliable in Issue #181. Data quality varies significantly by location.

Soil Data (no datetime needed):

Slot Name  Description              Units     Type
---------  -----------------------  --------  ------
ph         Soil pH                  pH units  float
soil_type  USDA soil texture class  text      string

Warning

SoilGrids provider has intermittent failures (Issue #184). Success rate varies by location.

Not Yet Implemented (❌ Future Work)

These submission-schema slots are not yet supported:

  • cur_vegetation - Current vegetation type (Issue #194)

  • flooding - Flooding history (Issue #192)

  • fire - Fire history

  • extreme_event - Extreme weather events

  • Many others…

See the submission-schema documentation for the complete list of slots.

See also

Issue #193: Add submission-schema extraction helpers for more slots

Parameters

biosample_enricher.environmental_metadata.get_environmental_metadata(lat, lon, slots, datetime_obj=None, providers=None, strategy='mean')

Get NMDC submission-schema values for specified slots.

Parameters:
  • lat (float) – Latitude in decimal degrees (required). Valid range: -90 to 90.

  • lon (float) – Longitude in decimal degrees (required). Valid range: -180 to 180.

  • slots (list[str]) –

    Slot names to retrieve (required). Must be from the supported slots listed above. Cannot be empty.

    Examples:

    # Climate only
    slots=["annual_precpt", "annual_temp"]
    
    # Mix climate and elevation
    slots=["annual_precpt", "elev"]
    
    # Weather (requires datetime_obj)
    slots=["temp", "humidity", "wind_speed"]
    

  • datetime_obj (datetime) –

    Collection date/time (optional). Required for weather slots (temp, air_temp, humidity, etc.); not used for climate, elevation, marine, or soil slots.

    Example:

    from datetime import datetime
    datetime_obj=datetime(2023, 7, 15, 14, 30)
    

  • providers (list[str]) –

    Specific providers to use (optional). If None (default), all available providers are queried.

    Valid providers by slot category:

    • Climate slots: ["meteostat", "nasa_power"]

    • Elevation slots: ["usgs", "google", "open_topo_data", "osm"]

    Examples:

    # Use only meteostat for climate
    providers=["meteostat"]
    
    # Use only USGS for elevation
    providers=["usgs"]
    

  • strategy (str) –

    How to combine values from multiple providers (optional). Default is "mean".

    Valid values (from CONSENSUS_STRATEGIES):

    • "mean": Average across all successful providers (default, most reliable)

    • "median": Middle value when sorted (robust to outliers)

    • "first": Use first successful provider in priority order (fastest)

    • "best_quality": Use provider with best quality metric (closest station, highest resolution)

    See Consensus Strategies for detailed descriptions and usage guidance.

    Example:

    # Use median to handle outliers
    result = get_environmental_metadata(
        lat=46.8523, lon=-121.7603,
        slots=["elev"],
        strategy="median"
    )
    

Returns:

Dictionary with two keys:

  • "values": Dict mapping slot names to submission-ready values

    • Values are in the correct units for submission-schema

    • Slots that failed to retrieve data are omitted (not None)

    • Types match submission-schema requirements (mostly float, some string)

  • "metadata": Dict with provider information for transparency

    • Shows which data sources contributed to each value

    • Includes provider-specific results for comparison

    • Lists any providers that failed with error messages

Return type:

dict[str, Any]

Raises:
  • ValueError – If latitude is outside -90 to 90

  • ValueError – If longitude is outside -180 to 180

  • ValueError – If slots list is empty

  • ValueError – If slots contains unsupported slot names (Error message will list all supported slots)

  • ValueError – If providers contains invalid provider names for requested slots (Error message will list valid providers)
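The documented checks can be mirrored on the caller's side, for example to validate a spreadsheet of samples before making any network requests. The following is only a sketch: check_inputs is a hypothetical helper, the slot set is copied by hand from this page, and the error messages are illustrative rather than the library's exact wording.

```python
# Slot names copied from this page (assumption; the library exposes
# ALL_SUPPORTED_SLOTS for programmatic access).
SUPPORTED_SLOTS = {
    "annual_precpt", "annual_temp", "elev", "temp", "air_temp", "humidity",
    "wind_speed", "wind_direction", "solar_irradiance", "depth", "ph", "soil_type",
}


def check_inputs(lat, lon, slots):
    """Raise ValueError for the input problems listed under Raises (hypothetical helper)."""
    if not -90 <= lat <= 90:
        raise ValueError(f"Latitude {lat} is outside -90 to 90")
    if not -180 <= lon <= 180:
        raise ValueError(f"Longitude {lon} is outside -180 to 180")
    if not slots:
        raise ValueError("slots must not be empty")
    unsupported = sorted(set(slots) - SUPPORTED_SLOTS)
    if unsupported:
        raise ValueError(
            f"Unsupported slot(s): {unsupported}. "
            f"Supported slots: {sorted(SUPPORTED_SLOTS)}"
        )


check_inputs(37.7749, -122.4194, ["annual_precpt", "elev"])  # passes silently
```

Provider validation is left out here because this page lists valid provider names only for the climate and elevation categories.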

Return Value Structure

The function returns a dictionary with this structure:

{
    "values": {
        "annual_precpt": 519.3,      # float: mm/year
        "annual_temp": 14.1,          # float: °C
        "elev": 52.4                  # float: meters
        # Missing/failed slots are omitted
    },
    "metadata": {
        "climate_normals": {          # Only present if climate slots requested
            "providers_used": ["meteostat", "nasa_power"],
            "consensus_strategy": "mean",  # How values were combined
            "provider_results": {
                "meteostat": {
                    "annual_precpt": 453.1,
                    "annual_temp": 14.2,
                    "period": "1991-2020",
                    "station_distance_km": 3.2
                },
                "nasa_power": {
                    "annual_precpt": 585.5,
                    "annual_temp": 14.0,
                    "period": "2001-2020"
                }
            },
            "failed_providers": {}     # Dict of {provider: error_message}
        }
        # "weather", "elevation", "marine", "soil" metadata added as implemented
    }
}

Note

Missing slots are omitted, not set to None. Always check with if "slot_name" in result["values"] before accessing values.
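One way to apply this check to several slots at once is to split the requested slots into those that came back and those that did not. The sketch below works on a hand-written result dictionary shaped like the structure above; split_slots is a hypothetical helper, not a library function.

```python
def split_slots(result, requested):
    """Return (found, missing) for the requested slot names (hypothetical helper)."""
    values = result["values"]
    found = {slot: values[slot] for slot in requested if slot in values}
    missing = [slot for slot in requested if slot not in values]
    return found, missing


# Hand-written stand-in for a real return value: "depth" failed and was omitted.
result = {"values": {"annual_precpt": 519.3}, "metadata": {}}

found, missing = split_slots(result, ["annual_precpt", "depth"])
print(found)    # {'annual_precpt': 519.3}
print(missing)  # ['depth']
```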

Examples

Basic Climate Data

from biosample_enricher.environmental_metadata import get_environmental_metadata

# Get 30-year climate averages for a location
result = get_environmental_metadata(
    lat=42.3601,   # Boston
    lon=-71.0589,
    slots=["annual_precpt", "annual_temp"]
)

# Use the values
precip = result["values"]["annual_precpt"]  # 1090.2 mm/year
temp = result["values"]["annual_temp"]      # 10.8 °C

# Check data quality by comparing providers
providers = result["metadata"]["climate_normals"]["provider_results"]
for name, data in providers.items():
    print(f"{name}: {data['annual_precpt']:.1f} mm/year")
# meteostat: 1089.3 mm/year
# nasa_power: 1091.1 mm/year
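
The provider comparison above can be reduced to a single number per slot: the gap between the highest and lowest provider value. This is a sketch only; provider_spread is a hypothetical helper, and the input uses the illustrative San Francisco figures from the Return Value Structure section rather than live data.

```python
def provider_spread(provider_results, slot):
    """Return max minus min across providers for one slot, or None if fewer than two values."""
    values = [res[slot] for res in provider_results.values() if slot in res]
    if len(values) < 2:
        return None
    return max(values) - min(values)


# Shaped like result["metadata"]["climate_normals"]["provider_results"].
provider_results = {
    "meteostat": {"annual_precpt": 453.1, "annual_temp": 14.2},
    "nasa_power": {"annual_precpt": 585.5, "annual_temp": 14.0},
}

spread = provider_spread(provider_results, "annual_precpt")
print(f"{spread:.1f} mm/year")  # 132.4 mm/year
```

A large spread relative to the combined value is a hint to inspect the individual provider results before submitting.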

Mixing Multiple Slot Types

# Get climate + elevation in one call
result = get_environmental_metadata(
    lat=40.7128,   # New York City
    lon=-74.0060,
    slots=["annual_precpt", "annual_temp", "elev"]
)

values = result["values"]
print(f"Elevation: {values['elev']} m")
print(f"Annual rain: {values['annual_precpt']} mm/year")
print(f"Annual temp: {values['annual_temp']} °C")

Using Specific Providers

# Use only meteostat for climate (not NASA POWER)
result = get_environmental_metadata(
    lat=51.5074,   # London
    lon=-0.1278,
    slots=["annual_precpt", "annual_temp"],
    providers=["meteostat"]  # Only use this provider
)

metadata = result["metadata"]["climate_normals"]
print(metadata["providers_used"])  # ['meteostat']

Weather Data (Requires Datetime)

from datetime import datetime

# Get weather at sample collection time
result = get_environmental_metadata(
    lat=34.0522,   # Los Angeles
    lon=-118.2437,
    slots=["temp", "humidity", "wind_speed"],
    datetime_obj=datetime(2023, 7, 15, 14, 30)  # Required!
)

# Check what data was available
if "temp" in result["values"]:
    print(f"Temperature: {result['values']['temp']} °C")
else:
    print("Temperature data not available for this location/time")

Error Handling

# Handle invalid inputs gracefully
try:
    result = get_environmental_metadata(
        lat=37.7749,
        lon=-122.4194,
        slots=["annual_precpt", "invalid_slot_name"]
    )
except ValueError as e:
    print(f"Error: {e}")
    # Error: Unsupported slot(s): ['invalid_slot_name'].
    # Supported slots: ['air_temp', 'annual_precpt', 'annual_temp', ...]

# Check for missing data
result = get_environmental_metadata(
    lat=37.7749,
    lon=-122.4194,
    slots=["annual_precpt", "depth"]  # depth may not be available on land
)

if "annual_precpt" in result["values"]:
    print(f"Got precipitation: {result['values']['annual_precpt']}")

if "depth" not in result["values"]:
    print("Depth data not available (probably on land)")

Quick Reference

All Constants at a Glance

These constants are available for programmatic access:

from biosample_enricher.environmental_metadata import (
    # Slot categories
    ALL_SUPPORTED_SLOTS,   # All slots combined
    CLIMATE_SLOTS,         # {'annual_precpt', 'annual_temp'}
    WEATHER_SLOTS,         # {'temp', 'air_temp', 'humidity', 'wind_speed', 'wind_direction', 'solar_irradiance'}
    ELEVATION_SLOTS,       # {'elev'}
    MARINE_SLOTS,          # {'depth'}
    SOIL_SLOTS,            # {'ph', 'soil_type'}

    # Provider names
    CLIMATE_PROVIDERS,     # {'meteostat', 'nasa_power'}
    ELEVATION_PROVIDERS,   # {'usgs', 'google', 'open_topo_data', 'osm'}

    # Consensus strategies
    CONSENSUS_STRATEGIES,  # {'mean', 'median', 'first', 'best_quality'}
)

Slots by Category

Category   Slots                                                                   Providers                          Datetime Required?
---------  ----------------------------------------------------------------------  ---------------------------------  ------------------
Climate    annual_precpt, annual_temp                                              meteostat, nasa_power              No
Weather    temp, air_temp, humidity, wind_speed, wind_direction, solar_irradiance  meteostat, open_meteo              Yes
Elevation  elev                                                                    usgs, google, open_topo_data, osm  No
Marine     depth                                                                   gebco, noaa                        No
Soil       ph, soil_type                                                           soilgrids, usda_nrcs               No

Consensus Strategies

When multiple providers return data, values are combined using a consensus strategy:

Strategy      Description                                        When to Use
------------  -------------------------------------------------  --------------------------------
mean          Arithmetic average across all providers (default)  General use, most reliable
median        Middle value when sorted                           When outliers are possible
first         Use first successful provider                      When speed matters
best_quality  Use provider with best quality metric              Advanced use with quality scores
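
To make the strategies concrete, here is a small sketch of how per-provider values could be combined. It is not the library's implementation: combine is a hypothetical function, provider priority is assumed to be the dictionary's insertion order, and best_quality is left out because the quality metric is not specified on this page.

```python
from statistics import mean, median


def combine(values_by_provider, strategy="mean"):
    """Combine one slot's values from several providers (illustrative sketch).

    values_by_provider maps provider name to value, in priority order
    (Python dicts preserve insertion order).
    """
    values = list(values_by_provider.values())
    if not values:
        raise ValueError("no provider returned a value")
    if strategy == "mean":
        return mean(values)
    if strategy == "median":
        return median(values)
    if strategy == "first":
        return values[0]
    raise ValueError(f"strategy not covered by this sketch: {strategy!r}")


precip = {"meteostat": 453.1, "nasa_power": 585.5}
print(round(combine(precip, "mean"), 1))  # 519.3
print(combine(precip, "first"))           # 453.1
```

With only two providers, mean and median give the same result; median starts to differ once a third provider returns an outlier.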

Slot Status (Reliability)

Status        Slots                                                                        Notes
------------  ---------------------------------------------------------------------------  --------------------------------
Ready         annual_precpt, annual_temp, elev                                             Production-ready, reliable data
Caution       temp, ph, depth                                                              May have gaps or provider issues
Experimental  air_temp, humidity, wind_speed, wind_direction, solar_irradiance, soil_type  Limited testing, may change
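
If a pipeline should only request slots at a given reliability level, the table above can be encoded as a lookup. This is a sketch under the assumption that the statuses are maintained by hand; SLOT_STATUS and select_slots are hypothetical names, and the library does not expose this mapping as far as this page shows.

```python
# Status per slot, copied by hand from the table above (assumption of this sketch).
SLOT_STATUS = {
    "annual_precpt": "ready", "annual_temp": "ready", "elev": "ready",
    "temp": "caution", "ph": "caution", "depth": "caution",
    "air_temp": "experimental", "humidity": "experimental",
    "wind_speed": "experimental", "wind_direction": "experimental",
    "solar_irradiance": "experimental", "soil_type": "experimental",
}


def select_slots(requested, allowed_statuses=("ready",)):
    """Keep only the requested slots whose status is allowed, preserving order."""
    return [slot for slot in requested if SLOT_STATUS.get(slot) in allowed_statuses]


print(select_slots(["elev", "depth", "humidity"]))
# ['elev']
print(select_slots(["elev", "depth", "humidity"], ("ready", "caution")))
# ['elev', 'depth']
```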

Copy-Paste Slot Lists

For convenience, here are the slot names ready to copy:

All slots (comma-separated):

annual_precpt, annual_temp, elev, temp, air_temp, humidity, wind_speed, wind_direction, solar_irradiance, depth, ph, soil_type

Production-ready slots only:

annual_precpt, annual_temp, elev

Climate + Elevation (most common):

annual_precpt, annual_temp, elev

Checking Available Slots and Providers

from biosample_enricher.environmental_metadata import (
    ALL_SUPPORTED_SLOTS,
    CLIMATE_SLOTS,
    WEATHER_SLOTS,
    ELEVATION_SLOTS,
    MARINE_SLOTS,
    SOIL_SLOTS,
    CLIMATE_PROVIDERS,
    ELEVATION_PROVIDERS,
    CONSENSUS_STRATEGIES,
)

# See all supported slots
print("All slots:", sorted(ALL_SUPPORTED_SLOTS))

# See slots by category
print("Climate:", sorted(CLIMATE_SLOTS))
print("Weather:", sorted(WEATHER_SLOTS))
print("Elevation:", sorted(ELEVATION_SLOTS))
print("Marine:", sorted(MARINE_SLOTS))
print("Soil:", sorted(SOIL_SLOTS))

# See available providers
print("Climate providers:", sorted(CLIMATE_PROVIDERS))
print("Elevation providers:", sorted(ELEVATION_PROVIDERS))

# See consensus strategies
print("Strategies:", sorted(CONSENSUS_STRATEGIES))

Limitations and Known Issues

Current Limitations

  1. Limited slot coverage: Only 12 of the 200+ submission-schema slots are supported

  2. No bulk operations: Must call once per biosample (no batch processing yet)

  3. Weather data gaps: Historical weather not available for all locations/times

  4. Provider reliability: Some providers have intermittent failures (see issues below)

  5. No caching control: Cannot disable or clear HTTP cache from this function
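
Until batch processing exists, the second limitation can be worked around with a plain loop. The sketch below is one possible shape, not a library feature: enrich_many is a hypothetical helper, the sample format (ID mapped to a lat/lon pair) is an assumption, and fake_fetch stands in for get_environmental_metadata so the example runs without network access.

```python
def enrich_many(samples, fetch, slots):
    """Call fetch once per sample; collect values and per-sample errors separately."""
    values, errors = {}, {}
    for sample_id, (lat, lon) in samples.items():
        try:
            values[sample_id] = fetch(lat=lat, lon=lon, slots=slots)["values"]
        except ValueError as exc:
            # One bad sample should not abort the whole run.
            errors[sample_id] = str(exc)
    return values, errors


def fake_fetch(lat, lon, slots):
    """Offline stand-in with the same keyword arguments and return shape."""
    if not -90 <= lat <= 90:
        raise ValueError("bad latitude")
    return {"values": {slot: 0.0 for slot in slots}, "metadata": {}}


samples = {"s1": (37.7749, -122.4194), "s2": (123.0, 0.0)}
values, errors = enrich_many(samples, fake_fetch, ["elev"])
print(values)  # {'s1': {'elev': 0.0}}
print(errors)  # {'s2': 'bad latitude'}
```

In real use, get_environmental_metadata would be passed as the fetch argument. Keeping it a parameter also makes the loop easy to test.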

Known Issues

Warning

Provider Reliability Issues - Please review these before relying on data:

  • Issue #181: Marine providers (GEBCO, ESA CCI, NOAA) incomplete/unreliable

  • Issue #182: MODIS vegetation provider uses mock data only

  • Issue #183: USGS elevation provider unreliable (marked flaky)

  • Issue #184: SoilGrids provider intermittent failures (marked flaky)

Climate and elevation data are generally reliable. Marine and soil data quality varies.

Future Development

These features are planned but not yet implemented:

  • More submission-schema slots (Issue #193)

  • Vegetation data from land cover (Issue #194)

  • Flooding history (Issue #192)

  • Batch processing for multiple biosamples

  • Quality scores and confidence intervals

  • Custom provider selection strategies beyond “consensus”
