Environmental Metadata

Get environmental metadata for geographic coordinates.

What It Does

Retrieves values for environmental slots such as annual_precpt, annual_temp, elev, and depth.

Give it GPS coordinates and get back values in submission-schema units, together with provenance metadata. The output is compatible with the NMDC submission-schema and can be used by other applications.

Quick Example

from biosample_enricher.environmental_metadata import get_environmental_metadata

# Get climate data for San Francisco
result = get_environmental_metadata(
    lat=37.7749,
    lon=-122.4194,
    slots=["annual_precpt", "annual_temp"]
)

# Use the values in your NMDC submission
print(result["values"])
# {'annual_precpt': 519.3, 'annual_temp': 14.1}

# Check which data sources were used
print(result["metadata"]["climate_normals"]["providers_used"])
# ['meteostat', 'nasa_power']

What Values Can You Get?

Currently Supported (✅ Ready to Use)

Climate Data (30-year averages, no datetime needed):

Slot Name	Description	Units	Type
`annual_precpt`	Annual precipitation (30-year average)	millimeters/year	float
`annual_temp`	Annual temperature (30-year average)	degrees Celsius	float

Providers: meteostat, nasa_power (automatically queries both and averages results)

Elevation Data (no datetime needed):

Slot Name	Description	Units	Type
`elev`	Elevation above sea level	meters	float

Providers: USGS, Google Maps, Open Topo Data, OSM (tries multiple sources)

Partially Implemented (⚠️ Use with Caution)

Weather Data (requires datetime - collection date/time):

Slot Name	Description	Units	Type
`temp`	Temperature at collection time	degrees Celsius	float
`air_temp`	Air temperature (alias for temp)	degrees Celsius	float
`humidity`	Relative humidity	g/m³	string
`wind_speed`	Wind speed	m/s	string
`wind_direction`	Wind direction	degrees	string
`solar_irradiance`	Solar radiation	W/m²	string

Warning

Weather slots require the datetime_obj parameter. Data availability depends on location and date, so some slots may return no value for a given location or time.
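
A caller can check up front whether a request needs a collection date/time. The following is a minimal sketch, not library code: the slot names are copied from the table above rather than imported, and `needs_datetime` is a hypothetical helper.

```python
# Weather slot names copied from the table above (an assumption of this
# sketch; the library exposes its own WEATHER_SLOTS constant).
WEATHER_SLOT_NAMES = {
    "temp", "air_temp", "humidity",
    "wind_speed", "wind_direction", "solar_irradiance",
}


def needs_datetime(slots):
    """Return the requested slots that require a collection date/time, sorted."""
    return sorted(set(slots) & WEATHER_SLOT_NAMES)


print(needs_datetime(["annual_precpt", "temp", "humidity"]))  # ['humidity', 'temp']
print(needs_datetime(["annual_precpt", "elev"]))              # []
```

A non-empty result means datetime_obj should be supplied with the request.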

Marine Data (no datetime needed):

Slot Name	Description	Units	Type
`depth`	Water depth (negative for underwater)	meters	string

Warning

Marine providers (GEBCO, ESA CCI, NOAA) are marked as unreliable in Issue #181. Data quality varies significantly by location.

Soil Data (no datetime needed):

Slot Name	Description	Units	Type
`ph`	Soil pH	pH units	float
`soil_type`	USDA soil texture class	text	string

Warning

SoilGrids provider has intermittent failures (Issue #184). Success rate varies by location.

Not Yet Implemented (❌ Future Work)

These submission-schema slots are not yet supported:

cur_vegetation - Current vegetation type (Issue #194)
flooding - Flooding history (Issue #192)
fire - Fire history
extreme_event - Extreme weather events
Many others…

See the submission-schema documentation for the complete list of slots.

Parameters

biosample_enricher.environmental_metadata.get_environmental_metadata(lat, lon, slots, datetime_obj=None, providers=None, strategy='mean')

Get NMDC submission-schema values for specified slots.

Parameters:

lat (float) – Latitude in decimal degrees (required). Valid range: -90 to 90.
lon (float) – Longitude in decimal degrees (required). Valid range: -180 to 180.

slots (list[str]) –

Slot names to retrieve (required). Must be from the supported slots listed above. Cannot be empty.

Examples:

# Climate only
slots=["annual_precpt", "annual_temp"]

# Mix climate and elevation
slots=["annual_precpt", "elev"]

# Weather (requires datetime_obj)
slots=["temp", "humidity", "wind_speed"]

datetime_obj (datetime) –
Collection date/time (optional). Required for weather slots (temp, air_temp, humidity, etc.). Not used for climate, elevation, marine, or soil slots.

Example:
```
from datetime import datetime
datetime_obj=datetime(2023, 7, 15, 14, 30)
```
providers (list[str]) –
Specific providers to use (optional). If None (default), queries all available providers.

Valid providers by slot category:
- Climate slots: ["meteostat", "nasa_power"]
- Elevation slots: ["usgs", "google", "open_topo_data", "osm"]
Examples:
```
# Use only meteostat for climate
providers=["meteostat"]

# Use only USGS for elevation
providers=["usgs"]
```
strategy (str) –
How to combine values from multiple providers (optional). Default is "mean".

Valid values (from CONSENSUS_STRATEGIES):
- "mean": Average across all successful providers (default, most reliable)
- "median": Middle value when sorted (robust to outliers)
- "first": Use first successful provider in priority order (fastest)
- "best_quality": Use provider with best quality metric (closest station, highest resolution)
See Consensus Strategies for detailed descriptions and usage guidance.

Example:
```
# Use median to handle outliers
result = get_environmental_metadata(
    lat=46.8523, lon=-121.7603,
    slots=["elev"],
    strategy="median"
)
```

Returns:

Dictionary with two keys:

"values": Dict mapping slot names to submission-ready values
- Values are in the correct units for submission-schema
- Slots for which no data could be retrieved are omitted (not set to None)
- Types match submission-schema requirements (mostly float, some string)
"metadata": Dict with provider information for transparency
- Shows which data sources contributed to each value
- Includes provider-specific results for comparison
- Lists any providers that failed with error messages

Return type:

dict[str, Any]

Raises:

ValueError – If latitude is outside -90 to 90
ValueError – If longitude is outside -180 to 180
ValueError – If slots list is empty
ValueError – If slots contains unsupported slot names (Error message will list all supported slots)
ValueError – If providers contains invalid provider names for requested slots (Error message will list valid providers)
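
The input checks listed above can be mirrored on the caller's side, for example to reject bad rows before making any network request. This is a minimal sketch under stated assumptions: `check_request` is a hypothetical helper, its error messages are illustrative and not the library's wording, and the slot names are copied from this page.

```python
# Slot names copied from this page (assumption: the library's own
# ALL_SUPPORTED_SLOTS constant is the authoritative list).
SUPPORTED = {
    "annual_precpt", "annual_temp", "elev", "temp", "air_temp", "humidity",
    "wind_speed", "wind_direction", "solar_irradiance", "depth", "ph",
    "soil_type",
}


def check_request(lat, lon, slots):
    """Raise ValueError for the input problems listed above; return None if OK."""
    if not -90 <= lat <= 90:
        raise ValueError(f"Latitude {lat} is outside -90 to 90")
    if not -180 <= lon <= 180:
        raise ValueError(f"Longitude {lon} is outside -180 to 180")
    if not slots:
        raise ValueError("slots must not be empty")
    unknown = sorted(set(slots) - SUPPORTED)
    if unknown:
        raise ValueError(f"Unsupported slot(s): {unknown}")


# Try one valid request and two invalid ones.
for args in [(37.7749, -122.4194, ["elev"]), (95.0, 0.0, ["elev"]), (0.0, 0.0, [])]:
    try:
        check_request(*args)
        print(args, "-> ok")
    except ValueError as exc:
        print(args, "->", exc)
```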

Return Value Structure

The function returns a dictionary with this structure:

{
    "values": {
        "annual_precpt": 519.3,      # float: mm/year
        "annual_temp": 14.1,          # float: °C
        "elev": 52.4                  # float: meters
        # Missing/failed slots are omitted
    },
    "metadata": {
        "climate_normals": {          # Only present if climate slots requested
            "providers_used": ["meteostat", "nasa_power"],
            "consensus_strategy": "mean",       # How values were combined
            "provider_results": {
                "meteostat": {
                    "annual_precpt": 453.1,
                    "annual_temp": 14.2,
                    "period": "1991-2020",
                    "station_distance_km": 3.2
                },
                "nasa_power": {
                    "annual_precpt": 585.5,
                    "annual_temp": 14.0,
                    "period": "2001-2020"
                }
            },
            "failed_providers": {}     # Dict of {provider: error_message}
        }
        # "weather", "elevation", "marine", "soil" metadata added as implemented
    }
}

Note

Missing slots are omitted, not set to None. Always check with if "slot_name" in result["values"] before accessing values.
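
One way to follow this advice is to split the requested slots into those that came back and those that were omitted. The sketch below assumes only the documented return shape; `split_found_missing` is a hypothetical helper and `sample_result` is a hand-written stand-in for a real return value.

```python
def split_found_missing(requested, result):
    """Split requested slot names into those returned and those omitted."""
    values = result["values"]
    found = {slot: values[slot] for slot in requested if slot in values}
    missing = [slot for slot in requested if slot not in values]
    return found, missing


# Hand-written stand-in for a real return value (illustrative numbers only).
sample_result = {"values": {"annual_precpt": 519.3}, "metadata": {}}

found, missing = split_found_missing(["annual_precpt", "depth"], sample_result)
print(found)    # {'annual_precpt': 519.3}
print(missing)  # ['depth']
```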

Examples

Basic Climate Data

from biosample_enricher.environmental_metadata import get_environmental_metadata

# Get 30-year climate averages for a location
result = get_environmental_metadata(
    lat=42.3601,   # Boston
    lon=-71.0589,
    slots=["annual_precpt", "annual_temp"]
)

# Use the values
precip = result["values"]["annual_precpt"]  # 1090.2 mm/year
temp = result["values"]["annual_temp"]      # 10.8 °C

# Check data quality by comparing providers
providers = result["metadata"]["climate_normals"]["provider_results"]
for name, data in providers.items():
    print(f"{name}: {data['annual_precpt']:.1f} mm/year")
# meteostat: 1089.3 mm/year
# nasa_power: 1091.1 mm/year
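
The provider comparison above can be turned into a single number, such as the relative spread between providers. This is a sketch rather than a library feature: `relative_spread` is a hypothetical helper, and the input dictionary is written by hand in the shape of provider_results, using the numbers from the Return Value Structure section.

```python
def relative_spread(provider_results, slot):
    """Return (max - min) / |mean| for one slot across providers, or None."""
    readings = [r[slot] for r in provider_results.values() if slot in r]
    if len(readings) < 2:
        return None  # nothing to compare
    mean = sum(readings) / len(readings)
    if mean == 0:
        return None  # relative spread is undefined around zero
    return (max(readings) - min(readings)) / abs(mean)


# Shaped like result["metadata"]["climate_normals"]["provider_results"].
provider_results = {
    "meteostat": {"annual_precpt": 453.1, "annual_temp": 14.2},
    "nasa_power": {"annual_precpt": 585.5, "annual_temp": 14.0},
}
spread = relative_spread(provider_results, "annual_precpt")
print(f"annual_precpt spread: {spread:.1%}")
```

A large spread suggests the providers disagree for that location, and the combined value deserves a closer look before submission.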

Mixing Multiple Slot Types

# Get climate + elevation in one call
result = get_environmental_metadata(
    lat=40.7128,   # New York City
    lon=-74.0060,
    slots=["annual_precpt", "annual_temp", "elev"]
)

values = result["values"]
print(f"Elevation: {values['elev']} m")
print(f"Annual rain: {values['annual_precpt']} mm/year")
print(f"Annual temp: {values['annual_temp']} °C")

Using Specific Providers

# Use only meteostat for climate (not NASA POWER)
result = get_environmental_metadata(
    lat=51.5074,   # London
    lon=-0.1278,
    slots=["annual_precpt", "annual_temp"],
    providers=["meteostat"]  # Only use this provider
)

metadata = result["metadata"]["climate_normals"]
print(metadata["providers_used"])  # ['meteostat']

Weather Data (Requires Datetime)

from datetime import datetime

# Get weather at sample collection time
result = get_environmental_metadata(
    lat=34.0522,   # Los Angeles
    lon=-118.2437,
    slots=["temp", "humidity", "wind_speed"],
    datetime_obj=datetime(2023, 7, 15, 14, 30)  # Required!
)

# Check what data was available
if "temp" in result["values"]:
    print(f"Temperature: {result['values']['temp']} °C")
else:
    print("Temperature data not available for this location/time")

Error Handling

# Handle invalid inputs gracefully
try:
    result = get_environmental_metadata(
        lat=37.7749,
        lon=-122.4194,
        slots=["annual_precpt", "invalid_slot_name"]
    )
except ValueError as e:
    print(f"Error: {e}")
    # Error: Unsupported slot(s): ['invalid_slot_name'].
    # Supported slots: ['air_temp', 'annual_precpt', 'annual_temp', ...]

# Check for missing data
result = get_environmental_metadata(
    lat=37.7749,
    lon=-122.4194,
    slots=["annual_precpt", "depth"]  # depth may not be available on land
)

if "annual_precpt" in result["values"]:
    print(f"Got precipitation: {result['values']['annual_precpt']}")

if "depth" not in result["values"]:
    print("Depth data not available (probably on land)")

Quick Reference

All Constants at a Glance

These constants are available for programmatic access:

from biosample_enricher.environmental_metadata import (
    # Slot categories
    ALL_SUPPORTED_SLOTS,   # All slots combined
    CLIMATE_SLOTS,         # {'annual_precpt', 'annual_temp'}
    WEATHER_SLOTS,         # {'temp', 'air_temp', 'humidity', 'wind_speed', 'wind_direction', 'solar_irradiance'}
    ELEVATION_SLOTS,       # {'elev'}
    MARINE_SLOTS,          # {'depth'}
    SOIL_SLOTS,            # {'ph', 'soil_type'}

    # Provider names
    CLIMATE_PROVIDERS,     # {'meteostat', 'nasa_power'}
    ELEVATION_PROVIDERS,   # {'usgs', 'google', 'open_topo_data', 'osm'}

    # Consensus strategies
    CONSENSUS_STRATEGIES,  # {'mean', 'median', 'first', 'best_quality'}
)

Slots by Category

Category	Slots	Providers	Datetime Required?
Climate	`annual_precpt`, `annual_temp`	meteostat, nasa_power	No
Weather	`temp`, `air_temp`, `humidity`, `wind_speed`, `wind_direction`, `solar_irradiance`	meteostat, open_meteo	Yes
Elevation	`elev`	usgs, google, open_topo_data, osm	No
Marine	`depth`	gebco, noaa	No
Soil	`ph`, `soil_type`	soilgrids, usda_nrcs	No

Consensus Strategies

When multiple providers return data, values are combined using a consensus strategy:

Strategy	Description	When to Use
`mean`	Arithmetic average across all providers (default)	General use, most reliable
`median`	Middle value when sorted	When outliers are possible
`first`	Use first successful provider	When speed matters
`best_quality`	Use provider with best quality metric	Advanced use with quality scores
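
To make the difference between strategies concrete, here is a small standalone sketch of three of them. It is not the library's implementation: `combine` is a hypothetical function, the readings are made up, and best_quality is left out because this page does not define the quality metric precisely.

```python
from statistics import mean, median


def combine(readings, strategy="mean"):
    """Combine (provider, value) pairs, given in priority order, into one value.

    `readings` should contain only providers that returned a value.
    """
    if not readings:
        raise ValueError("no provider returned a value")
    values = [value for _, value in readings]
    if strategy == "mean":
        return mean(values)
    if strategy == "median":
        return median(values)
    if strategy == "first":
        return values[0]
    raise ValueError(f"strategy not covered by this sketch: {strategy!r}")


# Made-up elevation readings in meters; the third is a deliberate outlier.
readings = [("usgs", 52.0), ("open_topo_data", 53.0), ("osm", 90.0)]
for name in ("mean", "median", "first"):
    print(name, combine(readings, name))
```

With one outlier among three readings, the mean is pulled toward it while the median is not, which is why median suits cases where outliers are possible.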

Slot Status (Reliability)

Status	Slots	Notes
Ready	`annual_precpt`, `annual_temp`, `elev`	Production-ready, reliable data
Caution	`temp`, `ph`, `depth`	May have gaps or provider issues
Experimental	`air_temp`, `humidity`, `wind_speed`, `wind_direction`, `solar_irradiance`, `soil_type`	Limited testing, may change
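
The status table can also be applied in code, for instance to flag which slots in a request are not production-ready. The mapping below is copied from the table as a snapshot and is an assumption of this sketch; `group_by_status` is a hypothetical helper, not part of the library.

```python
# Status groups copied from the table above (assumption: they may change
# between releases, so treat this mapping as a snapshot).
SLOT_STATUS = {
    "ready": {"annual_precpt", "annual_temp", "elev"},
    "caution": {"temp", "ph", "depth"},
    "experimental": {
        "air_temp", "humidity", "wind_speed",
        "wind_direction", "solar_irradiance", "soil_type",
    },
}


def group_by_status(slots):
    """Map each status to the requested slots that fall under it, sorted."""
    grouped = {}
    for status, members in SLOT_STATUS.items():
        hits = sorted(set(slots) & members)
        if hits:
            grouped[status] = hits
    return grouped


print(group_by_status(["elev", "ph", "humidity", "annual_temp"]))
```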

Copy-Paste Slot Lists

For convenience, here are the slot names ready to copy:

All slots (comma-separated):

annual_precpt, annual_temp, elev, temp, air_temp, humidity, wind_speed, wind_direction, solar_irradiance, depth, ph, soil_type

Production-ready slots only:

annual_precpt, annual_temp, elev

Climate + Elevation (most common):

annual_precpt, annual_temp, elev

Checking Available Slots and Providers

from biosample_enricher.environmental_metadata import (
    ALL_SUPPORTED_SLOTS,
    CLIMATE_SLOTS,
    WEATHER_SLOTS,
    ELEVATION_SLOTS,
    MARINE_SLOTS,
    SOIL_SLOTS,
    CLIMATE_PROVIDERS,
    ELEVATION_PROVIDERS,
    CONSENSUS_STRATEGIES,
)

# See all supported slots
print("All slots:", sorted(ALL_SUPPORTED_SLOTS))

# See slots by category
print("Climate:", sorted(CLIMATE_SLOTS))
print("Weather:", sorted(WEATHER_SLOTS))
print("Elevation:", sorted(ELEVATION_SLOTS))
print("Marine:", sorted(MARINE_SLOTS))
print("Soil:", sorted(SOIL_SLOTS))

# See available providers
print("Climate providers:", sorted(CLIMATE_PROVIDERS))
print("Elevation providers:", sorted(ELEVATION_PROVIDERS))

# See consensus strategies
print("Strategies:", sorted(CONSENSUS_STRATEGIES))

Limitations and Known Issues

Current Limitations

Limited slot coverage: Only 12 of the 200+ submission-schema slots are supported
No bulk operations: Must call once per biosample (no batch processing yet)
Weather data gaps: Historical weather not available for all locations/times
Provider reliability: Some providers have intermittent failures (see issues below)
No caching control: Cannot disable or clear HTTP cache from this function
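
Until batch processing exists, a caller can loop over biosamples and collect per-sample failures instead of stopping at the first one. The sketch below is an assumption-laden illustration: `enrich_many` and `fake_fetch` are hypothetical names, the sample IDs are invented, and the fetch function is passed in so the example runs without network access.

```python
def enrich_many(samples, fetch, slots):
    """Call `fetch` once per sample; collect results and per-sample errors.

    `samples` maps a sample ID to a (lat, lon) pair. `fetch` is any callable
    that accepts the keyword arguments lat, lon and slots.
    """
    results, errors = {}, {}
    for sample_id, (lat, lon) in samples.items():
        try:
            results[sample_id] = fetch(lat=lat, lon=lon, slots=slots)
        except ValueError as exc:  # invalid input for this sample
            errors[sample_id] = str(exc)
    return results, errors


def fake_fetch(lat, lon, slots):
    """Stand-in for the real function so the sketch runs offline."""
    if not -90 <= lat <= 90:
        raise ValueError("latitude out of range")
    return {"values": {slot: 0.0 for slot in slots}, "metadata": {}}


samples = {"s1": (37.7749, -122.4194), "s2": (123.0, 0.0)}
results, errors = enrich_many(samples, fake_fetch, ["elev"])
print(sorted(results))  # ['s1']
print(errors)           # {'s2': 'latitude out of range'}
```

In real use, get_environmental_metadata would be passed as `fetch`, since it accepts the same lat, lon and slots keyword arguments.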

Known Issues

Warning

Provider Reliability Issues - Please review these before relying on data:

Issue #181: Marine providers (GEBCO, ESA CCI, NOAA) incomplete/unreliable
Issue #182: MODIS vegetation provider uses mock data only
Issue #183: USGS elevation provider unreliable (marked flaky)
Issue #184: SoilGrids provider intermittent failures (marked flaky)

Climate and elevation data are generally reliable. Marine and soil data quality varies.

Future Development

These features are planned but not yet implemented:

More submission-schema slots (Issue #193)
Vegetation data from land cover (Issue #194)
Flooding history (Issue #192)
Batch processing for multiple biosamples
Quality scores and confidence intervals
Custom provider selection strategies beyond the built-in consensus strategies
