Environmental Metadata
Get environmental metadata for geographic coordinates.
What It Does
Get environmental metadata values for slots like annual_precpt, annual_temp, elev, depth, etc.
Give it GPS coordinates → Get back values with units and provenance. Compatible with NMDC submission-schema and other applications.
Quick Example
from biosample_enricher.environmental_metadata import get_environmental_metadata
# Get climate data for San Francisco
result = get_environmental_metadata(
    lat=37.7749,
    lon=-122.4194,
    slots=["annual_precpt", "annual_temp"]
)
# Use the values in your NMDC submission
print(result["values"])
# {'annual_precpt': 519.3, 'annual_temp': 14.1}
# Check which data sources were used
print(result["metadata"]["climate_normals"]["providers_used"])
# ['meteostat', 'nasa_power']
What Values Can You Get?
Currently Supported (✅ Ready to Use)
Climate Data (30-year averages, no datetime needed):
| Slot Name | Description | Units | Type |
|---|---|---|---|
| annual_precpt | Annual precipitation (30-year average) | millimeters/year | float |
| annual_temp | Annual temperature (30-year average) | degrees Celsius | float |
Providers: meteostat, nasa_power (automatically queries both and averages results)
Elevation Data (no datetime needed):
| Slot Name | Description | Units | Type |
|---|---|---|---|
| elev | Elevation above sea level | meters | float |
Providers: USGS, Google Maps, Open Topo Data, OSM (tries multiple sources)
Partially Implemented (⚠️ Use with Caution)
Weather Data (requires datetime - collection date/time):
| Slot Name | Description | Units | Type |
|---|---|---|---|
| temp | Temperature at collection time | degrees Celsius | float |
| air_temp | Air temperature (alias for temp) | degrees Celsius | float |
| humidity | Relative humidity | g/m³ | string |
| wind_speed | Wind speed | m/s | string |
| wind_direction | Wind direction | degrees | string |
| solar_irradiance | Solar radiation | W/m² | string |
Warning
Weather slots require datetime_obj parameter. Data availability depends on location and date.
Not all slots may return values for all locations/times.
Marine Data (no datetime needed):
| Slot Name | Description | Units | Type |
|---|---|---|---|
| depth | Water depth (negative for underwater) | meters | string |
Warning
Marine providers (GEBCO, ESA CCI, NOAA) are marked as unreliable in Issue #181. Data quality varies significantly by location.
Soil Data (no datetime needed):
| Slot Name | Description | Units | Type |
|---|---|---|---|
| ph | Soil pH | pH units | float |
| soil_type | USDA soil texture class | text | string |
Warning
SoilGrids provider has intermittent failures (Issue #184). Success rate varies by location.
Not Yet Implemented (❌ Future Work)
These submission-schema slots are not yet supported:
- cur_vegetation: Current vegetation type (Issue #194)
- flooding: Flooding history (Issue #192)
- fire: Fire history
- extreme_event: Extreme weather events
- Many others…
See the submission-schema documentation for the complete list of slots.
See also
Issue #193: Add submission-schema extraction helpers for more slots
Parameters
- biosample_enricher.environmental_metadata.get_environmental_metadata(lat, lon, slots, datetime_obj=None, providers=None, strategy='mean')
Get NMDC submission-schema values for specified slots.
- Parameters:
lat (float) – Latitude in decimal degrees (required). Valid range: -90 to 90.
lon (float) – Longitude in decimal degrees (required). Valid range: -180 to 180.
slots – Slot names to retrieve (required). Must be from the supported slots listed above. Cannot be empty.
Examples:
# Climate only
slots=["annual_precpt", "annual_temp"]
# Mix climate and elevation
slots=["annual_precpt", "elev"]
# Weather (requires datetime_obj)
slots=["temp", "humidity", "wind_speed"]
datetime_obj (datetime) – Collection date/time (optional). Required for weather slots (temp, air_temp, humidity, etc.). Not used for climate, elevation, marine, or soil slots.
Example:
from datetime import datetime
datetime_obj=datetime(2023, 7, 15, 14, 30)
providers – Specific providers to use (optional). If None (default), queries all available providers.
Valid providers by slot category:
Climate slots: ["meteostat", "nasa_power"]
Elevation slots: ["usgs", "google", "open_topo_data", "osm"]
Examples:
# Use only meteostat for climate
providers=["meteostat"]
# Use only USGS for elevation
providers=["usgs"]
strategy (str) – How to combine values from multiple providers (optional). Default is "mean".
Valid values (from CONSENSUS_STRATEGIES):
- "mean": Average across all successful providers (default, most reliable)
- "median": Middle value when sorted (robust to outliers)
- "first": Use first successful provider in priority order (fastest)
- "best_quality": Use provider with best quality metric (closest station, highest resolution)
See Consensus Strategies for detailed descriptions and usage guidance.
Example:
# Use median to handle outliers
result = get_environmental_metadata(
    lat=46.8523,
    lon=-121.7603,
    slots=["elev"],
    strategy="median"
)
- Returns:
Dictionary with two keys:
"values": Dict mapping slot names to submission-ready values
- Values are in the correct units for submission-schema
- Slots that failed to retrieve data are omitted (not None)
- Types match submission-schema requirements (mostly float, some string)
"metadata": Dict with provider information for transparency
- Shows which data sources contributed to each value
- Includes provider-specific results for comparison
- Lists any providers that failed with error messages
- Return type: dict
- Raises:
ValueError – If latitude is outside -90 to 90
ValueError – If longitude is outside -180 to 180
ValueError – If slots list is empty
ValueError – If slots contains unsupported slot names (Error message will list all supported slots)
ValueError – If providers contains invalid provider names for requested slots (Error message will list valid providers)
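The documented error conditions can be checked before making a call, for example when screening a spreadsheet of coordinates. The following is a minimal sketch, not the library's own validation: `check_request` and `SUPPORTED_SLOTS` are hypothetical names, the slot set is copied from this page, and provider-name validation is left out.

```python
# Illustrative only: slot names are copied from this page, and the real
# checks inside biosample_enricher may differ in wording and order.
SUPPORTED_SLOTS = {
    "annual_precpt", "annual_temp", "elev", "temp", "air_temp", "humidity",
    "wind_speed", "wind_direction", "solar_irradiance", "depth", "ph", "soil_type",
}


def check_request(lat, lon, slots):
    """Return a list of problems; an empty list means these checks all passed."""
    problems = []
    if not -90 <= lat <= 90:
        problems.append(f"latitude {lat} is outside -90 to 90")
    if not -180 <= lon <= 180:
        problems.append(f"longitude {lon} is outside -180 to 180")
    if not slots:
        problems.append("slots list is empty")
    unsupported = sorted(set(slots) - SUPPORTED_SLOTS)
    if unsupported:
        problems.append(f"unsupported slot(s): {unsupported}")
    return problems


print(check_request(37.7749, -122.4194, ["annual_precpt", "invalid_slot_name"]))
```

Collecting every problem in a list, rather than stopping at the first, makes it easier to report all issues for a sample at once.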
Return Value Structure
The function returns a dictionary with this structure:
{
    "values": {
        "annual_precpt": 519.3,  # float: mm/year
        "annual_temp": 14.1,     # float: °C
        "elev": 52.4             # float: meters
        # Missing/failed slots are omitted
    },
    "metadata": {
        "climate_normals": {  # Only present if climate slots requested
            "providers_used": ["meteostat", "nasa_power"],
            "consensus_strategy": "consensus",  # How values were combined
            "provider_results": {
                "meteostat": {
                    "annual_precpt": 453.1,
                    "annual_temp": 14.2,
                    "period": "1991-2020",
                    "station_distance_km": 3.2
                },
                "nasa_power": {
                    "annual_precpt": 585.5,
                    "annual_temp": 14.0,
                    "period": "2001-2020"
                }
            },
            "failed_providers": {}  # Dict of {provider: error_message}
        }
        # "weather", "elevation", "marine", "soil" metadata added as implemented
    }
}
Note
Missing slots are omitted, not set to None. Always check with if "slot_name" in result["values"]
before accessing values.
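Because failed slots are simply absent, it can help to separate what came back from what did not in one step. The sketch below assumes only the documented return shape; `pick_values` is a hypothetical helper, and `sample_result` is a hand-written stand-in rather than real output.

```python
def pick_values(result, wanted):
    """Split requested slot names into found values and missing names."""
    values = result.get("values", {})
    found = {slot: values[slot] for slot in wanted if slot in values}
    missing = [slot for slot in wanted if slot not in values]
    return found, missing


# Hand-written stand-in for a real return value, shaped as documented above.
sample_result = {
    "values": {"annual_precpt": 519.3, "annual_temp": 14.1},
    "metadata": {},
}

found, missing = pick_values(sample_result, ["annual_precpt", "depth"])
print(found)    # {'annual_precpt': 519.3}
print(missing)  # ['depth']
```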
Examples
Basic Climate Data
from biosample_enricher.environmental_metadata import get_environmental_metadata
# Get 30-year climate averages for a location
result = get_environmental_metadata(
    lat=42.3601,  # Boston
    lon=-71.0589,
    slots=["annual_precpt", "annual_temp"]
)
# Use the values
precip = result["values"]["annual_precpt"]  # 1090.2 mm/year
temp = result["values"]["annual_temp"]  # 10.8 °C
# Check data quality by comparing providers
providers = result["metadata"]["climate_normals"]["provider_results"]
for name, data in providers.items():
    print(f"{name}: {data['annual_precpt']:.1f} mm/year")
# meteostat: 1089.3 mm/year
# nasa_power: 1091.1 mm/year
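Provider disagreement can also be reduced to a single number per slot. The following sketch computes the range and the relative spread (range divided by the mean) from a provider_results dictionary. `provider_spread` is a hypothetical helper, and the input reuses the figures from the Return Value Structure example rather than live data.

```python
def provider_spread(provider_results, slot):
    """Return (lowest, highest, relative spread) for one slot across providers."""
    readings = [r[slot] for r in provider_results.values() if slot in r]
    if not readings:
        return None
    low, high = min(readings), max(readings)
    center = sum(readings) / len(readings)
    # Relative spread is undefined when the mean is zero.
    relative = (high - low) / abs(center) if center else None
    return low, high, relative


# Figures taken from the Return Value Structure example on this page.
provider_results = {
    "meteostat": {"annual_precpt": 453.1, "annual_temp": 14.2},
    "nasa_power": {"annual_precpt": 585.5, "annual_temp": 14.0},
}

low, high, relative = provider_spread(provider_results, "annual_precpt")
print(f"{low} to {high} mm/year, relative spread {relative:.1%}")
```

A large relative spread is a hint to inspect the individual provider values before submitting the combined figure.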
Mixing Multiple Slot Types
# Get climate + elevation in one call
result = get_environmental_metadata(
    lat=40.7128,  # New York City
    lon=-74.0060,
    slots=["annual_precpt", "annual_temp", "elev"]
)
values = result["values"]
print(f"Elevation: {values['elev']} m")
print(f"Annual rain: {values['annual_precpt']} mm/year")
print(f"Annual temp: {values['annual_temp']} °C")
Using Specific Providers
# Use only meteostat for climate (not NASA POWER)
result = get_environmental_metadata(
    lat=51.5074,  # London
    lon=-0.1278,
    slots=["annual_precpt", "annual_temp"],
    providers=["meteostat"]  # Only use this provider
)
metadata = result["metadata"]["climate_normals"]
print(metadata["providers_used"]) # ['meteostat']
Weather Data (Requires Datetime)
from datetime import datetime
# Get weather at sample collection time
result = get_environmental_metadata(
    lat=34.0522,  # Los Angeles
    lon=-118.2437,
    slots=["temp", "humidity", "wind_speed"],
    datetime_obj=datetime(2023, 7, 15, 14, 30)  # Required!
)
# Check what data was available
if "temp" in result["values"]:
    print(f"Temperature: {result['values']['temp']} °C")
else:
    print("Temperature data not available for this location/time")
Error Handling
# Handle invalid inputs gracefully
try:
    result = get_environmental_metadata(
        lat=37.7749,
        lon=-122.4194,
        slots=["annual_precpt", "invalid_slot_name"]
    )
except ValueError as e:
    print(f"Error: {e}")
    # Error: Unsupported slot(s): ['invalid_slot_name'].
    # Supported slots: ['air_temp', 'annual_precpt', 'annual_temp', ...]
# Check for missing data
result = get_environmental_metadata(
    lat=37.7749,
    lon=-122.4194,
    slots=["annual_precpt", "depth"]  # depth may not be available on land
)
if "annual_precpt" in result["values"]:
    print(f"Got precipitation: {result['values']['annual_precpt']}")
if "depth" not in result["values"]:
    print("Depth data not available (probably on land)")
Quick Reference
All Constants at a Glance
These constants are available for programmatic access:
from biosample_enricher.environmental_metadata import (
# Slot categories
ALL_SUPPORTED_SLOTS, # All slots combined
CLIMATE_SLOTS, # {'annual_precpt', 'annual_temp'}
WEATHER_SLOTS, # {'temp', 'air_temp', 'humidity', 'wind_speed', 'wind_direction', 'solar_irradiance'}
ELEVATION_SLOTS, # {'elev'}
MARINE_SLOTS, # {'depth'}
SOIL_SLOTS, # {'ph', 'soil_type'}
# Provider names
CLIMATE_PROVIDERS, # {'meteostat', 'nasa_power'}
ELEVATION_PROVIDERS, # {'usgs', 'google', 'open_topo_data', 'osm'}
# Consensus strategies
CONSENSUS_STRATEGIES, # {'mean', 'median', 'first', 'best_quality'}
)
Slots by Category
| Category | Slots | Providers | Datetime Required? |
|---|---|---|---|
| Climate | annual_precpt, annual_temp | meteostat, nasa_power | No |
| Weather | temp, air_temp, humidity, wind_speed, wind_direction, solar_irradiance | meteostat, open_meteo | Yes |
| Elevation | elev | usgs, google, open_topo_data, osm | No |
| Marine | depth | gebco, noaa | No |
| Soil | ph, soil_type | soilgrids, usda_nrcs | No |
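The category table can be turned into a small lookup to decide, before calling, whether a request needs datetime_obj. This is a sketch under stated assumptions: the sets are copied from this page rather than imported from the package, and `group_slots` and `needs_datetime` are hypothetical helpers.

```python
# Copied from the table above; the package constants listed on this page
# (CLIMATE_SLOTS, WEATHER_SLOTS, ...) would be the authoritative source.
SLOT_CATEGORIES = {
    "climate": {"annual_precpt", "annual_temp"},
    "weather": {"temp", "air_temp", "humidity", "wind_speed",
                "wind_direction", "solar_irradiance"},
    "elevation": {"elev"},
    "marine": {"depth"},
    "soil": {"ph", "soil_type"},
}


def group_slots(slots):
    """Group requested slot names by category; unknown names go under 'unknown'."""
    grouped = {}
    for slot in slots:
        category = next(
            (name for name, members in SLOT_CATEGORIES.items() if slot in members),
            "unknown",
        )
        grouped.setdefault(category, []).append(slot)
    return grouped


def needs_datetime(slots):
    """True when any requested slot is a weather slot."""
    return any(slot in SLOT_CATEGORIES["weather"] for slot in slots)


print(group_slots(["annual_temp", "elev", "temp"]))
print(needs_datetime(["annual_temp", "elev"]))  # False
print(needs_datetime(["elev", "wind_speed"]))   # True
```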
Consensus Strategies
When multiple providers return data, values are combined using a consensus strategy:
| Strategy | Description | When to Use |
|---|---|---|
| mean | Arithmetic average across all providers (default) | General use, most reliable |
| median | Middle value when sorted | When outliers are possible |
| first | Use first successful provider | When speed matters |
| best_quality | Use provider with best quality metric | Advanced use with quality scores |
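To make the difference between the four strategies concrete, here is a minimal sketch of how each could combine the same readings. It is not the package's implementation: `combine` is a hypothetical function, the elevation readings are made up, and a distance in kilometers stands in for the quality metric (smaller is treated as better).

```python
from statistics import mean, median


def combine(readings, strategy="mean"):
    """Combine per-provider readings for one slot.

    readings: list of (provider, value, distance_km) tuples in priority order.
    distance_km is an assumed quality metric used only for illustration.
    """
    if not readings:
        return None
    values = [value for _, value, _ in readings]
    if strategy == "mean":
        return mean(values)
    if strategy == "median":
        return median(values)
    if strategy == "first":
        return values[0]
    if strategy == "best_quality":
        # Pick the reading with the smallest distance.
        return min(readings, key=lambda reading: reading[2])[1]
    raise ValueError(f"unknown strategy: {strategy}")


# Made-up elevation readings; the third is a deliberate outlier.
readings = [("usgs", 52.0, 0.5), ("google", 53.0, 0.1), ("osm", 60.0, 2.0)]
for name in ("mean", "median", "first", "best_quality"):
    print(name, combine(readings, name))
# mean 55.0
# median 53.0
# first 52.0
# best_quality 53.0
```

With one outlier in the input, the mean shifts toward it while the median does not, which is the trade-off the table describes.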
Slot Status (Reliability)
| Status | Slots | Notes |
|---|---|---|
| Ready | annual_precpt, annual_temp, elev | Production-ready, reliable data |
| Caution | | May have gaps or provider issues |
| Experimental | | Limited testing, may change |
Copy-Paste Slot Lists
For convenience, here are the slot names ready to copy:
All slots (comma-separated):
annual_precpt, annual_temp, elev, temp, air_temp, humidity, wind_speed, wind_direction, solar_irradiance, depth, ph, soil_type
Production-ready slots only:
annual_precpt, annual_temp, elev
Climate + Elevation (most common):
annual_precpt, annual_temp, elev
Checking Available Slots and Providers
from biosample_enricher.environmental_metadata import (
ALL_SUPPORTED_SLOTS,
CLIMATE_SLOTS,
WEATHER_SLOTS,
ELEVATION_SLOTS,
MARINE_SLOTS,
SOIL_SLOTS,
CLIMATE_PROVIDERS,
ELEVATION_PROVIDERS,
CONSENSUS_STRATEGIES,
)
# See all supported slots
print("All slots:", sorted(ALL_SUPPORTED_SLOTS))
# See slots by category
print("Climate:", sorted(CLIMATE_SLOTS))
print("Weather:", sorted(WEATHER_SLOTS))
print("Elevation:", sorted(ELEVATION_SLOTS))
print("Marine:", sorted(MARINE_SLOTS))
print("Soil:", sorted(SOIL_SLOTS))
# See available providers
print("Climate providers:", sorted(CLIMATE_PROVIDERS))
print("Elevation providers:", sorted(ELEVATION_PROVIDERS))
# See consensus strategies
print("Strategies:", sorted(CONSENSUS_STRATEGIES))
Limitations and Known Issues
Current Limitations
Limited slot coverage: Only 12 of the 200+ submission-schema slots are supported
No bulk operations: Must call once per biosample (no batch processing yet)
Weather data gaps: Historical weather not available for all locations/times
Provider reliability: Some providers have intermittent failures (see issues below)
No caching control: Cannot disable or clear HTTP cache from this function
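Until batch processing exists, multiple biosamples can be handled with a plain loop that calls the function once per sample. The sketch below takes the fetch function as an argument so it runs offline with a stand-in; `enrich_many` and `fake_fetch` are hypothetical names, and in real use `get_environmental_metadata` would be passed in place of `fake_fetch`.

```python
def enrich_many(samples, fetch, slots):
    """Call fetch once per sample and collect values or error messages.

    samples: iterable of (sample_id, lat, lon) tuples
    fetch:   callable taking lat, lon and slots as keyword arguments
    """
    results, errors = {}, {}
    for sample_id, lat, lon in samples:
        try:
            results[sample_id] = fetch(lat=lat, lon=lon, slots=slots)["values"]
        except ValueError as exc:
            # Invalid input for one sample should not stop the whole run.
            errors[sample_id] = str(exc)
    return results, errors


def fake_fetch(lat, lon, slots):
    """Stand-in for the real function so this sketch runs without network access."""
    if not -90 <= lat <= 90:
        raise ValueError("latitude out of range")
    return {"values": {slot: 0.0 for slot in slots}, "metadata": {}}


results, errors = enrich_many(
    [("s1", 37.7749, -122.4194), ("s2", 123.0, 0.0)],
    fake_fetch,
    ["elev"],
)
print(results)  # {'s1': {'elev': 0.0}}
print(errors)   # {'s2': 'latitude out of range'}
```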
Known Issues
Warning
Provider Reliability Issues - Please review these before relying on data:
Issue #181: Marine providers (GEBCO, ESA CCI, NOAA) incomplete/unreliable
Issue #182: MODIS vegetation provider uses mock data only
Issue #183: USGS elevation provider unreliable (marked flaky)
Issue #184: SoilGrids provider intermittent failures (marked flaky)
Climate and elevation data are generally reliable. Marine and soil data quality varies.
Future Development
These features are planned but not yet implemented:
More submission-schema slots (Issue #193)
Vegetation data from land cover (Issue #194)
Flooding history (Issue #192)
Batch processing for multiple biosamples
Quality scores and confidence intervals
Custom provider selection strategies beyond the built-in consensus strategies
See Also
For Advanced Users:
Services - Low-level service APIs for more control
Providers - Individual provider documentation
Provider Reliability - Provider stability and quality metrics
For Understanding the Code:
API Index - Complete API reference
Architecture - System design and data flow
External Resources:
NMDC Submission Schema - Official schema documentation
GOLD Biosample Fields - Related biosample metadata standard