Architecture
Core Design
The biosample-enricher architecture focuses on one primary use case: retrieving environmental metadata from geographic coordinates.
Key Components
get_environmental_metadata()
The main entry point that orchestrates all data retrieval:
from biosample_enricher.environmental_metadata import get_environmental_metadata

result = get_environmental_metadata(
    lat=37.7749,
    lon=-122.4194,
    slots=["annual_precpt", "annual_temp"]
)
See Environmental Metadata for complete documentation.
Multi-Provider System
Each data type (climate, elevation, etc.) has multiple providers:
Climate normals: meteostat, nasa_power
Elevation: USGS, Google, Open Topo Data, OSM
Weather: meteostat, open-meteo
Marine: GEBCO, ESA CCI, NOAA
The system automatically:
Queries multiple providers in parallel
Validates and normalizes responses
Computes consensus values (median or mean)
Returns metadata about which providers were used
See Providers for provider comparison.
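The query-and-consensus steps listed above can be illustrated with a minimal sketch. It is not the package's implementation: the `Provider` signature, the `consensus_value` helper, and the stand-in providers with fixed elevations are all assumptions made for illustration, and the median is used as the consensus.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import median
from typing import Callable, Optional

# Hypothetical provider signature: (lat, lon) -> value, or None if no data.
Provider = Callable[[float, float], Optional[float]]


def consensus_value(
    providers: dict[str, Provider], lat: float, lon: float
) -> tuple[Optional[float], dict[str, float]]:
    """Query every provider in parallel; return (median, per-provider values)."""
    with ThreadPoolExecutor(max_workers=max(1, len(providers))) as pool:
        futures = {name: pool.submit(fn, lat, lon) for name, fn in providers.items()}
        raw = {name: future.result() for name, future in futures.items()}

    # Validate and normalize: keep only providers that returned a number.
    valid = {name: float(v) for name, v in raw.items() if v is not None}
    if not valid:
        return None, {}
    return median(valid.values()), valid


# Stand-in providers returning fixed elevations in metres (illustrative only).
elevation_providers: dict[str, Provider] = {
    "provider_a": lambda lat, lon: 16.0,
    "provider_b": lambda lat, lon: 18.0,
    "provider_c": lambda lat, lon: None,  # simulates a provider with no coverage
}

value, used = consensus_value(elevation_providers, 37.7749, -122.4194)
print(value, sorted(used))  # 17.0 ['provider_a', 'provider_b']
```

Returning the per-provider values alongside the consensus mirrors the "which providers were used" metadata described above.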
HTTP Caching
All external API calls go through a centralized caching layer using requests-cache:
MongoDB primary backend (with SQLite fallback)
Coordinate canonicalization (4 decimal places)
Configurable cache control via read_from_cache/write_to_cache parameters
Automatic TTL management
Located in: biosample_enricher/http_cache.py
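Coordinate canonicalization can be sketched as follows. This is an assumption about how such a step might look, not the code in http_cache.py: `canonicalize_coords`, `cache_key`, and the example.org endpoint are hypothetical. Rounding to 4 decimal places groups points within roughly 11 m of latitude, so near-identical requests can share one cache entry.

```python
PLACES = 4  # decimal places, as stated in the caching description


def canonicalize_coords(lat: float, lon: float) -> tuple[float, float]:
    """Round coordinates so nearby requests map to the same cache entry."""
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")
    return round(lat, PLACES), round(lon, PLACES)


def cache_key(url: str, lat: float, lon: float) -> str:
    """Build a deterministic key from the endpoint and canonical coordinates."""
    c_lat, c_lon = canonicalize_coords(lat, lon)
    return f"{url}?lat={c_lat:.{PLACES}f}&lon={c_lon:.{PLACES}f}"


# Hypothetical endpoint; both inputs resolve to the same key.
a = cache_key("https://example.org/elevation", 37.77491234, -122.41939876)
b = cache_key("https://example.org/elevation", 37.77493, -122.41941)
print(a == b)  # True
print(a)       # https://example.org/elevation?lat=37.7749&lon=-122.4194
```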
Data Flow
User calls get_environmental_metadata(lat, lon, slots)
Dispatcher routes each slot to appropriate service (climate, elevation, etc.)
Service queries multiple providers via cached HTTP client
Providers return data in standardized format
Aggregator computes consensus values
Result returns values + metadata
Example result structure:
{
  "values": {
    "annual_precpt": 519.3,
    "annual_temp": 14.1
  },
  "metadata": {
    "climate_normals": {
      "providers_used": ["meteostat", "nasa_power"],
      "provider_results": {
        "meteostat": {"annual_precpt": 520.1, "annual_temp": 14.0},
        "nasa_power": {"annual_precpt": 518.5, "annual_temp": 14.2}
      }
    }
  }
}
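To make the data flow concrete, here is a self-contained sketch that routes slots to a service, queries stand-in providers, and assembles a result of the shape shown above. The `enrich` function, the `SLOT_TO_SERVICE` and `SERVICES` registries, and the rounding to one decimal place are assumptions for illustration; the provider values are taken from the example result, and the real dispatcher may be organized differently.

```python
from statistics import median
from typing import Callable

# Hypothetical provider type: returns {slot: value} for a coordinate.
Provider = Callable[[float, float], dict[str, float]]

# Hypothetical registries: slot -> service, and service -> named providers.
SLOT_TO_SERVICE = {"annual_precpt": "climate_normals", "annual_temp": "climate_normals"}
SERVICES: dict[str, dict[str, Provider]] = {
    "climate_normals": {
        "meteostat": lambda lat, lon: {"annual_precpt": 520.1, "annual_temp": 14.0},
        "nasa_power": lambda lat, lon: {"annual_precpt": 518.5, "annual_temp": 14.2},
    }
}


def enrich(lat: float, lon: float, slots: list[str]) -> dict:
    """Route slots to services, query providers, and aggregate a consensus."""
    result: dict = {"values": {}, "metadata": {}}

    # Dispatcher step: group requested slots by the service that owns them.
    # An unknown slot raises KeyError in this sketch.
    by_service: dict[str, list[str]] = {}
    for slot in slots:
        by_service.setdefault(SLOT_TO_SERVICE[slot], []).append(slot)

    for service, wanted in by_service.items():
        # Service step: collect each provider's values for the wanted slots.
        provider_results = {
            name: {s: v for s, v in fn(lat, lon).items() if s in wanted}
            for name, fn in SERVICES[service].items()
        }
        # Aggregator step: median across providers, rounded to one decimal.
        for slot in wanted:
            readings = [r[slot] for r in provider_results.values() if slot in r]
            if readings:
                result["values"][slot] = round(median(readings), 1)
        result["metadata"][service] = {
            "providers_used": list(provider_results),
            "provider_results": provider_results,
        }
    return result


out = enrich(37.7749, -122.4194, ["annual_precpt", "annual_temp"])
print(out["values"])  # {'annual_precpt': 519.3, 'annual_temp': 14.1}
```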
Design Principles
One way to do it: get_environmental_metadata() is THE function
Fail gracefully: Missing providers don’t break the system
Cache aggressively: Minimize API calls
Type safety: Full type annotations with mypy strict mode
Test thoroughly: Unit, integration, and network test categories
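The "fail gracefully" principle might look like the sketch below, in which one failing provider is logged and skipped rather than aborting the whole request. `safe_query` and both stand-in providers are hypothetical names, not part of the package.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def safe_query(
    name: str, fn: Callable[[float, float], float], lat: float, lon: float
) -> Optional[float]:
    """Return the provider's value, or None if the provider raises."""
    try:
        return fn(lat, lon)
    except Exception as exc:  # broad on purpose: any provider failure is non-fatal
        logger.warning("provider %s failed: %s", name, exc)
        return None


def broken_provider(lat: float, lon: float) -> float:
    """Stand-in provider that always fails."""
    raise TimeoutError("simulated network timeout")


providers = {"working": lambda lat, lon: 16.0, "broken": broken_provider}
results = {n: safe_query(n, f, 37.7749, -122.4194) for n, f in providers.items()}

# Only providers that answered contribute to the final result.
available = {n: v for n, v in results.items() if v is not None}
print(available)  # {'working': 16.0}
```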
Future Development
Archived code (in the archived/ directory) includes:
MongoDB biosample adapters
Service-specific CLIs
Metrics evaluation framework
Demo scripts
These may be restored when needed. See archived/README.md for restoration instructions.