Services

Elevation Service

Main elevation service orchestrator.

class biosample_enricher.elevation.service.ElevationService(google_api_key=None, enable_google=True, enable_usgs=True, enable_osm=True, enable_open_topo_data=True, osm_endpoint='https://api.open-elevation.com/api/v1/lookup', open_topo_data_endpoint='https://api.opentopodata.org/v1')[source]

Bases: object

Orchestrates elevation lookups across multiple providers.

Parameters:
  • google_api_key (str | None)

  • enable_google (bool)

  • enable_usgs (bool)

  • enable_osm (bool)

  • enable_open_topo_data (bool)

  • osm_endpoint (str)

  • open_topo_data_endpoint (str)

__init__(google_api_key=None, enable_google=True, enable_usgs=True, enable_osm=True, enable_open_topo_data=True, osm_endpoint='https://api.open-elevation.com/api/v1/lookup', open_topo_data_endpoint='https://api.opentopodata.org/v1')[source]

Initialize the elevation service.

Parameters:
  • google_api_key (str | None) – Google API key; if None, the key is read from the environment

  • enable_google (bool) – Whether to enable Google provider

  • enable_usgs (bool) – Whether to enable USGS provider

  • enable_osm (bool) – Whether to enable OSM provider

  • enable_open_topo_data (bool) – Whether to enable Open Topo Data provider

  • osm_endpoint (str) – OSM provider endpoint URL

  • open_topo_data_endpoint (str) – Open Topo Data endpoint URL

classmethod from_env()[source]

Create elevation service from environment variables.

Returns:

ElevationService – Configured elevation service

classify_coordinates(lat, lon)[source]

Classify coordinates for provider routing.

Parameters:
  • lat (float) – Latitude in decimal degrees

  • lon (float) – Longitude in decimal degrees

Returns:

CoordinateClassification – Coordinate classification

classify_biosample_location(lat, lon)[source]

Classify a biosample location for routing and metadata.

This method provides biosample-specific classification that can be stored with the sample metadata for efficient provider routing.

Parameters:
  • lat (float) – Latitude in decimal degrees

  • lon (float) – Longitude in decimal degrees

Returns:

dict – Dictionary with classification metadata

select_providers(classification, preferred=None)[source]

Select providers based on coordinate classification.

Parameters:
  • classification (CoordinateClassification) – Coordinate classification result

  • preferred (list[str] | None) – Preferred provider names in order

Returns:

list[ElevationProvider] – List of providers in priority order
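
As a rough illustration of classification-based routing, the sketch below orders provider names so that any preferred names come first, followed by the remaining providers that apply to the region. The Classification dataclass, its is_us field, the provider names, and the "US-only provider" rule are all hypothetical stand-ins, not the library's actual types or routing logic.

```python
from dataclasses import dataclass


@dataclass
class Classification:
    """Hypothetical stand-in for CoordinateClassification."""
    is_us: bool


def order_providers(classification, enabled, preferred=None):
    """Return enabled provider names in priority order (illustrative only)."""
    # Assumed rule: the US-only provider is skipped outside the US.
    candidates = [name for name in enabled
                  if name != "usgs" or classification.is_us]
    preferred = preferred or []
    # Preferred names first (in the caller's order), then the remainder.
    head = [name for name in preferred if name in candidates]
    tail = [name for name in candidates if name not in head]
    return head + tail


enabled = ["usgs", "google", "open_topo_data", "osm"]
us_order = order_providers(Classification(is_us=True), enabled, preferred=["google"])
non_us_order = order_providers(Classification(is_us=False), enabled)
```

In this sketch a preferred name that is not a candidate for the region is silently ignored rather than raising an error; whether the real method behaves that way is not stated in this reference.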

get_elevation(request, *, read_from_cache=True, write_to_cache=True, timeout_s=20.0)[source]

Get elevation observations from multiple providers.

Parameters:
  • request (ElevationRequest) – Elevation request

  • read_from_cache (bool) – Whether to read from cache

  • write_to_cache (bool) – Whether to write to cache

  • timeout_s (float) – Request timeout in seconds

Returns:

list[Observation] – List of elevation observations
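
The read_from_cache and write_to_cache flags suggest independent control over cache lookups and cache updates. A minimal sketch of that pattern follows; fetch_with_cache, the dict-based cache, and fake_fetch are hypothetical and say nothing about how the library actually stores cached responses.

```python
def fetch_with_cache(key, fetch, cache, *, read_from_cache=True, write_to_cache=True):
    """Return a value for key, consulting and updating cache as the flags allow."""
    if read_from_cache and key in cache:
        return cache[key]
    value = fetch(key)
    if write_to_cache:
        cache[key] = value
    return value


calls = []


def fake_fetch(key):
    # Stand-in for a provider request; records each call it receives.
    calls.append(key)
    return 1609.0


cache = {}
point = (39.74, -104.99)
first = fetch_with_cache(point, fake_fetch, cache)    # miss: fetches and stores
second = fetch_with_cache(point, fake_fetch, cache)   # hit: served from cache
third = fetch_with_cache(point, fake_fetch, cache, read_from_cache=False)  # forced refetch
```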

get_best_elevation(observations)[source]

Select the best elevation from multiple observations.

Parameters:

observations (list[Observation]) – List of elevation observations

Returns:

ElevationResult | None – Best elevation result, or None if no valid observations
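
One plausible way to pick a "best" value is to discard observations without an elevation and prefer the finest spatial resolution. The sketch below assumes exactly that rule; the Obs dataclass and its fields are hypothetical stand-ins for Observation, and the library's real selection criteria may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Obs:
    """Hypothetical stand-in for an elevation Observation."""
    provider: str
    elevation_m: Optional[float]
    resolution_m: Optional[float]


def pick_best(observations):
    """Return the valid observation with the finest resolution, or None."""
    valid = [o for o in observations if o.elevation_m is not None]
    if not valid:
        return None
    # Observations without a stated resolution sort last.
    return min(
        valid,
        key=lambda o: o.resolution_m if o.resolution_m is not None else float("inf"),
    )


observations = [
    Obs("provider_a", 1608.2, 30.0),
    Obs("provider_b", 1609.5, 10.0),
    Obs("provider_c", None, 1.0),  # failed lookup, ignored
]
best = pick_best(observations)
```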

create_output_envelope(subject_id, observations, read_from_cache=True, write_to_cache=True)[source]

Create output envelope with observations.

Parameters:
  • subject_id (str) – Subject identifier

  • observations (list[Observation]) – List of observations

  • read_from_cache (bool) – Whether cache was used for reading

  • write_to_cache (bool) – Whether cache was used for writing

Returns:

OutputEnvelope – Output envelope

Soil Service

Soil enrichment service orchestration.

class biosample_enricher.soil.service.SoilService[source]

Bases: object

Multi-provider soil enrichment service.

Orchestrates multiple soil data providers with intelligent cascading:
  • US locations: USDA NRCS SDA primary, SoilGrids fallback

  • Global locations: SoilGrids primary

Provides static soil site characterization including taxonomy, properties, and texture classification.
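
The cascading described above can be sketched as a primary lookup with a fallback. The bounding box, the cascade function, and both fake lookups are assumptions made for illustration; the service's actual US test and provider interfaces are not documented here.

```python
def in_rough_us_bbox(lat, lon):
    """Crude contiguous-US bounding box; an assumption for illustration."""
    return 24.0 <= lat <= 50.0 and -125.0 <= lon <= -66.0


def cascade(lat, lon, us_lookup, global_lookup):
    """Try the US source first inside the box, then fall back to the global one."""
    if in_rough_us_bbox(lat, lon):
        result = us_lookup(lat, lon)
        if result is not None:
            return result
    return global_lookup(lat, lon)


def fake_us_lookup(lat, lon):
    # Pretend the US source has no data west of -120 degrees.
    return {"source": "us"} if lon > -120.0 else None


def fake_global_lookup(lat, lon):
    return {"source": "global"}


kansas = cascade(38.5, -98.0, fake_us_lookup, fake_global_lookup)
coastal = cascade(40.0, -124.0, fake_us_lookup, fake_global_lookup)
paris = cascade(48.85, 2.35, fake_us_lookup, fake_global_lookup)
```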

__init__()[source]

Initialize soil service with providers.

enrich_location(latitude, longitude, depth_cm='0-5cm')[source]

Enrich a single location with soil data.

Parameters:
  • latitude (float) – Latitude in decimal degrees

  • longitude (float) – Longitude in decimal degrees

  • depth_cm (str | None) – Depth interval (e.g., “0-5cm”, “5-15cm”)

Returns:

SoilResult – Best available soil data for the location

enrich_batch(locations, depth_cm='0-5cm')[source]

Enrich multiple locations with soil data.

Parameters:
  • locations (list[tuple[float, float]]) – List of (latitude, longitude) tuples

  • depth_cm (str | None) – Depth interval for all locations

Returns:

list[SoilResult] – List of SoilResult objects

enrich_biosample(sample_data)[source]

Enrich a single biosample with soil data.

Parameters:

sample_data (dict) – Biosample dictionary with location information

Returns:

dict – Original sample_data enhanced with soil enrichment

get_provider_status()[source]

Get status of all soil providers.

Returns:

dict[str, dict] – Dictionary mapping provider names to status information

Weather Service

Weather enrichment service for biosample environmental context.

Orchestrates multiple weather providers to deliver day-specific weather data with temporal precision tracking and standardized schema mapping.

class biosample_enricher.weather.service.WeatherService(providers=None)[source]

Bases: object

Multi-provider weather enrichment service for biosample metadata.

Provides day-specific weather data using a provider fallback chain with temporal precision tracking and standardized output schema.

Parameters:

providers (list[WeatherProviderBase] | None)

__init__(providers=None)[source]

Initialize weather service with provider chain.

Parameters:

providers (list[WeatherProviderBase] | None) – List of weather providers in priority order. If None, uses default Open-Meteo + MeteoStat providers.

get_weather_for_biosample(biosample, target_schema='nmdc')[source]

Get weather data for a biosample and map to target schema.

Parameters:
  • biosample (dict[str, Any]) – Biosample dictionary with location and collection date

  • target_schema (str) – Target schema for field mapping (“nmdc” or “gold”)

Returns:

dict[str, Any] – Dict with weather enrichment results and schema-mapped fields

get_daily_weather(lat, lon, target_date, parameters=None)[source]

Get daily weather data by integrating results from all providers.

Parameters:
  • lat (float) – Latitude in decimal degrees

  • lon (float) – Longitude in decimal degrees

  • target_date (date) – Date for weather lookup

  • parameters (list[str] | None) – Optional list of specific parameters to fetch

Returns:

WeatherResult – WeatherResult with integrated data from all available providers
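
Integrating results from several providers could, for example, mean taking each parameter from the highest-priority provider that reports it. The sketch below assumes that rule; integrate, the provider names, and the parameter keys are hypothetical and do not reflect the real WeatherResult structure.

```python
def integrate(results_in_priority_order):
    """Merge per-provider values, keeping the first non-missing value per parameter."""
    merged = {}
    sources = {}
    for provider_name, values in results_in_priority_order:
        for param, value in values.items():
            if value is None or param in merged:
                continue  # missing value, or already filled by a higher-priority provider
            merged[param] = value
            sources[param] = provider_name
    return merged, sources


results = [
    ("provider_a", {"temperature_c": 18.2, "wind_speed_ms": None}),
    ("provider_b", {"temperature_c": 17.9, "wind_speed_ms": 3.1}),
]
merged, sources = integrate(results)
```

Recording the source of each value alongside the merged data keeps provenance available to downstream consumers.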

get_provider_info()[source]

Get information about all configured providers.

Return type:

list[dict[str, Any]]

Marine Service

Marine enrichment service for biosample oceanographic context.

Orchestrates multiple marine data providers to deliver comprehensive marine environmental data with quality tracking and standardized schema mapping.

class biosample_enricher.marine.service.MarineService(providers=None)[source]

Bases: object

Multi-provider marine enrichment service for biosample metadata.

Provides comprehensive oceanographic data using multiple providers with quality tracking and standardized output schema for marine samples.

Parameters:

providers (list[MarineProviderBase] | None)

__init__(providers=None)[source]

Initialize marine service with provider chain.

Parameters:

providers (list[MarineProviderBase] | None) – List of marine providers to use. If None, uses default OISST + GEBCO + ESA CCI providers.

get_marine_data_for_biosample(biosample, target_schema='nmdc')[source]

Get marine data for a biosample and map to target schema.

Parameters:
  • biosample (dict[str, Any]) – Biosample dictionary with location and collection date

  • target_schema (str) – Target schema format (“nmdc” or “gold”)

Returns:

dict[str, Any] – Dictionary with enrichment status, marine data, and schema mapping

get_comprehensive_marine_data(latitude, longitude, target_date)[source]

Get comprehensive marine data from all available providers.

Parameters:
  • latitude (float) – Latitude in decimal degrees

  • longitude (float) – Longitude in decimal degrees

  • target_date (date) – Date for marine data query

Returns:

MarineResult – Combined data from all providers

Land Service

Land cover and vegetation enrichment service orchestration.

class biosample_enricher.land.service.LandService[source]

Bases: object

Multi-provider land cover and vegetation enrichment service.

Queries ALL available providers for comprehensive data coverage:
  • Land Cover: ESA WorldCover, NLCD (US), MODIS Land Cover, CGLS

  • Vegetation: MODIS NDVI/EVI/LAI/FPAR, VIIRS NDVI, Sentinel-2 (selective)

Returns full provenance with exact coordinates/dates from each provider.

__init__()[source]

Initialize land service with all providers.

enrich_location(latitude, longitude, target_date=None, time_window_days=16)[source]

Enrich a single location with land cover and vegetation data.

Parameters:
  • latitude (float) – Latitude in decimal degrees

  • longitude (float) – Longitude in decimal degrees

  • target_date (date | None) – Target date for temporal alignment

  • time_window_days (int) – Search window for vegetation indices

Returns:

LandResult – LandResult with data from ALL available providers
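
To illustrate how target_date and time_window_days might interact, the sketch below picks the observation closest to the target date, accepting only those within the window on either side. Treating the window as symmetric is an assumption, and nearest_within_window and the sample values are hypothetical.

```python
from datetime import date


def nearest_within_window(observations, target_date, time_window_days=16):
    """Return the (date, value) pair closest to target_date, or None.

    observations is a list of (date, value) pairs. Only observations at most
    time_window_days away from target_date (before or after) are considered.
    """
    in_window = [
        (abs((obs_date - target_date).days), obs_date, value)
        for obs_date, value in observations
        if abs((obs_date - target_date).days) <= time_window_days
    ]
    if not in_window:
        return None
    _, obs_date, value = min(in_window, key=lambda item: item[0])
    return obs_date, value


ndvi_series = [
    (date(2023, 6, 1), 0.61),
    (date(2023, 6, 17), 0.70),
    (date(2023, 8, 1), 0.55),
]
match = nearest_within_window(ndvi_series, date(2023, 6, 14))
no_match = nearest_within_window(ndvi_series, date(2023, 6, 14), time_window_days=2)
```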

enrich_batch(locations, target_date=None, time_window_days=16)[source]

Enrich multiple locations with land cover and vegetation data.

Parameters:
  • locations (list[tuple[float, float]]) – List of (latitude, longitude) tuples

  • target_date (date | None) – Target date for all locations

  • time_window_days (int) – Search window for vegetation indices

Returns:

list[LandResult] – List of LandResult objects

enrich_biosample(sample_data)[source]

Enrich a single biosample with land cover and vegetation data.

Parameters:

sample_data (dict[str, Any]) – Biosample dictionary with location information

Returns:

dict[str, Any] – Original sample_data enhanced with land enrichment

get_provider_status()[source]

Get status of all land cover and vegetation providers.

Returns:

dict[str, dict[str, Any]] – Dictionary mapping provider names to status information

Forward Geocoding Service

Forward geocoding service for coordinating multiple providers (place names to coordinates).

class biosample_enricher.forward_geocoding.service.ForwardGeocodingService[source]

Bases: object

Service for managing forward geocoding providers (place names to coordinates).

__init__()[source]

Initialize the forward geocoding service.

get_available_providers()[source]

Get list of available provider names.

Return type:

list[str]

get_provider(name)[source]

Get a specific provider by name.

Parameters:

name (str)

Return type:

ForwardGeocodingProvider | None

get_provider_status()[source]

Get status information for all providers.

Return type:

dict[str, dict[str, Any]]

geocode(query, provider=None, *, read_from_cache=True, write_to_cache=True, timeout_s=30.0, language='en', country_codes=None, max_results=10)[source]

Perform forward geocoding to convert place name to coordinates.

Parameters:
  • query (str) – Place name or address to search for

  • provider (str | None) – Provider name (None for auto-selection)

  • read_from_cache (bool) – Whether to read from cache

  • write_to_cache (bool) – Whether to write to cache

  • timeout_s (float) – Request timeout in seconds

  • language (str) – Language code for results

  • country_codes (list[str] | None) – List of ISO country codes to restrict search

  • max_results (int) – Maximum number of results

Returns:

ForwardGeocodeResult | None – Forward geocoding result or None if failed

geocode_multiple(query, providers=None, *, read_from_cache=True, write_to_cache=True, timeout_s=30.0, language='en', country_codes=None, max_results=5)[source]

Perform forward geocoding using multiple providers for comparison.

Parameters:
  • query (str) – Place name or address to search for

  • providers (list[str] | None) – List of provider names (None for all available)

  • read_from_cache (bool) – Whether to read from cache

  • write_to_cache (bool) – Whether to write to cache

  • timeout_s (float) – Request timeout in seconds

  • language (str) – Language code for results

  • country_codes (list[str] | None) – List of ISO country codes to restrict search

  • max_results (int) – Maximum results per provider

Returns:

dict[str, ForwardGeocodeResult] – Dictionary mapping provider names to results

get_coordinates_for_place(place_name, prefer_provider=None, language='en', country_hint=None)[source]

Get coordinates and enrichment data for a biosample place name.

This is the main method for biosample enrichment: it converts place names from metadata into precise coordinates.

Parameters:
  • place_name (str) – Name of place/location from biosample metadata

  • prefer_provider (str | None) – Preferred provider name

  • language (str) – Language code for results

  • country_hint (str | None) – ISO country code hint for better results

Returns:

dict[str, Any] – Dictionary with coordinates and administrative information

Reverse Geocoding Service

Reverse geocoding service for coordinating multiple providers.

class biosample_enricher.reverse_geocoding.service.ReverseGeocodingService[source]

Bases: object

Service for managing reverse geocoding providers.

__init__()[source]

Initialize the reverse geocoding service.

get_available_providers()[source]

Get list of available provider names.

Return type:

list[str]

get_provider(name)[source]

Get a specific provider by name.

Parameters:

name (str) – Provider name

Returns:

ReverseGeocodingProvider | None – Provider instance or None if not found

reverse_geocode(lat, lon, provider=None, *, read_from_cache=True, write_to_cache=True, timeout_s=20.0, language='en', limit=10)[source]

Perform reverse geocoding using specified or default provider.

Parameters:
  • lat (float) – Latitude in decimal degrees

  • lon (float) – Longitude in decimal degrees

  • provider (str | None) – Provider name (None for auto-selection)

  • read_from_cache (bool) – Whether to read from cache

  • write_to_cache (bool) – Whether to write to cache

  • timeout_s (float) – Request timeout in seconds

  • language (str) – Language code for results

  • limit (int) – Maximum number of results

Returns:

ReverseGeocodeResult | None – Reverse geocoding result or None if failed

reverse_geocode_multiple(lat, lon, providers=None, *, read_from_cache=True, write_to_cache=True, timeout_s=20.0, language='en', limit=10)[source]

Perform reverse geocoding using multiple providers sequentially.

Parameters:
  • lat (float) – Latitude in decimal degrees

  • lon (float) – Longitude in decimal degrees

  • providers (list[str] | None) – List of provider names (None for all available)

  • read_from_cache (bool) – Whether to read from cache

  • write_to_cache (bool) – Whether to write to cache

  • timeout_s (float) – Request timeout in seconds

  • language (str) – Language code for results

  • limit (int) – Maximum number of results per provider

Returns:

dict[str, ReverseGeocodeResult] – Dictionary mapping provider names to results

compare_providers(lat, lon, *, language='en', limit=5)[source]

Compare results from all available providers.

Parameters:
  • lat (float) – Latitude in decimal degrees

  • lon (float) – Longitude in decimal degrees

  • language (str) – Language code for results

  • limit (int) – Maximum number of results per provider

Returns:

dict[str, Any] – Comparison dictionary with results and analysis
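
A comparison of provider results might include a simple agreement check on one field. The sketch below is one such analysis under stated assumptions: summarize_agreement, the plain-dict result shape, and the "country" key are hypothetical and are not the structure this method returns.

```python
from collections import Counter


def summarize_agreement(results, field="country"):
    """Report each provider's value for field and which providers disagree."""
    values = {name: result.get(field) for name, result in results.items()}
    counts = Counter(v for v in values.values() if v is not None)
    # The most common non-missing value is treated as the consensus.
    consensus = counts.most_common(1)[0][0] if counts else None
    disagreeing = sorted(
        name for name, v in values.items() if v is not None and v != consensus
    )
    return {"values": values, "consensus": consensus, "disagreeing": disagreeing}


results = {
    "provider_a": {"country": "France"},
    "provider_b": {"country": "France"},
    "provider_c": {"country": "Belgium"},
}
summary = summarize_agreement(results)
```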

OSM Features Service

OSM geographic features enrichment service.

class biosample_enricher.osm_features.service.OSMFeaturesService(default_radius_m=1000, enable_google=True)[source]

Bases: object

Service for enriching locations with geographic features from multiple providers.

Parameters:
  • default_radius_m (int)

  • enable_google (bool)

__init__(default_radius_m=1000, enable_google=True)[source]

Initialize geographic features service.

Parameters:
  • default_radius_m (int) – Default search radius in meters

  • enable_google (bool) – Whether to enable Google Places provider if available

get_combined_features_for_location(latitude, longitude, radius_m=None, timeout_s=180)[source]

Get geographic features from both OSM and Google Places providers.

Parameters:
  • latitude (float) – Latitude coordinate

  • longitude (float) – Longitude coordinate

  • radius_m (int | None) – Search radius in meters (uses default if None)

  • timeout_s (int) – Query timeout in seconds

Returns:

CombinedFeaturesResult – Combined features result from both providers

get_features_for_location(latitude, longitude, radius_m=None, timeout_s=180)[source]

Get geographic features around a location from OSM only (for backward compatibility).

Parameters:
  • latitude (float) – Latitude coordinate

  • longitude (float) – Longitude coordinate

  • radius_m (int | None) – Search radius in meters (uses default if None)

  • timeout_s (int) – Query timeout in seconds

Returns:

OSMFeaturesResult | None – OSM features result or None if failed

enrich_biosample_location(latitude, longitude, radius_m=None, timeout_s=180, use_combined=True)[source]

Enrich a biosample location with geographic features from multiple providers.

Parameters:
  • latitude (float) – Latitude coordinate

  • longitude (float) – Longitude coordinate

  • radius_m (int | None) – Search radius in meters (uses default if None)

  • timeout_s (int) – Query timeout in seconds

  • use_combined (bool) – Whether to use combined providers (True) or OSM only (False)

Returns:

dict[str, Any] – Dictionary suitable for biosample enrichment

get_features_for_biosample(biosample, radius_m=1000, timeout_s=180)[source]

Get OSM features for a biosample dictionary.

Parameters:
  • biosample (dict[str, Any]) – Biosample data dictionary

  • radius_m (int) – Search radius in meters

  • timeout_s (int) – Query timeout in seconds

Returns:

dict[str, Any] – Enrichment result dictionary

get_provider_status()[source]

Get status of all geographic features providers.

Return type:

dict[str, Any]

batch_enrich_locations(locations, radius_m=1000, timeout_s=180)[source]

Batch enrich multiple locations with OSM features.

Parameters:
  • locations (list[tuple[float, float]]) – List of (latitude, longitude) tuples

  • radius_m (int) – Search radius in meters

  • timeout_s (int) – Query timeout in seconds per location

Returns:

list[dict[str, Any]] – List of enrichment dictionaries
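
Batch enrichment over (latitude, longitude) tuples can be sketched as a loop that records a failure for one location without aborting the rest. Whether the library isolates errors this way is an assumption; batch_enrich, fake_enrich, and the dictionary keys are hypothetical.

```python
def batch_enrich(locations, enrich_one):
    """Apply enrich_one to each (lat, lon) pair, capturing per-location errors."""
    results = []
    for lat, lon in locations:
        try:
            results.append(enrich_one(lat, lon))
        except Exception as exc:
            # Keep the output aligned with the input by recording the failure in place.
            results.append({"latitude": lat, "longitude": lon, "error": str(exc)})
    return results


def fake_enrich(lat, lon):
    # Stand-in for a single-location enrichment call.
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    return {"latitude": lat, "longitude": lon, "feature_count": 0}


batch = batch_enrich([(37.77, -122.42), (123.0, 10.0)], fake_enrich)
```

Returning one entry per input keeps results index-aligned with the locations list, which makes it straightforward to join them back to the original samples.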