Source code for biosample_enricher.environmental_metadata

"""
Get environmental metadata for geographic coordinates.

This module provides a single entry point for retrieving environmental data
from multiple provider services. It returns values in standardized formats
suitable for NMDC submissions and other applications requiring location-based
environmental metadata.

Supported Slots
---------------

Climate (Multi-year averages, from Meteostat 30-year normals 1991-2020):
    annual_precpt (float, mm):
        Mean annual precipitation. Average of all annual precipitation values
        or estimated equivalent from regional indexes or Isohyetal maps.
        Requires: lat, lon
        Provider: MeteostatProvider.get_climate_normals()
        Note: 30-year average, does NOT require datetime

    annual_temp (float, °C):
        Mean annual temperature averaged over 30-year period.
        Requires: lat, lon
        Provider: MeteostatProvider.get_climate_normals()
        Note: 30-year average, does NOT require datetime

Weather (Point-in-time observations, from Meteostat/Open-Meteo):
    temp (float, °C):
        Temperature at the time of sampling.
        Requires: lat, lon, datetime (collection date/time)
        Provider: WeatherService.get_daily_weather()
        Note: Ideally as close to collection time as possible

    air_temp (float, °C):
        Air temperature at the time of sampling. Same as temp for atmospheric samples.
        Requires: lat, lon, datetime (collection date/time)
        Provider: WeatherService.get_daily_weather()
        Note: Ideally as close to collection time as possible

    humidity (string, "X.X g/m3"):
        Amount of water vapor in the air at time of sampling.
        Requires: lat, lon, datetime (collection date/time)
        Provider: WeatherService.get_daily_weather()
        Note: Returns string with unit, e.g., "15.2 g/m3"

    wind_speed (string, "X.X m/s"):
        Speed of wind measured at the time of sampling.
        Requires: lat, lon, datetime (collection date/time)
        Provider: WeatherService.get_daily_weather()
        Note: Returns string with unit, e.g., "5.5 m/s"

    wind_direction (string):
        Direction from which wind originates.
        Requires: lat, lon, datetime (collection date/time)
        Provider: WeatherService.get_daily_weather()
        Note: Returns string with degrees, e.g., "245 degrees"

    solar_irradiance (string, "X.X W/m²"):
        Amount of solar energy arriving at a surface area during time interval.
        Requires: lat, lon, datetime (collection date/time)
        Provider: WeatherService.get_daily_weather()
        Note: Returns string with unit, e.g., "850.5 W/m²"

Elevation/Topography (from USGS, Open Topo Data, Google):
    elev (float, m):
        Elevation (height above mean sea level) of the sampling site in meters.
        Used for points on earth's surface (terrestrial, aquatic sampling sites).
        Requires: lat, lon
        Provider: ElevationService.get_elevation()
        Note: For ground surface elevations only. Cannot determine altitude of
              airborne samples (aircraft, balloons) from lat/lon alone.

Marine/Bathymetry (from GEBCO, ESA CCI, NOAA):
    depth (string, "X.X m"):
        Vertical distance below local surface. For marine samples, this is
        water depth (bathymetry). For terrestrial subsurface samples, this is
        soil depth and must be measured, not inferred.
        Requires: lat, lon
        Provider: MarineService.get_bathymetry()
        Note: Returns bathymetry only (ocean floor depth). Does NOT return
              soil depth or sampling depth within water column.

Soil Properties (from SoilGrids, USDA NRCS):
    ph (float):
        pH measurement of sample, liquid portion, or aqueous phase.
        Requires: lat, lon
        Provider: SoilService.get_soil_properties()
        Note: Returns surface (0-5cm) pH value

    soil_type (string):
        Description of soil type or classification (ENVO terms preferred).
        Requires: lat, lon
        Provider: SoilService.get_soil_properties()
        Example: "plinthosol [ENVO:00002250]"

Land Cover/Vegetation (from ESA WorldCover, MODIS, NLCD):
    cur_vegetation (string):
        Current vegetation classification from standard systems or agricultural crop.
        Requires: lat, lon
        Provider: LandService (future)
        Status: Placeholder - needs implementation
        Examples: "deciduous forest", "Bauhinia variegata"

Flooding (from USGS, NOAA):
    flooding (string):
        Historical and/or physical evidence of flooding with dates.
        Requires: lat, lon
        Provider: FloodingService (future - Issue #192)
        Status: Placeholder - needs research
        Format: "YYYY-MM-DD" or "YYYY-MM to YYYY-MM"

Usage Example
-------------
    >>> from datetime import datetime
    >>> from biosample_enricher.environmental_metadata import get_environmental_metadata
    >>>
    >>> # Get annual climate values (no datetime needed)
    >>> values = get_environmental_metadata(
    ...     lat=37.7749,
    ...     lon=-122.4194,
    ...     slots=["annual_precpt", "annual_temp", "elev"]
    ... )
    >>> print(values)
    {
        "annual_precpt": 453.1,    # mm/year (30-year average)
        "annual_temp": 14.6,        # °C (30-year average)
        "elev": 52.0                # m above sea level
    }
    >>>
    >>> # Get day-specific weather values (datetime required)
    >>> values = get_environmental_metadata(
    ...     lat=37.7749,
    ...     lon=-122.4194,
    ...     slots=["temp", "humidity", "wind_speed"],
    ...     datetime_obj=datetime(2023, 7, 15, 14, 30)
    ... )
    >>> print(values)
    {
        "temp": 22.3,              # °C on 2023-07-15
        "humidity": "12.5 g/m3",   # at time of sampling
        "wind_speed": "5.2 m/s"    # at time of sampling
    }
    >>>
    >>> # Mix of annual and day-specific values
    >>> values = get_environmental_metadata(
    ...     lat=40.7128,
    ...     lon=-74.0060,
    ...     slots=["annual_precpt", "temp", "elev", "ph"],
    ...     datetime_obj=datetime(2023, 7, 15)
    ... )
    >>> print(values)
    {
        "annual_precpt": 1268.4,   # 30-year average
        "temp": 28.1,              # Day-specific temperature
        "elev": 10.0,              # Elevation
        "ph": 6.2                  # Surface soil pH
    }

Implementation Details
----------------------
This function:
1. Groups requested slots by the service they require (weather, elevation, soil, marine)
2. Makes ONE call per service to fetch all needed data efficiently
3. Extracts specific slot values in submission-schema format and units
4. Returns only successfully retrieved values (missing/failed slots are omitted)
5. Handles errors gracefully - partial success is allowed

Data Sources:
- Weather: Meteostat (station-based historical), Open-Meteo (gridded reanalysis)
- Elevation: USGS 3DEP, Open Topo Data, Google Elevation API
- Soil: SoilGrids (global 250m), USDA NRCS (US only)
- Marine: GEBCO bathymetry, ESA CCI ocean color, NOAA OISST

Units and Formats:
All values are returned in the units specified by submission-schema:
- Temperatures: degrees Celsius (float)
- Precipitation: millimeters (float)
- Distances: meters (float or string with "m")
- Wind speed: meters per second (string with "m/s")
- Humidity: grams per cubic meter (string with "g/m3")
- pH: unitless (float, 0-14 scale)

Known Limitations and Future Work:
1. **Sample Type Validation**: Currently does NOT validate whether requested slots
   are appropriate for the sample location. For example:
   - Will return soil pH/soil_type even for ocean locations
   - Will return bathymetric depth even for terrestrial locations
   - Future work needed to classify locations as: terrestrial soil, inland/freshwater,
     coastal, or open ocean, and validate slot requests accordingly.

2. **Unsupported Slots Requiring Measured Data**:
   - alt (altitude): For airborne samples (aircraft, balloons) - requires measurement
   - salinity: Requires water sample analysis or oceanographic models - not yet implemented
   - cur_vegetation: Requires land cover classification - not yet implemented
   - flooding: Requires historical flood data - not yet implemented (Issue #192)

3. **Depth Interpretation**: The 'depth' slot currently returns bathymetry (ocean floor
   depth) for marine locations. It does NOT return:
   - Soil sampling depth (must be measured by user)
   - Water column sampling depth (must be measured by user)
"""

from datetime import datetime
from typing import Any

from biosample_enricher.consensus import ConsensusStrategy, compute_consensus
from biosample_enricher.elevation.service import ElevationService
from biosample_enricher.logging_config import get_logger
from biosample_enricher.marine.service import MarineService
from biosample_enricher.models import ElevationRequest
from biosample_enricher.soil.service import SoilService
from biosample_enricher.weather.service import WeatherService

logger = get_logger(__name__)

__all__ = [
    "get_environmental_metadata",
    "CLIMATE_SLOTS",
    "WEATHER_SLOTS",
    "ELEVATION_SLOTS",
    "MARINE_SLOTS",
    "SOIL_SLOTS",
    "ALL_SUPPORTED_SLOTS",
    "CLIMATE_PROVIDERS",
    "ELEVATION_PROVIDERS",
    "CONSENSUS_STRATEGIES",
]

# Available consensus strategies for combining multi-provider values
# Dynamically generated from ConsensusStrategy enum to avoid duplication
CONSENSUS_STRATEGIES = frozenset(s.value for s in ConsensusStrategy)

# Supported submission schema slots
CLIMATE_SLOTS = frozenset(["annual_precpt", "annual_temp"])
WEATHER_SLOTS = frozenset(
    ["temp", "air_temp", "humidity", "wind_speed", "wind_direction", "solar_irradiance"]
)
ELEVATION_SLOTS = frozenset(["elev"])
MARINE_SLOTS = frozenset(["depth"])
SOIL_SLOTS = frozenset(["ph", "soil_type"])

ALL_SUPPORTED_SLOTS = (
    CLIMATE_SLOTS | WEATHER_SLOTS | ELEVATION_SLOTS | MARINE_SLOTS | SOIL_SLOTS
)

# Available providers by slot category
CLIMATE_PROVIDERS = frozenset(["meteostat", "nasa_power"])
ELEVATION_PROVIDERS = frozenset(["usgs", "google", "open_topo_data", "osm"])



[docs]
def get_environmental_metadata(
    lat: float,
    lon: float,
    slots: list[str],
    datetime_obj: datetime | None = None,
    providers: list[str] | None = None,
    strategy: str = "mean",
) -> dict[str, Any]:
    """
    Get environmental metadata for geographic coordinates.

    Retrieves environmental data from multiple provider services and returns
    values in standardized formats. Supports climate, elevation, weather,
    marine, and soil data.

    ALWAYS returns provider metadata for transparency - showing which providers
    contributed to each value and enabling quality checking by comparing sources.

    Args:
        lat: Latitude in decimal degrees (-90 to 90)
        lon: Longitude in decimal degrees (-180 to 180)
        slots: List of submission-schema slot names to retrieve.
               Must be from ALL_SUPPORTED_SLOTS constant.
               Supported slots are organized by category:
               - CLIMATE_SLOTS: annual_precpt, annual_temp
               - WEATHER_SLOTS: temp, air_temp, humidity, wind_speed, wind_direction, solar_irradiance
               - ELEVATION_SLOTS: elev
               - MARINE_SLOTS: depth
               - SOIL_SLOTS: ph, soil_type
               Examples: ["annual_precpt", "annual_temp", "temp", "elev"]
        datetime_obj: Optional datetime for temporal data (collection date/time).
                     Required for WEATHER_SLOTS (temp, air_temp, humidity, wind_speed, wind_direction, solar_irradiance)
                     Not used for CLIMATE_SLOTS, ELEVATION_SLOTS, MARINE_SLOTS, SOIL_SLOTS
        providers: Optional list of specific provider names to use (filters available providers).
                  For climate data (CLIMATE_SLOTS): Must be from CLIMATE_PROVIDERS ("meteostat", "nasa_power")
                  For elevation data (ELEVATION_SLOTS): Must be from ELEVATION_PROVIDERS ("usgs", "google", "open_topo_data", "osm")
                  If None, all available providers are queried.
        strategy: How to combine values from multiple providers (default: "mean"):
                 - "mean": Average across all successful providers
                 - "median": Middle value (robust to outliers)
                 - "first": Use first successful provider in priority order
                 - "best_quality": Use provider with best quality metric (e.g., closest station, highest resolution)

    Returns:
        Dict with two keys:
        - "values": Dict mapping slot names to their values in submission-schema format.
            Values are in the correct units as specified by submission-schema:
            - float: annual_precpt (mm), annual_temp (°C), temp (°C), air_temp (°C), elev (m), ph
            - string: humidity (g/m3), wind_speed (m/s), wind_direction, solar_irradiance, depth (m), soil_type
            Missing or failed slots are omitted (not included, not set to None).

        - "metadata": Dict with provider information for transparency:
            - "climate_normals": Provider details for annual_precpt/annual_temp (if requested)
              - "providers_used": List of provider names that contributed data
              - "consensus_strategy": How values were combined ("mean", "median", "first", "best_quality")
              - "provider_results": Dict of {provider_name: {annual_precpt, annual_temp, period, ...}}
              - "failed_providers": Dict of {provider_name: error_message}
            - "elevation": Provider details for elev (if requested)
              - "providers_used": List of provider names that contributed data
              - "consensus_strategy": How values were combined
              - "provider_results": Dict of {provider_name: {elevation_m, resolution_m, ...}}
              - "failed_providers": Dict of {provider_name: error_message}
            - "weather": Provider details for temp/humidity/etc. (future)
            - "marine": Provider details for depth (future)
            - "soil": Provider details for ph/soil_type (future)

    Raises:
        ValueError: If lat/lon are out of valid ranges
        ValueError: If slots list is empty
        ValueError: If an unsupported slot name is provided

    Example:
        >>> from datetime import datetime
        >>> result = get_environmental_metadata(
        ...     lat=37.7749,
        ...     lon=-122.4194,
        ...     slots=["annual_precpt", "annual_temp"],
        ... )
        >>>
        >>> # Consensus values (averaged across providers)
        >>> print(result["values"])
        {"annual_precpt": 519.3, "annual_temp": 14.1}
        >>>
        >>> # See which providers contributed
        >>> print(result["metadata"]["climate_normals"]["providers_used"])
        ["meteostat", "nasa_power"]
        >>>
        >>> # Compare individual provider results
        >>> for provider, data in result["metadata"]["climate_normals"]["provider_results"].items():
        ...     print(f"{provider}: {data['annual_precpt']:.1f} mm/year ({data['period']})")
        meteostat: 453.1 mm/year (1991-2020)
        nasa_power: 585.5 mm/year (2001-2020)
    """
    # Validate inputs
    if not -90 <= lat <= 90:
        raise ValueError(f"Latitude must be between -90 and 90, got {lat}")
    if not -180 <= lon <= 180:
        raise ValueError(f"Longitude must be between -180 and 180, got {lon}")
    if not slots:
        raise ValueError("slots list cannot be empty")

    # Validate slot names
    invalid_slots = set(slots) - ALL_SUPPORTED_SLOTS
    if invalid_slots:
        raise ValueError(
            f"Unsupported slot(s): {sorted(invalid_slots)}. "
            f"Supported slots: {sorted(ALL_SUPPORTED_SLOTS)}"
        )

    # Validate providers if specified
    if providers is not None:
        requesting_climate = bool(set(slots) & CLIMATE_SLOTS)
        requesting_elevation = bool(set(slots) & ELEVATION_SLOTS)

        # Build set of valid providers based on requested slots
        valid_providers: set[str] = set()
        if requesting_climate:
            valid_providers |= CLIMATE_PROVIDERS
        if requesting_elevation:
            valid_providers |= ELEVATION_PROVIDERS

        if valid_providers:
            invalid_providers = set(providers) - valid_providers
            if invalid_providers:
                raise ValueError(
                    f"Invalid provider(s): {sorted(invalid_providers)}. "
                    f"Valid providers for requested slots: {sorted(valid_providers)}"
                )

    logger.info(
        f"Getting environmental metadata for {len(slots)} slots at ({lat}, {lon})"
    )

    result = {}

    # Group slots by the service they require (use module-level constants)
    weather_slots = CLIMATE_SLOTS | WEATHER_SLOTS
    elevation_slots = ELEVATION_SLOTS
    marine_slots = MARINE_SLOTS
    soil_slots = SOIL_SLOTS
    land_slots = {"cur_vegetation"}
    flooding_slots = {"flooding"}

    requested_weather_slots = [s for s in slots if s in weather_slots]
    requested_elevation_slots = [s for s in slots if s in elevation_slots]
    requested_marine_slots = [s for s in slots if s in marine_slots]
    requested_soil_slots = [s for s in slots if s in soil_slots]
    requested_land_slots = [s for s in slots if s in land_slots]
    requested_flooding_slots = [s for s in slots if s in flooding_slots]

    # Collect metadata if requested
    all_metadata: dict[str, Any] = {}

    # Fetch weather data if needed (includes climate normals)
    if requested_weather_slots:
        try:
            weather_service = WeatherService()
            weather_values, weather_metadata = _get_weather_values(
                weather_service,
                lat,
                lon,
                requested_weather_slots,
                datetime_obj,
                providers,
                strategy,
            )
            result.update(weather_values)
            all_metadata.update(weather_metadata)
        except Exception as e:
            logger.error(f"Failed to get weather values: {e}")

    # Fetch elevation data if needed
    if requested_elevation_slots:
        try:
            elevation_service = ElevationService()
            elevation_values, elevation_metadata = _get_elevation_values(
                elevation_service,
                lat,
                lon,
                requested_elevation_slots,
                providers,
                strategy,
            )
            result.update(elevation_values)
            all_metadata.update(elevation_metadata)
        except Exception as e:
            logger.error(f"Failed to get elevation values: {e}")

    # Fetch marine data if needed
    if requested_marine_slots:
        try:
            marine_service = MarineService()
            marine_values = _get_marine_values(
                marine_service, lat, lon, requested_marine_slots, providers
            )
            result.update(marine_values)
        except Exception as e:
            logger.error(f"Failed to get marine values: {e}")

    # Fetch soil data if needed
    if requested_soil_slots:
        try:
            soil_service = SoilService()
            soil_values = _get_soil_values(
                soil_service, lat, lon, requested_soil_slots, providers
            )
            result.update(soil_values)
        except Exception as e:
            logger.error(f"Failed to get soil values: {e}")

    # Placeholder for land cover (not yet implemented)
    if requested_land_slots:
        logger.warning(
            f"Land cover slots {requested_land_slots} not yet implemented (Issue TBD)"
        )

    # Placeholder for flooding (not yet implemented)
    if requested_flooding_slots:
        logger.warning(
            f"Flooding slots {requested_flooding_slots} not yet implemented (Issue #192)"
        )

    logger.info(f"Successfully retrieved {len(result)}/{len(slots)} slot values")

    # Always return values with metadata for transparency
    return {"values": result, "metadata": all_metadata}



def _get_weather_values(
    service: WeatherService,
    lat: float,
    lon: float,
    slots: list[str],
    datetime_obj: datetime | None,
    providers: list[str] | None,
    strategy: str = "mean",
) -> tuple[dict[str, Any], dict[str, Any]]:
    """
    Extract weather-related slot values.

    Args:
        service: WeatherService instance
        lat: Latitude in decimal degrees
        lon: Longitude in decimal degrees
        slots: List of requested slot names
        datetime_obj: Optional datetime for weather data
        providers: Optional list of preferred provider names
        strategy: Consensus strategy - "mean", "median", "first", "best_quality"
                 Default is "mean" (consistent with get_environmental_metadata)

    Returns:
        Tuple of (values_dict, metadata_dict) where metadata_dict contains
        provider information for transparency.
    """
    values: dict[str, Any] = {}
    metadata: dict[str, Any] = {}

    # Determine which data to fetch
    needs_climate_normals = any(s in ["annual_precpt", "annual_temp"] for s in slots)
    needs_daily_weather = any(
        s
        in [
            "temp",
            "air_temp",
            "humidity",
            "wind_speed",
            "wind_direction",
            "solar_irradiance",
        ]
        for s in slots
    )

    # Get climate normals for annual values
    if needs_climate_normals:
        try:
            # Get results from all providers (MultiProviderClimateNormals)
            normals = service.get_climate_normals(lat, lon, providers=providers)

            # Use the specified strategy to combine provider values
            schema_values = normals.to_submission_schema(strategy=strategy)

            if "annual_precpt" in slots:
                annual_precip = schema_values.get("annual_precpt")
                if annual_precip is not None:
                    values["annual_precpt"] = annual_precip  # mm

            if "annual_temp" in slots:
                annual_temp = schema_values.get("annual_temp")
                if annual_temp is not None:
                    values["annual_temp"] = annual_temp  # °C

            # Build metadata about providers
            provider_results = {}
            for provider_name in normals.successful_providers:
                result = normals.get_provider_result(provider_name)
                if result:
                    provider_results[provider_name] = {
                        "annual_precpt": result.get_annual_precipitation(),
                        "annual_temp": result.get_annual_temperature(),
                        "returned_start_year": result.normals_period[0],
                        "returned_end_year": result.normals_period[1],
                        "station_distance_km": result.station_distance_km
                        if result.station_distance_km > 0
                        else None,
                    }

            metadata["climate_normals"] = {
                "providers_used": normals.successful_providers,
                "consensus_strategy": strategy,
                "requested_start_year": normals.requested_start_year,
                "requested_end_year": normals.requested_end_year,
                "provider_results": provider_results,
                "failed_providers": normals.failed_providers,
            }

        except Exception as e:
            logger.warning(f"Failed to get climate normals: {e}")

    # Get daily weather if date provided
    if needs_daily_weather:
        if not datetime_obj:
            logger.warning(
                f"datetime_obj required for slots {[s for s in slots if s in ['temp', 'air_temp', 'humidity', 'wind_speed', 'wind_direction', 'solar_irradiance']]} but not provided"
            )
        else:
            try:
                weather_result = service.get_daily_weather(
                    lat, lon, datetime_obj.date(), parameters=None
                )

                if (
                    "temp" in slots or "air_temp" in slots
                ) and weather_result.temperature:
                    temp_value = weather_result.temperature.value
                    if isinstance(temp_value, dict):
                        temp = temp_value.get("avg")
                    else:
                        temp = temp_value
                    if temp is not None:
                        if "temp" in slots:
                            values["temp"] = temp  # °C
                        if "air_temp" in slots:
                            values["air_temp"] = temp  # °C

                if "humidity" in slots and weather_result.humidity:
                    humidity_value = weather_result.humidity.value
                    if isinstance(humidity_value, dict):
                        humidity = humidity_value.get("avg")
                    else:
                        humidity = humidity_value
                    if humidity is not None:
                        # Convert % to g/m3 if needed (depends on provider)
                        values["humidity"] = f"{humidity} g/m3"  # string format

                if "wind_speed" in slots and weather_result.wind_speed:
                    wind_speed_value = weather_result.wind_speed.value
                    if isinstance(wind_speed_value, dict):
                        wind_speed = wind_speed_value.get("avg")
                    else:
                        wind_speed = wind_speed_value
                    if wind_speed is not None:
                        # Convert to m/s if in km/h
                        unit = weather_result.wind_speed.unit
                        if unit == "km/h":
                            wind_speed = wind_speed / 3.6
                        values["wind_speed"] = f"{wind_speed:.1f} m/s"  # string format

                if "wind_direction" in slots and weather_result.wind_direction:
                    wind_dir = weather_result.wind_direction.value
                    if wind_dir is not None:
                        values["wind_direction"] = (
                            f"{wind_dir} degrees"  # string format
                        )

                if "solar_irradiance" in slots and weather_result.solar_radiation:
                    solar_value = weather_result.solar_radiation.value
                    if isinstance(solar_value, dict):
                        solar = solar_value.get("daily_avg")
                    else:
                        solar = solar_value
                    if solar is not None:
                        values["solar_irradiance"] = f"{solar} W/m²"  # string format

            except Exception as e:
                logger.warning(f"Failed to get daily weather: {e}")

    return values, metadata


def _get_elevation_values(
    service: ElevationService,
    lat: float,
    lon: float,
    slots: list[str],
    providers: list[str] | None,
    strategy: str = "mean",
) -> tuple[dict[str, Any], dict[str, Any]]:
    """
    Extract elevation-related slot values with metadata.

    Uses the shared consensus module to combine values from multiple providers.

    Note: Only retrieves ground surface elevation (elev). Does NOT support
    altitude (alt) for airborne samples, as that cannot be determined from
    lat/lon alone and requires actual measurement from the sampling platform.

    Args:
        service: ElevationService instance
        lat: Latitude in decimal degrees
        lon: Longitude in decimal degrees
        slots: List of requested slot names
        providers: Optional list of preferred provider names
        strategy: Consensus strategy - "mean", "median", "first", "best_quality"
                 Default is "mean" (consistent with get_environmental_metadata)

    Returns:
        Tuple of (values_dict, metadata_dict)
    """
    values: dict[str, Any] = {}
    metadata: dict[str, Any] = {}

    try:
        # Create ElevationRequest with preferred providers if specified
        elevation_providers = None
        if providers is not None:
            # Filter to only elevation-relevant providers
            elevation_providers = [p for p in providers if p in ELEVATION_PROVIDERS]

        request = ElevationRequest(
            latitude=lat,
            longitude=lon,
            preferred_providers=elevation_providers,
        )
        observations = service.get_elevation(request)

        # Collect data from all observations
        provider_results: dict[str, Any] = {}
        provider_elevations: dict[str, float | None] = {}
        quality_scores: dict[str, float] = {}  # For best_quality strategy
        failed_providers: dict[str, str] = {}

        for obs in observations:
            provider_name = obs.provider.name
            if obs.value_numeric is not None:
                provider_results[provider_name] = {
                    "elevation_m": obs.value_numeric,
                    "resolution_m": obs.spatial_resolution_m,
                    "distance_to_input_m": obs.distance_to_input_m,
                    "vertical_datum": obs.vertical_datum,
                }
                provider_elevations[provider_name] = obs.value_numeric
                # Use resolution as quality metric (lower is better)
                if obs.spatial_resolution_m is not None:
                    quality_scores[provider_name] = obs.spatial_resolution_m
            elif obs.error_message:
                failed_providers[provider_name] = obs.error_message

        # Use shared consensus module to compute final value
        consensus_result = compute_consensus(
            provider_elevations,
            strategy=strategy,
            quality_scores=quality_scores if strategy == "best_quality" else None,
            lower_is_better=True,  # Lower resolution = better
        )

        # Set value if we have one
        if "elev" in slots and consensus_result["value"] is not None:
            values["elev"] = consensus_result["value"]

        # Build metadata
        metadata["elevation"] = {
            "providers_used": consensus_result["providers_used"],
            "consensus_strategy": consensus_result["strategy"],
            "provider_results": provider_results,
            "failed_providers": failed_providers,
        }

    except Exception as e:
        logger.warning(f"Failed to get elevation: {e}")
        metadata["elevation"] = {
            "providers_used": [],
            "consensus_strategy": strategy,
            "provider_results": {},
            "failed_providers": {"error": str(e)},
        }

    return values, metadata


def _get_marine_values(
    service: MarineService,
    lat: float,
    lon: float,
    slots: list[str],
    _providers: list[str] | None,
) -> dict[str, Any]:
    """
    Extract marine-related slot values.

    Note: Currently only retrieves bathymetry (depth). Uses a fixed date for
    the query since bathymetry data doesn't vary temporally.
    """
    values: dict[str, Any] = {}

    try:
        from datetime import date

        # Use current date for bathymetry query (bathymetry doesn't vary temporally)
        marine_result = service.get_comprehensive_marine_data(lat, lon, date.today())

        if "depth" in slots and marine_result.bathymetry is not None:
            # bathymetry is a MarineObservation with a value
            depth_value = marine_result.bathymetry.value
            if depth_value is not None:
                # Handle both float and dict values
                if isinstance(depth_value, dict):
                    # Use first available value from dict
                    depth = next(
                        (v for v in depth_value.values() if v is not None), None
                    )
                else:
                    depth = depth_value

                if depth is not None:
                    # submission-schema expects string with unit
                    # Use absolute value since bathymetry is typically negative
                    values["depth"] = f"{abs(depth)} m"

    except Exception as e:
        logger.warning(f"Failed to get marine data: {e}")

    return values


def _get_soil_values(
    service: SoilService,
    lat: float,
    lon: float,
    slots: list[str],
    _providers: list[str] | None,
) -> dict[str, Any]:
    """
    Extract soil-related slot values.

    Note: Uses surface soil depth (0-5cm) by default for pH and classification.
    """
    values: dict[str, Any] = {}

    try:
        # Get surface soil data (0-5cm depth)
        soil_result = service.enrich_location(lat, lon, depth_cm="0-5cm")

        # SoilResult contains a list of observations - use first one with data
        if soil_result.observations:
            surface_obs = soil_result.observations[0]

            # Extract pH if available
            if "ph" in slots and surface_obs.ph_h2o is not None:
                values["ph"] = surface_obs.ph_h2o

            # Extract soil classification if available
            if "soil_type" in slots:
                # Prefer USDA classification, fallback to WRB
                if surface_obs.classification_usda:
                    values["soil_type"] = surface_obs.classification_usda
                elif surface_obs.classification_wrb:
                    values["soil_type"] = surface_obs.classification_wrb

    except Exception as e:
        logger.warning(f"Failed to get soil data: {e}")

    return values