Models

Data models used throughout the biosample-enricher package.

Core Models

These models are shared across multiple services and represent fundamental concepts.

biosample_enricher.models

Pydantic models for biosample data normalization and validation.

Provides explicit schema definitions for standardized biosample location data extracted from the NMDC and GOLD databases, as well as output models for the elevation service.

class biosample_enricher.models.BiosampleLocation(**data)[source]

Bases: BaseModel

Standardized biosample location data for API enrichment.

Parameters:: data (Any)

latitude: float | None

longitude: float | None

collection_date: str | None

textual_location: str | None

sample_id: str | None

database_source: str | None

extraction_timestamp: str | None

nmdc_biosample_id: str | None

gold_biosample_id: str | None

alternative_identifiers: list[str] | None

external_database_identifiers: list[str] | None

biosample_identifiers: list[str] | None

sample_identifiers: list[str] | None

nmdc_studies: list[str] | None

gold_studies: list[str] | None

coordinate_precision: int | None

date_precision: str | None

location_completeness: float | None

env_broad_scale: str | None

env_local_scale: str | None

env_medium: str | None

sample_type: str | None

is_host_associated: bool | None

calculate_completeness()[source]

Calculate location completeness score based on available fields.

Return type:: BiosampleLocation

classmethod validate_collection_date(v)[source]

Validate collection date format.

Parameters:: v (str | None)
Return type:: str | None

is_enrichable()[source]

Check if sample has minimum data for API enrichment.

Return type:: bool

to_dict()[source]

Convert to dictionary for serialization with enrichable status.

Return type:: dict
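
This page does not spell out how the completeness score or the enrichable check is computed. As a rough sketch only, assuming the score is the share of four location-related fields that are populated and that coordinates are the minimum requirement for enrichment, the methods above might fit together as follows. LocationSketch and both rules are hypothetical stand-ins, not the package's implementation.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class LocationSketch:
    """Hypothetical stand-in for a few BiosampleLocation fields."""

    latitude: Optional[float] = None
    longitude: Optional[float] = None
    collection_date: Optional[str] = None
    textual_location: Optional[str] = None
    location_completeness: Optional[float] = None

    def calculate_completeness(self) -> "LocationSketch":
        # Assumption: score = share of these four fields that are not None.
        fields = (
            self.latitude,
            self.longitude,
            self.collection_date,
            self.textual_location,
        )
        self.location_completeness = sum(f is not None for f in fields) / len(fields)
        return self

    def is_enrichable(self) -> bool:
        # Assumption: coordinates alone are enough to call enrichment APIs.
        return self.latitude is not None and self.longitude is not None

    def to_dict(self) -> dict:
        # Serialize the fields and append the enrichable status.
        data = asdict(self)
        data["is_enrichable"] = self.is_enrichable()
        return data


sample = LocationSketch(latitude=37.7749, longitude=-122.4194).calculate_completeness()
```

Returning self from calculate_completeness mirrors the documented return type, which allows the call to be chained as in the last line.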

class Config[source]

Bases: object

Pydantic configuration.

extra = 'forbid'

validate_assignment = True

str_strip_whitespace = True

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'str_strip_whitespace': True, 'validate_assignment': True}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class biosample_enricher.models.Variable(value)[source]

Bases: str, Enum

Enumeration of variables that can be observed/measured.

ELEVATION = 'elevation'

class biosample_enricher.models.ValueStatus(value)[source]

Bases: str, Enum

Status of the observation value.

OK = 'ok'

ERROR = 'error'

PARTIAL = 'partial'

UNKNOWN = 'unknown'
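
Because Variable and ValueStatus derive from both str and Enum, their members compare equal to plain strings and serialize as strings. The sketch below re-declares ValueStatus locally (rather than importing the package) to show that behavior with the standard library only.

```python
import json
from enum import Enum


class ValueStatus(str, Enum):
    """Local copy of the documented enum, for illustration only."""

    OK = "ok"
    ERROR = "error"
    PARTIAL = "partial"
    UNKNOWN = "unknown"


# Members can be looked up from their string value.
status = ValueStatus("ok")

# Members compare equal to the underlying string.
is_ok = status == "ok"

# The str mixin lets the json module encode members as plain strings.
encoded = json.dumps({"value_status": ValueStatus.PARTIAL})
```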

class biosample_enricher.models.GeoPoint(**data)[source]

Bases: BaseModel

Geographic point with precision information.

Parameters:: data (Any)

lat: float

lon: float

precision_digits: int | None

uncertainty_m: float | None

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class biosample_enricher.models.ProviderRef(**data)[source]

Bases: BaseModel

Reference to a data provider.

Parameters:: data (Any)

name: str

endpoint: str | None

api_version: str | None

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class biosample_enricher.models.Observation(**data)[source]

Bases: BaseModel

A single observation/measurement result.

Parameters:: data (Any)

variable: Variable

value_numeric: float | None

unit_ucum: str

value_status: ValueStatus

provider: ProviderRef

request_location: GeoPoint

measurement_location: GeoPoint | None

distance_to_input_m: float | None

spatial_resolution_m: float | None

vertical_datum: str | None

raw_payload: str | None

raw_payload_sha256: str | None

normalization_version: str

cache_used: bool | None

request_id: str | None

error_message: str | None

created_at: datetime | None

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
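
Observation carries both raw_payload and raw_payload_sha256. The page does not say how the digest is produced; one plausible approach, assuming the payload is hashed as UTF-8 text, is sketched here. The helper name payload_digest is hypothetical.

```python
import hashlib
from typing import Optional


def payload_digest(raw_payload: Optional[str]) -> Optional[str]:
    """Return the SHA-256 hex digest of a raw payload, or None if absent."""
    if raw_payload is None:
        return None
    # Assumption: the payload is hashed as UTF-8 encoded text.
    return hashlib.sha256(raw_payload.encode("utf-8")).hexdigest()


digest = payload_digest('{"elevation": 12.3}')
```

Storing the digest next to the payload makes it possible to detect a changed or truncated provider response without comparing the full text.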

class biosample_enricher.models.EnrichmentRun(**data)[source]

Bases: BaseModel

Metadata about a specific enrichment run.

Parameters:: data (Any)

started_at: datetime

ended_at: datetime | None

tool_version: str | None

git_sha: str | None

read_from_cache: bool | None

write_to_cache: bool | None

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class biosample_enricher.models.OutputEnvelope(**data)[source]

Bases: BaseModel

Top-level container for enrichment results.

Parameters:: data (Any)

schema_version: str

run: EnrichmentRun

subject_id: str

observations: list[Observation]

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class biosample_enricher.models.CoordinateClassification(**data)[source]

Bases: BaseModel

Classification of geographic coordinates.

Parameters:: data (Any)

is_us_territory: bool

is_land: bool | None

country_code: str | None

region: str | None

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class biosample_enricher.models.ElevationRequest(**data)[source]

Bases: BaseModel

Request for elevation data at specific coordinates.

Parameters:: data (Any)

latitude: float

longitude: float

preferred_providers: list[str] | None

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class biosample_enricher.models.ElevationResult(**data)[source]

Bases: BaseModel

Single elevation result for compatibility/convenience.

Parameters:: data (Any)

latitude: float

longitude: float

elevation_meters: float

provider: str

accuracy_meters: float | None

data_source: str

timestamp: datetime

classification: CoordinateClassification

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class biosample_enricher.models.FetchResult(**data)[source]

Bases: BaseModel

Internal result from provider fetch operation.

Parameters:: data (Any)

ok: bool

elevation: float | None

location: GeoPoint | None

resolution_m: float | None

vertical_datum: str | None

raw: dict[str, Any]

error: str | None

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

Service-Specific Models

Each service has its own specialized models for requests and responses.

Weather Models

Weather enrichment data models with standardized schema for biosample metadata.

class biosample_enricher.weather.models.TemporalQuality(value)[source]

Bases: str, Enum

Temporal precision quality levels for weather data.

DAY_SPECIFIC_COMPLETE = 'day_specific_complete'

DAY_SPECIFIC_PARTIAL = 'day_specific_partial'

WEEKLY_COMPOSITE = 'weekly_composite'

MONTHLY_CLIMATOLOGY = 'monthly_climatology'

NO_DATA = 'no_data'

class biosample_enricher.weather.models.WeatherProvider(value)[source]

Bases: str, Enum

Supported weather data providers.

OPEN_METEO = 'open_meteo'

METEOSTAT = 'meteostat'

NOAA = 'noaa'

ECMWF = 'ecmwf'

class biosample_enricher.weather.models.TemporalPrecision(method, target_date, data_quality, coverage_info=None, caveat=None, provider=None)[source]

Bases: object

Temporal precision metadata for weather observations.

Parameters:

method (str)
target_date (str)
data_quality (TemporalQuality)
coverage_info (str | None)
caveat (str | None)
provider (str | None)

method: str

target_date: str

data_quality: TemporalQuality

coverage_info: str | None = None

caveat: str | None = None

provider: str | None = None

class biosample_enricher.weather.models.WeatherObservation(**data)[source]

Bases: BaseModel

Single weather parameter observation with units and temporal context.

Parameters:: data (Any)

value: float | dict[str, float]

unit: str

temporal_precision: TemporalPrecision

quality_score: int | None

class Config[source]

Bases: object

arbitrary_types_allowed = True

classmethod validate_value(v)[source]: Validate that value is either a number or dict with numeric values.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
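
The validate_value rule for WeatherObservation (a number, or a dict with numeric values) can be approximated with a plain function. This is a sketch, not the package's validator: rejecting booleans and requiring string keys are assumptions made here.

```python
from numbers import Real
from typing import Any


def _is_number(x: Any) -> bool:
    # Assumption: bool is excluded even though it is a subclass of int.
    return isinstance(x, Real) and not isinstance(x, bool)


def validate_value(v: Any) -> Any:
    """Accept a single number or a dict mapping names to numbers."""
    if _is_number(v):
        return v
    if isinstance(v, dict) and all(
        isinstance(k, str) and _is_number(x) for k, x in v.items()
    ):
        return v
    raise ValueError("value must be a number or a dict with numeric values")
```

A dict value would suit parameters reported as several numbers, for example {"min": 1.0, "max": 2.0}; those key names are illustrative only.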

class biosample_enricher.weather.models.WeatherResult(**data)[source]

Bases: BaseModel

Standardized weather enrichment result aligned with NMDC/GOLD schemas.

Maps weather API responses to biosample schema fields with temporal precision.

Parameters:: data (Any)

temperature: WeatherObservation | None

wind_speed: WeatherObservation | None

wind_direction: WeatherObservation | None

humidity: WeatherObservation | None

solar_radiation: WeatherObservation | None

precipitation: WeatherObservation | None

pressure: WeatherObservation | None

location: dict[str, float]

collection_date: str

providers_attempted: list[str]

successful_providers: list[str]

failed_providers: list[str]

overall_quality: TemporalQuality | None

classmethod validate_date_format(v)[source]: Ensure collection date is in YYYY-MM-DD format.
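
A strict YYYY-MM-DD check needs both a shape test and a calendar test. The following sketch shows one way to do this with the standard library; it is an assumption about how such a validator could work, not a copy of the package's code.

```python
import re
from datetime import date

_DATE_SHAPE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def validate_date_format(v: str) -> str:
    """Return v unchanged if it is a real calendar date in YYYY-MM-DD form."""
    if not _DATE_SHAPE.fullmatch(v):
        raise ValueError(f"collection_date must be YYYY-MM-DD, got {v!r}")
    # Raises ValueError for impossible dates such as 2021-02-30.
    date.fromisoformat(v)
    return v
```

The regular expression is checked first because parsing alone can be more lenient than the documented format.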

get_schema_mapping(target_schema='nmdc')[source]

Map weather observations to target biosample schema fields.

Parameters:: target_schema (str) – “nmdc” or “gold”
Returns:: dict[str, Any] – Dict mapping to schema field names and values

get_coverage_metrics()[source]

Generate before/after coverage metrics for this weather enrichment.

Returns:: dict[str, Any] – Dict with coverage statistics for metrics reporting

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class biosample_enricher.weather.models.MultiProviderClimateNormals(**data)[source]

Bases: BaseModel

Climate normals from multiple providers for comparison and validation.

Returns results from all requested providers, allowing users to:

- Compare values across different data sources
- Detect provider outages/failures
- Validate data quality by cross-checking
- Choose which provider to trust for their use case

This is the default return type when multiple providers are queried.

Parameters:: data (Any)

providers: dict[str, ClimateNormalsResult]

location: dict[str, float]

requested_providers: list[str]

successful_providers: list[str]

failed_providers: dict[str, str]

requested_start_year: int

requested_end_year: int

get_provider_result(provider_name)[source]

Get result from a specific provider.

Parameters:: provider_name (str)
Return type:: ClimateNormalsResult | None

get_consensus_precipitation()[source]

Calculate consensus annual precipitation across all successful providers.

Returns the mean precipitation if multiple providers are available; otherwise returns the single provider’s value.

Return type:: float | None

get_consensus_temperature()[source]

Calculate consensus annual temperature across all successful providers.

Returns the mean temperature if multiple providers are available; otherwise returns the single provider’s value.

Return type:: float | None
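
The consensus rule described for both methods (mean across successful providers, the single value when only one succeeded, None otherwise) can be sketched as a small helper. The function name and the provider name "provider_b" are hypothetical.

```python
from statistics import mean
from typing import Dict, Optional


def consensus(values_by_provider: Dict[str, Optional[float]]) -> Optional[float]:
    """Mean of the available provider values, or None if there are none."""
    usable = [v for v in values_by_provider.values() if v is not None]
    if not usable:
        return None
    # With a single usable value, the mean is that value itself.
    return mean(usable)


two_providers = consensus({"meteostat": 500.0, "provider_b": 600.0})
one_provider = consensus({"meteostat": 500.0, "provider_b": None})
```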

get_value_ranges()[source]

Get min/max range of values across providers.

Useful for detecting large discrepancies and data quality issues.

Returns:

Dict with keys:

annual_precpt_range: (min, max) in mm/year or None
annual_temp_range: (min, max) in °C or None
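
A minimal sketch of the range computation, assuming providers without a value are skipped and an empty set yields None. The helper value_range is hypothetical.

```python
from typing import Optional, Sequence, Tuple


def value_range(values: Sequence[Optional[float]]) -> Optional[Tuple[float, float]]:
    """Return (min, max) over the values that are present, else None."""
    usable = [v for v in values if v is not None]
    return (min(usable), max(usable)) if usable else None


ranges = {
    "annual_precpt_range": value_range([500.0, 600.0, None]),
    "annual_temp_range": value_range([None, None]),
}
```

A wide gap between the two ends of a range is the kind of discrepancy the method is meant to surface.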

to_submission_schema(provider=None, strategy='mean')[source]

Extract values in submission-schema compatible format.

Parameters:

provider (str | None) – Specific provider to use (e.g., “meteostat”). If None, uses strategy.
strategy (str) – How to combine multiple providers:
- “mean”: Average across all successful providers (default)
- “median”: Middle value when sorted (robust to outliers)
- “first”: Use first successful provider
- “best_quality”: Use provider with lowest station_distance_km

Returns:

dict[str, Any] – Dict with submission-schema values plus metadata about provider selection
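
The four strategies can be illustrated on a single value per provider. This is a sketch under stated assumptions: ProviderValue, combine, and the provider names other than "meteostat" are hypothetical, and the real method returns a dict with metadata rather than a bare number.

```python
from statistics import mean, median
from typing import List, NamedTuple


class ProviderValue(NamedTuple):
    provider: str
    value: float
    station_distance_km: float


def combine(results: List[ProviderValue], strategy: str = "mean") -> float:
    """Reduce per-provider values to one number using the named strategy."""
    if not results:
        raise ValueError("no successful providers")
    values = [r.value for r in results]
    if strategy == "mean":
        return mean(values)
    if strategy == "median":
        return median(values)
    if strategy == "first":
        # Assumption: results are ordered by when each provider succeeded.
        return results[0].value
    if strategy == "best_quality":
        # Lowest station distance is treated as best quality.
        return min(results, key=lambda r: r.station_distance_km).value
    raise ValueError(f"unknown strategy: {strategy!r}")


results = [
    ProviderValue("meteostat", 500.0, 12.0),
    ProviderValue("provider_b", 600.0, 3.0),
    ProviderValue("provider_c", 520.0, 8.0),
]
```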

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class biosample_enricher.weather.models.ClimateNormalsResult(**data)[source]

Bases: BaseModel

30-year climate averages (normals) from a single provider.

Climate normals provide baseline environmental conditions over a standard 30-year period (typically 1991-2020), representing typical climate rather than day-to-day weather variability.

Use this for:

- Annual precipitation totals (sum of 12 monthly means)
- Annual temperature averages
- Long-term climate characterization
- Biosample metadata fields like annual_precpt, annual_temp

For day-specific weather conditions, use WeatherResult instead. For multi-provider comparisons, see MultiProviderClimateNormals.

Parameters:: data (Any)

monthly_precipitation: list[float | None]

monthly_temperature: list[float | None]

station_id: str

station_distance_km: float

location: dict[str, float]

normals_period: tuple[int, int]

provider: str

data_quality: str | None

get_annual_precipitation()[source]

Calculate annual precipitation by summing 12 monthly normals.

Returns total in millimeters, suitable for submission-schema annual_precpt slot.

Returns:: float | None – Annual precipitation in millimeters (mm/year), or None if the data is incomplete (at least 10 months of valid data are required).

Example

>>> result.get_annual_precipitation()
547.2  # mm/year

get_annual_temperature()[source]

Calculate annual average temperature from 12 monthly normals.

Returns average in degrees Celsius, suitable for submission-schema annual_temp slot.

Returns:: float | None – Annual average temperature in °C, or None if data incomplete.

Example

>>> result.get_annual_temperature()
12.5  # °C
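
The two annual aggregates can be sketched from the documented rules: sum the monthly precipitation normals, average the monthly temperature normals, and return None when fewer than 10 months are valid. How the package treats a year with 10 or 11 valid months is not stated here; this sketch simply aggregates the months that are present, which is an assumption.

```python
from typing import List, Optional

MIN_VALID_MONTHS = 10  # threshold stated in the docstrings above


def _valid(monthly: List[Optional[float]]) -> List[float]:
    return [m for m in monthly if m is not None]


def annual_precipitation(monthly: List[Optional[float]]) -> Optional[float]:
    """Total of the available monthly normals in mm/year, or None."""
    valid = _valid(monthly)
    if len(valid) < MIN_VALID_MONTHS:
        return None
    # Assumption: missing months are left out rather than estimated.
    return sum(valid)


def annual_temperature(monthly: List[Optional[float]]) -> Optional[float]:
    """Average of the available monthly normals in °C, or None."""
    valid = _valid(monthly)
    if len(valid) < MIN_VALID_MONTHS:
        return None
    return sum(valid) / len(valid)
```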

to_submission_schema()[source]

Extract values in submission-schema compatible format.

Provides simple scalar values suitable for NMDC submission-schema slots, following the general-purpose design pattern (Issue #193).

Returns:

Dict with keys:

annual_precpt: float | None - Annual precipitation in millimeters (mm/year). Sum of 12 monthly normals. None if <10 months available.
annual_temp: float | None - Annual average temperature in degrees Celsius (°C). Average of 12 monthly normals. None if <10 months.
climate_normals_period: str - Period as “YYYY-YYYY” (e.g. “1991-2020”)
station_distance_km: float - Distance to weather station in kilometers
data_source: str - Provider name (e.g. “meteostat”)

Example

>>> normals = service.get_climate_normals(37.7749, -122.4194)
>>> values = normals.to_submission_schema()
>>> print(f"Annual rainfall: {values['annual_precpt']} mm")
Annual rainfall: 547.2 mm

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

Marine Models

Marine enrichment data models with standardized schema for oceanographic metadata.

class biosample_enricher.marine.models.MarineQuality(value)[source]

Bases: str, Enum

Data quality levels for marine observations.

SATELLITE_L3 = 'satellite_l3'

SATELLITE_L4 = 'satellite_l4'

MODEL_REANALYSIS = 'model_reanalysis'

CLIMATOLOGY = 'climatology'

STATIC_DATASET = 'static_dataset'

NO_DATA = 'no_data'

class biosample_enricher.marine.models.MarineProvider(value)[source]

Bases: str, Enum

Supported marine data providers.

NOAA_OISST = 'noaa_oisst'

GEBCO = 'gebco'

ESA_CCI = 'esa_cci'

CMEMS = 'cmems'

OSCAR = 'oscar'

class biosample_enricher.marine.models.MarinePrecision(method, target_date, data_quality, spatial_resolution=None, temporal_resolution=None, provider=None)[source]

Bases: object

Precision metadata for marine observations.

Parameters:

method (str)
target_date (str)
data_quality (MarineQuality)
spatial_resolution (str | None)
temporal_resolution (str | None)
provider (str | None)

method: str

target_date: str

data_quality: MarineQuality

spatial_resolution: str | None = None

temporal_resolution: str | None = None

provider: str | None = None

class biosample_enricher.marine.models.MarineObservation(**data)[source]

Bases: BaseModel

Single marine parameter observation with units and precision context.

Parameters:: data (Any)

value: float | dict[str, float]

unit: str

precision: MarinePrecision

quality_score: int | None

uncertainty: float | None

classmethod validate_value(v)[source]: Validate observation value.

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class biosample_enricher.marine.models.MarineResult(**data)[source]

Bases: BaseModel

Complete marine data result for a location and date.

Parameters:: data (Any)

location: dict[str, float]

collection_date: str

sea_surface_temperature: MarineObservation | None

bathymetry: MarineObservation | None

chlorophyll_a: MarineObservation | None

salinity: MarineObservation | None

dissolved_oxygen: MarineObservation | None

ph: MarineObservation | None

ocean_current_u: MarineObservation | None

ocean_current_v: MarineObservation | None

significant_wave_height: MarineObservation | None

successful_providers: list[str]

failed_providers: list[str]

overall_quality: MarineQuality

classmethod validate_date_format(v)[source]: Validate date format.

get_schema_mapping(target_schema)[source]

Map marine data to target biosample schema.

Parameters:: target_schema (str) – “nmdc” or “gold”
Returns:: dict[str, Any] – Dictionary mapping to schema fields

get_coverage_metrics()[source]

Generate coverage metrics for this marine result.

Return type:: dict[str, Any]

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

Soil Models

Pydantic models for soil enrichment data.

class biosample_enricher.soil.models.SoilObservation(**data)[source]

Bases: BaseModel

Individual soil measurement or prediction at a location.

Parameters:: data (Any)

classification_usda: str | None

classification_wrb: str | None

confidence_usda: float | None

confidence_wrb: float | None

ph_h2o: float | None

organic_carbon: float | None

bulk_density: float | None

sand_percent: float | None

silt_percent: float | None

clay_percent: float | None

texture_class: str | None

total_nitrogen: float | None

available_phosphorus: float | None

cation_exchange_capacity: float | None

depth_cm: str | None

measurement_method: str | None

classmethod validate_texture_class(v)[source]

Validate USDA texture class names.

Parameters:: v (str | None)
Return type:: str | None

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class biosample_enricher.soil.models.SoilResult(**data)[source]

Bases: BaseModel

Results from soil enrichment for a specific location.

Parameters:: data (Any)

latitude: float

longitude: float

distance_m: float | None

observations: list[SoilObservation]

quality_score: float

provider: str

retrieved_at: datetime

errors: list[str]

warnings: list[str]

to_nmdc_schema()[source]

Convert to NMDC biosample schema format.

Return type:: dict[str, Any]

to_gold_schema()[source]

Convert to GOLD biosample schema format.

Return type:: dict[str, Any]

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

biosample_enricher.soil.models.classify_texture(sand_pct, silt_pct, clay_pct)[source]

Classify soil texture using USDA texture triangle.

Parameters:

sand_pct (float) – Sand percentage (0-100)
silt_pct (float) – Silt percentage (0-100)
clay_pct (float) – Clay percentage (0-100)

Returns:

str – USDA texture class name

Raises:

ValueError – If percentages don’t sum to ~100% or are invalid
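
The input check implied by the Raises clause can be sketched separately from the texture triangle itself. The tolerance of one percentage point is an assumption, since the page only says the percentages must sum to roughly 100; check_texture_inputs is a hypothetical helper.

```python
def check_texture_inputs(
    sand_pct: float, silt_pct: float, clay_pct: float, tolerance: float = 1.0
) -> None:
    """Raise ValueError unless the three fractions are valid percentages."""
    parts = {"sand": sand_pct, "silt": silt_pct, "clay": clay_pct}
    for name, pct in parts.items():
        if not 0.0 <= pct <= 100.0:
            raise ValueError(f"{name} percentage out of range: {pct}")
    total = sum(parts.values())
    # Assumption: sums within `tolerance` of 100 are accepted as rounding noise.
    if abs(total - 100.0) > tolerance:
        raise ValueError(f"percentages sum to {total}, expected about 100")
```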

Land Cover Models

Data models for land cover and vegetation enrichment.

class biosample_enricher.land.models.LandCoverObservation(**data)[source]

Bases: BaseModel

Land cover classification from a specific provider.

Parameters:: data (Any)

provider: str

actual_location: dict[str, float]

distance_m: float

actual_date: date | None

temporal_offset_days: int | None

class_code: str | None

class_label: str | None

classification_system: str | None

confidence: float | None

resolution_m: float | None

dataset_version: str | None

quality_flags: list[str]

classmethod validate_location(v)[source]

Validate location coordinates.

Parameters:: v (dict[str, float])
Return type:: dict[str, float]
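
The location validators take and return a dict[str, float]. The page does not list the expected keys; assuming they are "lat" and "lon" in decimal degrees, a validator of this kind might look like the following sketch.

```python
from typing import Dict


def validate_location(v: Dict[str, float]) -> Dict[str, float]:
    """Check that the dict holds an in-range latitude and longitude."""
    # Assumption: the keys are "lat" and "lon"; the package may use other names.
    for key, limit in (("lat", 90.0), ("lon", 180.0)):
        if key not in v:
            raise ValueError(f"location is missing {key!r}")
        if not -limit <= v[key] <= limit:
            raise ValueError(f"{key} out of range: {v[key]}")
    return v
```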

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class biosample_enricher.land.models.VegetationObservation(**data)[source]

Bases: BaseModel

Vegetation indices from a specific provider.

Parameters:: data (Any)

provider: str

actual_location: dict[str, float]

distance_m: float

actual_date: date | None

temporal_offset_days: int | None

ndvi: float | None

evi: float | None

lai: float | None

fpar: float | None

confidence: float | None

resolution_m: float | None

composite_period: str | None

quality_flags: list[str]

classmethod validate_location(v)[source]

Validate location coordinates.

Parameters:: v (dict[str, float])
Return type:: dict[str, float]

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class biosample_enricher.land.models.LandResult(**data)[source]

Bases: BaseModel

Complete land cover and vegetation enrichment result.

Parameters:: data (Any)

requested_location: dict[str, float]

requested_date: date | None

land_cover: list[LandCoverObservation]

vegetation: list[VegetationObservation]

overall_quality_score: float

providers_attempted: list[str]

providers_successful: list[str]

errors: list[str]

warnings: list[str]

classmethod validate_requested_location(v)[source]

Validate requested location coordinates.

Parameters:: v (dict[str, float])
Return type:: dict[str, float]

to_nmdc_schema()[source]

Convert to NMDC schema format.

Return type:: dict[str, Any]

to_gold_schema()[source]

Convert to GOLD schema format.

Return type:: dict[str, Any]

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

Geocoding Models

Forward Geocoding:

Data models for forward geocoding results (place names to coordinates).

class biosample_enricher.forward_geocoding.models.LocationType(value)[source]

Bases: str, Enum

Types of locations that can be geocoded.

COUNTRY = 'country'

STATE = 'state'

CITY = 'city'

TOWN = 'town'

VILLAGE = 'village'

POSTAL_CODE = 'postal_code'

ADDRESS = 'address'

LANDMARK = 'landmark'

NATURAL_FEATURE = 'natural_feature'

ADMINISTRATIVE_AREA = 'administrative_area'

UNKNOWN = 'unknown'

class biosample_enricher.forward_geocoding.models.GeometryType(value)[source]

Bases: str, Enum

Types of geometry returned by geocoding services.

POINT = 'POINT'

BOUNDS = 'BOUNDS'

APPROXIMATE = 'APPROXIMATE'

INTERPOLATED = 'INTERPOLATED'

ROOFTOP = 'ROOFTOP'

class biosample_enricher.forward_geocoding.models.BoundingBox(**data)[source]

Bases: BaseModel

Geographic bounding box for a location.

Parameters:: data (Any)

northeast_lat: float

northeast_lon: float

southwest_lat: float

southwest_lon: float

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class biosample_enricher.forward_geocoding.models.ForwardGeocodeLocation(**data)[source]

Bases: BaseModel

A geocoded location result (place name to coordinates).

Parameters:: data (Any)

input_query: str

formatted_address: str

display_name: str | None

latitude: float

longitude: float

country: str | None

country_code: str | None

state: str | None

state_code: str | None

county: str | None

city: str | None

postal_code: str | None

location_type: LocationType

geometry_type: GeometryType | None

bounding_box: BoundingBox | None

confidence: float | None

relevance: float | None

accuracy_m: float | None

place_id: str | None

osm_id: str | None

osm_type: str | None

importance: float | None

population: int | None

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class biosample_enricher.forward_geocoding.models.ForwardGeocodeProvider(**data)[source]

Bases: BaseModel

Information about the geocoding provider.

Parameters:: data (Any)

name: str

endpoint: str | None

api_version: str | None

attribution: str | None

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class biosample_enricher.forward_geocoding.models.ForwardGeocodeResult(**data)[source]

Bases: BaseModel

Complete forward geocoding result with metadata.

Parameters:: data (Any)

query: str

query_type: str | None

locations: list[ForwardGeocodeLocation]

provider: ForwardGeocodeProvider

status: str

error_message: str | None

response_time_ms: float | None

cache_hit: bool

timestamp: datetime

raw_response: dict[str, Any] | None

get_best_match()[source]

Get the highest relevance/confidence location result.

Return type:: ForwardGeocodeLocation | None
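
How relevance and confidence are weighed against each other is not documented here. One possible reading, assuming relevance is compared first, confidence breaks ties, and missing scores count as zero, is sketched below with a hypothetical Candidate stand-in for ForwardGeocodeLocation.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Candidate:
    """Hypothetical stand-in with only the fields needed for ranking."""

    formatted_address: str
    relevance: Optional[float] = None
    confidence: Optional[float] = None


def get_best_match(locations: List[Candidate]) -> Optional[Candidate]:
    """Return the top-ranked candidate, or None for an empty list."""
    if not locations:
        return None

    def score(loc: Candidate) -> Tuple[float, float]:
        # Assumption: relevance first, then confidence; None counts as 0.
        return (loc.relevance or 0.0, loc.confidence or 0.0)

    return max(locations, key=score)


candidates = [
    Candidate("Springfield, IL, USA", relevance=0.7, confidence=0.9),
    Candidate("Springfield, MA, USA", relevance=0.9, confidence=0.5),
    Candidate("Springfield, MO, USA"),
]
best = get_best_match(candidates)
```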

get_coordinates()[source]

Get coordinates from best match.

Return type:: tuple[float, float] | None

get_administrative_summary()[source]

Get administrative components from best match.

Return type:: dict[str, str]

to_enrichment_dict()[source]

Convert to dictionary suitable for biosample coordinate enrichment.

Return type:: dict[str, Any]

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class biosample_enricher.forward_geocoding.models.ForwardGeocodeFetchResult(**data)[source]

Bases: BaseModel

Internal result from provider fetch operation.

Parameters:: data (Any)

ok: bool

result: ForwardGeocodeResult | None

error: str | None

raw: dict[str, Any]

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

Reverse Geocoding:

Pydantic models for reverse geocoding data normalization and validation.

Provides explicit schema definitions for standardized reverse geocoding results from OSM and Google providers.

class biosample_enricher.reverse_geocoding_models.AddressComponentType(value)[source]

Bases: str, Enum

Types of address components from various providers.

COUNTRY = 'country'

ADMINISTRATIVE_AREA_LEVEL_1 = 'administrative_area_level_1'

ADMINISTRATIVE_AREA_LEVEL_2 = 'administrative_area_level_2'

ADMINISTRATIVE_AREA_LEVEL_3 = 'administrative_area_level_3'

ADMINISTRATIVE_AREA_LEVEL_4 = 'administrative_area_level_4'

ADMINISTRATIVE_AREA_LEVEL_5 = 'administrative_area_level_5'

LOCALITY = 'locality'

SUBLOCALITY = 'sublocality'

SUBLOCALITY_LEVEL_1 = 'sublocality_level_1'

SUBLOCALITY_LEVEL_2 = 'sublocality_level_2'

SUBLOCALITY_LEVEL_3 = 'sublocality_level_3'

SUBLOCALITY_LEVEL_4 = 'sublocality_level_4'

SUBLOCALITY_LEVEL_5 = 'sublocality_level_5'

ROUTE = 'route'

STREET_NUMBER = 'street_number'

STREET_ADDRESS = 'street_address'

PREMISE = 'premise'

SUBPREMISE = 'subpremise'

POSTAL_CODE = 'postal_code'

POSTAL_CODE_PREFIX = 'postal_code_prefix'

POSTAL_CODE_SUFFIX = 'postal_code_suffix'

NATURAL_FEATURE = 'natural_feature'

PARK = 'park'

POINT_OF_INTEREST = 'point_of_interest'

ESTABLISHMENT = 'establishment'

NEIGHBORHOOD = 'neighborhood'

COLLOQUIAL_AREA = 'colloquial_area'

PLUS_CODE = 'plus_code'

POLITICAL = 'political'

INTERSECTION = 'intersection'

CONTINENT = 'continent'

REGION = 'region'

ISLAND = 'island'

ARCHIPELAGO = 'archipelago'

class biosample_enricher.reverse_geocoding_models.AddressComponent(**data)[source]

Bases: BaseModel

Structured address component with type and value.

Parameters:: data (Any)

type: AddressComponentType

short_name: str

long_name: str | None

confidence: float | None

osm_id: str | None

wikidata_id: str | None

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class biosample_enricher.reverse_geocoding_models.BoundingBox(**data)[source]

Bases: BaseModel

Geographic bounding box.

Parameters:: data (Any)

north: float

south: float

east: float

west: float

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class biosample_enricher.reverse_geocoding_models.PlaceType(value)[source]

Bases: str, Enum

Types of places that can be returned.

BUILDING = 'building'

HOUSE = 'house'

AMENITY = 'amenity'

SHOP = 'shop'

TOURISM = 'tourism'

HISTORIC = 'historic'

LEISURE = 'leisure'

NATURAL = 'natural'

LANDUSE = 'landuse'

WATERWAY = 'waterway'

HIGHWAY = 'highway'

RAILWAY = 'railway'

AEROWAY = 'aeroway'

BOUNDARY = 'boundary'

PLACE = 'place'

OFFICE = 'office'

EMERGENCY = 'emergency'

MILITARY = 'military'

CRAFT = 'craft'

MAN_MADE = 'man_made'

ESTABLISHMENT = 'establishment'

POINT_OF_INTEREST = 'point_of_interest'

PARK = 'park'

OTHER = 'other'

class biosample_enricher.reverse_geocoding_models.ReverseGeocodeLocation(**data)[source]

Bases: BaseModel

Single reverse geocoding result location.

Parameters:: data (Any)

formatted_address: str

display_name: str | None

components: list[AddressComponent]

country: str | None

country_code: str | None

state: str | None

state_code: str | None

county: str | None

city: str | None

suburb: str | None

postcode: str | None

road: str | None

house_number: str | None

house_name: str | None

place_type: PlaceType | None

place_rank: int | None

importance: float | None

lat: float

lon: float

bounding_box: BoundingBox | None

place_id: str | None

osm_id: str | None

osm_type: str | None

wikidata_id: str | None

wikipedia_url: str | None

licence: str | None

attribution: str | None

distance_m: float | None

confidence: float | None

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class biosample_enricher.reverse_geocoding_models.ReverseGeocodeProvider(**data)[source]

Bases: BaseModel

Information about the reverse geocoding provider.

Parameters:: data (Any)

name: str

endpoint: str | None

api_version: str | None

rate_limit: int | None

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class biosample_enricher.reverse_geocoding_models.ReverseGeocodeResult(**data)[source]

Bases: BaseModel

Complete reverse geocoding result with metadata.

Parameters:: data (Any)

query_lat: float

query_lon: float

locations: list[ReverseGeocodeLocation]

provider: ReverseGeocodeProvider

status: str

error_message: str | None

response_time_ms: float | None

cache_hit: bool

timestamp: datetime

raw_response: dict[str, Any] | None

get_best_match()[source]

Get the best matching location (first in list).

Return type:: ReverseGeocodeLocation | None

get_country()[source]

Get country from the best match.

Return type:: str | None

get_formatted_address()[source]

Get formatted address from the best match.

Return type:: str | None

filter_by_type(place_type)[source]

Filter locations by place type.

Parameters:: place_type (PlaceType)
Return type:: list[ReverseGeocodeLocation]

to_simple_dict()[source]

Convert to simple dictionary for easy viewing.

Return type:: dict[str, Any]

model_config: ClassVar[ConfigDict] = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
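
The helper methods above can be pictured with the following minimal sketch. It does not import the package: Location and Result are hypothetical dataclass stand-ins for ReverseGeocodeLocation and ReverseGeocodeResult with most fields omitted, PlaceType is assumed to be the enum that carries the PARK and BUILDING values, and the sample addresses are illustrative. Only the "first in list" rule for the best match comes from the documentation; the rest is an assumption about how such helpers are typically written.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class PlaceType(str, Enum):
    """Assumed subset of the documented place-type values."""

    PARK = "park"
    BUILDING = "building"


@dataclass
class Location:
    """Hypothetical stand-in for ReverseGeocodeLocation (fields reduced)."""

    formatted_address: str
    lat: float
    lon: float
    country: str | None = None
    place_type: PlaceType | None = None


@dataclass
class Result:
    """Hypothetical stand-in for ReverseGeocodeResult (fields reduced)."""

    locations: list[Location] = field(default_factory=list)

    def get_best_match(self) -> Location | None:
        # Documented rule: the best match is the first location in the list.
        return self.locations[0] if self.locations else None

    def get_country(self) -> str | None:
        # Delegate to the best match; None when there are no locations.
        best = self.get_best_match()
        return best.country if best else None

    def filter_by_type(self, place_type: PlaceType) -> list[Location]:
        # Keep only locations whose place_type matches exactly.
        return [loc for loc in self.locations if loc.place_type == place_type]


# Illustrative data, not real provider output.
result = Result(
    locations=[
        Location("Example Park, Sampleville", 10.1234, 20.5678,
                 country="Exampleland", place_type=PlaceType.PARK),
        Location("1 Sample Street, Sampleville", 10.1240, 20.5680,
                 country="Exampleland", place_type=PlaceType.BUILDING),
    ]
)

best = result.get_best_match()
parks = result.filter_by_type(PlaceType.PARK)
country_when_empty = Result().get_country()
```

Returning None for an empty result, rather than raising, matches the documented `| None` return types and lets callers treat "no match" as ordinary data.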

class biosample_enricher.reverse_geocoding_models.ReverseGeocodeFetchResult(**data)[source]

Bases: BaseModel

Internal result from provider fetch operation.

Parameters:: data (Any)

ok: bool

result: ReverseGeocodeResult | None

error: str | None

raw: dict[str, Any]

model_config: ClassVar[ConfigDict] = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
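
The fetch-result models on this page share the same four fields: ok, result, error, and raw. As a rough sketch of how a caller might consume that shape, the snippet below uses a hypothetical FetchResult dataclass and a hypothetical unwrap helper; neither is part of the package, and the choice to raise RuntimeError is an assumption made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class FetchResult:
    """Hypothetical stand-in mirroring the ok/result/error/raw fields."""

    ok: bool
    result: Any | None = None
    error: str | None = None
    raw: dict[str, Any] = field(default_factory=dict)


def unwrap(fetch: FetchResult) -> Any:
    """Return the payload on success, otherwise raise with the error text."""
    if fetch.ok and fetch.result is not None:
        return fetch.result
    raise RuntimeError(fetch.error or "fetch failed without an error message")


# A successful fetch yields its payload unchanged.
payload = unwrap(FetchResult(ok=True, result={"status": "OK"}))
```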

OSM Features Models

Data models for OpenStreetMap geographic features.

class biosample_enricher.osm_features.models.OSMElementType(value)[source]

Bases: str, Enum

Types of OSM elements.

NODE = 'node'

WAY = 'way'

RELATION = 'relation'
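
Because the enum derives from both str and Enum, its members behave like plain strings in comparisons and JSON output. The sketch below re-declares the enum locally so that it runs without the package installed; the behavior shown is standard Python for any str-based Enum.

```python
import json
from enum import Enum


class OSMElementType(str, Enum):
    """Local copy of the documented enum, re-declared for illustration."""

    NODE = "node"
    WAY = "way"
    RELATION = "relation"


# Look up a member from a raw string value.
element_type = OSMElementType("way")

# str-based members compare equal to plain strings.
is_way = element_type == "way"

# json.dumps serializes the member as its string value.
payload = json.dumps({"osm_type": element_type})

# Unknown values are rejected with ValueError.
try:
    OSMElementType("area")
    rejected = False
except ValueError:
    rejected = True
```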

class biosample_enricher.osm_features.models.GeometryType(value)[source]

Bases: str, Enum

Types of geometric representations.

POINT = 'point'

LINESTRING = 'linestring'

POLYGON = 'polygon'

MULTIPOLYGON = 'multipolygon'

class biosample_enricher.osm_features.models.FeatureCategory(value)[source]

Bases: str, Enum

Main categories of OSM features.

NATURAL = 'natural'

WATERWAY = 'waterway'

HIGHWAY = 'highway'

RAILWAY = 'railway'

AEROWAY = 'aeroway'

AMENITY = 'amenity'

LEISURE = 'leisure'

LANDUSE = 'landuse'

BUILDING = 'building'

BOUNDARY = 'boundary'

PLACE = 'place'

TOURISM = 'tourism'

SHOP = 'shop'

CRAFT = 'craft'

OFFICE = 'office'

OTHER = 'other'

class biosample_enricher.osm_features.models.Coordinates(**data)[source]

Bases: BaseModel

Geographic coordinates.

Parameters:: data (Any)

latitude: float

longitude: float

model_config: ClassVar[ConfigDict] = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class biosample_enricher.osm_features.models.OSMNamedFeature(**data)[source]

Bases: BaseModel

A named geographic feature from OpenStreetMap.

Parameters:: data (Any)

osm_type: OSMElementType

osm_id: int

name: str | None

alt_names: list[str]

wikidata_id: str | None

wikipedia: str | None

centroid: Coordinates | None

distance_km: float | None

geometry_type: GeometryType | None

category: FeatureCategory

subcategory: str | None

tags: dict[str, str]

importance: float | None

model_config: ClassVar[ConfigDict] = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class biosample_enricher.osm_features.models.OSMUnnamedCounts(**data)[source]

Bases: BaseModel

Counts of unnamed features by category and subcategory.

Parameters:: data (Any)

key: str

total_count: int

value_counts: dict[str, dict[str, int]]

model_config: ClassVar[ConfigDict] = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class biosample_enricher.osm_features.models.OSMQuery(**data)[source]

Bases: BaseModel

Parameters for an OSM Overpass query.

Parameters:: data (Any)

center: Coordinates

radius_m: int

timeout_s: int

model_config: ClassVar[ConfigDict] = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class biosample_enricher.osm_features.models.OSMFeaturesResult(**data)[source]

Bases: BaseModel

Complete result from OSM features enrichment.

Parameters:: data (Any)

query: OSMQuery

named_features: list[OSMNamedFeature]

unnamed_counts: list[OSMUnnamedCounts]

total_elements: int

named_features_count: int

unnamed_categories_count: int

total_unnamed_count: int

success: bool

error_message: str | None

response_time_ms: float | None

data_source: str

query_timestamp: datetime

get_features_by_category(category)[source]

Get all named features of a specific category.

Parameters:: category (FeatureCategory)
Return type:: list[OSMNamedFeature]

get_nearest_feature(category)[source]

Get the nearest named feature of a specific category.

Parameters:: category (FeatureCategory)
Return type:: OSMNamedFeature | None

get_feature_counts_by_category()[source]

Get counts of unnamed features by main category.

Return type:: dict[str, int]

get_distance_summary()[source]

Generate distance summary for key feature categories.

Return type:: dict[str, Any]

to_enrichment_dict()[source]

Convert to dictionary suitable for biosample enrichment.

Return type:: dict[str, Any]

model_config: ClassVar[ConfigDict] = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
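
One plausible reading of get_features_by_category and get_nearest_feature is "filter by category, then take the smallest distance_km". The sketch below illustrates that reading with a hypothetical NamedFeature dataclass standing in for OSMNamedFeature and with plain strings in place of FeatureCategory. Skipping features whose distance_km is None is an assumption; the package may handle that case differently. The feature names are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class NamedFeature:
    """Hypothetical stand-in for OSMNamedFeature (fields reduced)."""

    name: str | None
    category: str
    distance_km: float | None


def features_by_category(
    features: list[NamedFeature], category: str
) -> list[NamedFeature]:
    """Return every feature whose category matches."""
    return [f for f in features if f.category == category]


def nearest_feature(
    features: list[NamedFeature], category: str
) -> NamedFeature | None:
    """Return the closest feature of a category, or None if there is none."""
    # Assumption: features without a known distance cannot be ranked.
    candidates = [
        f for f in features_by_category(features, category)
        if f.distance_km is not None
    ]
    return min(candidates, key=lambda f: f.distance_km, default=None)


# Illustrative data, not real Overpass output.
features = [
    NamedFeature("Mill Creek", "waterway", 1.8),
    NamedFeature("Cedar Ridge", "natural", 2.5),
    NamedFeature("Heron Pond", "natural", 0.4),
    NamedFeature("Unmeasured Marsh", "natural", None),
]

nearest_natural = nearest_feature(features, "natural")
nearest_railway = nearest_feature(features, "railway")
```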

class biosample_enricher.osm_features.models.OSMFetchResult(**data)[source]

Bases: BaseModel

Internal result from OSM Overpass API fetch operation.

Parameters:: data (Any)

ok: bool

result: OSMFeaturesResult | None

error: str | None

raw: dict[str, Any]

model_config: ClassVar[ConfigDict] = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class biosample_enricher.osm_features.models.GooglePlacesFeature(**data)[source]

Bases: BaseModel

A feature from Google Places API.

Parameters:: data (Any)

google_place_id: str

name: str | None

types: list[str]

centroid: Coordinates | None

distance_km: float | None

category: FeatureCategory

subcategory: str | None

rating: float | None

user_ratings_total: int | None

price_level: int | None

business_status: str | None

vicinity: str | None

formatted_address: str | None

icon_url: str | None

photos: list[dict[str, Any]]

plus_code: dict[str, Any] | None

raw_data: dict[str, Any]

model_config: ClassVar[ConfigDict] = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class biosample_enricher.osm_features.models.GooglePlacesResult(**data)[source]

Bases: BaseModel

Result from Google Places API query.

Parameters:: data (Any)

query: Coordinates

radius_m: int

named_features: list[GooglePlacesFeature]

unnamed_counts: list[dict[str, Any]]

total_features: int

success: bool

provider: str

error_message: str | None

to_enrichment_dict()[source]

Convert to dictionary suitable for biosample enrichment.

Return type:: dict[str, Any]

get_nearest_feature(category)[source]

Get nearest feature of specified category.

Parameters:: category (FeatureCategory)
Return type:: GooglePlacesFeature | None

model_config: ClassVar[ConfigDict] = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class biosample_enricher.osm_features.models.GooglePlacesFetchResult(**data)[source]

Bases: BaseModel

Result of fetching from Google Places API.

Parameters:: data (Any)

ok: bool

result: GooglePlacesResult | None

error: str | None

raw: dict[str, Any]

model_config: ClassVar[ConfigDict] = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class biosample_enricher.osm_features.models.CombinedFeaturesResult(**data)[source]

Bases: BaseModel

Combined results from multiple geographic feature providers.

Parameters:: data (Any)

query: Coordinates

radius_m: int

osm_result: OSMFeaturesResult | None

google_result: GooglePlacesResult | None

providers_successful: list[str]

providers_failed: list[str]

combined_enrichment_success: bool

to_enrichment_dict()[source]

Convert combined results to enrichment dictionary.

Return type:: dict[str, Any]

model_config: ClassVar[ConfigDict] = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

Common Patterns

All models follow these conventions:

Type Safety

Full type hints using Python 3.11+ syntax
Pydantic validation for all external data
mypy strict mode compliance

Coordinate Handling

Latitude: -90 to 90 decimal degrees
Longitude: -180 to 180 decimal degrees
Automatic canonicalization to 4 decimal places (~11 m precision)
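
The coordinate conventions above can be sketched as a small helper. The function name canonicalize and its signature are hypothetical and not taken from the package; the sketch only shows range validation followed by rounding. One degree of latitude spans roughly 111 km, so a step of 0.0001 degrees corresponds to about 11 m; for longitude the ground distance shrinks toward the poles.

```python
from __future__ import annotations


def canonicalize(
    latitude: float, longitude: float, places: int = 4
) -> tuple[float, float]:
    """Validate a coordinate pair and round it to a fixed number of decimals."""
    if not -90.0 <= latitude <= 90.0:
        raise ValueError(f"latitude out of range: {latitude}")
    if not -180.0 <= longitude <= 180.0:
        raise ValueError(f"longitude out of range: {longitude}")
    return round(latitude, places), round(longitude, places)


# Two nearby inputs collapse to the same canonical pair.
first = canonicalize(37.774929, -122.419416)
second = canonicalize(37.774931, -122.419402)
same_point = first == second
```

Rounding before use means that requests for nearly identical coordinates can share one lookup key, which is a common motivation for canonicalizing coordinates.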

Observation Pattern

Many services return Observation objects with:

value_numeric: Numeric measurement (if applicable)
value_string: String value (if applicable)
provider: Data source information
metadata: Additional context
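
As a rough illustration of the Observation pattern, the sketch below defines a hypothetical dataclass with the four fields listed above. The field types, the dictionary shape of provider and metadata, and the display_value helper are assumptions made for this example; the sample values are illustrative, and the package's actual Observation class may differ.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class Observation:
    """Hypothetical stand-in with the four documented fields."""

    value_numeric: float | None = None
    value_string: str | None = None
    provider: dict[str, Any] = field(default_factory=dict)
    metadata: dict[str, Any] = field(default_factory=dict)


def display_value(obs: Observation) -> float | str | None:
    """Prefer the numeric value and fall back to the string value."""
    if obs.value_numeric is not None:
        return obs.value_numeric
    return obs.value_string


# Illustrative observations, one numeric and one categorical.
temperature = Observation(
    value_numeric=18.5,
    provider={"name": "example-weather"},
    metadata={"unit": "Cel"},
)
land_cover = Observation(
    value_string="grassland",
    provider={"name": "example-landcover"},
)

numeric_value = display_value(temperature)
string_value = display_value(land_cover)
```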
