Models

Data models used throughout the biosample-enricher package.

Core Models

These models are shared across multiple services and represent fundamental concepts.

biosample_enricher.models

Pydantic models for biosample data normalization and validation.

Provides explicit schema definitions for standardized biosample location data extracted from the NMDC and GOLD databases, along with output models for the elevation service.

class biosample_enricher.models.BiosampleLocation(**data)[source]

Bases: BaseModel

Standardized biosample location data for API enrichment.

Parameters:

data (Any)

latitude: float | None
longitude: float | None
collection_date: str | None
textual_location: str | None
sample_id: str | None
database_source: str | None
extraction_timestamp: str | None
nmdc_biosample_id: str | None
gold_biosample_id: str | None
alternative_identifiers: list[str] | None
external_database_identifiers: list[str] | None
biosample_identifiers: list[str] | None
sample_identifiers: list[str] | None
nmdc_studies: list[str] | None
gold_studies: list[str] | None
coordinate_precision: int | None
date_precision: str | None
location_completeness: float | None
env_broad_scale: str | None
env_local_scale: str | None
env_medium: str | None
sample_type: str | None
is_host_associated: bool | None
calculate_completeness()[source]

Calculate location completeness score based on available fields.

Return type:

BiosampleLocation

classmethod validate_collection_date(v)[source]

Validate collection date format.

Parameters:

v (str | None)

Return type:

str | None

is_enrichable()[source]

Check if sample has minimum data for API enrichment.

Return type:

bool

to_dict()[source]

Convert to dictionary for serialization with enrichable status.

Return type:

dict

class Config[source]

Bases: object

Pydantic configuration.

extra = 'forbid'
validate_assignment = True
str_strip_whitespace = True
model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'str_strip_whitespace': True, 'validate_assignment': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class biosample_enricher.models.Variable(value)[source]

Bases: str, Enum

Enumeration of variables that can be observed/measured.

ELEVATION = 'elevation'
class biosample_enricher.models.ValueStatus(value)[source]

Bases: str, Enum

Status of the observation value.

OK = 'ok'
ERROR = 'error'
PARTIAL = 'partial'
UNKNOWN = 'unknown'
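
Because Variable and ValueStatus subclass both str and Enum, their members compare equal to plain strings and serialize to JSON without a custom encoder. The sketch below re-declares ValueStatus locally with the members listed above so that it runs without the package installed; it illustrates the str/Enum pattern and is not the package's source.

```python
import json
from enum import Enum


class ValueStatus(str, Enum):
    """Local stand-in mirroring the members documented above."""

    OK = "ok"
    ERROR = "error"
    PARTIAL = "partial"
    UNKNOWN = "unknown"


# Members behave like their string values in comparisons.
is_ok = ValueStatus.OK == "ok"

# Lookup by value returns the matching member; unknown values raise ValueError.
parsed = ValueStatus("partial")

# json.dumps emits the underlying string because each member is a str instance.
encoded = json.dumps({"value_status": ValueStatus.ERROR})
```
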
class biosample_enricher.models.GeoPoint(**data)[source]

Bases: BaseModel

Geographic point with precision information.

Parameters:

data (Any)

lat: float
lon: float
precision_digits: int | None
uncertainty_m: float | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class biosample_enricher.models.ProviderRef(**data)[source]

Bases: BaseModel

Reference to a data provider.

Parameters:

data (Any)

name: str
endpoint: str | None
api_version: str | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class biosample_enricher.models.Observation(**data)[source]

Bases: BaseModel

A single observation/measurement result.

Parameters:

data (Any)

variable: Variable
value_numeric: float | None
unit_ucum: str
value_status: ValueStatus
provider: ProviderRef
request_location: GeoPoint
measurement_location: GeoPoint | None
distance_to_input_m: float | None
spatial_resolution_m: float | None
vertical_datum: str | None
raw_payload: str | None
raw_payload_sha256: str | None
normalization_version: str
cache_used: bool | None
request_id: str | None
error_message: str | None
created_at: datetime | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
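
The raw_payload and raw_payload_sha256 fields suggest that an observation can carry the provider's unmodified response together with a digest of it. A minimal sketch of how such a digest could be computed with the standard library, assuming the hash is the hexadecimal SHA-256 of the UTF-8 encoded payload. The helper name payload_digest and the sample payload are hypothetical, and the package may encode or canonicalize the payload differently.

```python
import hashlib
import json


def payload_digest(raw_payload: str) -> str:
    """Return the hex SHA-256 digest of a raw payload string (assumed UTF-8)."""
    return hashlib.sha256(raw_payload.encode("utf-8")).hexdigest()


# Hypothetical provider response, serialized with sorted keys so that the
# same content always produces the same digest.
raw = json.dumps({"elevation": 1609.3, "units": "m"}, sort_keys=True)
digest = payload_digest(raw)
```

Storing the digest alongside the payload lets a consumer detect truncated or altered cache entries by recomputing the hash.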

class biosample_enricher.models.EnrichmentRun(**data)[source]

Bases: BaseModel

Metadata about a specific enrichment run.

Parameters:

data (Any)

started_at: datetime
ended_at: datetime | None
tool_version: str | None
git_sha: str | None
read_from_cache: bool | None
write_to_cache: bool | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class biosample_enricher.models.OutputEnvelope(**data)[source]

Bases: BaseModel

Top-level container for enrichment results.

Parameters:

data (Any)

schema_version: str
run: EnrichmentRun
subject_id: str
observations: list[Observation]
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class biosample_enricher.models.CoordinateClassification(**data)[source]

Bases: BaseModel

Classification of geographic coordinates.

Parameters:

data (Any)

is_us_territory: bool
is_land: bool | None
country_code: str | None
region: str | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class biosample_enricher.models.ElevationRequest(**data)[source]

Bases: BaseModel

Request for elevation data at specific coordinates.

Parameters:

data (Any)

latitude: float
longitude: float
preferred_providers: list[str] | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class biosample_enricher.models.ElevationResult(**data)[source]

Bases: BaseModel

Single elevation result for compatibility/convenience.

Parameters:

data (Any)

latitude: float
longitude: float
elevation_meters: float
provider: str
accuracy_meters: float | None
data_source: str
timestamp: datetime
classification: CoordinateClassification
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class biosample_enricher.models.FetchResult(**data)[source]

Bases: BaseModel

Internal result from provider fetch operation.

Parameters:

data (Any)

ok: bool
elevation: float | None
location: GeoPoint | None
resolution_m: float | None
vertical_datum: str | None
raw: dict[str, Any]
error: str | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

Service-Specific Models

Each service has its own specialized models for requests and responses.

Weather Models

Weather enrichment data models with standardized schema for biosample metadata.

class biosample_enricher.weather.models.TemporalQuality(value)[source]

Bases: str, Enum

Temporal precision quality levels for weather data.

DAY_SPECIFIC_COMPLETE = 'day_specific_complete'
DAY_SPECIFIC_PARTIAL = 'day_specific_partial'
WEEKLY_COMPOSITE = 'weekly_composite'
MONTHLY_CLIMATOLOGY = 'monthly_climatology'
NO_DATA = 'no_data'
class biosample_enricher.weather.models.WeatherProvider(value)[source]

Bases: str, Enum

Supported weather data providers.

OPEN_METEO = 'open_meteo'
METEOSTAT = 'meteostat'
NOAA = 'noaa'
ECMWF = 'ecmwf'
class biosample_enricher.weather.models.TemporalPrecision(method, target_date, data_quality, coverage_info=None, caveat=None, provider=None)[source]

Bases: object

Temporal precision metadata for weather observations.

Parameters:
method: str
target_date: str
data_quality: TemporalQuality
coverage_info: str | None = None
caveat: str | None = None
provider: str | None = None
__init__(method, target_date, data_quality, coverage_info=None, caveat=None, provider=None)
class biosample_enricher.weather.models.WeatherObservation(**data)[source]

Bases: BaseModel

Single weather parameter observation with units and temporal context.

Parameters:

data (Any)

value: float | dict[str, float]
unit: str
temporal_precision: TemporalPrecision
quality_score: int | None
class Config[source]

Bases: object

arbitrary_types_allowed = True
classmethod validate_value(v)[source]

Validate that value is either a number or dict with numeric values.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class biosample_enricher.weather.models.WeatherResult(**data)[source]

Bases: BaseModel

Standardized weather enrichment result aligned with NMDC/GOLD schemas.

Maps weather API responses to biosample schema fields with temporal precision.

Parameters:

data (Any)

temperature: WeatherObservation | None
wind_speed: WeatherObservation | None
wind_direction: WeatherObservation | None
humidity: WeatherObservation | None
solar_radiation: WeatherObservation | None
precipitation: WeatherObservation | None
pressure: WeatherObservation | None
location: dict[str, float]
collection_date: str
providers_attempted: list[str]
successful_providers: list[str]
failed_providers: list[str]
overall_quality: TemporalQuality | None
classmethod validate_date_format(v)[source]

Ensure collection date is in YYYY-MM-DD format.

get_schema_mapping(target_schema='nmdc')[source]

Map weather observations to target biosample schema fields.

Parameters:

target_schema (str) – “nmdc” or “gold”

Returns:

dict[str, Any] – Dict mapping to schema field names and values

get_coverage_metrics()[source]

Generate before/after coverage metrics for this weather enrichment.

Returns:

dict[str, Any] – Dict with coverage statistics for metrics reporting

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
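
validate_date_format requires collection_date to be in YYYY-MM-DD form. A standard-library sketch of such a check, assuming strict zero-padded calendar dates; check_collection_date is a hypothetical name, and the package's validator may accept or reject different inputs.

```python
from datetime import datetime


def check_collection_date(value: str) -> str:
    """Return value unchanged if it is a valid YYYY-MM-DD date, else raise ValueError."""
    # strptime rejects impossible dates such as 2023-02-30.
    parsed = datetime.strptime(value, "%Y-%m-%d")
    # strptime tolerates missing zero padding ("2023-7-4"), so compare the
    # round-tripped form to enforce the exact layout.
    if parsed.strftime("%Y-%m-%d") != value:
        raise ValueError(f"collection_date must be YYYY-MM-DD, got {value!r}")
    return value
```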

class biosample_enricher.weather.models.MultiProviderClimateNormals(**data)[source]

Bases: BaseModel

Climate normals from multiple providers for comparison and validation.

Returns results from all requested providers, allowing users to:

  • Compare values across different data sources

  • Detect provider outages/failures

  • Validate data quality by cross-checking

  • Choose which provider to trust for their use case

This is the default return type when multiple providers are queried.

Parameters:

data (Any)

providers: dict[str, ClimateNormalsResult]
location: dict[str, float]
requested_providers: list[str]
successful_providers: list[str]
failed_providers: dict[str, str]
requested_start_year: int
requested_end_year: int
get_provider_result(provider_name)[source]

Get result from a specific provider.

Parameters:

provider_name (str)

Return type:

ClimateNormalsResult | None

get_consensus_precipitation()[source]

Calculate consensus annual precipitation across all successful providers.

Returns the mean precipitation if multiple providers are available; otherwise returns the single provider’s value.

Return type:

float | None

get_consensus_temperature()[source]

Calculate consensus annual temperature across all successful providers.

Returns the mean temperature if multiple providers are available; otherwise returns the single provider’s value.

Return type:

float | None

get_value_ranges()[source]

Get min/max range of values across providers.

Useful for detecting large discrepancies and data quality issues.

Returns:

Dict with keys:

  • annual_precpt_range: (min, max) in mm/year or None

  • annual_temp_range: (min, max) in °C or None

to_submission_schema(provider=None, strategy='mean')[source]

Extract values in submission-schema compatible format.

Parameters:
  • provider (str | None) – Specific provider to use (e.g., “meteostat”). If None, uses strategy.

  • strategy (str) – How to combine multiple providers:

      - “mean”: Average across all successful providers (default)
      - “median”: Middle value when sorted (robust to outliers)
      - “first”: Use first successful provider
      - “best_quality”: Use provider with lowest station_distance_km

Returns:

dict[str, Any] – Dict with submission-schema values plus metadata about provider selection

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
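
The consensus and range methods above combine one value per provider. The sketch below shows the “mean”, “median”, and “first” strategies and the (min, max) range using only the standard library. The function names, the provider_b and provider_c keys, and the sample numbers are hypothetical; “best_quality” is omitted because it depends on each provider's station_distance_km. This is an illustration of the described behavior, not the package's implementation.

```python
from __future__ import annotations

from statistics import mean, median


def combine_values(values: dict[str, float | None], strategy: str = "mean") -> float | None:
    """Combine one climate value across providers using a named strategy."""
    # Providers that failed or returned no value are ignored.
    available = [v for v in values.values() if v is not None]
    if not available:
        return None
    if strategy == "mean":
        return mean(available)
    if strategy == "median":
        return median(available)
    if strategy == "first":
        return available[0]
    raise ValueError(f"unsupported strategy: {strategy!r}")


def value_range(values: dict[str, float | None]) -> tuple[float, float] | None:
    """Return (min, max) across providers, or None if no provider has a value."""
    available = [v for v in values.values() if v is not None]
    return (min(available), max(available)) if available else None


# Hypothetical annual precipitation (mm/year) from three providers, one failed.
annual_precpt = {"meteostat": 540.0, "provider_b": 560.0, "provider_c": 900.0, "failed": None}
consensus_median = combine_values(annual_precpt, "median")
spread = value_range(annual_precpt)
```

With one outlying provider, the median stays close to the two agreeing sources while the mean is pulled upward, which is why the range is worth checking before trusting a consensus value.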

class biosample_enricher.weather.models.ClimateNormalsResult(**data)[source]

Bases: BaseModel

30-year climate averages (normals) from a single provider.

Climate normals provide baseline environmental conditions over a standard 30-year period (typically 1991-2020), representing typical climate rather than day-to-day weather variability.

Use this for:

  • Annual precipitation totals (sum of 12 monthly means)

  • Annual temperature averages

  • Long-term climate characterization

  • Biosample metadata fields like annual_precpt, annual_temp

For day-specific weather conditions, use WeatherResult instead. For multi-provider comparisons, see MultiProviderClimateNormals.

Parameters:

data (Any)

monthly_precipitation: list[float | None]
monthly_temperature: list[float | None]
station_id: str
station_distance_km: float
location: dict[str, float]
normals_period: tuple[int, int]
provider: str
data_quality: str | None
get_annual_precipitation()[source]

Calculate annual precipitation by summing 12 monthly normals.

Returns total in millimeters, suitable for submission-schema annual_precpt slot.

Returns:

float | None – Annual precipitation in millimeters (mm/year), or None if data is incomplete (requires at least 10 months of valid data).

Example

>>> result.get_annual_precipitation()
547.2  # mm/year
get_annual_temperature()[source]

Calculate annual average temperature from 12 monthly normals.

Returns average in degrees Celsius, suitable for submission-schema annual_temp slot.

Returns:

float | None – Annual average temperature in °C, or None if data incomplete.

Example

>>> result.get_annual_temperature()
12.5  # °C
to_submission_schema()[source]

Extract values in submission-schema compatible format.

Provides simple scalar values suitable for NMDC submission-schema slots, following the general-purpose design pattern (Issue #193).

Returns:

Dict with keys:

  • annual_precpt: float | None - Annual precipitation in millimeters (mm/year). Sum of 12 monthly normals. None if <10 months available.

  • annual_temp: float | None - Annual average temperature in degrees Celsius (°C). Average of 12 monthly normals. None if <10 months.

  • climate_normals_period: str - Period as “YYYY-YYYY” (e.g. “1991-2020”)

  • station_distance_km: float - Distance to weather station in kilometers

  • data_source: str - Provider name (e.g. “meteostat”)

Example

>>> normals = service.get_climate_normals(37.7749, -122.4194)
>>> values = normals.to_submission_schema()
>>> print(f"Annual rainfall: {values['annual_precpt']} mm")
Annual rainfall: 547.2 mm
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
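
The annual values are derived from the 12 monthly normals, with None returned when fewer than 10 months are valid. A minimal sketch of that aggregation, assuming missing months are simply left out of the sum or average (the package might scale or gap-fill instead); the function names and the MIN_VALID_MONTHS constant are hypothetical.

```python
from __future__ import annotations

MIN_VALID_MONTHS = 10  # threshold stated in the docstrings above


def annual_precipitation(monthly_mm: list[float | None]) -> float | None:
    """Sum monthly precipitation normals, or return None if too few are valid."""
    valid = [m for m in monthly_mm if m is not None]
    if len(valid) < MIN_VALID_MONTHS:
        return None
    # Assumption: missing months are omitted rather than estimated.
    return sum(valid)


def annual_temperature(monthly_c: list[float | None]) -> float | None:
    """Average monthly temperature normals, or return None if too few are valid."""
    valid = [m for m in monthly_c if m is not None]
    if len(valid) < MIN_VALID_MONTHS:
        return None
    return sum(valid) / len(valid)


full_year = annual_precipitation([50.0] * 12)
two_missing = annual_precipitation([50.0] * 10 + [None, None])
three_missing = annual_precipitation([50.0] * 9 + [None, None, None])
```

Under this assumption a year with two missing months yields a lower total than a complete year, so downstream users may want to inspect data_quality before comparing sites.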

Marine Models

Marine enrichment data models with standardized schema for oceanographic metadata.

class biosample_enricher.marine.models.MarineQuality(value)[source]

Bases: str, Enum

Data quality levels for marine observations.

SATELLITE_L3 = 'satellite_l3'
SATELLITE_L4 = 'satellite_l4'
MODEL_REANALYSIS = 'model_reanalysis'
CLIMATOLOGY = 'climatology'
STATIC_DATASET = 'static_dataset'
NO_DATA = 'no_data'
class biosample_enricher.marine.models.MarineProvider(value)[source]

Bases: str, Enum

Supported marine data providers.

NOAA_OISST = 'noaa_oisst'
GEBCO = 'gebco'
ESA_CCI = 'esa_cci'
CMEMS = 'cmems'
OSCAR = 'oscar'
class biosample_enricher.marine.models.MarinePrecision(method, target_date, data_quality, spatial_resolution=None, temporal_resolution=None, provider=None)[source]

Bases: object

Precision metadata for marine observations.

Parameters:
method: str
target_date: str
data_quality: MarineQuality
spatial_resolution: str | None = None
temporal_resolution: str | None = None
provider: str | None = None
__init__(method, target_date, data_quality, spatial_resolution=None, temporal_resolution=None, provider=None)
class biosample_enricher.marine.models.MarineObservation(**data)[source]

Bases: BaseModel

Single marine parameter observation with units and precision context.

Parameters:

data (Any)

value: float | dict[str, float]
unit: str
precision: MarinePrecision
quality_score: int | None
uncertainty: float | None
classmethod validate_value(v)[source]

Validate observation value.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class biosample_enricher.marine.models.MarineResult(**data)[source]

Bases: BaseModel

Complete marine data result for a location and date.

Parameters:

data (Any)

location: dict[str, float]
collection_date: str
sea_surface_temperature: MarineObservation | None
bathymetry: MarineObservation | None
chlorophyll_a: MarineObservation | None
salinity: MarineObservation | None
dissolved_oxygen: MarineObservation | None
ph: MarineObservation | None
ocean_current_u: MarineObservation | None
ocean_current_v: MarineObservation | None
significant_wave_height: MarineObservation | None
successful_providers: list[str]
failed_providers: list[str]
overall_quality: MarineQuality
classmethod validate_date_format(v)[source]

Validate date format.

get_schema_mapping(target_schema)[source]

Map marine data to target biosample schema.

Parameters:

target_schema (str) – “nmdc” or “gold”

Returns:

dict[str, Any] – Dictionary mapping to schema fields

get_coverage_metrics()[source]

Generate coverage metrics for this marine result.

Return type:

dict[str, Any]

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

Soil Models

Pydantic models for soil enrichment data.

class biosample_enricher.soil.models.SoilObservation(**data)[source]

Bases: BaseModel

Individual soil measurement or prediction at a location.

Parameters:

data (Any)

classification_usda: str | None
classification_wrb: str | None
confidence_usda: float | None
confidence_wrb: float | None
ph_h2o: float | None
organic_carbon: float | None
bulk_density: float | None
sand_percent: float | None
silt_percent: float | None
clay_percent: float | None
texture_class: str | None
total_nitrogen: float | None
available_phosphorus: float | None
cation_exchange_capacity: float | None
depth_cm: str | None
measurement_method: str | None
classmethod validate_texture_class(v)[source]

Validate USDA texture class names.

Parameters:

v (str | None)

Return type:

str | None

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class biosample_enricher.soil.models.SoilResult(**data)[source]

Bases: BaseModel

Results from soil enrichment for a specific location.

Parameters:

data (Any)

latitude: float
longitude: float
distance_m: float | None
observations: list[SoilObservation]
quality_score: float
provider: str
retrieved_at: datetime
errors: list[str]
warnings: list[str]
to_nmdc_schema()[source]

Convert to NMDC biosample schema format.

Return type:

dict[str, Any]

to_gold_schema()[source]

Convert to GOLD biosample schema format.

Return type:

dict[str, Any]

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

biosample_enricher.soil.models.classify_texture(sand_pct, silt_pct, clay_pct)[source]

Classify soil texture using USDA texture triangle.

Parameters:
  • sand_pct (float) – Sand percentage (0-100)

  • silt_pct (float) – Silt percentage (0-100)

  • clay_pct (float) – Clay percentage (0-100)

Returns:

str – USDA texture class name

Raises:

ValueError – If percentages don’t sum to ~100% or are invalid
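
classify_texture raises ValueError when the three fractions are out of range or do not sum to roughly 100%. The sketch below covers only that input check, not the texture-triangle lookup itself. The helper name check_texture_inputs and the tolerance of 1 percentage point are assumptions; the source only says "~100%".

```python
def check_texture_inputs(
    sand_pct: float, silt_pct: float, clay_pct: float, tolerance: float = 1.0
) -> None:
    """Raise ValueError unless the fractions are in range and sum to about 100."""
    fractions = {"sand": sand_pct, "silt": silt_pct, "clay": clay_pct}
    for name, pct in fractions.items():
        # The chained comparison is also False for NaN, so NaN is rejected.
        if not 0.0 <= pct <= 100.0:
            raise ValueError(f"{name} percentage out of range: {pct}")
    total = sum(fractions.values())
    if abs(total - 100.0) > tolerance:
        raise ValueError(f"percentages sum to {total}, expected about 100")
```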

Land Cover Models

Data models for land cover and vegetation enrichment.

class biosample_enricher.land.models.LandCoverObservation(**data)[source]

Bases: BaseModel

Land cover classification from a specific provider.

Parameters:

data (Any)

provider: str
actual_location: dict[str, float]
distance_m: float
actual_date: date | None
temporal_offset_days: int | None
class_code: str | None
class_label: str | None
classification_system: str | None
confidence: float | None
resolution_m: float | None
dataset_version: str | None
quality_flags: list[str]
classmethod validate_location(v)[source]

Validate location coordinates.

Parameters:

v (dict[str, float])

Return type:

dict[str, float]

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class biosample_enricher.land.models.VegetationObservation(**data)[source]

Bases: BaseModel

Vegetation indices from a specific provider.

Parameters:

data (Any)

provider: str
actual_location: dict[str, float]
distance_m: float
actual_date: date | None
temporal_offset_days: int | None
ndvi: float | None
evi: float | None
lai: float | None
fpar: float | None
confidence: float | None
resolution_m: float | None
composite_period: str | None
quality_flags: list[str]
classmethod validate_location(v)[source]

Validate location coordinates.

Parameters:

v (dict[str, float])

Return type:

dict[str, float]

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class biosample_enricher.land.models.LandResult(**data)[source]

Bases: BaseModel

Complete land cover and vegetation enrichment result.

Parameters:

data (Any)

requested_location: dict[str, float]
requested_date: date | None
land_cover: list[LandCoverObservation]
vegetation: list[VegetationObservation]
overall_quality_score: float
providers_attempted: list[str]
providers_successful: list[str]
errors: list[str]
warnings: list[str]
classmethod validate_requested_location(v)[source]

Validate requested location coordinates.

Parameters:

v (dict[str, float])

Return type:

dict[str, float]

to_nmdc_schema()[source]

Convert to NMDC schema format.

Return type:

dict[str, Any]

to_gold_schema()[source]

Convert to GOLD schema format.

Return type:

dict[str, Any]

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

Geocoding Models

Forward Geocoding:

Data models for forward geocoding results (place names to coordinates).

class biosample_enricher.forward_geocoding.models.LocationType(value)[source]

Bases: str, Enum

Types of locations that can be geocoded.

COUNTRY = 'country'
STATE = 'state'
CITY = 'city'
TOWN = 'town'
VILLAGE = 'village'
POSTAL_CODE = 'postal_code'
ADDRESS = 'address'
LANDMARK = 'landmark'
NATURAL_FEATURE = 'natural_feature'
ADMINISTRATIVE_AREA = 'administrative_area'
UNKNOWN = 'unknown'
class biosample_enricher.forward_geocoding.models.GeometryType(value)[source]

Bases: str, Enum

Types of geometry returned by geocoding services.

POINT = 'POINT'
BOUNDS = 'BOUNDS'
APPROXIMATE = 'APPROXIMATE'
INTERPOLATED = 'INTERPOLATED'
ROOFTOP = 'ROOFTOP'
class biosample_enricher.forward_geocoding.models.BoundingBox(**data)[source]

Bases: BaseModel

Geographic bounding box for a location.

Parameters:

data (Any)

northeast_lat: float
northeast_lon: float
southwest_lat: float
southwest_lon: float
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class biosample_enricher.forward_geocoding.models.ForwardGeocodeLocation(**data)[source]

Bases: BaseModel

A geocoded location result (place name to coordinates).

Parameters:

data (Any)

input_query: str
formatted_address: str
display_name: str | None
latitude: float
longitude: float
country: str | None
country_code: str | None
state: str | None
state_code: str | None
county: str | None
city: str | None
postal_code: str | None
location_type: LocationType
geometry_type: GeometryType | None
bounding_box: BoundingBox | None
confidence: float | None
relevance: float | None
accuracy_m: float | None
place_id: str | None
osm_id: str | None
osm_type: str | None
importance: float | None
population: int | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class biosample_enricher.forward_geocoding.models.ForwardGeocodeProvider(**data)[source]

Bases: BaseModel

Information about the geocoding provider.

Parameters:

data (Any)

name: str
endpoint: str | None
api_version: str | None
attribution: str | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class biosample_enricher.forward_geocoding.models.ForwardGeocodeResult(**data)[source]

Bases: BaseModel

Complete forward geocoding result with metadata.

Parameters:

data (Any)

query: str
query_type: str | None
locations: list[ForwardGeocodeLocation]
provider: ForwardGeocodeProvider
status: str
error_message: str | None
response_time_ms: float | None
cache_hit: bool
timestamp: datetime
raw_response: dict[str, Any] | None
get_best_match()[source]

Get the highest relevance/confidence location result.

Return type:

ForwardGeocodeLocation | None

get_coordinates()[source]

Get coordinates from best match.

Return type:

tuple[float, float] | None

get_administrative_summary()[source]

Get administrative components from best match.

Return type:

dict[str, str]

to_enrichment_dict()[source]

Convert to dictionary suitable for biosample coordinate enrichment.

Return type:

dict[str, Any]

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
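
get_best_match returns the location with the highest relevance/confidence, and get_coordinates reads the coordinates from that match. The sketch below shows one way such a selection could work, assuming relevance takes priority, confidence is the fallback, and unscored candidates rank last. Candidate is a hypothetical stand-in for ForwardGeocodeLocation, and the actual ranking rule in the package may differ.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical stand-in with only the fields this sketch needs."""

    formatted_address: str
    latitude: float
    longitude: float
    relevance: float | None = None
    confidence: float | None = None


def score(candidate: Candidate) -> float:
    """Use relevance when present, else confidence, else 0.0 (assumed ordering)."""
    for value in (candidate.relevance, candidate.confidence):
        if value is not None:
            return value
    return 0.0


def best_match(candidates: list[Candidate]) -> Candidate | None:
    """Return the highest-scoring candidate, or None for an empty list."""
    return max(candidates, key=score) if candidates else None


def coordinates(candidates: list[Candidate]) -> tuple[float, float] | None:
    """Return (latitude, longitude) of the best match, or None if there is none."""
    best = best_match(candidates)
    return (best.latitude, best.longitude) if best is not None else None


candidates = [
    Candidate("Berkeley, CA, USA", 37.8716, -122.2727, relevance=0.9),
    Candidate("Berkeley, NJ, USA", 39.9, -74.2, relevance=0.4),
    Candidate("Berkeley, Gloucestershire, UK", 51.69, -2.46, confidence=0.7),
]
```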

class biosample_enricher.forward_geocoding.models.ForwardGeocodeFetchResult(**data)[source]

Bases: BaseModel

Internal result from provider fetch operation.

Parameters:

data (Any)

ok: bool
result: ForwardGeocodeResult | None
error: str | None
raw: dict[str, Any]
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

Reverse Geocoding:

Pydantic models for reverse geocoding data normalization and validation.

Provides explicit schema definitions for standardized reverse geocoding results from OSM and Google providers.

class biosample_enricher.reverse_geocoding_models.AddressComponentType(value)[source]

Bases: str, Enum

Types of address components from various providers.

COUNTRY = 'country'
ADMINISTRATIVE_AREA_LEVEL_1 = 'administrative_area_level_1'
ADMINISTRATIVE_AREA_LEVEL_2 = 'administrative_area_level_2'
ADMINISTRATIVE_AREA_LEVEL_3 = 'administrative_area_level_3'
ADMINISTRATIVE_AREA_LEVEL_4 = 'administrative_area_level_4'
ADMINISTRATIVE_AREA_LEVEL_5 = 'administrative_area_level_5'
LOCALITY = 'locality'
SUBLOCALITY = 'sublocality'
SUBLOCALITY_LEVEL_1 = 'sublocality_level_1'
SUBLOCALITY_LEVEL_2 = 'sublocality_level_2'
SUBLOCALITY_LEVEL_3 = 'sublocality_level_3'
SUBLOCALITY_LEVEL_4 = 'sublocality_level_4'
SUBLOCALITY_LEVEL_5 = 'sublocality_level_5'
ROUTE = 'route'
STREET_NUMBER = 'street_number'
STREET_ADDRESS = 'street_address'
PREMISE = 'premise'
SUBPREMISE = 'subpremise'
POSTAL_CODE = 'postal_code'
POSTAL_CODE_PREFIX = 'postal_code_prefix'
POSTAL_CODE_SUFFIX = 'postal_code_suffix'
NATURAL_FEATURE = 'natural_feature'
PARK = 'park'
POINT_OF_INTEREST = 'point_of_interest'
ESTABLISHMENT = 'establishment'
NEIGHBORHOOD = 'neighborhood'
COLLOQUIAL_AREA = 'colloquial_area'
PLUS_CODE = 'plus_code'
POLITICAL = 'political'
INTERSECTION = 'intersection'
CONTINENT = 'continent'
REGION = 'region'
ISLAND = 'island'
ARCHIPELAGO = 'archipelago'
class biosample_enricher.reverse_geocoding_models.AddressComponent(**data)[source]

Bases: BaseModel

Structured address component with type and value.

Parameters:

data (Any)

type: AddressComponentType
short_name: str
long_name: str | None
confidence: float | None
osm_id: str | None
wikidata_id: str | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class biosample_enricher.reverse_geocoding_models.BoundingBox(**data)[source]

Bases: BaseModel

Geographic bounding box.

Parameters:

data (Any)

north: float
south: float
east: float
west: float
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
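
The package documents two BoundingBox models with different field layouts: the forward geocoding model stores corners (northeast_lat, northeast_lon, southwest_lat, southwest_lon), while this one stores edges (north, south, east, west). A small sketch of converting between the two layouts using plain dictionaries keyed by the documented field names; the function names are hypothetical and no such converter is documented in the package.

```python
def corners_to_edges(box: dict) -> dict:
    """Map corner-style fields to edge-style fields."""
    return {
        "north": box["northeast_lat"],
        "south": box["southwest_lat"],
        "east": box["northeast_lon"],
        "west": box["southwest_lon"],
    }


def edges_to_corners(box: dict) -> dict:
    """Map edge-style fields back to corner-style fields."""
    return {
        "northeast_lat": box["north"],
        "northeast_lon": box["east"],
        "southwest_lat": box["south"],
        "southwest_lon": box["west"],
    }


corner_box = {
    "northeast_lat": 37.93,
    "northeast_lon": -122.23,
    "southwest_lat": 37.83,
    "southwest_lon": -122.37,
}
edge_box = corners_to_edges(corner_box)
```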

class biosample_enricher.reverse_geocoding_models.PlaceType(value)[source]

Bases: str, Enum

Types of places that can be returned.

BUILDING = 'building'
HOUSE = 'house'
AMENITY = 'amenity'
SHOP = 'shop'
TOURISM = 'tourism'
HISTORIC = 'historic'
LEISURE = 'leisure'
NATURAL = 'natural'
LANDUSE = 'landuse'
WATERWAY = 'waterway'
HIGHWAY = 'highway'
RAILWAY = 'railway'
AEROWAY = 'aeroway'
BOUNDARY = 'boundary'
PLACE = 'place'
OFFICE = 'office'
EMERGENCY = 'emergency'
MILITARY = 'military'
CRAFT = 'craft'
MAN_MADE = 'man_made'
ESTABLISHMENT = 'establishment'
POINT_OF_INTEREST = 'point_of_interest'
PARK = 'park'
OTHER = 'other'
class biosample_enricher.reverse_geocoding_models.ReverseGeocodeLocation(**data)[source]

Bases: BaseModel

Single reverse geocoding result location.

Parameters:

data (Any)

formatted_address: str
display_name: str | None
components: list[AddressComponent]
country: str | None
country_code: str | None
state: str | None
state_code: str | None
county: str | None
city: str | None
suburb: str | None
postcode: str | None
road: str | None
house_number: str | None
house_name: str | None
place_type: PlaceType | None
place_rank: int | None
importance: float | None
lat: float
lon: float
bounding_box: BoundingBox | None
place_id: str | None
osm_id: str | None
osm_type: str | None
wikidata_id: str | None
wikipedia_url: str | None
licence: str | None
attribution: str | None
distance_m: float | None
confidence: float | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class biosample_enricher.reverse_geocoding_models.ReverseGeocodeProvider(**data)[source]

Bases: BaseModel

Information about the reverse geocoding provider.

Parameters:

data (Any)

name: str
endpoint: str | None
api_version: str | None
rate_limit: int | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class biosample_enricher.reverse_geocoding_models.ReverseGeocodeResult(**data)[source]

Bases: BaseModel

Complete reverse geocoding result with metadata.

Parameters:

data (Any)

query_lat: float
query_lon: float
locations: list[ReverseGeocodeLocation]
provider: ReverseGeocodeProvider
status: str
error_message: str | None
response_time_ms: float | None
cache_hit: bool
timestamp: datetime
raw_response: dict[str, Any] | None
get_best_match()[source]

Get the best matching location (first in list).

Return type:

ReverseGeocodeLocation | None

get_country()[source]

Get country from the best match.

Return type:

str | None

get_formatted_address()[source]

Get formatted address from the best match.

Return type:

str | None

filter_by_type(place_type)[source]

Filter locations by place type.

Parameters:

place_type (PlaceType)

Return type:

list[ReverseGeocodeLocation]

to_simple_dict()[source]

Convert to simple dictionary for easy viewing.

Return type:

dict[str, Any]

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
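
For reverse geocoding, the best match is simply the first location in the list, and filter_by_type narrows the list to one PlaceType. A standard-library sketch of both behaviors; Place is a hypothetical stand-in for ReverseGeocodeLocation, PlaceType is re-declared with a subset of the documented members, and the sample addresses are made up.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class PlaceType(str, Enum):
    """Subset of the documented PlaceType members, re-declared locally."""

    BUILDING = "building"
    NATURAL = "natural"
    PARK = "park"


@dataclass
class Place:
    """Hypothetical stand-in with only the fields this sketch needs."""

    formatted_address: str
    place_type: PlaceType | None = None


def best_match(locations: list[Place]) -> Place | None:
    """Return the first location, which the docs describe as the best match."""
    return locations[0] if locations else None


def filter_by_type(locations: list[Place], place_type: PlaceType) -> list[Place]:
    """Keep only locations whose place_type equals the requested type."""
    return [loc for loc in locations if loc.place_type == place_type]


places = [
    Place("Example Park, Example City", PlaceType.PARK),
    Place("Example Boathouse", PlaceType.BUILDING),
    Place("Unclassified result"),
]
```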

class biosample_enricher.reverse_geocoding_models.ReverseGeocodeFetchResult(**data)[source]

Bases: BaseModel

Internal result from provider fetch operation.

Parameters:

data (Any)

ok: bool
result: ReverseGeocodeResult | None
error: str | None
raw: dict[str, Any]
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

OSM Features Models

Data models for OpenStreetMap geographic features.

class biosample_enricher.osm_features.models.OSMElementType(value)[source]

Bases: str, Enum

Types of OSM elements.

NODE = 'node'
WAY = 'way'
RELATION = 'relation'

class biosample_enricher.osm_features.models.GeometryType(value)[source]

Bases: str, Enum

Types of geometric representations.

POINT = 'point'
LINESTRING = 'linestring'
POLYGON = 'polygon'
MULTIPOLYGON = 'multipolygon'

class biosample_enricher.osm_features.models.FeatureCategory(value)[source]

Bases: str, Enum

Main categories of OSM features.

NATURAL = 'natural'
WATERWAY = 'waterway'
HIGHWAY = 'highway'
RAILWAY = 'railway'
AEROWAY = 'aeroway'
AMENITY = 'amenity'
LEISURE = 'leisure'
LANDUSE = 'landuse'
BUILDING = 'building'
BOUNDARY = 'boundary'
PLACE = 'place'
TOURISM = 'tourism'
SHOP = 'shop'
CRAFT = 'craft'
OFFICE = 'office'
OTHER = 'other'
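
Because these enums subclass both str and Enum, members compare equal to their raw string values and can be looked up directly from the strings found in OSM payloads. The sketch below uses a local replica of OSMElementType for illustration; the real class lives in biosample_enricher.osm_features.models.

```python
from enum import Enum


class OSMElementType(str, Enum):
    """Local replica of the documented enum, for illustration only."""
    NODE = "node"
    WAY = "way"
    RELATION = "relation"


# Lookup by value, as when parsing the "type" field of an Overpass element.
element_type = OSMElementType("way")

# The str mixin makes members comparable with plain strings.
is_way = element_type == "way"

# Unknown values raise ValueError rather than passing silently.
try:
    OSMElementType("area")
    rejected = False
except ValueError:
    rejected = True
```

The same lookup and comparison behaviour applies to GeometryType and FeatureCategory, since they share the same bases.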

class biosample_enricher.osm_features.models.Coordinates(**data)[source]

Bases: BaseModel

Geographic coordinates.

Parameters:

data (Any)

latitude: float
longitude: float
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class biosample_enricher.osm_features.models.OSMNamedFeature(**data)[source]

Bases: BaseModel

A named geographic feature from OpenStreetMap.

Parameters:

data (Any)

osm_type: OSMElementType
osm_id: int
name: str | None
alt_names: list[str]
wikidata_id: str | None
wikipedia: str | None
centroid: Coordinates | None
distance_km: float | None
geometry_type: GeometryType | None
category: FeatureCategory
subcategory: str | None
tags: dict[str, str]
importance: float | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class biosample_enricher.osm_features.models.OSMUnnamedCounts(**data)[source]

Bases: BaseModel

Counts of unnamed features by category and subcategory.

Parameters:

data (Any)

key: str
total_count: int
value_counts: dict[str, dict[str, int]]
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class biosample_enricher.osm_features.models.OSMQuery(**data)[source]

Bases: BaseModel

Parameters for an OSM Overpass query.

Parameters:

data (Any)

center: Coordinates
radius_m: int
timeout_s: int
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
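The three OSMQuery fields map naturally onto an Overpass QL radius search. The sketch below shows one way such a query string could be assembled. build_overpass_query is a hypothetical helper, and the query the package actually sends may differ.

```python
def build_overpass_query(lat: float, lon: float, radius_m: int, timeout_s: int) -> str:
    """Build an Overpass QL query for all elements within radius_m of a point."""
    # The `around` filter takes the radius in metres, then latitude and longitude.
    around = f"(around:{radius_m},{lat},{lon})"
    return (
        f"[out:json][timeout:{timeout_s}];"
        f"(node{around};way{around};relation{around};);"
        "out center;"
    )


query = build_overpass_query(37.8715, -122.273, radius_m=1000, timeout_s=25)
```

Here `out center;` asks Overpass to attach a single representative point to ways and relations, which is the kind of data a centroid field would need.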

class biosample_enricher.osm_features.models.OSMFeaturesResult(**data)[source]

Bases: BaseModel

Complete result from OSM features enrichment.

Parameters:

data (Any)

query: OSMQuery
named_features: list[OSMNamedFeature]
unnamed_counts: list[OSMUnnamedCounts]
total_elements: int
named_features_count: int
unnamed_categories_count: int
total_unnamed_count: int
success: bool
error_message: str | None
response_time_ms: float | None
data_source: str
query_timestamp: datetime
get_features_by_category(category)[source]

Get all named features of a specific category.

Parameters:

category (FeatureCategory)

Return type:

list[OSMNamedFeature]

get_nearest_feature(category)[source]

Get the nearest named feature of a specific category.

Parameters:

category (FeatureCategory)

Return type:

OSMNamedFeature | None

get_feature_counts_by_category()[source]

Get counts of unnamed features by main category.

Return type:

dict[str, int]

get_distance_summary()[source]

Generate distance summary for key feature categories.

Return type:

dict[str, Any]

to_enrichment_dict()[source]

Convert to dictionary suitable for biosample enrichment.

Return type:

dict[str, Any]

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class biosample_enricher.osm_features.models.OSMFetchResult(**data)[source]

Bases: BaseModel

Internal result from OSM Overpass API fetch operation.

Parameters:

data (Any)

ok: bool
result: OSMFeaturesResult | None
error: str | None
raw: dict[str, Any]
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class biosample_enricher.osm_features.models.GooglePlacesFeature(**data)[source]

Bases: BaseModel

A feature from Google Places API.

Parameters:

data (Any)

google_place_id: str
name: str | None
types: list[str]
centroid: Coordinates | None
distance_km: float | None
category: FeatureCategory
subcategory: str | None
rating: float | None
user_ratings_total: int | None
price_level: int | None
business_status: str | None
vicinity: str | None
formatted_address: str | None
icon_url: str | None
photos: list[dict[str, Any]]
plus_code: dict[str, Any] | None
raw_data: dict[str, Any]
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class biosample_enricher.osm_features.models.GooglePlacesResult(**data)[source]

Bases: BaseModel

Result from Google Places API query.

Parameters:

data (Any)

query: Coordinates
radius_m: int
named_features: list[GooglePlacesFeature]
unnamed_counts: list[dict[str, Any]]
total_features: int
success: bool
provider: str
error_message: str | None
to_enrichment_dict()[source]

Convert to dictionary suitable for biosample enrichment.

Return type:

dict[str, Any]

get_nearest_feature(category)[source]

Get nearest feature of specified category.

Parameters:

category (FeatureCategory)

Return type:

GooglePlacesFeature | None

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class biosample_enricher.osm_features.models.GooglePlacesFetchResult(**data)[source]

Bases: BaseModel

Result of fetching from Google Places API.

Parameters:

data (Any)

ok: bool
result: GooglePlacesResult | None
error: str | None
raw: dict[str, Any]
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class biosample_enricher.osm_features.models.CombinedFeaturesResult(**data)[source]

Bases: BaseModel

Combined results from multiple geographic feature providers.

Parameters:

data (Any)

query: Coordinates
radius_m: int
osm_result: OSMFeaturesResult | None
google_result: GooglePlacesResult | None
providers_successful: list[str]
providers_failed: list[str]
combined_enrichment_success: bool
to_enrichment_dict()[source]

Convert combined results to enrichment dictionary.

Return type:

dict[str, Any]

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

Common Patterns

All models follow these conventions:

Type Safety

  • Full type hints using Python 3.11+ syntax

  • Pydantic validation for all external data

  • mypy strict mode compliance

Coordinate Handling

  • Latitude: -90 to 90 decimal degrees

  • Longitude: -180 to 180 decimal degrees

  • Automatic canonicalization to 4 decimal places (about 11 m of precision at the equator)
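
The range checks and rounding described above can be sketched as a single function. canonicalize is a hypothetical name used for illustration; the package's own implementation is not shown here and may behave differently, for example in how it reports errors.

```python
def canonicalize(latitude: float, longitude: float, places: int = 4) -> tuple[float, float]:
    """Validate a coordinate pair and round it to a fixed number of decimals."""
    if not -90.0 <= latitude <= 90.0:
        raise ValueError(f"latitude out of range: {latitude}")
    if not -180.0 <= longitude <= 180.0:
        raise ValueError(f"longitude out of range: {longitude}")
    return round(latitude, places), round(longitude, places)


lat, lon = canonicalize(37.871523, -122.272998)
```

One degree of latitude spans roughly 111 km, so 0.0001 degrees corresponds to about 11 m. The east-west distance per degree of longitude shrinks toward the poles.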

Observation Pattern

Many services return Observation objects with:

  • value_numeric: Numeric measurement (if applicable)

  • value_string: String value (if applicable)

  • provider: Data source information

  • metadata: Additional context

See Also