koa_middleware.store

Classes

CalibrationStore([instrument_name, ...])

Class to manage the storage, retrieval, and synchronization of calibration data between a local database and the remote archive.

class koa_middleware.store.CalibrationStore(instrument_name: str | None = None, cache_dir: str | None = None, local_database_filename: str | None = None, connect_remote: bool = True, use_cached: bool = None, origin: str | None = None, sync_on_init: bool = True)[source]

Bases: object

Class to manage the storage, retrieval, and synchronization of calibration data between a local database and the remote archive.

The CalibrationStore class provides a unified interface for interacting with both local (SQLite) and remote calibration databases. It handles caching of calibration files, querying for specific calibrations, and synchronizing calibration metadata between local and remote repositories.

Constructing this class sets up the necessary directory structure for caching calibration files and initializes the LocalCalibrationDB instance for managing the local SQLite database.

  • Creates the cache_dir, cache_dir/calibrations/<instrument_name>, and cache_dir/database directories if they do not already exist.

  • Initializes self.local_db with a LocalCalibrationDB instance.

  • Initializes self.remote_db with a RemoteCalibrationDB instance (if connect_remote is True).
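
The directory layout listed above can be illustrated with pathlib. This is only a sketch of the described behavior, not the library's implementation; the helper name ensure_cache_layout is hypothetical.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def ensure_cache_layout(cache_dir: str, instrument_name: str) -> dict[str, Path]:
    """Create the cache directories if they are missing (hypothetical helper)."""
    root = Path(cache_dir)
    layout = {
        "root": root,
        "calibrations": root / "calibrations" / instrument_name,
        "database": root / "database",
    }
    for path in layout.values():
        # exist_ok=True makes repeated construction against the same cache safe.
        path.mkdir(parents=True, exist_ok=True)
    return layout


with tempfile.TemporaryDirectory() as tmp:
    ensure_cache_layout(tmp, "hispec")
    # Collect the created directories relative to the cache root.
    created = sorted(p.relative_to(tmp).as_posix() for p in Path(tmp).rglob("*"))
```

Calling the helper a second time on the same cache_dir is harmless, which matches the "if they do not already exist" wording above.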

Parameters:
  • instrument_name (str | None) – The name of the instrument associated with the calibration data (e.g., ‘hispec’, ‘liger’).

  • cache_dir (str | None) – The absolute path to the directory where calibration files and the local SQLite database will be stored. If None, the KOA_CALIBRATION_CACHE environment variable is used. One of the two must be provided.

  • local_database_filename (str | None) – The filename for the local SQLite database. If None, uses the KOA_LOCAL_CALIBRATION_DATABASE_FILENAME environment variable. If that is also None, defaults to f'{instrument_name.lower()}_calibrations.db'.

  • connect_remote (bool, optional) – Set to False to skip initializing the remote database connection. Default is True.

Environment Variables:
  • KOA_CALIBRATION_CACHE (Required) – Path to the cached calibrations directory.

  • KOA_CALIBRATIONS_URL (Optional) – Remote database URL.
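
The fallback order described for cache_dir and local_database_filename (explicit argument, then environment variable, then default) might look like the following. The function name resolve_settings, the env argument, and the choice of ValueError are assumptions made for illustration; the source does not say which exception the class raises.

```python
from __future__ import annotations

import os
from collections.abc import Mapping


def resolve_settings(
    instrument_name: str,
    cache_dir: str | None = None,
    local_database_filename: str | None = None,
    env: Mapping[str, str] = os.environ,
) -> tuple[str, str]:
    """Resolve the cache directory and database filename (hypothetical helper)."""
    cache_dir = cache_dir or env.get("KOA_CALIBRATION_CACHE")
    if not cache_dir:
        # Assumed error type; the real class may signal this differently.
        raise ValueError("Pass cache_dir or set KOA_CALIBRATION_CACHE")
    filename = (
        local_database_filename
        or env.get("KOA_LOCAL_CALIBRATION_DATABASE_FILENAME")
        or f"{instrument_name.lower()}_calibrations.db"
    )
    return cache_dir, filename


# Only the cache directory comes from the (fake) environment here.
resolved = resolve_settings("HISPEC", env={"KOA_CALIBRATION_CACHE": "/tmp/koa_cache/"})
explicit = resolve_settings("HISPEC", cache_dir="/data", local_database_filename="my.db", env={})
```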

Examples

>>> import os
>>> from koa_middleware import CalibrationStore
>>> # Initialize with explicit parameters
>>> store = CalibrationStore(
...     instrument_name='hispec',
...     cache_dir='/tmp/koa_cache/',
...     local_database_filename='hispec_calibrations.db',
...     connect_remote=False
... )
>>> # Initialize using environment variables (set here for illustration)
>>> os.environ['KOA_CALIBRATION_CACHE'] = '/tmp/koa_cache/'
>>> store = CalibrationStore(instrument_name='hispec')

calibration_file_in_cache(cal: dict | str | SupportsCalibrationModelIO, filename: str | None = None) str | None[source]

Checks if a calibration file is already present in the local cache.

Parameters:
  • cal (dict | str | SupportsCalibrationModelIO) –

    Can be one of:
    • str : A calibration ID string or filepath.

    • dict : A calibration metadata dict.

    • SupportsCalibrationModelIO : A calibration data model instance.

  • filename (str | None) – The filename to check for. If None, the filename will be extracted from the input cal parameter.

Returns:

filepath – The absolute local file path if the calibration file is found in the cache, otherwise None.

Return type:

str | None

calibration_record_in_cache(cal: dict | str | SupportsCalibrationModelIO, mode: str = 'id') dict | None[source]

Checks if a calibration is already present in the local cache.

Parameters:
  • cal (dict | str | SupportsCalibrationModelIO) –

    Can be one of:
    • str : A calibration ID string or filepath.

    • dict : A calibration metadata dict.

    • SupportsCalibrationModelIO : A calibration data model instance.

  • mode (str) –

    The mode to check the cache. Can be one of:
    • ‘id’ : Check by calibration ID (cal_id), the primary key in the database.

    • ‘version-family’ : Check by the version family values.

    • ‘md5’ : Check by the MD5 checksum of the calibration file.

Returns:

The calibration metadata record if found, otherwise None.

Return type:

dict | None
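
The three lookup modes can be pictured as matching on different sets of keys. The sketch below works on a plain list of dicts rather than the local SQLite database. The key name 'md5' is an assumption, and the version family columns are the defaults named under get_version_family_column_names().

```python
from __future__ import annotations

# Default version family columns named in this reference.
FAMILY_COLUMNS = ("cal_type", "datetime_obs")


def find_record(records: list[dict], cal: dict, mode: str = "id") -> dict | None:
    """Return the first stored record matching cal under the given mode."""
    if mode == "id":
        keys = ("cal_id",)
    elif mode == "version-family":
        keys = FAMILY_COLUMNS
    elif mode == "md5":
        keys = ("md5",)  # assumed column name for the checksum
    else:
        raise ValueError(f"Unknown mode: {mode!r}")
    for record in records:
        # Every key must be present in cal and equal to the stored value.
        if all(k in cal and record.get(k) == cal[k] for k in keys):
            return record
    return None


stored = [
    {"cal_id": "a1", "cal_type": "dark", "datetime_obs": "2025-01-01T00:00:00", "md5": "abc"}
]
by_id = find_record(stored, {"cal_id": "a1"})
by_md5 = find_record(stored, {"cal_id": "zz", "md5": "abc"}, mode="md5")
no_match = find_record(stored, {"cal_id": "zz"}, mode="id")
```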

close()[source]

Closes the connections to the local DB. Currently nothing is done to close the remote DB. The Keck Login session is cached for re-use within the same Python session.

detect_version_issues()[source]

download_calibration_file(calibration: dict | str) str[source]

Downloads a calibration file from the remote DB. This does not register the calibration in the local DB. Most use cases should use store.get_calibration() instead.

Parameters:

calibration (dict | str) – A calibration metadata dictionary or calibration ID string.

Returns:

The absolute local file path where the calibration file was downloaded.

Return type:

str

generate_calibration_version(cal: dict | SupportsCalibrationModelIO, origin: str | None = None) str[source]

Generate the next calibration version (“001”, “002”, …), scoped to the calibration’s version family and origin.

Parameters:
  • cal (dict | SupportsCalibrationModelIO) – The calibration record for which to generate the version. Must contain the necessary metadata fields to determine its version family (e.g. cal_type, datetime_obs, master_cal, spectrograph).

  • origin (str | None, optional) – The origin to use for generating the version. If None, the origin from the calibration metadata will be used.

Returns:

The calibration version string

Return type:

str
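
As an illustration of version numbering scoped to a family and origin, the following sketch counts existing versions in a list of records and returns the next zero-padded string. The record keys 'version' and 'origin' and the function name next_version are assumptions; the actual method queries the store's databases.

```python
from __future__ import annotations


def next_version(existing: list[dict], family: dict, origin: str) -> str:
    """Return the next three-digit version within one family and origin."""
    versions = [
        int(rec["version"])  # assumed key holding strings such as "001"
        for rec in existing
        if rec.get("origin") == origin
        and all(rec.get(key) == value for key, value in family.items())
    ]
    # With no prior versions, max() falls back to 0 and the result is "001".
    return f"{max(versions, default=0) + 1:03d}"


family = {"cal_type": "dark", "datetime_obs": "2025-01-01T00:00:00"}
existing = [
    {**family, "origin": "LOCAL", "version": "001"},
    {**family, "origin": "LOCAL", "version": "002"},
    {**family, "origin": "ARCHIVE", "version": "001"},
]
local_next = next_version(existing, family, "LOCAL")
fresh = next_version(existing, {"cal_type": "flat"}, "LOCAL")
```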

get_calibration(cal: dict | str) tuple[str, dict][source]

Retrieves the calibration file based on its record or ID. Checks if the calibration is already cached locally, and downloads it if not.

Parameters:

cal (dict | str) – A calibration metadata dictionary, calibration ID string, or local filepath string.

Returns:

result

  • str: The absolute local file path where the calibration file is stored.

  • dict: The calibration metadata dictionary as stored in the local database.

Return type:

tuple[str, dict]
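
The cache-then-download behavior described for get_calibration() follows a common pattern, sketched here with a stand-in download function. The names get_or_download and fake_download are hypothetical, and the real method also returns the metadata record.

```python
from __future__ import annotations

import tempfile
from pathlib import Path
from typing import Callable


def get_or_download(filename: str, cache_dir: Path, download: Callable[[Path], None]) -> str:
    """Return the cached file path, fetching the file only when it is absent."""
    target = cache_dir / filename
    if not target.exists():
        download(target)
    return str(target.resolve())


calls: list[str] = []


def fake_download(path: Path) -> None:
    # Stand-in for a remote fetch; records each call so reuse can be observed.
    calls.append(path.name)
    path.write_bytes(b"calibration data")


with tempfile.TemporaryDirectory() as tmp:
    first = get_or_download("dark_001.fits", Path(tmp), fake_download)
    second = get_or_download("dark_001.fits", Path(tmp), fake_download)
```

The second call finds the file already cached, so the download function runs only once.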

get_last_updated(source: str | None = None, **kwargs) str | None[source]

Get the last updated timestamp for the instrument’s calibration data.

Parameters:
  • source (str | None) – Whether to query from the ‘local’ or ‘remote’ database. If None, defaults to ‘remote’ if available, otherwise ‘local’.

  • **kwargs – Additional parameters to pass to local_db.get_last_updated() or remote_db.get_last_updated().

Returns:

The last updated timestamp as a string, or None if no entries exist.

Return type:

str | None

get_missing_local_files() list[dict][source]

Identifies all calibration files that are recorded in the local sqlite DB but are missing from the local cache directory.

Returns:

A list of calibration metadata dictionaries for calibrations that are missing from the local cache.

Return type:

list[dict]

get_missing_records(source: str = 'remote', mode: str = 'id') list[dict][source]

Identifies calibration entries present in one database but missing from another.

Parameters:
  • source (str, optional) –

    • ‘remote’ (default): Returns entries in remote DB but not in local DB.

    • ‘local’: Returns entries in local DB but not in remote DB.

  • mode (str, optional) –

    The mode to determine which entries are considered missing. Options are:
    • ‘id’ (default): Entries whose IDs are not present in the target database.

    • ‘last_updated’: Entries with a last_updated timestamp greater than the most recent timestamp in the target database.

Returns:

A list of dictionaries of metadata representing entries that are in the source DB but not yet in the target DB.

Return type:

list[dict]
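
The two comparison modes can be sketched over plain lists of records. This assumes the ID key is 'cal_id' and that last_updated values are ISO 8601 strings, which sort correctly as text; neither detail is confirmed by this reference.

```python
from __future__ import annotations


def missing_records(source: list[dict], target: list[dict], mode: str = "id") -> list[dict]:
    """Return source records that are considered absent from target."""
    if mode == "id":
        target_ids = {rec["cal_id"] for rec in target}
        return [rec for rec in source if rec["cal_id"] not in target_ids]
    if mode == "last_updated":
        # An empty target yields "", so every source record counts as newer.
        latest = max((rec["last_updated"] for rec in target), default="")
        return [rec for rec in source if rec["last_updated"] > latest]
    raise ValueError(f"Unknown mode: {mode!r}")


remote = [
    {"cal_id": "a1", "last_updated": "2025-01-01T00:00:00"},
    {"cal_id": "b2", "last_updated": "2025-03-01T00:00:00"},
]
local = [{"cal_id": "a1", "last_updated": "2025-02-01T00:00:00"}]

by_id = missing_records(remote, local, mode="id")
by_time = missing_records(remote, local, mode="last_updated")
```

Note that the ‘last_updated’ mode can miss an older remote entry that was never copied locally, whereas the ‘id’ mode compares the full set of IDs.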

get_version_family_column_names(cal_type: str)[source]

Retrieves the column names for the version family attributes. By default, this includes ‘cal_type’ and ‘datetime_obs’, but subclasses should override this method to specify different or additional columns for different calibration types.

Parameters:

cal_type (str) – The type of calibration.

get_version_family_values(cal: dict) dict[source]

Retrieves the fields/values that determine whether or not a calibration requires a new version.

Parameters:
  • cal (dict) – A calibration metadata record. One key must be ‘cal_type’ to determine the calibration type and thus the version family fields.

Returns:

A dictionary containing only the keys/values for metadata that determines the version family.

Return type:

dict
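
The relationship between get_version_family_column_names() and get_version_family_values(), including the subclass override mentioned above, can be illustrated with stand-in classes. ToyStore and ToySpectrographStore are hypothetical and do not derive from CalibrationStore. The extra 'spectrograph' column is one of the example fields named under generate_calibration_version().

```python
from __future__ import annotations


class ToyStore:
    """Stand-in showing the default version family columns."""

    def get_version_family_column_names(self, cal_type: str) -> list[str]:
        return ["cal_type", "datetime_obs"]

    def get_version_family_values(self, cal: dict) -> dict:
        columns = self.get_version_family_column_names(cal["cal_type"])
        # Keep only the metadata that defines the version family.
        return {name: cal[name] for name in columns}


class ToySpectrographStore(ToyStore):
    """Stand-in subclass that adds a column for one calibration type."""

    def get_version_family_column_names(self, cal_type: str) -> list[str]:
        columns = super().get_version_family_column_names(cal_type)
        if cal_type == "flat":
            columns.append("spectrograph")
        return columns


cal = {
    "cal_type": "flat",
    "datetime_obs": "2025-01-01T00:00:00",
    "spectrograph": "blue",
    "filename": "flat_001.fits",
}
default_family = ToyStore().get_version_family_values(cal)
extended_family = ToySpectrographStore().get_version_family_values(cal)
```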

query(source: str | None = None, **kwargs) list[dict] | dict | None[source]

Query calibrations from local or remote database.

Users can also query the local and remote databases directly using store.local_db.query() and store.remote_db.query().

This method may be removed in the future if not found useful.

Parameters:
  • source (str | None) – Whether to query from the ‘local’ or ‘remote’ database. If None, defaults to ‘local’.

  • **kwargs – Additional parameters to pass to the underlying query method.

Returns:

Query results from the specified source.

Return type:

list[dict] | dict | None

record_from(cal: dict | SupportsCalibrationModelIO) dict[source]

Extracts a calibration record dictionary from a given input.

Parameters:

cal (dict | SupportsCalibrationModelIO) – The input from which to extract the calibration record. Can be a dict or any object with a to_record() method.

Returns:

A dictionary representing the calibration record.

Return type:

dict

Raises:

ValueError – If the input type is invalid or if the object does not have a to_record() method.
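
A minimal sketch of the dispatch described above: dicts pass through, objects exposing to_record() are converted, and anything else raises ValueError. The DarkFrame class is hypothetical, and whether the real method copies the input dict is not stated in this reference.

```python
from __future__ import annotations

from typing import Any


def record_from(cal: Any) -> dict:
    """Extract a record dict from a dict or an object with to_record()."""
    if isinstance(cal, dict):
        return cal
    to_record = getattr(cal, "to_record", None)
    if callable(to_record):
        return to_record()
    raise ValueError(f"Cannot extract a calibration record from {type(cal).__name__}")


class DarkFrame:
    """Hypothetical data model exposing to_record()."""

    def to_record(self) -> dict:
        return {"cal_type": "dark", "datetime_obs": "2025-01-01T00:00:00"}


from_dict = record_from({"cal_type": "flat"})
from_model = record_from(DarkFrame())
```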

register_calibration(cal: SupportsCalibrationModelIO, origin: str | None = None, new_version: bool = False) tuple[str, dict][source]

Registers a calibration to the local cache and metadata database.

Parameters:
  • cal (SupportsCalibrationModelIO) – The datamodel object to register.

  • origin (str, optional) – The origin to register the calibration under.

  • new_version (bool, optional) – Whether to generate a new version for this calibration. If False, the method will check if a calibration with the same version family already exists in the cache and skip registration if so. Defaults to False.

Returns:

A tuple containing:
  • str: The local file path where the calibration was saved.

  • dict: The calibration metadata dictionary as added to the database.

Return type:

tuple[str, dict]

save_calibration_file(cal: SupportsCalibrationModelIO, cal_record: dict | None = None) str[source]

Saves a calibration file to the local cache directory.

Parameters:
  • cal (SupportsCalibrationModelIO) – The calibration data model instance to save.

  • cal_record (dict | None) – The corresponding record.

Returns:

The absolute local file path where the calibration file was saved.

Return type:

str

select_and_get_calibration(input, selector: CalibrationSelector) tuple[str, dict][source]

Selects the best calibration based on input data and a selection rule, then retrieves it.

This method uses a CalibrationSelector to identify the most appropriate calibration for the given input data. Once selected, it retrieves the calibration file, downloading it if it’s not already cached locally.

Parameters:
  • input – The input data product for which a calibration is needed.

  • selector (CalibrationSelector) – An instance of a CalibrationSelector class.

Returns:

A tuple containing:
  • str: The local file path of the retrieved calibration file.

  • dict: The record of the selected calibration from the local database.

Return type:

tuple[str, dict]

Example

>>> # Assuming my_input_data and my_selector are defined
>>> local_filepath, calibration_record = store.select_and_get_calibration(my_input_data, my_selector)
>>> print(f"Calibration file: {local_filepath}")
>>> print(f"Calibration ID: {calibration_record['id']}")

sync_records_from_cached_files(cals: SupportsCalibrationModelIO | Sequence[SupportsCalibrationModelIO]) None[source]

Populates the local database from existing cached calibration files.

Parameters:

cals (SupportsCalibrationModelIO | Sequence[SupportsCalibrationModelIO]) – A single calibration metadata dictionary or a data model instance, or a list of these.

Notes

This method may be removed in the future if not found useful.

sync_records_from_remote(cals, mode: str = 'id') list[dict][source]

Synchronizes the local database with the remote database.

This method fetches entries from the remote database that are missing from the local database based on the mode parameter, see below. It then adds these missing entries to the local database.

Parameters:

mode (str, optional) –

The mode to determine which entries are considered missing. Options are:

  • ‘last_updated’: Entries with a last_updated timestamp greater than the most recent timestamp in the local database.

  • ‘id’ (default): Entries whose IDs are not present in the local database.

Returns:

cals – A list of dictionaries representing calibration entries that were added to the local database during synchronization.

Return type:

list[dict]
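
The ‘id’ mode of this synchronization can be sketched against an in-memory SQLite database using the standard sqlite3 module. The table name calibrations, its two columns, and the function sync_from_remote are assumptions for illustration; the actual schema is managed by LocalCalibrationDB.

```python
from __future__ import annotations

import sqlite3


def sync_from_remote(conn: sqlite3.Connection, remote_records: list[dict]) -> list[dict]:
    """Insert remote records whose IDs are absent locally and return them."""
    local_ids = {row[0] for row in conn.execute("SELECT cal_id FROM calibrations")}
    added = [rec for rec in remote_records if rec["cal_id"] not in local_ids]
    conn.executemany(
        "INSERT INTO calibrations (cal_id, cal_type) VALUES (:cal_id, :cal_type)",
        added,
    )
    conn.commit()
    return added


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calibrations (cal_id TEXT PRIMARY KEY, cal_type TEXT)")
conn.execute("INSERT INTO calibrations VALUES ('a1', 'dark')")

remote = [
    {"cal_id": "a1", "cal_type": "dark"},
    {"cal_id": "b2", "cal_type": "flat"},
]
added = sync_from_remote(conn, remote)
again = sync_from_remote(conn, remote)
count = conn.execute("SELECT COUNT(*) FROM calibrations").fetchone()[0]
```

Running the sync twice adds nothing the second time, since every remote ID is then present locally.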

sync_records_to_remote(mode: str = 'id') list[dict][source]

Uploads local calibration entries to the remote DB.

Parameters:

mode (str, optional) –

The mode to determine which entries are considered missing. Options are:

  • ‘last_updated’: Entries with a last_updated timestamp greater than the most recent timestamp in the remote database.

  • ‘id’ (default): Entries whose IDs are not present in the remote database.

Returns:

cals – A list of dictionaries representing calibration entries that were added to the remote database during synchronization.

Return type:

list[dict]