spras.config package

Submodules

spras.config.algorithms module

Dynamic construction of algorithm parameters with runtime type information for parameter combinations. This is isolated from schema.py because it is not declarative; it mainly contains validators and lower-level pydantic code.

spras.config.config module

This config module is used as a singleton. Because Python creates a single instance of a module when it is imported, we rely on the Snakefile to initialize the module. In particular, when the Snakefile calls init_config, it will reassign config to take the value of the actual config provided by Snakemake. After that point, any module that imports this module can access a config option by checking the object’s value. For example:

    import spras.config.config as config
    container_framework = config.config.container_settings.framework

will grab the container framework configuration option as it appears in the config file.
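The singleton behaviour described above can be sketched with the standard library alone. This is a minimal illustration, not the SPRAS implementation: the module name demo_config and the flat framework attribute are hypothetical stand-ins. It shows why callers should read the value through the module attribute after initialization.

```python
import sys
import types

# Hypothetical stand-in for a config module; not the real spras.config.config.
demo = types.ModuleType("demo_config")
demo.config = None  # placeholder until the module is initialized


def init_global(config_dict):
    """Rebind the module-level `config` attribute (illustrative only)."""
    demo.config = types.SimpleNamespace(**config_dict)


demo.init_global = init_global
sys.modules["demo_config"] = demo  # register so that `import` can find it

import demo_config as first_import
import demo_config as second_import

# One importer initializes the module...
first_import.init_global({"framework": "docker"})

# ...and every other importer sees the same module object and the new value.
same_module = first_import is second_import
framework = second_import.config.framework
```

Note that a name bound with `from module import config` before initialization would keep pointing at the old placeholder, which is why the attribute is read through the module object here.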

class spras.config.config.Config(raw_config: dict[str, Any])

Bases: object

classmethod from_file(filepath: str | PathLike[str])

process_algorithms(raw_config: RawConfig): Parse algorithm information. Each algorithm’s parameters are provided as a list of dictionaries. Defaults are handled in the Python function or class that wraps running that algorithm. Keys in the parameter dictionary are strings.

process_analysis(raw_config: RawConfig)

process_config(raw_config: RawConfig)

process_datasets(raw_config: RawConfig): Parse dataset information. Datasets is initially a list, where each list entry has a dataset label and lists of input files. Convert the dataset list into a dict where the label is the key and update the config data structure.
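The list-to-dict conversion described for process_datasets might look roughly like the following. This is a sketch under assumptions: the helper name datasets_list_to_dict, the file names, and the duplicate-label check are illustrative and are not taken from the SPRAS source.

```python
def datasets_list_to_dict(dataset_list):
    """Index dataset entries by their label (illustrative helper)."""
    datasets = {}
    for entry in dataset_list:
        label = entry["label"]
        # Assumption: duplicate labels are treated as an error.
        if label in datasets:
            raise ValueError(f"Duplicate dataset label: {label}")
        datasets[label] = entry
    return datasets


# Hypothetical input in the shape the docstring describes:
# a list where each entry has a label and lists of input files.
raw_datasets = [
    {"label": "data0", "node_files": ["nodes0.txt"], "edge_files": ["network.txt"]},
    {"label": "data1", "node_files": ["nodes1.txt"], "edge_files": ["network.txt"]},
]

datasets = datasets_list_to_dict(raw_datasets)
```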

spras.config.config.init_from_file(filepath)

spras.config.config.init_global(config_dict)

spras.config.container_schema module

The separate container schema specification file. For information about pydantic, see schema.py.

We move this to a separate file to allow containers.py to explicitly take in this subsection of the configuration.

class spras.config.container_schema.ContainerFramework(*values)

Bases: CaseInsensitiveEnum

apptainer = 'apptainer'

docker = 'docker'

dsub = 'dsub'

singularity = 'singularity'

class spras.config.container_schema.ContainerRegistry(*, base_url: str = 'docker.io', owner: str = 'reedcompbio')

Bases: BaseModel

base_url: str: The domain of the registry

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

owner: str: The owner or project of the registry

class spras.config.container_schema.ContainerSettings(*, framework: ContainerFramework = ContainerFramework.docker, unpack_singularity: bool = False, enable_profiling: bool = False, registry: ContainerRegistry)

Bases: BaseModel

enable_profiling: bool: A Boolean indicating whether to enable container runtime profiling (apptainer/singularity only)

framework: ContainerFramework

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

registry: ContainerRegistry

unpack_singularity: bool

class spras.config.container_schema.ProcessedContainerSettings(framework: spras.config.container_schema.ContainerFramework = <ContainerFramework.docker: 'docker'>, unpack_singularity: bool = False, prefix: str = 'docker.io/reedcompbio', enable_profiling: bool = False, hash_length: int = 7)

Bases: object

enable_profiling: bool = False

framework: ContainerFramework = 'docker'

static from_container_settings(settings: ContainerSettings, hash_length: int) → ProcessedContainerSettings

hash_length: int = 7

The hash length for container-specific usage. This does not appear in the output folder, but it may show up in logs, and it rarely needs to be changed. This will be the top-level hash_length specified in the config.

We prefer this hash_length in our container-running logic to avoid a (future) dependency diamond.

prefix: str = 'docker.io/reedcompbio'

unpack_singularity: bool = False
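The defaults listed above suggest that prefix is derived from the registry's base_url and owner ('docker.io' and 'reedcompbio' give 'docker.io/reedcompbio'). The sketch below illustrates that flattening with plain dataclasses. The class names ending in Sketch are hypothetical, and joining with "/" is an assumption inferred from the defaults, not confirmed from the source.

```python
from dataclasses import dataclass, field


@dataclass
class RegistrySketch:
    base_url: str = "docker.io"
    owner: str = "reedcompbio"


@dataclass
class SettingsSketch:
    registry: RegistrySketch = field(default_factory=RegistrySketch)
    framework: str = "docker"
    unpack_singularity: bool = False
    enable_profiling: bool = False


@dataclass
class ProcessedSketch:
    framework: str = "docker"
    unpack_singularity: bool = False
    prefix: str = "docker.io/reedcompbio"
    enable_profiling: bool = False
    hash_length: int = 7

    @staticmethod
    def from_container_settings(settings, hash_length):
        # Assumption: the prefix is base_url and owner joined by "/".
        prefix = f"{settings.registry.base_url}/{settings.registry.owner}"
        return ProcessedSketch(
            framework=settings.framework,
            unpack_singularity=settings.unpack_singularity,
            prefix=prefix,
            enable_profiling=settings.enable_profiling,
            hash_length=hash_length,
        )


processed = ProcessedSketch.from_container_settings(SettingsSketch(), hash_length=7)
```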

spras.config.dataset module

class spras.config.dataset.DatasetSchema(*, label: ~typing.Annotated[str, ~pydantic.functional_validators.AfterValidator(func=~spras.config.util.label_validator.<locals>.validate)], node_files: list[str | ~os.PathLike[str]], edge_files: list[str | ~os.PathLike[str]], other_files: list[str | ~os.PathLike[str]], data_dir: str | ~os.PathLike[str])

Bases: BaseModel

Collection of information related to Dataset objects in the configuration.

data_dir: str | PathLike[str]

edge_files: list[str | PathLike[str]]

label: Annotated[str, AfterValidator(func=label_validator.<locals>.validate)]

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

node_files: list[str | PathLike[str]]

other_files: list[str | PathLike[str]]

spras.config.revision module

The revision is an optional hash associated with all files in the designated output directory to make sure that file _names_ are immutable. We attach the revision to three labels:

Datasets
Gold standards
Algorithms

In the future, the spras revision may change depending on what files are affected (e.g., specific algorithms will have specific revisions that change as they get updated) to avoid unnecessary running in the Reed-CompBio/spras-benchmarking repository.

This is an optional feature, as the spras_revision function below is dependent on a RECORD file (described in the docstring associated with spras_revision.)

We provide the convenient attach_spras_revision used in ./config.py, and detach_spras_revision used to get rid of the revision for algorithms specifically.

spras.config.revision.attach_spras_revision(immutable_files: bool, label: str) → str

Attaches the SPRAS revision to a label. This function signature may become more complex as specific labels get versioned.

@param immutable_files: If False, this function is equivalent to the identity function (the label is returned unchanged).
@param label: The label to attach the SPRAS revision to.

spras.config.revision.detach_spras_revision(immutable_files: bool, attached_label: str) → str: The inverse of attach_spras_revision.
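The attach/detach pair can be pictured as a suffix round trip. The sketch below is illustrative only: the underscore separator and the explicit revision argument are assumptions made so the example is self-contained. The documented functions take only immutable_files and a label.

```python
def attach_revision(immutable_files, label, revision):
    """Append a revision suffix to a label (hypothetical format)."""
    if not immutable_files:
        # With immutable_files disabled, the label passes through unchanged.
        return label
    return f"{label}_{revision}"


def detach_revision(immutable_files, attached_label, revision):
    """Inverse of attach_revision under the same assumed format."""
    if not immutable_files:
        return attached_label
    suffix = f"_{revision}"
    if not attached_label.endswith(suffix):
        raise ValueError(f"{attached_label!r} does not end with {suffix!r}")
    return attached_label[: -len(suffix)]


attached = attach_revision(True, "pathlinker", "abc1234")
round_trip = detach_revision(True, attached, "abc1234")
unchanged = attach_revision(False, "pathlinker", "abc1234")
```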

spras.config.revision.spras_revision() → str

Gets the current revision of SPRAS.

Note: This depends on neither the SPRAS release version number nor the git commit, but solely on the PyPA RECORD file (https://packaging.python.org/en/latest/specifications/recording-installed-packages/#the-record-file), which contains hashes of all of the installed SPRAS files [excluding RECORD itself] and is also included in the package distribution. This means that, when developing SPRAS, spras_revision is updated when spras is initially installed. However, for editable pip installs (e.g. from pip install -e .), spras_revision will not be updated, as the RECORD file only contains metadata: https://setuptools.pypa.io/en/latest/userguide/development_mode.html.
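One way to derive a revision from a RECORD file is to hash its text, as sketched below. This is not the SPRAS implementation: the function names, the SHA-256 choice, and the truncation length are assumptions. Only importlib.metadata.distribution and Distribution.read_text are real standard-library calls.

```python
import hashlib
from importlib import metadata


def revision_from_record_text(record_text, length=8):
    """Hash RECORD contents into a short hex revision (assumed scheme)."""
    digest = hashlib.sha256(record_text.encode("utf-8")).hexdigest()
    return digest[:length]


def revision_for_distribution(dist_name, length=8):
    """Read an installed distribution's RECORD file and hash it."""
    record_text = metadata.distribution(dist_name).read_text("RECORD")
    if record_text is None:
        # read_text returns None when the distribution has no such file.
        raise FileNotFoundError(f"No RECORD file found for {dist_name!r}")
    return revision_from_record_text(record_text, length)


# Hypothetical RECORD contents, used so the example does not depend on
# any particular package being installed.
sample_record = "pkg/__init__.py,sha256=abc,10\npkg-1.0.dist-info/RECORD,,\n"
revision = revision_from_record_text(sample_record)
changed_revision = revision_from_record_text(sample_record + "pkg/new.py,sha256=def,5\n")
```

Because the hash covers every line of RECORD, adding or changing any installed file changes the revision.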

spras.config.schema module

Contains the raw pydantic schema for the configuration file.

Using Pydantic as our backing config parser allows us to declaratively type our config, giving us more robust user errors with guarantees that parts of the config exist after parsing it through Pydantic.

We declare models using two classes here:

- BaseModel (docs: https://docs.pydantic.dev/latest/concepts/models/)
- CaseInsensitiveEnum (see ./util.py)

class spras.config.schema.Analysis(*, summary: ~spras.config.schema.SummaryAnalysis = SummaryAnalysis(include=False), cytoscape: ~spras.config.schema.CytoscapeAnalysis = CytoscapeAnalysis(include=False), ml: ~spras.config.schema.MlAnalysis = MlAnalysis(include=False, aggregate_per_algorithm=False, components=2, labels=True, kde=False, remove_empty_pathways=False, linkage=<MlLinkage.ward: 'ward'>, metric=<MlMetric.euclidean: 'euclidean'>), evaluation: ~spras.config.schema.EvaluationAnalysis = EvaluationAnalysis(include=False, aggregate_per_algorithm=False))

Bases: BaseModel

cytoscape: CytoscapeAnalysis

evaluation: EvaluationAnalysis

ml: MlAnalysis

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

summary: SummaryAnalysis

class spras.config.schema.CytoscapeAnalysis(*, include: bool)

Bases: BaseModel

include: bool

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class spras.config.schema.EvaluationAnalysis(*, include: bool, aggregate_per_algorithm: bool = False)

Bases: BaseModel

aggregate_per_algorithm: bool

include: bool

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class spras.config.schema.GoldStandard(*, label: ~typing.Annotated[str, ~pydantic.functional_validators.AfterValidator(func=~spras.config.util.label_validator.<locals>.validate)], node_files: list[str] = [], edge_files: list[str] = [], data_dir: str, dataset_labels: list[str])

Bases: BaseModel

data_dir: str

dataset_labels: list[str]

edge_files: list[str]

label: Annotated[str, AfterValidator(func=label_validator.<locals>.validate)]

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

node_files: list[str]

class spras.config.schema.Locations(*, reconstruction_dir: str)

Bases: BaseModel

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

reconstruction_dir: str

class spras.config.schema.MlAnalysis(*, include: bool, aggregate_per_algorithm: bool = False, components: int = 2, labels: bool = True, kde: bool = False, remove_empty_pathways: bool = False, linkage: MlLinkage = MlLinkage.ward, metric: MlMetric = MlMetric.euclidean)

Bases: BaseModel

aggregate_per_algorithm: bool

components: int

include: bool

kde: bool

labels: bool

linkage: MlLinkage

metric: MlMetric

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

remove_empty_pathways: bool

class spras.config.schema.MlLinkage(*values)

Bases: CaseInsensitiveEnum

average = 'average'

complete = 'complete'

single = 'single'

ward = 'ward'

class spras.config.schema.MlMetric(*values)

Bases: CaseInsensitiveEnum

cosine = 'cosine'

euclidean = 'euclidean'

manhattan = 'manhattan'

class spras.config.schema.RawConfig(*, containers: ~spras.config.container_schema.ContainerSettings, immutable_files: bool = False, hash_length: int = 7, algorithms: list[~typing.Annotated[~spras.config.algorithms.allpairsModel | ~spras.config.algorithms.bowtiebuilderModel | ~spras.config.algorithms.diamondModel | ~spras.config.algorithms.dominoModel | ~spras.config.algorithms.meoModel | ~spras.config.algorithms.mincostflowModel | ~spras.config.algorithms.omicsintegrator1Model | ~spras.config.algorithms.omicsintegrator2Model | ~spras.config.algorithms.pathlinkerModel | ~spras.config.algorithms.responsenetModel | ~spras.config.algorithms.rwrModel | ~spras.config.algorithms.strwrModel, FieldInfo(annotation=NoneType, required=True, discriminator='name')]], datasets: list[~spras.config.dataset.DatasetSchema], gold_standards: list[~spras.config.schema.GoldStandard] = [], analysis: ~spras.config.schema.Analysis = Analysis(summary=SummaryAnalysis(include=False), cytoscape=CytoscapeAnalysis(include=False), ml=MlAnalysis(include=False, aggregate_per_algorithm=False, components=2, labels=True, kde=False, remove_empty_pathways=False, linkage=<MlLinkage.ward: 'ward'>, metric=<MlMetric.euclidean: 'euclidean'>), evaluation=EvaluationAnalysis(include=False, aggregate_per_algorithm=False)), reconstruction_settings: ~spras.config.schema.ReconstructionSettings)

Bases: BaseModel

algorithms: list[Annotated[allpairsModel | bowtiebuilderModel | diamondModel | dominoModel | meoModel | mincostflowModel | omicsintegrator1Model | omicsintegrator2Model | pathlinkerModel | responsenetModel | rwrModel | strwrModel, FieldInfo(annotation=NoneType, required=True, discriminator='name')]]

analysis: Analysis

containers: ContainerSettings

datasets: list[DatasetSchema]

gold_standards: list[GoldStandard]

hash_length: int: The length of the hash used to identify a parameter combination

immutable_files: bool

If enabled, this tags all files with their local file version. Most files do not have a specific version, and by default, this will be the hash of all the SPRAS files in the PyPA installation. This option will not work if SPRAS was not installed in a PyPA-compliant manner (PyPA-compliant installations include but are not limited to pip, poetry, uv, conda, pixi.)

By default, this is disabled, as it can make output file names confusing.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

reconstruction_settings: ReconstructionSettings
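For orientation, the nested shape implied by the RawConfig signature can be written as a plain Python dictionary. The field names come from the signatures in this module; the label, file names, and directories are hypothetical, and algorithms is left empty because the per-algorithm models are defined in spras.config.algorithms.

```python
# Top-level fields without defaults in the RawConfig signature.
REQUIRED_TOP_LEVEL = {"containers", "algorithms", "datasets", "reconstruction_settings"}

raw_config = {
    "containers": {
        "framework": "docker",
        "unpack_singularity": False,
        "enable_profiling": False,
        "registry": {"base_url": "docker.io", "owner": "reedcompbio"},
    },
    "hash_length": 7,
    "immutable_files": False,
    "algorithms": [],  # entries are discriminated by their 'name' field
    "datasets": [
        {
            "label": "data0",  # hypothetical label and file names
            "node_files": ["nodes.txt"],
            "edge_files": ["network.txt"],
            "other_files": [],
            "data_dir": "input",
        }
    ],
    "reconstruction_settings": {"locations": {"reconstruction_dir": "output"}},
}

# A quick structural check: which required top-level keys are absent?
missing = REQUIRED_TOP_LEVEL - raw_config.keys()
```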

class spras.config.schema.ReconstructionSettings(*, locations: Locations)

Bases: BaseModel

locations: Locations

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class spras.config.schema.SummaryAnalysis(*, include: bool)

Bases: BaseModel

include: bool

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

spras.config.util module

General config utilities. This is the only config file that should be imported by algorithms, and algorithms should only import this config file.

class spras.config.util.CaseInsensitiveEnum(new_class_name, /, names, *, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: str, Enum

We prefer this over Enum to make sure the config parsing is more relaxed when it comes to string enum values.
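A relaxed string enum of this kind can be built on the standard Enum `_missing_` hook, as in the sketch below. This is one possible approach, not necessarily how CaseInsensitiveEnum is implemented. The class names are hypothetical; the member values mirror ContainerFramework.

```python
from enum import Enum


class CaseInsensitiveEnumSketch(str, Enum):
    """String enum whose value lookup ignores case (illustrative)."""

    @classmethod
    def _missing_(cls, value):
        # Called by Enum only when the exact value lookup fails.
        if isinstance(value, str):
            lowered = value.lower()
            for member in cls:
                if member.value.lower() == lowered:
                    return member
        return None  # Enum then raises ValueError


class FrameworkSketch(CaseInsensitiveEnumSketch):
    apptainer = "apptainer"
    docker = "docker"
    dsub = "dsub"
    singularity = "singularity"


exact = FrameworkSketch("docker")
relaxed = FrameworkSketch("Docker")
```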

class spras.config.util.Empty

Bases: BaseModel

The empty base model. Used for specifying that an algorithm takes no parameters, yet is deterministic.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

spras.config.util.label_validator(name: str): A validator that takes in a label and ensures that it contains only letters, numbers, or underscores.
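The signatures above reference label_validator.<locals>.validate, which suggests a factory that returns a closure. A minimal sketch of that pattern follows. The regular expression, the error message, and the use of name as a description for error messages are assumptions.

```python
import re


def label_validator_sketch(name):
    """Build a validator for labels of the given kind (illustrative)."""
    label_pattern = re.compile(r"[A-Za-z0-9_]+")

    def validate(label):
        # fullmatch requires the entire label to match the pattern.
        if not label_pattern.fullmatch(label):
            raise ValueError(
                f"{name} label '{label}' may only contain letters, numbers, or underscores"
            )
        return label

    return validate


validate_dataset_label = label_validator_sketch("Dataset")
valid_label = validate_dataset_label("data_0")
```

A closure like this can be passed to pydantic's AfterValidator, which is how the label fields in the signatures above are annotated.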
