API Reference
Here you can find documentation on all classes and their methods in Surround.
Assembler
class surround.assembler.Assembler(assembler_name='', config=None)
Class responsible for assembling and executing a Surround pipeline.
Responsibilities:
- Encapsulate the configuration data and pipeline stages
- Load configuration from a specified module
- Run the pipeline with input data in predict/batch/train mode
For more information on this process, see the About page.
Example:
    assembler = Assembler("Example pipeline")
    assembler.set_stages([PreFilter(), PredictStage(), PostFilter()])
    assembler.init_assembler()
    data = AssemblyState("some data")
    assembler.run(data, mode=RunMode.PREDICT)
Batch-predict mode:
    assembler.run(data, mode=RunMode.BATCH_PREDICT)
Training mode:
    assembler.run(data, mode=RunMode.TRAIN)
Predict/Estimate mode:
    assembler.run(data, mode=RunMode.PREDICT)
Constructor for an Assembler pipeline.
Parameters:
- assembler_name (str) – the name of the pipeline
- config (surround.config.Config) – the Surround Config object
init_assembler()
Initializes the assembler and all of its stages.
Calls the surround.stage.Stage.initialise() method of all stages and the estimator.
Note: Should be called after surround.assembler.Assembler.set_config().
Returns: whether the initialisation was successful
Return type: bool
load_config(module)
Given a module contained in the root of the project, create an instance of surround.config.Config, loading configuration data from the config.yaml found in the project, and use this configuration for the pipeline.
Note: Should be called before surround.assembler.Assembler.init_assembler().
Parameters: module (str) – name of the module
run(state=None, mode=<RunMode.PREDICT: 2>)
Run the pipeline using the input data provided.
If mode is RunMode.TRAIN, then when the pipeline reaches the estimator it uses the surround.stage.Estimator.fit() method instead of surround.stage.Estimator.estimate().
If surround.enable_stage_output_dump is enabled in the Config instance, then each stage's and the estimator's surround.stage.Stage.dump_output() method will be called.
This method doesn't return anything; instead, results should be stored in the state object passed in the parameters.
Parameters:
- state (surround.State) – data passed between each stage in the pipeline
- mode (surround.runners.RunMode) – the mode to run the pipeline in (default: RunMode.PREDICT)
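The mode handling described above can be modelled in a few lines of plain Python. This is a simplified, self-contained sketch, not Surround's implementation: `Mode`, `State`, `run_pipeline`, and the `operate` method on plain stages are hypothetical names (this page only documents `initialise` and `dump_output` on `Stage`), and a plain dict stands in for `surround.config.Config`. It shows estimators switching between `fit()` and `estimate()`, optional output dumping, and a run that stops once `state.errors` is non-empty.

```python
from enum import Enum


class Mode(Enum):
    """Hypothetical stand-in for surround.runners.RunMode."""
    PREDICT = "predict"
    TRAIN = "train"


class State:
    """Minimal state: an errors list plus a log of which methods ran."""

    def __init__(self):
        self.errors = []
        self.log = []


class Stage:
    def operate(self, state, config):
        # Hypothetical work method for non-estimator stages.
        state.log.append(f"{type(self).__name__}.operate")

    def dump_output(self, state, config):
        state.log.append(f"{type(self).__name__}.dump_output")


class Estimator(Stage):
    def estimate(self, state, config):
        state.log.append(f"{type(self).__name__}.estimate")

    def fit(self, state, config):
        state.log.append(f"{type(self).__name__}.fit")


def run_pipeline(stages, state, config, mode=Mode.PREDICT):
    """Run stages in order; estimators use fit() in TRAIN mode and estimate() otherwise."""
    dump = config.get("surround", {}).get("enable_stage_output_dump", False)
    for stage in stages:
        if isinstance(stage, Estimator):
            if mode is Mode.TRAIN:
                stage.fit(state, config)
            else:
                stage.estimate(state, config)
        else:
            stage.operate(state, config)
        if dump:
            stage.dump_output(state, config)
        # Any recorded error stops the pipeline before the next stage runs.
        if state.errors:
            break


class PreFilter(Stage):
    pass


class Predict(Estimator):
    pass


state = State()
run_pipeline([PreFilter(), Predict()], state, {}, Mode.TRAIN)
print(state.log)  # ['PreFilter.operate', 'Predict.fit']
```

Keeping the dispatch in one loop makes the difference between modes a single branch, which is why one pipeline definition can serve both training and prediction.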
set_config(config)
Set the configuration data to be used during pipeline execution.
Note: Should be called before surround.assembler.Assembler.init_assembler().
Parameters: config (surround.config.Config) – the configuration data
set_finaliser(finaliser)
Set the final stage that will be executed no matter how the pipeline runs, even when the pipeline fails or raises an error.
Parameters: finaliser (surround.stage.Stage) – the final stage instance
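The "no matter how the pipeline runs" guarantee maps naturally onto Python's `try`/`finally`. The sketch below is a hypothetical illustration that uses plain functions as stages; whether the real Assembler re-raises the error or records it in the state is not specified on this page.

```python
def run_with_finaliser(stages, finaliser, state):
    """Run each stage in order, then always run the finaliser, even if a stage raises."""
    try:
        for stage in stages:
            stage(state)
    finally:
        # Executes on success and when an exception propagates out of a stage.
        finaliser(state)


events = []


def load(state):
    events.append("load")


def crash(state):
    raise RuntimeError("stage failed")


def finalise(state):
    events.append("finaliser")


try:
    run_with_finaliser([load, crash, load], finalise, state=None)
except RuntimeError:
    events.append("caught")

print(events)  # ['load', 'finaliser', 'caught']
```

Note that the third stage never runs: the finaliser is a cleanup hook, not a way to resume the pipeline.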
set_stages(stages)
Set the stages to be executed one after the other in the pipeline.
Parameters: stages (list of surround.stage.Stage) – list of stages to execute
Config
class surround.config.Config(project_root=None, package_path=None, auto_load=False)
An iterable dictionary class that loads and stores all the configuration settings from both default and project YAML files and environment variables. Primarily used in stages to retrieve configuration data set for development/production.
Responsibilities:
- Parse the config.yaml file and store the data as key-value pairs.
- Allow environment variables to override data loaded from file/dict (variables must be prefixed with SURROUND_).
- Provide READ-ONLY access to the stored config values via the [] operator and iteration.
Example usage:
    config = Config()
    config.read_from_dict({ "debug": True })
    config.read_config_files(["config.yaml"])
    if config["debug"]:
        pass  # Do debug stuff
    for key, value in config:
        pass  # Iterate over all data
You could then override the above configuration using the system's environment variables; just prefix the variable name with SURROUND_, like so:
SURROUND_DEBUG=False
It also supports overriding nested configuration data, for example with the following config:
    predict:
        debug: True
We can override the above with the following environment variable:
SURROUND_PREDICT_DEBUG=False
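One way to map SURROUND_-prefixed variables onto nested keys is to match the rest of the name against existing keys level by level. This is an illustrative sketch, not Surround's actual parser: `apply_env_overrides`, `override`, and `parse_value` are hypothetical helpers, only keys that already exist are overridden, and values are converted with a small YAML-like rule (booleans, integers, floats, otherwise strings). Matching longer keys first lets names that themselves contain underscores, such as `enable_stage_output_dump`, resolve correctly.

```python
def parse_value(raw):
    """Convert common YAML-style literals; anything else stays a string."""
    if raw.lower() in ("true", "false"):
        return raw.lower() == "true"
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw


def override(node, remainder, value):
    """Set the existing key matching `remainder`, descending into nested dicts."""
    # Try longer keys first so "ENABLE_STAGE_OUTPUT_DUMP" is matched as one key.
    for key in sorted(node, key=len, reverse=True):
        upper = key.upper()
        if remainder == upper:
            node[key] = value
            return True
        if remainder.startswith(upper + "_") and isinstance(node[key], dict):
            if override(node[key], remainder[len(upper) + 1:], value):
                return True
    return False


def apply_env_overrides(config, environ, prefix="SURROUND_"):
    """Override existing config keys from environment variables carrying `prefix`."""
    for name, raw in environ.items():
        if name.startswith(prefix):
            override(config, name[len(prefix):], parse_value(raw))
    return config


config = {"debug": True, "predict": {"debug": True}}
env = {"SURROUND_DEBUG": "False", "SURROUND_PREDICT_DEBUG": "False", "HOME": "/root"}
print(apply_env_overrides(config, env))
# {'debug': False, 'predict': {'debug': False}}
```

In practice you would pass `os.environ` as `environ`.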
Constructor of the Config class; loads the default YAML file into storage. If project_root is provided, then the project's config.yaml file is also loaded into the configuration.
The default config file (defaults.yaml) can be found in the same directory as the config.py script. The project config file (config.yaml) can be found in the root of the project folder.
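Loading the defaults first and the project's config.yaml second implies that project values take precedence. A common way to layer two parsed YAML documents is a recursive merge, sketched below under the assumption that both files have already been parsed into dicts (for example with PyYAML). `deep_merge` is a hypothetical helper, and this page does not state whether Surround merges nested sections key by key or replaces them wholesale.

```python
import copy


def deep_merge(defaults, project):
    """Return a new dict in which project values override defaults, merging nested dicts key by key."""
    merged = copy.deepcopy(defaults)
    for key, value in project.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


defaults = {"surround": {"enable_stage_output_dump": False, "stages": ["PreFilter"]}, "debug": False}
project = {"surround": {"enable_stage_output_dump": True}, "models_path": "models"}
print(deep_merge(defaults, project))
# {'surround': {'enable_stage_output_dump': True, 'stages': ['PreFilter']}, 'debug': False, 'models_path': 'models'}
```

Copying instead of mutating keeps the defaults reusable if several projects are loaded in one process.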
Parameters: - project_root (str) – path to the root directory of the surround project (default: None)
- package_path (str) – path to the root directory of the package that contains the surround project (default: None)
- auto_load (bool) – Attempt to load the config.yaml file from the Surround project in the current directory (default: False)
get_dict()
Returns the configuration data in a dictionary.
Returns: dictionary of the configuration data
Return type: dict
get_path(path)
Returns the value found at the dot-separated key path provided (useful for nested values).
For example:
    config.get_path('surround.stages') == config['surround']['stages']  # --> True
Parameters: path (str) – path to the value in storage
Returns: the value found at the path, or None if not found
Return type: any
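A dot-separated lookup like get_path can be written as a loop over the path segments. This hypothetical standalone function operates on a plain nested dict rather than a Config instance and mirrors the documented behaviour of returning None for a missing path. Metadata.get_property, documented further down, describes the same dot-notation lookup.

```python
def get_path(data, path):
    """Follow a dot-separated key path through nested dicts; return None if any step is missing."""
    node = data
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


config = {"surround": {"stages": ["PreFilter", "Predict"]}}
print(get_path(config, "surround.stages") == config["surround"]["stages"])  # True
print(get_path(config, "surround.missing"))  # None
```

One limitation of this design: a key that is present but holds None is indistinguishable from a missing key.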
State
class surround.State
Stores the data to be passed between each stage in a pipeline. Each stage is responsible for setting the attributes of this class.
Formerly known as SurroundData.
Attributes:
- stage_metadata (list) – information that can be used to identify the stage
- execution_time (str) – how long it took to execute the entire pipeline
- errors (list) – list of error messages (stops the pipeline when appended to)
- warnings (list) – list of warning messages (displayed in console)
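Example:
    from surround import State
    from surround.assembler import Assembler
    from surround.stage import Estimator

    class AssemblyState(State):
        # Extra attributes must be defined before the pipeline is run!
        input_data = None
        output_data = None

        def __init__(self, input_data):
            super().__init__()
            self.input_data = input_data

    class Predict(Estimator):
        def estimate(self, state, config):
            # Do prediction here (placeholder: echoes the input)
            state.output_data = state.input_data

    pipeline = Assembler("Example")
    pipeline.set_stages([Predict()])
    pipeline.init_assembler()
    data = AssemblyState("received data")
    pipeline.run(data)
    print(data.output_data)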
Note: This class is frozen while the pipeline is running. This means that an exception will be raised if a new attribute is added during pipeline execution.
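The freezing behaviour explains why extra attributes must be declared on the subclass before the pipeline runs. Below is a minimal sketch of the pattern, assuming a class-level flag and a `__setattr__` guard. `FrozenState`, `freeze`, and `thaw` are hypothetical names, and `AttributeError` is this sketch's choice of exception, since the page does not name the one Surround raises.

```python
class FrozenState:
    """Sketch of a state object that rejects new attributes once frozen."""

    _frozen = False

    def freeze(self):
        # Bypass our own __setattr__ so the flag itself can always be set.
        object.__setattr__(self, "_frozen", True)

    def thaw(self):
        object.__setattr__(self, "_frozen", False)

    def __setattr__(self, name, value):
        # Existing attributes (including class-level defaults) may still be updated.
        if self._frozen and not hasattr(self, name):
            raise AttributeError(f"cannot add attribute {name!r} while the state is frozen")
        super().__setattr__(name, value)


class AssemblyState(FrozenState):
    # Declared up front, so stages may assign them while the pipeline runs.
    input_data = None
    output_data = None


state = AssemblyState()
state.input_data = "received data"
state.freeze()                    # what an assembler would do before running stages
state.output_data = "prediction"  # allowed: declared on the class
try:
    state.confidence = 0.9        # rejected: not declared before the run
except AttributeError as error:
    print(error)
state.thaw()
```

Freezing catches a common bug early: a stage that misspells an attribute name fails loudly instead of silently writing to a new attribute.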
Stage
class surround.stage.Stage
Base class of all stages in a Surround pipeline.
See the following class for more information:
dump_output(state, config)
Dump the output of the stage after the stage has transformed the data.
Note: This is called by surround.assembler.Assembler.run() (when dumping output is requested).
Parameters:
- state (instance or child of the surround.State class) – stores intermediate data from each stage in the pipeline
- config (surround.config.Config) – config of the pipeline
initialise(config)
Initialise the stage. This may involve loading a model or loading data.
Note: This is called by surround.assembler.Assembler.init_assembler().
Parameters: config (surround.config.Config) – contains the settings for each stage
Estimator
class surround.stage.Estimator
Base class for an estimator in a Surround pipeline. Responsible for performing estimation or training using the input data.
This stage is executed by surround.assembler.Assembler.run().
Example:
    import os

    from surround.stage import Estimator

    # load_model, run_model and train_model are placeholders for your own code.
    class Predict(Estimator):
        def initialise(self, config):
            self.model = load_model(os.path.join(config["models_path"], "model.pb"))

        def estimate(self, state, config):
            state.output_data = run_model(self.model)

        def fit(self, state, config):
            state.output_data = train_model(self.model)
estimate(state, config)
Process input data and store estimated values.
Note: This method is ONLY called by surround.assembler.Assembler.run() when running in predict/batch-predict mode.
Parameters:
- state (instance or child of the surround.State class) – stores intermediate data from each stage in the pipeline
- config (surround.config.Config) – contains the settings for each stage
fit(state, config)
Train a model using the input data.
Note: This method is ONLY called by surround.assembler.Assembler.run() when running in training mode.
Parameters:
- state (instance or child of the surround.State class) – stores intermediate data from each stage in the pipeline
- config (surround.config.Config) – contains the settings for each stage
Runner
class surround.runners.Runner(assembler=None)
Base class for runners, which are responsible for:
- Initializing a surround.assembler.Assembler.
- Loading/preparing input data.
- Running the surround.assembler.Assembler.
Example batch runner:
    from surround.runners import Runner, RunMode

    class BatchRunner(Runner):
        def load_data(self, mode, config):
            state = AssemblyState()
            if mode == RunMode.TRAIN:
                state.input_data = load_files('training_set')
            else:
                state.input_data = load_files('predict_set')
            return state
Note: Generating a project with the CLI tool gives you a batch runner, plus a web runner if you request one.
Parameters: assembler (surround.assembler.Assembler) – the assembler the runner will execute
load_data(mode, config)
Load the data and prepare it to be fed into the surround.assembler.Assembler.
Parameters:
- mode (surround.runners.RunMode) – the mode the assembly was run in (batch, train, predict, web)
- config (surround.config.Config) – the configuration of the assembly
run(mode=<RunMode.PREDICT: 2>)
Prepare data and execute the surround.assembler.Assembler.
Parameters: mode (surround.runners.RunMode) – the mode to run the pipeline in (default: RunMode.PREDICT)
set_assembler(assembler)
Set the Assembler instance the runner will execute.
Parameters: assembler (surround.assembler.Assembler) – the Assembler instance
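The three runner responsibilities can be sketched as a small base class whose run() initialises the assembler, asks load_data() for a state, and hands it over. Everything here is hypothetical: `SketchRunner` and `RecordingAssembler` are illustrative stand-ins, modes are plain strings instead of surround.runners.RunMode, and returning the state from run() is a convenience of the sketch rather than documented behaviour.

```python
class SketchRunner:
    """Hypothetical runner base: initialise the assembler, load data, run it."""

    def __init__(self, assembler=None, config=None):
        self.assembler = assembler
        self.config = config if config is not None else {}

    def set_assembler(self, assembler):
        self.assembler = assembler

    def load_data(self, mode, config):
        # Subclasses build and return the state for the requested mode.
        raise NotImplementedError

    def run(self, mode="predict"):
        if self.assembler is None:
            raise ValueError("no assembler has been set")
        if not self.assembler.init_assembler():
            raise RuntimeError("assembler failed to initialise")
        state = self.load_data(mode, self.config)
        self.assembler.run(state, mode)
        return state


class RecordingAssembler:
    """Stand-in assembler that records calls and upper-cases the input."""

    def __init__(self):
        self.calls = []

    def init_assembler(self):
        self.calls.append("init")
        return True

    def run(self, state, mode):
        self.calls.append(("run", mode))
        state["output"] = state["input"].upper()


class BatchRunner(SketchRunner):
    def load_data(self, mode, config):
        return {"input": "training_set" if mode == "train" else "predict_set"}


runner = BatchRunner(RecordingAssembler())
print(runner.run("train"))  # {'input': 'training_set', 'output': 'TRAINING_SET'}
```

Separating data loading from execution is what lets the same assembler be driven by a batch runner and a web runner.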
Data Container
class surround.data.container.DataContainer(path=None, metadata_version='v0.1')
Represents a data container which holds both data and metadata.
Responsibilities:
- Import files into a container and export it
- Load existing containers
- Extract files
Parameters:
export(export_to)
Import all staged files into the container, hash the contents, set the hash in the metadata, and import the metadata file.
Parameters: export_to (str) – path to export the file to
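The export steps (write the staged files, hash their contents, store the hash in the metadata, add the metadata file) can be sketched with the standard library. This sketch assumes the container is a zip archive, uses SHA-256, and writes the metadata as JSON under the hypothetical name `metadata.json`; the real on-disk layout, hash algorithm, and metadata filename are not specified on this page. A standalone `extract_file_bytes` helper mirrors the documented method's None-if-missing behaviour.

```python
import hashlib
import io
import json
import zipfile


def export_container(staged, metadata, export_to):
    """Write staged files into a zip archive, hash them, and add the metadata with the hash."""
    digest = hashlib.sha256()
    with zipfile.ZipFile(export_to, "w") as archive:
        for internal_path in sorted(staged):  # sorted so the hash ignores staging order
            data = staged[internal_path]
            archive.writestr(internal_path, data)
            # Length-prefix each field so different path/content splits cannot collide.
            encoded = internal_path.encode("utf-8")
            digest.update(len(encoded).to_bytes(8, "big") + encoded)
            digest.update(len(data).to_bytes(8, "big") + data)
        metadata = dict(metadata, hash=digest.hexdigest())
        archive.writestr("metadata.json", json.dumps(metadata, indent=4))
    return metadata


def extract_file_bytes(container, internal_path):
    """Return the bytes stored at internal_path, or None if the container has no such file."""
    with zipfile.ZipFile(container) as archive:
        if internal_path not in archive.namelist():
            return None
        return archive.read(internal_path)


buffer = io.BytesIO()
meta = export_container({"train/a.txt": b"hello", "test/b.txt": b"world"}, {"version": "v0.1"}, buffer)
print(meta["hash"])
print(extract_file_bytes(buffer, "train/a.txt"))  # b'hello'
print(extract_file_bytes(buffer, "missing.txt"))  # None
```

Hashing in sorted path order makes the hash depend only on the container's contents, so two exports of the same files can be compared by hash alone.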
extract_all(extract_to)
Extract all files in the current data container to a path on disk.
Parameters: extract_to (str) – path to extract files to
Returns: True on success, False otherwise
Return type: bool
extract_file(internal_path, extract_path='.')
Extract a file in the current data container to a path on disk.
Parameters:
- internal_path (str) – path inside the container
- extract_path (str) – path to extract the file to (default: '.')
Returns: True on success, False otherwise
Return type: bool
extract_file_bytes(path)
Extract the bytes of a file in the current data container.
Parameters: path (str) – path inside the container
Returns: the bytes extracted, or None if the file doesn't exist
Return type: bytes
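extract_files(internal_paths, extract_path='.')
Extract files in the current data container to a path on disk.
Parameters:
- internal_paths (list of str) – paths inside the container
- extract_path (str) – path to extract the files to (default: '.')
Returns: True on success, False otherwise
Return type: bool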
file_exists(path)
Checks whether a file exists in the current data container.
Returns: True if the file exists
Return type: bool
get_files()
Returns all the files in the current data container.
Returns: list of the files
Return type: list
import_directory(path, generate_metadata=True, reimport=True)
Stage the directory provided for importing when export is requested.
Parameters:
import_file(import_path, internal_path, generate_metadata=True)
Stage a file for importing when the next export operation is called.
Parameters:
Metadata
class surround.data.metadata.Metadata(version='v0.1')
Represents the metadata of a Data Container.
Responsibilities:
- Create metadata, exporting it to a YAML string and/or file
- Generate default metadata as per the schema
- Automatically generate field values based on the files given
- Get/set properties
Parameters: version (str) – the version of the schema to use (default: v0.1)
generate_default(version)
Generate a dictionary with all required fields created as per the schema.
Parameters: version (str) – which version of the schema to use
Returns: the dictionary with default values
Return type: dict
generate_from_directory(directory)
Automatically generate metadata from a directory, such as:
- Formats (mime types)
- Types (types from vocab)
- Group manifests (each root level directory is considered a group)
Parameters: directory (str) – path to the directory to generate from
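The format and grouping rules above can be approximated with the standard mimetypes module. In this hypothetical sketch, `summarise_files` takes relative paths (so it runs without touching the disk), treats the first path component as the group, and skips the vocabulary-based types, which this page does not define.

```python
import mimetypes
from collections import defaultdict
from pathlib import PurePosixPath


def summarise_files(relative_paths):
    """Collect the MIME types present and group files by their root-level directory."""
    formats = set()
    groups = defaultdict(list)
    for rel in relative_paths:
        path = PurePosixPath(rel)
        mime, _encoding = mimetypes.guess_type(path.name)
        if mime is not None:
            formats.add(mime)
        # Files sitting directly in the root belong to no group in this sketch.
        if len(path.parts) > 1:
            groups[path.parts[0]].append(rel)
    return {"formats": sorted(formats), "groups": dict(groups)}


files = ["images/cat.png", "images/dog.jpg", "labels/labels.csv", "README.txt"]
print(summarise_files(files)["groups"])
# {'images': ['images/cat.png', 'images/dog.jpg'], 'labels': ['labels/labels.csv']}
```

Extension-based guessing is fast but trusts file names; content sniffing would be more robust at the cost of reading every file.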
generate_from_file(filepath)
Automatically generate metadata from a single file.
Parameters: filepath (str) – path to the file
generate_from_files(files, root, root_level_dirs)
Automatically generate metadata from a list of files, such as:
- Formats (mime types)
- Types (types from vocab)
- Group manifests (each root level directory is considered a group)
Parameters:
generate_manifest_for_group(group_name, files, formats=None)
Generate a manifest for a group of files, where the manifest contains:
- path
- description
- language
- formats (mime types)
- types (from vocab)
Store the manifest in the metadata storage and return it.
Parameters:
Returns: the manifest created
Return type:
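A manifest with the fields listed above might be built and stored as follows. This is a sketch under assumptions: the metadata is a plain dict, manifests are appended to a hypothetical `manifests` list, the group name is used as the manifest's path, description and language are left empty, and formats are guessed from file extensions with mimetypes when not supplied.

```python
import mimetypes


def generate_manifest_for_group(metadata, group_name, files, formats=None):
    """Build a manifest for one group, store it in metadata["manifests"], and return it."""
    if formats is None:
        # Guess MIME types from extensions; unknown extensions yield None and are dropped.
        formats = {mimetypes.guess_type(name)[0] for name in files} - {None}
    manifest = {
        "path": group_name,
        "description": "",
        "language": "",
        "formats": sorted(formats),
        "types": [],  # vocabulary-based types are not modelled here
    }
    metadata.setdefault("manifests", []).append(manifest)
    return manifest


metadata = {}
manifest = generate_manifest_for_group(metadata, "images", ["images/cat.png", "images/dog.png"])
print(manifest["formats"])  # ['image/png']
```

Returning the same object that was stored means callers can fill in the description or language afterwards and the change is reflected in the metadata.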
get_property(path)
Get the value of a property given a path in dot notation, e.g. summary.title.
For example, metadata.get_property('summary.title') would retrieve Test name from the following:
    summary:
        title: Test name
Parameters: path (str) – path to the property using dot notation
Returns: the value of the property, None otherwise
Return type: any
load_from_path(path)
Load metadata from a YAML file.
Parameters: path (str) – path to the YAML file
save_to_data()
Returns the metadata as a string formatted in YAML.
Returns: the data as a YAML string
Return type: str
save_to_json(indent=4)
Returns the metadata as a string formatted in JSON.
Parameters: indent (int) – number of spaces in indentations
Returns: the data in JSON format
Return type: str