API Reference

Here you can find documentation on all classes and their methods in Surround.

Assembler

class surround.assembler.Assembler(assembler_name='', config=None)[source]

Class responsible for assembling and executing a Surround pipeline.

Responsibilities:

  • Encapsulate the configuration data and pipeline stages
  • Load configuration from a specified module
  • Run the pipeline with input data in predict/batch/train mode

For more information on this process, see the About page.

Example:

assembler = Assembler("Example pipeline")
assembler.set_stages([PreFilter(), PredictStage(), PostFilter()])
assembler.init_assembler(batch_mode=False)

data = AssemblyState("some data")
assembler.run(data, is_training=False)

Batch-predict mode:

assembler.init_assembler(batch_mode=True)
assembler.run(data, is_training=False)

Training mode:

assembler.init_assembler(batch_mode=True)
assembler.run(data, is_training=True)

Predict/Estimate mode:

assembler.init_assembler(batch_mode=False)
assembler.run(data, is_training=False)

Constructor for an Assembler pipeline:

Parameters:
  • assembler_name (str) – The name of the pipeline
  • config – Surround Config object
init_assembler()[source]

Initializes the assembler and all of its stages.

Calls the surround.stage.Stage.initialise() method of all stages and the estimator.

Note

Should be called after surround.assembler.Assembler.set_config().

Returns:whether the initialisation was successful
Return type:bool
load_config(module)[source]

Given a module contained in the root of the project, create an instance of surround.config.Config loading configuration data from the config.yaml found in the project, and use this configuration for the pipeline.

Note

Should be called before surround.assembler.Assembler.init_assembler().

Parameters:module (str) – name of the module
run(state=None, mode=<RunMode.PREDICT: 2>)[source]

Run the pipeline using the input data provided.

If mode is set to RunMode.TRAIN, the estimator's surround.stage.Estimator.fit() method is called instead of surround.stage.Estimator.estimate().

If surround.enable_stage_output_dump is enabled in the Config instance then each stage and estimator’s surround.stage.Stage.dump_output() method will be called.

This method doesn’t return anything; results should be stored in the state object passed in as a parameter.

Parameters:
  • state (surround.State) – Data passed between each stage in the pipeline
  • mode (surround.runners.RunMode) – The mode to run the pipeline in (default: RunMode.PREDICT)
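
The choice between fit() and estimate() can be pictured with a small stand-in. This is a sketch only, not Surround's implementation: the TRAIN value of the stand-in RunMode, the run_pipeline function, and the Toy classes are assumptions made for illustration (only RunMode.PREDICT: 2 appears in the signature above).

```python
from enum import Enum


class RunMode(Enum):
    # Stand-in enum; only PREDICT = 2 is shown in the documented signature.
    PREDICT = 2
    TRAIN = 3  # assumed value


class ToyState:
    def __init__(self, input_data):
        self.input_data = input_data
        self.output_data = None


class ToyEstimator:
    def estimate(self, state, config):
        state.output_data = f"estimated:{state.input_data}"

    def fit(self, state, config):
        state.output_data = f"fitted:{state.input_data}"


def run_pipeline(estimator, state, config, mode=RunMode.PREDICT):
    # Nothing is returned; results are written onto the state object.
    if mode == RunMode.TRAIN:
        estimator.fit(state, config)
    else:
        estimator.estimate(state, config)


predict_state = ToyState("x")
run_pipeline(ToyEstimator(), predict_state, config={})

train_state = ToyState("x")
run_pipeline(ToyEstimator(), train_state, config={}, mode=RunMode.TRAIN)
```

After both calls, the caller reads the results from predict_state.output_data and train_state.output_data rather than from a return value.
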
set_config(config)[source]

Set the configuration data to be used during pipeline execution.

Note

Should be called before surround.assembler.Assembler.init_assembler().

Parameters:config (surround.config.Config) – the configuration data
set_finaliser(finaliser)[source]

Set the final stage that will be executed no matter how the pipeline runs. This will be executed even when the pipeline fails or throws an error.

Parameters:finaliser (surround.stage.Stage) – the final stage instance
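
The "runs no matter what" guarantee resembles a try/finally block. The following is a minimal sketch under that assumption; run_with_finaliser and the plain-function stages are hypothetical stand-ins, and whether Surround re-raises the error afterwards is not stated above.

```python
def run_with_finaliser(stages, finaliser, state):
    try:
        for stage in stages:
            stage(state)
    finally:
        # Runs whether the loop finished normally or a stage raised.
        finaliser(state)


def failing_stage(state):
    raise RuntimeError("stage failed")


def cleanup(state):
    state["finalised"] = True


state = {"finalised": False}
try:
    run_with_finaliser([failing_stage], cleanup, state)
except RuntimeError:
    pass  # in this sketch the error still propagates to the caller
```
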
set_stages(stages)[source]

Set the stages to be executed one after the other in the pipeline.

Parameters:stages (list of surround.stage.Stage) – list of stages to execute

Config

class surround.config.Config(project_root=None, package_path=None, auto_load=False)[source]

An iterable dictionary class that loads and stores all the configuration settings from both default and project YAML files and environment variables. Primarily used in stages to retrieve configuration data set for development/production.

Responsibilities:

  • Parse the config.yaml file and store the data as key-value pairs.
  • Allow environment variables to override data loaded from file/dict (must be prefixed with SURROUND_).
  • Provide READ-ONLY access to the stored config values via [] operator and iteration.

Example usage:

from surround.config import Config

config = Config()
config.read_from_dict({ "debug": True })
config.read_config_files(["config.yaml"])

if config["debug"]:
    pass  # Do debug stuff

for key, value in config:
    pass  # Iterate over all data

You could then override the above configuration using the system's environment variables; just prefix the variable name with SURROUND_ like so:

SURROUND_DEBUG=False

It also supports overriding nested configuration data, for example with the following config:

predict:
    debug: True

We can override the above with the following environment variable:

SURROUND_PREDICT_DEBUG=False
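
One way such prefixed variables could be mapped onto nested keys is sketched below. This is not Surround's code: apply_env_overrides and _parse are hypothetical helpers, and the sketch assumes that each underscore separates one nesting level, that only existing keys are overridden, and that "true"/"false" strings become booleans.

```python
import copy


def _parse(raw):
    # Assumed conversion: treat "true"/"false" (any case) as booleans.
    lowered = raw.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    return raw


def apply_env_overrides(config, environ, prefix="SURROUND_"):
    """Return a copy of `config` with matching environment overrides applied."""
    result = copy.deepcopy(config)
    for name, raw in environ.items():
        if not name.startswith(prefix):
            continue
        parts = name[len(prefix):].lower().split("_")
        node = result
        # Walk down the nested dicts, one underscore-separated part per level.
        for part in parts[:-1]:
            node = node.get(part) if isinstance(node, dict) else None
        if isinstance(node, dict) and parts[-1] in node:
            node[parts[-1]] = _parse(raw)
    return result


config = {"debug": True, "predict": {"debug": True}}
env = {
    "SURROUND_DEBUG": "False",
    "SURROUND_PREDICT_DEBUG": "False",
    "PATH": "/usr/bin",  # ignored: no SURROUND_ prefix
}
overridden = apply_env_overrides(config, env)
```

In practice os.environ would be passed as the second argument; a plain dict is used here so the behaviour is easy to inspect. Keys that themselves contain underscores are not handled by this simple split.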

Constructor of the Config class, loads the default YAML file into storage. If the project_root is provided then the project’s config.yaml file is also loaded into configuration.

The default config file (defaults.yaml) can be found in the same directory as the config.py script. The project config file (config.yaml) can be found in the root of the project folder.

Parameters:
  • project_root (str) – path to the root directory of the surround project (default: None)
  • package_path (str) – path to the root directory of the package that contains the surround project (default: None)
  • auto_load (bool) – Attempt to load the config.yaml file from the Surround project in the current directory (default: False)
get_dict()[source]

Returns the configuration data in a dictionary

Returns:dictionary of the configuration data
Return type:dict
get_path(path)[source]

Returns value that can be found at the key path provided (useful for nested values).

For example:

config.get_path('surround.stages') == config['surround']['stages']
--> True
Parameters:path (str) – path to the value in storage
Returns:the value found at the path, or None if not found
Return type:any
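
The dot-separated lookup can be sketched on a plain dictionary. lookup_path below is a hypothetical helper written for illustration, not the library's implementation.

```python
def lookup_path(data, path):
    """Follow a dot-separated key path through nested dicts."""
    node = data
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None  # mirrors the "None if not found" behaviour described above
        node = node[key]
    return node


config = {"surround": {"stages": ["PreFilter", "Predict"]}}
found = lookup_path(config, "surround.stages")
missing = lookup_path(config, "surround.unknown")
```
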
static instance()[source]

Static method which returns a singleton instance of Config.

read_config_files(yaml_files)[source]

Parses the YAML files provided and stores their key-value pairs in config.

Parameters:yaml_files (list) – multiple paths to the YAML files to load
Returns:True on success; raises IOError on failure
Return type:bool
read_from_dict(config_dict)[source]

Retrieve all key-value pairs from the dict provided and store in config.

Parameters:config_dict (dict) – configuration settings to be added to storage
Returns:True on success; raises TypeError on failure
Return type:bool

State

class surround.State[source]

Stores the data to be passed between each stage in a pipeline. Each stage is responsible for setting the attributes to this class.

Formerly known as SurroundData.

Attributes:

  • stage_metadata (list) - information that can be used to identify the stage
  • execution_time (str) - how long it took to execute the entire pipeline
  • errors (list) - list of error messages (stops the pipeline when appended to)
  • warnings (list) - list of warning messages (displayed in console)

Example:

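from surround import State
from surround.assembler import Assembler
from surround.stage import Estimator


class AssemblyState(State):
    # Extra attributes must be defined before the pipeline is run!
    input_data = None
    output_data = None

    def __init__(self, input_data):
        self.input_data = input_data


class Predict(Estimator):
    def estimate(self, state, config):
        # Do prediction here (placeholder: pass the input through)
        state.output_data = state.input_data

    def fit(self, state, config):
        # Do training here
        pass


pipeline = Assembler("Example")
pipeline.set_stages([Predict()])
pipeline.init_assembler()

data = AssemblyState("received data")
pipeline.run(data)

print(data.output_data)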

Note

This class is frozen while the pipeline is running. This means that an exception will be thrown if a new attribute is added during pipeline execution.
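
The freezing behaviour can be illustrated with a small stand-in class. This is a sketch, not Surround's implementation: FrozenSketch, freeze(), and the choice of AttributeError are assumptions (the note above only says that an exception is thrown).

```python
class FrozenSketch:
    _frozen = False

    def freeze(self):
        object.__setattr__(self, "_frozen", True)

    def __setattr__(self, name, value):
        # Existing attributes may still be updated; new ones are rejected.
        if self._frozen and not hasattr(self, name):
            raise AttributeError(f"cannot add new attribute {name!r} while frozen")
        object.__setattr__(self, name, value)


class ExampleState(FrozenSketch):
    input_data = None
    output_data = None


state = ExampleState()
state.freeze()
state.output_data = "ok"  # allowed: declared on the class beforehand

try:
    state.extra = 1  # rejected: new attribute while frozen
    added = True
except AttributeError:
    added = False
```

This is why extra attributes such as input_data and output_data are declared on the subclass before the pipeline runs.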

Stage

class surround.stage.Stage[source]

Base class of all stages in a Surround pipeline.

See the surround.stage.Estimator class for more information.

dump_output(state, config)[source]

Dump the output of the stage after the stage has transformed the data.

Note

This is called by surround.assembler.Assembler.run() (when dumping output is requested).

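Parameters:
  • state (Instance or child of the surround.State class) – Stores intermediate data from each stage in the pipeline
  • config (surround.config.Config) – Contains the settings for each stage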
initialise(config)[source]

Initialise the stage, this may be loading a model or loading data.

Parameters:config (surround.config.Config) – Contains the settings for each stage
operate(state, config)[source]

Main function to be called in an assembly.

Parameters:
  • state – Contains all pipeline state including input and output data
  • config – Config for the assembly

Estimator

class surround.stage.Estimator[source]

Base class for an estimator in a Surround pipeline. Responsible for performing estimation or training using the input data.

This stage is executed by surround.assembler.Assembler.run().

Example:

import os

from surround.stage import Estimator


# load_model, run_model and train_model are placeholders for your own code.
class Predict(Estimator):
    def initialise(self, config):
        self.model = load_model(os.path.join(config["models_path"], "model.pb"))

    def estimate(self, state, config):
        state.output_data = run_model(self.model)

    def fit(self, state, config):
        state.output_data = train_model(self.model)
estimate(state, config)[source]

Process input data and store estimated values.

Note

This method is ONLY called by surround.assembler.Assembler.run() when running in predict/batch-predict mode.

Parameters:
  • state (Instance or child of the surround.State class) – Stores intermediate data from each stage in the pipeline
  • config (surround.config.Config) – Contains the settings for each stage
fit(state, config)[source]

Train a model using the input data.

Note

This method is ONLY called by surround.assembler.Assembler.run() when running in training mode.

Parameters:
  • state (Instance or child of the surround.State class) – Stores intermediate data from each stage in the pipeline
  • config (surround.config.Config) – Contains the settings for each stage

Runner

class surround.runners.Runner(assembler=None)[source]

Base class for runners, which are responsible for loading and preparing the data and then executing the surround.assembler.Assembler.

Example batch runner:

class BatchRunner(Runner):
    def load_data(self, mode, config):
        state = AssemblyState()

        if mode == RunMode.TRAIN:
            state.input_data = load_files('training_set')
        else:
            state.input_data = load_files('predict_set')

        return state

Note

You get a Batch Runner and Web Runner (if web requested) when you generate a project using the CLI tool.

Parameters:assembler (surround.assembler.Assembler) – The assembler the runner will execute
load_data(mode, config)[source]

Load the data and prepare it to be fed into the surround.assembler.Assembler.

Parameters:
  • mode (surround.runners.RunMode) – the mode the assembly was run in (batch, train, predict, web)
  • config (surround.config.Config) – the configuration of the assembly
run(mode=<RunMode.PREDICT: 2>)[source]

Prepare data and execute the surround.assembler.Assembler.

Parameters:mode (surround.runners.RunMode) – the mode to run the assembly in (default: RunMode.PREDICT)
set_assembler(assembler)[source]

Set the Assembler instance the runner will execute.

Parameters:assembler (surround.assembler.Assembler) – the Assembler instance

Data Container

class surround.data.container.DataContainer(path=None, metadata_version='v0.1')[source]

Represents a data container which holds both data and metadata.

Responsibilities:

  • Import files into a container and export
  • Load existing containers
  • Extract files
Parameters:
  • path (str) – path for container to load (default: None)
  • metadata_version (str) – the version of metadata being used (default: v0.1)
export(export_to)[source]

Import all staged files into the container, hash the contents, store the hash in the metadata, and import the metadata file.

Parameters:export_to (str) – path to export the file to
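
The "hash the contents, store the hash in the metadata" step could look roughly like the sketch below. The hash algorithm, the hash_staged_files helper, the in-memory staged dictionary, and the "hash" metadata key are all assumptions for illustration; the text above does not specify them.

```python
import hashlib


def hash_staged_files(staged):
    """`staged` maps internal path -> file bytes (a stand-in for staged files)."""
    digest = hashlib.sha256()  # assumed algorithm
    # Sort by path so the result does not depend on staging order.
    for internal_path in sorted(staged):
        digest.update(internal_path.encode("utf-8"))
        digest.update(staged[internal_path])
    return digest.hexdigest()


staged = {"data/a.txt": b"hello", "data/b.txt": b"world"}
metadata = {}
metadata["hash"] = hash_staged_files(staged)  # hypothetical metadata key
```
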
extract_all(extract_to)[source]

Extract all files in the current data container to a path on disk

Parameters:extract_to (str) – path to extract files to
Returns:true on success, false otherwise
Return type:bool
extract_file(internal_path, extract_path='.')[source]

Extract a file in the current data container to a path on disk

Parameters:
  • internal_path (str) – path inside the container
  • extract_path (str) – path to extract file to
Returns:true on success, false otherwise
Return type:bool

extract_file_bytes(path)[source]

Extract the bytes of a file in the current data container

Parameters:path (str) – path inside the container
Returns:the bytes extracted or None if it doesn’t exist
Return type:bytes
extract_files(internal_paths, extract_path='.')[source]

Extract files in the current data container to a path on disk

Parameters:
  • internal_paths (list) – list of files to extract
  • extract_path (str) – path to extract files to
Returns:true on success, false otherwise
Return type:bool

file_exists(path)[source]

Checks whether file exists in current data container

Returns:true if the file exists
Return type:bool
get_files()[source]

Returns all the files in the current data container

Returns:list of the files
Return type:list
import_directory(path, generate_metadata=True, reimport=True)[source]

Stage the directory provided for importing when export is requested.

Parameters:
  • path (str) – the directory of files to import
  • generate_metadata (bool) – whether metadata should be generated for this folder
  • reimport (bool) – whether or not files that are already staged should be staged again
import_file(import_path, internal_path, generate_metadata=True)[source]

Stage file for importing when the next export operation is called.

Parameters:
  • import_path (str) – path to the file on the user's drive
  • internal_path (str) – path to the file inside the container
  • generate_metadata (bool) – whether metadata should be generated for this file
import_files(files, generate_metadata=True)[source]

Stage the list of files for importing when export is requested.

Parameters:
  • files (list) – list of files to import
  • generate_metadata (bool) – whether metadata should be generated for these files
load(path)[source]

Load an existing data container, preparing it for extracting files.

Parameters:path (str) – path to the container

Metadata

class surround.data.metadata.Metadata(version='v0.1')[source]

Represents metadata of a Data Container.

Responsibilities:

  • Create metadata, exporting to YAML string and/or file
  • Generate default metadata as per schema
  • Automatically generate values to fields based on files given
  • Get/set properties
Parameters:version (str) – the version of the schema to use (default: v0.1)
generate_default(version)[source]

Generate a dictionary with all required fields created as per the schema.

Parameters:version (str) – which version of the schema to use
Returns:the dictionary with default values
Return type:dict
generate_from_directory(directory)[source]

Automatically generate metadata from a directory, such as:

  • Formats (mime types)
  • Types (types from vocab)
  • Group manifests (each root level directory is considered a group)
Parameters:directory (str) – path to the directory to generate from
generate_from_file(filepath)[source]

Automatically generate metadata from a single file

Parameters:filepath (str) – path to the file
generate_from_files(files, root, root_level_dirs)[source]

Automatically generate metadata from a list of files such as:

  • Formats (mime types)
  • Types (types from vocab)
  • Group manifests (each root level directory is considered a group)
Parameters:
  • files (list) – list of files to generate from
  • root (str) – path to the root of the folder containing the files
  • root_level_dirs (list) – list of directories in the root
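
A rough picture of how formats and groups might be derived from a file list is sketched below. It is not the library's code: summarise_files is a hypothetical helper, paths are assumed to be relative to the root, MIME types are guessed with the standard mimetypes module, and the "types from vocab" step is left out.

```python
import mimetypes
from pathlib import PurePosixPath


def summarise_files(files, root_level_dirs):
    """Collect MIME types and group files by their root-level directory."""
    formats = set()
    groups = {name: [] for name in root_level_dirs}
    for relative in files:
        mime, _encoding = mimetypes.guess_type(relative)
        if mime:
            formats.add(mime)
        top = PurePosixPath(relative).parts[0]
        if top in groups:
            # Each root-level directory is treated as one group.
            groups[top].append(relative)
    return sorted(formats), groups


files = ["images/cat.png", "images/dog.png", "notes/readme.txt", "loose.json"]
formats, groups = summarise_files(files, ["images", "notes"])
```

Files that sit directly in the root (loose.json here) contribute a format but do not belong to any group in this sketch.
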
generate_manifest_for_group(group_name, files, formats=None)[source]

Generate a manifest for a group of files where the manifest contains:

  • path
  • description
  • language
  • formats (mime types)
  • types (from vocab)

Stores the manifest in the metadata storage and returns it.

Parameters:
  • group_name (str) – name of the group
  • files (list) – list of files in the group
  • formats (list) – list of formats in the group
Returns:the manifest created
Return type:dict

get_property(path)[source]

Get the value of a property given a path in dot notation e.g. summary.title

metadata.get_property('summary.title') would retrieve Test name from the following:

summary:
    title: Test name
Parameters:path (str) – path to the property using dot notation
Returns:the value of the property, or None if it is not found
Return type:any
load_from_data(data)[source]

Load metadata from a YAML string

Parameters:data (str) – YAML string
load_from_path(path)[source]

Load metadata from file (YAML)

Parameters:path (str) – path to the YAML file
save_to_data()[source]

Returns metadata as string formatted in YAML

Returns:the data in YAML string
Return type:str
save_to_json(indent=4)[source]

Returns metadata as string formatted in JSON

Parameters:indent (int) – number of spaces in indentations
Returns:the data in JSON format
Return type:str
save_to_json_file(path, indent=4)[source]

Saves metadata to JSON file

Parameters:
  • path (str) – path to file to export to
  • indent (int) – number of spaces in indentations
save_to_path(path)[source]

Save metadata to YAML file

Parameters:path (str) – path to save file to
set_property(path, value)[source]

Set the value of a property given a path in dot notation e.g. summary.title

metadata.set_property('summary.title', 'Test name') would set the title of the data container to Test name.

Parameters:
  • path (str) – path to the property in dot notation
  • value (any) – value to set to the property
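
Setting a value by dot-notation path can be sketched on a plain dictionary. set_by_path is a hypothetical helper, and creating missing intermediate levels is an assumption of this sketch rather than documented behaviour.

```python
def set_by_path(data, path, value):
    """Set a value in nested dicts using a dot-separated path."""
    *parents, leaf = path.split(".")
    node = data
    for key in parents:
        # Assumed behaviour: create intermediate dicts that do not exist yet.
        node = node.setdefault(key, {})
    node[leaf] = value


metadata = {"summary": {"title": "Test name"}}
set_by_path(metadata, "summary.title", "New name")
set_by_path(metadata, "summary.description", "Example container")
```
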