API Reference¶

Here you can find documentation on all classes and their methods in Surround.

Assembler¶

class surround.assembler.Assembler(assembler_name='', config=None)[source]¶

Class responsible for assembling and executing a Surround pipeline.

Responsibilities:

Encapsulate the configuration data and pipeline stages
Load configuration from a specified module
Run the pipeline with input data in predict/batch/train mode

For more information on this process, see the About page.

Example:

assembler = Assembler("Example pipeline")
assembler.set_stages([PreFilter(), PredictStage(), PostFilter()])
assembler.init_assembler(batch_mode=False)

data = AssemblyState("some data")
assembler.run(data, is_training=False)

Batch-predict mode:

assembler.init_assembler(batch_mode=True)
assembler.run(data, is_training=False)

Training mode:

assembler.init_assembler(batch_mode=True)
assembler.run(data, is_training=True)

Predict/Estimate mode:

assembler.init_assembler(batch_mode=False)
assembler.run(data, is_training=False)

Constructor for an Assembler pipeline:

Parameters:	assembler_name (str) – The name of the pipeline
	config – Surround Config object

init_assembler()[source]¶

Initializes the assembler and all of its stages.

Calls the surround.stage.Stage.initialise() method of all stages and the estimator.

Note

Should be called after surround.assembler.Assembler.set_config().

Returns:	whether the initialisation was successful
Return type:	bool

load_config(module)[source]¶

Given a module contained in the root of the project, create an instance of surround.config.Config loading configuration data from the config.yaml found in the project, and use this configuration for the pipeline.

Note

Should be called before surround.assembler.Assembler.init_assembler().

Parameters:	module (str) – name of the module

run(state=None, mode=<RunMode.PREDICT: 2>)[source]¶

Run the pipeline using the input data provided.

If mode is set to RunMode.TRAIN, then when the pipeline reaches the estimator it will call the surround.stage.Estimator.fit() method instead of surround.stage.Estimator.estimate().

If surround.enable_stage_output_dump is enabled in the Config instance then each stage and estimator’s surround.stage.Stage.dump_output() method will be called.

This method doesn’t return anything; instead, results should be stored in the state object passed in as a parameter.

Parameters:	state (`surround.State`) – Data passed between each stage in the pipeline
	mode (`surround.runners.RunMode`) – The mode to run the pipeline in (default: RunMode.PREDICT)
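
The choice between fit() and estimate() can be pictured with a small stand-alone sketch. It does not import Surround: RunMode, Stage, Estimator and run_pipeline below are simplified stand-ins written for illustration, the state is a plain dict, and only the PREDICT = 2 value is taken from the signature above (the other enum values are assumptions).

```python
from enum import Enum


class RunMode(Enum):
    # Stand-in enum: only PREDICT = 2 appears in the documented signature;
    # the other values are assumed for this sketch.
    BATCH_PREDICT = 1
    PREDICT = 2
    TRAIN = 3


class Stage:
    def operate(self, state, config):
        raise NotImplementedError


class Estimator(Stage):
    def estimate(self, state, config):
        raise NotImplementedError

    def fit(self, state, config):
        raise NotImplementedError


def run_pipeline(stages, state, config=None, mode=RunMode.PREDICT):
    """Run each stage in order, choosing fit() or estimate() for estimators."""
    for stage in stages:
        if isinstance(stage, Estimator):
            if mode == RunMode.TRAIN:
                stage.fit(state, config)
            else:
                stage.estimate(state, config)
        else:
            stage.operate(state, config)
        if state["errors"]:
            break  # stop as soon as a stage reports an error


class Double(Stage):
    def operate(self, state, config):
        state["value"] *= 2


class Model(Estimator):
    def estimate(self, state, config):
        state["output"] = ("estimated", state["value"])

    def fit(self, state, config):
        state["output"] = ("trained", state["value"])


predict_state = {"value": 3, "errors": []}
run_pipeline([Double(), Model()], predict_state, mode=RunMode.PREDICT)

train_state = {"value": 3, "errors": []}
run_pipeline([Double(), Model()], train_state, mode=RunMode.TRAIN)
```

As in the real method, nothing is returned; the results are read back from the state object that was passed in.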

set_config(config)[source]¶

Set the configuration data to be used during pipeline execution.

Note

Should be called before surround.assembler.Assembler.init_assembler().

Parameters:	config (`surround.config.Config`) – the configuration data

set_finaliser(finaliser)[source]¶

Set the final stage that will be executed no matter how the pipeline runs. This will be executed even when the pipeline fails or throws an error.

Parameters:	finaliser (`surround.stage.Stage`) – the final stage instance
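
A finaliser that runs regardless of failure behaves much like a try/finally block. The following is a minimal sketch of that idea only, not Surround's implementation: stages are plain callables, the state is a dict, and catching the exception to record it in an errors list is an assumption made for the example.

```python
def run_with_finaliser(stages, finaliser, state):
    """Run stages in order and always run the finaliser afterwards."""
    try:
        for stage in stages:
            stage(state)
    except Exception as error:
        # Assumption for this sketch: record the failure instead of re-raising
        state["errors"].append(str(error))
    finally:
        finaliser(state)


def failing_stage(state):
    raise ValueError("model file missing")


def cleanup(state):
    state["finalised"] = True


state = {"errors": [], "finalised": False}
run_with_finaliser([failing_stage], cleanup, state)
```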

set_stages(stages)[source]¶

Set the stages to be executed one after the other in the pipeline.

Parameters:	stages (list of `surround.stage.Stage`) – list of stages to execute

Config¶

class surround.config.Config(project_root=None, package_path=None, auto_load=False)[source]¶

An iterable dictionary class that loads and stores all the configuration settings from both default and project YAML files and environment variables. Primarily used in stages to retrieve configuration data set for development/production.

Responsibilities:

Parse the config.yaml file and store the data as key-value pairs.
Allow environment variables to override data loaded from file/dict (must be prefixed with SURROUND_).
Provide READ-ONLY access to the stored config values via [] operator and iteration.

Example usage:

from surround.config import Config

config = Config()
config.read_from_dict({"debug": True})
config.read_config_files(["config.yaml"])

if config["debug"]:
    # Do debug stuff
    print("Debug mode enabled")

for key, value in config:
    # Iterate over all data
    print(key, value)

You can then override the above configuration using the system’s environment variables; prefix the variable name with SURROUND_, like so:

SURROUND_DEBUG=False

It also supports overriding nested configuration data, for example with the following config:

predict:
    debug: True

We can override the above with the following environment variable:

SURROUND_PREDICT_DEBUG=False
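
One way to picture how a SURROUND_-prefixed variable maps onto nested configuration is the stand-alone sketch below. It is an illustration only and not the Config implementation: apply_env_overrides and parse_value are hypothetical helpers, splitting the variable name on underscores is an assumption (it would not handle keys that themselves contain underscores), and only true/false strings are converted.

```python
import copy


def parse_value(raw):
    """Convert 'true'/'false' strings to booleans; leave other values as-is."""
    lowered = raw.strip().lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    return raw


def apply_env_overrides(config, environ, prefix="SURROUND_"):
    """Return a copy of config with prefixed environment values applied."""
    result = copy.deepcopy(config)
    for name, raw in environ.items():
        if not name.startswith(prefix):
            continue
        # SURROUND_PREDICT_DEBUG -> parents ["predict"], leaf "debug"
        *parents, leaf = name[len(prefix):].lower().split("_")
        node = result
        for key in parents:
            node = node.get(key) if isinstance(node, dict) else None
        # Only override keys that already exist in the configuration
        if isinstance(node, dict) and leaf in node:
            node[leaf] = parse_value(raw)
    return result


config = {"debug": True, "predict": {"debug": True}}
environ = {
    "SURROUND_DEBUG": "False",
    "SURROUND_PREDICT_DEBUG": "False",
    "PATH": "/usr/bin",  # ignored: no SURROUND_ prefix
}
overridden = apply_env_overrides(config, environ)
```

In real use the second argument would typically be os.environ; a plain dict is used here so the example has no side effects.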

Constructor of the Config class, loads the default YAML file into storage. If the project_root is provided then the project’s config.yaml file is also loaded into configuration.

The default config file (defaults.yaml) can be found in the same directory as the config.py script. The project config file (config.yaml) can be found in the root of the project folder.

Parameters:	project_root (str) – path to the root directory of the Surround project (default: None)
	package_path (str) – path to the root directory of the package that contains the Surround project (default: None)
	auto_load (bool) – attempt to load the config.yaml file from the Surround project in the current directory (default: False)

get_dict()[source]¶

Returns the configuration data in a dictionary

Returns:	dictionary of the configuration data
Return type:	dict

get_path(path)[source]¶

Returns value that can be found at the key path provided (useful for nested values).

For example:

config.get_path('surround.stages') == config['surround']['stages']
--> True

Parameters:	path (str) – path to the value in storage
Returns:	the value found at the path, or None if not found
Return type:	any
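
The lookup can be thought of as walking a nested dictionary one key at a time. The sketch below shows that idea on a plain dict; the free function get_path is a stand-in for illustration, not the method's actual source.

```python
def get_path(data, path):
    """Return the value at a dot-separated path, or None if any key is missing."""
    node = data
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


config = {"surround": {"stages": ["prefilter", "predict"]}}
stages = get_path(config, "surround.stages")
missing = get_path(config, "surround.missing")
```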

static instance()[source]¶

Static method which returns the singleton instance of Config.

read_config_files(yaml_files)[source]¶

Parses the YAML files provided and stores their key-value pairs in config.

Parameters:	yaml_files (list) – multiple paths to the YAML files to load
Returns:	True on success, raises `IOError` on failure
Return type:	bool

read_from_dict(config_dict)[source]¶

Retrieve all key-value pairs from the dict provided and store in config.

Parameters:	config_dict (dict) – configuration settings to be added to storage
Returns:	True on success, raises `TypeError` on failure
Return type:	bool

State¶

class surround.State[source]¶

Stores the data to be passed between each stage in a pipeline. Each stage is responsible for setting the attributes on this class.

Formerly known as SurroundData.

Attributes:

stage_metadata (list) - information that can be used to identify the stage
execution_time (str) - how long it took to execute the entire pipeline
errors (list) - list of error messages (stops the pipeline when appended to)
warnings (list) - list of warning messages (displayed in console)

Example:

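from surround import State
from surround.assembler import Assembler
from surround.stage import Estimator

class AssemblyState(State):
    # Extra attributes must be defined before the pipeline is run!
    input_data = None
    output_data = None

    def __init__(self, input_data):
        self.input_data = input_data


class Predict(Estimator):
    def estimate(self, state, config):
        # Do prediction here
        state.output_data = state.input_data

    def fit(self, state, config):
        # Do training here
        pass

pipeline = Assembler("Example")
pipeline.set_stages([Predict()])
pipeline.init_assembler()

data = AssemblyState("received data")
pipeline.run(data)

print(data.output_data)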

Note

This class is frozen while the pipeline is running. This means that an exception will be thrown if a new attribute is added during pipeline execution.
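
The freezing behaviour can be illustrated with a small stand-alone class. This is a sketch of the general pattern, not Surround's code: FrozenState and freeze() are hypothetical names, and raising TypeError is an assumption, since the note above only says that an exception is thrown.

```python
class FrozenState:
    """Stand-in for a state object that rejects new attributes once frozen."""

    _frozen = False

    def freeze(self):
        object.__setattr__(self, "_frozen", True)

    def __setattr__(self, name, value):
        # Existing attributes may still be updated; new ones are rejected
        if self._frozen and not hasattr(self, name):
            raise TypeError(f"cannot add attribute {name!r} while frozen")
        object.__setattr__(self, name, value)


class AssemblyState(FrozenState):
    input_data = None
    output_data = None


state = AssemblyState()
state.input_data = "received data"
state.freeze()
state.output_data = "result"  # allowed: declared on the class

try:
    state.extra = 1  # not declared beforehand: rejected
    rejected = False
except TypeError:
    rejected = True
```

This is why extra attributes such as input_data and output_data need to be declared on the subclass before the pipeline starts.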

Stage¶

class surround.stage.Stage[source]¶

Base class of all stages in a Surround pipeline.

See the following class for more information:

surround.stage.Estimator

dump_output(state, config)[source]¶

Dump the output of the stage after the stage has transformed the data.

Note

This is called by surround.assembler.Assembler.run() (when dumping output is requested).

Parameters:	state (Instance or child of the `surround.State` class) – Stores intermediate data from each stage in the pipeline
	config (`surround.config.Config`) – Config of the pipeline

initialise(config)[source]¶

Initialise the stage; this may involve loading a model or loading data.

Note

This is called by surround.assembler.Assembler.init_assembler().

Parameters:	config (`surround.config.Config`) – Contains the settings for each stage

operate(state, config)[source]¶

Main function to be called in an assembly.

Parameters:	state – Contains all pipeline state including input and output data
	config – Config for the assembly

Estimator¶

class surround.stage.Estimator[source]¶

Base class for an estimator in a Surround pipeline. Responsible for performing estimation or training using the input data.

This stage is executed by surround.assembler.Assembler.run().

Example:

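import os

from surround.stage import Estimator

# load_model, run_model and train_model stand in for your own model code
class Predict(Estimator):
    def initialise(self, config):
        # Load the model once, before the pipeline runs
        self.model = load_model(os.path.join(config["models_path"], "model.pb"))

    def estimate(self, state, config):
        # Called in predict/batch-predict mode
        state.output_data = run_model(self.model)

    def fit(self, state, config):
        # Called in training mode
        state.output_data = train_model(self.model)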

estimate(state, config)[source]¶

Process input data and store estimated values.

Note

This method is ONLY called by surround.assembler.Assembler.run() when running in predict/batch-predict mode.

Parameters:	state (Instance or child of the `surround.State` class) – Stores intermediate data from each stage in the pipeline
	config (`surround.config.Config`) – Contains the settings for each stage

fit(state, config)[source]¶

Train a model using the input data.

Note

This method is ONLY called by surround.assembler.Assembler.run() when running in training mode.

Parameters:	state (Instance or child of the `surround.State` class) – Stores intermediate data from each stage in the pipeline
	config (`surround.config.Config`) – Contains the settings for each stage

Runner¶

class surround.runners.Runner(assembler=None)[source]¶

Base class for runners which are responsible for:

Initializing a surround.assembler.Assembler.
Loading/preparing input data.
Running the surround.assembler.Assembler.

Example batch runner:

from surround.runners import Runner, RunMode

class BatchRunner(Runner):
    def load_data(self, mode, config):
        # AssemblyState and load_files are defined by your own project
        state = AssemblyState()

        if mode == RunMode.TRAIN:
            state.input_data = load_files('training_set')
        else:
            state.input_data = load_files('predict_set')

        return state

Note

You get a Batch Runner and Web Runner (if web requested) when you generate a project using the CLI tool.

Parameters:	assembler (`surround.assembler.Assembler`) – The assembler the runner will execute

load_data(mode, config)[source]¶

Load the data and prepare it to be fed into the surround.assembler.Assembler.

Parameters:	mode (`surround.runners.RunMode`) – the mode the assembly was run in (batch, train, predict, web)
	config (`surround.config.Config`) – the configuration of the assembly

run(mode=<RunMode.PREDICT: 2>)[source]¶

Prepare data and execute the surround.assembler.Assembler.

Parameters:	mode (`surround.runners.RunMode`) – the mode to run the pipeline in (default: RunMode.PREDICT)

set_assembler(assembler)[source]¶

Set the Assembler instance the runner will execute.

Parameters:	assembler (`surround.assembler.Assembler`) – the Assembler instance

Data Container¶

class surround.data.container.DataContainer(path=None, metadata_version='v0.1')[source]¶

Represents a data container which holds both data and metadata.

Responsibilities:

Import files into a container and export
Load existing containers
Extract files

Parameters:	path (str) – path for container to load (default: None)
	metadata_version (str) – the version of metadata being used (default: v0.1)

export(export_to)[source]¶

Import all staged files into the container, hash the contents, store the hash in the metadata and import the metadata file into the container.

Parameters:	export_to (str) – path to export the file to
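
The export sequence (write the staged files, hash their contents, record the hash in the metadata, then add the metadata file) can be sketched as follows. Everything concrete here is an assumption chosen for illustration rather than a description of the DataContainer format: the zip archive, the SHA-256 digest, the metadata.json name and the export_container helper are all hypothetical.

```python
import hashlib
import io
import json
import zipfile


def export_container(staged_files):
    """Write staged files plus a metadata entry into an in-memory zip archive."""
    digest = hashlib.sha256()
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # Sort so the resulting hash does not depend on staging order
        for internal_path in sorted(staged_files):
            content = staged_files[internal_path]
            archive.writestr(internal_path, content)
            digest.update(internal_path.encode("utf-8"))
            digest.update(content)
        # Record the content hash in the metadata, then add the metadata file
        metadata = {"hash": digest.hexdigest()}
        archive.writestr("metadata.json", json.dumps(metadata))
    return buffer.getvalue(), metadata


staged = {"data/a.txt": b"hello", "data/b.txt": b"world"}
container_bytes, metadata = export_container(staged)

# Reopen the archive to confirm what was written
with zipfile.ZipFile(io.BytesIO(container_bytes)) as archive:
    names = sorted(archive.namelist())
    stored = json.loads(archive.read("metadata.json"))
```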

extract_all(extract_to)[source]¶

Extract all files in the current data container to a path on disk

Parameters:	extract_to (str) – path to extract files to
Returns:	true on success, false otherwise
Return type:	bool

extract_file(internal_path, extract_path='.')[source]¶

Extract a file in the current data container to a path on disk

Parameters:	internal_path (str) – path inside the container
	extract_path (str) – path to extract file to
Returns:	true on success, false otherwise
Return type:	bool

extract_file_bytes(path)[source]¶

Extract the bytes of a file in the current data container

Parameters:	path (str) – path inside the container
Returns:	the bytes extracted or None if it doesn’t exist
Return type:	bytes

extract_files(internal_paths, extract_path='.')[source]¶

Extract files in the current data container to a path on disk

Parameters:	internal_paths (list) – list of files to extract
	extract_path (str) – path to extract files to
Returns:	true on success, false otherwise
Return type:	bool

file_exists(path)[source]¶

Checks whether file exists in current data container

Returns:	true if the file exists
Return type:	bool

get_files()[source]¶

Returns all the files in the current data container

Returns:	list of the files
Return type:	list

import_directory(path, generate_metadata=True, reimport=True)[source]¶

Stage the directory provided for importing when export is requested.

Parameters:	path (str) – the directory of files to import
	generate_metadata (bool) – whether metadata should be generated for this folder
	reimport (bool) – whether or not files that are already staged should be staged again

import_file(import_path, internal_path, generate_metadata=True)[source]¶

Stage file for importing when the next export operation is called.

Parameters:	import_path (str) – path to the file on the user’s drive
	internal_path (str) – path to the file inside the container
	generate_metadata (bool) – whether metadata should be generated for this file

import_files(files, generate_metadata=True)[source]¶

Stage the list of files for importing when export is requested.

Parameters:	files (list) – list of files to import
	generate_metadata (bool) – whether metadata should be generated for these files

load(path)[source]¶

Load an existing data container, preparing it for extracting files.

Parameters:	path (str) – path to the container

Metadata¶

class surround.data.metadata.Metadata(version='v0.1')[source]¶

Represents metadata of a Data Container.

Responsibilities:

Create metadata, exporting to YAML string and/or file
Generate default metadata as per schema
Automatically generate values for fields based on files given
Get/set properties

Parameters:	version (str) – the version of the schema to use (default: v0.1)

generate_default(version)[source]¶

Generate a dictionary with all required fields created as per the schema.

Parameters:	version (str) – which version of the schema to use
Returns:	the dictionary with default values
Return type:	dict

generate_from_directory(directory)[source]¶

Automatically generate metadata from a directory, such as:

Formats (mime types)
Types (types from vocab)
Group manifests (each root level directory is considered a group)

Parameters:	directory (str) – path to the directory to generate from
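
A rough picture of this kind of generation, using only the standard library, is shown below. It is a sketch under stated assumptions and not the Metadata implementation: summarise_files is a hypothetical helper, MIME types are guessed from file extensions with mimetypes.guess_type, and vocabulary types are left out because the vocabulary is not described here.

```python
import mimetypes
from pathlib import PurePosixPath


def summarise_files(files):
    """Collect MIME types and group files by their root-level directory."""
    formats = set()
    groups = {}
    for name in files:
        mime, _ = mimetypes.guess_type(name)
        if mime:
            formats.add(mime)
        parts = PurePosixPath(name).parts
        if len(parts) > 1:
            # Each root-level directory is treated as one group
            groups.setdefault(parts[0], []).append(name)
    return {"formats": sorted(formats), "groups": groups}


summary = summarise_files(
    ["images/cat.png", "images/dog.png", "notes/readme.txt"]
)
```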

generate_from_file(filepath)[source]¶

Automatically generate metadata from a single file

Parameters:	filepath (str) – path to the file

generate_from_files(files, root, root_level_dirs)[source]¶

Automatically generate metadata from a list of files such as:

Formats (mime types)
Types (types from vocab)
Group manifests (each root level directory is considered a group)

Parameters:	files (list) – list of files to generate from
	root (str) – path to the root of the folder containing the files
	root_level_dirs (list) – list of directories in the root

generate_manifest_for_group(group_name, files, formats=None)[source]¶

Generate a manifest for a group of files where the manifest contains:

path
description
language
formats (mime types)
types (from vocab)

Stores the manifest in the metadata storage and returns it.

Parameters:	group_name (str) – name of the group
	files (list) – list of files in the group
	formats (list) – list of formats in the group
Returns:	the manifest created
Return type:	dict

get_property(path)[source]¶

Get the value of a property given a path in dot notation e.g. summary.title

metadata.get_property('summary.title') would retrieve Test name from the following:

summary:
    title: Test name

Parameters:	path (str) – path to the property using dot notation
Returns:	the value of the property, None otherwise
Return type:	any

load_from_data(data)[source]¶

Load metadata from a YAML string

Parameters:	data (str) – YAML string

load_from_path(path)[source]¶

Load metadata from file (YAML)

Parameters:	path (str) – path to the YAML file

save_to_data()[source]¶

Returns metadata as string formatted in YAML

Returns:	the data in YAML string
Return type:	str

save_to_json(indent=4)[source]¶

Returns metadata as string formatted in JSON

Parameters:	indent (int) – number of spaces in indentations
Returns:	the data in JSON format
Return type:	str

save_to_json_file(path, indent=4)[source]¶

Saves metadata to JSON file

Parameters:	path (str) – path to file to export to
	indent (int) – number of spaces in indentations

save_to_path(path)[source]¶

Save metadata to YAML file

Parameters:	path (str) – path to save file to

set_property(path, value)[source]¶

Set the value of a property given a path in dot notation e.g. summary.title

metadata.set_property('summary.title', 'Test name') would set the title of the data container to Test name.

Parameters:	path (str) – path to the property in dot notation
	value (any) – value to set to the property
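
Setting a value by dot notation amounts to walking down to the parent of the final key and assigning there. The sketch below shows this on a plain dict; the free function set_property is a stand-in for illustration, and creating missing intermediate dictionaries is an assumption rather than documented behaviour.

```python
def set_property(data, path, value):
    """Set a value at a dot-separated path, creating nested dicts as needed."""
    keys = path.split(".")
    node = data
    for key in keys[:-1]:
        # Assumption: missing intermediate levels are created on the fly
        node = node.setdefault(key, {})
    node[keys[-1]] = value


metadata = {}
set_property(metadata, "summary.title", "Test name")
```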
