.. _getting-started:

Getting Started
===============

Installation
************

Prerequisites
^^^^^^^^^^^^^

- Python 3+ (tested on 3.6.5)
- Docker
- Supports MacOS, Linux, and Windows

Install via Pip
^^^^^^^^^^^^^^^

Run the following command to install the latest version of Surround::

    $ pip3 install surround

.. note:: If this doesn't work, make sure you have pip installed. See `here `_ on how to install it.

Now the Surround library and command-line tool should be installed! To check, run the following command::

    $ surround

If it works, then you are ready for the :ref:`project-setup` stage.

.. _project-setup:

Project Setup
*************

Before we can create our first pipeline, we need to generate an empty Surround project. Use the following command to generate a new project::

    $ surround init -p test_project -d "Our first pipeline"

When it asks the following, respond with ``n`` (we'll cover this in later sections)::

    Does it require a web runner? (y/n) n

This will create a new folder called ``test_project`` with the following file structure::

    test_project
    ├── test_project/
    │   ├── stages
    │   │   ├── __init__.py
    │   │   ├── input_validator.py
    │   │   ├── baseline.py
    │   │   └── assembler_state.py
    │   ├── __main__.py
    │   ├── __init__.py
    │   ├── config.yaml
    │   └── file_system_runner.py
    ├── input/
    ├── docs/
    ├── models/
    ├── notebooks/
    ├── output/
    ├── scripts/
    ├── spikes/
    ├── tests/
    ├── __main__.py
    ├── __init__.py
    ├── dodo.py
    ├── Dockerfile
    ├── requirements.txt
    └── README.md

The generated project comes with an example pipeline that can be run straight away using the commands::

    $ cd test_project
    $ surround run batchLocal

This should output the following::

    INFO:surround.assembler:Starting 'baseline'
    INFO:surround.assembler:Validator InputValidator took 0:00:00 secs
    INFO:surround.assembler:Estimator Baseline took 0:00:00 secs

Now you are ready for :ref:`create-first-pipeline`.

.. seealso:: Not sure what a pipeline is? Check out our :ref:`about` section first!

.. _create-first-pipeline:

Creating your first pipeline
****************************

For our first Surround pipeline, we are going to do some very basic data transformation: converting the input string from lower case to upper case. This pipeline consists of two stages, ``InputValidator`` and ``MakeUpperCase``.

Open the script ``stages/input_validator.py`` and you should see the following code already generated::

    from surround import Validator

    class InputValidator(Validator):
        def validate(self, state, config):
            if not state.input_data:
                raise ValueError("'input_data' is None")

As you can see, we are already given the ``InputValidator`` stage; we just need to edit the ``validate`` method to check that the input data is the correct data type (:class:`str`)::

    def validate(self, state, config):
        if not isinstance(state.input_data, str):
            # Raise an exception, this will stop the pipeline
            raise ValueError('Input is not a string!')

Now we need to create our ``MakeUpperCase`` stage, so head to ``stages/baseline.py``, where you should see::

    from surround import Estimator

    class Baseline(Estimator):
        def estimate(self, state, config):
            state.output_data = state.input_data

        def fit(self, state, config):
            LOGGER.info("TODO: Train your model here")

Make the following changes (keep the module-level ``LOGGER`` so the log calls work)::

    import logging

    from surround import Estimator

    LOGGER = logging.getLogger(__name__)

    class MakeUpperCase(Estimator):
        def estimate(self, state, config):
            # Convert the input into upper case
            state.output_data = state.input_data.upper()

            # Log the output to the terminal (to check it's working)
            LOGGER.info("Output: %s", state.output_data)

        def fit(self, state, config):
            # Leave the fit method the same;
            # we aren't doing any training in this guide
            LOGGER.info("TODO: Train your model here")

Since we renamed the estimator, we need to reflect that change when we create the ``Assembler``.
First, head to the ``stages/__init__.py`` file and rename ``Baseline`` to ``MakeUpperCase``::

    from .baseline import MakeUpperCase
    from .input_validator import InputValidator
    from .assembler_state import AssemblerState

Then, in ``__main__.py`` where the estimator is imported, make sure it looks like so::

    from stages import MakeUpperCase, InputValidator

And where the assembler is created, make sure it looks like so::

    assemblies = [
        Assembler("baseline")
            .set_stages([InputValidator(), MakeUpperCase()])
    ]

That's it for the pipeline! To test the pipeline with the default input (the string ``"TODO: Load raw data here"``), just run the following command::

    $ surround run batchLocal

The output should be the following::

    INFO:surround.assembler:Starting 'baseline'
    INFO:stages.baseline:Output: TODO: LOAD RAW DATA HERE
    INFO:surround.assembler:Estimator MakeUpperCase took 0:00:00 secs

To change what input is fed through the pipeline, modify ``file_system_runner.py`` and change what is assigned to ``state.input_data``::

    import logging

    from surround import Runner
    from stages import AssemblerState

    logging.basicConfig(level=logging.INFO)

    class FileSystemRunner(Runner):
        def load_data(self, mode, config):
            state = AssemblerState()

            # Load data to be processed
            raw_data = "This daTa wiLL end UP capitalizED"

            # Set up input data
            state.input_data = raw_data

            return state

.. note:: To test training mode (``fit`` will be called in the estimator instead), run the following command: ``$ surround run trainLocal``

Running your first pipeline in a container
******************************************

First you must build an image for your container. To do this, just run the following command::

    $ surround run build

Then, to run the container in dev mode, just use the following command::

    $ surround run dev

This will run the container, linking the folder ``test_project/test_project`` with the working directory in the container.
So during development, when you make small changes, there is no need to rebuild the image; just run this command again.

Then, when you are ready for production, you can use the following command::

    $ surround run prod

This will first build the image and then run the container without any linking to the host machine. The image created in the build can then also be pushed to a Docker Hub repository and shared.

.. note::

    Both ``dev`` and ``prod`` use the default mode of the project, which is ``RunMode.BATCH_PREDICT`` in non-web projects and ``RunMode.WEB`` otherwise. The following commands force a specific mode::

        $ surround run batch
        $ surround run train

.. note:: To see a list of available tasks, just run the command ``$ surround run``.

Serving your first pipeline via Web Endpoint
********************************************

When generating a project, you get asked::

    Does it require a web runner? (y/n)

If we say yes to this, Surround will still generate the generic ``file_system_runner.py``, but it will also generate a new script called ``web_runner.py``. This script contains a new ``Runner`` which uses `Tornado `_ to host a web server, making your pipeline accessible via HTTP requests.

By default the ``WebRunner`` will host two endpoints:

- ``/info``: access via GET request; returns ``{"version": "0.0.1"}``
- ``/estimate``: access via POST request; the body must be a JSON document
  containing the input data::

      { "message": "this text will be processed" }

So let's create a new pipeline that does the same data processing as the one in :ref:`create-first-pipeline`, but this time we will send strings to a web endpoint and get the results back in the response to the request.

First, generate a new project, this time answering yes to the web runner prompt. Then make all the changes we made in :ref:`create-first-pipeline` and test that it is still working locally.
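If you want to experiment with a client before the container is built, the endpoint contract described above is small enough to mimic with Python's standard library. The sketch below is a hypothetical stand-in, not the Tornado-based ``WebRunner`` that Surround generates: ``StandInHandler`` and ``make_server`` are made-up names, the upper-casing copies what ``MakeUpperCase`` does, and the ``output`` key mirrors the response that is added to ``EstimateHandler`` later in this section.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical stand-in for local experiments; NOT Surround's WebRunner.
VERSION = "0.0.1"  # the version this guide shows for /info


class StandInHandler(BaseHTTPRequestHandler):
    """Mimics the documented GET /info and POST /estimate endpoints."""

    def _send_json(self, status, payload):
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json; charset=UTF-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        if self.path == "/info":
            self._send_json(200, {"version": VERSION})
        else:
            self._send_json(404, {"error": "not found"})

    def do_POST(self):
        if self.path != "/estimate":
            self._send_json(404, {"error": "not found"})
            return
        # Expect a body like {"message": "this text will be processed"}
        try:
            length = int(self.headers.get("Content-Length", 0))
            message = json.loads(self.rfile.read(length))["message"]
        except (ValueError, KeyError, TypeError):
            message = None
        if not isinstance(message, str):
            self._send_json(400, {"error": "expected JSON with a string 'message'"})
            return
        # Same transformation as the MakeUpperCase stage
        self._send_json(200, {"output": message.upper()})


def make_server(port=8080):
    """Create (but do not start) the stand-in server on localhost."""
    return HTTPServer(("localhost", port), StandInHandler)
```

Start it with ``make_server().serve_forever()``. Stop it again before running the real web runner, since both listen on port 8080 in this sketch.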
Next, we are going to build an image for our pipeline using the command::

    $ surround run build

Then we are going to run our default server using the command::

    $ surround run web

You should get output like so::

    INFO:root:Server started at http://localhost:8080

.. note:: If you would like to run it on the host machine instead of in a container, you must install Tornado using this command: ``$ pip3 install tornado==6.0.2``

Now, if you load ``http://localhost:8080/info`` in your preferred browser, you should see the following::

    {"version": "0.0.1"}

.. note:: If you are running this on Windows and don't see the above, try using ``http://192.168.99.100:8080/info`` instead.

Next, we are going to test the ``/estimate`` endpoint by running the following command in another terminal.

On Linux/MacOS::

    $ curl -d "{ \"message\": \"test phrase\" }" http://localhost:8080/estimate

On Windows (in PowerShell)::

    $ Invoke-WebRequest http://192.168.99.100:8080/estimate -Method POST -Body "{ ""message"": ""test phrase"" }"

You should see the following output in the terminal running the pipeline::

    INFO:surround.assembler:Starting 'baseline'
    INFO:surround.assembler:Estimator MakeUpperCase took 0:00:00 secs
    INFO:root:Message: TEST PHRASE
    INFO:tornado.access:200 POST /estimate (::1) 1.95ms

So our data is successfully being processed! But what if we need the result?
Head to the script ``web_runner.py`` and append the following to the ``post`` method of ``EstimateHandler``::

    # Return the result of the processing
    self.write({"output": self.data.output_data})

Restart the web server, run the same command as before, and you should see the following output.

On Linux/MacOS::

      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100    53  100    25  100    28    806    903 --:--:-- --:--:-- --:--:--  1709
    {"output": "TEST PHRASE"}

On Windows (in PowerShell)::

    StatusCode        : 200
    StatusDescription : OK
    Content           : {"output": "TEST PHRASE"}
    RawContent        : HTTP/1.1 200 OK
                        Content-Length: 25
                        Content-Type: application/json; charset=UTF-8
                        Date: Mon, 17 Jun 2019 06:43:54 GMT
                        Server: TornadoServer/6.0.2

                        {"output": "TEST PHRASE"}
    Forms             : {}
    Headers           : {[Content-Length, 25], [Content-Type, application/json; charset=UTF-8], [Date, Mon, 17 Jun 2019 06:43:54 GMT], [Server, TornadoServer/6.0.2]}
    Images            : {}
    InputFields       : {}
    Links             : {}
    ParsedHtml        : mshtml.HTMLDocumentClass
    RawContentLength  : 25

That's it, you are now serving a Surround pipeline! It can now be used from virtually any application that can send HTTP requests.

.. note:: Since this project was generated with a web runner, the default mode is ``web``. To run the pipeline using the ``FileSystemRunner`` instead, use the command ``$ surround run batch`` or ``$ surround run train``.
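Because the ``/estimate`` endpoint speaks plain JSON over HTTP, calling it from Python needs nothing beyond the standard library. The following is a minimal client sketch. The ``estimate`` function name and the ``BASE_URL`` value are assumptions for illustration (on Windows with the Docker setup above, the address may be ``http://192.168.99.100:8080`` instead), and it relies on the ``{"output": ...}`` body that ``EstimateHandler`` now writes.

```python
import json
import urllib.request

# Assumption: adjust to wherever your web runner is listening.
BASE_URL = "http://localhost:8080"


def estimate(message, base_url=BASE_URL, timeout=10.0):
    """Send ``message`` to /estimate and return the "output" field of the reply."""
    payload = json.dumps({"message": message}).encode("utf-8")
    request = urllib.request.Request(
        base_url + "/estimate",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError for non-2xx responses
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Relies on the {"output": ...} body written by EstimateHandler
        return json.loads(response.read().decode("utf-8"))["output"]
```

With the web server running, ``estimate("test phrase")`` should return the same ``TEST PHRASE`` string that ``curl`` printed. Raising on HTTP errors, rather than returning ``None``, makes a failed request hard to miss in calling code.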