Cerberus For Schema Validation

updated: 17th of July 2018

published: 16th of July 2018

Intro

Cerberus is a lightweight Python library for validating data against a schema. It can be used to check that the correct data is being supplied to configuration management tools such as Ansible and Salt, or even to Jinja templates directly. Along with many useful built-in features, Cerberus is relatively straightforward to extend and customize to your needs.

In this post I will cover the basic usage of Cerberus and then extend it to validate IPv4, IPv6 and MAC address data types.

For reference, the following software will be used in this post.

Python - 3.6.1
Cerberus - 1.2

Install Cerberus

I will use pipenv to manage a virtual environment for this post. First, create a project directory for the virtual environment.

    cmd
  
    mkdir ~/cerberus && cd ~/cerberus

Now install Cerberus along with PyYAML and activate the virtual environment.

    cmd
  
    pipenv install cerberus pyyaml && pipenv shell

Schema

A schema is defined as a Python dictionary, or optionally loaded from JSON or YAML. The schema enforces the data structure and value types that are required. As an example, the code snippet below specifies that the hostname key is required and that its value must be of type string.

    python
  

    schema = {
      "hostname": {
        "type": "string",
        "required": True
      }
    }
  

Validation Rules

Validation rules are used to ensure the values provided are correct according to the intentions of the schema author. Cerberus has a useful set of validation rules built in; I will cover two of them: type and required. See the Cerberus documentation for a full list of validation rules.

In the above example the hostname value must be of type string. Out of the box, Cerberus supports validation of many common types, including string, integer and boolean. Consult the documentation for a full list of supported types.

Additionally, the required validation rule specifies that the hostname key must be present in the data.

Data

Now that we have a schema defined, the next piece of the puzzle is creating some data to be validated. Some example data that conforms to the schema is as follows.

    python
  
    data = {
      "hostname": "rt01"
    }

Basic Usage

Let's start with a simple example to demonstrate how to use Cerberus.

    python
  

    # Import the Validator class.
    from cerberus import Validator

    schema = {
      "hostname": {
        "type": "string",
        "required": True
      }
    }

    data = {
      "hostname": "rt01"
    }

    # Create an instance of the Validator class.
    v = Validator()

    # Validate the data against the schema.
    v.validate(data, schema)  # returns True
  

Since the data matches the schema, validation succeeds and validate() returns True.

Let's try to validate some data that does not contain a hostname field. Because the hostname field is required, we can expect a validation error.

    python
  
    data = {}

    v.validate(data, schema)  # returns False

    # Check what went wrong with the validation.
    v.errors  # {'hostname': ['required field']}

As you can see, Cerberus gives us a clear message describing each problem found during validation.

Further Usage

A more elaborate schema will be used in the next section of this post. The schema defines the hostname, system_mac, vlans and interfaces keys. Both the vlans and interfaces keys hold a list of dictionaries, so each of them contains a nested schema.

    python
  

    schema = {
      "hostname": {
        "type": "string",
        "required": True
      },
      "system_mac": {
        "type": "mac_address",
      },
      "vlans": {
        "type": "list",
        "schema": {
          "type": "dict",
          "schema": {
            "name": {
              "type": "string"
            },
            "number": {
              "type": "integer"
            }
          }
        }
      },
      "interfaces": {
        "type": "list",
        "schema": {
          "type": "dict",
          "schema": {
            "name": {
              "type": "string"
            },
            "ipv4_address": {
              "type": "ipv4_address"
            },
            "ipv4_prefix": {
              "type": "integer"
            },
            "ipv6_address": {
              "type": "ipv6_address"
            },
            "ipv6_prefix": {
              "type": "integer"
            }
          }
        }
      }
    }
  

The same schema can also be defined in YAML, as in the example below.

    yaml
  

    ---
    hostname:
      type: "string"
      required: True

    system_mac:
      type: "mac_address"

    vlans:
      type: "list"
      schema:
        type: "dict"
        schema:
          name:
            type: "string"
          number:
            type: "integer"

    interfaces:
      type: "list"
      schema:
        type: "dict"
        schema:
          ipv4_address:
            type: "ipv4_address"
          ipv4_prefix:
            type: "integer"
          ipv6_address:
            type: "ipv6_address"
          ipv6_prefix:
            type: "integer"
          name:
            type: "string"
  

As with the schema, data can be provided as Python, JSON or YAML. See below for example data that matches the above schema.

Python

    python
  

    data = {
      "hostname": "rt01",
      "system_mac": "00:11:22:aa:bb:cc",
      "vlans": [
        {
          "name": "data",
          "number": 100
        },
        {
          "name": "voice",
          "number": 200
        }
      ],
      "interfaces": [
        {
          "name": "eth1",
          "ipv4_address": "10.100.10.10",
          "ipv4_prefix": 24,
          "ipv6_address": "2001:0db8:abcd:100:10:100:20:10",
          "ipv6_prefix": 64
        },
        {
          "name": "eth2",
          "ipv4_address": "10.100.20.10",
          "ipv4_prefix": 24,
          "ipv6_address": "2001:0db8:abcd:200:10:100:20:10",
          "ipv6_prefix": 64
        }
      ]
    }
  

YAML

    yaml
  

    ---
    hostname: "rt01"

    system_mac: "00:11:22:aa:bb:cc"

    vlans:
      - name: "data"
        number: 100
      - name: "voice"
        number: 200

    interfaces:
      - name: "eth1"
        ipv4_address: "10.100.10.10"
        ipv4_prefix: 24
        ipv6_address: "2001:0db8:abcd:100:10:100:20:10"
        ipv6_prefix: 64
      - name: "eth2"
        ipv4_address: "10.100.20.10"
        ipv4_prefix: 24
        ipv6_address: "2001:0db8:abcd:200:10:100:20:10"
        ipv6_prefix: 64
  

Extending Cerberus

In the above schema we have defined keys that are of the ipv4_address, ipv6_address and mac_address types. Cerberus does not have built-in support for these types, so we will need to add this functionality.

The code snippet below adds type validation for the ipv4_address, ipv6_address and mac_address types. We start by subclassing the Validator class and adding one method per new type, each named with the _validate_type_ prefix and returning True when the value is valid. The new types are then available on any instance of the CustomValidator class.

    python
  

    import re
    from ipaddress import ip_address

    from cerberus import Validator


    def validate_ip_address(address):
        """
        Validate address is either an IPv4 or IPv6 address.
        :param address: String - IPv4 or IPv6 address
        :return: Int - 4 or 6 for a valid address, otherwise 0
        """
        if not isinstance(address, str):
            return 0
        try:
            return ip_address(address).version
        except ValueError:
            return 0


    class CustomValidator(Validator):
        """
        Add type checking for network related fields.
        """
        def _validate_type_ipv4_address(self, value):
            """
            Check the value is a valid IPv4 address
            :param value: String - IPv4 Address
            :return: Bool
            """
            return validate_ip_address(value) == 4

        def _validate_type_ipv6_address(self, value):
            """
            Check the value is a valid IPv6 address
            :param value: String - IPv6 Address
            :return: Bool
            """
            return validate_ip_address(value) == 6

        def _validate_type_mac_address(self, value):
            """
            Check the value is a valid MAC address.
            Valid format: 00:11:22:aa:bb:cc
            :param value: String - MAC address in unix format
            :return: Bool
            """
            # A non-string value cannot be a MAC address.
            if not isinstance(value, str):
                return False
            # fullmatch rejects trailing characters after the sixth octet.
            return bool(re.fullmatch('([a-fA-F0-9]{2}:){5}[a-fA-F0-9]{2}', value))
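
One detail worth checking in any MAC address pattern is anchoring. re.match only anchors at the start of the string, so a pattern with no end anchor also accepts input that carries extra characters after the sixth octet, whereas re.fullmatch requires the whole string to match. A short standard-library sketch of the difference follows; the sample strings are made up for illustration.

```python
import re

MAC_PATTERN = "([a-fA-F0-9]{2}:){5}[a-fA-F0-9]{2}"

good = "00:11:22:aa:bb:cc"
too_long = "00:11:22:aa:bb:cc:dd:ee"

# match() is satisfied once the pattern fits at the start of the string.
match_good = bool(re.match(MAC_PATTERN, good))
match_too_long = bool(re.match(MAC_PATTERN, too_long))

# fullmatch() insists that nothing is left over.
full_good = bool(re.fullmatch(MAC_PATTERN, good))
full_too_long = bool(re.fullmatch(MAC_PATTERN, too_long))

print(match_good, match_too_long)  # True True
print(full_good, full_too_long)    # True False
```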
  

Testing

To make use of the new validators, create an instance of the CustomValidator class and use it just like you previously used the Validator class.

    python
  
    v = CustomValidator()

    v.validate(data, schema)  # returns True

Alternate Validation Method

Instead of subclassing the Validator class, it is also possible to define a function and specify that function as a validator in the schema. Let's use a slightly modified version of the _validate_type_mac_address method to illustrate this point.

    python
  

    import re

    from cerberus import Validator


    def mac_address(field, value, error):
        """
        Validate MAC address
        Takes a MAC address in colon ':' separated unix format
        EG: '00:11:22:aa:bb:cc'
        :param field: Passed from Cerberus
        :param value: String - MAC address in unix format
        :param error: Passed from Cerberus
        """
        try:
            # fullmatch rejects trailing characters after the sixth octet.
            if not re.fullmatch('([a-fA-F0-9]{2}:){5}[a-fA-F0-9]{2}', value):
                error(field, 'Not a valid MAC address')
        except TypeError:
            error(field, 'Not a valid MAC address')


    schema = {'mac': {'validator': mac_address}}
    data = {'mac': '00:11:22:aa:bb:cc'}

    v = Validator()

    v.validate(data, schema)  # returns True
  

In the above example we defined a validator key and assigned the mac_address function as the means of validation.

Outro

More and more we are using tools like Ansible, Salt and Jinja templating in our day-to-day work. Having a schema validation tool such as Cerberus allows you to ensure the correct data is being passed to your configuration templates. If you don't have a database feeding your variables, Cerberus is a pretty good alternative to ensure data consistency.