This is internal documentation. There is a good chance you’re looking for something else. See Disclaimer.

Infrastructure Overview

Overview of our infrastructure operated by VSHN.

OpenShift 4 Platform

A platform based on OpenShift, which in turn is built around Kubernetes.

Tocco

For every customer we operate an independent instance, and every instance lives in its own OpenShift project named nice-${INSTALLATION_NAME}.
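As an example (the installation name "master" below is hypothetical), the project for that installation would be called nice-master. A minimal sketch of the corresponding namespace object, assuming it is created via the Kubernetes API:

# Hypothetical example: namespace/project for the installation "master".
# In practice the project is created by Ansible (see Configuration Management).
apiVersion: v1
kind: Namespace
metadata:
  name: nice-master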

Infrastructure Overview

digraph {
    rankdir=LR
    newrank=true
    label="OpenShift Setup - One Installation with Two Instances"

    ### Nodes ###

    inet [ label="Internet/\nClient" shape=pentagon ]
    postgres [ label="Postgres\nDB server" ]
    elasticsearch [ label="Elasticsearch\nfulltext search" ]
    s3 [ label="S3\nobject storage" ]

    subgraph cluster_openshift {
        label="OpenShift"

        haproxy [ label="HAProxy\nload balancer" ]

        subgraph cluster_pod1 {
            label="Pod"

            nginx1 [ label="Nginx\nreverse proxy" ]
            nice1 [ label="Tocco\napplication" ]
        }

        subgraph cluster_pod2 {
            label="Pod"

            nginx2 [ label="Nginx\nreverse proxy" ]
            nice2 [ label="Tocco\napplication" ]
        }

        addr_service1 [ label="Address Service" ]
        addr_service2 [ label="Address Service" ]
        image_service1 [ label="Image Service" ]
        image_service2 [ label="Image Service" ]
    }

    ### Edges ###

    addr_service1 -> { postgres elasticsearch } [ style=invis ]

    inet -> haproxy [ color=blue fontcolor=blue ]
    haproxy -> nginx1 [ color=blue ]
    haproxy -> nginx2 [ color=gray ]
    nginx1 -> nice1 [ color=blue ]
    nginx2 -> nice2 [ color=gray ]
    nice1 -> { addr_service1 image_service1 elasticsearch postgres } [ color=blue style=dashed ]
    nice1 -> { addr_service2 image_service2 } [ color=gray style=dashed ]
    nice2 -> {
        addr_service1
        addr_service2
        image_service1
        image_service2
        elasticsearch
        postgres } [ color=gray ]
    inet -> s3 [ label="object fetch" color=blue style=dashed ]
    nice1 -> s3 [ label="object store" color=blue style=dashed ]
    nice2 -> s3 [ color=gray style=dashed ]

    ### Legend ###

    subgraph cluster_legend {
        label=Legend

        a [ shape=point ]
        b [ shape=point ]
        c [ shape=point ]
        d [ shape=point ]
        e [ shape=point ]
        f [ shape=point ]

        a -> b [ color=blue label="an HTTP request" ]
        c -> d [ color=blue style=dashed label="possible additional requests required to satisfy the request" ]
        e -> f [ color=gray label="(gray) alternative path using secondary instance" ]
    }
    { rank=same a c e inet }
    { rank=same b d f nginx1 }
}

Deployment

digraph {
    rankdir=LR
    newrank=true
    label="OpenShift - Interaction during Deployment"

    teamcity [ label="TeamCity" ]
    postgres [ label="Postgres" ]

    subgraph cluster_openshift {
        label="OpenShift"

        docker_registry [ label="Docker Registry" ]

        subgraph cluster_pod1 {
            label="Pod"

            nginx [ label="Nginx\nreverse proxy" ]
            nice [ label="Tocco\napplication" ]
        }
    }

    teamcity -> docker_registry [ label="2. push image" ]
    teamcity -> postgres [ label="1. create backup" ]
    docker_registry -> nice [ label="3. trigger deployment" ]
    teamcity -> nice [ label="4. poll status" ]
}
  1. A database backup is created.

  2. The application is built from source on TeamCity and the resulting image is pushed to the Docker registry.

  3. The deployment is triggered automatically by OpenShift’s ImageChange trigger (see the sketch after this list).

  4. TeamCity waits for the application to be deployed.
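The trigger in step 3 is ordinary OpenShift configuration. A minimal sketch of what it might look like on the DeploymentConfig (object and stream names are illustrative):

# Illustrative excerpt: a new image pushed to the "nice" image stream
# automatically redeploys the "nice" container.
apiVersion: apps.openshift.io/v1
kind: DeploymentConfig
metadata:
  name: nice
spec:
  triggers:
    - type: ImageChange
      imageChangeParams:
        automatic: true
        containerNames:
          - nice
        from:
          kind: ImageStreamTag
          name: nice:latest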

Configuration Management

digraph {
    rankdir=LR
    newrank=true
    label="Ansible - Configuration Management"

    ansible [ label="Ansible" ]
    dns [ label="Nine (via API)" ]
    elasticsearch [ label="Elasticsearch (via API)" ]
    postgres [ label="Postgres (via SSH)" ]
    teamcity [ label="TeamCity (via API)" ]
    S3 [ label="S3 (via API)" ]

    subgraph cluster_openshift {
        label="OpenShift (via Kubernetes API)"

        deployment_config [ label="DeploymentConfig"]
        ingress [ label="Ingress"]
        project [ label="Project/Namespace" ]
    }

    subgraph cluster_puppet {
        label="Puppet (Delegated, via Git)"

        puppet_elasticsearch [ label="Elasticsearch" ]
        puppet_user [ label="Linux users on\nPostgres server" ]
        puppet_monitoring [ label="Monitoring"]
    }

    subgraph cluster_cloudscale {
        label="Cloudscale API"

        s3_user [ label="object users" ]
    }

    ansible -> deployment_config [ label="configure application" ]
    ansible -> elasticsearch [ label="configure indexes" ]
    ansible -> ingress [ label="configure HTTP / TLS" ]
    ansible -> project [ label="configure project" ]
    ansible -> postgres [ label="configure DBs / users" ]
    ansible -> S3 [ label="configure buckets" ]
    ansible -> puppet_elasticsearch [ label="configure users" ]
    ansible -> puppet_monitoring [ label="configure monitoring" ]
    ansible -> puppet_user [ label="configure users / ssh access" ]
    ansible -> s3_user [ label="configure S3 users" ]
    ansible -> dns [ label="configure DNS (unimplemented)", style="dotted" ]
    ansible -> teamcity [ label="configure deployments" ]
}

List of Components

On the Kubernetes level, our setup looks something like this:

Each entry lists the object name, the provided service, and how it is managed.

dc/nice

Provided Service:

There are two containers:

  • nice: the Tocco application (our main application)
  • nginx: Nginx reverse proxy; provides compression, caching and support for custom headers.

Management:

Ansible, except for:

  • PVCs (see below)

svc/nice, ingress/nice, ingress/nice-*

Provided Service:

There is one service, svc/nice, that handles all traffic going to our application.

There is always an ingress called nice using ${INSTALLATION_NAME}.tocco.ch as FQDN. Additional ingresses may exist; they follow the naming convention nice-${FQDN}.

All ingresses use ACME to issue and renew TLS certificates. All connections are upgraded to HTTPS by Nginx (in the nginx container).

The connection timeout has been increased to 15 minutes; this is required for our old, legacy client. The default platform-wide connection limit for the ingress had to be raised too.
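A sketch of how such an ingress might look, assuming it is created as an OpenShift Route and that an openshift-acme style controller handles ACME (hostname and exact annotation set are illustrative):

# Illustrative route; haproxy.router.openshift.io/timeout raises the
# HAProxy connection timeout to 15 minutes for the legacy client.
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: nice
  annotations:
    haproxy.router.openshift.io/timeout: 15m
    kubernetes.io/tls-acme: "true"   # assumption: openshift-acme style ACME handling
spec:
  host: master.tocco.ch
  to:
    kind: Service
    name: nice
  tls:
    termination: edge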

The following setting is used to ensure the X-Forwarded-For header is not blindly trusted when coming from outside the OpenShift platform:

haproxy.router.openshift.io/set-forwarded-headers: replace

This is configured on cluster level. See also Route-specific annotations. Nice assumes X-Forwarded-For can be trusted.

Management:

Ansible

A tocco.ansible-managed: "true" annotation is used to ensure Ansible does not touch ingresses created manually or by other tools (such as the ACME controller). No such manually created ingresses exist as of today.
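A sketch of how the marker annotation appears on an Ansible-managed ingress (metadata excerpt only, name illustrative):

# Only objects carrying this annotation are touched by Ansible.
metadata:
  name: nice
  annotations:
    tocco.ansible-managed: "true"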

Docker registry, is/nice

Provided Service:

Docker image of our main application, Tocco. It is built and then pushed from outside OpenShift by our CD tool, TeamCity.

Pushed images are deployed automatically using an ImageChange trigger (see Deployment above).

Images are backed up daily.

Management: Ansible

is/nginx

Provided Service:

There are two global Nginx images in use:

  • nginx:stable: production Nginx
  • nginx:latest: staging Nginx

Both images reside in the project shared-imagestreams.

Management: Manually

Updating and promoting from staging to production is done manually.

monitoring

Provided Service:

Currently, only a simple HTTP check is used to verify that our status page (/status-tocco) returns code 200 within a given time.

Solr cores are also monitored by checking their response times. A warning and a critical response time can be specified, as well as whether a mail should be sent.

Management: Ansible

Ansible generates a definition in the Puppet Hiera format required by VSHN’s monitoring. The configuration is then committed to monitoring.yaml (see the sketch below).
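The exact Hiera schema is defined by VSHN’s Puppet profiles; purely as an illustration of the kind of entry Ansible commits to monitoring.yaml (all keys below are hypothetical):

# Hypothetical monitoring.yaml entry; the real key names are dictated by
# VSHN's Puppet/Icinga profiles, not by us.
http_checks:
  nice-master:
    url: https://master.tocco.ch/status-tocco
    expected_status: 200
    warn_response_time: 5
    critical_response_time: 10
    notify_by_mail: true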

logging

Provided Service:

Logs are written to stdout as JSON. They are then collected and made available using Elasticsearch and Kibana.

DNS

Provided Service:

Domains managed by us are hosted at Nine. However, many domains are hosted by customers themselves or by third parties in the customer’s name.

Management: Manually via web interface

PVC for LMS

Provided Service:

Our e-learning solution stores files in PVCs.

Deprecated: with Nice 3.0, these files have been moved into the DB. 4 systems remain with such volumes.

Management: Manually

PVC for out-of-memory dumps

Provided Service:

For debugging purposes, we use PVCs to extract memory dumps from Tocco.

Management: Manually

Tocco Manual

Manual of Tocco consisting of static HTML and hosted on OpenShift.

Deprecated: will be deprecated with Nice ~3.4.

dc/documentation-${VERSION}

Provided Service: For every version of Tocco, a manual is released and deployed separately.

Management: Manually via template

ingress/documentation-${VERSION}

Management: Manually via template

monitoring

Management: Puppet; added to VSHN’s Puppet config manually.

logs

Provided Service: Default Nginx logs written to stdout.

DNS

Management: Manually

Jira Commit Info Service

Integration of deployment, merge and commit information into Jira. See also Commit-Info-Service.

dc/commit-info, jira-addon

Management: Manually

pvc/repository

Provided Service: Clone of our main Git repository, used to display commit and deployment information in Jira.

Management: Manually

ingress/*, svc/*

Management: Manually

is/*

Management: Deployed via GitLab CI

Sonar

SonarQube code inspection tool.

An instance of SonarQube is running to analyze the source code of Tocco. Analyses are started from TeamCity for backend code and from GitLab CI for the client (see the sketch below). See SonarQube for details.
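A minimal sketch of a GitLab CI job that could start such a client analysis (job name, image, project key and server URL are illustrative, not our actual pipeline):

# Illustrative .gitlab-ci.yml job invoking sonar-scanner against our SonarQube instance.
sonarqube-client:
  image: sonarsource/sonar-scanner-cli
  script:
    - sonar-scanner -Dsonar.host.url=https://sonar.example.tocco.ch -Dsonar.projectKey=tocco-client -Dsonar.login=$SONAR_TOKEN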

dc/*

Management: Manually

is/*

Management: Deployed manually

Address Provider

External address provider service.

The service is deployed via GitLab CI and the service definition is managed via Ansible (playbook, role).

Deployment:

$ cd ${ANSIBLE_REPO}/services
$ ansible-playbook playbook.yml -t address-provider

dc/*

Management: Manually

ingress/*, svc/*

Management: Manually

is/*

Management:
  • Production: deployed via TeamCity
  • Test: deployed via GitLab

Image service

We use a service called imaginary running in its own pod. The OpenShift project containing the service is called image-service. All calls to the service require an API-Key header containing the key defined in image_service_api_key in secrets2.yml.

From the backend we call the /crop endpoint of the service to generate thumbnails. Other endpoints may be used freely if the need ever arises; nothing is blocked.
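A sketch of such a call expressed as an Ansible uri task (hostname and image URL are illustrative; fetching the source image by URL assumes imaginary’s URL source support is enabled, otherwise the image is sent in the request body):

# Illustrative smoke test of the image service's /crop endpoint.
- name: Crop a test image via the image service
  ansible.builtin.uri:
    url: "https://image-service.example.tocco.ch/crop?width=100&height=100&url=https://example.com/sample.jpg"
    method: GET
    headers:
      # key as stored in image_service_api_key in secrets2.yml
      API-Key: "{{ image_service_api_key }}"
    status_code: 200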

The service is deployed via GitLab CI and the service definition is managed via Ansible (playbook, role).

Deployment:

$ cd ${ANSIBLE_REPO}/services
$ ansible-playbook playbook.yml -t image-service

dc/*

Management: Manually

is/*

Management: Deployed manually

Managed Servers - VSHN

Postgres

Postgres database server used for the primary database of Tocco.

Version: Postgres 12

Required extensions: lo, pg_trgm, uuid-ossp. Extensions are installed on the database via Ansible (CREATE EXTENSION); see the sketch below.

Backups: 7 daily database dumps + 4 weekly

Users / databases: Databases and users are managed by Ansible.

Locale: The locale setting on Postgres impacts ordering (ORDER BY) and is required to be en_US.UTF-8. See OPS-772.
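A sketch of how the extensions could be installed from Ansible, assuming the community.postgresql collection is available (database name and connection details are hypothetical):

# Illustrative task: CREATE EXTENSION for each required extension.
- name: Install required Postgres extensions
  community.postgresql.postgresql_ext:
    name: "{{ item }}"
    db: nice_master                      # hypothetical database name
    login_host: postgres.example.com     # hypothetical DB host
    login_user: "{{ postgres_admin_user }}"
    login_password: "{{ postgres_admin_password }}"
  loop:
    - lo
    - pg_trgm
    - uuid-ossp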

Solr

Apache Solr used to provide full-text search capabilities.

Version: Solr 7.4

Authentication: Via the Basic Authentication Plugin, which provides HTTP auth support.

Transport security: HTTPS with a TLS cert signed by a globally trusted authority.

Backups: 7 daily + 4 weekly, implemented using LVM snapshots.

Cores (AKA indexes): Created via Ansible; see the sketch below.

Monitoring: Every Solr core is monitored in Icinga for reachability and response time.
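A sketch of how a core could be created from Ansible via Solr’s CoreAdmin API using HTTP basic auth (host, core name and config set are hypothetical; the actual playbook may work differently):

# Illustrative task: create a core through the CoreAdmin API.
- name: Create Solr core for an installation
  ansible.builtin.uri:
    url: "https://solr.example.com:8983/solr/admin/cores?action=CREATE&name=nice-master&configSet=_default"
    method: GET
    url_username: "{{ solr_admin_user }}"
    url_password: "{{ solr_admin_password }}"
    force_basic_auth: true
    status_code: 200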

Mail Relay

SMTP server used for outgoing mails.

The mail server accepts all incoming mails; restricting sender domains/addresses is left to Tocco.

Transport security: STARTTLS with a TLS cert signed by a globally trusted authority.

DKIM: Mails are signed using DKIM. Generally, one and the same key is used for all mails; however, for a few domains we use another key to avoid name clashes.

See also DNS Records for Outgoing Mails.

S3

S3 storage is used for files uploaded to Tocco.

No data is currently being deleted; however, backups are made.

Keys: Each installation has its own key.

Buckets: There is one bucket per installation; see the sketch below.
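A sketch of how a per-installation bucket could be ensured from Ansible, assuming the amazon.aws collection is used against cloudscale’s S3-compatible endpoint (bucket name, endpoint and credential variables are illustrative; parameter names vary between collection versions):

# Illustrative task: one bucket per installation, created with that
# installation's own key.
- name: Ensure bucket for installation exists
  amazon.aws.s3_bucket:
    name: nice-master                               # hypothetical bucket name
    endpoint_url: https://objects.rma.cloudscale.ch
    access_key: "{{ s3_access_key }}"
    secret_key: "{{ s3_secret_key }}"
    state: present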

Managed Servers - Nine

DNS

DNS servers are managed by Nine; the login for the cockpit is stored in the Ansible vault.

DNS records: We are using ALIAS and ANAME records.

Configuration: DNS management has to be done manually in the Nine cockpit. We would love to have an API to manage the records through Ansible.