This is internal documentation. There is a good chance you’re looking for something else. See Disclaimer.

Inter-Platform Migration

This document describes how to move all our services from one Kubernetes platform to another. It is based on the documentation created for the move from OpenShift 3 to OpenShift 4, in the course of which everything was recreated. Additionally, information concerning the move to a non-OpenShift platform has been incorporated.

Overview

This guide assumes that the new platform is located in the same datacenter and that the following services do not have to be moved:

  • Elasticsearch

  • Postgres servers

  • Solr servers

  • S3

The steps, as provided, allow a migration without downtime.

Non-OpenShift Platforms

Several OpenShift-specific features are in use that will hinder a migration to a non-OpenShift Kubernetes platform.

(Lists below are non-exhaustive.)

The following OpenShift-specific features are used:

  • oc new-project (via API) is used to create projects.

  • DeploymentConfig

    • Image triggers (to trigger a deployment on docker push)

  • ImageStream

  • Ingress annotations:

    • haproxy.router.openshift.io/timeout - HTTP read timeout

    • haproxy.router.openshift.io/hsts_header - HSTS HTTP header
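
For illustration, this is roughly how such annotations look on an ingress; hostname, service name and values are placeholders:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: nice
  annotations:
    # HTTP read timeout (only interpreted by the OpenShift router)
    haproxy.router.openshift.io/timeout: 30m
    # HSTS header (only interpreted by the OpenShift router)
    haproxy.router.openshift.io/hsts_header: max-age=31536000;includeSubDomains
spec:
  rules:
  - host: <hostname>
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: nice
            port:
              number: 8080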

The following services come with OpenShift and need to be replaced:

  • Docker registry

  • Logging (Kibana)

    • Logging via line-based JSON (see the example after this list)

  • Prometheus (cluster monitoring)
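
Line-based JSON logging means every log line is a self-contained JSON object, along these lines (field names are illustrative only):

{"timestamp": "2021-01-01T12:00:00Z", "level": "INFO", "logger": "nice2.web", "message": "request processed"}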

Platform Preparation

Service Account for Ansible

To allow a full recreation, Ansible needs to be granted access. That is, a service account needs to be created and granted the required permissions:

$ oc -n serviceaccounts get serviceaccount ansible
NAME      SECRETS   AGE
ansible   2         165d

See also Service Accounts.
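
On the new platform, the account can be recreated roughly like so (assuming the project is again named serviceaccounts; on a non-OpenShift platform, use kubectl create namespace instead of oc new-project):

oc new-project serviceaccounts
oc -n serviceaccounts create serviceaccount ansible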

Grant admin access:

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: ansible-admin
  namespace: serviceaccounts
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: admin
subjects:
- kind: ServiceAccount
  name: ansible
  namespace: serviceaccounts
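
The binding can then be applied in the usual way (file name is arbitrary):

oc apply -f rolebinding-ansible-admin.yml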

Other Service Accounts

Other service accounts may need to be recreated as well:

$ oc -n serviceaccounts get serviceaccounts
NAME                    SECRETS   AGE
ansible                 2         165d
builder                 2         165d
default                 2         165d
deployer                2         165d
teamcity                2         165d
tocco-registry-backup   2         63d

tocco-registry-backup is managed by VSHN. The accounts default, builder and deployer are used internally by OpenShift.

Currently, teamcity is the only other global service account. A GitLab account is likely to be created in the future.

Groups

Our users are partitioned into three groups:

$ oc get groups
NAME           USERS
tocco-admin    …
tocco-dev      …
tocco-viewer   …

Those need to be recreated. Ansible will grant namespace-level access to these groups.
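
On OpenShift, they can be recreated with oc adm groups; user names are placeholders:

oc adm groups new tocco-admin <user>…
oc adm groups new tocco-dev <user>…
oc adm groups new tocco-viewer <user>…

(Group objects are OpenShift-specific; on other platforms, group membership typically comes from the identity provider.)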

A custom ClusterRoleBinding exists to grant access to cluster metrics:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: tocco-cluster-reader
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-reader
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: tocco-admin
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: tocco-viewer
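
As a sketch, after recreating the binding, access can be spot-checked via impersonation (user name is a placeholder):

oc auth can-i get nodes --as <user> --as-group tocco-viewer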

Shared Docker Images

Project shared-imagestreams contains various image streams shared amongst projects. Some of them were uploaded manually while others are pulled from a remote repository.

For instance, the image stream nginx-maintenance is pulled from a remote repository:

apiVersion: image.openshift.io/v1
kind: ImageStream
metadata:
  generation: 3
  name: nginx-maintenance
  namespace: shared-imagestreams
spec:
  lookupPolicy:
    local: false
  tags:
  - annotations: null

    # Fetch from remote repository
    from:
      kind: DockerImage
      name: registry.gitlab.com/toccoag/maintenance-page:latest

    generation: 3

    importPolicy:
      # Check and fetch new upstream images regularly (--scheduled below)
      scheduled: true

    name: latest
    referencePolicy:
      type: Source

You can recreate the above imagestream like so:

# run inside project shared-imagestreams
oc tag --scheduled registry.gitlab.com/toccoag/maintenance-page:latest nginx-maintenance:latest
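
Whether the import succeeded can be checked like this:

oc -n shared-imagestreams get imagestream nginx-maintenance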

Images with no from need to be pushed directly to the registry:

# log in with an OpenShift token; $REGISTRY is the target registry
docker login -u any --password-stdin $REGISTRY < <(oc whoami -t)
# $PROJECT and $IMAGE_NAME are the target project and image name
docker push $REGISTRY/$PROJECT/$IMAGE_NAME

Migration of Nice

Two approaches were used for the migration from OpenShift 3 to OpenShift 4:

  1. Change DNS directly:

    1. Start installation on OpenShift 4

    2. Change DNS

    3. Wait for DNS TTL to expire

    4. Stop installation on OpenShift 3

  2. Reverse proxy from old to new platform:

    1. Start installation on OpenShift 4

    2. Reverse proxy traffic from OpenShift 3 to OpenShift 4

    3. Stop installation on OpenShift 3

    4. Change DNS

The latter approach was used where we did not manage DNS ourselves. With this approach, DNS changes can be made afterwards without coordinating with the customer.

Set up Reverse Proxy (Approach B)

An Nginx image was used to reverse-proxy from OpenShift 3 to OpenShift 4.

default.conf:

# WebSocket support
map $http_upgrade $connection_upgrade {
    default upgrade;
    '' '';
}

server {
    listen 8080;

    server_name _;

    location / {
        # FIXME
        # adjust upstream
        proxy_pass https://proxy.apps.openshift.tocco.ch;

        # FIXME
        # Set that to (at least) whatever the upstream limit is.
        client_max_body_size 400M;

        # verify upstream TLS certificate
        proxy_ssl_verify on;
        proxy_ssl_trusted_certificate /etc/ssl/certs/ca-certificates.crt;

        # FIXME
        # Adjust according to upstream limit
        proxy_read_timeout 30m;

        proxy_set_header X-Forwarded-For $remote_addr;
        proxy_set_header Host $http_host;

        # WebSocket support
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;
    }
}

Dockerfile:

FROM nginx:stable
COPY default.conf /etc/nginx/conf.d/default.conf
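
A minimal sketch for building the image and testing the proxy locally (image name is arbitrary):

docker build -t nginx-reverse-proxy .
docker run --rm -p 8080:8080 nginx-reverse-proxy

# in a second terminal; expect a response from the upstream
curl -H 'Host: <hostname>' http://localhost:8080/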

For a possible strategy to deploy it on OpenShift, see:

Migration for Installations where we Control DNS (Approach A)

  1. Set location: cloudscale-os4 in config.yml

  2. Create installation on OpenShift 4:

    ansible-playbook playbook.yml --skip-tags skip_route_dns_verification,acme,monitoring,teamcity -l <installation>
    
  3. Obtain tokens:

    oc login -u <username> https://api.c-tocco-ocp4.tocco.ch:6443
    os4_token=$(oc whoami -t)
    oc login -u <username> https://console.appuio.ch
    os3_token=$(oc whoami -t)
    
  4. Copy TLS certificate from OpenShift 3 to OpenShift 4:

    ./copy_tls_cert --disable-acme-on-os3 --os3-token $os3_token --os4-token $os4_token <installation>
    

    This will copy the TLS certificates from the routes on OpenShift 3 to the ingresses on OpenShift 4 and disable certificate renewal on OpenShift 3.

    (copy_tls_cert can be found in commit 5322a5d49757689be695b3bfeef98c1a6079c431.)

  5. Deploy installation

    Alternatively, copy the existing image from OpenShift 3.

    Pull from OpenShift 3:

    docker login -u any --password-stdin registry.appuio.ch <<<$os3_token
    docker pull registry.appuio.ch/toco-nice-<installation>/nice
    

    Copy and push to OpenShift 4:

    docker login -u any --password-stdin registry.apps.openshift.tocco.ch <<<$os4_token
    docker tag registry.appuio.ch/toco-nice-<installation>/nice registry.apps.openshift.tocco.ch/nice-<installation>/nice
    docker push registry.apps.openshift.tocco.ch/nice-<installation>/nice
    
  6. Verify installation is running on OpenShift 4

  7. Adjust DNS

  8. Wait for TTL to expire, then stop installation on OpenShift 3

  9. Enable ACME on OpenShift 4:

    ansible-playbook playbook.yml -l <installation>
    

    You can check if a valid cert is available on OpenShift 4 like so:

    gnutls-cli os4.tocco.ch --sni-hostname <hostname> --verify-hostname <hostname> </dev/null
    

    Or look at the Certificate resources:

    oc edit certificate
    
  10. Correct Docker pull URL on production:

    ansible-playbook playbook.yml -t teamcity -l <prod_installation>
    

    The Docker image needs to be pulled from OpenShift 4 when updating production. Let’s tell TeamCity about it.

Migration for Installations using Nginx Reverse Proxy (Approach B)

  1. OpenShift 3: switch project:

    oc project toco-nice-<installation>
    
  2. OpenShift 3: start nginx reverse proxy:

    oc scale --replicas 1 dc/nginx-reverse-proxy
    
  3. OpenShift 3: Disable ACME certificate renewal:

    # Remove the kubernetes.io/tls-acme annotation from every Ansible-managed
    # route without a path, disabling certificate renewal for it.
    for name in $(oc get route -o json | jq -r '.items[]|if (.spec|has("path")|not) and (.metadata.annotations["tocco.ansible-managed"] == "true") then .metadata.name else empty end'); do
        oc annotate "route/$name" kubernetes.io/tls-acme-
    done
    
  4. OpenShift 3: Add route for letsencrypt:

    ansible-playbook playbook.yml -t letsencrypt-migration-routes -l <customer>
    
  5. OpenShift 4: add location in config.yml:

    location: cloudscale-os4
    
  6. OpenShift 4: create installation:

    ansible-playbook playbook.yml --skip-tags skip_route_dns_verification,monitoring,teamcity -l <installation>
    
  7. Obtain tokens:

    oc login -u <username> https://api.c-tocco-ocp4.tocco.ch:6443
    os4_token=$(oc whoami -t)
    oc login -u <username> https://console.appuio.ch
    os3_token=$(oc whoami -t)
    
  8. Deploy installation

    Alternatively, copy the existing image from OpenShift 3.

    Pull from OpenShift 3:

    docker login -u any --password-stdin registry.appuio.ch <<<$os3_token
    docker pull registry.appuio.ch/toco-nice-<installation>/nice
    

    Copy and push to OpenShift 4:

    docker login -u any --password-stdin registry.apps.openshift.tocco.ch <<<$os4_token
    docker tag registry.appuio.ch/toco-nice-<installation>/nice registry.apps.openshift.tocco.ch/nice-<installation>/nice
    docker push registry.apps.openshift.tocco.ch/nice-<installation>/nice
    
  9. Verify installation is running on OpenShift 4

  10. OpenShift 3: comment out location temporarily:

    # location: cloudscale-os4
    
  11. OpenShift 3: route all traffic to OpenShift 4:

    ansible-playbook playbook.yml -t full-migration-routes -l <customer>
    
  12. Uncomment location in config.yml again

  13. OpenShift 3: stop installation:

    oc scale --replicas 0 dc/nice
    

Other Services

Other services (other than Nice) are set up according to Set up Application/Service on OpenShift, and the corresponding Ansible plays need to be modified.

These services use OpenShift-specific features too. See Non-OpenShift Platforms.