The Enterprise Grade Rancher Deployment Guide


    Introduction

    There are many ways to get an Enterprise Grade Kubernetes cluster running with Rancher Server and the Rancher Kubernetes Engine (RKE). This guide provides a very, perhaps overly, detailed step-by-step walkthrough which we have been using in various projects in “production”, both on bare-metal servers and in the cloud.

    In this guide we describe how to use the rke tool to set up an RKE cluster, use helm to deploy Rancher Server on top of it, and then use this Rancher Server to deploy further RKE clusters. We call the first RKE cluster running Rancher Server our Infra cluster and use it to deploy the other RKE worker clusters.

    You can use the Infra cluster to set up EKS, AKS or GCP clusters, as well as RKE clusters on top of vSphere, OpenStack, etc.

    Take it easy: this guide provides the hard way to get an Enterprise Grade Rancher deployment and shows how upgrade, backup and recovery work, with some hints about troubleshooting.

    Please note: there are many other easier ways to have an automated enterprise grade Rancher cluster deployment, e.g. with Terraform Provider Rancher2.

    We hope you like this guide and use it to set up Enterprise Grade Rancher deployments for production!

    Prerequisites

    For this guide, you may want to have a minimal Ubuntu LTS (e.g. 18.04) installed on all machines. We are using 3 bare-metal servers. You can perform the same deployment on RancherOS, CentOS and other Red Hat-like OS families in a similar way, in any environment, cloud or non-cloud.

    Before we start, you need kubectl on your local machine:

    $ curl -LO https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl

    Make the kubectl executable with:

    $ chmod +x ./kubectl

    and move it into your $PATH

    $ sudo mv ./kubectl /usr/local/bin/kubectl

    Verify that the install is correct with:

    $ kubectl version

    Setup Your Machines

    After the installation of the OS and the openssh-server, you need to secure the machine from unauthorised access. Therefore we deactivate password authentication so you can only log in with your key.

    For this we create a new directory with

    $ mkdir -p ~/.ssh/

    and create the authorized_keys file with

    $ vi ~/.ssh/authorized_keys

    Paste into this file all public keys that should have SSH access to this machine.
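
    Alternatively, assuming the remote user is ubuntu and your public key on your workstation is ~/.ssh/id_rsa.pub (the node address is a placeholder), you can let ssh-copy-id append the key for you:

    $ ssh-copy-id -i ~/.ssh/id_rsa.pub ubuntu@$NAMEOFNODE1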

    Next, we should make some changes to the SSH server config. You can of course still make custom changes for your needs. In your sshd_config file you may want to make the following changes (a minimal example snippet follows the list):

    – Enable SSH via port 22

    – Enable pubkey authentication

    – Enable usage of “authorized_keys”

    – Disable password authentication

    – Disable challenge-response authentication

    – Enable use of PAM

    – Enable X11 forwarding

    – Disable “Message of the day”

    – Allow passing local environment variables

    – Override the subsystem for sftp
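
    For reference, a minimal /etc/ssh/sshd_config matching the list above could look like the following sketch. The values are only examples, adjust them to your own policies and restart sshd afterwards:

    Port 22
    PubkeyAuthentication yes
    AuthorizedKeysFile .ssh/authorized_keys
    PasswordAuthentication no
    ChallengeResponseAuthentication no
    UsePAM yes
    X11Forwarding yes
    PrintMotd no
    AcceptEnv LANG LC_*
    Subsystem sftp /usr/lib/openssh/sftp-server

    $ sudo systemctl restart sshd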

    After that, we need to add the user (which was created during the installation of the OS) to the group of “sudoers”. For that, open

    $ sudo vi /etc/sudoers

    and add your user below “Allow members of group sudo to execute any command” and “#includedir /etc/sudoers.d”. The file should then contain the following passage:

    (...)
    # Allow members of group sudo to execute any command
    %sudo ALL=(ALL:ALL) ALL
    $YOURUSER ALL=(ALL) NOPASSWD: ALL

    # See sudoers(5) for more information on "#include" directives:

    #includedir /etc/sudoers.d
    $YOURUSER ALL=(ALL) NOPASSWD: ALL
    (...)

    Since this is a minimal server installation, we need to install Curl for the next step:

    $ sudo apt install curl

    Then we install Docker with:

    $ curl -fsSL https://get.docker.com -o get-docker.sh

    and

    $ sh get-docker.sh

    After the script has finished and Docker is installed, we need to add the ubuntu user to the docker group:

    $ sudo usermod -aG docker ubuntu

    Run docker info to see if the installation completed successfully (note that the group change only takes effect after you log out and back in, or after running newgrp docker). You are then ready for the next step.
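
    For example, assuming you are still logged in as the ubuntu user:

    $ newgrp docker
    $ docker info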

    Setup RKE

    In this step we will set up the Rancher Kubernetes Engine (RKE) cluster that will later host the Rancher Server. To start, we need to obtain the RKE binary. You can check the RKE releases page for the latest RKE version and download it via wget on one of the machines.

    Download the latest RKE binary into a directory in the user’s $PATH:

    $ wget https://github.com/rancher/rke/releases/download/v0.1.18/rke_linux-amd64

    and rename it with:

    $ mv rke_linux-amd64 rke

    to rke. To finish the installation of the rke binary, we make it executable with:

    $ chmod +x rke

    Confirm this step with:

    $ rke --version

    Next, the cluster needs to be created. We create a YAML file which contains all the information RKE needs to spin up the environment.

    So we create the rancher-cluster.yaml with:

    $ vi rancher-cluster.yaml

    with the following content (please refer to the advanced_rke_cluster_redacted.yaml in the appendix for more options):

    ---
    ignore_docker_version: true
    network:
      plugin: flannel
    system_images:
      kubernetes: rancher/hyperkube:v1.13.5-rancher1
    nodes:
      - address: $NAMEOFNODE1
        internal_address: $IPOFNODE1
        user: ubuntu
        role: [controlplane,etcd]
        ssh_key_path: ~/.ssh/id_rsa
      - address: $NAMEOFNODE2
        internal_address: $IPOFNODE2
        user: ubuntu
        role: [worker]
        ssh_key_path: ~/.ssh/id_rsa
      - address: $NAMEOFNODE3
        internal_address: $IPOFNODE3
        user: ubuntu
        role: [worker]
        ssh_key_path: ~/.ssh/id_rsa

    With this YAML definition we will create a cluster of three nodes: one as controlplane and etcd, and the other two as workers. RKE connects to every node as the user “ubuntu”, using the SSH private key at the stated path. It is possible to add more nodes to this YAML (you can also add nodes after the installation).

    Note 1: If you’d like to have the controlplane, etcd and worker on all nodes, you’ve to set the role as:

    [controlplane,etcd,worker]

    for all nodes.

    Note 2: If you need Canal as a CNI plugin, please refer to the extended rancher-cluster.yaml in the appendix and set the

    canal_iface 

    accordingly.
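
    A minimal sketch of what the network block could look like with Canal; the interface name eth1 is only an assumption, use the interface your nodes should communicate over:

    network:
      plugin: canal
      options:
        canal_iface: eth1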

    To create this cluster, we run:

    $ rke up --config rancher-cluster.yaml

    You can see the created nodes after the installation with (rke up writes a kubeconfig file next to your cluster YAML; later you can also get a kubeconfig from the Rancher GUI):

    $ kubectl get nodes
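
    A minimal sketch of how to use the generated kubeconfig from your workstation, assuming the file is named kube_config_rancher-cluster.yaml and you copied it into your current directory:

    $ export KUBECONFIG=$PWD/kube_config_rancher-cluster.yaml
    $ kubectl get nodes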

    Setup Helm

    Helm is the package management tool of choice for Kubernetes. Helm “charts” provide templating syntax for Kubernetes YAML manifest documents. With Helm we can create configurable deployments instead of just using static files. For more information about creating your own catalog of deployments, check out the docs at https://helm.sh/. To be able to use Helm, the server-side component tiller needs to be installed on your cluster.

    To install this, we first need to create a service account for tiller with:

    $ kubectl -n kube-system create serviceaccount tiller

    and assign rights and permissions with

    $ kubectl create clusterrolebinding tiller --clusterrole=cluster-admin --serviceaccount=kube-system:tiller

    Then we install helm with:

    $ curl -LO https://git.io/get_helm.sh
    $ chmod 700 get_helm.sh
    $ ./get_helm.sh

    Then we initialise helm with:

    $ helm init --service-account tiller

    To verify the installation we check the rollout status with:

    $ kubectl -n kube-system rollout status deploy/tiller-deploy

    If the console prints “deployment “tiller-deploy” successfully rolled out”, the process was successful. Finally we verify if helm can talk with the tiller service with:

    $ helm version

    If the output looks like:

    Client: &version.Version{SemVer:"v2.12.1", GitCommit:"02a47c7249b1fc6d8fd3b94e6b4babf9d818144e", GitTreeState:"clean"}
    Server: &version.Version{SemVer:"v2.12.1", GitCommit:"02a47c7249b1fc6d8fd3b94e6b4babf9d818144e", GitTreeState:"clean"}

    the install is finished.

    Setup Rancher HA with Helm and Cert Manager

    In this step we will install Rancher with certificates managed by Let’s Encrypt. First we need to install cert-manager with:

    $ helm install stable/cert-manager --name cert-manager --namespace kube-system --version v0.5.2

    We can check the install status with:

    $ kubectl -n kube-system rollout status deploy/cert-manager

    Next, we install the Rancher Server on our RKE cluster with:

    $ helm repo add rancher-latest https://releases.rancher.com/server-charts/latest
    $ helm install rancher-latest/rancher \
      --name rancher \
      --namespace cattle-system \
      --set hostname=rancher.my.org \
      --set ingress.tls.source=letsEncrypt \
      --set letsEncrypt.email=somebody@someorg.org

    Change the hostname to your public DNS record and the letsEncrypt.email to a real email address. We recommend an email address of a team which monitors this environment.

    If you want to use your own certs, replace:

     --set ingress.tls.source=letsEncrypt

    with

     --set ingress.tls.source=secret

    Copy your .crt and .key into the install directory and execute the following:

    $ kubectl -n cattle-system create secret tls tls-rancher-ingress \
      --cert=tls.crt \
      --key=tls.key

    Check the status of this deployment with:

    $ kubectl -n cattle-system rollout status deploy/rancher
    deployment “rancher” successfully rolled out

    Navigate to the address set in “set hostname” to continue from there.

    After you’ve logged in, you need to change the admin password. Choose this password according to your password policies. You can also add members in the Members tab and give them roles (Owner, Member or custom) according to their tasks in this cluster.

    Create a RKE Cluster via Rancher GUI

    One of Rancher’s major advantages is that you can create clusters from the GUI. To do this, switch to the Global tab. Next, click on Add Cluster.

    You can choose multiple options, where you want to create your cluster. Choose Custom to create a custom cluster. Give the new cluster a name and assign members or create new ones. Leave the options in the Cluster Options section as they are and click on Next.

    In the next window, you need to assign node roles. For the master it is recommended to assign etcd and Control Plane; other nodes will have the Worker role assigned. On the bottom of the page you will see a command. Copy this to your clipboard and execute it on the machine(s), which you want to assign your role to (Please note, that Docker must be installed on these machines). Click on Done when you are finished.

    You will see on the Global tab, that your cluster is being provisioned. This process may need some time. While it’s being provisioned, you can click on Kubeconfig File to get the contents for your kubeconfig file on your local machine. Add these contents to your file, so you will be able to control your fresh and new cluster via your local machine.

    That’s it! You now have created a K8s cluster via RKE and deployed Rancher so you can have insights on the performance of your cluster(s).

    Upgrade RKE

    RKE supports version upgrades by changing the image tags of the system images. You can upgrade by editing the cluster.yml and updating the tag, e.g. from v1.9.3 to v1.10.3.
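
    As a sketch, the relevant part of cluster.yml could look like the following after the edit; the exact image tags are only examples, check the RKE release notes for the versions your RKE build supports:

    system_images:
      # was: rancher/hyperkube:v1.9.3-rancher1
      kubernetes: rancher/hyperkube:v1.10.3-rancher1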

    To upgrade your cluster simply run:

    $ rke up --config cluster.yml

    RKE will check whether the desired versions are available and upgrade if everything looks fine. Please be aware that a rollback to previous versions is not supported.

    Upgrade Rancher — Single Node

    If you’re running Rancher on a single node, please follow this upgrade procedure. First, log in to the node which runs your Rancher Server. Then stop the container which is running Rancher with

    $ docker stop $RANCHER_CONTAINER_NAME

    Next a backup of your current Rancher data should be created. You can create a data container with (replace the placeholders):

    $ docker create --volumes-from $RANCHER_CONTAINER_NAME --name rancher-data rancher/rancher:$RANCHER_CONTAINER_TAG

    To prevent data loss during an upgrade, we also create a tarball. This can be done with:

    $ docker run --volumes-from rancher-data -v $PWD:/backup alpine tar zcvf /backup/rancher-data-backup-$RANCHER_VERSION-$DATE.tar.gz /var/lib/rancher

    and is your restore point. Verify that the tarball was created by listing (ls) the contents of your current directory. Move this tarball to a safe external location outside of your Rancher Server.
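
    For example (the file name pattern follows the backup command above):

    $ ls -lh rancher-data-backup-*.tar.gz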

    Continue with:

    $ docker pull rancher/rancher:latest

    to get the latest Rancher version.

    Then start a new container with the contents of the data container with:

    $ docker run -d --volumes-from rancher-data --restart=unless-stopped -p 80:80 -p 443:443 rancher/rancher:latest

    After the creation, log in to Rancher and verify the installation. The easiest check is the version number in the bottom left corner of the browser window. Also, remove the previous container to prevent unwanted restarts and data conflicts.
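
    For example, assuming the old container is still stopped from the first step:

    $ docker rm $RANCHER_CONTAINER_NAME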

    Upgrade Rancher — HA

    Just like in the previous section, we need to create a backup first. See the section “Backup and Recovery” below and come back here.

    Other prerequisites for this step are a working kubectl that can connect to the cluster running Rancher Server, Helm and Tiller. We need to make sure that Tiller is up to date with:

    $ helm init --upgrade --service-account tiller

    As we installed these in the sections before, we can continue with the update of the local helm repo cache:

    $ helm repo update

    Next, we need to check which repo we used to install Rancher. We use

    $ helm repo list

    and verify that the repo we installed from (rancher-latest in this guide, or rancher-stable) is listed. In addition, we need to gather the set values of the current Rancher install. This can be achieved with:

    $ helm get values rancher

    which will print out the values of the install (e.g. hostname: rancher.my.org). We start the upgrade with (make sure to use the repo you installed from and change the hostname to your current host):

    $ helm upgrade rancher rancher-latest/rancher --set hostname=rancher.my.org

    We can verify that the upgrade was successful by logging in to Rancher.

    Backup and Recovery

    a. Backup of Single Node

    The backup process for a single node is explained in the section “Upgrade Rancher — Single Node”. Follow those steps up to the point where you save your backup to an external location outside your Rancher installation.

    b. Backup of HA environment

    There are two options here: the first is recurring snapshots, the second is one-time snapshots.

    To enable recurring snapshots, edit the rancher-cluster.yml and add (and modify if needed) the following code block to the file:

    services:
      etcd:
        snapshot: true # enables recurring etcd snapshots
        creation: 6h0s # time increment between snapshots
        retention: 24h # time increment before snapshot purge

    Save and close the file, and change your working directory to the one containing your RKE binary (please note that the rancher-cluster.yml must reside in the same directory). Run

    $ rke up --config rancher-cluster.yml

    RKE is configured to take recurring snapshots of etcd on all nodes running the etcd role. Snapshots are saved to the following directory: /opt/rke/etcd-snapshots/

    To take a one-time snapshot, change your working directory to the one with your RKE binary and execute

    $ rke etcd snapshot-save --name $SNAPSHOT_NAME --config rancher-cluster.yml

    RKE takes a snapshot of etcd running on each etcd node. The file is saved to /opt/rke/etcd-snapshots. Save these snapshots to a safe location.
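
    For example, a simple way to copy a snapshot off an etcd node to your workstation (node address, snapshot name and target directory are placeholders):

    $ scp ubuntu@$NAMEOFNODE1:/opt/rke/etcd-snapshots/$SNAPSHOT_NAME.db ./etcd-backups/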

    c. Restore of Single Node

    Connect to the instance running Rancher Server and stop the container running Rancher Server with

    $ docker stop $RANCHER_CONTAINER_NAME

    Next, move the tarball back onto this instance and change your working directory to the directory where you placed the tarball. Execute the following to delete your current data and replace it with the data from your backup:

    $ docker run --volumes-from $RANCHER_CONTAINER_NAME -v $PWD:/backup \
    alpine sh -c "rm /var/lib/rancher/* -rf && \
    tar zxvf /$BACKUP-PATH/$RANCHER-BACKUP-NAME.tar.gz"

    Restart your container after restore process with

    $ docker start $RANCHER_CONTAINER_NAME

    and log in to Rancher to check whether the restore succeeded.

    d. Restore of HA environment

    • Pick one of the clean nodes. That node will be the “target node” for the initial restore. Place the snapshot and PKI certificate bundle files in the /opt/rke/etcd-snapshots directory on the “target node”.

    Copy your rancher-cluster.yml and make the following changes in the copy (a sketch of the result follows the list):

    • Remove or comment out the entire addons: section. The Rancher deployment and supporting configuration are already in the etcd database.
    • Change your nodes: section to point to the restore nodes.
    • Comment out the nodes that are not your “target node”. We want the cluster to only start on that one node.
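
    A hedged sketch of what the nodes: section of rancher-cluster-restore.yml could look like while only the target node is active (node names and IPs are placeholders):

    nodes:
      - address: $NAMEOFNODE1 # the “target node” for the restore
        internal_address: $IPOFNODE1
        user: ubuntu
        role: [controlplane,etcd,worker]
        ssh_key_path: ~/.ssh/id_rsa
    #  - address: $NAMEOFNODE2 # commented out until the restore is done
    #    internal_address: $IPOFNODE2
    #    user: ubuntu
    #    role: [worker]
    #    ssh_key_path: ~/.ssh/id_rsa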

    Use RKE with the copied and modified yaml and restore the database with

    $ rke etcd snapshot-restore --name $RANCHER_SNAPSHOT.db --config ./rancher-cluster-restore.yml

    Bring up the cluster on the single target node with

    $ rke up --config ./rancher-cluster-restore.yml

    Remember to configure your kubectl to point at the restored cluster. Give the restore some time to complete and stabilise. You can check the status with

    $ kubectl get nodes

    Clean up old nodes with

    $ kubectl delete node $OLD_NODE_IP1 $OLD_NODE_IP2 (…)

    Restart the target node to make sure that cluster networking and services are working before you continue. Also, wait for the pods running in kube-system, ingress-nginx and the rancher pod in cattle-system to return to the Running state.
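
    For example, you can watch the relevant namespaces with:

    $ kubectl get pods -n kube-system
    $ kubectl get pods -n ingress-nginx
    $ kubectl get pods -n cattle-system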

    Uncomment the remaining nodes in rancher-cluster-restore.yml and use RKE to add them to the restored cluster:

    $ rke up --config ./rancher-cluster-restore.yml

    Troubleshooting

    Extended Cleanup

    If you need to clean up and reset the cluster, the following script does the work in most cases. Be aware that you may have to clean up your iptables rules manually if the script does not work out of the box:

    https://gist.github.com/superseb/06539c6dcd377e118d72bfefdd444f81

    RKE configuration file issues:

    • SSH connectivity issues:

    Symptoms:

    Failed to set up SSH tunneling for host [xxx.xxx.xxx.xxx]: Can’t retrieve Docker Info
    Failed to dial to /var/run/docker.sock: ssh: rejected: administratively prohibited (open failed)

    The user specified to connect with does not have permission to access the Docker socket.

    This can be checked by logging into the host and running the command docker ps:

    $ ssh -i ssh_private_key ubuntu@server
    ubuntu@server$ docker ps
    CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES

    Resolution:

    Make sure to give the ubuntu user the correct rights to run docker commands. Most likely you missed running: $ sudo usermod -aG docker ubuntu

    • SSH key issues

    Symptoms:

    Failed to dial ssh using address [xxx.xxx.xxx.xxx:xx]: Error configuring SSH: ssh: no key found

    Reasons:

    • The key file specified as ssh_key_path cannot be accessed.
    • The key file specified as ssh_key_path is malformed.

    Resolution:

    Check that the file referenced by ssh_key_path exists, is readable by the user running rke and contains a valid private key.

    • Node connectivity issues

    Symptoms:

    Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

    The node is not reachable on the configured address and port.

    Resolution:

    Check if the nodes are reachable.
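
    For example, a quick check from the machine running rke (address and key path are placeholders):

    $ ssh -i ~/.ssh/id_rsa ubuntu@$NAMEOFNODE1 docker version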

    If, in rare cases,

    rke up --config rancher-cluster.yaml

    fails without giving enough information about the error, run the same command in debug mode:

    rke --debug up --config rancher-cluster.yaml

    Related Resources

    Deploy RKE with a single TK8 command on AWS

    Deploy an HA Kubernetes Cluster with RKE and TK8 CLI

    TK8 Cattle AWS Provisioner with Terraform Rancher Provider

    Questions?

    Please feel free to join us on Kubernauts’ Slack.

    We’re hiring!

    We are looking for engineers who love to work in Open Source communities like Kubernetes, Rancher, Docker, etc.

    If you wish to work on such projects please do visit our job offerings page.

    Appendix

    I. Advanced RKE YAML

