The Enterprise Grade Rancher Deployment Guide
Introduction
There are many ways to get an Enterprise Grade Kubernetes cluster running with Rancher Server and Rancher Kubernetes Engine (RKE). This guide provides a very detailed, perhaps overly detailed, step-by-step walkthrough which we have been using for various projects in production on bare-metal servers and in the cloud.
In this guide we describe how to use the rke tool to set up an RKE cluster, use helm to deploy Rancher Server on top of it, and then use this Rancher Server to deploy further RKE clusters. We refer to the first RKE cluster running Rancher Server as our Infra cluster, which we use to deploy other RKE worker clusters.
You can also use the Infra cluster to set up EKS, AKS or GCP clusters, or RKE clusters on top of vSphere, OpenStack, etc.
Take it easy: this guide walks you through the hard way to an Enterprise Grade Rancher deployment and shows how upgrades, backups and recovery work, with some hints on troubleshooting.
Please note: there are many other, easier ways to get an automated enterprise grade Rancher cluster deployment, e.g. with the Terraform Provider Rancher2.
We hope you like this guide and use it to set up Enterprise Grade Rancher deployments in production!
Prerequisites
For this guide, you should have a minimal Ubuntu LTS (e.g. 18.04) installed on all machines. We are using three bare-metal servers. You can perform the same deployment on RancherOS, CentOS or any other Red Hat-like OS in a similar way, in any environment, cloud or non-cloud.
Before we start, you need kubectl on your local machine:
$ curl -LO https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl
Make the kubectl binary executable and move it into your PATH:
$ chmod +x ./kubectl
$ sudo mv ./kubectl /usr/local/bin/kubectl
Verify that the installation works with:
$ kubectl version
Setup Your Machines
After the installation of the OS and the openssh-server, you need to secure the machines against unauthorised access. Therefore we deactivate password authentication so that you can only log in with your SSH key.
For this we create a new directory with
$ mkdir -p ~/.ssh/
and create the authorized_keys file with
$ vi ~/.ssh/authorized_keys
Paste into this file the public keys of all users who need to access the machine via SSH.
Next, we should make some changes to the SSH server config; you can still add custom changes for your needs. In your sshd_config file you may want to make the following changes (a sketch follows below):
– Enable public key authentication
– Enable the use of "authorized_keys"
– Disable password authentication
– Disable challenge-response authentication
– Disable the "Message of the day"
– Allow passing local environment variables
– Override the subsystem for sftp
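For reference, a minimal sketch of the relevant sshd_config entries (the values and the sftp server path are Ubuntu defaults and only an assumption for your setup):
# /etc/ssh/sshd_config (excerpt)
PubkeyAuthentication yes
AuthorizedKeysFile .ssh/authorized_keys
PasswordAuthentication no
ChallengeResponseAuthentication no
PrintMotd no
AcceptEnv LANG LC_*
Subsystem sftp /usr/lib/openssh/sftp-server
Restart the SSH daemon afterwards (e.g. with sudo systemctl restart sshd) and keep a second session open until you have confirmed that key-based login still works.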
After that, we need to add the user (which was created during the installation of the OS) to the group of “sudoers”. For that, open
$ sudo vi /etc/sudoers
and add your user under "Allow members of group sudo to execute any command" and under "includedir /etc/sudoers.d". The file should then contain the following passage:
(...)
# Allow members of group sudo to execute any command
%sudo ALL=(ALL:ALL) ALL
$YOURUSER ALL=(ALL) NOPASSWD: ALL
# See sudoers(5) for more information on "#include" directives:
#includedir /etc/sudoers.d
$YOURUSER ALL=(ALL) NOPASSWD: ALL
(...)
Since this is a minimal server installation, we need to install curl for the next step, then download and run the Docker convenience install script:
$ sudo apt install curl
$ curl -fsSL https://get.docker.com -o get-docker.sh
$ sh get-docker.sh
After the script has finished and Docker is installed, we need to add the ubuntu user to the docker group:
$ sudo usermod -aG docker ubuntu
Run docker info to check that the installation completed successfully. You are then ready for the next step.
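A quick sanity check (pulling hello-world assumes the node has internet access):
$ docker --version
$ docker info
$ docker run --rm hello-world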
Setup RKE
In this step we will set up the Rancher Kubernetes Engine (RKE) cluster that will later host Rancher Server. To start, we need to obtain the RKE binary. Check the RKE releases page for the latest RKE version and download it via wget on one of the machines.
Download the latest RKE Binary:
$ wget https://github.com/rancher/rke/releases/download/v0.1.18/rke_linux-amd64
Move the binary to a directory in the user's $PATH and rename it to rke:
$ mv rke_linux-amd64 rke
To finish the installation, make the binary executable and check the version:
$ chmod +x rke
$ rke --version
Next, the cluster needs to be created. We create a YAML file which contains all the information RKE needs to spin up the environment.
So we create rancher-cluster.yaml with:
$ vi rancher-cluster.yaml
and fill it with the following content (please refer to the advanced_rke_cluster_redacted.yaml in the appendix for more options):
---
ignore_docker_version: true
network:
  plugin: flannel
system_images:
  kubernetes: rancher/hyperkube:v1.13.5-rancher1
nodes:
- address: $NAMEOFNODE1
  internal_address: $IPOFNODE1
  user: ubuntu
  role: [controlplane,etcd]
  ssh_key_path: ~/.ssh/id_rsa
- address: $NAMEOFNODE2
  internal_address: $IPOFNODE2
  user: ubuntu
  role: [worker]
  ssh_key_path: ~/.ssh/id_rsa
- address: $NAMEOFNODE3
  internal_address: $IPOFNODE3
  user: ubuntu
  role: [worker]
  ssh_key_path: ~/.ssh/id_rsa
With this YAML definition we will create a three-node cluster: one node as controlplane and etcd, the other two as workers. The user "ubuntu" must exist on all nodes, and RKE connects to them via SSH using the private key at the stated path. It is possible to add more nodes to this YAML (you can also add nodes after the installation).
Note 1: If you'd like to have controlplane, etcd and worker on all nodes, you have to set the role on each node to:
[controlplane,etcd,worker]
Note 2: If you need Canal as the CNI plugin, please refer to the extended rancher-cluster.yaml in the appendix and set canal_iface accordingly; a sketch of the network block follows below.
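As an illustration, the network section would then look roughly like this (the interface name eth1 is only an assumption for hosts with a dedicated internal NIC):
network:
  plugin: canal
  options:
    canal_iface: eth1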
To create this cluster, we run:
$ rke up --config rancher-cluster.yaml
After the installation you can list the created nodes. RKE writes a kubeconfig file next to the cluster configuration (later, once Rancher Server is running, you can also download a kubeconfig from the Rancher GUI):
$ kubectl get nodes
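For example (assuming RKE wrote the kubeconfig as kube_config_rancher-cluster.yaml next to your cluster file; adjust the name if yours differs):
$ export KUBECONFIG=$(pwd)/kube_config_rancher-cluster.yaml
$ kubectl get nodes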
Setup HELM
Helm is the package management tool of choice for Kubernetes. Helm “charts” provide templating syntax for Kubernetes YAML manifest documents. With Helm we can create configurable deployments instead of just using static files. For more information about creating your own catalog of deployments, check out the docs at https://helm.sh/. To be able to use Helm, the server-side component tiller needs to be installed on your cluster.
To install this, we first need to create a service account for tiller with:
$ kubectl -n kube-system create serviceaccount tiller
and assign rights and permissions with
$ kubectl create clusterrolebinding tiller --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
Then download the Helm client install script, run it and initialize Tiller with this service account:
$ curl -LO https://git.io/get_helm.sh
$ chmod 700 get_helm.sh
$ ./get_helm.sh
$ helm init --service-account tiller
To verify the installation we check the rollout status with:
$ kubectl -n kube-system rollout status deploy/tiller-deploy
If the console prints deployment "tiller-deploy" successfully rolled out, the process was successful. Finally, we verify that helm can talk to the tiller service with:
$ helm version
Client: &version.Version{SemVer:"v2.12.1", GitCommit:"02a47c7249b1fc6d8fd3b94e6b4babf9d818144e", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.12.1", GitCommit:"02a47c7249b1fc6d8fd3b94e6b4babf9d818144e", GitTreeState:"clean"}
Setup Rancher HA with Helm and Cert Manager
In this step we will install Rancher with certificates managed by Let's Encrypt. First we need to install cert-manager with:
$ helm install stable/cert-manager --name cert-manager --namespace kube-system --version v0.5.2
We can check the install status with:
$ kubectl -n kube-system rollout status deploy/cert-manager
Next, we install the Rancher Server on our RKE cluster with:
$ helm repo add rancher-latest https://releases.rancher.com/server-charts/latest
$ helm install rancher-latest/rancher \
  --name rancher \
  --namespace cattle-system \
  --set hostname=rancher.my.org \
  --set ingress.tls.source=letsEncrypt \
  --set letsEncrypt.email=somebody@someorg.org
Change the hostname to your public DNS record and the letsEncrypt.email to a real email address. We recommend an email address of a team which monitors this environment.
If you want to use your own certs, replace:
--set ingress.tls.source=letsEncrypt
with
--set ingress.tls.source=secret
Copy your .crt and .key into the install directory and execute the following:
$ kubectl -n cattle-system create secret tls tls-rancher-ingress \
  --cert=tls.crt \
  --key=tls.key
Check the status of this deployment with:
$ kubectl -n cattle-system rollout status deploy/rancher
deployment "rancher" successfully rolled out
Navigate to the address set in “set hostname” to continue from there.
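If the page does not come up, first check that the DNS record resolves to your cluster nodes (rancher.my.org is the example hostname used above):
$ dig +short rancher.my.org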
After you've logged in, you need to change the admin password. Choose this password according to your password policies. You can also add members in the Members tab and give them roles (Owner, Member or custom) according to their tasks in this cluster.
Create a RKE Cluster via Rancher GUI
One of Rancher's major advantages is that you can create clusters from the GUI. To do this, switch to the Global tab. Next, click on Add Cluster.
You can choose between multiple options for where you want to create your cluster. Choose Custom to create a custom cluster. Give the new cluster a name and assign members or create new ones. Leave the options in the Cluster Options section as they are and click on Next.
In the next window, you need to assign node roles. For the master it is recommended to assign etcd and Control Plane; other nodes get the Worker role. At the bottom of the page you will see a command. Copy it to your clipboard and execute it on the machine(s) you want to assign the role to (please note that Docker must be installed on these machines). Click on Done when you are finished.
On the Global tab you will see that your cluster is being provisioned. This process may take some time. While it is being provisioned, you can click on Kubeconfig File to get the contents for the kubeconfig file on your local machine. Add these contents to your local kubeconfig, so you will be able to control your new cluster from your local machine.
That's it! You have now created a K8s cluster via RKE and deployed Rancher, so you can get insights into the performance of your cluster(s).
Upgrade RKE
RKE supports version upgrades by changing the image tags of the system images. You can upgrade by editing the cluster.yml and updating the tag, e.g. from v1.9.3 to v1.10.3.
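For example, the relevant part of cluster.yml might change roughly like this (the exact hyperkube tags are only illustrative; check which tags your RKE release supports):
system_images:
  # previously: kubernetes: rancher/hyperkube:v1.9.3-rancher1
  kubernetes: rancher/hyperkube:v1.10.3-rancher1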
To upgrade your cluster simply run:
$ rke up --config cluster.yml
RKE will check if the versions you desired are available and upgrade if everything looks fine. Please be aware that a rollback to previous versions is not supported.
Upgrade Rancher — Single Node
If you're running Rancher on a single node, please follow this upgrade procedure. First, log in to the node which runs your Rancher Server. Then stop the container which is running Rancher with
$ docker stop $RANCHER_CONTAINER_NAME
Next, a backup of your current Rancher data should be created. You can create a data container with (replace the placeholders):
$ docker create --volumes-from $RANCHER_CONTAINER_NAME --name rancher-data rancher/rancher:$RANCHER_CONTAINER_TAG
To prevent data loss during an upgrade, we also create a tarball. This can be done with:
$ docker run --volumes-from rancher-data -v $PWD:/backup alpine tar zcvf /backup/rancher-data-backup-$RANCHER_VERSION-$DATE.tar.gz /var/lib/rancher
and this is your restore point. Verify that the tarball was created by listing (ls) the contents of your current directory. Move this tarball to a safe external location outside of your Rancher server.
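One simple way to move the tarball off the host is plain scp (user, host and target path are placeholders):
$ scp rancher-data-backup-$RANCHER_VERSION-$DATE.tar.gz backup-user@backup-host:/backups/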
Next, pull the latest Rancher image:
$ docker pull rancher/rancher:latest
Then start a new container with the contents of the data container with:
$ docker run -d --volumes-from rancher-data --restart=unless-stopped -p 80:80 -p 443:443 rancher/rancher:latest
After the creation, log in to Rancher and verify the installation. The easiest check is the version number in the bottom left corner of the browser window. Also, remove the previous container to prevent unwanted restarts and data inconsistencies.
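A minimal cleanup sketch, assuming the old, stopped container still exists under its original name:
$ docker rm $RANCHER_CONTAINER_NAME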
Upgrade Rancher — HA
Just like in the previous section, we need to create a backup first. See the "Backup and Recovery" section below and come back here.
Other prerequisites for this step are a working kubectl that can connect to the cluster running Rancher Server, plus Helm and Tiller. We need to make sure that Tiller is up to date with:
$ helm init --upgrade --service-account tiller
As we installed these in the previous sections, we can continue by updating the local helm repo cache:
$ helm repo update
Next, we need to find the repo which we used to install Rancher. We run
$ helm repo list
and check that the repo we used at install time (rancher-latest in this guide) is available. In addition, we need to gather the values set for the current Rancher install. This can be achieved with:
$ helm get values rancher
This will print out the values of the install, including the hostname (e.g. hostname: rancher.my.org). We start the upgrade with (make sure to change the hostname and the repo name to match your install):
$ helm upgrade rancher rancher-latest/rancher --set hostname=rancher.my.org
We can verify that the upgrade was successful by logging in to Rancher.
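You can also compare the deployed chart version from the command line, e.g.:
$ helm ls rancher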
Backup and Recovery
The backup process for a single node is explained in the single-node upgrade section above. Follow those steps up to the point where you save your backup to an external location outside your Rancher installation.
There are two options here: the first is recurring snapshots, the second is one-time snapshots.
To enable recurring snapshots, edit the rancher-cluster.yml and add (and modify if needed) the following code block to the file:
services:
  etcd:
    snapshot: true   # enables recurring etcd snapshots
    creation: 6h0s   # time increment between snapshots
    retention: 24h   # time increment before snapshot purge
Save and close the file, then change your working directory to the one containing your RKE binary (please note that the rancher-cluster.yml must reside in the same directory). Run
$ rke up --config rancher-cluster.yml
RKE is now configured to take recurring snapshots of etcd on all nodes running the etcd role. Snapshots are saved to the following directory: /opt/rke/etcd-snapshots/
To take a one-time snapshot, change your working directory to the one with your RKE binary and execute
$ rke etcd snapshot-save --name $SNAPSHOT_NAME --config rancher-cluster.yml
RKE takes a snapshot of etcd running on each etcd node. The file is saved to /opt/rke/etcd-snapshots. Save these snapshots to a safe location.
To restore a single-node installation, connect to the instance running Rancher Server and stop the container running Rancher Server with
$ docker stop $RANCHER_CONTAINER_NAME
Next, move the tarball back onto this instance and change your working directory to the directory containing the tarball. Execute the following to delete your current data and replace it with the data from your backup:
$ docker run --volumes-from $RANCHER_CONTAINER_NAME -v $PWD:/backup \
  alpine sh -c "rm /var/lib/rancher/* -rf && \
  tar zxvf /$BACKUP_PATH/$RANCHER_BACKUP_NAME.tar.gz"
Restart your container after the restore process with
$ docker start $RANCHER_CONTAINER_NAME
and log in to Rancher to check that the restore succeeded.
To restore an RKE-based HA installation from an etcd snapshot, proceed as follows:
- Pick one of the clean nodes. That node will be the "target node" for the initial restore. Place the snapshot and PKI certificate bundle files in the /opt/rke/etcd-snapshots directory on the "target node".
Copy your rancher-cluster.yml (e.g. to rancher-cluster-restore.yml) and make the following changes in the copy (see the sketch after this list):
- Remove or comment out the entire addons: section. The Rancher deployment and its supporting configuration are already in the etcd database.
- Change your nodes: section to point to the restore nodes.
- Comment out the nodes that are not your “target node”. We want the cluster to only start on that one node.
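A rough sketch of how the modified rancher-cluster-restore.yml could look (node names, IPs and the key path are the placeholders from the original file; giving the single target node all roles during the restore phase is an assumption):
nodes:
- address: $NAMEOFNODE1            # the "target node"
  internal_address: $IPOFNODE1
  user: ubuntu
  role: [controlplane,etcd,worker]
  ssh_key_path: ~/.ssh/id_rsa
# - address: $NAMEOFNODE2          # commented out until the restore is done
#   internal_address: $IPOFNODE2
#   user: ubuntu
#   role: [worker]
#   ssh_key_path: ~/.ssh/id_rsa
# - address: $NAMEOFNODE3
#   internal_address: $IPOFNODE3
#   user: ubuntu
#   role: [worker]
#   ssh_key_path: ~/.ssh/id_rsa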
Use RKE with the copied and modified yaml and restore the database with
$ rke etcd snapshot-restore --name $RANCHER_SNAPSHOT.db --config ./rancher-cluster-restore.yml
Bring up the cluster on the single target node with
$ rke up --config ./rancher-cluster-restore.yml
Remember to point your kubectl at the restored cluster. Give the restore some time to complete and stabilize. You can check the status with
$ kubectl get nodes
Clean up old nodes with
$ kubectl delete node $OLD_NODE_IP1 $OLD_NODE_IP2 (…)
Restart the target node to make sure that cluster networking and services are working before you continue. Also, wait for the pods running in kube-system, ingress-nginx and the rancher pod in cattle-system to return to the Running state.
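For example, you can watch the relevant namespaces until everything is back in the Running state:
$ kubectl -n kube-system get pods
$ kubectl -n ingress-nginx get pods
$ kubectl -n cattle-system get pods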
Uncomment the remaining nodes in rancher-cluster-restore.yml and use RKE to add them to the new cluster:
$ rke up --config ./rancher-cluster-restore.yml
Troubleshooting
In some cases you may need to clean up and reset a cluster. The following script does the job in most cases; be aware that you may have to clean up your iptables rules manually if the script doesn't work out of the box:
https://gist.github.com/superseb/06539c6dcd377e118d72bfefdd444f81
RKE configuration file issues:
Symptoms:
Failed to set up SSH tunneling for host [xxx.xxx.xxx.xxx]: Can't retrieve Docker Info
Failed to dial to /var/run/docker.sock: ssh: rejected: administratively prohibited (open failed)
User specified to connect with does not have permission to access the Docker socket.
This can be checked by logging into the host and running the command docker ps:
$ ssh -i ssh_private_key ubuntu@server
ubuntu@server$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
Resolution:
Make sure to give the ubuntu user the correct rights to run docker commands; most likely you missed running: $ sudo usermod -aG docker ubuntu
Symptoms:
Failed to dial ssh using address [xxx.xxx.xxx.xxx:xx]: Error configuring SSH: ssh: no key found
Reasons:
- The key file specified as ssh_key_path cannot be accessed.
- The key file specified as ssh_key_path is malformed.
Resolution:
- Make sure that you specified the private key file (not the public key, .pub), and that the user that is running the rke command can access the private key file.
- Check if the key is valid by running ssh-keygen -y -e -f private_key_file. This will print the public key of the private key, which will fail if the private key file is not valid.
Cannot connect to the Docker daemon:
Symptoms:
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
The node is not reachable on the configured address and port.
Resolution:
Check if the nodes are reachable. If
$ rke up --config rancher-cluster.yaml
fails without giving enough information about the error, run the same command in debug mode:
$ rke --debug up --config rancher-cluster.yaml
Related Resources
Deploy RKE with a single TK8 command on AWS
Deploy an HA Kubernetes Cluster with RKE and TK8 CLI
TK8 Cattle AWS Provisioner with Terraform Rancher Provider
Questions?
Please feel free to join us on the Kubernauts' Slack.
We’re hiring!
We are looking for engineers who love to work in Open Source communities like Kubernetes, Rancher, Docker, etc.
If you wish to work on such projects please do visit our job offerings page.
Appendix
I. Advanced RKE YAML