Provision and Configure K8S MultiNode Cluster over AWS using Ansible

Yogesh
Mar 11, 2021

This blog’s primary aim is to describe the Ansible roles I created to provision and configure a K8S multi-node cluster over the AWS cloud. I will then use those roles in a playbook to set up the cluster. I assume you already have a sound knowledge of Ansible roles and the AWS cloud.

To summarize how I went about creating my first useful roles:

✔️ Create an Ansible Playbook to launch 1 master node and n worker nodes over AWS using EC2 Instances.

✔️ Create an Ansible Playbook to configure the K8S master and worker nodes on the above-created EC2 Instances.

✔️ Then convert these Playbooks into roles.

✔️ Create a README.md document describing the roles.

✔️ Upload these roles to GitHub, from where they can be imported into Ansible Galaxy.

Then I used these roles to create another Ansible Playbook that configures a K8S MultiNode Cluster over AWS. I’ll walk you through the roles first and then elaborate on the playbook that configures a K8S MultiNode Cluster over AWS.

Roles

yogesh174.ansible_role_provisioning

This role provisions EC2 instances on the AWS cloud meant for a Kubernetes HA cluster.

This role has three task files:

provision_master.yaml

- name: "Install boto"
pip:
name: "boto"
- name: "Provision master instance"
ec2:
key_name: "{{ keypair }}"
instance_type: "{{ instance_type }}"
image: "{{ image }}"
wait: "yes"
group: "{{ sg }}"
count_tag:
cluster: k8s
node: master
exact_count: 1
instance_tags:
Name: "k8s-master"
cluster: "k8s"
node: "master"
vpc_subnet_id: "{{ vpc_subnet_id }}"
region: "{{ region }}"
assign_public_ip: "{{ assign_public_ip }}"
register: ec2_master
notify: "Set worker instances to 0"
- name: "Run pending handlers"
meta: flush_handlers
- name: "Refresh the inventory"
meta: refresh_inventory

This basically provisions an instance for the master node. Looking at the code in more detail:

- name: "Install boto"
pip:
name: "boto"

This installs boto if it is not already installed; the ec2 module relies on it to talk to the AWS API.

- name: "Provision master instance"
ec2:
key_name: "{{ keypair }}"
instance_type: "{{ instance_type }}"
image: "{{ image }}"
wait: "yes"
group: "{{ sg }}"
count_tag:
cluster: k8s
node: master
exact_count: 1
instance_tags:
Name: "k8s-master"
cluster: "k8s"
node: "master"
vpc_subnet_id: "{{ vpc_subnet_id }}"
region: "{{ region }}"
assign_public_ip: "{{ assign_public_ip }}"
register: ec2_master
notify: "Set worker instances to 0"

This is the piece of code that actually contacts the AWS API and ensures that an instance with the above tags exists. It provisions exactly one instance if one is not already present and tags it with the key-value pairs “cluster: k8s” and “node: master”. The variables keypair, instance_type, image, sg, vpc_subnet_id, region and assign_public_ip are taken from the vars file in the role if they are not assigned values elsewhere. The output is registered in the “ec2_master” variable so it can be used later. Lastly, whenever a new master instance is launched, all the worker instances are terminated (I could have instead reset them with kubeadm reset) so that the new worker nodes can join the fresh cluster.
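For reference, a reset-based handler could look roughly like the sketch below. It is not part of the role; the task and the delegation pattern are assumptions for illustration, and it presumes the workers are reachable through the tag_node_worker inventory group.

# Hypothetical alternative: reset the existing workers instead of terminating
# them, so they can re-join the freshly initialised cluster.
- name: "Reset existing worker nodes"
  shell: "kubeadm reset --force"
  become: yes
  delegate_to: "{{ item }}"
  loop: "{{ groups['tag_node_worker'] | default([]) }}"
  ignore_errors: yes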

- name: "Run pending handlers"
meta: flush_handlers

This runs any pending handlers (here, the “Set worker instances to 0” handler notified by the previous task).

- name: "Refresh the inventory"
meta: refresh_inventory

This just refreshes the dynamic inventory so that the newly launched master node shows up in it.
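A side note: the tag_* groups used later in the playbook (tag_cluster_k8s, tag_node_master, tag_node_worker) come from an EC2 dynamic inventory, which this post does not show. A minimal sketch using the amazon.aws.aws_ec2 inventory plugin would produce equivalent groups; the filename, region and filter below are assumptions, so adapt them to your setup.

# aws_ec2.yml - hypothetical dynamic inventory config
plugin: amazon.aws.aws_ec2
regions:
  - us-east-1
filters:
  tag:cluster: k8s
keyed_groups:
  # builds groups such as tag_cluster_k8s, tag_node_master, tag_node_worker
  - key: tags
    prefix: tag

The classic ec2.py inventory script produces the same tag_<key>_<value> group names out of the box.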

provision_worker.yaml

---
# tasks file for provisioning worker instances
- name: "Install boto"
  pip:
    name: "boto"
- name: "Provision worker instances"
  ec2:
    key_name: "{{ keypair }}"
    instance_type: "{{ instance_type }}"
    image: "{{ image }}"
    wait: "yes"
    group: "{{ sg }}"
    count_tag:
      cluster: k8s
      node: worker
    exact_count: "{{ worker_count }}"
    instance_tags:
      Name: "k8s-worker"
      cluster: "k8s"
      node: "worker"
    vpc_subnet_id: "{{ vpc_subnet_id }}"
    region: "{{ region }}"
    assign_public_ip: "{{ assign_public_ip }}"
  register: ec2_worker
- name: "Refresh the inventory"
  meta: refresh_inventory

This essentially provisions the worker nodes. Some tasks are common to this file and the previous one, so I will only go over the new one.

- name: "Provision worker instances"
ec2:
key_name: "{{ keypair }}"
instance_type: "{{ instance_type }}"
image: "{{ image }}"
wait: "yes"
group: "{{ sg }}"
count_tag:
cluster: k8s
node: worker
exact_count: "{{ worker_count }}"
instance_tags:
Name: "k8s-worker"
cluster: "k8s"
node: "worker"
vpc_subnet_id: "{{ vpc_subnet_id }}"
region: "{{ region }}"
assign_public_ip: "{{ assign_public_ip }}"
register: ec2_worker

This code ensures that the required number of worker instances with the above tags exists; if there aren’t enough, it provisions them and tags them with the key-value pairs “Name: k8s-worker”, “cluster: k8s” and “node: worker”. The variables keypair, instance_type, image, sg, vpc_subnet_id, region and assign_public_ip are taken from the vars file in the role if they are not assigned values elsewhere. The output is registered in the “ec2_worker” variable so it can be used later.

set_worker_zero.yaml

- name: "Terminate instances"
ec2:
key_name: "{{ keypair }}"
instance_type: "{{ instance_type }}"
image: "{{ image }}"
group: "{{ sg }}"
count_tag:
cluster: k8s
node: worker
exact_count: 0
instance_tags:
Name: "k8s-worker"
cluster: "k8s"
node: "worker"
vpc_subnet_id: "{{ vpc_subnet_id }}"
region: "{{ region }}"
assign_public_ip: "{{ assign_public_ip }}"

This is a task to terminate all the worker nodes with the tags “cluster: k8s” and “node: worker”.

This role also has a handler named “Set worker instances to 0” that includes the “set_worker_zero.yaml” task file:

- name: "Set worker instances to 0"
include_tasks: tasks/set_worker_zero.yaml

The default values of the variables used in this role are:

worker_count: 2
keypair: "k8s"
instance_type: "t2.micro"
image: "ami-047a51fa27710816e"
assign_public_ip: "yes"
sg: "k8s"

You can find this role on my GitHub Repo and Ansible Galaxy.

yogesh174.ansible_role_k8s

This role configures a High Availability (HA) k8s cluster.

This role has two task files:

all_nodes.yaml

- name: "Install selinux python library"
package:
name: "libselinux-python"
state: "present"
- name: "Disable SELinux"
selinux:
state: "disabled"
- name: "Copy k8s repo"
copy:
src: "kubernetes.repo"
dest: "/etc/yum.repos.d/"
- name: "Install software"
package:
name: "{{ item }}"
state: "present"
loop: "{{ software }}"
- name: "Copy docker daemon file"
copy:
src: "daemon.json"
dest: "/etc/docker/"
- name: "Copy k8s modules conf file"
copy:
src: "modules-load.d_k8s.conf"
dest: "/etc/modules-load.d/k8s.conf"
- name: "Copy k8s sysctl conf file"
copy:
src: "sysctl.d_k8s.conf"
dest: "/etc/sysctl.d/k8s.conf"
- name: "Enable and start services"
service:
name: "{{ item }}"
state: "restarted"
enabled: "yes"
loop: "{{ services }}"
- name: "Reload sysctl confs"
command: "sysctl --system"

Breaking it down: this first disables SELinux, copies over the k8s yum repo, installs the necessary software (kubelet, kubeadm, kubectl, docker and iproute-tc), copies various configuration files, restarts and enables the services and finally reloads the sysctl configuration.
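The files shipped in the role’s files/ directory are not shown in this post. For a kubeadm setup they typically carry the standard prerequisites; the tasks below are a hypothetical equivalent with the usual contents inlined, not necessarily the exact files in the role.

# Hypothetical equivalents of the copied files, with typical contents inlined
- name: "Docker daemon config - use the systemd cgroup driver kubeadm expects"
  copy:
    dest: "/etc/docker/daemon.json"
    content: |
      {
        "exec-opts": ["native.cgroupdriver=systemd"]
      }
- name: "Load the br_netfilter kernel module needed for pod networking"
  copy:
    dest: "/etc/modules-load.d/k8s.conf"
    content: |
      br_netfilter
- name: "Let iptables see bridged traffic"
  copy:
    dest: "/etc/sysctl.d/k8s.conf"
    content: |
      net.bridge.bridge-nf-call-ip6tables = 1
      net.bridge.bridge-nf-call-iptables = 1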

master_node.yaml

- name: "Init kubeadm"
shell: "kubeadm init --pod-network-cidr={{ network_name }} --ignore-preflight-errors=NumCPU --ignore-preflight-errors=Mem"
ignore_errors: yes
register: init_cmd
- name: "Init kubeadm block"
block:
- name: "Configure kubectl - create directory"
file:
path: $HOME/.kube
state: directory
mode: 0755
- name: "Configure kubectl - copy kubectl config file"
shell: "cp -i /etc/kubernetes/admin.conf $HOME/.kube/config"
args:
warn: false
- name: "Configure kubectl - change permissions"
shell: "chown $(id -u):$(id -g) $HOME/.kube/config"
args:
warn: false
- name: "Download and modify kube-flannel"
shell: "curl https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml | sed 's@10.244.0.0/16@{{ network_name }}@' > kube-flannel.yml"
args:
warn: false
- name: "Deploy kube-flannel"
shell: "kubectl apply -f kube-flannel.yml"
args:
warn: false
when: init_cmd.rc == 0- name: "Create token command for worker nodes"
shell: "kubeadm token create --print-join-command"
register: join_cmd
args:
warn: false

This configures the control-plane (master) node and sets it up as a kubectl client. It also deploys Flannel to create the overlay network whenever a new master is provisioned, substituting Flannel’s default pod network CIDR with the one passed in network_name. Finally, it creates a join command and registers it in the “join_cmd” variable so that the worker nodes can join the cluster.
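For context, the join command registered in join_cmd.stdout typically has this shape (the address, token and hash here are placeholders, not real values):

kubeadm join <master-private-ip>:6443 --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash>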

The default values of the variables used in this role are:

network_name: "10.240.0.0/16"
services:
  - "docker"
  - "kubelet"
software:
  - "kubelet"
  - "kubeadm"
  - "kubectl"
  - "docker"
  - "iproute-tc"

You can find this role on my GitHub Repo and Ansible Galaxy.

Playbook to launch the cluster

Now, I’ll walk you through the playbook that actually launches the cluster using the above roles. Don’t forget to configure your control node with the prerequisites for these roles; you can check them out on Ansible Galaxy or in the GitHub repos (links are above).

First, install the above roles before you start writing the playbook:

ansible-galaxy install yogesh174.ansible_role_k8s
ansible-galaxy install yogesh174.ansible_role_provisioning

Then create a playbook that looks something like this:

- hosts: localhost
  gather_facts: false
  vars_prompt:
    - name: my_vpc_subnet_id
      prompt: "Enter the subnet id (Ex: subnet-2b3b2b4c)"
      private: no
    - name: my_region
      prompt: "Enter the region name (Ex: us-east-1)"
      private: no
    - name: my_worker_count
      prompt: "Enter the number of worker nodes to be present"
      default: 2
      private: no
    - name: my_keypair
      prompt: "Enter the keypair name"
      default: "k8s"
      private: no
    - name: my_instance_type
      prompt: "Enter the instance type"
      default: "t2.micro"
      private: no
    - name: my_image
      prompt: "Enter the image id"
      default: "ami-047a51fa27710816e"
      private: no
    - name: my_assign_public_ip
      prompt: "Do you want to assign a public IP"
      default: "yes"
      private: no
    - name: my_sg
      prompt: "Enter the Security Group name"
      default: "k8s"
      private: no

  tasks:
    - name: "Provision master node"
      import_role:
        name: yogesh174.ansible_role_provisioning
        tasks_from: provision_master
      vars:
        keypair: "{{ my_keypair }}"
        instance_type: "{{ my_instance_type }}"
        image: "{{ my_image }}"
        vpc_subnet_id: "{{ my_vpc_subnet_id }}"
        assign_public_ip: "{{ my_assign_public_ip }}"
        sg: "{{ my_sg }}"
        region: "{{ my_region }}"
    - name: "Provision worker nodes"
      import_role:
        name: yogesh174.ansible_role_provisioning
        tasks_from: provision_worker
      vars:
        worker_count: "{{ my_worker_count }}"
        keypair: "{{ my_keypair }}"
        instance_type: "{{ my_instance_type }}"
        image: "{{ my_image }}"
        vpc_subnet_id: "{{ my_vpc_subnet_id }}"
        assign_public_ip: "{{ my_assign_public_ip }}"
        sg: "{{ my_sg }}"
        region: "{{ my_region }}"
    - name: "Combine the master and worker instance lists"
      set_fact:
        ec2_instances: "{{ ec2_master.instances + ec2_master.tagged_instances + ec2_worker.instances + ec2_worker.tagged_instances }}"
    - name: "Wait for ssh to start"
      wait_for:
        host: "{{ item.public_dns_name }}"
        port: 22
        search_regex: OpenSSH
        delay: 20
        state: started
      loop: "{{ ec2_instances }}"
    - name: "Refresh the inventory"
      meta: refresh_inventory

- hosts: tag_cluster_k8s
  become: yes
  gather_facts: no
  tasks:
    - name: "Configure all the nodes"
      include_role:
        name: yogesh174.ansible_role_k8s
        tasks_from: all_nodes

- hosts: tag_node_master
  become: yes
  gather_facts: no
  vars_prompt:
    - name: my_network_name
      prompt: "Enter the network name"
      default: "10.240.0.0/16"
      private: no
  tasks:
    - name: "Import master role"
      include_role:
        name: yogesh174.ansible_role_k8s
        tasks_from: master_node
      vars:
        network_name: "{{ my_network_name }}"

- hosts: tag_node_worker
  become: yes
  gather_facts: no
  tasks:
    - name: "Join with master"
      shell: "{{ hostvars[groups['tag_node_master'][0]]['join_cmd']['stdout'] }}"
      ignore_errors: yes

The prompts for the variables make the playbook more dynamic. We first use the “provision_master” and “provision_worker” tasks to provision the instances for the master and worker nodes. Then we combine the registered ec2_master and ec2_worker results into the ec2_instances variable and wait until SSH is reachable on each instance. Next, we configure all the nodes (master and worker) with the “all_nodes” task and set up the control plane with the “master_node” task. Finally, we run the join command received from the master node on the worker nodes to join them to the cluster.
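To run the playbook, the control node needs AWS credentials that boto can read and the EC2 dynamic inventory discussed earlier. Something along these lines should work; the filenames and the environment-variable approach are assumptions, so adjust them to your setup.

# Credentials for the ec2 module (boto picks these up from the environment)
export AWS_ACCESS_KEY_ID="<your-access-key>"
export AWS_SECRET_ACCESS_KEY="<your-secret-key>"

# Run the playbook against the EC2 dynamic inventory
ansible-playbook -i aws_ec2.yml cluster.yml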

Thank You!
