This report outlines the process of monitoring a Kubernetes cluster using Prometheus and Grafana. The deployment is performed using Terraform and Kubernetes, with the infrastructure provisioned on AWS. Key steps include setting up Terraform and AWS CLI, provisioning Kubernetes master and worker nodes, installing necessary dependencies, and deploying a React application. The monitoring stack consists of Prometheus for metrics collection and Grafana for data visualization. Helm is utilized for package management, streamlining the installation of both Prometheus and Grafana.
- Use scp to transfer Terraform and Docker files to the Ubuntu instance.
- Set appropriate permissions using chown and chmod.
- Install Terraform and verify its installation.
- Install the AWS CLI, create an AWS account, and configure the CLI with access keys.
- Allocate and associate an Elastic IP to ensure a consistent public IP address.
- Initialize, validate, and apply the Terraform configuration.
- The output displays the IP addresses of the Kubernetes master and worker nodes.
- SSH into the Kubernetes master node using the public IP address.
- Update the system, disable swap, and install containerd and the Kubernetes components (kubeadm, kubelet, kubectl).
- Create a Kubernetes configuration file and initialize the cluster.
- Install the Calico network plugin and verify node status.
- Install the necessary dependencies and join the worker node to the master node using the provided join command.
- Create a YAML file for the React app deployment and expose it via NodePort.
- Access the app through the NodePort on both master and worker nodes.
- Install Helm and create a role for the default service account.
- Add the Prometheus Helm repository and install Prometheus using a custom YAML configuration.
- Expose Prometheus through NodePort for external access.
- Add the Grafana Helm repository and install Grafana with custom configurations.
- Expose Grafana using NodePort and configure the Prometheus data source.
- Import pre-built Grafana dashboards for visualization.
- Destroy the provisioned infrastructure using the terraform destroy command to prevent unnecessary costs.
Use scp to transfer your Terraform and Docker files from your local machine to your Ubuntu instance.
Note: Run the following command in Command Prompt (CMD) to copy the Terraform and Kubernetes code to your Linux machine:
scp -r -v "C:\Users\Gurpreet\OneDrive\Desktop\York Univ\Assignments\Assignment-7-Kubernetes\Terraform-Kubernetes" [email protected]:/home/administrator/
After entering the password, you will be logged into your Ubuntu Linux machine and will see the files in your home directory, as shown below.
Note: To avoid permission issues, please run the following commands to ensure the appropriate permissions are set:
sudo chown -R administrator:administrator /home/administrator/Terraform-Kubernetes
sudo chmod -R u+rwx /home/administrator/Terraform-Kubernetes
These commands will assign ownership to the administrator user and grant the necessary read, write, and execute permissions for the Terraform-Kubernetes directory.
sudo apt update && sudo apt install -y gnupg software-properties-common curl
curl -fsSL https://apt.releases.hashicorp.com/gpg | sudo gpg --dearmor -o /usr/share/keyrings/hashicorp-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com $(lsb_release -cs) main" | \
sudo tee /etc/apt/sources.list.d/hashicorp.list
sudo apt update
sudo apt install terraform -y
terraform -v
To install the AWS CLI, run the following commands:
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install
Run the following command to check if AWS CLI is installed correctly:
aws --version
You will see the following output:
After creating your AWS account:
- Click on the account name and select Security Credentials.
- Click Create access key.
Note: Download the key file or copy the Access Key ID and Secret Access Key (the Secret Key is shown only once!).
After installing the CLI and creating the AWS account, configure the AWS CLI with the new access key:
aws configure
It will prompt you for:
1. AWS Access Key ID: Your access key from AWS IAM.
2. AWS Secret Access Key: Your secret key from AWS IAM.
3. Default region name: (e.g., us-east-1, us-west-2).
4. Default output format: (json, table, or text; the default is json).
Enter the access key and secret key obtained from your AWS account.
Verify that the credentials were added to aws configure correctly:
aws sts get-caller-identity
If your AWS CLI is properly configured, you'll see a response like this:
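The response has the following shape (the values here are placeholders, not real identifiers):
{
    "UserId": "AIDAEXAMPLE12345",
    "Account": "123456789012",
    "Arn": "arn:aws:iam::123456789012:user/your-user-name"
}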
To maintain a consistent public IP address for an EC2 instance after stopping and restarting, an Elastic IP must be associated with the instance. This ensures that the public IP remains unchanged, preventing disruptions in connectivity or configuration dependencies that rely on a stable IP address.
- Open the AWS Management Console.
- In the Services menu, select EC2.
- In the left navigation pane, click Elastic IPs under Network & Security.
- Click Allocate Elastic IP address.
- Choose the scope (VPC) and click Allocate.
- Note down the newly allocated Elastic IP address.
- Select the allocated Elastic IP.
- Click Actions → Associate Elastic IP address.
- In the Resource type dropdown, select Instance.
- Select the desired EC2 instance from the list.
- Choose the Private IP address to which the Elastic IP will be associated (if the instance has multiple private IPs).
- Click Associate.
- Go to Instances in the EC2 dashboard.
- Select the instance and confirm that the Public IPv4 address matches the allocated Elastic IP.
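If you prefer the command line, the same allocation and association can be done with the AWS CLI; the instance and allocation IDs below are placeholders:
aws ec2 allocate-address --domain vpc
# Note the AllocationId (eipalloc-...) from the output, then associate it:
aws ec2 associate-address --instance-id i-0123456789abcdef0 --allocation-id eipalloc-0123456789abcdef0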
- terraform init: prepares your environment and configures everything Terraform needs to interact with your infrastructure.
- terraform fmt: automatically formats your Terraform configuration files to a standard style, ensuring your code is consistently formatted and easier to read and maintain.
- terraform validate: checks the syntax and validity of your Terraform configuration files, catching errors before you run other commands such as terraform plan or terraform apply.
- terraform plan: previews the changes Terraform will make to your infrastructure based on the current configuration and the existing state. It shows which actions will be taken (creating, modifying, or deleting resources) when you apply the configuration; run it before terraform apply to check exactly what changes Terraform will make. A combined example follows this list.
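Put together, a typical run from the project directory looks like this (note that the SSH key update described next must happen before terraform plan):
cd /home/administrator/Terraform-Kubernetes
terraform init       # download providers and set up the working directory
terraform fmt        # normalize formatting
terraform validate   # catch syntax and configuration errors
terraform plan       # preview the changes before applying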
Before running terraform plan, you must update the locations of the public and private SSH keys under modules → compute → variables.tf, as shown in the following image:
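For orientation, the variables being edited typically look something like this sketch; the variable names and default paths are illustrative and may differ in your repository:
variable "public_key_location" {
  description = "Path to the public SSH key (illustrative name)"
  default     = "/root/.ssh/docker.pub"
}

variable "private_key_location" {
  description = "Path to the private SSH key (illustrative name)"
  default     = "/root/.ssh/docker"
}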
After applying the Terraform plan, you will see the following output:
Provision the Terraform-managed infrastructure. You must confirm by typing yes if you would like to continue and perform the actions described to provision your infrastructure resources.
After successfully applying the Terraform configuration, you will see the public IP addresses assigned to your Kubernetes master and node instances as output.
k8s-master-Public-IP: The public IP address assigned to the Kubernetes master node.
k8s-node-Public-IP: A list of public IP addresses assigned to the Kubernetes worker nodes.
You can log in to your AWS account to view the infrastructure resources that have been provisioned.
Using the public IP address provided in the Terraform output, connect to the EC2 instance by executing the following command in your terminal:
ssh -i /root/.ssh/docker [email protected]
Run the following commands to update the system and install essential packages:
sudo yum update -y
sudo yum install -y curl wget git
Kubernetes disables swap to prevent unpredictable latency and ensure consistent memory management across nodes. Swapping can bypass Kubernetes' memory limits, leading to instability and performance degradation.
Kubernetes requires swap to be disabled. Execute:
sudo swapoff -a
sudo sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab
Verify that swap is disabled:
free -h
swapon --show
Note: If swap is disabled, this command will produce no output.
The following commands load the kernel modules necessary for container networking and filesystem overlay in a containerized environment such as containerd or Kubernetes. Run the following commands:
sudo modprobe overlay
# Enables the overlay filesystem, which allows container runtimes to layer filesystems efficiently.
sudo modprobe br_netfilter
# Enables bridging between containers for networking, essential for Kubernetes networking components like kube-proxy.
cat <<EOF | sudo tee /etc/sysctl.d/99-kubernetes-cri.conf
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF
sudo sysctl --system
Verify that the modules are loaded by running the following commands:
lsmod | grep overlay
lsmod | grep br_netfilter
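Optionally, confirm that the sysctl settings took effect; each of these should report 1:
sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables net.ipv4.ip_forward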
cd ~
curl -LO https://download.opensuse.org/repositories/isv:/kubernetes:/core:/stable:/v1.30/rpm/x86_64/cri-tools-1.30.0-150500.1.1.x86_64.rpm
sudo yum localinstall -y cri-tools-1.30.0-150500.1.1.x86_64.rpm
sudo sysctl --system
crictl --version
sudo yum update -y
sudo yum install -y containerd
sudo mkdir -p /etc/containerd
containerd config default | sudo tee /etc/containerd/config.toml
sudo systemctl enable containerd --now
sudo systemctl restart containerd
sudo systemctl status containerd
containerd --version
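On many distributions it is also recommended to switch containerd to the systemd cgroup driver so it matches the kubelet. This is a common extra step, not part of the original walkthrough; check that the SystemdCgroup key exists in your generated config.toml before relying on it:
sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
sudo systemctl restart containerd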
cat <<EOF | sudo tee /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://pkgs.k8s.io/core:/stable:/v1.32/rpm/
enabled=1
gpgcheck=1
gpgkey=https://pkgs.k8s.io/core:/stable:/v1.32/rpm/repodata/repomd.xml.key
exclude=kubelet kubeadm kubectl cri-tools kubernetes-cni
EOF
sudo setenforce 0
sudo sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config
sudo yum install -y kubelet kubeadm kubectl --disableexcludes=kubernetes
sudo systemctl enable --now kubelet
vi kube-config.yml
# This is a configuration file for kubeadm to set up a Kubernetes cluster.
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: 1.32.0
networking:
  podSubnet: 192.168.0.0/16
apiServer:
  extraArgs:
    service-node-port-range: 1024-1233
sudo kubeadm init --config kube-config.yml --ignore-preflight-errors=all
Note: The purpose of the --ignore-preflight-errors=all flag is to ignore the Kubernetes hardware requirement checks.
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
kubectl get nodes
kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
Wait a few minutes, then verify the node status by running the following command:
kubectl get nodes
- Use the same steps as the Master node, using the worker node's public IP.
- Follow the same installation steps as for the Master Node to install containerd, kubeadm, kubelet, and kubectl.
- On the master node, generate the join command by running the following command
kubeadm token create --print-join-command
You will see output like the following:
sudo kubeadm join 10.0.1.45:6443 --token kl3rnb.gj2syfp4bjnri1xu --discovery-token-ca-cert-hash sha256:0805a221754221c412c58ee47f3a38e7f2ccd9baaa3b57a3ede5fc8975de3189 --ignore-preflight-errors=all
Copy the above command and paste it into your worker node.
In the Master (Control Plane) node, check the cluster status (it could take a few moments until the nodes become ready):
kubectl get nodes
vi react-app-pod.yml
Insert the following YAML code:
apiVersion: v1
kind: Service
metadata:
  name: react-app
spec:
  type: NodePort
  ports:
    - port: 80
      targetPort: 80
      nodePort: 1233
  selector:
    app: react-app
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: react-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: react-app
  template:
    metadata:
      labels:
        app: react-app
    spec:
      containers:
        - name: react-app
          image: wessamabdelwahab/react-app:latest # Docker image
          ports:
            - containerPort: 80
kubectl create -f react-app-pod.yml
kubectl get pods
kubectl get services
kubectl get pods -o wide
curl <react-app IP address>
Example:
curl 192.168.206.129
kubectl get deployment
Go to the public IP of your Master server or worker node on port 1233, i.e. <Public IP>:1233. The sample React application should be running.
http://54.227.118.240:1233/ # Public IP of Master Node
http://52.204.114.58:1233/ # Public IP of Node 1 (Worker Node)
http://54.165.83.46:1233/ # Public IP of Node 0 (Worker Node)
- Helm is a package manager for Kubernetes that simplifies the deployment, management, and scaling of applications in a Kubernetes cluster.
- It uses pre-configured application templates called Charts, which define the structure and configuration of Kubernetes resources.
curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
chmod 700 get_helm.sh
./get_helm.sh
helm version
What It Does:
- Creates a Cluster Role Binding named add-on-cluster-admin.
- Binds the cluster-admin role to the default service account in the kube-system namespace.
Why This Step?
- Helm uses Kubernetes service accounts for access control.
- This command grants the default service account in kube-system full administrative access to the cluster.
Potential Risks:
- Granting cluster-admin access is very permissive and is generally not recommended for production.
- Consider creating a more restrictive role with only the necessary permissions; a sketch of such a role follows the command below.
Run the following command:
kubectl --namespace=kube-system create clusterrolebinding add-on-cluster-admin \
--clusterrole=cluster-admin \
--serviceaccount=kube-system:default
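As flagged under Potential Risks, a more restrictive setup would use a dedicated service account bound to a namespaced Role instead of cluster-admin. The following is a minimal sketch; the helm-deployer name and the resource list are illustrative, not part of the original lab:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: helm-deployer   # hypothetical service account
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: helm-deployer-role
  namespace: default
rules:
  - apiGroups: ["", "apps"]
    resources: ["pods", "services", "deployments", "configmaps", "secrets"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: helm-deployer-binding
  namespace: default
subjects:
  - kind: ServiceAccount
    name: helm-deployer
    namespace: default
roleRef:
  kind: Role
  name: helm-deployer-role
  apiGroup: rbac.authorization.k8s.io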
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
vi prometheus.yml
Insert the following YAML code:
server:
  persistentVolume:
    enabled: false
alertmanager:
  persistentVolume:
    enabled: false
Why Disable Persistent Volume?
- In this example, you disable persistence, so Prometheus does not store data permanently.
- Warning: In a real production environment, disabling persistence means you lose metrics if pods restart or are terminated.
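For contrast, a production-style prometheus.yml would keep persistence enabled. The sketch below mirrors the structure used above; the sizes are illustrative and the exact keys depend on your chart version, so it is not used in this lab:
server:
  persistentVolume:
    enabled: true
    size: 8Gi
alertmanager:
  persistentVolume:
    enabled: true
    size: 2Gi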
helm install -f prometheus.yml prometheus prometheus-community/prometheus
What It Does:
- Installs Prometheus using the Helm chart from the community repo.
- Applies your config from prometheus.yml (with persistence disabled here).
kubectl get nodes
kubectl expose service prometheus-server --type=NodePort --target-port=9090 --name=prometheus-server-np
kubectl get svc
What It Does:
- Creates a Kubernetes service of type NodePort to expose Prometheus outside the cluster.
- The Prometheus UI runs on port 9090 inside the cluster.
- kubectl get svc shows the assigned NodePort (by default a random port in the range 30000-32767; this cluster's API server was configured earlier with the range 1024-1233).
Why NodePort?
- To access Prometheus UI from your local machine or browser via the node's IP address and assigned port.
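As an alternative for quick, local-only access, kubectl port-forward tunnels the UI to your machine without exposing a NodePort; this assumes the prometheus-server service listens on port 80, the chart default:
kubectl port-forward svc/prometheus-server 9090:80
# Then browse to http://localhost:9090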
Open a browser and enter the public IP address of the master node or a worker node.
Format:
http://<Public IP>:<NodePort>
Example:
http://54.165.83.46:1213/query
To check the NodePort, run the following command:
kubectl get svc
In the Prometheus UI, use the Expression Browser to search metrics, for example:
- Search for CPU or Memory metrics.
Example query:
kubelet_http_requests_total
Click Execute.
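A couple of other queries worth trying; the node_ metric assumes the chart's bundled node-exporter is running:
rate(kubelet_http_requests_total[5m])    # request rate over the last 5 minutes
node_memory_MemAvailable_bytes           # available memory reported per node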
Grafana is an open-source platform used for monitoring, visualization, and data analysis. It allows you to query, visualize, and understand your metrics from various data sources in real-time through customizable dashboards.
Common Use Cases:
- Kubernetes Monitoring: Visualize CPU, memory, and network usage across clusters.
- Application Performance Monitoring (APM): Track application metrics and log data.
- Infrastructure Monitoring: Monitor server health, disk usage, and network traffic.
- Business Metrics: Display business KPIs like sales data, transaction counts, etc.
The following steps outline the process for installing and configuring Grafana in a Kubernetes cluster using Helm, with Prometheus as the data source.
First, we need to add the official Grafana repository to our Helm configuration. This repository contains the Helm charts necessary to deploy Grafana.
Run the following commands:
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
We need to create a grafana.yml file to customize our Grafana deployment. This file will contain values that Helm will use to configure the deployment.
Create the grafana.yml file by running the following command:
vi grafana.yml
Insert the following:
adminUser: admin
adminPassword: YUDevOps
service:
  type: NodePort
  port: 3000
Now, we use Helm to deploy Grafana using the values file we created:
helm install -f grafana.yml grafana grafana/grafana
- -f grafana.yml: Specifies the custom configuration file.
- grafana: The release name.
- grafana/grafana: The Helm chart we are using.
After running the command, check the status of the pods to ensure Grafana is running:
kubectl get pods
You will see that the Grafana pod is running.
By default, the Grafana service is only accessible within the Kubernetes cluster. To access it externally, we expose it using a NodePort service.
Run the command:
kubectl expose service grafana --type=NodePort --target-port=3000 --name=grafana-np
Verify the service and note the NodePort assigned by running the following command:
kubectl get svc
Now, open a browser and navigate to:
Format:
<Public IP>:<NodePort>
Example:
http://54.227.118.240:1189/login # Master Node
http://52.204.114.58:1189/login # Worker Node 1
http://54.165.83.46:1189/login # Worker Node 0
Access Grafana with:
Username: admin Password: YUDevOps
Grafana needs a data source to visualize metrics. We will use Prometheus as our data source.
In the URL field, enter the following address:
http://prometheus-server.default.svc.cluster.local
Click Save & Test to verify the connection.
The address http://prometheus-server.default.svc.cluster.local is the internal DNS address of the Prometheus service within the Kubernetes cluster. This address is automatically generated by Kubernetes when a service is created. Let's break down how to get this address.
The format is <service-name>.<namespace>.svc.<cluster-domain>:
- service-name: The name of the service.
- namespace: The namespace where the service is running (e.g., default).
- svc: A subdomain used for services.
- cluster.local: The default cluster domain. This can vary if custom DNS settings are configured.
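You can confirm that this name resolves from inside the cluster with a throwaway pod; the busybox image here is just for illustration:
kubectl run dns-test --rm -it --image=busybox --restart=Never -- nslookup prometheus-server.default.svc.cluster.local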
Grafana supports pre-built dashboards that can be imported using their unique IDs.
Import Dashboard ID 10000:
- Click the Create icon > Import.
- Enter 10000 in the Import via ID field and click Load.
- Select Prometheus from the data source dropdown and click Import.
Import Dashboard ID 13770:
Repeat the above steps with ID 13770.
Now, you will see two dashboards populated with Prometheus metrics.
Get All Resources in the Current Namespace:
kubectl get all
Once you are done with the lab, it's crucial to clean up the provisioned infrastructure to avoid unnecessary costs.
Run the command:
terraform destroy
- Terraform will list all the resources it will destroy and prompt for confirmation.
- Type yes to proceed with the destruction.
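If you are scripting the teardown, the confirmation prompt can be skipped, though doing so removes the safety check:
terraform destroy -auto-approve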