RKE2 on AWS with Rancher
Jun 25, 2025 | 15 min read
TL;DR
In five weeks I went from AWS certification theory to hands-on Kubernetes experience, building a production-ready RKE2 cluster from scratch. I deployed a 2-node cluster (server + agent) on AWS EC2, installed Rancher via Helm for cluster management, and containerized a Python quotes application with Docker. The full pipeline flows from VSCode development through GitHub and Docker Hub to a Kubernetes deployment with one running replica. Everything runs on AWS with a pre-configured security group, an Elastic IP, and DNS routing. Key technologies: EC2, RKE2, Kubernetes, Docker, Rancher, Helm, kubectl, systemd, and container orchestration. Biggest lessons: configuration details matter (wrong IP addresses break everything), Docker images are immutable snapshots, Git hygiene prevents 133MB disasters, and systematic debugging beats panic. The result is a fully functional cloud-native deployment pipeline I can scale, monitor through Rancher, and confidently add to my résumé as real production experience.
From Zero to Kubernetes: The Summer Project
Deploying a Python App on AWS with RKE2 and Rancher
Introduction
As part of a research internship, I started this summer project knowing almost nothing about Kubernetes or Rancher. While I had recently completed my AWS Cloud Practitioner and AI Practitioner certifications, giving me some foundational cloud knowledge, I was still a student with only basic coding skills. I was both excited and nervous about the challenge. My goal was to deploy a simple Python "quotes" API on a Kubernetes cluster in AWS, using Rancher (RKE2) to help manage it. Over five weeks of the Early Start program, I went from logging into AWS for hands-on work for the first time to seeing my app running in a pod. The journey involved applying my AWS fundamentals, learning containerization, and diving deep into Kubernetes tools: a lot to unpack, but highly rewarding.
Week 1: Applying AWS Knowledge - Setting Up EC2
In week 1, I focused on applying my AWS foundational knowledge to hands-on work. While I had studied AWS concepts for my Cloud Practitioner certification, this was my first time actually building infrastructure. I logged into the AWS console, created a key pair (a public/private SSH credential for EC2), and launched an EC2 instance (Ubuntu, t2.xlarge) in us-east-1. Fortunately, my instructor had already created and configured the security group (basically a virtual firewall for the instance) with all the necessary ports open: SSH (port 22), HTTPS (443), Kubernetes API (6443), and RKE2 node join (9345). This saved me from having to figure out the networking requirements myself. With the EC2 running, I used my private key to SSH in (for example: ssh -i mykey.pem ubuntu@<EC2-public-IP>). Having my own dedicated server for this project felt like a solid foundation to build upon!
Key accomplishments:
I created an EC2 key pair (you store the private key locally) so I could SSH securely.
I used the pre-configured security group that my instructor had set up with all necessary ports (22, 443, 6443, 9345) for the full Rancher/Kubernetes setup.
I SSH'ed into the instance and made sure everything was running (just basic Linux commands at first).
By the end of week 1 I had a single EC2 VM up, with SSH access working and a minimal setup. I was ready to install Kubernetes (RKE2) on it.
Week 2: Installing RKE2 (Server/Control Plane)
In week 2, I turned that EC2 into an RKE2 (Rancher Kubernetes Engine 2) control plane node. Following the RKE2 quick-start guide, I ran the install script on the VM. In practice, this meant first updating the system and preparing it for RKE2:
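On Ubuntu, that preparation was essentially just refreshing the package lists and upgrading the installed packages (run with sudo or as root):

```bash
sudo apt-get update && sudo apt-get upgrade -y
```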
Then installing and starting RKE2 on the server node:
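The commands, roughly as given in the RKE2 quick-start guide, were:

```bash
# Download and run the RKE2 install script (server mode is the default)
curl -sfL https://get.rke2.io | sudo sh -

# Enable and start the rke2-server service
sudo systemctl enable rke2-server.service
sudo systemctl start rke2-server.service
```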
This installed the RKE2 server and set up the rke2-server service. After enabling and starting the service, the machine became the Kubernetes server (aka control plane). RKE2 automatically wrote a kubeconfig file (rke2.yaml) and generated a join token (node-token). Essentially, the cluster was now live with one node (itself). I tested connectivity with kubectl get nodes (after exporting KUBECONFIG=/etc/rancher/rke2/rke2.yaml), and the lone node showed up as Ready.
Meanwhile, I learned about RKE2's ports: by default, the server listens on port 9345 for agent nodes to register, and on 6443 for the Kubernetes API. Thankfully, the security group was already configured with these ports open (along with SSH 22), so I didn't need to worry about networking configuration. This was also when I reserved an Elastic IP for the server instance, so its public IP wouldn't change. Having a stable IP made it easier to SSH into the server, and it simplified things later when my instructor set up a DNS record (so we could use a custom domain like rancher.example.com to reach Rancher).
Week 3: Joining the RKE2 Agent Node
Week 3 was about scaling up the cluster. First, I needed to set up kubectl properly on the server node to manage the cluster, then I spun up a second EC2 (same type, Ubuntu) to act as a worker (agent) node.
Setting up kubectl on the Server Node
Before adding the agent, I configured kubectl on the server to properly manage the cluster:
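In short, I pointed kubectl at the kubeconfig RKE2 had written and put RKE2's bundled binaries on the PATH (the paths below are the RKE2 defaults):

```bash
export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
export PATH=$PATH:/var/lib/rancher/rke2/bin

# Sanity check: the server node should show up as Ready
kubectl get nodes
```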
Installing the Agent Node
On the new agent VM, after switching to root and updating the system packages, I installed the RKE2 agent and configured it to join the existing cluster:
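The agent install uses the same script as the server, just with the install type set to agent:

```bash
curl -sfL https://get.rke2.io | sudo INSTALL_RKE2_TYPE="agent" sh -
```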
Then I created the file /etc/rancher/rke2/config.yaml on the agent and added:
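The file only needs two entries, the server URL and the join token (the values below are placeholders):

```yaml
# /etc/rancher/rke2/config.yaml on the agent node
# server must point at the SERVER's private IP, not the agent's own IP
server: https://<server-private-ip>:9345
# token comes from /var/lib/rancher/rke2/server/node-token on the server
token: <node-token>
```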
After saving the configuration, I started the agent service:
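```bash
# Enable the service so it survives reboots, then start it
sudo systemctl enable rke2-agent.service
sudo systemctl start rke2-agent.service
```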
Debugging Agent Connection Issues
However, I ran into my first real debugging challenge here. When I tried to start the rke2-agent service with systemctl start rke2-agent.service, it just sat there hanging with no output or error messages. The service wasn't starting properly, but it wasn't giving me any clues about why.
To figure out what was going on, I checked the service status and followed its logs (the exact commands below are reconstructed from memory):
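```bash
# See why the unit is stuck, then follow its journal output
systemctl status rke2-agent.service
journalctl -u rke2-agent.service -f
```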
Digging through the journal output, I discovered the problem was in my config.yaml file. I had mistakenly used the private IP address of the agent instance itself instead of the server instance's private IP. This meant the agent was trying to connect to itself rather than to the actual RKE2 server.
Once I corrected the config.yaml file to use the server's private IP address, I needed to reload the systemd configuration and restart the service:
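```bash
sudo systemctl daemon-reload
sudo systemctl restart rke2-agent.service
```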
This time it worked perfectly! With config.yaml now pointing the agent at the server's private IP on port 9345, the rke2-agent service started successfully, and soon the agent reached out and registered itself with the control plane.
Now the cluster had two nodes (one control-plane, one worker). Using kubectl get nodes (with the same KUBECONFIG as before) I saw both nodes listed: the server and the agent were both Ready. At this point, the core Kubernetes cluster was up on AWS. The pre-configured security group already allowed all needed traffic (SSH 22, Rancher 443, Kubernetes 6443, RKE2 9345) so the nodes could communicate properly without any additional networking setup on my part.

Figure 1: Network communication flow between RKE2 server, agent, and external services

Figure 2: Rancher dashboard showing both server and agent (worker) nodes successfully joined to the cluster
Week 4: Installing Rancher with Helm and Developing the Python App
Having the Kubernetes cluster, I had two main tasks for week 4: installing Rancher on the cluster and developing the Python application that I would eventually deploy.
Installing Rancher Management Interface
The first part of week 4 focused on setting up the cluster management interface. I SSH'ed into the server node and set up Helm (the Kubernetes package manager). The process involved installing Helm, adding the necessary repositories, and installing cert-manager before installing Rancher itself.
First, I installed Helm:
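```bash
# Helm's official install script (one common method; a package manager works too)
curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
chmod 700 get_helm.sh
./get_helm.sh
```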
Then I added the required Helm repositories:
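```bash
# Rancher's chart repo plus jetstack for cert-manager
helm repo add rancher-latest https://releases.rancher.com/server-charts/latest
helm repo add jetstack https://charts.jetstack.io
helm repo update
```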
Next, I installed cert-manager, which is required for Rancher's TLS certificate management:
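```bash
# Install cert-manager into its own namespace, including its CRDs
# (older chart versions use --set installCRDs=true instead of crds.enabled)
helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --set crds.enabled=true
```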
Finally, I installed Rancher itself:
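```bash
# Rancher lives in the cattle-system namespace
kubectl create namespace cattle-system

# hostname must match the DNS record that will point at the server's Elastic IP
helm install rancher rancher-latest/rancher \
  --namespace cattle-system \
  --set hostname=rancher.example.com \
  --set bootstrapPassword=<initial-admin-password>
```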
(Where rancher.example.com was the DNS name used, pointing to the Elastic IP of the server.) Rancher's docs mention that on RKE2 clusters, an Ingress controller comes pre-installed (so there's no need to add one manually). This meant that as soon as the Helm chart was applied, an Ingress pointed the DNS name at Rancher's service. I waited a few minutes for the pods to become healthy.
My instructor then pointed the DNS record rancher.example.com to the EC2's Elastic IP that I had set up earlier. Having that stable IP made the DNS configuration straightforward. When I navigated to https://rancher.example.com, I saw the Rancher login page! That was a big win: Rancher was up. The local cluster (my RKE2 cluster) appeared under Cluster Management. At this point, Rancher acted as a nice dashboard over the cluster I'd built.

Figure 3: Rancher cluster management interface showing the local RKE2 cluster
Developing the Python Quotes Application
The second part of week 4 involved actually coding the Python application that would become the centerpiece of my deployment. Using VSCode as my IDE, I created a simple quotes application that would fetch quotes from an API (reference here). Following the provided reference materials, I built out the core functionality:
Created quotes.py with the main application logic, using a for loop instead of an infinite while loop to make it more manageable (a minimal sketch appears after this list)
Developed a requirements.txt file listing all the necessary Python dependencies
Tested the application locally on my computer to ensure it worked correctly before moving to deployment
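Since the original repository isn't reproduced here, the sketch below is only an approximation of what quotes.py did: the API URL is a placeholder, and the response field names depend on the actual quotes API used.

```python
import time
import requests

# Placeholder endpoint; the real project used the API from the reference material
QUOTES_API = "https://example.com/api/random-quote"

def fetch_quote():
    """Fetch one quote and return it as text, or None if rate limited."""
    response = requests.get(QUOTES_API, timeout=10)
    if response.status_code == 429:      # HTTP 429: Too Many Requests
        print("Rate limited; backing off...")
        return None
    response.raise_for_status()
    data = response.json()
    return data.get("quote") or data.get("content")

if __name__ == "__main__":
    # A bounded for loop instead of an infinite while loop keeps test runs manageable
    for _ in range(10):
        quote = fetch_quote()
        if quote:
            print(quote, flush=True)
        time.sleep(7)                    # delay between requests to respect API limits
```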
This was an important step in understanding the full development lifecycle; you need working code before you can containerize and deploy it. Once I had the application running locally and was satisfied with its functionality, I organized the files in a directory structure, committed them to Git, and pushed to my branch on the project GitHub repository.
Week 5: Building and Containerizing the Python App
Week 5 was supposed to be the culmination of everything I'd learned, taking my Python application and deploying it to the Kubernetes cluster I'd spent weeks building. What I didn't expect was that this simple deployment would turn into an intensive debugging bootcamp that taught me more about Docker, Git, and production workflows than a tutorial ever could.
The Missing Pieces
I fired up my EC2 server, ready to containerize the Python quotes application I had developed in the second part of Week 4, only to discover a fundamental problem: the files weren't there. I had them locally on my computer and in the GitHub repository, but not on the server. This was my first lesson in assumptions: I had expected my application code to magically appear on the server. But of course, I'd never actually pulled the project GitHub repository onto the EC2 instance.
The solution seemed straightforward until I hit GitHub's authentication wall. When I tried the usual git clone approach and tried to authenticate with my password, I was greeted with a firm rejection: "Support for password authentication was removed on August 13, 2021." This wasn't just an inconvenience: it was GitHub's security evolution in action. I needed to generate a Personal Access Token and use it for authentication.
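The authenticated clone ended up looking roughly like this (the repository path is a placeholder):

```bash
# Clone over HTTPS; when Git prompts for a password, paste the Personal Access Token instead
git clone https://github.com/<org>/<project-repo>.git
```

With my files finally on the server, I was ready to start containerizing.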
Docker Adventures and Discoveries
Creating the Dockerfile felt like writing a recipe for my application. I specified the Python base image, set up the working directory, copied my requirements, and defined how to run the app.
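A close approximation of that Dockerfile (the base image version is illustrative; the original may have differed):

```dockerfile
FROM python:3.11-slim

WORKDIR /app

# Install dependencies first so this layer is cached between code changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY quotes.py .

CMD ["python", "quotes.py"]
```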
But when I tried my first build after installing Docker on the server, Docker threw an error that seemed obvious in hindsight: I'd forgotten the build context (that crucial "." at the end that tells Docker what files to include). The difference, roughly (the image tag here is illustrative):
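```bash
# What I ran: no build context at the end
docker build -t quotes-app

# What it needed: "." sends the current directory to Docker as the build context
docker build -t quotes-app .
```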
It's one of those mistakes you make once and never forget. The real adventure began when I started testing my containerized application. My Python script seemed to hang forever, and I couldn't figure out why. The first culprit was embarrassingly simple: my script had time.sleep(60) between API calls, making each test cycle painfully slow. A quick change to time.sleep(7) made development much more bearable.
But even after fixing the sleep time, something was still off. When I finally let the application run to completion, I discovered I was hitting the quotes API too aggressively and getting HTTP 429 "Too Many Requests" responses. This was my introduction to real-world API constraints: rate limiting isn't just a concept in documentation, it's a daily reality. The solution was patience and respect for the API's limits: longer delays between requests and graceful handling of rate limit responses.
Perhaps the most confusing issue was when my Docker container seemed to "hang" after running. I'd press Ctrl+C and suddenly see all the quotes appear. It turned out the container wasn't hung at all; it had completed successfully but was just buffering output. This taught me about container lifecycle management and the importance of understanding what "completion" looks like for different types of applications.
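For what it's worth, the usual way to avoid this with Python containers (not something I knew at the time) is to turn off Python's output buffering, for example in the Dockerfile:

```dockerfile
# Make print() output appear immediately instead of being buffered
ENV PYTHONUNBUFFERED=1
```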
The Caching Conundrum
Here's where things got really interesting. I made changes to my Python code, reducing the sleep time, but when I tested the Docker image, it was still using the old 60-second delays. This was my crash course in Docker's layer caching system. Docker images aren't dynamic: they're snapshots taken at build time. When I changed my source code, the existing image still contained the old version. I needed to rebuild:
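```bash
# --no-cache forces Docker to rebuild every layer instead of reusing cached ones
# (tag name illustrative)
docker build --no-cache -t quotes-app .
```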
This forced Docker to start fresh and include my latest changes. It's a fundamental concept that separates containers from traditional development: your container image is a moment frozen in time until you explicitly update it.
Git Growing Pains
With my Docker image working perfectly, I felt confident pushing my changes to GitHub. That's when I encountered the 133MB wall. My Docker export file (quotes-app.tar) exceeded GitHub's 100MB file size limit, and my push was rejected. This led to a valuable lesson about repository hygiene. Large binary files like Docker images don't belong in Git repositories. I added a .gitignore file to prevent future accidents.
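The relevant .gitignore entries were small (reconstructed here):

```gitignore
# Keep exported Docker images out of the repository
*.tar
```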
But the damage was done: the large file was already in my Git history. Even after removing it, Git was still trying to push the entire history, including the problematic commit. I had to take more drastic action:
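```bash
# Roll the branch back past the commit(s) that included the large tar file
git reset --hard HEAD~1   # adjust the count to match how many commits need removing

# Overwrite the remote branch with the cleaned-up history
git push --force origin <my-branch>
```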
This rolled back my commits, removing the large file from history entirely. Then I used a force push to overwrite the remote branch with my cleaned-up version.
The reset came with an unexpected consequence: I'd accidentally removed my recently created Dockerfile and manifest.yaml because they only existed in the commits I'd just deleted. But by this point, I understood these files well enough to recreate them quickly: sometimes losing work teaches you more than keeping it.
Kubernetes Success
With all the Git drama resolved, I finally deployed my application to Kubernetes. Creating the deployment manifest felt like writing instructions for a very smart robot:
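```yaml
# manifest.yaml -- a sketch of the Deployment; the image name and tag are illustrative
apiVersion: apps/v1
kind: Deployment
metadata:
  name: quotes-app
spec:
  replicas: 1                          # the single running replica mentioned in the TL;DR
  selector:
    matchLabels:
      app: quotes-app
  template:
    metadata:
      labels:
        app: quotes-app
    spec:
      containers:
        - name: quotes-app
          image: <dockerhub-user>/quotes-app:latest   # pulled from Docker Hub
```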
The manifest told Kubernetes exactly how to run my Docker image: how many copies to run, which image to use, and how to manage the application lifecycle. When I applied it with kubectl apply -f manifest.yaml, I watched my containerized Python application come alive in the cluster.
Seeing my application running in Rancher's web interface was incredibly satisfying. What had started as a simple Python script was now a containerized application running on a production Kubernetes cluster, complete with automatic restart capabilities and scaling potential.

Figure 4: Python quotes application pods running successfully in the Rancher interface

Figure 5: Application logs showing the Python quotes script fetching and displaying quotes
The Real Lessons
This week taught me that production deployment is as much about process as it is about technology. Every issue I encountered (authentication, caching, file management, API rate limits) was a real challenge that developers face daily.
I learned to approach problems systematically: test the Python script alone, then the Docker build, then the Kubernetes deployment. When something breaks, isolate the component and debug it individually rather than trying to fix everything at once.
Perhaps most importantly, this experience emphasized that making mistakes isn't failure: it's education. Each error message became a clue, each failed attempt taught me something new about how these systems work together.
The Complete Pipeline
By the end of the week, I had built something remarkable: a complete deployment pipeline that took my Python application from source code to running containers in a Kubernetes cluster. The path included:
Source code management with Git and GitHub
Container packaging with Docker and Docker Hub
Orchestration with Kubernetes and Rancher
Production monitoring and management (via Rancher's built-in tools)

Figure 6: Complete development pipeline from source code to production deployment
This had evolved into something much more significant: it was a real application running on real infrastructure, using the same tools and processes that power applications at major companies. The best part? Everything was reproducible. I could tear down the entire deployment and rebuild it with a few commands, or scale it up to handle more traffic as needed.
That's the power of modern cloud-native development, and now I had hands-on experience with every piece of the puzzle.
Reflections and Next Steps
Looking back, it's amazing to see the path from basic AWS knowledge to a working Kubernetes deployment. I started with AWS Cloud Practitioner and AI Practitioner certifications under my belt, but this was my first hands-on implementation. Along the way, I learned how to practically apply concepts like key pairs and security groups for EC2 access, how to install and join nodes in an RKE2 cluster, how to develop a Python application with proper project structure, and how Rancher (installed via Helm) provides a friendly management interface on top. I also got hands-on with Docker: writing a Dockerfile, building images, and pushing to Docker Hub, which is crucial for any cloud-native workflow.
As a beginner, I found that breaking the project into weekly goals really helped. Each step (from spinning up VMs to writing YAML manifests) became more understandable.

Figure 7: Complete architecture overview of the final deployment
In the future, I might look at deploying a real web frontend alongside the API, or even learning about Helm charts for packaging my app. I'd also like to explore more Kubernetes features (scaling, monitoring). But for now, I'm proud: I learned enough to confidently say I have Kubernetes, RKE2, Rancher, and AWS on my resume.
Next Steps: I plan to continue tinkering, maybe use this same cluster architecture to try out Kubeflow or set up continuous deployment. Having already completed my AWS Cloud Practitioner certification, I now feel much more confident about implementing cloud solutions in practice. The hands-on experience really reinforced the theoretical knowledge I gained during my certification studies. Overall, this summer project, part of a research internship, turned a steep learning curve into a clear path. I went from having basic AWS knowledge to being able to build and explain a complete cloud-native deployment, quite a journey from where I started!