Deploy Kubernetes cluster on AWS EC2: the “please don’t panic” guide

Deploying Kubernetes on AWS EC2 sounds like the kind of task that should come with a cape and a montage. In reality, it comes with a lot of careful decisions, a handful of commands you’ll copy-paste with the confidence of someone who absolutely knows what they’re doing, and then one small mistake that turns your cluster into a very expensive plant.

This article is an original, practical walkthrough to get a Kubernetes cluster running on AWS EC2. You’ll learn what you need to decide up front, how to provision the cluster using a common approach, and how to validate that it’s actually ready to host workloads. You’ll also see troubleshooting tips for the classic issues that crop up when networking, IAM permissions, and kubeconfig settings don’t agree with each other.

We’ll aim for high readability and a clear structure. Think of it as a checklist with jokes. (Mostly jokes. The checklist is the important part.)

Before you begin: choose your deployment style

There are multiple ways to run Kubernetes on AWS EC2. Some are managed, some are semi-managed, and some are “you own everything, including the consequences.” The most important distinction is whether you want to manage the control plane yourself.

Option A: Managed Kubernetes (quickest, least chaos)

AWS offers Amazon EKS, which is the fully managed Kubernetes experience. You still run nodes on EC2, but AWS manages the control plane and many operational concerns. If your goal is “get Kubernetes running without babysitting,” EKS is usually the winner.

However, the title of this guide is “Deploy Kubernetes cluster on AWS EC2,” and you may want the flexibility of self-managed Kubernetes, or you may simply want to learn how the pieces fit together. So this guide focuses on a self-managed setup.

Option B: Self-managed Kubernetes on EC2 (maximum learning)

With self-managed Kubernetes, you provision EC2 instances for control plane and worker nodes. You install Kubernetes components, configure networking, and manage upgrades. This is more work, but it gives you control and understanding.

In practice, many people use a “cluster bootstrapping” tool like kubeadm, or a cluster management tool that wraps kubeadm. For the purposes of this article, we’ll describe a straightforward approach with modern best practices and validation steps. If you already have a preferred tool (kops, Terraform modules, Cluster API, etc.), you can adapt the principles.

Architecture basics: what you’re actually building

At minimum, you’ll need:

  • A VPC (or use the default, though you’ll regret it later if you like clarity)
  • Subnets across Availability Zones (recommended for resilience)
  • EC2 instances for control plane nodes (often 1 for learning, 3 for “serious”)
  • EC2 instances for worker nodes
  • Networking that allows nodes to talk to each other and to cluster endpoints
  • IAM roles/instance profiles for permissions (especially for AWS integrations)
  • Storage setup (for persistent volumes)

You’ll also configure Kubernetes networking so pods can communicate. Most common setups use a CNI plugin such as Calico or Cilium. The CNI determines how pod IPs are assigned and how traffic flows between pods on different nodes.

Decisions you should make (so you don’t “solve” problems you didn’t have)

How many nodes?

For learning and testing:

  • Control plane: 1 node
  • Workers: 1–3 nodes

For reliability:

  • Control plane: 3 nodes (to tolerate failures)
  • Workers: 2+ nodes depending on workload

If you’re building a production cluster, you’ll care about quorum, upgrades, and failure domains. If you’re building a dev cluster, you’ll care about getting it working before your coffee gets cold.

Instance types and sizing

At minimum, consider these rough starters:

  • Control plane: 2 vCPU, 4–8 GB RAM (more if you run extra components)
  • Worker nodes: 2–4 vCPU, 8–16 GB RAM

Storage matters too. Use SSD-backed volumes unless you enjoy watching performance graphs cry.

Region and Availability Zones

Pick a region and stick with it. Pick multiple AZs if you want your cluster to survive an AZ-level meltdown. If you’re learning, a single AZ can work, but you should understand the limitation.

Networking: VPC, subnets, and security groups

Your cluster needs rules that allow:

  • Control plane to communicate with worker nodes
  • Nodes to communicate with each other (for kubelet, CNI, etc.)
  • Optional access to the Kubernetes API server (usually restricted)

A common approach is to keep nodes in private subnets and expose the API server via restricted access. For dev, you might expose API server access more openly, but do not do this for production unless you’re aiming for a thrilling security story.

Prerequisites: your AWS and local setup

Local tools

On your workstation, you’ll typically want:

  • A terminal
  • A tool for SSH (built-in or your favorite one)
  • A text editor
  • A way to generate keys and manage them securely
  • A package manager if you’re installing utilities locally

Specific command-line tools you may use include:

  • kubectl
  • awscli
  • kubeadm-related tooling (for bootstrapping)
  • Optional: Terraform or an IaC tool if you want automation

AWS account and credentials

Make sure you have:

  • An AWS account with permissions to create EC2, VPC resources, and IAM roles
  • A key pair for SSH access
  • IAM user or role credentials configured locally (for awscli)

Pro tip: if you’re using SSO or assumed roles, test that your credentials can actually call AWS APIs before you start building. Nothing slows you down like discovering an authorization failure mid-provisioning.
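
A quick sanity check, assuming awscli is already configured (the region below is just an example):

aws sts get-caller-identity
aws ec2 describe-availability-zones --region eu-west-1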

Provision AWS networking (VPC and subnets)

You can create VPC resources via AWS console, CLI, or Infrastructure as Code. For readability, we’ll describe the goal rather than every single click.

Create or select a VPC

You can use an existing VPC or create a new one.

  • Use a VPC with at least one subnet in each Availability Zone you plan to use
  • Prefer private subnets for Kubernetes nodes
  • Ensure outbound access exists for nodes (NAT gateway or VPC endpoints)

If you don’t know whether you need NAT, ask yourself: “Will my nodes need to download packages and images from the internet?” Usually yes. Unless you plan to run everything through private mirrors, you’ll need egress.

Subnets

Create subnets for nodes. Typically:

  • Public subnet: optional, for a bastion host or NAT gateway
  • Private subnet: recommended for control plane and worker nodes

If you’re exposing the Kubernetes API server directly, you might use public subnets (again, dev convenience, not production elegance).
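
If you prefer the CLI over the console, a minimal sketch looks roughly like this. CIDRs, AZs, and resource IDs are placeholders, and the sketch omits the internet gateway and route table wiring; an IaC tool is usually the better long-term choice.

# Create a VPC and two private subnets in different AZs
aws ec2 create-vpc --cidr-block 10.0.0.0/16
aws ec2 create-subnet --vpc-id vpc-EXAMPLE --cidr-block 10.0.1.0/24 --availability-zone eu-west-1a
aws ec2 create-subnet --vpc-id vpc-EXAMPLE --cidr-block 10.0.2.0/24 --availability-zone eu-west-1b

# Private subnets need a NAT gateway (in a public subnet) for package and image downloads
aws ec2 create-nat-gateway --subnet-id subnet-PUBLIC --allocation-id eipalloc-EXAMPLE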

Security groups

Security groups are your gatekeepers. You’ll need at least one for control plane nodes and one for worker nodes. The easiest approach for initial testing is to allow the required traffic between the cluster’s security groups by having each group reference the other in its inbound rules.

Common ports involved in Kubernetes clusters include (exact requirements can vary by setup and Kubernetes version):

  • API server (often 6443/TCP)
  • etcd (commonly 2379–2380/TCP) if running etcd on the control plane
  • kubelet (often 10250/TCP)
  • node-to-node networking for CNI (can be dynamic)

Instead of trying to memorize everything like it’s the periodic table, consider starting with a known-good configuration from your bootstrap tool or CNI documentation, then tighten later.
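
As an illustration only, the cross-referencing rules might look like this (the group IDs are placeholders, and your CNI may need additional ports or protocols):

# API server port, reachable from worker nodes
aws ec2 authorize-security-group-ingress --group-id sg-CONTROLPLANE \
  --protocol tcp --port 6443 --source-group sg-WORKERS

# etcd client/peer traffic between control plane nodes
aws ec2 authorize-security-group-ingress --group-id sg-CONTROLPLANE \
  --protocol tcp --port 2379-2380 --source-group sg-CONTROLPLANE

# kubelet API on workers, reachable from the control plane
aws ec2 authorize-security-group-ingress --group-id sg-WORKERS \
  --protocol tcp --port 10250 --source-group sg-CONTROLPLANE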

IAM roles: permissions that prevent pain

Kubernetes on AWS often needs AWS permissions for:

  • Load balancers integration (e.g., AWS Load Balancer Controller)
  • Dynamic storage provisioning (e.g., EBS CSI driver)
  • Accessing AWS APIs for metadata

For a self-managed cluster, you can attach IAM instance profiles to nodes, or use IAM Roles for Service Accounts (IRSA). IRSA is more secure and modern, but it’s extra configuration.

For dev clusters, instance profiles may be simpler. For production, IRSA is the direction you usually want.

Choose a Kubernetes version and CNI plugin

Pick a Kubernetes version that is compatible with your tooling and CNI. Most clusters follow the standard Kubernetes release cycle, but AWS resources and container images may lag slightly depending on your environment.

For CNI:

  • Calico is a popular and widely supported choice.
  • Cilium is powerful, especially if you want eBPF features.

Calico tends to be straightforward for many beginners. Cilium can be equally approachable, but it’s “more configurable,” which is another way of saying “it can turn into a hobby.”

Bootstrap the cluster on EC2 (high-level process)

Now we get to the core. The flow looks like this:

  • Launch EC2 instances for control plane and workers
  • Install prerequisites on instances (container runtime, kubelet tools)
  • Initialize the control plane (kubeadm init or equivalent)
  • Join worker nodes to the cluster (kubeadm join)
  • Install CNI and supporting components
  • Verify cluster health and deploy a test workload

Launch EC2 instances

When launching instances, consider:

  • Base image: use a supported Linux distribution for Kubernetes installation
  • Networking: assign correct subnet and security groups
  • SSH key: for administration
  • IAM role: if you plan to integrate with AWS services
  • Disable or configure swap appropriately (Kubernetes dislikes swap being on)

Some Kubernetes setups require a hostname configuration that resolves correctly. You don’t need to be a DNS wizard, but make sure the nodes have stable names (or at least stable behavior).

Static internal IPs (optional but helpful)

For a learning cluster, you can accept ephemeral IPs, but stable internal addressing helps with debugging. Consider assigning Elastic IPs only if you need them, and otherwise lean toward private IPs within the VPC.

Prepare instances for Kubernetes

On each EC2 instance, you’ll need to prepare the system.

Turn off swap

Kubernetes typically requires swap to be disabled. If swap is enabled, kubelet may refuse to start properly.
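
On most Linux distributions, something like this does the job (the sed line assumes swap entries live in /etc/fstab):

sudo swapoff -a
# Keep swap off across reboots by commenting out swap entries
sudo sed -i '/ swap / s/^/#/' /etc/fstab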

Install a container runtime

Kubernetes needs a container runtime; the most common choice today is containerd.

The runtime should be configured and running. Kubernetes uses it to pull images and run pods.
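
A minimal containerd setup, assuming an Ubuntu-based AMI (other distributions differ):

sudo apt-get update && sudo apt-get install -y containerd
sudo mkdir -p /etc/containerd
containerd config default | sudo tee /etc/containerd/config.toml
# kubeadm-based clusters generally expect the systemd cgroup driver
sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
sudo systemctl restart containerd && sudo systemctl enable containerd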

Install kubeadm, kubelet, and kubectl (on nodes)

Use the versions appropriate for your Kubernetes release. Consistency matters. Mismatched versions can cause weird issues, and Kubernetes is already good at producing weird issues without your help.
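
For example, on Debian/Ubuntu using the community package repository; pin the minor version you actually intend to run (v1.30 below is only a placeholder):

sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.30/deb/Release.key | \
  sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.30/deb/ /' | \
  sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update && sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl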

Configure sysctl settings and networking prerequisites

CNI plugins may require certain kernel parameters. You can follow your chosen CNI’s installation steps to apply required settings.
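
The commonly required kernel settings for kubeadm-style setups look like this; check your CNI’s documentation for anything extra:

cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
sudo modprobe overlay
sudo modprobe br_netfilter

cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
EOF
sudo sysctl --system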

Initialize the control plane

Pick your control plane node (the “main brain” node). You’ll run a cluster initialization command that:

  • Creates a local etcd (if needed)
  • Writes the kubeconfig for kubectl access
  • Sets up the control plane components (API server, controller-manager, scheduler)

During initialization, you’ll see an output that includes:

  • A join command for worker nodes
  • Information about the pod network CIDR (important for CNI)
  • Paths to the kubeconfig file

The “pod network CIDR” is a key detail: the CNI needs to know what IP range to allocate pod addresses from, and the CIDR you pass at init time must match what the CNI is configured to use. Otherwise, pods will still get addresses, but they won’t be able to talk to each other, as if they’re stuck in separate universes.
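
For example, a typical init on the control plane node might look like this. The advertise address is a placeholder for the node’s private IP, and 192.168.0.0/16 happens to match Calico’s common default:

sudo kubeadm init \
  --pod-network-cidr=192.168.0.0/16 \
  --apiserver-advertise-address=10.0.1.10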

Make kubeconfig accessible

For convenience, copy the kubeconfig from the control plane node to your local workstation. Use secure file handling, because kubeconfig can contain credentials or tokens.

Then set your local kubectl config so your terminal knows where the cluster is.
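
On a kubeadm-based control plane, the usual steps are (the scp user and host are examples):

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

# From your workstation, optionally pull the kubeconfig down and point kubectl at it
scp ubuntu@CONTROL_PLANE_IP:~/.kube/config ~/.kube/ec2-cluster
export KUBECONFIG=~/.kube/ec2-cluster
kubectl get nodes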

Join worker nodes

For each worker node:

  • Run the join command produced by kubeadm init (or equivalent)
  • Ensure the node can reach the control plane endpoint on the required ports
  • Verify kubelet starts and registers the node

If joining fails, the most common culprits are:

  • Security group rules blocking traffic between nodes
  • Firewall settings on instances
  • Incorrect or unreachable control plane endpoint (wrong IP/hostname)
  • Time drift (NTP issues can break token-based auth)

Time drift is sneaky. If your EC2 instances don’t sync time properly, authentication can fail even when everything else looks correct.
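
If you lost the original join command, or the token has expired (they do after a while), you can regenerate it on the control plane; the join itself runs on each worker. The endpoint, token, and hash below are placeholders:

# On the control plane node
kubeadm token create --print-join-command

# On each worker node
sudo kubeadm join 10.0.1.10:6443 --token abcdef.0123456789abcdef \
  --discovery-token-ca-cert-hash sha256:PLACEHOLDER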

Install the CNI plugin

Kubernetes without a CNI is like a restaurant without chairs: the food may be cooked, but nobody can sit down. After the cluster is initialized and workers are joined, you must install a CNI plugin so pods can get network connectivity.

Calico (example approach)

Many clusters use Calico. Installation steps vary by version, but the idea is:

  • Apply the Calico manifest
  • Make sure it uses the same pod network CIDR you configured during initialization
  • Wait for Calico pods to become ready
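
A minimal sketch of the manifest-based install; the version in the URL is an example, so pick one that is compatible with your Kubernetes release, and make sure its pod CIDR matches what you passed at init time:

kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.27.0/manifests/calico.yaml
kubectl get pods -n kube-system -l k8s-app=calico-node -w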

Confirm CNI readiness

Check that nodes become “Ready.” If nodes remain NotReady, inspect CNI pods and logs. Common causes include IP range mismatches and permissions required by the CNI plugin.

Verify cluster health (the “prove it works” stage)

Once you think everything is running, you need to verify. Kubernetes is like a magic show. The important part is whether your rabbit is actually real, not whether the stage looks fabulous.

Check node status

Use kubectl to list nodes and confirm they are in Ready state. If they aren’t, you’ll need to look at events and logs.

Check system pods

Verify that core system components are up in the kube-system namespace. CNI pods should also be running.
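
For example:

kubectl get nodes -o wide
kubectl get pods -n kube-system -o wide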

Run a test pod

Deploy a simple pod (for example, using nginx) and confirm it reaches Running state. Then test connectivity by exec’ing into the pod and curling an endpoint, or by exposing it through a service.
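
One way to do that, using a throwaway nginx pod and a busybox client (the names are arbitrary):

# Start a pod and expose it as a ClusterIP service
kubectl run web --image=nginx --port=80
kubectl expose pod web --port=80

# Hit it from a second pod; this also exercises CoreDNS
kubectl run client --image=busybox --restart=Never -it --rm -- wget -qO- http://web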

If the pod stays stuck, check:

  • Image pull errors (IAM permissions, network egress, or image registry access)
  • Scheduling issues (resource requests, taints, or node readiness)
  • DNS problems (CoreDNS)

Enable storage (persistent volumes on AWS)

If you want to run real applications, you’ll eventually want persistent storage. Kubernetes on AWS can use EBS via the EBS CSI driver.

Install the EBS CSI driver

The CSI driver enables dynamic provisioning of volumes using EBS. The installation process includes applying manifests and configuring IAM permissions (depending on your chosen auth method).

Create a StorageClass and test provisioning

After CSI installation, confirm that a StorageClass exists. Then create a PersistentVolumeClaim (PVC) and deploy a pod that uses it. Confirm that an EBS volume gets created and the pod can mount it.
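
A sketch of that test, assuming the EBS CSI driver is already installed and the node IAM role (or IRSA) allows volume creation; all names are arbitrary:

kubectl apply -f - <<EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-sc
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: ebs-sc
  resources:
    requests:
      storage: 4Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: pvc-test
spec:
  containers:
  - name: app
    image: nginx
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: test-pvc
EOF

# With WaitForFirstConsumer, the PVC stays Pending until the pod is scheduled
kubectl get pvc test-pvc -w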

If provisioning fails, check:

  • IAM permissions for the CSI driver
  • Correct region settings
  • Subnet/availability zone compatibility for volume creation

Enable load balancing and ingress (so users can find your apps)

Kubernetes itself doesn’t automatically expose your apps to the outside world. You need services of type LoadBalancer or an Ingress controller.

AWS Load Balancer Controller (common choice)

Many AWS Kubernetes setups use the AWS Load Balancer Controller, which integrates Kubernetes Ingress resources with AWS Application Load Balancers (ALB). This requires IAM permissions and controller installation.

After installation, you can create:

  • An Ingress resource for HTTP/HTTPS routing
  • Service resources if you choose NodePort/LoadBalancer directly

Pick a strategy: Service vs Ingress

For a basic test, a Service of type LoadBalancer may be the fastest way to confirm connectivity. For real multi-service routing, Ingress is usually better.

For beginners: try LoadBalancer first. For “real life”: you’ll likely want Ingress.
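
Note that on a self-managed cluster, a Service of type LoadBalancer only provisions an actual AWS load balancer if the AWS cloud controller manager or the AWS Load Balancer Controller is installed and has the right IAM permissions. With that in place, a minimal test looks like:

kubectl create deployment hello --image=nginx
kubectl expose deployment hello --port=80 --type=LoadBalancer
kubectl get svc hello -w   # wait for an AWS hostname to appear under EXTERNAL-IP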

Operational hygiene: upgrades, backups, and monitoring

At some point, your cluster will need attention. Maybe for upgrades. Maybe because someone suddenly needs extra resources. Or maybe because you accidentally changed something at 2 a.m. (We won’t judge. We all have seasons.)

Monitoring

Install monitoring and logging (Prometheus/Grafana, Loki, etc.). At minimum, you want to see:

  • Node resource usage
  • Pod restarts and error logs
  • API server health
  • Networking issues

Backups (etcd)

If your control plane is self-managed, consider etcd backups. Etcd is where your cluster state lives. Losing etcd is not a “learning moment,” it’s a “start over and question your choices” moment.
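
On a kubeadm-built control plane, a snapshot can be taken roughly like this; the certificate paths are kubeadm defaults, etcdctl may need to be installed separately, and the snapshot should be copied somewhere off the node (such as S3):

sudo ETCDCTL_API=3 etcdctl snapshot save /var/backups/etcd-$(date +%F).db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key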

Upgrade strategy

Kubernetes upgrades should be planned. Node draining, CNI compatibility, and controller compatibility matter. If you keep upgrading without reading release notes, Kubernetes will eventually humble you.

Common problems and how to stop them from ruining your day

Problem: Nodes show NotReady

Likely causes:

  • CNI not installed or not matching pod CIDR
  • Security groups blocking required ports between nodes
  • Kubelet failing due to missing runtime or swap enabled

Fix approach:

  • Check kubelet logs on the node
  • Check CNI pod status in kube-system
  • Check cluster events for clues
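
The commands that usually surface the cause (the node name is a placeholder):

# On the affected node
sudo systemctl status kubelet
sudo journalctl -u kubelet --no-pager | tail -n 50

# From a machine with kubectl access
kubectl get pods -n kube-system -o wide
kubectl describe node NODE_NAME
kubectl get events -A --sort-by=.lastTimestamp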

Problem: Pods stuck in ImagePullBackOff

Likely causes:

  • Nodes can’t reach the container registry (no egress/NAT)
  • Wrong image name or tag
  • Missing credentials for private registries

Fix approach:

  • Confirm nodes can reach registry endpoints (egress rules)
  • Validate image references
  • Set imagePullSecrets if needed

Problem: kubectl can’t connect to the API server

Likely causes:

  • Wrong kubeconfig context
  • Kubernetes API endpoint not reachable from your machine
  • Security group blocks port 6443 from your IP

Fix approach:

  • Verify kubectl context
  • Check security group rules for API server access
  • If using private subnets, consider a bastion host or VPN

Problem: Service of type LoadBalancer doesn’t create a load balancer

Likely causes:

  • Missing IAM permissions for load balancer creation
  • Subnet tag or VPC tagging issues (AWS expects specific tags)
  • Cluster networking configuration issues

Fix approach:

  • Confirm controller/IAM permissions
  • Check AWS subnet tags expected for load balancers
  • Inspect Kubernetes service events

Security tips: because Kubernetes is powerful, not magical

Security is where you turn “it works” into “it doesn’t get exploited.” Consider:

  • Restrict API server access to known IP ranges
  • Use least-privilege IAM roles
  • Enable RBAC and avoid cluster-admin for everything
  • Use network policies if you adopt a CNI that supports them
  • Don’t open node ports broadly to the internet

Also: use secrets properly. Don’t hardcode credentials in manifests unless you enjoy refreshing incident timelines.

Suggested next steps

Once your cluster is up and validated, you can improve it iteratively:

  • Add ingress controller and TLS
  • Install metrics-server and dashboards
  • Set up autoscaling (Cluster Autoscaler or Karpenter)
  • Implement CI/CD to deploy workloads automatically
  • Harden security and reduce exposed endpoints
  • Set up persistent storage and backup routines

FAQ: quick answers to the questions you’ll Google anyway

Do I need to use kubeadm?

Not necessarily. kubeadm is popular for self-managed clusters, but you can use other provisioning tools. The underlying concepts still apply: node prep, control plane init, CNI installation, and joining workers.

Is running 1 control plane node okay?

For development and learning, yes. For production, you should strongly consider a multi-node control plane with etcd quorum.

Why do pod network settings matter so much?

Because pods need IP addressing and routing rules that match your CNI configuration. If you pick inconsistent CIDRs, Kubernetes may appear “up” but networking will fail in subtle ways.

What’s the fastest path to a working cluster?

Use an established bootstrap guide for your chosen tool and CNI, then verify each stage: nodes ready, system pods running, test pod running, then storage/ingress.

Conclusion: you built a cluster, congratulations (and condolences to your future self)

Deploying a Kubernetes cluster on AWS EC2 is a rite of passage. You start with EC2 instances, you install Kubernetes components, you bring up the control plane, you join workers, and then you install a CNI so pods can exist in the same reality. After that, you validate everything and add essentials like storage and ingress.

The key is structured verification: don’t jump straight to deploying a complicated app before the basics are solid. If nodes are NotReady, Kubernetes will not care that your deployment YAML is “beautiful.” It will simply fail and look smug about it.

Follow the steps, sanity-check networking and IAM, and you’ll have a cluster that does what you want, not what it feels like.
