Kubernetes Learning Path
Homelab cluster built for learning Kubernetes and showcasing skills to employers.
Progress Overview
| Phase | Status | Tasks Completed |
|---|---|---|
| Phase 0: Foundation | ✅ Complete | 7/7 |
| Phase 1: Observability | ✅ Complete | 3/3 |
| Phase 2: Security | ⏳ Pending | 0/4 |
| Phase 3: Scaling & Resilience | ⏳ Pending | 0/3 |
| Phase 4: Storage | ⏳ Pending | 0/2 |
| Phase 5: Networking | ⏳ Pending | 0/2 |
| Phase 6: Advanced GitOps | ⏳ Pending | 0/3 |
| Phase 7: Troubleshooting | ⏳ Pending | 0/3 |
| Documentation | ⏳ Pending | 0/1 |
Overall: 10/28 tasks completed (36%)
Phase 0: Foundation ✅
| # | Task | Description | Guide |
|---|---|---|---|
| 0.1 | Set up k3s cluster | 4-node cluster: 1 control-plane + 3 worker agents on VMware Workstation | — |
| 0.2 | Bootstrap Argo CD | Install via Kustomize from bootstrap/argocd/ | phase-0-argocd-setup |
| 0.3 | Deploy Online Boutique | Google’s microservices demo (12 services) via Kustomize base + overlay | phase-0-argocd-setup |
| 0.4 | App of Apps pattern | Root Application discovers child Applications for GitOps scalability | phase-0-argocd-setup |
| 0.5 | Traefik Ingress setup | Host-based routing for boutique and Argo CD UI | phase-0-argocd-setup |
| 0.6 | Argo CD TLS workaround | IngressRoute + ServersTransport for HTTPS backends | phase-0-argocd-setup |
| 0.7 | Project structure | Clean bootstrap/, apps/, infrastructure/ layout | phase-0-argocd-setup |
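
The App of Apps pattern from task 0.4 can be sketched as a single root Application that points at the apps/ directory, so that every child Application manifest committed there is picked up automatically. This is an illustrative sketch, not the repository's actual manifest: the Application name, the repository URL, and the assumption that Argo CD runs in the `argocd` namespace are all placeholders.

```yaml
# Root "App of Apps" Application (illustrative sketch).
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: root                 # hypothetical name
  namespace: argocd          # assumes Argo CD is installed in "argocd"
spec:
  project: default
  source:
    repoURL: https://github.com/example/homelab.git   # placeholder URL
    targetRevision: HEAD
    path: apps               # directory holding the child Application manifests
  destination:
    server: https://kubernetes.default.svc            # the in-cluster API server
    namespace: argocd        # child Applications are created next to Argo CD
  syncPolicy:
    automated:
      prune: true            # remove child apps that are deleted from Git
      selfHeal: true         # revert manual drift back to the Git state
```

If the child manifests live in subdirectories of apps/, the source would also need `directory.recurse: true`, since a plain directory source is not scanned recursively by default.
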
Phase 1: Observability ✅
| # | Task | Why It Matters | Guide |
|---|---|---|---|
| 1.1 | Deploy Prometheus + Grafana monitoring stack | Most requested K8s skill; demonstrates metrics collection and dashboarding | phase-1-observability |
| 1.2 | Deploy Loki for centralized logging | Shows log aggregation across microservices | phase-1-observability |
| 1.3 | Create Grafana dashboards for Online Boutique | Visual proof of monitoring competency for interviews | phase-1-observability |
Completed: Reduced the monitoring stack's memory footprint from ~1.18Gi to ~0.56Gi to fit the homelab. Fixed sync issues with Argo CD.
Phase 2: Security (High Priority)
| # | Task | Why It Matters | Status |
|---|---|---|---|
| 2.1 | Implement Pod Security Standards (Restricted) on boutique namespace | Shows security-first mindset | Pending |
| 2.2 | Create RBAC roles and ServiceAccounts per microservice | Core K8s security concept employers test | Pending |
| 2.3 | Implement Network Policies to restrict inter-service traffic | Demonstrates zero-trust networking | Pending |
| 2.4 | Set up Secrets management (External Secrets or Sealed Secrets) | Real-world secrets handling | Pending |
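
Tasks 2.1 and 2.3 might start from something like the following: Pod Security Standards are applied through namespace labels, and a default-deny NetworkPolicy gives a zero-trust baseline to which per-service allow rules are then added. This is a minimal sketch; the namespace name `boutique` is an assumption based on the task description.

```yaml
# Pod Security Standards are enforced via labels on the namespace.
apiVersion: v1
kind: Namespace
metadata:
  name: boutique             # assumed namespace name
  labels:
    pod-security.kubernetes.io/enforce: restricted   # reject non-compliant pods
    pod-security.kubernetes.io/warn: restricted      # warn on kubectl apply
    pod-security.kubernetes.io/audit: restricted     # record in the audit log
---
# Default-deny: selects every pod in the namespace and allows no ingress.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress # hypothetical name
  namespace: boutique
spec:
  podSelector: {}            # empty selector matches all pods
  policyTypes:
    - Ingress                # no ingress rules listed, so all ingress is denied
```

A cautious rollout would set only the `warn` and `audit` labels first, fix the reported violations, and add `enforce` afterwards. With the default-deny policy in place, each service-to-service path needs its own explicit allow policy.
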
Phase 3: Scaling & Resilience (High Priority)
| # | Task | Why It Matters | Status |
|---|---|---|---|
| 3.1 | Configure HPA on frontend and resource-intensive services | Autoscaling is a key K8s feature | Pending |
| 3.2 | Create PodDisruptionBudgets for critical services | Shows production readiness awareness | Pending |
| 3.3 | Load test frontend and observe HPA in action | Practical demonstration of scaling | Pending |
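
As a rough sketch of tasks 3.1 and 3.2, an HPA and a PodDisruptionBudget for the frontend could look like this. The namespace, the `app: frontend` label, the replica bounds, and the 70% CPU target are assumptions chosen for illustration.

```yaml
# Scale the frontend Deployment on average CPU utilization.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: frontend
  namespace: boutique        # assumed namespace
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: frontend
  minReplicas: 2             # illustrative bounds
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # percent of the containers' CPU requests
---
# Keep at least one frontend pod running during voluntary disruptions.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: frontend
  namespace: boutique
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: frontend          # assumed pod label
```

Utilization targets are computed against CPU requests, so the frontend containers need `resources.requests.cpu` set and a metrics source such as metrics-server must be running for the HPA to act.
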
Phase 4: Storage (High Priority)
| # | Task | Why It Matters | Status |
|---|---|---|---|
| 4.1 | Deploy a stateful app (Redis with persistence or PostgreSQL) | Stateful workloads are common in real jobs | Pending |
| 4.2 | Create StorageClass and demonstrate PV/PVC lifecycle | Storage fundamentals | Pending |
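
For the PV/PVC lifecycle in task 4.2, a claim against k3s's bundled `local-path` StorageClass is a simple starting point before creating a custom StorageClass. The claim name, namespace, and size below are hypothetical.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: redis-data           # hypothetical claim name
  namespace: boutique        # assumed namespace
spec:
  accessModes:
    - ReadWriteOnce          # mounted read-write by a single node
  storageClassName: local-path   # StorageClass shipped with k3s
  resources:
    requests:
      storage: 1Gi           # illustrative size
```

The `local-path` class uses `WaitForFirstConsumer` volume binding, so the claim is expected to stay `Pending` until a pod that mounts it is scheduled. Watching that transition is a useful part of the lifecycle exercise.
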
Phase 5: Networking
| # | Task | Why It Matters | Status |
|---|---|---|---|
| 5.1 | Explore CoreDNS and service discovery | Understanding K8s networking internals | Pending |
| 5.2 | Deploy a second app with ingress routing (path-based or host-based) | Multi-tenant ingress patterns | Pending |
Phase 6: Advanced GitOps
| # | Task | Why It Matters | Status |
|---|---|---|---|
| 6.1 | Add Helm-based app to Argo CD | Helm is industry standard | Pending |
| 6.2 | Implement Argo CD ApplicationSet for multi-env (dev/staging) | Shows advanced Argo CD patterns | Pending |
| 6.3 | Configure Argo CD notifications (Slack/webhook) | CI/CD integration maturity | Pending |
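
Task 6.2 could be approached with an ApplicationSet list generator, which stamps out one Application per environment from a shared template. This is a sketch under assumptions: the repository URL, the overlay paths, and the per-environment namespaces are placeholders, not the project's real layout.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: boutique-envs        # hypothetical name
  namespace: argocd          # assumes Argo CD is installed in "argocd"
spec:
  generators:
    - list:
        elements:            # one Application is generated per element
          - env: dev
          - env: staging
  template:
    metadata:
      name: 'boutique-{{env}}'
    spec:
      project: default
      source:
        repoURL: https://github.com/example/homelab.git   # placeholder URL
        targetRevision: HEAD
        path: 'apps/boutique/overlays/{{env}}'            # assumed overlay layout
      destination:
        server: https://kubernetes.default.svc
        namespace: 'boutique-{{env}}'
      syncPolicy:
        syncOptions:
          - CreateNamespace=true   # create the target namespace if missing
```

Adding another environment then only requires a new list element plus a matching Kustomize overlay directory.
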
Phase 7: Troubleshooting (High Priority)
| # | Task | Why It Matters | Status |
|---|---|---|---|
| 7.1 | Simulate pod failures and practice recovery | Troubleshooting is tested in interviews | Pending |
| 7.2 | Practice OOMKilled, CrashLoopBackOff, ImagePullBackOff debugging | Common failure modes | Pending |
| 7.3 | Simulate node failure and observe pod rescheduling | Resilience and scheduling concepts | Pending |
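
One way to practice task 7.2 is a pod that deliberately allocates more memory than its limit allows, following the pattern used in the Kubernetes documentation on memory resources. The pod name and namespace are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: oom-demo             # hypothetical name
  namespace: default
spec:
  containers:
    - name: stress
      image: polinux/stress  # small image that ships the "stress" tool
      command: ["stress"]
      # Try to allocate 250M, well above the 100Mi limit below.
      args: ["--vm", "1", "--vm-bytes", "250M", "--vm-hang", "1"]
      resources:
        requests:
          memory: 50Mi
        limits:
          memory: 100Mi      # exceeding this gets the container OOM-killed
```

After applying it, `kubectl describe pod oom-demo` should report the container's last state as terminated with reason `OOMKilled`. Because the default restart policy is `Always`, repeated kills should move the pod into `CrashLoopBackOff`, which exercises a second failure mode from the same manifest.
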
Related: see Troubleshooting & Fixes for real issues encountered so far.
Documentation
| # | Task | Why It Matters | Status |
|---|---|---|---|
| 8.1 | Document all phases in docs/ for portfolio | Showcase your work to employers | Pending |
Key Learnings So Far
Phase 0
- k3s as a lightweight Kubernetes distribution for the homelab
- Argo CD App of Apps pattern for GitOps at scale
- Traefik Ingress with host-based routing
- Kustomize overlays for environment-specific configs
Phase 1
- kube-prometheus-stack for metrics + dashboards
- Loki single-binary mode for resource-constrained environments
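- Most StatefulSet spec fields (such as volumeClaimTemplates) are immutable; changing them means deleting and recreating the StatefulSet
- Argo CD sync can get stuck; deleting and recreating the Application forces a fresh sync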
- Disable unused Helm components to save memory
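
The last point can be illustrated with a values override for kube-prometheus-stack. Which components were actually disabled in this cluster is not recorded here, so the selection, the retention period, and the resource figures below are assumptions meant only to show the shape of such a file.

```yaml
# values.yaml overrides for kube-prometheus-stack (illustrative selection)
alertmanager:
  enabled: false             # skip Alertmanager if no alert routing is needed yet
kubeEtcd:
  enabled: false             # scrape targets for control-plane components that
kubeControllerManager:       # k3s does not expose as separate pods
  enabled: false
kubeScheduler:
  enabled: false
kubeProxy:
  enabled: false
prometheus:
  prometheusSpec:
    retention: 2d            # short retention keeps memory and disk use low
    resources:
      requests:
        memory: 256Mi        # illustrative figures, not measured values
      limits:
        memory: 512Mi
```
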