Phase 1 - Observability with Prometheus, Grafana & Loki

Series: Kubernetes Homelab on VMware Workstation
Prerequisites: Phase 0 - Argo CD & GitOps complete
Source Code: jmartinez-homelab-gitops

What We’re Building

By the end of this guide, you will have:

  • Prometheus collecting metrics from nodes, pods, and Kubernetes objects
  • Grafana with dashboards for cluster and application monitoring
  • Loki + Promtail for centralized log aggregation
  • All accessible via Traefik ingress at grafana.lab.local

Why Observability

You can’t manage what you can’t measure. In Kubernetes, you need visibility into:

  • Metrics — CPU, memory, network, request rates, error rates
  • Logs — Application output, system events, errors
  • Dashboards — Visual representation of cluster health

Prometheus + Grafana is the de facto standard for Kubernetes monitoring. Loki provides logging without the resource overhead of the ELK stack — ideal for a homelab.

Before You Start

Verify Phase 0 is complete:

# Argo CD running
kubectl get pods -n argocd
 
# Online Boutique deployed and synced
kubectl get application -n argocd
# NAME              SYNC STATUS   HEALTH STATUS
# online-boutique   Synced        Healthy

Step 0: Resize VMs

The monitoring stack requires more resources than the default VM allocations. If you’re running VMware Workstation on a laptop with 40 GB of RAM, allocate:

VM           Memory   Rationale
k3s-server   8 GB     Control plane + Argo CD + scheduling
k3s-agent-1  4 GB     Monitoring stack (Prometheus, Grafana)
k3s-agent-2  4 GB     Application workloads
k3s-agent-3  4 GB     Stateful workloads + overflow
Host         ~20 GB   VMware + OS overhead

Total VM allocation: 20 GB, leaving ~20 GB for the host OS and VMware overhead.

How to Resize

  1. Shut down the VM: sudo shutdown -h now
  2. VMware Workstation → right-click VM → Settings → Hardware → Memory
  3. Adjust to recommended value
  4. Start the VM
  5. Verify: kubectl describe node <node-name> | grep -A 5 Capacity

Step 1: Install Helm

Helm is the package manager for Kubernetes — similar to apt or brew, but for cluster applications.

curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
helm version

Step 2: Add Helm Repos

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

Step 3: Deploy Prometheus + Grafana

The kube-prometheus-stack Helm chart bundles Prometheus, Grafana, Alertmanager, kube-state-metrics, and node-exporter in a single install.

kubectl create namespace monitoring
 
helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --set prometheus.prometheusSpec.resources.requests.memory=512Mi \
  --set prometheus.prometheusSpec.resources.limits.memory=1Gi \
  --set prometheus.prometheusSpec.resources.requests.cpu=200m \
  --set prometheus.prometheusSpec.resources.limits.cpu=500m \
  --set prometheus.prometheusSpec.retention=7d \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=5Gi \
  --set grafana.resources.requests.memory=128Mi \
  --set grafana.resources.limits.memory=256Mi \
  --set grafana.resources.requests.cpu=100m \
  --set alertmanager.alertmanagerSpec.resources.requests.memory=128Mi \
  --set alertmanager.alertmanagerSpec.resources.limits.memory=256Mi \
  --set kubeStateMetrics.resources.requests.memory=64Mi \
  --set prometheus-node-exporter.resources.requests.memory=32Mi

Resource limits are tuned for a homelab with ~18 GB total cluster RAM. Adjust if your setup differs.
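If the long chain of --set flags gets unwieldy, the same tuning can live in a values file instead. A sketch — the keys mirror the flags above, and monitoring-values.yaml is just an example file name:

```shell
# Capture the same overrides in a values file (monitoring-values.yaml is an arbitrary name)
cat > monitoring-values.yaml <<'EOF'
prometheus:
  prometheusSpec:
    retention: 7d
    resources:
      requests: {memory: 512Mi, cpu: 200m}
      limits: {memory: 1Gi, cpu: 500m}
    storageSpec:
      volumeClaimTemplate:
        spec:
          resources:
            requests:
              storage: 5Gi
grafana:
  resources:
    requests: {memory: 128Mi, cpu: 100m}
    limits: {memory: 256Mi}
alertmanager:
  alertmanagerSpec:
    resources:
      requests: {memory: 128Mi}
      limits: {memory: 256Mi}
kubeStateMetrics:
  resources:
    requests: {memory: 64Mi}
prometheus-node-exporter:
  resources:
    requests: {memory: 32Mi}
EOF

# Then install (or later upgrade) with:
#   helm upgrade --install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
#     --namespace monitoring -f monitoring-values.yaml
```

A values file also versions cleanly in Git, which pays off when you migrate the stack to Argo CD later.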

What this deploys:

Component           Purpose
Prometheus          Scrapes and stores time-series metrics
Grafana             Dashboarding and visualization
Alertmanager        Routes alerts to notification channels
kube-state-metrics  Exposes Kubernetes object states as metrics
node-exporter       DaemonSet collecting hardware/OS metrics from each node

Verify:

kubectl get pods -n monitoring
# All pods should be Running

Step 4: Access Grafana

Get the Admin Password

kubectl get secret -n monitoring kube-prometheus-stack-grafana \
  -o jsonpath="{.data.admin-password}" | base64 -d; echo

Default username: admin

Option A: Port-forward (quick)

kubectl port-forward -n monitoring svc/kube-prometheus-stack-grafana 3000:80
# Open http://localhost:3000

Option B: Ingress (permanent)

kubectl apply -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana-ingress
  namespace: monitoring
  annotations:
    traefik.ingress.kubernetes.io/router.entrypoints: web
spec:
  rules:
    - host: grafana.lab.local
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: kube-prometheus-stack-grafana
                port:
                  number: 80
EOF

Update your /etc/hosts file:

<NODE_IP> boutique.lab.local argocd.lab.local grafana.lab.local

Access at http://grafana.lab.local

Step 5: Deploy Loki + Promtail

Loki is a log aggregation system inspired by Prometheus: it indexes only labels rather than full log content, which keeps it far lighter than the ELK stack. Promtail is the agent that ships logs from each node to Loki.

# Loki — single-binary mode for small clusters
helm install loki grafana/loki \
  --namespace monitoring \
  --set deploymentMode=SingleBinary \
  --set loki.commonConfig.replication_factor=1 \
  --set loki.storage.type=filesystem \
  --set singleBinary.replicas=1 \
  --set singleBinary.resources.requests.memory=256Mi \
  --set singleBinary.resources.limits.memory=512Mi \
  --set singleBinary.resources.requests.cpu=100m \
  --set singleBinary.persistence.size=5Gi \
  --set monitoring.selfMonitoring.grafanaAgent.installOperator=false \
  --set gateway.enabled=false \
  --set read.replicas=0 \
  --set write.replicas=0 \
  --set backend.replicas=0
 
# Promtail — collects and ships logs
helm install promtail grafana/promtail \
  --namespace monitoring \
  --set config.clients[0].url=http://loki.monitoring.svc:3100/loki/api/v1/push \
  --set resources.requests.memory=64Mi \
  --set resources.limits.memory=128Mi

Step 6: Connect Loki to Grafana

kubectl apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: loki-datasource
  namespace: monitoring
  labels:
    grafana_datasource: "1"
data:
  loki-datasource.yaml: |
    apiVersion: 1
    datasources:
      - name: Loki
        type: loki
        access: proxy
        url: http://loki.monitoring.svc:3100
        isDefault: false
        editable: true
EOF

Grafana auto-discovers this ConfigMap and adds Loki as a data source.
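Once the data source is in place, you can run LogQL queries from Grafana’s Explore view. The same queries also work against Loki’s HTTP API directly — a sketch, assuming the port-forward from Step 8 (kubectl port-forward -n monitoring svc/loki 3100:3100) is active:

```shell
# List the label names Loki has indexed — a quick sanity check that logs are arriving
curl -s http://localhost:3100/loki/api/v1/labels

# Fetch a few recent log lines from the monitoring namespace (LogQL stream selector)
curl -s -G http://localhost:3100/loki/api/v1/query_range \
  --data-urlencode 'query={namespace="monitoring"}' \
  --data-urlencode 'limit=5'
```

If the labels call returns an empty list, Promtail likely isn’t shipping logs yet — see the troubleshooting table below.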

Step 7: Import Dashboards

In Grafana → Dashboards → Import, enter these community dashboard IDs:

Dashboard                      ID     Description
Kubernetes Cluster Monitoring  315    Node/pod CPU, memory, network overview
Node Exporter Full             1860   Detailed hardware and OS metrics
Kubernetes Pods                6417   Per-pod resource usage
Loki Logs                      13639  Log search and filtering interface

Step 8: Verify

# All monitoring pods running
kubectl get pods -n monitoring
 
# Prometheus scraping targets
kubectl port-forward -n monitoring svc/kube-prometheus-stack-prometheus 9090:9090
# Open http://localhost:9090/targets — all targets should be UP
 
# Loki is ready
kubectl port-forward -n monitoring svc/loki 3100:3100
curl -s http://localhost:3100/ready
 
# Resource usage
kubectl top nodes
kubectl top pods -n monitoring
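The targets page can also be checked from the command line — a sketch, assuming the Prometheus port-forward above is still running and jq is installed on your workstation:

```shell
# Count scrape targets by health — every line should read "up"
curl -s http://localhost:9090/api/v1/targets \
  | jq -r '.data.activeTargets[].health' | sort | uniq -c

# List any targets that are NOT up, with the error Prometheus recorded
curl -s http://localhost:9090/api/v1/targets \
  | jq -r '.data.activeTargets[] | select(.health != "up") | "\(.scrapeUrl) \(.lastError)"'
```

An empty result from the second command means all targets are healthy.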

Expected Resource Usage

Component               Memory          CPU
Prometheus              512 Mi – 1 Gi   200m – 500m
Grafana                 128 – 256 Mi    100m
Alertmanager            128 – 256 Mi
Loki                    256 – 512 Mi    100m
Promtail                64 – 128 Mi
node-exporter (3 pods)  ~96 Mi
kube-state-metrics      64 Mi
Total                   ~1.2 – 2.3 Gi   ~400m – 700m

Access Summary

Service          URL                        Credentials
Online Boutique  http://boutique.lab.local
Argo CD UI       http://argocd.lab.local    admin / bootstrap password
Grafana          http://grafana.lab.local   admin / Helm-generated password

Troubleshooting

Issue                        Fix
Grafana ingress returns 404  Verify the ingress exists: kubectl get ingress -n monitoring
Prometheus targets DOWN      Check pod logs (Prometheus runs as an operator-managed StatefulSet): kubectl logs -n monitoring prometheus-kube-prometheus-stack-prometheus-0
Loki not receiving logs      Verify Promtail: kubectl logs -n monitoring daemonset/promtail
Pods stuck in Pending        Node out of resources: kubectl describe pod <name> -n monitoring

GitOps Integration

To manage this stack via Argo CD instead of manual Helm commands, migrate to Kustomize overlays:

infrastructure/
└── monitoring/
    ├── base/
    │   ├── kustomization.yaml
    │   └── namespace.yaml
    └── overlays/lab/
        ├── kustomization.yaml
        ├── values.yaml         # Helm values
        └── ingress.yaml        # Grafana ingress

Then add an Argo CD Application in bootstrap/argocd/apps/ to manage it (see Phase 0 for the App of Apps pattern).
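As a starting point, an Argo CD Application that delegates the chart install to Argo CD might look like the sketch below — the chart version is a placeholder, and in practice you’d point helm values at your own GitOps repo (e.g. via a multi-source Application) rather than inline them:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: monitoring
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://prometheus-community.github.io/helm-charts
    chart: kube-prometheus-stack
    targetRevision: <chart-version>   # pin an explicit chart version
    helm:
      releaseName: kube-prometheus-stack
  destination:
    server: https://kubernetes.default.svc
    namespace: monitoring
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
```

With automated sync enabled, Argo CD will reconcile any drift between the chart and the cluster — the same behavior you saw with Online Boutique in Phase 0.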