Fix: Monitoring Stack Sync & Resource Issues
Date: 2026-04-08
Context: Monitoring stack (Prometheus, Loki) stuck in Progressing/OutOfSync state in Argo CD
Source: jmartinez-homelab-gitops
Issues Encountered
- Argo CD Application stuck in Progressing state
- loki-chunks-cache pod stuck in Pending (0/2)
- kube-prometheus-stack-prometheus pod not deploying
- StatefulSet update forbidden error
Root Causes & Fixes
1. Missing kustomize.buildOptions for Helm Charts
Error: Kustomize failed to build with helmCharts
error: trouble configuring builtin HelmChartInflationGenerator
`: must specify --enable-helm
Root Cause: Argo CD must run kustomize build with the --enable-helm flag to process helmCharts entries in kustomization.yaml.
Fix: Initially tried adding kustomize.buildOptions to the Argo CD Application, but that field is not valid in the Application spec. The flag has to be enabled in the Argo CD configuration instead; the usual place is the kustomize.buildOptions key of the argocd-cm ConfigMap.
Resolution: Removed the invalid field from the Application. Neither ServerSideApply nor includeCRDs: true enables Helm inflation on its own, so the build only succeeds once --enable-helm is set on the Argo CD side.
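One way to set the flag for the whole Argo CD instance is a merge patch on the argocd-cm ConfigMap. This is a minimal sketch, assuming Argo CD runs in the argocd namespace and uses the default ConfigMap name; build_options_patch and the APPLY switch are hypothetical names introduced for illustration.

```shell
# Build the JSON merge patch that sets kustomize.buildOptions in argocd-cm.
# build_options_patch is a hypothetical helper, not part of any tool.
build_options_patch() {
  printf '{"data":{"kustomize.buildOptions":"%s"}}' "$1"
}

patch="$(build_options_patch "--enable-helm")"
echo "$patch"

# Touch the cluster only when explicitly requested (APPLY=1 is an assumption).
if [ "${APPLY:-0}" = "1" ]; then
  kubectl patch configmap argocd-cm -n argocd --type merge -p "$patch"
fi
```

In a GitOps repo the same key would more naturally live in the manifest that manages argocd-cm; the patch form is shown only because it is easy to try. For comparison, the Application-level field that was tried first: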
# This is INVALID - do not use
spec:
  source:
    kustomize:
      buildOptions: "--enable-helm"

2. Insufficient Memory for Pods
Error: Pods stuck in Pending with Insufficient memory
0/4 nodes are available: 4 Insufficient memory
Root Cause: Memory requests/limits too high for homelab environment
Fix: Reduced resource allocations in prometheus-values.yaml and loki-values.yaml
Before → After:
| Component | Memory Request | Memory Limit |
|---|---|---|
| Prometheus | 512Mi → 256Mi | 1Gi → 512Mi |
| Grafana | 128Mi → 64Mi | 256Mi → 128Mi |
| Alertmanager | 128Mi → 64Mi | 256Mi → 128Mi |
| Loki | 256Mi → 128Mi | 512Mi → 256Mi |
| kubeStateMetrics | 64Mi → 32Mi | — → 64Mi |
| node-exporter | 32Mi → 16Mi | — → 32Mi |
Total memory requests reduced: 1120Mi → 560Mi (about 1.09Gi → 0.55Gi), counting one replica of each component
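The request totals can be re-derived from the table. This is a minimal sketch in plain POSIX shell; sum_mi is a hypothetical helper, and the figures count one replica per component. node-exporter runs as a DaemonSet, so its real share scales with the number of nodes.

```shell
# Sum a list of memory requests given in Mi.
sum_mi() {
  total=0
  for value in "$@"; do
    total=$((total + value))
  done
  echo "$total"
}

# Request column of the table: Prometheus, Grafana, Alertmanager,
# Loki, kubeStateMetrics, node-exporter.
before="$(sum_mi 512 128 128 256 64 32)"
after="$(sum_mi 256 64 64 128 32 16)"

echo "requests before: ${before}Mi, after: ${after}Mi"
```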
3. Loki chunks-cache & results-cache Stuck
Error: loki-chunks-cache-0 stuck in Pending, Argo CD waiting for healthy state
Root Cause: Loki Helm chart deploys chunks-cache and results-cache StatefulSets by default, even in SingleBinary mode. These use additional memory.
Fix: Explicitly disable in loki-values.yaml:
chunksCache:
  enabled: false
resultsCache:
  enabled: false

Important: Delete existing StatefulSets after changing values:
kubectl delete statefulset loki-chunks-cache -n monitoring
kubectl delete statefulset loki-results-cache -n monitoring

4. Prometheus PVC Missing accessModes
Error: Prometheus pod not deploying, PVC issues
Fix: Added accessModes to volumeClaimTemplate in prometheus-values.yaml:
prometheus:
  prometheusSpec:
    storageSpec:
      volumeClaimTemplate:
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 2Gi

Also added podAntiAffinity: "soft" for better scheduling.
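To confirm that the claim actually binds after the change, the PVC list can be filtered for anything that is not Bound. This is a minimal sketch; unbound_pvcs and the RUN_KUBECTL switch are hypothetical names, and the filter assumes the default column layout of kubectl get pvc --no-headers (NAME first, STATUS second).

```shell
# Print the names of PVCs whose STATUS column is not "Bound".
# Reads the output of `kubectl get pvc --no-headers` on stdin.
unbound_pvcs() {
  awk 'NF >= 2 && $2 != "Bound" { print $1 }'
}

# Query the cluster only when explicitly requested (RUN_KUBECTL=1 is an assumption).
if [ "${RUN_KUBECTL:-0}" = "1" ]; then
  kubectl get pvc -n monitoring --no-headers | unbound_pvcs
fi
```

An empty result means every claim in the namespace is bound; any name printed is worth a kubectl describe pvc.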
5. StatefulSet Update Forbidden
Error:
StatefulSet.apps "loki" is invalid: spec: Forbidden: updates to statefulset spec
for fields other than 'replicas', 'ordinals', 'template', 'updateStrategy',
'revisionHistoryLimit', 'persistentVolumeClaimRetentionPolicy' and
'minReadySeconds' are forbidden
Root Cause: StatefulSets have immutable fields. Changing storage size or other non-updatable fields requires recreation.
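Before deleting a StatefulSet it is worth confirming what will happen to its volume claims. This is a minimal sketch; describe_pvc_policy and the RUN_KUBECTL switch are hypothetical names, and an empty value is treated as Retain, which is the Kubernetes default for whenDeleted.

```shell
# Explain what a whenDeleted policy value means for the PVCs.
# An empty argument is treated as the Kubernetes default, Retain.
describe_pvc_policy() {
  case "${1:-Retain}" in
    Delete) echo "PVCs are removed with the StatefulSet" ;;
    Retain) echo "PVCs are kept and must be deleted manually" ;;
    *)      echo "unknown policy: $1" ;;
  esac
}

# Query the cluster only when explicitly requested (RUN_KUBECTL=1 is an assumption).
if [ "${RUN_KUBECTL:-0}" = "1" ]; then
  policy="$(kubectl get statefulset loki -n monitoring \
    -o jsonpath='{.spec.persistentVolumeClaimRetentionPolicy.whenDeleted}')"
  describe_pvc_policy "$policy"
fi
```

With Retain, the old claim survives the delete and would have to be removed by hand before a claim with a different size can be created.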
Fix: Delete and recreate the StatefulSet:
kubectl delete statefulset loki -n monitoring
# PVC is auto-deleted due to persistentVolumeClaimRetentionPolicy.whenDeleted: Delete
# Argo CD will recreate with new configuration

Argo CD Sync Stuck on Old Revision
Error: Sync operation stuck on old commit revision
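Fix: Delete and recreate the Argo CD Application (if it carries the resources-finalizer.argocd.argoproj.io finalizer, the deletion cascades to the resources it manages):
kubectl delete application monitoring -n argocd
kubectl apply -f bootstrap/argocd/apps/monitoring-app.yaml

Diagnostic Commands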
# Check Argo CD application status
kubectl get application monitoring -n argocd
# Check sync revision
kubectl get application monitoring -n argocd -o jsonpath='{.status.sync.revision}'
# Check operation state
kubectl get application monitoring -n argocd -o jsonpath='{.status.operationState}'
# Check pod events
kubectl describe pod <pod-name> -n monitoring
# Check PVC status
kubectl get pvc -n monitoring
# Check StatefulSets
kubectl get statefulset -n monitoring
# Clear a stuck operation state so that a new sync can start
kubectl patch application monitoring -n argocd --type merge -p '{"status":{"operationState":null}}'

Final Working Configuration
prometheus-values.yaml
prometheus:
  prometheusSpec:
    resources:
      requests:
        memory: 256Mi
        cpu: 100m
      limits:
        memory: 512Mi
        cpu: 250m
    retention: 7d
    storageSpec:
      volumeClaimTemplate:
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 2Gi
    podAntiAffinity: "soft"
grafana:
  resources:
    requests:
      memory: 64Mi
      cpu: 50m
    limits:
      memory: 128Mi
alertmanager:
  alertmanagerSpec:
    resources:
      requests:
        memory: 64Mi
      limits:
        memory: 128Mi
kubeStateMetrics:
  resources:
    requests:
      memory: 32Mi
      cpu: 10m
    limits:
      memory: 64Mi
      cpu: 50m
prometheus-node-exporter:
  resources:
    requests:
      memory: 16Mi
      cpu: 10m
    limits:
      memory: 32Mi
      cpu: 50m

loki-values.yaml
deploymentMode: SingleBinary
loki:
  useTestSchema: true
  commonConfig:
    replication_factor: 1
  storage:
    type: filesystem
singleBinary:
  replicas: 1
  resources:
    requests:
      memory: 128Mi
      cpu: 50m
    limits:
      memory: 256Mi
  persistence:
    size: 2Gi
monitoring:
  selfMonitoring:
    grafanaAgent:
      installOperator: false
gateway:
  enabled: false
read:
  replicas: 0
write:
  replicas: 0
backend:
  replicas: 0
chunksCache:
  enabled: false
resultsCache:
  enabled: false

Lessons Learned
- Most StatefulSet spec fields are immutable: storage size and other volumeClaimTemplates settings cannot be updated in place. Delete and recreate the StatefulSet.
- Argo CD sync can get stuck: delete and recreate the Application to force a fresh sync.
- Memory is precious in a homelab: start with minimal resource requests and increase as needed.
- Disable unused components: Loki's cache components consume memory even when they are not needed.
- Check PVC accessModes: always specify ReadWriteOnce for single-node PVCs.