Autoscaling on Google Cloud
The MonkeysLegion stack runs great on three GCP primitives—pick the one that matches your traffic profile and ops comfort level.
Option | When to choose | Highlights |
---|---|---|
Cloud Run | HTTP-only apps, requests within the 60-minute timeout | per-request billing, zero-to-N autoscale, HTTPS out of the box |
Managed Instance Group (MIG) | Stateful uploads, WebSockets, long-lived workers | VM-level control, custom machine types, autoscaling on CPU or load-balancer utilization |
Google Kubernetes Engine (GKE) | Multiple services, background queues, complex networking | HorizontalPodAutoscaler, service mesh, spot node pools |
1 · Cloud Run (simplest)
```dockerfile
# Official PHP image (the gcr.io/google-appengine PHP images are legacy App Engine Flex runtimes).
FROM php:8.2-cli
# Composer binary plus git/unzip, which Composer needs to fetch dist packages.
COPY --from=composer:2 /usr/bin/composer /usr/bin/composer
RUN apt-get update && apt-get install -y --no-install-recommends git unzip && rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY . /app
RUN composer install --no-dev --optimize-autoloader \
 && php vendor/bin/ml config:cache \
 && php vendor/bin/ml view:clear
# PHP's built-in server suits light traffic; use php-fpm + nginx for heavier loads.
CMD ["php", "-S", "0.0.0.0:8080", "-t", "public"]
```
```bash
# "jwt-key" is an assumed Secret Manager secret holding jwt.pem (created in §5);
# inlining the PEM via --set-env-vars would mangle its newlines.
gcloud run deploy myapp \
  --region=us-central1 \
  --source=. \
  --allow-unauthenticated \
  --min-instances=0 \
  --max-instances=20 \
  --add-cloudsql-instances=myproj:us-central1:mydb \
  --set-env-vars "APP_ENV=prod,DB_HOST=/cloudsql/myproj:us-central1:mydb" \
  --set-secrets "/var/secrets/jwt.pem=jwt-key:latest"
```
Scales to zero when idle and up to 20 containers (adjustable).
Concurrency defaults to 80; tune it with --concurrency=16 for predictable memory use, as in the sketch below.
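A minimal tuning pass on an already-deployed service might look like this (the values are illustrative, not recommendations):

```bash
# Fewer concurrent requests per container keeps PHP memory use predictable;
# a longer timeout gives slow endpoints headroom.
gcloud run services update myapp \
  --region=us-central1 \
  --concurrency=16 \
  --timeout=300
```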
2 · Compute Engine MIG + Load Balancer
Terraform snippet
resource "google_compute_instance_template" "ml" {
name = "ml-template"
machine_type= "e2-micro"
metadata_startup_script = <<-SH
#!/bin/bash
cd /var/www/myapp
git pull
composer install --no-dev --optimize-autoloader
php vendor/bin/ml migrate --env=prod
systemctl restart php-fpm nginx
SH
disk {
source_image = "projects/debian-cloud/global/images/family/debian-12"
auto_delete = true
boot = true
}
}
resource "google_compute_region_instance_group_manager" "ml" {
name = "ml-group"
base_instance_name = "ml"
region = "us-central1"
versions {
instance_template = google_compute_instance_template.ml.id
}
target_size = 2
auto_healing_policies {
health_check = google_compute_health_check.app.id
initial_delay_sec = 90
}
}
resource "google_compute_autoscaler" "ml" {
name = "ml-auto"
region = "us-central1"
target = google_compute_region_instance_group_manager.ml.id
autoscaling_policy {
max_replicas = 10
min_replicas = 2
cpu_utilization {
target = 0.6
}
}
}
Horizontal scaling kicks in when average CPU utilization exceeds 60 %.
Put a Global HTTP(S) Load Balancer in front; the health check referenced above hits /healthz (a sketch follows).
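The MIG references google_compute_health_check.app, which the snippet doesn't define; a minimal sketch, assuming /healthz is served over plain HTTP on port 80:

```hcl
resource "google_compute_health_check" "app" {
  name                = "ml-hc"
  check_interval_sec  = 10
  timeout_sec         = 5
  healthy_threshold   = 2
  unhealthy_threshold = 3

  http_health_check {
    request_path = "/healthz"
    port         = 80
  }
}
```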
3 · GKE with HorizontalPodAutoscaler
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ml
  template:
    metadata:
      labels:
        app: ml
    spec:
      containers:
        - name: php
          image: gcr.io/myproj/ml:1.4.0
          ports: [{ containerPort: 8080 }]
          resources:
            requests: { cpu: "100m", memory: "128Mi" }
            limits: { cpu: "500m", memory: "512Mi" }
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ml-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ml
  minReplicas: 2
  maxReplicas: 15
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
```
Scales pods between 2 and 15 based on CPU.
Add Prometheus metrics plus a custom-metrics adapter if you prefer QPS-based scaling (e.g. a requests_per_second metric).
Run the Cloud SQL Auth Proxy as a sidecar container to reach managed databases; a sketch follows this list.
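A sidecar sketch, assuming MySQL on port 3306 and a v2 proxy image (the tag is an assumption; pin whatever you've tested). It slots into the Deployment's containers list next to the php container, which then connects to 127.0.0.1:3306; the pod still needs IAM access to the instance, e.g. via Workload Identity:

```yaml
# Additional entry under spec.template.spec.containers in the Deployment above.
- name: cloud-sql-proxy
  image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.8.0
  args:
    - "--port=3306"
    - "myproj:us-central1:mydb"
  resources:
    requests: { cpu: "50m", memory: "64Mi" }
```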
4 · Sticky uploads & cache
If you store user-uploaded files or a disk cache, mount Filestore or Cloud Storage FUSE; both work with MIGs and GKE (see the GKE sketch below). Cloud Run's filesystem is in-memory and ephemeral, so persist uploads to Cloud Storage.
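On GKE, the Cloud Storage FUSE CSI driver can mount a bucket into the pod; a minimal sketch, assuming the driver is enabled on the cluster, Workload Identity grants bucket access, and the hypothetical bucket my-uploads-bucket exists:

```yaml
# Pod template fragment; the annotation asks GKE to inject the gcsfuse sidecar.
metadata:
  annotations:
    gke-gcsfuse/volumes: "true"
spec:
  containers:
    - name: php
      volumeMounts:
        - name: uploads
          mountPath: /var/www/uploads
  volumes:
    - name: uploads
      csi:
        driver: gcsfuse.csi.storage.gke.io
        volumeAttributes:
          bucketName: my-uploads-bucket
```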
5 · Secrets and config
Store runtime secrets in Secret Manager; expose them via --set-secrets (Cloud Run) or Kubernetes Secrets / the Secret Manager CSI driver (GKE).
Keep .mlc config in your repo and override sensitive keys with env vars (env("DB_PASS")).
The JWT private key can live in Secret Manager; mount it at /var/secrets/jwt.pem (commands below).
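Wiring the JWT key end to end might look like this; the secret name jwt-key is an assumption, chosen to match the deploy command in §1:

```bash
# Create the secret from the local PEM file.
gcloud secrets create jwt-key --data-file=jwt.pem

# Mount the latest version as a read-only file inside the Cloud Run container.
gcloud run services update myapp \
  --region=us-central1 \
  --set-secrets "/var/secrets/jwt.pem=jwt-key:latest"
```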
6 · Zero-downtime deploys
Platform | Strategy |
---|---|
Cloud Run | Default gradual rollout (traffic split), or --no-traffic + --revision-suffix=$SHA with manual promotion (see below) |
MIG | The instance group manager supports rolling updates (update_policy with minimal_action = "REPLACE") |
GKE | kubectl rollout restart deployment/ml triggers the rolling strategy (tune maxUnavailable / maxSurge) |
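On Cloud Run, a manual canary might look like this sketch ($SHA stands in for your build identifier):

```bash
# Deploy a new revision that receives no traffic yet.
gcloud run deploy myapp --source=. --no-traffic --revision-suffix="$SHA"

# Send 10 % of traffic to it, then promote once it looks healthy.
gcloud run services update-traffic myapp --to-revisions="myapp-$SHA=10"
gcloud run services update-traffic myapp --to-revisions="myapp-$SHA=100"
```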
7 · Observability
Signal | GCP product | MonkeysLegion tie-in |
---|---|---|
Metrics | Cloud Monitoring (Prometheus integration) | /metrics endpoint via Telemetry package |
Logs | Cloud Logging | Monolog GoogleCloudLoggingHandler |
Traces | Cloud Trace | OpenTelemetry exporter (community recipe) |
8 · Cost tips
Cloud Run: keep the default "CPU only allocated during requests" and min-instances = 0 for side projects.
MIG: prefer e2-micro for steady low traffic; add a Spot (preemptible) instance group for bursty tasks.
GKE: use Autopilot, or enable the cluster autoscaler plus Spot node pools (sketch below).
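Creating an autoscaled Spot node pool on a Standard cluster might look like this (cluster and pool names are placeholders):

```bash
# Spot nodes are cheap but reclaimable; cap the pool and let it scale to zero.
gcloud container node-pools create spot-pool \
  --cluster=my-cluster \
  --region=us-central1 \
  --spot \
  --enable-autoscaling \
  --min-nodes=0 \
  --max-nodes=5
```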