Observability on Google Cloud: Logging, Monitoring & Error Reporting
Although Logging, Monitoring, and Error Reporting are all part of Google Cloud’s observability suite, each fills a distinct role and focuses on a different aspect of your application’s health and performance.
Overview
📋 Cloud Logging
This service is the diary of your system. It collects, stores, and analyzes log data and events from your Google Cloud services, your applications, and even your on-premises resources.
Logs are timestamped records of discrete events — a user login, a database transaction, or an application error.
Key Features
- Store and search logs — Browse individual entries, filter them by criteria (time, severity, resource type), and use the Logging Query Language (LQL) to find specific events. The Logs Explorer is the main interface for this.
- Debug and troubleshoot — Logs provide detailed information about what happened and why, which is essential for identifying the root cause of an incident.
- Audit and security — Logs capture administrative activities and data accesses, indispensable for audit trails and security monitoring.
- Create log-based metrics — Extract numerical data from logs to build custom metrics.
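As an illustrative sketch, a Logs Explorer query combining those filter criteria might look like the following (the service name is a placeholder; each line is implicitly ANDed):

```
resource.type = "cloud_run_revision"
resource.labels.service_name = "payment"
severity >= ERROR
timestamp >= "2024-06-01T00:00:00Z"
```

The same filter syntax is reused throughout Cloud Logging: in the Logs Explorer, in log-based metric definitions, and in Log Sinks.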
📊 Cloud Monitoring
This service provides continuous observation of your system’s health and performance over time. It collects metrics — numerical data measured at regular intervals, such as CPU usage, request latency, or network traffic — from your Google Cloud resources and applications.
Key Features
- Visualize performance — Build custom dashboards with charts to track key metrics and identify trends or anomalies. The Metrics Explorer is the dedicated tool for analyzing and graphing this data.
- Detect problems — By configuring alerts, you are notified as soon as a metric exceeds a predefined threshold (e.g. CPU usage above 80% for a certain duration).
- Understand system health — Get an overview of your infrastructure and application performance to proactively identify potential issues.
- Uptime checks and synthetic monitoring — These features verify the availability and performance of your services from multiple geographic regions.
🚨 Cloud Error Reporting
This service specializes in identifying and aggregating application errors. It automatically analyzes logs (often those ingested by Cloud Logging) to detect exceptions, and groups similar errors together.
Key Features
- Prioritize errors — Highlights new errors or those occurring at high frequency, so you can focus your effort on the most impactful issues.
- Aggregate and analyze crashes — Counts, analyzes, and groups crashes from your cloud services in production, providing information on the number of occurrences, affected versions, and detailed stack traces.
- Receive notifications — Configure alerts to be notified as soon as new error types appear.
- Speed up debugging — By grouping errors and providing context, it helps developers quickly understand and fix application failures.
Alerts
An alert is an automated rule that monitors a condition and triggers a notification when that condition is met. In the context of Cloud Error Reporting, there are two types:
- New error alert — triggered as soon as a previously unseen error group is detected. This is Error Reporting’s native alert type, which can be enabled directly from the console without advanced configuration.
- Error frequency alert — triggered when the number of occurrences of an error exceeds a threshold over a given period (e.g. more than 50 errors in 5 minutes). This type relies on a log-based metric routed to Cloud Monitoring (see the Structured Logs & Metrics section below).
An alerting policy in Cloud Monitoring consists of three elements:
- A condition — the metric, comparison, threshold, and duration that define when the policy fires
- One or more notification channels — where the alert is delivered (email, SMS, PagerDuty, Pub/Sub, etc.)
- Documentation — free-form text included in every notification, ideally containing a link to a runbook
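As an illustrative sketch, the three elements of an alerting policy (condition, notification channels, documentation) map directly onto the AlertPolicy JSON that gcloud accepts, e.g. via gcloud alpha monitoring policies create --policy-from-file=policy.json. All names below are placeholders, and the metric assumed here is a hypothetical log-based counter metric:

```json
{
  "displayName": "payment errors > 50 in 5 min",
  "combiner": "OR",
  "conditions": [{
    "displayName": "error count above threshold",
    "conditionThreshold": {
      "filter": "metric.type=\"logging.googleapis.com/user/payment-errors\" AND resource.type=\"cloud_run_revision\"",
      "comparison": "COMPARISON_GT",
      "thresholdValue": 50,
      "duration": "300s",
      "aggregations": [{
        "alignmentPeriod": "300s",
        "perSeriesAligner": "ALIGN_SUM"
      }]
    }
  }],
  "notificationChannels": ["projects/MY_PROJECT/notificationChannels/CHANNEL_ID"],
  "documentation": {
    "content": "Runbook: https://wiki.example.com/runbooks/payment-errors",
    "mimeType": "text/markdown"
  }
}
```

Note how the documentation block carries the runbook link, matching the best practice below of pairing every alert with a procedure to follow.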
Best Practices
A well-configured alert answers a precise, actionable question. A few best practices:
- Alert on new things, not on known ones — a known error that is already being tracked doesn’t need to generate an alert on every occurrence. Use new group alerts to detect regressions.
- Accompany every alert with a runbook — the documentation field of an alerting policy should contain a link to the procedure to follow, not just a generic message.
- Calibrate thresholds — an alert that fires too often gets ignored (alert fatigue). Start with wide thresholds, then refine based on your service’s actual error volume.
- Distinguish urgency from information — a critical production error deserves a synchronous channel (PagerDuty, SMS); a slow degradation can make do with an email or a team chat message.
Structured Logs & Metrics
This is where the three services stop being parallel tools and form a coherent pipeline.
Structured Logs
Structured logs are simply logs written in JSON rather than plain text. Instead of:
ERROR: request failed for user 99, latency 320ms
You emit:
{
  "severity": "ERROR",
  "message": "request failed",
  "userId": 99,
  "latencyMs": 320
}
The essential difference is that Cloud Logging can index and query each field individually. In the Logs Explorer, you can filter on jsonPayload.userId = 99 or jsonPayload.latencyMs > 500 directly. With plain text, you would be limited to string search, which is far less powerful.
Structured logs also allow Cloud Logging to recognize special reserved fields such as severity, httpRequest, trace, and spanId, which it uses to automatically enrich the log entry.
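A sketch of how those reserved fields look when writing JSON to stdout: note that for stdout-ingested logs, the trace correlation fields use keys prefixed with logging.googleapis.com/. The project and trace/span IDs below are placeholders.

```javascript
// Sketch: a structured log entry correlated with a Cloud Trace span.
// The trace value must have the form "projects/PROJECT_ID/traces/TRACE_ID"
// for Cloud Logging to link the entry to the matching trace.
const entry = {
  severity: 'WARNING',
  message: 'slow upstream call',
  'logging.googleapis.com/trace': 'projects/my-project/traces/4bf92f3577b34da6a3ce929d0e0e4736',
  'logging.googleapis.com/spanId': '00f067aa0ba902b7',
};

process.stdout.write(JSON.stringify(entry) + '\n');
```

With these fields set, the Logs Explorer can show the log entry alongside the trace it belongs to, instead of treating them as unrelated events.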
Log-Based Metrics
Log-based metrics are metrics derived directly from logs. They are the primary integration point between Cloud Logging and Cloud Monitoring. There are two types:
- Counter metrics — count the number of log entries matching a given filter over time. Example: count all entries where severity = ERROR
- Distribution metrics — extract a numerical value from a structured log field and track its statistical distribution over time. Example: extract jsonPayload.latencyMs from each request log and track percentiles (p50, p95, p99)
Once defined, these metrics appear in Cloud Monitoring just like any native metric. You can chart them on dashboards, create alerts on them, and query them in the Metrics Explorer.
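As a sketch, a counter metric like the first example can be created with a single gcloud command (the metric name and the jsonPayload.service field are hypothetical):

```shell
# Counter metric: count ERROR entries from a hypothetical "payment" service.
# The metric then appears in Monitoring as logging.googleapis.com/user/payment-errors.
gcloud logging metrics create payment-errors \
  --description="ERROR entries from the payment service" \
  --log-filter='severity >= ERROR AND jsonPayload.service = "payment"'
```

Distribution metrics additionally need a value extractor and a histogram bucket layout on top of the filter; those are easiest to define in the console or via the API.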
🔁 Summary
These three services are distinct but deeply integrated, and structured logs are the raw material that feeds the entire chain. The Log Router and Log Sinks allow you to go even further by connecting your logs to external systems in real time.
- Cloud Logging provides the raw data → what happened
- Cloud Monitoring monitors the overall system health → is something wrong right now?
- Cloud Error Reporting focuses on application errors → which errors are occurring, and how often?
- Structured logs are the fuel for the entire pipeline → the more structured your logs, the more powerful all three services become
- Log-based metrics are the bridge between Logging and Monitoring → turn any field in your logs into a first-class monitoring signal
- Log Router + Log Sink allow you to route logs to any destination → Pub/Sub, BigQuery, Cloud Storage, or an external system like ClickUp via a Cloud Function
Pipeline diagram
Your application
  └─► emits structured JSON logs
        └─► Cloud Logging (storage, indexing, queries)
              ├─► Cloud Error Reporting (reads logs, groups exceptions)
              └─► Log-based metrics
                    └─► Cloud Monitoring (dashboards, alerts)
The practical conclusion: if your application writes unstructured plain-text logs, you are only using Cloud Logging as a basic search tool. But if you write structured JSON logs with meaningful numeric fields, you unlock the full pipeline all the way to Monitoring dashboards and alerts — without needing to instrument your application with a separate metrics library.
⚙️ Log Router, Sinks & Cloud Functions
Writing Structured Logs
On Cloud Run, writing structured logs is as simple as writing JSON to stdout — Cloud Logging ingests and indexes it automatically, no library required.
// On Cloud Run: writing to stdout is enough
const log = (severity, message, extra = {}) => {
  process.stdout.write(JSON.stringify({
    severity, // reserved field, natively recognized by Cloud Logging
    message,  // reserved field, shown as summary in the Logs Explorer
    ...extra,
  }) + '\n');
};

// Usage
log('INFO', 'Request received', {
  service: 'api-gateway',
  userId: 'usr_8a3f',
  latencyMs: 145,
});

log('ERROR', 'Payment failed', {
  service: 'payment',
  orderId: 'ord_7c2d',
  reason: 'card_declined',
});
The severity and message fields are natively recognized. Every other field (here service, userId, latencyMs) immediately becomes queryable in the Logs Explorer via jsonPayload.latencyMs > 500 or jsonPayload.reason = "card_declined".
The Log Router
The Log Router is the central routing engine of Cloud Logging. It processes every log entry on arrival and evaluates it against all configured sinks to decide where to copy it.
Key points:
- Every entry is evaluated against all sinks simultaneously — the same log can be sent to multiple destinations in parallel
- Routing is based on LQL filters (e.g. severity >= ERROR, resource.type = "cloud_run_revision")
- Logs not captured by a sink are still kept in Cloud Logging’s _Default bucket — a sink is a copy, not a move
Log Sinks
A Log Sink is a rule composed of two elements:
- An LQL filter — which logs to capture
- A destination — where to send them
Automation example
The following example ties everything covered so far into a single, end-to-end flow. The goal: every time a Cloud Run service emits an ERROR-level structured log, your team receives an automatic notification in a ClickUp chat — with no manual monitoring required. The diagram below shows how the pieces connect, followed by the two concrete steps needed to wire it up: creating the Pub/Sub topic and Log Sink, then deploying the Cloud Function that calls the ClickUp API.
Your Cloud Run app
  └─► structured log { severity: "ERROR", ... }
        └─► Cloud Logging
              └─► Log Router
                    └─► Log Sink (filter: severity >= ERROR)
                          └─► Pub/Sub Topic [error-notifications]
                                └─► Cloud Function [notifyClickUp]
                                      └─► ClickUp API
                                            └─► message in the team chat
Step 1 — Pub/Sub Topic & Log Sink
# 1. Create the Pub/Sub topic
gcloud pubsub topics create error-notifications

# 2. Create the Log Sink pointing to that topic
gcloud logging sinks create error-to-pubsub \
  pubsub.googleapis.com/projects/MY_PROJECT/topics/error-notifications \
  --log-filter='severity >= ERROR AND resource.type = "cloud_run_revision"'

# 3. Retrieve the service account GCP generated for this sink
gcloud logging sinks describe error-to-pubsub \
  --format='value(writerIdentity)'
# → serviceAccount:pXXXXX-YYYY@gcp-sa-logging.iam.gserviceaccount.com

# 4. Grant the publisher role to that service account
gcloud pubsub topics add-iam-policy-binding error-notifications \
  --member='serviceAccount:WRITER_IDENTITY_HERE' \
  --role='roles/pubsub.publisher'
Step 2 — The Cloud Function
const functions = require('@google-cloud/functions-framework');

functions.cloudEvent('notifyClickUp', async (cloudEvent) => {
  // 1. Decode the Pub/Sub message (base64-encoded)
  const pubsubData = cloudEvent.data.message.data;
  const logEntry = JSON.parse(Buffer.from(pubsubData, 'base64').toString());

  // 2. Extract relevant information from the log
  const severity = logEntry.severity || 'UNKNOWN';
  const message = logEntry.jsonPayload?.message
    || logEntry.textPayload
    || 'No message';
  const service = logEntry.resource?.labels?.service_name || 'unknown service';
  const timestamp = logEntry.timestamp
    ? new Date(logEntry.timestamp).toLocaleString('en-CA', { timeZone: 'America/Toronto' })
    : new Date().toLocaleString('en-CA', { timeZone: 'America/Toronto' });

  // Custom fields from your structured logs
  const orderId = logEntry.jsonPayload?.orderId || null;
  const reason = logEntry.jsonPayload?.reason || null;

  // 3. Format the ClickUp message
  const lines = [
    `🚨 **[${severity}]** \`${service}\``,
    `📝 ${message}`,
    orderId ? `🔖 Order: \`${orderId}\`` : null,
    reason ? `❌ Reason: ${reason}` : null,
    `🕐 ${timestamp}`,
  ].filter(line => line !== null).join('\n');

  // 4. Send to ClickUp chat
  const viewId = process.env.CLICKUP_VIEW_ID;
  const apiToken = process.env.CLICKUP_API_TOKEN;
  const response = await fetch(
    `https://api.clickup.com/api/v2/view/${viewId}/comment`,
    {
      method: 'POST',
      headers: {
        'Authorization': apiToken,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        comment_text: lines,
        notify_all: true,
      }),
    }
  );

  if (!response.ok) {
    console.error(JSON.stringify({
      severity: 'ERROR',
      message: 'Failed to send notification to ClickUp',
      status: response.status,
    }));
  }
});
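Before deploying, the function can be exercised locally with the Functions Framework. The sketch below assumes the code lives in index.js and listens on the default port 8080; the curl request fakes the Pub/Sub CloudEvent, with a minimal base64-encoded LogEntry as the payload:

```shell
# Start the function locally with a CloudEvent signature
npx @google-cloud/functions-framework --target=notifyClickUp --signature-type=cloudevent &

# Simulate the Pub/Sub delivery: "data" is the base64-encoded LogEntry
DATA=$(echo -n '{"severity":"ERROR","jsonPayload":{"message":"Payment failed"}}' | base64)
curl localhost:8080 \
  -H "Content-Type: application/json" \
  -H "ce-id: 1234" \
  -H "ce-specversion: 1.0" \
  -H "ce-type: google.cloud.pubsub.topic.v1.messagePublished" \
  -H "ce-source: //pubsub.googleapis.com/projects/MY_PROJECT/topics/error-notifications" \
  -d "{\"message\": {\"data\": \"$DATA\"}}"
```

Without the CLICKUP_VIEW_ID and CLICKUP_API_TOKEN environment variables set, the outgoing ClickUp call will fail, but the local run still lets you verify that decoding and message formatting behave as expected.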
Step 3 — Deploy
The ClickUp token must never be hardcoded. Store it in Secret Manager and inject it as an environment variable at deploy time.
# Create the secret in Secret Manager
gcloud secrets create clickup-api-token \
  --replication-policy="automatic"

# Add the token value (replace with your ClickUp token)
echo -n "pk_YOUR_CLICKUP_TOKEN" | \
  gcloud secrets versions add clickup-api-token --data-file=-

# Deploy the Cloud Function (2nd generation)
gcloud functions deploy notifyClickUp \
  --gen2 \
  --runtime=nodejs20 \
  --region=northamerica-northeast1 \
  --trigger-topic=error-notifications \
  --entry-point=notifyClickUp \
  --set-env-vars="CLICKUP_VIEW_ID=YOUR_VIEW_ID" \
  --set-secrets="CLICKUP_API_TOKEN=clickup-api-token:latest"
ClickUp View ID
The view_id is found directly in the URL when you open your Chat view in ClickUp:
https://app.clickup.com/{workspace_id}/v/li/{view_id}