Tracing Instrumentation: Where to start

Written by Vincent Martel | Aug 9, 2023 4:00:00 AM

Instrument your microservices for tracing

Distributed tracing becomes necessary when you want to correlate multiple requests that originate from a single user interaction with your SaaS. It becomes useful as soon as you synchronize with third-party providers' systems through webhooks, or call APIs that enforce rate limiting. Both introduce asynchronous, distributed tasks throughout the web app.

Can you easily tell which request triggered a webhook or scheduled an asynchronous task? What if the cause of an error actually lies in a past request? Instrumenting for distributed tracing and logging gives you insight into a complete flow, helping you to:

identify the involved services
measure the flow's overall performance
spot undesired delays
determine the root cause of unexpected errors
validate the expected behavior
check whether performance affects that behavior

Tracing Instrumentation

You can easily instrument your application with Cloud Trace, especially when the app is hosted on Google Cloud Platform. Let's cover how to do it with Cloud Run and Cloud Tasks when interacting with a third-party API that enforces rate limiting. The example uses Node.js, but the concepts are language agnostic.

Exclusions

Security best practices are out of scope for this article; we nonetheless recommend applying the principle of least privilege when managing access to your cloud services.
CI/CD and IaC for provisioning and managing these cloud services are also omitted.

Example

When processing a user request, the task service interacts with a rate-limited third-party API (e.g. a maximum of 2 requests per second). Interactions are queued in Cloud Tasks, which dispatches them at an acceptable rate.
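To make the effect of the queue concrete, here is a deliberately simplified model, not Cloud Tasks' actual scheduler (which uses a token bucket and can also cap concurrent dispatches): with a limit of `maxPerSecond`, tasks enqueued at the same moment leave the queue at least `1000 / maxPerSecond` ms apart. The function name `dispatchOffsets` is hypothetical.

```javascript
// Simplified model of a rate-limited queue: tasks enqueued at the same time
// are released evenly spaced, never faster than `maxPerSecond`.
// Cloud Tasks' real scheduler (token bucket, bursts, retries) is more nuanced;
// this is illustrative only.
function dispatchOffsets(taskCount, maxPerSecond) {
  if (!Number.isInteger(taskCount) || taskCount < 0) {
    throw new RangeError("taskCount must be a non-negative integer");
  }
  if (!(maxPerSecond > 0)) {
    throw new RangeError("maxPerSecond must be a positive number");
  }
  const intervalMs = 1000 / maxPerSecond; // minimum gap between two dispatches
  return Array.from({ length: taskCount }, (_, i) => i * intervalMs);
}

// Five user interactions arriving at once, third-party limit of 2 requests/s.
// Offsets in ms: 0, 500, 1000, 1500, 2000
console.log(dispatchOffsets(5, 2));
```

Under this model, the fifth interaction reaches the third-party API roughly two seconds after the first: exactly the kind of delay that correlating the user request with the executing request makes visible.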

The complete flow is as follows:

the user interacts with the task service
the task service queues a task in Cloud Tasks that targets the task executor
the task executor fulfills the interaction by consuming the third-party API

(coming soon) How to deploy and manage this infrastructure.

The goal is to correlate the original user request with the request that actually executes the task.
Additionally, we want to group the related logs. Despite the complex nature of distributed systems, we can achieve both with minimal configuration.

Note that Cloud Trace builds on and encourages the use of OpenTelemetry, and this article assumes you already know its concepts. Being open source, OpenTelemetry allows vendor-agnostic instrumentation of your SaaS product.

Propagate `traceparent`

Cloud Run supports traceparent, the standard W3C Trace Context propagation header, and this header is present on each Cloud Run request. The key idea is therefore to set this header on every outgoing request that derives from a user interaction, using the value received with the original request.

This works whether the request directly targets another service on Cloud Run or goes through Cloud Tasks.

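import { CloudTasksClient } from "@google-cloud/tasks";

const tasksClient = new CloudTasksClient();

// Inside the task service's Express handler. `logging` is the Cloud Logging
// client initialized below; `location` is the queue's region (e.g. "us-central1").
const traceparent = req.get("traceparent");
const [task] = await tasksClient.createTask({
  parent: tasksClient.queuePath(logging.projectId, location, "queue"),
  task: {
    httpRequest: {
      httpMethod: "POST",
      url: "https://task-executor.com/interactions",
      // Forward the incoming W3C trace context so the executor's request
      // is recorded in the same trace as the user's request
      headers: traceparent ? { traceparent } : {},
    },
  },
});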
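Forwarding the header verbatim, as in the snippet above, reuses the incoming parent-id, so the executor's request is likely recorded alongside the task service's request rather than beneath it. The W3C Trace Context spec expects a participant that opens its own span to keep the trace-id and trace-flags and to replace the parent-id with the new span's ID. The sketch below shows that rule at the header level only. `parseTraceparent` and `childTraceparent` are hypothetical helpers, only version `00` of the header is handled, and the new span shows up in Cloud Trace only if something actually exports it (typically an OpenTelemetry SDK, which handles all of this for you).

```javascript
import { randomBytes } from "node:crypto";

// W3C Trace Context, version 00:
// "00-<32 hex trace-id>-<16 hex parent-id>-<2 hex trace-flags>"
const TRACEPARENT_RE = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;
const ALL_ZEROS = /^0+$/;

// Returns { traceId, parentId, flags }, or null when the header is missing
// or invalid. Only version 00 is handled in this sketch.
function parseTraceparent(header) {
  const match = TRACEPARENT_RE.exec(header ?? "");
  if (!match) return null;
  const [, traceId, parentId, flags] = match;
  // All-zero trace-id and parent-id values are invalid per the spec
  if (ALL_ZEROS.test(traceId) || ALL_ZEROS.test(parentId)) return null;
  return { traceId, parentId, flags };
}

// Header for an outgoing hop: same trace-id and flags, plus a fresh 8-byte
// parent-id identifying the span this service opens for the call.
function childTraceparent(incoming) {
  const parent = parseTraceparent(incoming);
  if (!parent) return undefined; // nothing valid to propagate
  let spanId;
  do {
    spanId = randomBytes(8).toString("hex");
  } while (ALL_ZEROS.test(spanId));
  return `00-${parent.traceId}-${spanId}-${parent.flags}`;
}

// Example using the trace-id from the W3C spec; the parent-id part is random
console.log(
  childTraceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
);
```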

Grouping the request logs

This example uses the Cloud Logging client library for Node.js. See Cloud Logging client libraries for alternatives.

For the second part of our goal, we can leverage the X-Cloud-Trace-Context header, which is also present on each Cloud Run request. Extract the trace context from it and include it in your log entries.

Depending on your stack, some libraries automatically add the trace context when logging.

You might also want to instrument further by creating your own spans instead of reusing the one Cloud Run provides. OpenTelemetry also provides plenty of tools to auto-instrument your application.

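import express from "express";
import { Logging } from "@google-cloud/logging";

const app = express();

// Resolve the project ID and the Cloud Run resource metadata once, at startup
const logging = new Logging();
await logging.setProjectId();
await logging.setDetectedResource();

// logSync writes structured JSON to stdout, which Cloud Run forwards to Cloud Logging
const log = logging.logSync("stdout");

app.use((req, res, next) => {
  // spec: "X-Cloud-Trace-Context: TRACE_ID/SPAN_ID;o=TRACE_TRUE"
  const traceHeader = req.get("X-Cloud-Trace-Context");
  const [traceId, spanId] =
    traceHeader?.split("/").flatMap((id) => id.split(";")) || [];

  const metadata = { labels: { tt: "42" } }; // arbitrary example label
  if (traceId) {
    metadata.trace = `projects/${logging.projectId}/traces/${traceId}`;
  }
  if (spanId) {
    // The header carries SPAN_ID in decimal; Cloud Logging expects 16 hex characters
    metadata.spanId = BigInt(spanId).toString(16).padStart(16, "0");
  }

  log.info(log.entry(metadata, "correlated log"));
  next();
});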
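The client library is not strictly required. On Cloud Run, a JSON object written on a single line to stdout is parsed as a structured log entry, and special keys such as `logging.googleapis.com/trace` and `logging.googleapis.com/spanId` correlate it with a trace, which is essentially what `logSync` emits. A minimal dependency-free sketch, assuming the project ID is already known (for example from configuration); the function name `structuredLogLine` is hypothetical.

```javascript
// Builds one structured log line that Cloud Run's logging agent can parse
// from stdout. `projectId` and the header value are passed in explicitly.
function structuredLogLine(message, { projectId, traceHeader, severity = "INFO" } = {}) {
  const entry = { severity, message };
  // Header format: "TRACE_ID/SPAN_ID;o=TRACE_TRUE" (SPAN_ID is decimal)
  const match = /^([0-9a-fA-F]+)\/(\d+)(?:;o=([01]))?/.exec(traceHeader ?? "");
  if (match && projectId) {
    const [, traceId, spanId, sampled] = match;
    entry["logging.googleapis.com/trace"] = `projects/${projectId}/traces/${traceId}`;
    // Convert the decimal span ID to the 16-hex-character form Cloud Logging expects
    entry["logging.googleapis.com/spanId"] = BigInt(spanId).toString(16).padStart(16, "0");
    if (sampled !== undefined) {
      entry["logging.googleapis.com/trace_sampled"] = sampled === "1";
    }
  }
  return JSON.stringify(entry);
}

// In an Express handler you would pass req.get("X-Cloud-Trace-Context")
console.log(
  structuredLogLine("correlated log", {
    projectId: "my-project", // hypothetical project ID
    traceHeader: "105445aa7843bc8bf206b12000100000/1;o=1",
  })
);
```

Writing the line with `console.log` keeps the service free of logging dependencies, at the cost of re-implementing the header parsing that the client library and some frameworks already handle.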

Results: Visually observing the correlations

In the Logs Explorer, each request log now has its related logs nested under it.

In the Trace Explorer, you see all the logs for the complete flow.

Source code is available here.

What's next:

(coming soon) How to correlate requests when third-party systems are involved (e.g. webhooks)
(coming soon) Tracing instrumentation for the frontend using Firebase Hosting
