OTEL SDK: Concepts & Design
OpenTelemetry is an observability framework and an active CNCF project that provides a vendor-neutral and tool-agnostic way to collect observability signals across your heterogeneous system.
In this blog post series, we will dig into two big questions:
- how OpenTelemetry instrumentation works on the application side, taking Python's SDK as an example. We will touch on traces, metrics, logs and context propagation across services.
- how OpenTelemetry Collector works under the hood and some of the interesting engineering decisions made there (see the OTEL Collector blog post)
The original article was posted on my website. Go and check it out!
Gotta be fun!
The Problem
With the rise of the open source community, people and organizations don't really want to invest in proprietary protocols and standards anymore. Instead, it's mainstream to pick a widely recognized open source project and build on top of it.
In the observability domain, there are enough open source protocols that cover some of the three key pillars:
- Logs - stdout and stderr streams that your services or applications output while running in a container.
- Metrics - aggregations of some measurements over a period of time.
- Traces - a visualization of the steps or the execution path that your workflow passed through.
When you think about metrics, Prometheus and StatsD may come to mind. On the traces side, there are Jaeger and Zipkin. There are also various tools that can scrape, ship or store your logs, like Fluentd or Grafana Loki.
Now if you want to cover all three pillars, you need to pick a combination of collectors/protocols to cover each one (well, some are capable of covering a few pillars). However, there are still a few problems left:
- even though the protocols are open source, they may still tightly connect your application to the underlying collector or storage, so it won't be that easy to switch gears and use something else.
- we have divided observability into three pieces, but in reality, they are three different signals or points of view on the application work, so we may get the whole picture and max value out of them when they are well connected and correlated for us.
To sum it up, we want to have:
- one open source vendor-lock-free protocol to rule all observability signals.
- a stable abstraction for us for the underlying observability storage(s) without a need to migrate our service every time that storage changes.
- a coverage of popular programming languages, not to be limited in what our tech stack looks like.
The OTEL Origin
This ubiquitous open protocol idea is so compelling that at some point there were two projects, OpenTracing and OpenCensus, that were trying to fill the gap.
They started out competing with each other, but the space was too young for that: it made little sense to fight over market share when consolidating their efforts would get everyone there much quicker and shape the space along the way.
That's what happened. Both projects were merged into one, known as OpenTelemetry (aka OTEL).
The SDK
Collecting logs, metrics and traces in a unified way across services implemented in different technical stacks is the central task of OpenTelemetry. To get there, OpenTelemetry provides:
- SDKs for 11+ of the most popular languages (like Python, Go, Node.js, Rust, Java, etc.) that initialize the OTEL core components
- library-specific instrumentations that provide tool/framework-specific signals and context automagically (e.g. Starlette, HTTPX, aiopika instrumentations, and so on)
The third thing you could do is to further instrument your codebase with business logic specific traces and metrics.
This process is generally known as codebase instrumentation.
There are two ways to set up OTEL in your application:
- automatic - when you run some agent before the main application entry point that configures OpenTelemetry (but not all languages support it, for example, Golang doesn't)
- manual - when you configure OpenTelemetry yourself to start collecting your observability signals.
To understand how OpenTelemetry SDK is designed and implemented, we will delve into the manual setup.
This is what it takes to manually set up a Python service:
from opentelemetry import metrics, trace
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import (
    ConsoleMetricExporter,
    PeriodicExportingMetricReader,
)
from opentelemetry.sdk.resources import SERVICE_NAME, SERVICE_VERSION, Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

span_processor = BatchSpanProcessor(ConsoleSpanExporter())
metric_reader = PeriodicExportingMetricReader(ConsoleMetricExporter())

resource = Resource(
    attributes={
        SERVICE_NAME: "notifications",
        SERVICE_VERSION: "v42",
    }
)

trace_provider = TracerProvider(resource=resource)
metrics_provider = MeterProvider(metric_readers=[metric_reader])
trace_provider.add_span_processor(span_processor)

# Sets the global default providers
trace.set_tracer_provider(trace_provider)
metrics.set_meter_provider(metrics_provider)

# Creates a custom tracer from the global provider
tracer = trace.get_tracer("users")
# Creates a custom meter from the global provider
meter = metrics.get_meter("users")
Let's try to unpack what's going on there.
Resources
First of all, let's pay attention to the Resource. It is an abstraction around entities that could generate signals:
from opentelemetry.sdk.resources import SERVICE_NAME, SERVICE_VERSION, Resource

resource = Resource(
    attributes={
        SERVICE_NAME: "notifications",
        SERVICE_VERSION: "v42",
    }
)
Right off the bat, Resource illustrates a few important concepts in the observability domain.
Most context in observability signals is conveyed via attributes or tags, which are essentially key-value pairs. The observability backend can then process these values to correlate various pieces of information.
OTEL strives to standardize the attribute names to keep them consistent across your system. That's why they come as constants. In practice, keeping names consistent is an extremely daunting and challenging task, especially when multiple teams are working on different services in parallel.
Providers
Python's SDK comes with two modules called trace and metrics. Both modules contain a global variable that holds the current provider and a setter function like set_tracer_provider() to configure it.
If you don't want to configure a real provider, there are also NoOpTracerProvider and NoOpMeterProvider, which let you keep all custom instrumentation in place without guarding it with ifs, for example.
Design Patterns in the Wild
All providers have NoOp implementations, which is a great example of the null object pattern. Since telemetry calls can spread across the whole codebase, it would be disastrous to wrap every one of them in an if statement just because there is no proper observability setup under some circumstances, like during automated testing.
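To make the null object idea concrete, here is a minimal standard-library sketch. The class names (NoOpSpan, NoOpTracer, RecordingTracer) and the get_user_info function are hypothetical and only mimic the shape of the SDK's API; this is not how the SDK itself is implemented.

```python
from contextlib import contextmanager


class NoOpSpan:
    """A span that accepts every call and records nothing."""

    def set_attribute(self, key, value):
        pass


class NoOpTracer:
    """Hypothetical stand-in used when telemetry is not configured."""

    @contextmanager
    def start_as_current_span(self, name):
        yield NoOpSpan()


class RecordingTracer:
    """Hypothetical tracer that remembers span names, for comparison."""

    def __init__(self):
        self.started = []

    @contextmanager
    def start_as_current_span(self, name):
        self.started.append(name)
        yield NoOpSpan()


def get_user_info(tracer, user_id):
    # No `if tracer is not None` guard: the call site is identical either way.
    with tracer.start_as_current_span("users:get-info") as span:
        span.set_attribute("user_id", str(user_id))
        return {"id": user_id}
```

The business code runs the same whether it gets the recording tracer or the no-op one, which is exactly what keeps the ifs out of the codebase.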
Then, the rest of the codebase refers to the global providers when it needs to create traces or metrics.
Traces
The TracerProvider is basically a factory that creates Tracers and passes most of its params down to each Tracer. A Tracer, in turn, creates the spans that make up a trace, and a single trace could contain many spans.
Conceptually, traces are connected to a specific workflow or operation in the system. That's why their names should be the same for the same processes, e.g. GET /users/{user_id}/ may represent all requests to an API that returns a user's data.
Then, spans may represent some steps in your workflow. For instance, to get the user information, you may need to perform a request to your database. That action may be wrapped into a span. Additional attributes could be added to the span to record some events, action result statuses, etc. In the end, all spans form a hierarchical tree that could be viewed in the observability backend.
Using the tracer initialized above, we can create a new trace via:
# the root span
with tracer.start_as_current_span("users:get-info") as span:
    span.set_attribute("user_id", str(...))
    # nested/child span
    with tracer.start_as_current_span("users:get-info:check-permissions") as span:
        ...
Spans
In the OTEL protocol, traces are rather virtual entities that hold some execution context. The real data points that are being collected, processed and exported are spans.
The Span consists of:
- name - the human-friendly title of the step
- kind like internal, server, client, producer, consumer
- status like not set, ok, error
- the span context
- the parent span context
- the resource context
- the trace or instrumentation scope
- span attributes
- span events - special entities with a name, a timestamp and their own set of attributes
- span links to other spans that have caused the current span
- start_time and end_time, recorded via time.time_ns()
The Span class is also a context manager, so when it starts and exits, the span signals the SpanProcessor about that.
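The context-manager behaviour can be sketched with the standard library alone. SketchSpan and ListSpanProcessor are hypothetical names, and only a handful of the fields listed above are modelled; the on_start/on_end hook names follow the span processor interface, but everything else is a simplification.

```python
import time
from dataclasses import dataclass, field


class ListSpanProcessor:
    """Hypothetical processor that keeps finished spans in memory."""

    def __init__(self):
        self.started = []
        self.ended = []

    def on_start(self, span):
        self.started.append(span.name)

    def on_end(self, span):
        self.ended.append(span)


@dataclass
class SketchSpan:
    name: str
    processor: ListSpanProcessor
    attributes: dict = field(default_factory=dict)
    status: str = "unset"
    start_time: int = 0
    end_time: int = 0

    def __enter__(self):
        self.start_time = time.time_ns()
        self.processor.on_start(self)   # signal: the span has started
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None:
            self.status = "error"
        self.end_time = time.time_ns()
        self.processor.on_end(self)     # signal: the span is finished
        return False                    # never swallow the exception
```

Entering the with block stamps the start time and notifies the processor; leaving it stamps the end time, marks errors, and hands the finished span over.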
Span Sampling
You can also optionally configure a span sampler.
Sampling is a way to filter out some spans or other data points, either because they match some specific criteria or just randomly. The major reasons to sample are:
- to optimize the cost of ingesting and storing observability signals
- to filter out boring regular data and mostly keep the interesting parts, e.g. spans with errors or spans that took longer than a threshold
- to merely filter based on the presence or absence of attributes
Sampling on the SDK side allows filtering as early as possible in the pipeline (this is known as head sampling).
OTEL comes with a few span samplers out of the box:
- StaticSampler that always or never drops spans
- TraceIdRatio that probabilistically drops a given portion of spans
These two samplers can also be configured to respect the parent span's decision, so if the parent span was dropped, all its child spans are dropped, too.
Design Patterns in the Wild
Parent span aware sampling is implemented using the composite design pattern.
The parent-based sampler wraps the static or ratio-based sampler mentioned above and adds logic to propagate the parent span's sampling decision.
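Here is a rough standard-library sketch of how a ratio-based sampler and a parent-aware wrapper could fit together. The class names and the should_sample signature are simplified assumptions, not the SDK's real interface; the only idea borrowed is that the decision is derived from the trace ID, so every service seeing the same trace reaches the same verdict.

```python
class AlwaysOn:
    """Static sampler that keeps everything."""

    def should_sample(self, trace_id, parent_sampled=None):
        return True


class RatioSampler:
    """Keeps roughly `ratio` of traces, decided deterministically by trace ID."""

    def __init__(self, ratio):
        self.bound = int(ratio * (1 << 64))

    def should_sample(self, trace_id, parent_sampled=None):
        # Compare the lower 64 bits of the trace ID with the threshold
        return (trace_id & ((1 << 64) - 1)) < self.bound


class ParentAware:
    """Wraps a root sampler and follows the parent's decision when there is one."""

    def __init__(self, root):
        self.root = root

    def should_sample(self, trace_id, parent_sampled=None):
        if parent_sampled is not None:
            return parent_sampled               # inherit the parent's verdict
        return self.root.should_sample(trace_id)  # root span: ask the wrapped sampler
```

Because the wrapper exposes the same method as the samplers it wraps, the rest of the pipeline does not need to know whether it talks to a plain sampler or a parent-aware one.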
Span Processors
When spans end, they go to span processors. Span processing is the last stage of a span's lifecycle before it's exported outside of the service. OpenTelemetry uses it to batch spans and multiplex them to multiple exporters.
The BatchSpanProcessor collects spans and exports them on schedule or when its queue is full. For that, it maintains a separate daemon thread where this logic is executed. The processor waits until a batch is collected or a timeout is reached. threading.Condition is used to implement this elegantly.
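The wait-for-a-batch-or-a-timeout loop can be sketched roughly like this. SketchBatchProcessor and its parameters are hypothetical, and the real processor handles many more concerns (queue limits, force flush, export timeouts); this only shows the role threading.Condition plays.

```python
import threading
from collections import deque


class SketchBatchProcessor:
    def __init__(self, export, max_batch_size=3, schedule_delay=0.2):
        self._export = export                  # callable that receives a list of spans
        self._max_batch_size = max_batch_size
        self._schedule_delay = schedule_delay  # seconds between scheduled exports
        self._queue = deque()
        self._cond = threading.Condition()
        self._done = False
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def on_end(self, span):
        with self._cond:
            self._queue.append(span)
            if len(self._queue) >= self._max_batch_size:
                self._cond.notify()            # wake the worker before the timeout

    def _run(self):
        while True:
            with self._cond:
                if not self._done and len(self._queue) < self._max_batch_size:
                    # Sleep until notified or until the schedule delay elapses
                    self._cond.wait(self._schedule_delay)
                size = min(len(self._queue), self._max_batch_size)
                batch = [self._queue.popleft() for _ in range(size)]
                finished = self._done and not self._queue
            if batch:
                self._export(batch)            # export outside the lock
            if finished:
                return

    def shutdown(self):
        with self._cond:
            self._done = True
            self._cond.notify()
        self._worker.join()                    # drains whatever is still queued
```

The condition variable gives the worker a single place to sleep that can be interrupted by either a full batch or a shutdown request, while application threads only pay for a quick append.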
There are also two MultiSpanProcessors that operate on a list of span processors and dispatch spans to them sequentially or concurrently.
The concurrent span processing happens via ThreadPoolExecutor.
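A minimal sketch of the two dispatch strategies, assuming each processor is just a callable that takes a finished span. The class names are made up for illustration and do not match the SDK's.

```python
from concurrent.futures import ThreadPoolExecutor


class SketchSequentialMulti:
    """Calls every processor one after another in the caller's thread."""

    def __init__(self, processors):
        self._processors = list(processors)

    def on_end(self, span):
        for processor in self._processors:
            processor(span)


class SketchConcurrentMulti:
    """Fans the span out to all processors through a thread pool."""

    def __init__(self, processors, max_workers=2):
        self._processors = list(processors)
        self._executor = ThreadPoolExecutor(max_workers=max_workers)

    def on_end(self, span):
        futures = [self._executor.submit(p, span) for p in self._processors]
        for future in futures:
            future.result()  # wait, and re-raise any processor error

    def shutdown(self):
        self._executor.shutdown(wait=True)
```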
Ctx Propagation
So far we have been talking about processing trace spans in the scope of one service, but the real value of traces is unlocked when multiple microservices take part in a workflow and you can connect what they were doing into one coherent picture.
In order to do that, OpenTelemetry needs to propagate some context during cross-service communication, so the next microservice in the flow would know that all spans it was going to create should be attached to the parent trace created at the start of the workflow.
OTEL holds propagatable information in the runtime context. In Python's SDK, it's implemented using the convenient contextvars standard library module, which is a perfect mechanism to have scoped "global" variables on the level of asyncio.Task-s, for example.
By default, OTEL defines two context vars:
- the current span - holds a reference to the currently active OTEL trace span
- the baggage - holds arbitrary key-value information that should be propagated as an additional context for every microservice involved
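To see why contextvars suits this job, consider the following small sketch. The variable name current_span is hypothetical (the SDK manages its own keys internally); the point is that each asyncio task gets its own copy of the context, so concurrent requests do not overwrite each other's "current span".

```python
import asyncio
import contextvars

# Hypothetical variable; stands in for the SDK's "current span" slot
current_span = contextvars.ContextVar("current_span", default=None)


async def handle_request(name):
    current_span.set(name)      # visible only inside this task's context
    await asyncio.sleep(0)      # yield so the other task runs in between
    return current_span.get()


async def main():
    results = await asyncio.gather(
        handle_request("span-a"),
        handle_request("span-b"),
    )
    # The outer context never saw the values set inside the tasks
    return results, current_span.get()
```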
Another important thing here is the actual context propagation, which involves context serialization and deserialization.
In the realm of HTTP-based communication, OpenTelemetry propagates the context via HTTP headers according to the W3C Trace Context specification. According to the specification, the current span context is propagated as two headers:
- traceparent - carries the trace ID, the parent span ID and the trace flags (e.g. whether the trace was marked as sampled or not)
- tracestate - arbitrary vendor-specific trace context or identifiers
Baggage is not part of the Trace Context specification (W3C covers it in a separate Baggage specification), and OTEL propagates it as one more HTTP header named baggage.
The microservice that receives such requests should be aware of context information in the headers, extract it and set it in the local runtime context.
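The serialization step is small enough to sketch by hand. The inject and extract functions below are hypothetical helpers written against the traceparent layout from the W3C Trace Context specification (version, trace ID, parent span ID, flags); in a real service you would rely on OTEL's propagators rather than rolling your own.

```python
import re

# version - trace ID (32 hex) - parent span ID (16 hex) - flags (2 hex)
TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def inject(headers, trace_id, span_id, sampled):
    """Write the span context into outgoing headers (version 00)."""
    flags = 0x01 if sampled else 0x00
    headers["traceparent"] = f"00-{trace_id:032x}-{span_id:016x}-{flags:02x}"
    return headers


def extract(headers):
    """Read the span context from incoming headers, or None if absent/invalid."""
    match = TRACEPARENT_RE.match(headers.get("traceparent", ""))
    if match is None:
        return None
    _version, trace_id, span_id, flags = match.groups()
    return {
        "trace_id": int(trace_id, 16),
        "parent_span_id": int(span_id, 16),
        "sampled": bool(int(flags, 16) & 0x01),
    }
```

The calling service runs inject before sending the request, and the receiving service runs extract and uses the result as the parent for every span it creates.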
Since OpenTelemetry is framework- or transport-protocol-agnostic, it just provides all needed functions to extract or inject context. To actually propagate that information, you should instrument your clients and servers and OTEL provides plenty of auto-instrumentors in their registry.
Metrics
Metrics are another observability pillar we are going to review next. The mechanism of collecting metrics is a bit different to traces.
In case of traces, user code actively creates trace spans and as soon as they are completed, OpenTelemetry can process and export them. The metrics dynamic is much more continuous, so the collection really ends when the service shuts down. Since service uptime could be measured in days if not weeks, we need to take a different approach here to export all measurements in between like doing metric data aggregations periodically over a time window.
Just like in the case of traces, OpenTelemetry provides a MeterProvider that bridges all metrics with metric exporters. The MeterProvider creates new instances of Meter. Meters are instrumentation-specific measurement components. Each OTEL instrumentation library creates its own meter (e.g. HTTP client or server meters). When you do your custom measurements, it's alright to have one global meter per service (but you certainly could have more).
Using the meter initialized above, you can measure a custom metric like this:
# globally defined custom metric
user_info_cache_miss_counter = meter.create_counter(
    "users.cache.miss",
    description="The number of cache misses when fetching user's info",
)

# later on, you can import the metric and measure what you need
user_info_cache_miss_counter.add(1, {"user.org_id": ...})
Metric Instruments
Now, having a meter, you could create specific metrics (a.k.a. metric instruments) that you want to measure or observe. Generally, OTEL divides metrics into two categories:
- Synchronous metrics - these are metrics that you measure directly in your service workflows, so you observe them as soon as the event happens (e.g. a service increases a counter metric in the user login workflow)
- Asynchronous (or observable) metrics - these metrics are read from "external" sources, so you just observe aggregated or point-in-time statistics instead of measuring the value directly (e.g. the number of items in a queue, given that you cannot instrument the queue directly and can only read its size property)
OpenTelemetry supports the following metric types:
- Counter (and Observable Counter) - an ever-growing (monotonically increasing) value (for example, the number of requests processed by a service)
- UpDownCounter (and Observable UpDownCounter) - a value that could grow or fall (for example, the number of in-flight requests)
- Histogram - suitable for measurements on which you want to calculate statistics (for example, request latency); unlike the counters, it has no observable variant
- Gauge (and Observable Gauge) - similar to an UpDownCounter, but each measurement is treated as a separate data point, so they are not summed up (for example, CPU or RAM utilization)
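The difference between the synchronous and the observable flavours is mostly about who initiates the measurement. A tiny sketch with made-up classes (they are not the SDK's instruments, and the real counter handles negative values differently):

```python
class SketchCounter:
    """Synchronous: the caller reports each increment as it happens."""

    def __init__(self):
        self.value = 0

    def add(self, amount):
        if amount < 0:
            # Assumption for this sketch: reject decrements outright
            raise ValueError("a counter is monotonic; use an up-down counter")
        self.value += amount


class SketchObservableGauge:
    """Asynchronous: a callback is invoked only when a collection happens."""

    def __init__(self, callback):
        self._callback = callback

    def collect(self):
        return self._callback()
```

With the counter, the login workflow calls add(1) itself. With the observable gauge, you register something like lambda: len(queue) once, and the collection cycle pulls the current value whenever it runs.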
Views & Aggregations
With our metrics defined, we could start measuring, aggregating and collecting actual values.
Metric instruments don't collect data directly but rather send it to MeasurementConsumer, which is a global component initialized on the MeterProvider's level. MeasurementConsumer collects data for each MetricReader configured on the provider.
Thinking about our source metrics data, it's just arrays of numbers with attributes (or one number at a time in the case of observable instruments), so there are multiple ways they could be aggregated. OpenTelemetry provides great flexibility there.
First of all, aggregations could be configured on the metric exporter side. Maybe you are an observability backend vendor like Datadog or Chronosphere and you come up with a better or more specific way to deal with metric data points. This would be an opportunity for you to adjust exported data.
Then, you could leverage the concept of views to override the config further. Views effectively let you specify the aggregation strategy per metric instrument, selected by its metadata (e.g. name, attributes).
By default, OTEL implements the following aggregations:
- Drop Aggregation - a way to drop metric collection completely.
- Last Value Aggregation - keeps the last aggregated value until it's collected (used by gauges).
- Sum Aggregation - arithmetic sum of provided data points (used by counters).
- Explicit Bucket Histogram Aggregation - assigns collected data points to one of 15 predefined buckets. Besides that, it collects sum, count, min and max across given data.
- Exponential Bucket Histogram Aggregation - similar to the explicit bucket aggregation, but buckets are generated dynamically by the exponentially growing size of the next bucket and are much more fine-grained (by default, there are 150 buckets).
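As an illustration of the explicit bucket idea, here is a small sketch. SketchHistogram is a hypothetical class, and the boundaries in the usage are made up rather than the SDK defaults. It assumes upper boundaries are inclusive, so a value equal to a boundary lands in the bucket that ends there.

```python
from bisect import bisect_left


class SketchHistogram:
    def __init__(self, boundaries):
        self.boundaries = list(boundaries)
        # One extra bucket catches everything above the last boundary
        self.bucket_counts = [0] * (len(self.boundaries) + 1)
        self.count = 0
        self.sum = 0.0
        self.min = float("inf")
        self.max = float("-inf")

    def record(self, value):
        # bisect_left finds the first boundary >= value, i.e. the bucket index
        self.bucket_counts[bisect_left(self.boundaries, value)] += 1
        self.count += 1
        self.sum += value
        self.min = min(self.min, value)
        self.max = max(self.max, value)
```

With boundaries [10, 100], recording 5, 10, 50 and 500 puts two values into the first bucket, one into the second and one into the overflow bucket, while sum, count, min and max are tracked on the side.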
The sum and histogram aggregations may report data between collections as:
- deltas, i.e. the differences between the previously aggregated stats (e.g. sums, counts) and the current ones.
- cumulative data, i.e. the previous and the current stats are summed up (so the values keep increasing over time).
This is called aggregation temporality.
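The two temporalities carry the same information and can be converted into each other, which a couple of hypothetical helper functions make easy to see:

```python
def to_cumulative(deltas):
    """Running totals since the start of the process."""
    total, out = 0, []
    for delta in deltas:
        total += delta
        out.append(total)
    return out


def to_delta(cumulative):
    """Change since the previous collection."""
    previous, out = 0, []
    for value in cumulative:
        out.append(value - previous)
        previous = value
    return out
```

For a counter that saw 3, 0 and 5 increments across three collection cycles, the delta series is [3, 0, 5] and the cumulative series is [3, 3, 8].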
Metric Readers
As we briefly mentioned, metrics are collected on schedule by a MetricReader like PeriodicExportingMetricReader, which runs a ticker in a separate thread. The ticker initiates the metrics collection process that creates exportable data according to the metrics' aggregation temporalities.
Logs
Finally, we get to logs. They are the most widespread among the signals and have the longest record of being used for diagnosing how code works.
Logs are simple to operate. You could just write them to standard output or a file, and then dig through them when you need to. There is no need for special viewers like you would need in the case of traces or metrics. That's why pretty much all languages have off-the-shelf logging libraries.
Ironically, logs were the last signal to make it into OpenTelemetry. The integration is either in the experimental stage or doesn't exist for most languages at the moment.
OpenTelemetry bridges into the existing logging libraries to integrate logs seamlessly for applications. In Python's world, there is the standard logging package. OTEL comes with a LoggingHandler that plugs the rest of the components into the logging system. The LoggingHandler also translates logging.LogRecord into OTEL's LogRecord data class.
Design Patterns in the Wild
OTEL's LoggingHandler is a good example of applying the adapter pattern, which allowed OpenTelemetry to have its own architecture despite the variety of logging libraries it needed to support.
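The adapter idea can be sketched with the standard logging package alone. SketchLogRecord and SketchLoggingHandler are hypothetical, heavily simplified stand-ins, and the attribute keys are illustrative; the real handler and record carry more fields (trace and span IDs, severity numbers, resource, and so on).

```python
import logging
from dataclasses import dataclass


@dataclass
class SketchLogRecord:
    """Hypothetical, simplified target record."""
    timestamp_ns: int
    severity_text: str
    body: str
    attributes: dict


class SketchLoggingHandler(logging.Handler):
    """Adapts stdlib logging.LogRecord objects to the record type above."""

    def __init__(self, sink):
        super().__init__()
        self._sink = sink  # any callable that accepts a SketchLogRecord

    def emit(self, record):
        self._sink(
            SketchLogRecord(
                timestamp_ns=int(record.created * 1e9),
                severity_text=record.levelname,
                body=record.getMessage(),
                attributes={
                    "code.function": record.funcName,
                    "code.lineno": record.lineno,
                },
            )
        )
```

Application code keeps calling logger.info(...) as usual; attaching the handler with logger.addHandler(SketchLoggingHandler(sink)) is the only integration point.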
The remaining architecture resembles what we have reviewed in the trace part. There is a dedicated LoggerProvider that holds log processors and exporters attached to the processors. The processors send logs to exporters for further saving in observability backends.
SemConv
Before wrapping up, we need to touch on another important topic that is tangentially connected to service instrumentations and metrics.
Imagine you have three teams in a company that run different subsystems. Now let's ask them to define golden signal metrics for their services and see what metric names they come up with. Chances are we would get three different sets of names for semantically the same metrics. That's even more likely if they work with different tech stacks with different conventions and naming standards.
Such a lack of consistency would create a lot of mess and hinder reuse of common dashboards, alerts, etc. The same situations can happen in traces when we instrument database queries, object storage access, etc.
OTEL recognized this problem and came up with a set of common names for common operations across all three signals. So if you use it, you can come up with a very unified view of the whole system when looking at it through an observability lens.
Conclusion
In this article, we have reviewed OpenTelemetry integration from the service development perspective by looking into the internals of the SDK.
In the next part, we will see what happens with logs when they come to the OTEL Collector.