It's a Tuesday afternoon, and management has been in a state of poorly contained hysteria all day, running in and out of meetings continuously. The team is trying to focus on work, but it's hard to avoid gossiping.
"Do you think it's ransomware?" says your colleague, and you shake your head. You've had the same thought, of course, but #security in Slack is in the normal state of blissful silence, and there's nothing going on in #ops either. It's about time for the yearly reorganization, but that usually doesn't cause so much... distress. Maybe there's been some scandal at the recent strategy retreat. Did the CEO lose their executive decision dice again?
"Do you think they're fighting about reintroducing OKRs again?"
You're about to pack up your things and go home when your team lead finally comes into the team office. They have a determined look on their face, like a decision has been made, and they don't like it. You get a sinking feeling in your gut, like you're not going to like it either.
"Rachel has quit, she won't be coming in to work again. Apparently, she has already moved to the Cayman Islands to avoid having to be on-call ever again."
Oh crap, that's bad. Rachel has been there since forever, single-handedly running dozens of important enterprise services that generate tons of revenue for the business. She's never had time to document anything, always running from one fire to another, while somehow magically managing to do maintenance work on absolutely everything. The engineers have known for years that she's the single most important person in the organization.
"Management has decided that we're taking over the Camelo service. Unfortunately, we're not getting any new resources for maintenance. Since it's business-critical and undocumented, we're not allowed to make changes to it. Oh, and from now on, we're on call."
No changes? On call?? You've never heard of this Camelo service before; how critical can it be? It's definitely not in any of the observability platforms... Is it one of those ancient things running in some tmux session left hanging by a long-retired engineer?
"It's a critical revenue stream to the business. Nobody is really sure what exactly it's doing, but it seems to have something to do with certain dairy products. It's some JVM stack, so I said our team could take it on. They've given us some jar files, but we haven't found the source code yet."
...
Adding observability to an application without changing code
This is the second part in our blog series about using OpenTelemetry;
you can read the first one here if you missed it. In
this post, we're going to cover what the engineer in the introduction can do
to set up some observability and discover what the Camelo service is doing. The goal
is to give you a place to start experimenting and tinkering. The official
documentation is vast and somewhat challenging to navigate without going in circles.
In this post, we're introducing two software components of the OpenTelemetry project:
- The OpenTelemetry Collector, which can serve as a one-stop destination for all your telemetry and make sure it ends up in the right observability solution.
- The OpenTelemetry Java Agent, which can instrument your application without requiring any source code changes.
camelo.jar and all the code are available in this GitHub repository if you'd like to experiment with anything yourself.
Debugging OpenTelemetry setup on a developer machine
It is hard to get anything done without a short feedback loop, so we'll start by setting up an OpenTelemetry collector locally. Later on, we will configure the collector to send data to an observability solution. For now, we will configure it to print the telemetry it receives to the console, so we can find out what this Camelo service is all about. The collector needs a configuration file; we can use this one:
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

service:
  pipelines:
    traces:
      receivers: [ otlp ]
      exporters: [ nop ]
    metrics:
      receivers: [ otlp ]
      exporters: [ nop ]
    logs:
      receivers: [ otlp ]
      exporters: [ debug ]

exporters:
  debug:
    verbosity: detailed
  nop:
This instructs the collector to listen on port 4317 for OTLP/gRPC data and port 4318 for OTLP/HTTP. Both of these can receive logs, traces and metrics data. We've set up pipelines for all three kinds of telemetry data: we discard metrics and traces for now, and forward all logs to an exporter that prints them on the console.
We can use this collector configuration to see what kind of data we're able to pick up from Camelo and to verify that we've instrumented it correctly. The configuration reference is here, and the most important concepts, in a logical order, are:
- receivers are used to retrieve data from a myriad of protocols
- processors can augment or rewrite data in a pipeline (not used in this configuration; see the sketch after this list)
- exporters are used to put that data somewhere
- pipelines connect receivers with exporters, optionally using processors to process the data in between
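We don't need a processor for this first experiment, but as a sketch of where one would fit: the batch processor, which ships with the collector, could be slotted into the logs pipeline like this:

processors:
  batch:

service:
  pipelines:
    logs:
      receivers: [ otlp ]
      processors: [ batch ]
      exporters: [ debug ]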
It's convenient to run the OpenTelemetry collector in docker for local development, and the configuration file above
will work with this docker-compose.yml
file:
services:
  otel-collector:
    image: otel/opentelemetry-collector:latest
    container_name: otel-collector
    command: [ "--config=/etc/otel-collector-config.yaml" ]
    volumes:
      - ./otel-collector-debug-config.yaml:/etc/otel-collector-config.yaml
    ports:
      - "4317:4317" # OTLP gRPC
      - "4318:4318" # OTLP HTTP
Starting it with docker compose up
will yield something like this:
[+] Running 1/1
✔ Container otel-collector Created 0.0s
Attaching to otel-collector
otel-collector | 2025-02-09T11:56:41.697Z info service@v0.119.0/service.go:186 Setting up own telemetry...
otel-collector | 2025-02-09T11:56:41.697Z info builders/builders.go:26 Development component. May change in the future. {"kind": "exporter", "data_type": "logs", "name": "debug"}
otel-collector | 2025-02-09T11:56:41.697Z info service@v0.119.0/service.go:252 Starting otelcol... {"Version": "0.119.0", "NumCPU": 12}
otel-collector | 2025-02-09T11:56:41.697Z info extensions/extensions.go:39 Starting extensions...
otel-collector | 2025-02-09T11:56:41.698Z info otlpreceiver@v0.119.0/otlp.go:112 Starting GRPC server {"kind": "receiver", "name": "otlp", "data_type": "metrics", "endpoint": "0.0.0.0:4317"}
otel-collector | 2025-02-09T11:56:41.698Z info otlpreceiver@v0.119.0/otlp.go:169 Starting HTTP server {"kind": "receiver", "name": "otlp", "data_type": "metrics", "endpoint": "0.0.0.0:4318"}
otel-collector | 2025-02-09T11:56:41.698Z info service@v0.119.0/service.go:275 Everything is ready. Begin running and processing data.
This is enough to start instrumenting Camelo and figure out what kind of observability data we can get from it.
What even is a camelo.jar?
Let's first try to run this using java, just to see what happens:
java -jar camelo.jar
17:17:30.129 [main] INFO CameloServer$ -- Server starting on port 8080
17:17:30.131 [main] INFO CameloServer$ -- Access on i.e. http://localhost:8080/
^C%
Apparently, it's some sort of web service and, thankfully, it appears to have logging. That would be perfect for the OpenTelemetry collector we just configured! We've downloaded the Java agent for OpenTelemetry from this page.
The OpenTelemetry Java agent is capable of instrumenting the bytecode of our application before it starts, so that telemetry data can be made available to our collector. Let's try it!
java -javaagent:opentelemetry-javaagent.jar -jar camelo.jar
[otel.javaagent 2025-02-09 17:24:14:188 +0100] [main] INFO io.opentelemetry.javaagent.tooling.VersionLogger - opentelemetry-javaagent - version: 2.12.0
17:24:15.382 [main] INFO CameloServer$ -- Server starting on port 8080
17:24:15.392 [main] INFO CameloServer$ -- Access on i.e. http://localhost:8080/
It's working: we're already seeing the logs in the collector. They're very verbose, so here's an excerpt:
...
ScopeLogs #0
ScopeLogs SchemaURL:
InstrumentationScope CameloServer$
LogRecord #0
ObservedTimestamp: 2025-02-09 16:24:15.386833 +0000 UTC
Timestamp: 2025-02-09 16:24:15.382002 +0000 UTC
SeverityText: INFO
SeverityNumber: Info(9)
Body: Str(Server starting on port 8080)
Trace ID:
Span ID:
...
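Note that the agent found our local collector without any extra configuration, because it exports OTLP to a collector on localhost by default. If your collector runs somewhere else, or you want the telemetry tagged with a proper service name, you can set the standard properties explicitly. A sketch (the hostname here is made up):

java \
  -javaagent:opentelemetry-javaagent.jar \
  -Dotel.service.name=camelo \
  -Dotel.exporter.otlp.endpoint=http://my-collector.internal:4317 \
  -jar camelo.jar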
So at least we know we'll get something now. What else could we pick up with the OpenTelemetry agent? One way to check is to run this command:
java \
-Dotel.javaagent.debug=true \
-javaagent:opentelemetry-javaagent.jar \
-jar camelo.jar \
&> server-otel-startup.log
This creates a lot of output. Let's highlight a few things we've found:
grep -o 'Applying instrumentation: [^ ]*' server-otel-startup.log
Applying instrumentation: executors
Applying instrumentation: internal-lambda
Applying instrumentation: internal-reflection
Applying instrumentation: internal-class-loader
Applying instrumentation: internal-url-class-loader
Applying instrumentation: undertow
Applying instrumentation: logback-appender
Applying instrumentation: logback-mdc
Applying instrumentation: executors
Applying instrumentation: hikaricp
Applying instrumentation: jdbc
Applying instrumentation: java-util-logging
Applying instrumentation: internal-class-loader
It looks like we'll get some data from jdbc and hikaricp, so camelo probably uses a database. undertow is an HTTP server, and we've already seen that we're getting logs from something -- probably logback. Since these things are instrumented now, we can expect to pick up logs, traces and/or metrics from them. Cool!
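As an aside, if one of these instrumentations ever turns out to be too noisy, the agent lets you switch them off individually; the exact key names are listed in the agent's documentation on suppressing instrumentation. A sketch, assuming we wanted to drop the jdbc instrumentation:

java \
  -javaagent:opentelemetry-javaagent.jar \
  -Dotel.instrumentation.jdbc.enabled=false \
  -jar camelo.jar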
What is this tracing thing we keep hearing about?
If we modify the OpenTelemetry collector configuration, we can pick up only traces instead, so we can figure out what they are. We'll do that by setting the exporters in the logs: section of the configuration to [nop] and in the traces: section to [debug], then taking the collector down with docker compose down and running docker compose up again.
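Only the pipelines section of the configuration needs to change; it ends up looking like this:

service:
  pipelines:
    traces:
      receivers: [ otlp ]
      exporters: [ debug ]
    metrics:
      receivers: [ otlp ]
      exporters: [ nop ]
    logs:
      receivers: [ otlp ]
      exporters: [ nop ]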
To receive any trace data, some event must start a trace. Hopefully, the undertow
instrumentation will take care of doing that for us, so let's try making a request
to the application by running:
curl http://localhost:8080
Givsgud!
Place orders at http://localhost:8080/order/:n (i.e. http://localhost:8080/order/3)
Check order inventory at http://localhost:8080/orders
Huh, what a strange message. Looks like this is some sort of system for placing orders? But look, the collector picked up something!
otel-collector | InstrumentationScope io.opentelemetry.undertow-1.4 2.12.0-alpha
otel-collector | Span #0
otel-collector | Trace ID : a7d3845d1c03bb597ac49df0d5efa035
otel-collector | Parent ID :
otel-collector | ID : 84a979344483557e
otel-collector | Name : GET
otel-collector | Kind : Server
otel-collector | Start time : 2025-02-10 16:58:20.907362 +0000 UTC
otel-collector | End time : 2025-02-10 16:58:20.90890325 +0000 UTC
otel-collector | Status code : Unset
otel-collector | Status message :
otel-collector | Attributes:
otel-collector | -> thread.id: Int(47)
otel-collector | -> http.request.method: Str(GET)
otel-collector | -> http.response.status_code: Int(200)
otel-collector | -> url.path: Str(/)
otel-collector | -> server.address: Str(localhost)
otel-collector | -> client.address: Str(127.0.0.1)
otel-collector | -> server.port: Int(8080)
otel-collector | -> network.peer.address: Str(127.0.0.1)
otel-collector | -> url.scheme: Str(http)
otel-collector | -> thread.name: Str(XNIO-1 I/O-5)
otel-collector | -> network.protocol.version: Str(1.1)
otel-collector | -> user_agent.original: Str(curl/8.7.1)
otel-collector | -> network.peer.port: Int(60690)
otel-collector | {"kind": "exporter", "data_type": "traces", "name": "debug"}
Notice how there's a Trace ID now. Any other span created within the same trace, for example when an HTTP request is made to a different system, will carry the same Trace ID, and the Span ID of its parent span will appear as its Parent ID. The OpenTelemetry agent will also instrument any HTTP clients it can find, so that the trace context can propagate correctly to other systems that may also send trace information. This is super useful for some kinds of architectures (looking at you, microservices).
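Concretely, the Java agent propagates this context in the W3C Trace Context format by default, so an outgoing HTTP request made while handling this one would carry a header along these lines (illustrated with the trace and span IDs from the output above; the last field is the sampling flag):

traceparent: 00-a7d3845d1c03bb597ac49df0d5efa035-84a979344483557e-01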
Traces are a lot like structured logs that allow nesting in a parent/child relationship. There's a lot of structured information associated with our trace: we can see where the client came from, the user agent string and the thread ids.
This is a good time to note that the debug exporter is very helpful if you want to use the collector to process incoming data, for example to remove attributes that could contain personally identifiable information -- maybe the client.address in this case?
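As a rough sketch, a processor that drops that attribute before export could look something like the following. It uses the attributes processor, which may or may not be included in the collector distribution you're running (the contrib distribution ships it), so treat it as a starting point rather than a drop-in config:

processors:
  attributes/scrub-pii:
    actions:
      - key: client.address
        action: delete

service:
  pipelines:
    traces:
      receivers: [ otlp ]
      processors: [ attributes/scrub-pii ]
      exporters: [ debug ]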
Checking out metrics
We've checked two of the three kinds of data that the OpenTelemetry agent can pick up for us. Let's check if there's any kind of metric data available by editing the collector configuration again, this time setting the traces exporters to [nop] and the metrics exporters to [debug]; the resulting pipelines section is sketched below. After a docker compose down followed by docker compose up, we wait and see if we get something...
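For reference, here is the pipelines section with only metrics routed to the debug exporter:

service:
  pipelines:
    traces:
      receivers: [ otlp ]
      exporters: [ nop ]
    metrics:
      receivers: [ otlp ]
      exporters: [ debug ]
    logs:
      receivers: [ otlp ]
      exporters: [ nop ]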
...
otel-collector | Metric #6
otel-collector | Descriptor:
otel-collector | -> Name: db.client.connections.create_time
otel-collector | -> Description: The time it took to create a new connection.
otel-collector | -> Unit: ms
otel-collector | -> DataType: Histogram
otel-collector | -> AggregationTemporality: Cumulative
otel-collector | HistogramDataPoints #0
otel-collector | Data point attributes:
otel-collector | -> pool.name: Str(HikariPool-1)
otel-collector | StartTimestamp: 2025-02-10 16:55:20.38978 +0000 UTC
otel-collector | Timestamp: 2025-02-10 17:11:56.454185 +0000 UTC
otel-collector | Count: 4
otel-collector | Sum: 2.000000
otel-collector | Min: 0.000000
otel-collector | Max: 1.000000
...
Oh crap, it's actually making database connections to something. Let's hope it's not production! Maybe it's best to stop it before we break something...
Is auto-instrumentation without code supported for other platforms?
Yes! If you want to get started with experimenting, there's an interesting collection of links to check out at zero code instrumentation.
At the time of writing, these runtimes have some sort of support for this:
- Go
- .NET
- Python
- PHP
- Java
- JavaScript (node.js)
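To give a flavour of what this looks like outside the JVM, here's roughly what the Python equivalent would be, sending to the same local collector (a sketch based on the OpenTelemetry Python zero-code docs; app.py stands in for whatever service you want to instrument):

pip install opentelemetry-distro opentelemetry-exporter-otlp
opentelemetry-bootstrap -a install
OTEL_SERVICE_NAME=camelo-py \
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317 \
opentelemetry-instrument python app.py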
Now what?
In the next post of this series, we'll expand our docker-compose setup with an observability solution that we can use to visualize logs, traces and metrics, so we can figure out what camelo.jar is actually doing. Stay tuned!