Observability Lessons From OpenAI

Writing code is moving out of the good old IDE and into the realm of autonomous AI agents. OpenAI is one example: it has been developing internally with zero lines of manually written code. You can read about the workflow on their engineering blog: “Harness engineering: leveraging Codex in an agent-first world.”
For me, the main takeaway of OpenAI’s article is that AI has rewritten the constraints equation. An AI agent such as Codex can run for hours without human supervision, generating code faster than any developer or QA specialist can keep up with. Human supervision is now the bottleneck.
If humans cannot review code fast enough, we need to automate testing and QA as much as possible. That is what a well-designed harness does.
In the context of AI agents, the harness is all the “scaffolding” that wraps around an LLM to make it functional, durable, and capable of autonomous, multi-step actions. What OpenAI did was augment their harness with the VictoriaMetrics Observability Stack, giving agents access to metrics, logs, and traces.

Logs, metrics, and traces are exposed to Codex via a local observability stack that’s ephemeral for any given worktree. […] Agents can query logs with LogsQL and metrics with PromQL. With this context available, prompts like “ensure service startup completes in under 800ms” or “no span in these four critical user journeys exceeds two seconds” become tractable. Source: https://openai.com/index/harness-engineering/
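To make a prompt like “ensure service startup completes in under 800ms” checkable, the harness needs a step that turns a PromQL answer into pass/fail. Below is a minimal Python sketch, assuming the VictoriaMetrics container from the compose file in this post is reachable on localhost:8428 and that the service exports a startup-duration gauge. The metric name `app_startup_duration_ms` and all helper names are hypothetical. It relies only on the Prometheus-compatible `/api/v1/query` endpoint and the standard library.

```python
import json
import sys
import urllib.parse
import urllib.request
from typing import Dict, List, Tuple

VM_URL = "http://localhost:8428"  # VictoriaMetrics port published by the compose file

def query_vm(promql: str, base_url: str = VM_URL) -> dict:
    """Run an instant query against the Prometheus-compatible /api/v1/query endpoint."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)

def violations(response: dict, budget: float) -> List[Tuple[Dict[str, str], float]]:
    """Return (labels, value) for every series above the budget (assumes a vector result)."""
    if response.get("status") != "success":
        raise RuntimeError(f"query failed: {response}")
    found = []
    for series in response["data"]["result"]:
        value = float(series["value"][1])  # instant vectors carry [timestamp, "value"]
        if value > budget:
            found.append((series["metric"], value))
    return found

def check_budget(promql: str, budget: float, base_url: str = VM_URL) -> int:
    """Return an exit code: 0 within budget, 1 over budget or when there is no data."""
    response = query_vm(promql, base_url)
    bad = violations(response, budget)  # raises if the query itself failed
    if not response["data"]["result"]:
        print(f"no data for {promql!r}; is the service instrumented?", file=sys.stderr)
        return 1
    for labels, value in bad:
        print(f"over budget: {labels} = {value:.1f} (budget {budget})", file=sys.stderr)
    return 1 if bad else 0
```

A harness script could end with `sys.exit(check_budget("max(app_startup_duration_ms)", 800))`. Treating an empty result as a failure is a deliberate choice: it keeps a service that forgot to emit the metric from passing silently.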
If we provide the observability infrastructure, we can record metrics, logs, and traces during testing, allowing AI agents to run benchmarks and iterate on improvements.
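Traces recorded during a test run make the second kind of prompt checkable too (“no span exceeds two seconds”). A rough sketch, assuming VictoriaTraces’ Jaeger-compatible query API (the same `/select/jaeger/api/traces` endpoint used in the development loop in this post) returns the usual Jaeger JSON shape, where each span’s `duration` is in microseconds. The function names and the `otel-example` default are illustrative.

```python
import json
import urllib.parse
import urllib.request
from typing import List, Tuple

TRACES_URL = "http://localhost:10428/select/jaeger/api/traces"  # VictoriaTraces, per the compose file

def fetch_traces(service: str = "otel-example", url: str = TRACES_URL) -> dict:
    """Fetch recent traces for one service through the Jaeger-compatible API."""
    query = urllib.parse.urlencode({"service": service})
    with urllib.request.urlopen(f"{url}?{query}", timeout=10) as resp:
        return json.load(resp)

def slow_spans(jaeger_response: dict, max_seconds: float = 2.0) -> List[Tuple[str, str, float]]:
    """Return (traceID, operationName, seconds) for every span longer than max_seconds."""
    limit_us = max_seconds * 1_000_000  # Jaeger JSON reports durations in microseconds
    found = []
    for trace in jaeger_response.get("data") or []:
        for span in trace.get("spans") or []:
            duration_us = span.get("duration", 0)
            if duration_us > limit_us:
                found.append((trace.get("traceID", ""), span.get("operationName", ""), duration_us / 1_000_000))
    return found
```

`slow_spans(fetch_traces())` returns an empty list when every span stays under two seconds; an agent can treat a non-empty list as a failed benchmark and use the operation names to decide where to look next. Narrowing the check to specific user journeys would mean filtering on `operationName`, which depends on how your spans are named.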
We don’t know how OpenAI has implemented observability for its Codex agents; that part isn’t covered in the article. We can, however, reproduce the setup for local development with Docker.
All you need is a Docker Compose file. We can take the docker-compose.yml that Alexander used in his post “AI Agents Observability with OpenTelemetry and the VictoriaMetrics Stack” as a starting point.
In this case, I just removed the volumes, since we don’t need persistent storage: the telemetry is deleted when the containers are removed (for example, by `docker compose down`).
```yaml
# docker-compose.yml
services:
  otel-collector:
    image: ghcr.io/open-telemetry/opentelemetry-collector-releases/opentelemetry-collector-contrib:0.133.0
    command: [ "--config=/etc/otel-collector-config.yml" ]
    ports: [ "4317:4317", "4318:4318" ]
    configs: [ { source: "otel-collector-config", target: "/etc/otel-collector-config.yml", mode: 0444 } ]
  victoriametrics:
    image: victoriametrics/victoria-metrics:v1.130.0
    ports: [ "8428:8428" ]
    command: [ "--storageDataPath=/storage", "--opentelemetry.usePrometheusNaming=true" ]
  victorialogs:
    image: victoriametrics/victoria-logs:v1.47.0
    ports: [ "9428:9428" ]
    command: [ "--storageDataPath=/vlogs" ]
  victoriatraces:
    image: victoriametrics/victoria-traces:v0.7.1
    ports: [ "10428:10428" ]
    command: [ "--storageDataPath=/vtraces", "--servicegraph.enableTask=true" ]
configs:
  otel-collector-config:
    content: |
      receivers:
        otlp:
          protocols:
            http:
              endpoint: "otel-collector:4318"
              cors:
                allowed_origins: [ "http://*", "https://*" ]
      exporters:
        otlphttp/victoriametrics:
          metrics_endpoint: "http://victoriametrics:8428/opentelemetry/v1/metrics"
          tls:
            insecure: true
        otlphttp/victorialogs:
          logs_endpoint: "http://victorialogs:9428/insert/opentelemetry/v1/logs"
          tls:
            insecure: true
        otlphttp/victoriatraces:
          traces_endpoint: "http://victoriatraces:10428/insert/opentelemetry/v1/traces"
          tls:
            insecure: true
      service:
        pipelines:
          traces: { receivers: [ otlp ], exporters: [ otlphttp/victoriatraces ] }
          metrics: { receivers: [ otlp ], exporters: [ otlphttp/victoriametrics ] }
          logs: { receivers: [ otlp ], exporters: [ otlphttp/victorialogs ] }
```
Here, we’re using four components for observability. VictoriaMetrics, VictoriaLogs, and VictoriaTraces store metrics, logs, and traces, respectively, while the OpenTelemetry Collector receives OTLP data from the application over HTTP on port 4318 and routes each signal to the matching backend.
With the VictoriaMetrics Observability Stack in place, the development loop looks like this:
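To confirm the routing works before wiring up a real SDK, you can hand-craft a single OTLP/JSON log record and post it to the collector’s HTTP receiver. This is a smoke-test sketch, assuming the collector’s port 4318 is published on localhost as in the compose file; the helper names, the `harness-smoke-test` scope name, and the `otel-example` service name are illustrative. Real applications would normally emit telemetry through an OpenTelemetry SDK rather than building payloads by hand.

```python
import json
import time
import urllib.request
from typing import Optional

COLLECTOR_LOGS_URL = "http://localhost:4318/v1/logs"  # OTLP/HTTP logs path on the collector

def build_log_payload(message: str, service: str = "otel-example", now_ns: Optional[int] = None) -> dict:
    """Build a minimal OTLP/JSON logs request containing one INFO record."""
    # 64-bit integers such as timestamps may be encoded as JSON strings in OTLP/JSON.
    ts = str(now_ns if now_ns is not None else time.time_ns())
    return {
        "resourceLogs": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service}},
            ]},
            "scopeLogs": [{
                "scope": {"name": "harness-smoke-test"},
                "logRecords": [{
                    "timeUnixNano": ts,
                    "severityNumber": 9,  # SEVERITY_NUMBER_INFO
                    "severityText": "INFO",
                    "body": {"stringValue": message},
                }],
            }],
        }]
    }

def send_log(payload: dict, url: str = COLLECTOR_LOGS_URL) -> int:
    """POST the payload to the collector and return the HTTP status code."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status
```

With the stack running, `send_log(build_log_payload("hello from the harness"))` should return 200 once the collector accepts the request, and the record should then show up in a VictoriaLogs query for `*`.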
```shell
docker compose up -d   # start the stack in the background
# ...run the instrumented app or test suite...
curl -sG 'http://localhost:8428/api/v1/query?query=app_latency_ms_sum'
curl -sG 'http://localhost:9428/select/logsql/query?query=*'
curl -sG 'http://localhost:10428/select/jaeger/api/traces?service=otel-example'
docker compose down -v   # tear everything down; the telemetry goes with the containers
```

You can delegate all of this to your AI agent. In my case, Codex dumped everything into a single JSON file and used it as context for the next iteration. It even generated screenshots for human review.
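We don’t know exactly how Codex assembled its JSON file, but a harness step that does something similar is short. The sketch below, assuming the ports from the compose file and reusing the three queries from the development loop, collects the responses into one file an agent can read as context. The file name `telemetry-snapshot.json`, the `limit` value, and the function names are hypothetical. One detail worth knowing: VictoriaLogs streams query results as one JSON object per line rather than a single document, so the logs are parsed line by line.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List

# Same three queries as the development loop; ports come from the compose file.
ENDPOINTS = {
    "metrics": ("http://localhost:8428/api/v1/query", {"query": "app_latency_ms_sum"}),
    "logs": ("http://localhost:9428/select/logsql/query", {"query": "*", "limit": "100"}),
    "traces": ("http://localhost:10428/select/jaeger/api/traces", {"service": "otel-example"}),
}

def fetch(url: str, params: Dict[str, str]) -> str:
    """GET url with URL-encoded params and return the body as text."""
    with urllib.request.urlopen(f"{url}?{urllib.parse.urlencode(params)}", timeout=10) as resp:
        return resp.read().decode("utf-8")

def parse_json_lines(text: str) -> List[dict]:
    """Parse newline-delimited JSON, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

def build_snapshot(raw: Dict[str, str]) -> dict:
    """Decode the raw responses into one JSON-serializable structure."""
    return {
        "metrics": json.loads(raw["metrics"]),
        "logs": parse_json_lines(raw["logs"]),
        "traces": json.loads(raw["traces"]),
    }

def write_snapshot(path: str = "telemetry-snapshot.json") -> None:
    """Query all three backends and write a single file for the agent's next iteration."""
    raw = {name: fetch(url, params) for name, (url, params) in ENDPOINTS.items()}
    with open(path, "w", encoding="utf-8") as f:
        json.dump(build_snapshot(raw), f, indent=2)
```

Call `write_snapshot()` after the test run and before `docker compose down -v`, since the data disappears with the containers.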

The beauty of this setup is that you don’t need to rip out the instrumentation once your code ships. You can leave it in place, point the OTLP endpoint at your production observability stack, and gain visibility into your deployed application.
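Switching from the local stack to production is a configuration change rather than a code change, because OpenTelemetry SDKs read the exporter endpoint from standard environment variables. The function below is a simplified, illustrative reading of the OTLP/HTTP rules in the OpenTelemetry specification, not something you need to write yourself: a signal-specific variable such as `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT` is used as-is; otherwise `/v1/<signal>` is appended to `OTEL_EXPORTER_OTLP_ENDPOINT`, which defaults to `http://localhost:4318`, exactly where the local collector listens. The function name and the production hostname used below are placeholders.

```python
import os
from typing import Mapping, Optional

# Paths the OTLP/HTTP exporters append to the base endpoint, per signal.
SIGNAL_PATHS = {"traces": "/v1/traces", "metrics": "/v1/metrics", "logs": "/v1/logs"}
DEFAULT_BASE = "http://localhost:4318"  # specification default for OTLP/HTTP

def otlp_http_endpoint(signal: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Resolve the URL an OTLP/HTTP exporter would send `signal` to (simplified)."""
    env = os.environ if env is None else env
    specific = env.get(f"OTEL_EXPORTER_OTLP_{signal.upper()}_ENDPOINT")
    if specific:
        return specific  # signal-specific endpoints are used verbatim
    base = env.get("OTEL_EXPORTER_OTLP_ENDPOINT") or DEFAULT_BASE
    return base.rstrip("/") + SIGNAL_PATHS[signal]
```

Setting `OTEL_EXPORTER_OTLP_ENDPOINT=https://otel.example.com` in production would send traces to `https://otel.example.com/v1/traces` without touching the instrumentation code.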
Adding telemetry to the AI context opens new possibilities, to name a few: