10/7/2024

Aurora Workspace

A developer console that unifies logs, traces, and build artifacts in one workspace.

Tagsplatformtoolingobservability

Problem space

Teams were juggling tickets, CI output, and telemetry across too many tabs. We needed a single console where code, traces, and notes could live together without losing momentum.

The existing stack forced context switches: terminals for logs, dashboards for metrics, and documents for runbooks. Each handoff added latency and made incident response feel like assembly by committee. We scoped the work to a unified surface that could keep fast feedback loops intact while still capturing durable artifacts.

Field notes

We audited three quarters of incident retrospectives and cataloged every tooling handoff. The highest friction came from triage hops: switching between observability, chat, and task trackers while a timeline was still forming. We wrote a checklist of what needed to be visible in a single frame before an incident commander could make a call.

Remedy

  • Built a split-pane workspace with timeline navigation, a persistent scratchpad, and keyboard-first command routing.
  • Modeled events as typed streams so telemetry can be replayed into the UI without rerunning jobs.
  • Added a timeline diff view that compares deploys across commits, tying build output to trace spikes.
  • Shipped a local-only sandbox mode so engineers can rehearse incident drills with synthetic data.

Data model

Every event is normalized to a single schema, then fanned out to the UI, export pipeline, and audit log. This kept the experience deterministic even when data arrived out of order from third-party integrations.

Implementation note

type WorkspaceEvent =
  | { type: 'build.completed'; commit: string; durationMs: number }
  | { type: 'trace.ready'; traceId: string; spanCount: number }
  | { type: 'note.saved'; author: string; sha: string }
const session = {
  id: crypto.randomUUID(),
  mode: 'live' as const,
  stream: new EventSource('/api/workspace/stream')
}

Rollout

We staged the launch by team and feature flag. The initial cohort focused on incident response, then we widened access to delivery managers once export artifacts were stable.

Outcome

The console reduced time-to-fix by 38% and became the default hub during incident response.

Post-launch telemetry showed a 52% drop in tab-switching during on-call shifts and a measurable increase in runbook compliance. The team now exports structured incident packets directly from the timeline, shortening handoffs between engineers and support.