User Guide

This guide covers everything you need to manage your infrastructure with Odysseus. The platform is fully managed -- the control plane runs on Delta Telematics infrastructure. You interact with it through the dashboard.

Accessing Odysseus

The dashboard is available at odysseus.delta-telematics.dev in any modern browser. It provides visual management of nodes, deployments, containers, logs, incidents, CVE scan results, and the Athena AI chat panel. No installation required -- just sign in.

Odysseus dashboard showing deployment grid with container counts, CPU, memory metrics per service

Managing Nodes

Nodes are the servers where your containers run. The Odysseus agent runs on each node and communicates with the hosted control plane over an encrypted WireGuard VPN tunnel.

Enrolling a Node

From the dashboard, navigate to Nodes > Add Node to get a fresh install command with an enrollment token. Run it on the target server:

curl -fsSL https://odysseus.delta-telematics.dev/install | sudo bash -s -- --token YOUR_ENROLLMENT_TOKEN

The agent installs as a systemd service, establishes the VPN tunnel, and registers with the control plane. The node appears in the dashboard within seconds.

Node Health Monitoring

The dashboard displays real-time health for every node:

  • Status indicator: Green (healthy), yellow (degraded), red (unreachable).
  • Resource usage: CPU, memory, and disk utilization with historical graphs.
  • Container count: Number of running containers and their aggregate resource consumption.
  • Agent version: Current agent version with upgrade status.

Multi-Node Clusters

Enroll multiple nodes to form a cluster. Odysseus schedules containers across nodes based on available resources and placement constraints. All inter-node communication uses the encrypted WireGuard VPN mesh -- no additional network configuration required.

Agent Auto-Upgrades

The agent upgrades itself automatically when a new version is available. The upgrade process is zero-downtime: the new version starts, verifies health, and takes over. If the new version fails health checks, the agent rolls back to the previous version automatically. No manual intervention required.

Deployments

A deployment defines a containerized application: its image, replica count, resource limits, ports, environment variables, and health checks. Odysseus manages the full lifecycle from creation through scaling and updates.

Creating a Deployment

Click New Deployment in the dashboard and fill in the form -- name, image, replicas, ports, environment variables, and resource limits. Click Deploy.

Deployment Lifecycle

When you create or update a deployment, it moves through these states:

  1. Scheduling -- The control plane selects nodes based on available resources and placement rules.
  2. Starting -- The agent on each selected node pulls the image and starts containers.
  3. Running -- Containers are healthy and serving traffic. Health checks are monitored continuously.

If a container fails a health check or crashes, Odysseus automatically restarts it. If a node goes offline, containers are rescheduled to healthy nodes.

Updating a Deployment

Change the image version, resource limits, environment variables, or any other configuration from the deployment settings page. Odysseus performs a rolling update by default -- new containers start before old ones stop, maintaining availability throughout.

Rollback

Every update creates a new revision. Roll back to any previous revision instantly from the deployment's revision history in the dashboard.

Rollbacks are instant -- Odysseus redeploys the containers from the selected revision's configuration. No image rebuild required.

Or ask Athena: "Deploy redis with 3 replicas" / "Update my-api to image v2.1.0" / "Rollback my-api to previous version"

Deployment History

The dashboard shows a complete revision timeline for every deployment: who changed what, when, and the resulting container state. Filter by date range, author, or change type.

Autoscaling

Autoscaling adjusts the number of replicas for a deployment based on real-time resource metrics. Prometheus metrics drive the scaling decisions.

Autoscaling policy configuration with min/max replicas, target CPU, and cooldown settings

Configuration

Set autoscaling parameters per deployment:

  • Min replicas: The floor. Odysseus never scales below this count.
  • Max replicas: The ceiling. Odysseus never scales above this count.
  • CPU target: The target average CPU utilization percentage across all replicas.

Configure autoscaling in the dashboard under Deployment > Settings > Autoscaling.

How It Works

The autoscaler evaluates metrics every 15 seconds:

  1. Collects average CPU utilization from Prometheus for all running replicas.
  2. Compares to the configured target.
  3. If utilization exceeds the target, scales up. If below, scales down (with a cooldown period to prevent flapping).
  4. Respects min/max boundaries at all times.

Manual Override

You can manually scale a deployment at any time from the dashboard, even when autoscaling is enabled. A manual scale sets the replica count immediately. The autoscaler resumes normal operation after the cooldown period.

Or ask Athena: "Scale web-frontend to 8 replicas" / "Show autoscaling status for my-api"

Canary Deployments

Canary deployments let you roll out a new version to a small percentage of traffic, validate its behavior, and then promote it to full production or roll back with zero downtime.

Canary detail view with 75% stable / 25% canary traffic distribution and Promote, Rollback, Pause actions

Starting a Canary

Specify the new image and the initial traffic weight (percentage routed to the new version). In the dashboard, click Canary Release on any deployment to open the canary wizard.

Monitoring the Canary

The dashboard shows a real-time comparison between the stable and canary versions:

  • Error rate: Percentage of failed requests per version.
  • Latency: p50, p95, and p99 response times per version.
  • Resource usage: CPU and memory consumption per version.
  • Traffic split: Current weight distribution with a visual bar.

Adjusting Traffic Weight

Gradually increase the canary's traffic share as confidence grows using the weight slider in the canary dashboard.

Automatic Promotion and Rollback

Configure health thresholds to let Odysseus decide automatically:

  • Auto-promote: If error rate stays below threshold and latency stays within bounds for a configured duration, promote the canary to 100%.
  • Auto-rollback: If error rate exceeds the threshold at any point, roll back the canary immediately and restore 100% traffic to the stable version.

Configure these thresholds in the canary wizard when starting a new canary deployment.

Manual Controls

You always retain manual control, even when auto-rules are configured. Use the Promote and Rollback buttons in the canary dashboard to promote to production or roll back instantly.

Or ask Athena: "Start canary for my-api with 10% traffic to v2.0" / "Promote canary for my-api" / "Rollback canary for my-api"

Athena AI Assistant

Athena is an AI assistant built into the Odysseus dashboard. It understands your infrastructure and can query status, diagnose problems, and execute operations using natural language.

Athena AI assistant interface showing natural language container orchestration

Opening Athena

Click the Athena icon in the bottom-right corner of the dashboard to open the chat panel. It slides out on the right side of the screen. Athena opens with context about the page you are currently viewing -- if you are looking at a specific deployment, Athena already knows which deployment you mean.

What Athena Can Do

Athena has access to 61 operation tools organized across every platform capability:

Category Example Commands
Deployments "Scale web-api to 8 replicas", "Update web-api to v3.1", "Roll back web-api"
Containers "Show logs for web-api", "Why is redis restarting?", "List containers on node-2"
Canary "Start a canary for web-api with 10% traffic", "Promote the canary"
Cluster "Show node health", "Which node has the most free memory?"
Configuration "Set environment variable DB_HOST on web-api"
Backups "Create a backup of the postgres volume", "List recent backups"
Networks "Create a network called backend-net", "Attach web-api to backend-net"
Volumes "Create a 50GB volume", "Show volume usage"
Audit "Who scaled web-api yesterday?", "Show audit log for the last hour"
SRE "Show active incidents", "What caused the last outage?"

RBAC-Scoped Access

Athena respects your RBAC role. The tools available depend on your permissions:

  • Admin: Full access to all 61 tools, including destructive operations.
  • Operator: Operational tools (scale, restart, canary, backups) but no RBAC or tenant management.
  • Developer: Deploy, read logs, view status. No infrastructure changes.
  • Read-only: Query and view only. No modifications.

Safety Guardrails

Athena includes built-in safety features:

  • Confirmation prompts: Destructive actions (delete, stop, rollback) require explicit confirmation before execution.
  • Rate limiting: Prevents runaway automation from rapid-fire commands.
  • Secret redaction: Sensitive values are never displayed in chat output.
  • Audit trail: Every action Athena takes is recorded in the audit log, attributed to your user account.

CVE Scanner

The built-in CVE scanner checks container images for known vulnerabilities before and after deployment. It uses dual scanning engines for defense in depth.

Scanning Images

Scan an image on demand from the dashboard. Click Scan on any deployment or navigate to Security > CVE Scanner to scan an image by name.

Dual-Engine Scanning

Odysseus runs two independent scanning engines:

  • Trivy (primary) -- Comprehensive vulnerability database with OS package and application dependency scanning.
  • Grype (secondary) -- Cross-references findings for defense in depth and catches vulnerabilities one engine may miss.

Results from both engines are merged and deduplicated in the scan report.

Security Policies

Define policies that gate deployments based on scan results:

  • Block on critical: Prevent deployment of images with any critical-severity CVE.
  • Block on high: Prevent deployment of images with critical or high-severity CVEs.
  • Warn only: Allow deployment but flag vulnerabilities in the dashboard.

Policies are configured per organization under Settings > Security Policies.

Patch Recommendations

For each vulnerability, the scanner provides:

  • CVE identifier and severity (Critical, High, Medium, Low).
  • Affected package and installed version.
  • Fixed version (when available) with upgrade instructions.
  • Links to the full advisory.

Scheduled Scanning

Enable periodic scanning to catch newly disclosed vulnerabilities in already-deployed images. Configure the scan interval (e.g., daily, weekly) under Settings > Security Policies > Scheduled Scans. Results appear in the dashboard with notifications for new findings.

Or ask Athena: "Scan nginx:1.25 for vulnerabilities" / "Show latest scan results"

SRE Automation

Odysseus continuously monitors your workloads and automatically detects and responds to infrastructure incidents.

Automatic Incident Detection

The platform detects these conditions automatically:

  • Container crashes: Unexpected exits with non-zero exit codes.
  • OOM kills: Containers terminated by the kernel for exceeding memory limits.
  • Restart loops: Containers restarting repeatedly within a short window.
  • Health check failures: Containers failing their configured health endpoint.
  • High CPU/memory: Sustained resource utilization above configured thresholds.

Auto-Remediation

When an incident is detected, Odysseus can take automatic corrective action based on your remediation policies:

  • Restart: Restart the affected container.
  • Scale up: Add replicas to distribute load.
  • Scale down: Remove excess replicas when load subsides.
  • Update resources: Adjust CPU or memory limits to match actual usage.
  • Stop: Halt a container that is causing cascading failures.
  • Run diagnostics: Execute predefined diagnostic commands and capture output for analysis.

Incident Dashboard

The Incidents section in the dashboard provides:

  • Active incidents with severity level: Critical, High, Medium, Low.
  • Timeline of detection, analysis, and remediation actions taken.
  • Root cause summary when available.
  • Full incident history with search and filtering.

Configurable Policies

Customize detection thresholds and remediation actions per deployment under Deployment > Settings > SRE Policies. For example, you can set a restart-loop threshold of 5 restarts in 10 minutes, or a CPU alert threshold of 90% sustained for 5 minutes.

Or ask Athena: "Show active incidents" / "What's wrong with redis-cache?"

Secrets Management

Odysseus provides enterprise-grade secrets management with Vault integration. Secrets are stored encrypted and injected into containers securely.

Storing Secrets

In the dashboard, navigate to Secrets to create and manage secrets. You can store key-value pairs or upload files (e.g., certificates).

Secure Injection

Secrets are injected into containers via tmpfs (RAM-backed filesystem). This means:

  • Secrets are never written to disk on the host or inside the container.
  • Secrets are never exposed in environment variables, preventing accidental leaks in logs, crash dumps, or process listings.
  • Secrets are mounted as files at a path you configure (e.g., /run/secrets/database-password).

Automatic Rotation

Enable automatic rotation on any secret. When a secret is rotated:

  1. The new value is written to the Vault backend.
  2. Containers using the secret are restarted with the new value.
  3. Restarts are performed as rolling updates -- zero downtime.

Access Policies

Secrets follow the principle of least privilege. Each secret has an access policy defining which deployments can read it. A deployment can only access secrets explicitly granted to it -- no ambient credentials.

Or ask Athena: "List secrets for my-api"

RBAC

Role-Based Access Control governs who can do what on the platform. Every user is assigned one of four roles.

Roles

Role Permissions
Admin Full control: manage users, roles, billing, organization settings, all deployments, all nodes, all secrets. Delete operations and RBAC changes.
Operator Operational actions: scale, restart, canary, backups, incident management. Cannot manage users or organization settings.
Developer Deploy applications, read logs, view metrics, manage secrets assigned to their deployments. Cannot modify infrastructure or other teams' deployments.
Read-only View deployments, metrics, logs, and audit records. No modifications.

Authentication Methods

  • JWT tokens: Short-lived tokens issued at login. Used by the dashboard. Automatically refreshed.
  • mTLS: Mutual TLS for inter-component communication between the control plane and agents. Managed automatically -- no user configuration required.

Managing Users

Admins manage team members under Settings > Team Members. Invite users by email, assign a role, and revoke access at any time. All role changes are recorded in the audit log.

Volumes and Networks

Persistent Volumes

Create and manage persistent volumes for stateful workloads (databases, file storage, caches) from the dashboard under Volumes. Specify a name and size, then attach the volume to a deployment in the deployment configuration.

Volumes persist across container restarts and redeployments. Data is stored on the node where the volume was created.

Or ask Athena: "List volumes" / "Create a volume called my-data"

Networks

Create isolated Docker networks and attach containers to control communication between services. Manage networks from the dashboard under Networks.

Containers on the same network can communicate using their deployment name as a hostname. Containers on different networks are isolated from each other.

Or ask Athena: "List networks" / "Show network details for app-network"

Audit Logging

Every action on the Odysseus platform produces a tamper-evident audit record. Audit logging is always on and cannot be disabled.

Events timeline showing deployment updates, canary creation, container lifecycle events

What Is Recorded

Each audit entry captures:

  • Who: The user (or API key, or Athena on behalf of a user) that performed the action.
  • What: The operation performed (e.g., scale deployment, create secret, promote canary).
  • When: Timestamp with millisecond precision.
  • Outcome: Success or failure, with error details when applicable.
  • Context: The affected resource (deployment name, node ID, secret name, etc.).

Querying the Audit Log

Navigate to Audit Log in the dashboard for a searchable, filterable view. Filter by user, action type, resource, date range, or outcome.

Compliance Reporting

Audit logs support compliance requirements for:

  • SOC 2: Demonstrates access controls, change management, and monitoring.
  • GDPR: Tracks data access and modifications for data subject requests.

Export audit logs in JSON format for integration with your compliance tooling or SIEM system.

Or ask Athena: "Show audit log for today" / "Who scaled my-api yesterday?"
Tamper-evident: Audit records are append-only and integrity-protected. Records cannot be modified or deleted after creation, ensuring a trustworthy chain of evidence.