GenAI Engineering Leader

Hire and build GenAI engineering teams, design team structures for GenAI, set engineering quality frameworks.

7 skill groups9 courses1063 goals~492 hrs

Verifiable skill graph

7 skill groups · each becomes a signed node on your graph.

Every lab you pass signs a W3C Verifiable Credential on your public skill graph. Completing the labs in each group below mints one node on that graph — the badge you walk away with is a cryptographic record of what you can ship, not a completion certificate.

Share the URL on your résumé or with a hiring manager. They click; they see the discipline, the labs you passed, and the verification signature. No honor system, no broker.

Tech Depth: Agents + Ops + Safety

Retained hands-on technical depth: agent architectures, production ops, eval methodology, safety failure modes. The technical credibility to lead a GenAI org (NOT a proxy for management).

Incident Management & Reliability

Incident response, on-call, reliability engineering, post-mortems, SLO/error-budget operation for LLM systems. Lab-provable.

Eval Harness Execution & CI Gates

The execution half of eval-driven development: build eval sets, wire CI gates, interpret pass/fail, catch regressions. (Authoring the org eval STRATEGY is judgment — reserved for the portfolio track.)

Quality Engineering for Non-Deterministic Systems

Quality-engineering tooling for non-deterministic systems: output-schema validation, LLM-judge gates, flake quarantine, prompt/version diffing, regression guards. The lab-able gates/tooling slice.

Cost & FinOps Instrumentation

The FinOps-engineering face of cost management: token/cost telemetry, budget alerts, caching/routing/model-tiering tradeoffs, capacity forecasting. (Vendor negotiation/selection reserved for the portfolio track.)

Hosted LLM API Integration

Provider SDK literacy: OpenAI/Anthropic/Gemini integration, multi-provider abstraction. Prerequisite.

Python for GenAI Engineering

Production-grade Python: async, typing, packaging, tooling. Prerequisite.

What you'll ship in production

Core responsibilities this discipline prepares you for.

1
Hire and build GenAI engineering teams
- Define GenAI-specific hiring criteria and design technical interviews for LLM and agent engineering roles
- Build skill assessment frameworks and team composition strategies balancing generalist and specialist profiles
- Write job descriptions, design interview rubrics, and evaluate candidates against GenAI competency matrices
2
Define engineering processes
for GenAI development — eval-driven workflows
- Design GenAI-specific sprint planning with eval-driven development as the core feedback loop
- Define evaluation metrics before writing code and measure GenAI team velocity with non-deterministic outputs
- Build team workflows integrating Langfuse for evaluation tracking and Grafana for velocity metrics
3
Manage quality and team performance
for GenAI outputs
- Define GenAI quality metrics and SLA management frameworks for LLM system reliability
- Build team performance dashboards using Grafana with latency, quality, and throughput indicators
- Construct performance dashboards and define quality standards for GenAI engineering deliverables
4
Understand the technical stack
deeply enough to unblock teams
- Learn LLM fundamentals, LangGraph agent engineering patterns, and LiteLLM gateway operations
- Monitor production systems with Langfuse and Prometheus to review PRs and debug incidents
- Gain sufficient depth to make architecture calls, review designs, and unblock teams on technical decisions
5
Operate and budget for GenAI infrastructure
— FinOps and capacity
- Build LLM cost attribution dashboards with capacity planning and budget forecasting models
- Manage vendor relationships and optimize spend allocation across multiple LLM providers
- Construct FinOps dashboards, set team-level token budgets, and produce monthly cost reports for leadership
6
Design organization structure
for GenAI engineering teams
- Apply GenAI team topology patterns including on-call rotation design and knowledge sharing practices
- Evaluate embed-vs-centralize tradeoffs for GenAI engineering functions across the organization
- Design org structures for different company sizes with clear ownership boundaries and escalation paths
7
Drive technical strategy
— evaluate new tools and plan migrations
- Apply technology evaluation frameworks with structured criteria for GenAI tool and platform selection
- Build migration planning methodology and strategic roadmaps for technology transitions
- Evaluate new tools against defined criteria, build migration plans, and present strategy to leadership
8
Ensure responsible AI practices
across your team
- Design governance policies and safety review processes for GenAI system development and deployment
- Build compliance workflows and team-level responsible AI standards with enforcement mechanisms
- Create governance policies and integrate safety review checkpoints into the development lifecycle

Curriculum

9 courses · each builds on previous goals

15 goals unlocked for preview — click to read. Locked goals need a subscription.

CourseGoalsWeight

Python Essentials for Agent Builders623.1%

Your Dev Environment4

Navigate filesystem with terminal
Manage files from command line
Set up VS Code
Configure terminal in VS Code

Python, Git & Package Management6

Install and verify Python
Write hello world script
Use Python REPL
Initialize Git repository
Track changes with Git
Install packages with pip

Variables & Basic Types5

Create and name variables
Work with strings
Work with numbers
Work with booleans
Format with f-strings

Control Flow4

Make decisions with if/elif/else
Iterate with for loops
Repeat with while loops
Control loop execution

Functions5

Define and call functions
Use parameters
Return values
Document with docstrings
Understand scope

Modules & Imports4

Import standard library
Create custom modules
Understand Python path
Create packages

Lists & Tuples5

Create and access lists
Modify lists
Slice lists
Use list comprehensions
Work with tuples

Dictionaries & Sets5

Create and access dicts
Modify dictionaries
Iterate over dicts
Work with nested dicts
Use sets

Classes & Dataclasses5

Understand class basics
Create dataclasses
Add methods
Use default values
Basic inheritance

Files, JSON & Error Handling5

Read and write files
Work with JSON
Use pathlib
Handle exceptions
Create custom exceptions

Basic Testing4

Use assert statements
Create test functions
Run pytest
Test classes

Environment Variables & Configuration5

Understand environment variables
Use .env files
Load with python-dotenv
Handle missing variables
Organize configuration

Decorators & Context Managers5

Understand decorators
Write simple decorators
Use context managers
Write context managers
Combine patterns

LLM Foundations for Agent Builders654.5%

Generators & Iterators5

Understand iteration
Create generators
Use generator expressions
Build data pipelines
Use itertools

Async Programming Basics5

Understand async concepts
Write async functions
Run concurrent operations
Use async context managers
Handle async exceptions

Type Hints & Pydantic5

Add basic type hints
Use typing generics
Create Pydantic models
Validate API data
Configure Pydantic

Data Pipelines & Transformations5

Build functional pipelines
Work with tabular data
Transform data shapes
Process LLM data formats
Optimize for performance

HTTP Clients & httpx5

Make GET requests
Make POST requests
Use async httpx
Handle errors
Use sessions

Your First LLM Call5

Set up credentials
Install Gemini SDK
Make first API call
Parse response
Handle API errors

Sampling Parameters & Output Control5

Understand temperature
Use top-p sampling
Implement determinism
Control output length
Use structured output

Multi-Provider & Prompt Engineering5

Build provider abstraction
Structure conversations
Use few-shot prompting
Implement chain-of-thought
Build prompt templates

Function Calling Fundamentals5

Understand tool use concept
Define tool schemas
Make function calls
Handle tool responses
Compare provider patterns

Embeddings & Semantic Search5

Understand embeddings
Generate embeddings
Calculate similarity
Build simple search
Compare embedding models

RAG Fundamentals5

Understand RAG pattern
Chunk documents
Build retrieval pipeline
Compose RAG prompts
Evaluate RAG quality

Cost Awareness & Token Economics5

Understand pricing models
Calculate request costs
Compare provider costs
Identify cost drivers
Basic cost optimization

Retry Patterns with Tenacity5

Understand retry need
Use tenacity basics
Implement exponential backoff
Handle specific exceptions
Combine with async

Kubernetes Essentials for GenAI724.2%

Containerizing LLM Applications6

Write a Python app that calls the Gemini API and returns structured responses
Write a Dockerfile and build a container image for the LLM app
Run the containerized LLM app with environment-based configuration
Use Docker Compose to run the LLM app with supporting services
Tag images with semantic versions and push to a container registry
Debug containers with exec, logs, and inspect

Your Kubernetes Cluster & First LLM Pod6

Understand K8s architecture and connect to your vCluster
Deploy the LLM app as your first Kubernetes pod
Organize workloads with namespaces
Use labels and selectors to organize and query resources
Understand pod lifecycle and restart policies
Master kubectl debugging: exec, logs, describe, port-forward

Services & the LLM Chat Backend6

Create a ClusterIP service to expose the LLM chat API internally
Deploy a multi-tier LLM chat application
Compare service types: ClusterIP, NodePort, LoadBalancer
Master DNS-based service discovery in Kubernetes
Understand endpoints and traffic routing
Debug service connectivity problems

Deployments, Scaling & Rolling Updates6

Create a Deployment for the LLM chat API
Scale LLM app replicas to handle concurrent requests
Perform a rolling update with zero downtime
Roll back a broken deployment
Compare deployment strategies: RollingUpdate vs Recreate
Manage deployment lifecycle with kubectl rollout

ConfigMaps & Secrets for LLM Apps6

Create ConfigMaps for LLM app settings
Mount ConfigMaps as files for complex configuration
Store LLM proxy credentials securely in Secrets
Manage per-environment configuration for dev, staging, and prod
Handle configuration updates and rolling restarts
Debug configuration issues in LLM app pods

Persistent Storage & StatefulSets6

Create PersistentVolumeClaims for durable storage
Deploy PostgreSQL as a StatefulSet
Connect the LLM chat API to PostgreSQL for conversation persistence
Deploy Redis as a StatefulSet for LLM response caching
Understand StatefulSet scaling and ordering guarantees
Manage PVC lifecycle: expansion, snapshots, and cleanup

Multi-Container Pods: Sidecars & Init Containers6

Add an LLM proxy sidecar to the chat API pod
Use init containers for database setup and config loading
Share data between containers via emptyDir volumes
Implement the ambassador pattern for multi-model LLM routing
Add a logging and metrics sidecar to the LLM app
Debug multi-container pods

Resource Management & Cost Optimization6

Set resource requests and limits for the LLM chat API
Understand QoS classes and their impact on eviction
Enforce resource defaults with LimitRanges
Cap namespace resource usage with ResourceQuotas
Right-size LLM app containers based on actual usage
Diagnose OOMKilled and CPU throttling issues

Packaging with Helm & Kustomize6

Create a Helm chart for the LLM chat application
Parameterize the chart with values.yaml for each environment
Manage Helm release lifecycle: install, upgrade, rollback
Use Kustomize bases and overlays for the LLM app
Use Kustomize patches and generators
Compare Helm vs Kustomize for different deployment scenarios

Networking, Ingress & TLS6

Expose the LLM chat API via an Ingress resource
Add TLS to the Ingress for HTTPS access
Isolate services with NetworkPolicies
Configure Ingress annotations for production traffic
Understand K8s networking: pod IPs, CNI, and service routing
Debug networking and connectivity issues

Health Probes, Autoscaling & Self-Healing6

Add liveness and readiness probes to the LLM chat API
Configure startup probes for containers with slow initialization
Scale the chat API automatically with HPA based on CPU
Create PodDisruptionBudgets for safe maintenance
Implement health check patterns for LLM-dependent services
Combine autoscaling, probes, and PDBs for a resilient LLM service

RBAC, Security & K8s Troubleshooting6

Create RBAC roles for the LLM chat application
Enforce Pod Security Standards
Apply SecurityContext for defense in depth
Debug CrashLoopBackOff and OOMKilled failures
Use kubectl debug and ephemeral containers for live debugging
Troubleshoot LLM-specific issues: timeouts, proxy errors, stale connections

Web APIs for GenAI Engineers603.6%

FastAPI Fundamentals6

Create a FastAPI application with path operations
Define Pydantic request and response models
Implement dependency injection for shared resources
Build CRUD endpoints with proper HTTP semantics
Configure OpenAPI documentation with examples
Handle errors with custom exception handlers

Async Python for APIs6

Convert sync endpoints to async with proper await patterns
Implement background tasks for non-blocking operations
Execute concurrent API calls with asyncio.gather
Manage application lifecycle with lifespan handlers
Build async generators for streaming responses
Control concurrency with semaphores and throttling

Database Integration6

Configure SQLAlchemy async engine with connection pooling
Define ORM models with relationships and constraints
Create and manage database migrations with Alembic
Implement repository pattern for data access
Build transactional endpoints with session lifecycle
Implement filtering, sorting, and full-text search

Authentication & Authorization6

Implement user registration with password hashing
Build OAuth2 password flow with JWT tokens
Implement API key authentication for services
Enforce role-based access control with permissions
Build token refresh and revocation
Compose multiple auth strategies into dependencies

Real-time Streaming6

Build SSE endpoint for streaming LLM responses
Implement WebSocket endpoint with connection lifecycle
Build WebSocket connection manager for broadcasting
Handle backpressure and slow clients
Implement heartbeat and automatic reconnection
Build real-time notification system with Redis pub/sub

Resilience Patterns6

Implement rate limiting with Redis sliding window
Build circuit breaker for LLM provider calls
Configure retry logic with tenacity
Isolate critical paths with bulkhead semaphores
Build fallback responses for degraded mode
Combine resilience patterns into middleware stack

API Gateway & Routing6

Build reverse proxy with path-based routing
Implement load balancing across backend instances
Transform requests and responses through the gateway
Aggregate responses from multiple backends
Implement service discovery with health checking
Build gateway authentication and request enrichment

Testing & Documentation6

Write async endpoint tests with httpx.AsyncClient
Build database fixtures with transaction rollback
Mock external services for deterministic tests
Implement contract tests for API consumers
Measure test coverage and set quality gates
Generate rich OpenAPI documentation with examples

API Versioning & Evolution6

Implement URL-based API versioning with routers
Build header-based version negotiation
Manage deprecation with Sunset and Warning headers
Build request and response adapters for version translation
Detect breaking changes automatically
Generate API changelogs from schema diffs

Deployment & Observability6

Build production Docker images with multi-stage builds
Deploy to Kubernetes with health check probes
Instrument endpoints with Prometheus metrics
Implement distributed tracing with OpenTelemetry
Build structured logging with correlation IDs
Create Grafana dashboards for API monitoring

Agent Hosted Models29424.5%

The LLM Client7

OpenAI client setup
Anthropic client setup
Google Gemini client setup
Build a unified LLM client interface
Error handling and provider fallback
Async LLM client patterns
Practical use cases — security, parameters, observability

Token Economics7

Understand tokenization
Count tokens across providers
Cost forecasting and budgeting
Track LLM API usage in production
Implement budget controls
Optimize tokens
Advanced context engineering

Prompt Caching4

Implement Anthropic cache_control
Leverage OpenAI automatic caching
Design cache-friendly prompt architectures
Build cache monitoring systems

The Function Caller7

OpenAI function schemas
Anthropic function schemas
Gemini function schemas
Handle tool call responses
Execute tools safely with Pydantic validation
Handle parallel tool calls
Framework integration with LangGraph

The Tool Definer7

Write clear tool descriptions for LLMs
Define parameter schemas
Use Pydantic for tool schemas
Implement tool decorators
Handle complex parameter types
Validate tool inputs at runtime
Framework tool patterns — LangGraph, CrewAI, OpenAI, Gemini, Anthropic

The Raw Agent Loop7

The core agent while-loop
Manage context as a mutable list
Handle stop sequences
Track iteration limits
Tool execution in the loop
Build a conversation state tracker
Build with LangGraph StateGraph

The Prompt Engineer (Dynamic)6

Master Jinja2 templating for prompts
Implement dynamic few-shot example selection
Enforce Chain-of-Thought reasoning
Structure system prompts with a builder pattern
Inject dynamic context into prompts safely
Build prompt versioning and A/B testing

The ReAct Pattern (Manual)6

Build the Thought-Action generator
Tool execution and observation injection
Complete ReAct agent implementation
Advanced ReAct patterns — validation, retry, confidence
Optimize ReAct performance
Common ReAct pitfalls and solutions

The Planner Pattern7

Plan generation
Step execution
Dynamic replanning
Hierarchical planning
Plan optimization
Monitoring and observability
Practical considerations — strategy selection

The Pydantic Tool7

Pydantic fundamentals for tool definitions
Generate JSON Schema from Pydantic models
Input validation with custom validators
Build a Pydantic tool library
Advanced Pydantic patterns
Integrate Pydantic tools with agent frameworks
Common pitfalls and solutions

The Safe Executor (Sandboxing)5

Understand code execution risks
Static code analysis
Sandboxed execution
Apply resource limits
Build a complete safe executor

The Web Navigator5

Web navigation fundamentals
Web navigation tools — locating elements and forms
Browser automation with Playwright
Session management
Complete web navigator system

The MCP Protocol (Basics)4

JSON-RPC 2.0 message format and handler
Transport mechanisms — stdio and HTTP/SSE
Protocol lifecycle — initialization, runtime, shutdown
Capability negotiation

The MCP Server6

Create an MCP server with lifecycle management
Define MCP tools
Implement MCP resources
Create prompt templates
Error handling in MCP servers
Composable MCP server architecture

The MCP Client6

MCP client architecture and stdio transport
Discover available tools and translate schemas
Proxy tool invocation
Fetch and use MCP resources
Manage MCP server lifecycle
Build multi-server MCP clients

The Tool Router5

Tool routing architecture and implementation
Namespace-based routing
Capability-based routing
Fallback chains
Routing performance optimization

Short-Term Memory8

Sliding window memory
Token-aware memory management
Message summarization strategies
Memory persistence layers
Memory retrieval optimization
Integrate memory with agents
Memory performance considerations
Non-functional requirements (privacy + safety)

Long-Term Memory (RAG)6

Document chunking strategies
Embedding pipelines
Vector database integration
Hybrid search implementation
Retrieval optimization
RAG response generation

Agentic RAG Patterns5

Self-reflective RAG
Multi-hop retrieval
Query routing
Adaptive retrieval
Retrieval feedback loops

Semantic Memory6

Knowledge extraction pipelines
Entity and relationship extraction
Knowledge graph construction
Memory consolidation
Integrate semantic memory with agents
Build semantic memory with LangGraph

Context Optimizer6

Context economics
Dynamic context prioritization
Context compression techniques
Prompt optimization
Context utilization metrics
Complete context optimizer

The State Graph5

StateGraph fundamentals — config and lifecycle
Design state schemas with TypedDict
Add nodes to StateGraph
State initialization patterns
Tracing, debugging, validation

The Conditional Edge5

Understand conditional edges
Design routing functions
Fan-out and fan-in patterns
Handle unknown routes and errors
Multi-stage routing

The Checkpointer (Time Travel)4

Resumable workflows
Inspect, replay, and time-travel
Retention, large state, and performance
Thread management — IDs and namespaces

Human-in-the-Loop6

LangGraph interrupt patterns
Approval workflow patterns
Interactive agent conversations
Feedback integration
State management for HITL
Practical use cases — escalation and analytics

The Streaming Agent6

Streaming modes in LangGraph
Token streaming from LLMs
Custom events with `astream_events`
Build streaming APIs
Error handling in streams
Backpressure and flow control

The Subgraph (Composition)7

Subgraph fundamentals — compile + test in isolation
State schema mapping
Subgraph checkpointers + namespace isolation
Compose subgraphs into a parent
Catch subgraph exceptions and recover
Define subgraph interfaces and build a registry
Build a multi-agent orchestrator

The Supervisor Pattern7

Design supervisor architectures
Worker agent specialization
Build the complete supervisor graph
Manage inter-agent communication
Handle failures and edge cases
Implement task aggregation
Build the supervisor pattern with CrewAI

The Hierarchical Pattern4

Design hierarchical agent architectures
Implement team-lead agents
Build cross-team coordination
Build the complete hierarchical graph

The Reflector Pattern (Critique)6

Design reflection architectures
Implement critic agents
Build the evaluation and convergence system
Build the complete reflection graph
Handle reflection edge cases
Practical use cases for reflection

Input Guardrails6

Design layered guardrail architectures
Format and schema validation
Build content filtering systems
Create injection / jailbreak detection
Implement policy-based guardrails
Assemble the complete guardrail system

Output Guardrails6

Design output validation architectures
Implement factual validation (hallucination detection)
Build content safety filters
Create PII redaction
Implement policy compliance
Assemble the complete output guardrail system

Prompt Injection Defense7

Identify injection vulnerabilities
Detect direct injections
Detect indirect injections
Implement defense layers
Build red-team suites
Implement canary tokens
LangGraph injection defense pipeline

Evaluations (Evals)6

Design evaluation frameworks
Implement automated evaluation pipelines
Create task-specific metrics
Human evaluation protocols
Regression testing
Set baselines and track progress

Agent Benchmarking6

Understand the GAIA benchmark
Implement ToolBench evaluation
Use AgentBench
Design domain-specific benchmarks
Cross-model performance comparison
Build benchmark dashboards

Tracing & Observability6

Understand distributed tracing
Add tags and metadata
Context propagation
Build feedback collection
Integrate with Langfuse
Trace visualization

Tool Use Debugging6

Tool selection failures and solutions
Argument validation systems
Build tool use dashboards and visualization
Schema mismatch detection
Tool call replay
Interactive tool debugger

Serving Agents (FastAPI)7

Async endpoints, request validation, error handling
Server-Sent Events (SSE) streaming
Background tasks
Design request and response schemas
Authentication — API keys, middleware, errors
OpenAPI metadata and documentation
FastAPI + LangGraph + uvicorn deployment

Podman & Containerization for K8s5

Build optimized container images
Container health checks
Advanced image optimization
Security best practices
Build multi-container agent pods

Production Databases (Postgres/Redis)6

Async PostgreSQL configuration
Connection pool management
Redis caching for LLM responses
Database migrations for agent stacks
Backup and disaster recovery
Monitoring database health

Scaling & Load Balancing7

Stateless service design
Session externalization
Load balancing algorithms
Scaling metrics for LLM workloads
Horizontal Pod Autoscaler configuration
Load testing your scaling design
Rate limiting at the load balancer

Multi-Tenant Agents6

Tenant context middleware
Database-level tenant isolation
Tenant-specific rate limiting and quotas
Per-tenant configuration templates
Usage metering for billing and SLA
Enforcing tenant data segregation at the API

Kubernetes (K8s) Basics8

Creating Kubernetes Deployments
Resource management for LLM workloads
Kubernetes Services
ConfigMaps and Secrets
Rolling updates and CronJobs
Cluster planning and scheduling
Deployment planning synthesis
NetworkPolicies and prompt-injection defense at the edge

CI/CD for Agents7

GitHub Actions for agent testing
Agent evaluation scripts in CI
Kubernetes deployment pipeline
GitOps deployment pattern
Quality gates and pipeline optimization
Rollback mechanisms
Pipeline observability and notifications

Monitoring & Alerting7

Prometheus metrics for agents
Grafana dashboards
Alerting configuration
SLOs and SLIs
Runbook creation for agent incidents
Tracking business KPIs for agent platforms
Agent-specific monitoring patterns (RED, USE, golden signals)

Model Routing & Fallbacks7

Cost-optimized routing
Latency-optimized routing
Building the resilient LLM client
Provider health checking
Cost tracking and optimization
Capability-based routing
Multi-model routing inside LangGraph

Long-Running Agents7

Cross-session persistence
Checkpoint serialization
Workflow resumption
Task queue integration with Celery
Progress tracking and SSE streaming
Timeout handling and graceful shutdown
Long-running agents with CrewAI — synthesis

Production Architecture Patterns7

System components, interfaces, and integration points
Cost modeling and projection
Production checklists and audit
Architecture Decision Records (ADRs)
Disaster recovery planning
System component diagrams
Architecture pattern evaluation — synthesis

Enterprise LLM Customization425.5%

LiteLLM Gateway6

Deploy LiteLLM proxy
Implement failover and circuit breakers
Load test and capacity plan
Build testing and validation for litellm gateway
Optimize performance for litellm gateway
Build operational runbook for litellm gateway

Guardrails Pipeline6

Build guardrails pipeline
Test guardrails under adversarial input
Benchmark guardrail performance
Build security testing for guardrails pipeline
Optimize throughput for guardrails pipeline
Build compliance reporting for guardrails pipeline

Prompt Injection Red Team6

Run OWASP LLM Top 10 attacks
Build security assessment report
Build automated security regression pipeline
Build security testing for prompt injection red team
Optimize throughput for prompt injection red team
Build compliance reporting for prompt injection red team

Data Classification Router6

Build DataClassifier
Integrate DLP controls
Build data lineage tracking
Build security testing for data classification router
Optimize throughput for data classification router
Build compliance reporting for data classification router

Compliance Audit Trail6

Build audit logging system
Build compliance reporting
Build audit data retention and archival
Build security testing for compliance audit trail
Optimize throughput for compliance audit trail
Build compliance reporting for compliance audit trail

Compliance Test Suite6

Build regulatory compliance tests
Build bias and fairness tests
Build continuous compliance monitoring
Build security testing for compliance test suite
Optimize throughput for compliance test suite
Build compliance reporting for compliance test suite

Multi-Tenant AI Platform on K8s6

Build TenantManager
Build PlatformDeployment
Build PlatformDashboard
Build testing and validation for multi-tenant ai platform on k8s
Optimize scalability for multi-tenant ai platform on k8s
Build operational runbook for multi-tenant ai platform on k8s

GenAI Operations27643.1%

GenAI Failure Catalog6

Classify GenAI failures into five categories: provider, quality, cost, security, and data failures
Instrument a multi-provider LLM gateway to detect each failure category
Build typed failure event models that feed into alerting and incident management
Measure baseline failure rates across OpenAI, Anthropic, and Google providers
Implement failure prediction from leading indicators
Create failure impact assessment system

GenAI SLI Framework6

Define latency SLIs: TTFT, tokens-per-second, end-to-end response time across providers
Define quality SLIs: faithfulness, hallucination rate, format compliance, retrieval precision
Define cost SLIs: cost-per-request, cost-per-token, cache hit rate, budget burn rate
Instrument all SLIs with Prometheus metrics and Langfuse traces
Build SLI aggregation and reporting API
Implement SLI validation and testing

GenAI SLO Engine6

Define SLO targets for latency, quality, and cost SLIs with business-justified thresholds
Compute error budgets and track consumption over rolling windows
Build multi-window burn-rate alerts that detect SLO violations before budget exhaustion
Create SLO status dashboards showing budget remaining and projected exhaustion
Implement SLO negotiation framework
Build cross-SLO dependency tracking

GenAI Toil Analyzer6

Identify GenAI-specific toil patterns: manual model updates, prompt tweaking, provider failover, cache invalidation
Measure toil using time-tracking instrumentation and classify by automation potential
Automate the highest-impact toil items with operational scripts and scheduled workflows
Track toil reduction over time with team-level reporting
Build automation testing and validation
Create toil reduction roadmap generator

GenAI Launch Readiness6

Define operational readiness criteria specific to GenAI services
Build automated readiness checks that verify infrastructure, monitoring, and runbook completeness
Implement launch gate enforcement that blocks deployment without readiness sign-off
Create readiness dashboards and historical tracking for continuous improvement
Implement progressive readiness rollout
Create readiness automation toolkit

GenAI Runbook Engine6

Write structured runbooks for the top 5 GenAI failure modes identified in Ch 1
Build executable runbook steps that link to operational APIs and scripts
Implement runbook testing that validates each step works as documented
Track runbook usage and effectiveness metrics
Build runbook recommendation engine
Implement cross-runbook orchestration

GenAI Helm Charts6

Package LiteLLM, Langfuse, Prometheus, and Grafana as Helm Charts with Production-Ready Defaults
Create values files with environment-specific overrides for dev, staging, and production
Implement Helm chart testing and linting with ct and helm unittest
Build chart dependency management for the full GenAI stack
Implement Helm chart documentation generation
Build chart rollback and recovery procedures

Dev/Staging/Prod Environments6

Create K8s namespaces for dev/staging/prod with resource quotas and LimitRanges
Build Kustomize overlays for environment-specific configuration
Implement environment parity verification that detects configuration drift
Deploy the full GenAI stack to all three environments
Implement environment configuration validation
Create environment lifecycle automation

GitOps Control Plane6

Deploy Argo CD to vCluster with Application and ApplicationSet CRDs
Implement GitOps sync for all GenAI components across dev/staging/prod
Configure drift detection and auto-remediation for configuration consistency
Build Argo CD RBAC for team-based access control
Implement performance optimization for gitops with argo cd
Build operational documentation for gitops with argo cd

GenAI CI/CD Pipelines6

Build Argo Workflow templates for GenAI artifact CI/CD
Implement pipeline stages: lint, validate, eval, promote
Create artifact-specific pipelines for prompts, models, and RAG configs
Monitor pipeline health with observability metrics
Implement performance optimization for ci/cd pipelines for genai artifacts
Build operational documentation for ci/cd pipelines for genai artifacts

GenAI Secret Manager6

Deploy External Secrets Operator with GCP Secret Manager provider
Implement perenvironment provider key isolation with
Build secret sync monitoring and alerting for rotation compliance
Create emergency secret rotation procedures with zero-downtime key swap
Implement performance optimization for secret management for genai
Build operational documentation for secret management for genai

Self-Service Environment API6

Build API for on-demand feature environment provisioning with full GenAI stack
Implement environment lifecycle management with TTL and auto-cleanup
Create environment templates with pre-configured GenAI stack components
Monitor environment usage and resource consumption across all feature environments
Implement performance optimization for developer self-service environments
Build operational documentation for developer self-service environments

Pipeline Health Dashboard6

Instrument Argo Workflow metrics for comprehensive pipeline observability
Build promotion velocity tracking across dev, staging, and production environments
Implement pipeline failure analysis with root cause categorization
Create pipeline health dashboards with bottleneck detection and DORA metrics
Implement performance optimization for pipeline observability
Build operational documentation for pipeline observability

Prompt Registry6

Build Immutable Prompt Storage with Content-Addressable Versioning
Implement Promotion Gates Between Dev, Staging, and Production
Create Prompt Diff and Review Workflow
Track Prompt Lineage Across All Deployments
Implement performance optimization for immutable prompt registry
Build operational documentation for immutable prompt registry

Model Lifecycle Manager6

Deploy MLflow on vCluster for Model Registry and Experiment Tracking
Implement Model Versioning with Stage Promotion Gates
Build Model Deprecation Workflow with Consumer Notification
Track Experiments with Cost and Quality Metrics for Data-Driven Model Selection
Implement performance optimization for model registry and lifecycle
Build operational documentation for model registry and lifecycle

Progressive Delivery Engine6

Deploy Argo Rollouts with Canary Strategy for LiteLLM Model Config Changes
Implement Shadow Deployments for Risk-Free Model Comparison in Production
Build Automated Rollback on Quality Regression During Canary Progression
Monitor Canary vs Baseline Quality, Latency, and Cost Metrics in Real-Time
Implement performance optimization for canary and shadow deployments
Build operational documentation for canary and shadow deployments

Eval Gate Pipeline6

Build Promptfoo Eval Suites for Pre-Promotion Quality Verification
Implement Eval Gates in Argo Workflows That Block Promotion on Failure
Create Golden Test Sets for Regression Detection
Track Eval Pass Rates and Gate Effectiveness Metrics
Implement performance optimization for automated eval gates
Build operational documentation for automated eval gates

AI Feature Flag System6

Build a Feature Flag Service for AI Configuration with Redis-Backed Storage
Implement Percentage-Based Rollouts for Model and Prompt Changes
Create Kill Switches for Rapid AI Behavior Reversion During Incidents
Track Feature Flag Impact on Quality and Cost Metrics
Implement performance optimization for feature flags for ai behaviors
Build operational documentation for feature flags for ai behaviors

RAG Release Pipeline6

Build a Release Pipeline for Embedding Model Swaps with Dual-Index Strategy
Implement Chunking Strategy Changes with A/B Comparison Using RAGAS Metrics
Create an Index Migration Workflow with Zero-Downtime Cutover
Validate RAG Releases with Retrieval Quality Metrics Before and After
Implement performance optimization for rag pipeline release management
Build operational documentation for rag pipeline release management

Distributed LLM Tracer6

Deploy an OpenTelemetry Collector with Langfuse Exporter
Instrument Multi-Provider Request Chains with Parent-Child Trace Spans
Build Trace Correlation Across RAG Retrieval, LLM Inference, and Guardrail Processing
Create Trace-Based Latency Analysis Dashboards with Drill-Down Capability
Implement performance optimization for end-to-end llm tracing
Build operational documentation for end-to-end llm tracing

Quality Drift Detector6

Implement Output Quality Drift Detection with Rolling Window Comparison
Build Embedding Drift Detection Using Distribution Divergence Metrics
Detect Retrieval Relevance Degradation with RAGAS-Based Monitoring
Configure Automated Alerts for Each Drift Type with Severity Classification
Implement performance optimization for quality drift detection
Build operational documentation for quality drift detection

GenAI Alert System6

Configure Alertmanager with GenAI-Specific Routing Rules and Severity Classification
Deploy Grafana OnCall for On-Call Schedules, Escalation Policies, and Incident Lifecycle
Implement Alert Deduplication and Grouping for Noisy GenAI Metrics
Build Alert Effectiveness Tracking to Reduce Alert Fatigue
Implement performance optimization for alerting strategy
Build operational documentation for alerting strategy

GenAI Dashboard Suite6

Build operational dashboard with SLO status, active incidents, and system health overview
Create business dashboard with usage, cost, and adoption metrics for stakeholders
Build compliance dashboard with guardrail activity, audit coverage, and policy status
Implement dashboard-as-code with Grafana provisioning for version-controlled dashboards
Implement performance optimization for dashboard engineering
Build operational documentation for dashboard engineering

Provider SLA Tracker6

Implement per-provider availability tracking with synthetic probes
Build provider degradation detection using quality and latency SLIs
Create automated escalation chains for provider issues with status page integration
Track provider SLA compliance for vendor management and contract negotiation
Implement performance optimization for provider sla monitoring
Build operational documentation for provider sla monitoring

Cross-Env Comparator6

Implement cross-environment metric comparison for quality regression detection
Build staging-to-prod quality correlation analysis for deployment confidence
Create environment drift detection for configuration parity monitoring
Monitor promotion impact by comparing pre/post metrics across environments
Implement performance optimization for cross-environment observability
Build operational documentation for cross-environment observability

AI Incident Commander6

Define LLM-specific incident severity classification with impact-based criteria
Build incident lifecycle management with role assignments and status tracking
Create communication templates for AI-specific incidents targeting different audiences
Track incident metrics with MTTD, MTTA, MTTR and trend analysis
Implement performance optimization for llm incident response framework
Build operational documentation for llm incident response framework

Runbook Automation Engine6

Build alert-to-runbook routing that triggers automated remediation workflows
Implement human approval gates for high-impact remediation steps
Create runbook execution auditing with step-by-step logging and outcome tracking
Track automation coverage and success rates across all runbook types
Implement performance optimization for automated runbook execution
Build operational documentation for automated runbook execution

AI Post-Mortem Engine6

Build structured post-mortem templates for GenAI failure modes
Implement timeline reconstruction from Langfuse traces and Prometheus metrics
Create action item tracking with follow-through verification
Analyze post-mortem trends to identify systemic issues
Implement performance optimization for post-mortems for ai failures
Build operational documentation for post-mortems for ai failures

LLM Chaos Lab6

Deploy Chaos Mesh in vCluster
Build Provider Failover Chaos Experiments
Create Cache Invalidation Chaos Experiments
Implement Quality Degradation Injection
Implement performance optimization for chaos engineering for llm providers
Build operational documentation for chaos engineering for llm providers

Pipeline Chaos Experiments6

Create Embedding Pipeline Chaos Experiments
Build Ingestion Interruption Tests
Implement Index Corruption Detection and Recovery Validation
Track Chaos Experiment Results and Improvement Trends
Implement performance optimization for pipeline failure chaos
Build operational documentation for pipeline failure chaos

GenAI Game Day6

Design multi-failure game day scenarios for GenAI platforms
Build game day orchestration that chains chaos experiments with time delays
Implement game day scoring: response time, runbook adherence, communication quality
Create game day retrospectives with improvement tracking
Implement performance optimization for game day operations
Build operational documentation for game day operations

Cost Attribution Engine6

Instrument per-request cost tracking across all pipeline stages
Build cost attribution to teams, projects, and use cases
Create cost allocation models for shared infrastructure components
Implement cost anomaly detection with automated investigation
Implement performance optimization for full-stack cost attribution
Build operational documentation for full-stack cost attribution

Token Budget Controller6

Configure LiteLLM Virtual Keys with Per-Team Budget Limits
Implement Per-Request Token Limits
Build Budget Alerting at 50%, 80%, and 100% Thresholds with Escalation
Create Budget Override Workflows for Emergency Usage Beyond Limits
Implement performance optimization for token budget enforcement
Build operational documentation for token budget enforcement

Cache Economics Analyzer6

Deploy Redis Semantic Cache and Measure Hit Rate vs Cost Savings
Compare Provider Caching Strategies for OpenAI, Anthropic, and Google
Build Cost-Benefit Analysis with Break-Even Calculations
Recommend Optimal Caching Mix Per Use Case
Implement performance optimization for caching roi analysis
Build operational documentation for caching roi analysis

Batch API Scheduler6

Implement workload classification: real-time vs batch-eligible based on latency requirements
Build Batch API job scheduling with priority queues and SLA tracking
Create batch job monitoring with completion time SLAs and failure handling
Measure and report cost savings from batch routing vs synchronous requests
Implement performance optimization for batch api optimization
Build operational documentation for batch api optimization

Capacity Forecaster6

Build token demand forecasting using historical usage patterns and trend analysis
Implement embedding volume projection for storage and compute planning
Create cost projection models for budget planning cycles
Track forecast accuracy and improve models over time with feedback loops
Implement performance optimization for capacity forecasting
Build operational documentation for capacity forecasting

FinOps Governance Platform6

Build Showback and Chargeback Reports per Team and Project with Full Cost Transparency
Create Executive FinOps Dashboards with Trend Analysis for Leadership
Implement Cost Governance Policies with Automated Enforcement
Generate Monthly FinOps Reviews with Optimization Recommendations
Implement performance optimization for finops reporting and governance
Build operational documentation for finops reporting and governance

Key Rotation Operator6

Implement automated key rotation for OpenAI, Anthropic, and Google provider keys
Build zero-downtime key swap using dual-key overlap windows
Create key rotation audit trails for compliance reporting
Monitor key age and rotation compliance across all environments
Implement performance optimization for api key rotation automation
Build operational documentation for api key rotation automation

PII Detection Pipeline6

Deploy Microsoft Presidio on K8s for runtime PII detection in LLM traffic
Build PII scrubbing middleware that redacts sensitive data before sending to LLM providers
Implement PII detection alerting and audit logging for compliance
Create PII detection tuning workflows to reduce false positives
Implement performance optimization for runtime pii detection
Build operational documentation for runtime pii detection

Injection Monitoring System6

Implement multi-layer prompt injection detection with pattern and embedding-based methods
Build real-time injection alerting with severity classification
Create injection attack analysis dashboards for security monitoring
Implement adaptive detection that learns from new attack patterns
Implement performance optimization for prompt injection monitoring
Build operational documentation for prompt injection monitoring

Guardrail Operations Platform6

Deploy Guardrails AI and LlamaFirewall on K8s for runtime content validation
Implement hot-reload guardrail configuration without service restarts
Build A/B testing framework for guardrail thresholds to optimize block rates
Build testing and validation for guardrail operations
Implement performance optimization for guardrail operations
Build operational documentation for guardrail operations

Compliance Audit Engine6

Implement automated compliance scans for GenAI-specific requirements
Build evidence collection pipelines that gather audit artifacts
Schedule recurring compliance checks with drift detection
Build testing and validation for compliance audit automation
Implement performance optimization for compliance audit automation
Build operational documentation for compliance audit automation

Red Team Automation Platform6

Build automated red team attack suites using Promptfoo for systematic security testing
Implement scheduled security testing with regression detection across model changes
Build security posture scoring with trend monitoring and improvement tracking
Build testing and validation for red team operations
Implement performance optimization for red team operations
Build operational documentation for red team operations

Multi-Tenant GenAI Platform6

Automate tenant onboarding with namespace provisioning and secret management
Implement namespace isolation with network policies and resource quotas
Build noisy-neighbor detection that identifies tenants causing resource contention
Create tenant operations dashboards with per-tenant health visibility
Implement performance optimization for multi-tenant platform operations
Build operational documentation for multi-tenant platform operations

AI Developer Platform6

Build self-service deployment workflows with approval gates for AI artifacts
Create golden path templates for common GenAI patterns
Implement internal tool marketplace for reusable AI components
Build developer experience metrics and platform analytics
Implement performance optimization for internal developer platform for ai
Build operational documentation for internal developer platform for ai

GenAI Ops Maturity Assessor6

Define GenAI operational maturity model with five levels across eight capability areas
Build automated maturity assessment that evaluates current operational state
Generate improvement roadmaps with prioritized actions based on assessment results
Track maturity progression over time with milestone tracking
Implement performance optimization for operational maturity model
Build operational documentation for operational maturity model

GenAI Eval Safety Governance243.6%

EU AI Act Compliance6

Classify AI systems under EU AI Act risk categories
Implement the Feb 2025 AI literacy requirements
Build technical documentation for GPAI compliance
Implement risk management system
Build human oversight mechanisms
Track EU AI Act enforcement timeline compliance

Compliance Frameworks6

Implement NIST AI RMF Govern and Map functions
Implement NIST AI RMF Measure and Manage functions
Build ISO 42001 AI management system documentation
Create unified governance dashboard with Credo AI Agent Registry
Implement comprehensive audit trail
Build compliance automation and alerting

Red Teaming Methodology6

Plan and scope a red team exercise
Execute manual red team techniques
Combine manual, Meta GOAT automated, and Inspect AI red teaming
Build red team findings reports
Track remediation and verify fixes
Build AI safety scorecard and establish red team cadence

Bias, Fairness & Continuous Monitoring6

Detect bias in hosted LLM outputs
Implement fairness metrics for LLM applications
Build continuous safety monitoring for production
Detect safety drift over time
Build safety incident response workflow
Generate weekly and monthly safety reports

GenAI Engineering Leadership1207.8%

Hiring GenAI Engineers6

Design the data model and schema for Hiring GenAI Engineers
Implement the core service logic for Hiring GenAI Engineers
Build the API and interface layer for Hiring GenAI Engineers
Integrate with external systems for Hiring GenAI Engineers
Implement testing and validation for Hiring GenAI Engineers
Deploy and operate Hiring GenAI Engineers in production

Team Structure for AI6

Design the data model and schema for Team Structure for AI
Implement the core service logic for Team Structure for AI
Build the API and interface layer for Team Structure for AI
Integrate with external systems for Team Structure for AI
Implement testing and validation for Team Structure for AI
Deploy and operate Team Structure for AI in production

Career Ladders for AI Engineers6

Design the data model and schema for Career Ladders for AI Engineers
Implement the core service logic for Career Ladders for AI Engineers
Build the API and interface layer for Career Ladders for AI Engineers
Integrate with external systems for Career Ladders for AI Engineers
Implement testing and validation for Career Ladders for AI Engineers
Deploy and operate Career Ladders for AI Engineers in production

Onboarding AI Engineers6

Design the data model and schema for Onboarding AI Engineers
Implement the core service logic for Onboarding AI Engineers
Build the API and interface layer for Onboarding AI Engineers
Integrate with external systems for Onboarding AI Engineers
Implement testing and validation for Onboarding AI Engineers
Deploy and operate Onboarding AI Engineers in production

Building AI Engineering Culture6

Design the data model and schema for Building AI Engineering Culture
Implement the core service logic for Building AI Engineering Culture
Build the API and interface layer for Building AI Engineering Culture
Integrate with external systems for Building AI Engineering Culture
Implement testing and validation for Building AI Engineering Culture
Deploy and operate Building AI Engineering Culture in production

Engineering Process for Non-Deterministic Systems6

Design the data model and schema for Engineering Process for Non-Deterministic Systems
Implement the core service logic for Engineering Process for Non-Deterministic Systems
Build the API and interface layer for Engineering Process for Non-Deterministic Systems
Integrate with external systems for Engineering Process for Non-Deterministic Systems
Implement testing and validation for Engineering Process for Non-Deterministic Systems
Deploy and operate Engineering Process for Non-Deterministic Systems in production

Eval-Driven Development6

Design the data model and schema for Eval-Driven Development
Implement the core service logic for Eval-Driven Development
Build the API and interface layer for Eval-Driven Development
Integrate with external systems for Eval-Driven Development
Implement testing and validation for Eval-Driven Development
Deploy and operate Eval-Driven Development in production

Testing Strategy for AI6

Design the data model and schema for Testing Strategy for AI
Implement the core service logic for Testing Strategy for AI
Build the API and interface layer for Testing Strategy for AI
Integrate with external systems for Testing Strategy for AI
Implement testing and validation for Testing Strategy for AI
Deploy and operate Testing Strategy for AI in production

Code Review for AI Systems6

Design the data model and schema for Code Review for AI Systems
Implement the core service logic for Code Review for AI Systems
Build the API and interface layer for Code Review for AI Systems
Integrate with external systems for Code Review for AI Systems
Implement testing and validation for Code Review for AI Systems
Deploy and operate Code Review for AI Systems in production

Technical Debt in AI Systems6

Design the data model and schema for Technical Debt in AI Systems
Implement the core service logic for Technical Debt in AI Systems
Build the API and interface layer for Technical Debt in AI Systems
Integrate with external systems for Technical Debt in AI Systems
Implement testing and validation for Technical Debt in AI Systems
Deploy and operate Technical Debt in AI Systems in production

Quality Frameworks for AI6

Design the data model and schema for Quality Frameworks for AI
Implement the core service logic for Quality Frameworks for AI
Build the API and interface layer for Quality Frameworks for AI
Integrate with external systems for Quality Frameworks for AI
Implement testing and validation for Quality Frameworks for AI
Deploy and operate Quality Frameworks for AI in production

Velocity Metrics for AI Teams6

Design the data model and schema for Velocity Metrics for AI Teams
Implement the core service logic for Velocity Metrics for AI Teams
Build the API and interface layer for Velocity Metrics for AI Teams
Integrate with external systems for Velocity Metrics for AI Teams
Implement testing and validation for Velocity Metrics for AI Teams
Deploy and operate Velocity Metrics for AI Teams in production

Incident Management for AI6

Design the data model and schema for Incident Management for AI
Implement the core service logic for Incident Management for AI
Build the API and interface layer for Incident Management for AI
Integrate with external systems for Incident Management for AI
Implement testing and validation for Incident Management for AI
Deploy and operate Incident Management for AI in production

Performance Reviews for AI Engineers6

Design the data model and schema for Performance Reviews for AI Engineers
Implement the core service logic for Performance Reviews for AI Engineers
Build the API and interface layer for Performance Reviews for AI Engineers
Integrate with external systems for Performance Reviews for AI Engineers
Implement testing and validation for Performance Reviews for AI Engineers
Deploy and operate Performance Reviews for AI Engineers in production

Managing AI Costs6

Design the data model and schema for Managing AI Costs
Implement the core service logic for Managing AI Costs
Build the API and interface layer for Managing AI Costs
Integrate with external systems for Managing AI Costs
Implement testing and validation for Managing AI Costs
Deploy and operate Managing AI Costs in production

Org Design for AI Functions6

Design the data model and schema for Org Design for AI Functions
Implement the core service logic for Org Design for AI Functions
Build the API and interface layer for Org Design for AI Functions
Integrate with external systems for Org Design for AI Functions
Implement testing and validation for Org Design for AI Functions
Deploy and operate Org Design for AI Functions in production

Cross-Functional Collaboration6

Design the data model and schema for Cross-Functional Collaboration
Implement the core service logic for Cross-Functional Collaboration
Build the API and interface layer for Cross-Functional Collaboration
Integrate with external systems for Cross-Functional Collaboration
Implement testing and validation for Cross-Functional Collaboration
Deploy and operate Cross-Functional Collaboration in production

AI Strategy & Roadmapping6

Design the data model and schema for AI Strategy & Roadmapping
Implement the core service logic for AI Strategy & Roadmapping
Build the API and interface layer for AI Strategy & Roadmapping
Integrate with external systems for AI Strategy & Roadmapping
Implement testing and validation for AI Strategy & Roadmapping
Deploy and operate AI Strategy & Roadmapping in production

Vendor Management for AI6

Design the data model and schema for Vendor Management for AI
Implement the core service logic for Vendor Management for AI
Build the API and interface layer for Vendor Management for AI
Integrate with external systems for Vendor Management for AI
Implement testing and validation for Vendor Management for AI
Deploy and operate Vendor Management for AI in production

Leadership Capstone6

Design the data model and schema for Leadership Capstone
Implement the core service logic for Leadership Capstone
Build the API and interface layer for Leadership Capstone
Integrate with external systems for Leadership Capstone
Implement testing and validation for Leadership Capstone
Deploy and operate Leadership Capstone in production

GenAI Engineering Leader

Verifiable skill graph

What you'll ship in production

Hire and build GenAI engineering teams

Define engineering processes

Manage quality and team performance

Understand the technical stack

Operate and budget for GenAI infrastructure

Design organization structure

Drive technical strategy

Ensure responsible AI practices

Curriculum