Published taxonomy · v1.0 · 2026-06-14 · CC-BY-4.0

The GenAI Engineering Role Taxonomy

12 disciplines · 102 skills · 14 categories. Roles are curated from live GenAI job descriptions and vetted by practicing engineers; each role's responsibilities map to a skill ladder assessed through graded labs. The role is the unit; the skill is what gets measured.


Disciplines & responsibilities

GenAI Application Engineering

Build production RAG & prompt-chain applications, design streaming chat UIs, implement guardrails & evaluation, optimize LLM inference costs, and deploy on Kubernetes.

• Design and build production GenAI features (chatbots, search, summarization) into web applications
• Implement RAG pipelines with vector databases for enterprise search and knowledge retrieval
• Optimize LLM inference for latency, cost, and reliability across multiple providers
• Integrate LLM APIs (OpenAI, Gemini, Anthropic) into existing applications with error handling
• Build GenAI agent features with tool calling, function execution, and human-in-the-loop workflows
• Evaluate model outputs using automated metrics and LLM-as-judge for production quality
• Deploy and containerize GenAI applications on Kubernetes with CI/CD

GenAI Agent Engineering

Build autonomous multi-agent systems with planning, reasoning, tool use, memory, MCP/A2A protocols, safety boundaries, and production evaluation.

• Design autonomous GenAI agents using state machines with tool calling, memory, and planning
• Build multi-agent systems with supervisor/worker hierarchies, delegation, and parallel execution
• Implement MCP servers and clients for standardized tool integration
• Enable agent-to-agent communication using A2A protocol for cross-framework interoperability
• Build production RAG agents with iterative retrieval, self-verification, and query decomposition
• Implement guardrails and safety controls within agent workflows
• Evaluate agent performance with trajectory analysis and cost tracking
• Apply context engineering: systematically compose prompts, memory, tools, and history

GenAI Inference Engineering

Architect multi-provider LLM gateways, implement semantic caching and batch optimization, monitor provider SLAs, and optimize inference costs.

• Design LLM gateway infrastructure routing requests across providers
• Optimize request latency through caching, batching, and streaming
• Implement structured output extraction from LLMs with type safety
• Build cost attribution and FinOps dashboards tracking token spend
• Monitor inference quality metrics in production
• Implement intelligent routing — route queries to model tiers based on complexity
• Manage API rate limits and quotas across providers
• Deploy inference services on K8s with scaling and health checks

GenAI Platform Engineering

Build internal GenAI developer platforms with self-service capabilities, multi-tenancy, RBAC, and CI/CD for model, prompt, and guardrail pipelines.

• Build the internal GenAI platform enabling developers to deploy LLM applications self-service
• Design multi-tenant infrastructure with namespace isolation and RBAC
• Implement CI/CD pipelines with GitOps for GenAI applications
• Manage data infrastructure — databases, caches, message queues on K8s
• Build autoscaling for GenAI workloads using event-driven scaling and batch job queuing
• Provision infrastructure-as-code using K8s-native tooling
• Implement full-stack observability across the GenAI platform
• Operate LLM gateways as platform infrastructure

Forward Deployed GenAI Engineering

Rapid-prototype GenAI solutions on customer infrastructure, integrate GenAI with customer data and workflows, scope solutions with delivery methodology.

• Embed on-site with clients to discover GenAI opportunities and scope projects
• Build rapid prototypes that demonstrate GenAI value within weeks
• Integrate GenAI into client data systems — databases, APIs, and legacy systems
• Customize LLM applications for client-specific domains (healthcare, finance, legal)
• Deploy solutions as packaged Helm charts clients can operate independently
• Build GenAI agent workflows tailored to client business processes
• Manage LLM provider costs and build FinOps models for client engagements
• Configure enterprise guardrails to meet client compliance requirements

LLMOps Engineering

Monitor hallucination rates and token costs, operate guardrails and eval gates, manage prompt versioning and canary deployments.

• Design CI/CD pipelines for LLM application deployment
• Monitor LLM systems in production — latency, errors, costs, quality
• Manage LLM gateway operations — key rotation, failover, quota management
• Implement FinOps practices — cost attribution, budgets, and optimization
• Build continuous evaluation pipelines for production LLM quality
• Detect and respond to prompt attacks and safety incidents in production
• Manage data quality for RAG systems — freshness, drift, accuracy
• Implement capacity planning — predict demand and right-size deployments

GenAI Safety & Evaluation Engineering

Design automated LLM evaluation pipelines, red-team GenAI systems, build bias detection and fairness benchmarks, implement guardrails.

• Build automated evaluation pipelines to continuously measure LLM output quality
• Conduct red-team exercises — probe LLMs for vulnerabilities
• Implement production guardrails — content filters, PII detection, jailbreak prevention
• Design GenAI governance frameworks aligned with regulations
• Evaluate GenAI agent behavior — trajectory quality, tool selection accuracy
• Monitor bias, fairness, and hallucination rates in production
• Build safety incident response processes for deployed GenAI systems
• Design LlamaFirewall policies for agent safety

GenAI Security Engineering

Engineer defenses against prompt injection, jailbreaks, and data exfiltration. Implement PII leakage detection, content safety, and compliance.

• Conduct adversarial red-team testing of LLM systems
• Implement defense-in-depth guardrails — input validation, output filtering, content safety
• Threat-model GenAI agent systems — analyze attack surfaces across tools, memory, and inter-agent communication
• Build PII protection — detect, classify, and redact sensitive data in LLM pipelines
• Design compliance programs aligned with the OWASP Top 10 for LLM Applications, MITRE ATLAS, and the EU AI Act
• Build security monitoring for GenAI systems
• Implement incident response for GenAI security events
• Secure GenAI supply chain — model provenance, dependency scanning, container security

GenAI Solutions Architecture

Design enterprise GenAI reference architectures, create ADRs and technical standards, bridge GenAI with enterprise workflows.

• Define enterprise GenAI architecture with proper documentation and governance
• Design scalable RAG systems at enterprise scale
• Architect multi-agent systems with MCP mesh and A2A network topology
• Lead PoC development and production rollouts with model selection and cost estimation
• Design GenAI governance architecture — RBAC, audit trails, and compliance
• Oversee operational architecture — observability, FinOps, SLA management
• Integrate GenAI with enterprise data platforms — pipelines, knowledge graphs, streaming
• Present architecture decisions with cost/risk analysis to leadership

GenAI Solutions & Delivery

Scope GenAI solutions with estimation, risk, and success criteria. Orchestrate delivery teams and manage client relationships.

• Lead end-to-end GenAI project delivery from discovery through production handoff
• Design GenAI architecture for client engagements
• Build agent-based solutions for client business processes
• Customize enterprise LLM deployments — gateways, RAG, domain adaptation
• Manage FinOps for client GenAI projects
• Scope project timelines and team requirements
• Package solutions as deployable artifacts for client operations teams
• Advise clients on technology roadmaps with emerging GenAI patterns

GenAI Engineering Leader

Hire and build GenAI engineering teams, design team structures for GenAI, set engineering quality frameworks.

• Hire and build GenAI engineering teams
• Define engineering processes for GenAI development — eval-driven workflows
• Manage GenAI output quality and team performance
• Understand the technical stack deeply enough to unblock teams
• Operate and budget for GenAI infrastructure — FinOps and capacity
• Design organization structure for GenAI engineering teams
• Drive technical strategy — evaluate new tools and plan migrations
• Ensure responsible AI practices across your team

GenAI Data Engineering

Build RAG data pipelines for ingestion, chunking, embedding, and indexing. Manage vector store operations and embedding model lifecycle.

• Build embedding pipelines — ingest, chunk, embed, and store in vector databases
• Design RAG data infrastructure — hybrid search and reranking
• Build knowledge graph pipelines using Neo4j
• Process documents at scale — parsing, chunking, and quality filtering
• Implement data quality controls — PII, dedup, compliance filtering
• Orchestrate data pipelines with scheduling and failure recovery
• Monitor pipeline health — freshness, quality scores, embedding drift
• Design multi-tenant data isolation for enterprise RAG

Skill ladder (102 skills)

Agent core (10)

Agent Memory Systems: Implements short-term sliding windows, semantic memory, and context optimization for agents.
Agent State-Graph Patterns: Designs agent state-graph pipelines with typed schemas, nodes, edges, conditional routing, compilation, and state-transition debugging.
Agent Tool Design & Validation: Designs typed agent tools with Pydantic schemas, docstring parsing, and runtime validation.
Agentic RAG & Knowledge Graphs: Builds Self-RAG, Corrective RAG, and GraphRAG pipelines with adaptive retrieval and entity graphs.
Enterprise Vertical Agent Patterns: Builds document-processing, triage, and code-review agents with domain-specific tool sets and human handoff points.
LangGraph Framework Usage: Builds agents with the LangGraph library, including StateGraph, conditional edges, MessagesState, and ToolNode integration.
Multi-Agent Orchestration: Builds supervisor, hierarchical, and reflector multi-agent patterns with handoffs and result aggregation.
Multimodal & Computer-Use Agents: Builds vision, voice, computer-use, and code agents with multimodal models and desktop automation.
ReAct & Planning Agent Loops: Builds ReAct agent loops with thought-action-observation, planning, and dynamic replanning.
Web Browsing Agents: Builds agents that navigate web pages with Playwright, extract structured data, and submit forms.
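
The thought-action-observation cycle named under ReAct & Planning Agent Loops can be sketched in a few lines of plain Python. This is a minimal illustration, not any framework's API: `scripted_model` is a hypothetical stand-in for an LLM call, the single `add` tool is invented for the example, and planning and replanning are left out.

```python
from typing import Callable

# Hypothetical tool registry: tool name -> callable taking a string argument.
TOOLS: dict[str, Callable[[str], str]] = {
    "add": lambda arg: str(sum(int(x) for x in arg.split("+"))),
}

def scripted_model(transcript: list[str]) -> tuple[str, str, str]:
    """Stand-in for an LLM: returns (thought, action, action_input).

    A real agent would send the transcript to a model; this deterministic
    stub is an assumption so the loop can run without one.
    """
    observations = [line for line in transcript if line.startswith("Observation:")]
    if not observations:
        return ("I should compute the sum.", "add", "2+3")
    return ("I have the answer.", "finish", observations[-1].split(": ", 1)[1])

def react_loop(question: str, max_steps: int = 5) -> str:
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        thought, action, action_input = scripted_model(transcript)
        transcript.append(f"Thought: {thought}")
        if action == "finish":
            return action_input
        # Act: run the chosen tool, then feed the observation back to the model.
        observation = TOOLS[action](action_input)
        transcript.append(f"Action: {action}[{action_input}]")
        transcript.append(f"Observation: {observation}")
    raise RuntimeError("step budget exhausted without a final answer")

print(react_loop("What is 2 + 3?"))  # 5
```

In a real agent the model call replaces `scripted_model`, and the step budget (`max_steps`) is what stops a confused model from looping indefinitely.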

Agent deployment (5)

Agent Cost Control & Model Routing: Tracks per-agent token spend, routes tasks to cost-appropriate models, and enforces budget limits.
Agent Load Testing & Capacity Planning: Runs concurrent-load benchmarks with k6 or Locust, identifies bottlenecks, and plans capacity for production agents.
Agent Observability & Tracing: Instruments agents with OpenTelemetry, Langfuse, fleet dashboards, and tool-use debugging.
Agent Release Management: Manages agent config versions with canary rollout, automated rollback, and config drift detection.
Production Agent Deployment: Serves agents via FastAPI on Kubernetes with Postgres/Redis state, horizontal scaling, and CI/CD pipelines.
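
For Agent Release Management, one common way to run a canary is to hash a stable identifier into a percentage bucket so each session consistently sees one config version. The sketch below assumes that approach; the function names, the SHA-256 bucketing, and the fixed 2-point error-rate tolerance for rollback are illustrative choices, not a prescribed method.

```python
import hashlib

def choose_config(session_id: str, stable: str, canary: str, canary_percent: int) -> str:
    """Route a fixed share of sessions to the canary config, stickily per session."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # roughly uniform in 0..99
    return canary if bucket < canary_percent else stable

def should_roll_back(canary_errors: int, canary_total: int,
                     stable_errors: int, stable_total: int,
                     tolerance: float = 0.02) -> bool:
    """Hypothetical rule: roll back if the canary error rate beats stable by > tolerance."""
    if canary_total == 0 or stable_total == 0:
        return False  # not enough traffic to judge yet
    return canary_errors / canary_total > stable_errors / stable_total + tolerance

print(choose_config("session-42", "agent-v1", "agent-v2", canary_percent=10))
print(should_roll_back(9, 100, 2, 100))  # True: 9% vs. 2% + 2%
```

A production rollback gate would usually replace the fixed tolerance with a statistical test and a minimum sample size, so that a handful of early errors cannot trigger or block a rollback on noise.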

Agent infrastructure (2)

A2A Protocol & Agent Networks: Implements the Agent-to-Agent (A2A) protocol for discovery, authentication, and remote task delegation across agent fleets.
MCP Protocol Servers & Clients: Builds and consumes MCP servers using JSON-RPC 2.0 over stdio and SSE with tool, resource, and prompt exposure.
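
To make the MCP entry concrete: MCP messages are JSON-RPC 2.0 objects, and a client invokes a server tool with the `tools/call` method, passing the tool `name` and its `arguments`. The sketch below builds such a request and answers it with a hand-rolled dispatcher. It deliberately skips the official SDKs, the initialization handshake, tool listing, and the transport; `get_weather` is a hypothetical tool, and unknown tool names are not handled.

```python
import json

def make_tools_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a single-line JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })

def get_weather(city: str) -> str:
    """Hypothetical tool exposed by the server."""
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

def handle(message: str) -> str:
    """Minimal server-side dispatcher; no initialization, listing, or transport."""
    req = json.loads(message)
    if req.get("method") != "tools/call":
        # -32601 is the JSON-RPC 2.0 "Method not found" error code.
        return json.dumps({"jsonrpc": "2.0", "id": req.get("id"),
                           "error": {"code": -32601, "message": "Method not found"}})
    params = req["params"]
    text = TOOLS[params["name"]](**params["arguments"])
    # MCP tool results carry a list of content items; here, one text item.
    return json.dumps({"jsonrpc": "2.0", "id": req["id"],
                       "result": {"content": [{"type": "text", "text": text}]}})

print(handle(make_tools_call(1, "get_weather", {"city": "Oslo"})))
```

Over the stdio transport each message travels as one newline-delimited line, which is why the messages are serialized without indentation.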

Agent safety (3)

Agent Evaluation & Benchmarking: Builds golden datasets, LLM-as-judge pipelines, trajectory scoring, and CI-gated regression testing for agents.
Agent Safety Guardrails & Injection Defense: Implements input/output guardrails, jailbreak detection, prompt injection defense, and safety boundaries.
Enterprise Agent Governance & Audit: Enforces audit trails, escalation policies, human-in-the-loop checkpoints, and compliance reporting on production agents.
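
The governance entry (audit trails, escalation, human-in-the-loop checkpoints) can be illustrated with a default-deny gate placed in front of tool execution. The tool names and the two policy sets below are hypothetical, and the in-memory `audit_log` stands in for durable, append-only storage.

```python
import json
import time
from typing import Optional

# Hypothetical policy sets; a real deployment would load these from reviewed config.
ALLOWED_TOOLS = {"search_docs", "read_ticket"}
APPROVAL_REQUIRED = {"send_email", "issue_refund"}

audit_log: list[str] = []  # stand-in for durable, append-only audit storage

def gate_tool_call(agent_id: str, tool: str, approved_by: Optional[str] = None) -> bool:
    """Decide whether an agent's tool call may run, and record every decision."""
    if tool in ALLOWED_TOOLS:
        decision = "allow"
    elif tool in APPROVAL_REQUIRED:
        # Human-in-the-loop checkpoint: escalate unless a named person approved.
        decision = "allow" if approved_by else "escalate"
    else:
        decision = "deny"  # default-deny anything not explicitly listed
    audit_log.append(json.dumps({
        "ts": time.time(), "agent": agent_id, "tool": tool,
        "decision": decision, "approved_by": approved_by,
    }))
    return decision == "allow"
```

For example, `gate_tool_call("support-agent", "issue_refund")` returns `False` and logs an `escalate` entry; the same call with `approved_by="alice"` proceeds and records who approved it.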

Cost & economics (5)

Caching Strategies for Cost: Designs prompt, semantic, and tool-call caches with appropriate TTLs and invalidation, quantifying cost-per-hit and quality impact.
Cost Anomaly Monitoring: Instruments cost telemetry per feature/tenant and detects anomalies via baselines or statistical detectors before they become bills.
Cost-Aware Model Routing: Builds cascade and routing strategies that send easy queries to cheap models and hard queries to expensive ones, governed by quality SLOs.
GPU Capacity Planning: Plans GPU capacity using a spot/reserved/on-demand mix, autoscaling envelopes, and queue-based load shedding to meet SLOs at target cost.
LLM Cost Modeling: Models per-request token economics, p99 cost, and unit economics for LLM features; compares hosted vs. self-hosted total cost.
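
The per-request token economics under LLM Cost Modeling reduce to simple arithmetic: input tokens times the input price plus output tokens times the output price. The sketch below applies that formula and summarizes a sample of requests. The prices are placeholders, not any provider's actual rates, and `statistics.quantiles` is just one way to estimate p99.

```python
import statistics

# Placeholder prices in USD per 1M tokens; substitute your provider's current rates.
PRICE_IN_PER_M = 3.00
PRICE_OUT_PER_M = 15.00

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request in USD."""
    return (input_tokens * PRICE_IN_PER_M + output_tokens * PRICE_OUT_PER_M) / 1_000_000

def cost_profile(requests: list[tuple[int, int]]) -> dict[str, float]:
    """Mean, p50, p99, and total cost over (input_tokens, output_tokens) pairs."""
    costs = [request_cost(i, o) for i, o in requests]
    cuts = statistics.quantiles(costs, n=100)  # 99 cut points: index 49 = p50, 98 = p99
    return {"mean": statistics.fmean(costs), "p50": cuts[49],
            "p99": cuts[98], "total": sum(costs)}

print(request_cost(1_000, 500))  # 0.0105
```

The p99 matters because a few long-context or long-output requests can dominate spend even when the mean looks cheap.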

Customization (8)

Continued Pretraining for Domain Adaptation: Performs domain-adaptive continued pretraining on curated corpora and measures downstream-task improvement vs. base model.
Distributed Training Infrastructure: Configures distributed training with DeepSpeed, FSDP, or accelerate; understands ZeRO stages, gradient checkpointing, and mixed precision.
Few-Shot & In-Context Learning Design: Designs few-shot exemplar selection (k, ordering, similarity-based retrieval) and measures in-context learning quality.
Fine-Tuning Evaluation: Builds evaluation harnesses to compare base vs. fine-tuned models on task suites, regression sets, and held-out human preference data.
Preference Optimization (DPO/RLHF): Aligns models with human or AI preferences using DPO, IPO, KTO, or RLHF/RLAIF pipelines, including reward modeling fundamentals.
Production Prompt Template Engineering: Authors versioned production prompts with structured outputs, ablations, and prompt-variant A/B tests under load.
Supervised Fine-Tuning (LoRA/QLoRA): Fine-tunes open-weight LLMs with LoRA, QLoRA, and full SFT, manages training hyperparameters, and evaluates instruction-following gains.
Training Dataset Curation: Curates, deduplicates, and decontaminates training datasets; balances domain mixtures and applies quality filters.
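
Similarity-based exemplar selection, as described under Few-Shot & In-Context Learning Design, picks the k pool examples closest to the incoming query. To stay dependency-free, the sketch below scores similarity with bag-of-words cosine as a stand-in for embeddings. It places the most similar exemplar last, nearest the query; that is one ordering heuristic among several worth ablating, not a rule.

```python
import math
from collections import Counter

def bow_cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity; a stand-in for embedding similarity."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

def select_exemplars(query: str, pool: list[tuple[str, str]], k: int = 2) -> list[tuple[str, str]]:
    """Return the k most similar (input, output) pairs, most similar last."""
    ranked = sorted(pool, key=lambda ex: bow_cosine(query, ex[0]), reverse=True)
    return list(reversed(ranked[:k]))

def build_prompt(query: str, exemplars: list[tuple[str, str]]) -> str:
    shots = "\n\n".join(f"Input: {x}\nOutput: {y}" for x, y in exemplars)
    return f"{shots}\n\nInput: {query}\nOutput:"

pool = [("refund my order", "billing"), ("reset my password", "account"),
        ("order arrived damaged", "shipping")]
print(build_prompt("I want a refund for my order", select_exemplars("I want a refund for my order", pool)))
```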

Data engineering (10)

Chunking Strategies for RAG: Selects chunking strategies (fixed, recursive, semantic, hierarchical, late-chunking) per document class and measures retrieval impact.
Data Lake & Warehouse for AI: Models AI feature and event tables in BigQuery, Snowflake, or open-table formats (Iceberg, Delta) with appropriate partitioning and clustering.
Data Pipeline Orchestration: Designs idempotent batch and incremental pipelines using Airflow, Dagster, or Prefect, with retries, lineage, and SLAs.
Data Quality & Validation: Encodes data contracts, schema checks, drift detection, and quality SLOs using Great Expectations, dbt tests, or equivalent tooling.
Document Parsing & Extraction: Extracts structured content from PDF, DOCX, HTML, and scanned images using Unstructured, Docling, or comparable tooling, including layout-aware parsing.
Hybrid Retrieval & Reranking: Combines lexical (BM25) and dense retrieval, applies cross-encoder rerankers, and tunes retrieval-quality metrics (recall, MRR, nDCG).
Knowledge Graph Construction: Builds knowledge graphs from unstructured corpora, covering entity extraction, linking, deduplication, and graph schema design for retrieval.
PII & Data Governance: Detects and redacts PII, enforces data residency and retention policies, and tracks lineage for AI training and inference data.
Streaming Data with Kafka/Pulsar: Builds event-driven AI pipelines with Kafka or Pulsar, covering partitioning, consumer groups, exactly-once semantics, and schema evolution.
Vector Database Operations: Operates production vector DBs (Pinecone, Weaviate, Qdrant, pgvector), covering index tuning, sharding, hybrid filters, and capacity planning.
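
For Hybrid Retrieval & Reranking, a widely used way to merge a BM25 ranking with a dense-retrieval ranking is reciprocal rank fusion (RRF), which scores each document as the sum of 1 / (k + rank) across the input lists. The sketch below assumes plain lists of document ids and the conventional k = 60; the cross-encoder reranking step that would follow is not shown.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of doc ids: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Documents missing from a list simply get no contribution from it.
    return sorted(scores, key=scores.get, reverse=True)

bm25 = ["d1", "d2", "d3"]
dense = ["d3", "d1", "d4"]
print(reciprocal_rank_fusion([bm25, dense]))  # ['d1', 'd3', 'd2', 'd4']
```

RRF uses only ranks, so it needs no score normalization between the lexical and dense retrievers, which is the main reason it is a common default.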

Evaluation (8)

Agent Trajectory Evaluation: Evaluates end-to-end agent task success, including tool-call correctness, intermediate state validation, and trace-based replay scoring.
Bias, Fairness & Toxicity Testing: Audits models for demographic bias, fairness gaps, and toxicity using accepted suites and reports impact in plain terms.
Domain Benchmark Design: Designs domain-specific benchmarks with held-out splits, contamination checks, and diverse failure-mode coverage.
Factuality & Grounding Evaluation: Quantifies hallucination rate and grounding fidelity for RAG and agent outputs using span-level annotators or reference-based metrics.
LLM Evaluation Harnesses: Runs evaluations using lm-evaluation-harness, Inspect, OpenAI Evals, or custom harnesses with reproducible task specs.
LLM Regression Testing in CI: Wires evaluation suites into CI gates with golden-set tracking, drift alerts, and statistically valid regression thresholds.
LLM-as-Judge Evaluation: Designs LLM-judge rubrics with calibration, debiasing, and inter-judge agreement checks; knows when judges are unreliable.
Red-Teaming & Jailbreak Testing: Generates adversarial prompts, tests jailbreak resistance, and reports findings with severity and reproduction steps.
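
The inter-judge agreement checks mentioned under LLM-as-Judge Evaluation are often reported as Cohen's kappa, which corrects raw agreement for the agreement two judges would reach by chance. A minimal sketch for two judges labeling the same items follows; the pass/fail labels are illustrative.

```python
from collections import Counter

def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Chance-corrected agreement between two judges on the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected if each judge labeled independently at its own base rates.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both judges used the same single label everywhere; treat as perfect
    return (observed - expected) / (1 - expected)

judge_a = ["pass", "pass", "fail", "pass", "fail", "fail"]
judge_b = ["pass", "fail", "fail", "pass", "fail", "pass"]
print(round(cohens_kappa(judge_a, judge_b), 3))  # 0.333
```

Here the judges agree on 4 of 6 items (67%), but with balanced labels chance alone gives 50%, so kappa is only about 0.33: a signal that the rubric or the judge needs calibration before its scores gate anything.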

Foundations (9)

Async Python with asyncio: Writes concurrent async/await code with asyncio, gather, semaphores, and async HTTP clients.
Configuration & Secrets Management: Manages environment variables, .env files, and secrets safely with python-dotenv and decorators.
File I/O, JSON & Exception Handling: Reads and writes files, parses JSON, and handles errors with try/except and custom exceptions.
Python Classes & Dataclasses: Models data with classes, dataclasses, methods, and inheritance for structured Python code.
Python Core Programming: Writes Python programs using variables, control flow, functions, modules, and packages.
Python Data Pipelines with Polars: Builds data transformation pipelines with Polars, generators, and lazy evaluation for tabular data.
Python Data Structures & Comprehensions: Manipulates lists, tuples, dictionaries, and sets using slicing, iteration, and comprehensions.
Python Testing with pytest: Writes unit and integration tests with pytest fixtures, assertions, and mocking patterns.
Type Hints & Pydantic Models: Builds typed Python data models with type hints, generics, protocols, and Pydantic validation.
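
The asyncio entry (gather plus semaphores) describes the standard pattern for fanning out many LLM calls without exceeding a provider's concurrency limit. In the sketch, `fake_llm_call` is a hypothetical placeholder for an async HTTP client call, and the `in_flight`/`peak` counters exist only to show that the semaphore caps concurrency.

```python
import asyncio

in_flight = 0
peak = 0

async def fake_llm_call(prompt: str) -> str:
    """Hypothetical stand-in for an async HTTP request to a model provider."""
    global in_flight, peak
    in_flight += 1
    peak = max(peak, in_flight)
    await asyncio.sleep(0.01)  # simulate network latency
    in_flight -= 1
    return prompt.upper()

async def run_bounded(prompts: list[str], max_concurrency: int = 3) -> list[str]:
    sem = asyncio.Semaphore(max_concurrency)

    async def worker(prompt: str) -> str:
        async with sem:  # at most max_concurrency calls in flight at once
            return await fake_llm_call(prompt)

    # gather returns results in input order, regardless of completion order.
    return await asyncio.gather(*(worker(p) for p in prompts))

results = asyncio.run(run_bounded(["a", "b", "c", "d", "e"]))
print(results, peak)  # ['A', 'B', 'C', 'D', 'E'] 3
```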

Inference optimization (10)

Continuous Batching & Inference Serving: Implements continuous and dynamic batching for high-throughput LLM serving using vLLM, TGI, or comparable engines.
GPU Kernel Programming Basics: Reads and authors basic Triton or CUDA kernels for custom ops and understands occupancy and memory-coalescing fundamentals.
GPU Memory Management: Profiles CUDA memory, sizes batches to fit available VRAM, handles OOM gracefully, and uses gradient checkpointing or offloading for memory-bound workloads.
Inference Latency Profiling: Profiles p50/p95/p99 token-generation latency, isolates bottlenecks across tokenizer, attention, and decode phases, and reports actionable findings.
KV Cache Optimization: Tunes the transformer decoder's KV cache for throughput and memory; understands prefix caching, PagedAttention, and cache eviction strategies.
Model Distillation & Pruning: Compresses large models via knowledge distillation and structured/unstructured pruning while preserving target metrics.
Model Quantization: Applies INT8/INT4/FP8 post-training quantization (GPTQ, AWQ, GGUF, bitsandbytes) and measures quality vs. throughput trade-offs.
Model Serving Frameworks: Deploys LLMs via vLLM, TGI, TensorRT-LLM, or SGLang with appropriate engine flags, schedulers, and runtime configuration.
Multi-GPU Tensor & Pipeline Parallelism: Configures tensor-parallel and pipeline-parallel sharding across multiple GPUs to serve models that exceed single-GPU memory.
Speculative Decoding: Implements speculative decoding strategies (draft models, Medusa, lookahead) to reduce decoding latency while preserving the output distribution.
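
KV Cache Optimization and GPU Memory Management both start from the same sizing arithmetic: every token in the context stores one key and one value vector per layer per KV head. The sketch below encodes that formula. The 32-layer, 128-dim-head configuration is an assumed 7B-class example, and the numbers ignore paging overhead, fragmentation, and activation memory.

```python
GIB = 1024 ** 3

def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int, bytes_per_elem: int = 2) -> int:
    """KV cache size in bytes; the leading 2 counts K and V. fp16/bf16 = 2 bytes."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch_size * bytes_per_elem

def max_sequences(free_gib: float, per_seq_bytes: int) -> int:
    """How many full-length sequences fit in the VRAM left after loading weights."""
    return int(free_gib * GIB // per_seq_bytes)

# Assumed config: 32 layers, head_dim 128, 4096-token context, fp16.
mha = kv_cache_bytes(32, 32, 128, 4096, 1)  # multi-head attention: 32 KV heads
gqa = kv_cache_bytes(32, 8, 128, 4096, 1)   # grouped-query attention: 8 KV heads
print(mha / GIB, gqa / GIB)                  # 2.0 0.5
print(max_sequences(20, mha), max_sequences(20, gqa))  # 10 40
```

This is why grouped-query attention and KV-cache quantization raise achievable batch size: the first shrinks `n_kv_heads`, the second shrinks `bytes_per_elem`.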

Infrastructure (7)

Docker Containerization for LLM Apps: Writes Dockerfiles with multi-stage builds, manages images, and runs containers with Compose.
Helm & Kustomize Packaging: Packages Kubernetes apps with Helm charts and manages environment overlays with Kustomize.
K8s Health Probes & Autoscaling: Configures liveness, readiness, startup probes, HPA, and PodDisruptionBudgets for resilient services.
Kubernetes Ingress, TLS & NetworkPolicy: Exposes services via Ingress with TLS termination and isolates traffic with NetworkPolicies.
Kubernetes RBAC & Troubleshooting: Applies RBAC, Pod Security Standards, and SecurityContext while debugging CrashLoopBackOff and OOMKilled pods.
Kubernetes Workloads, Pods & Services: Deploys pods, services, and Deployments to Kubernetes with rolling updates and DNS-based discovery.
Sandboxed Agent Code Execution: Isolates agent-generated code in containers with timeouts, cgroup resource limits, and input sanitization to prevent escape.

LLM core (9)

Embeddings & Semantic Search: Generates embeddings, computes cosine similarity, and builds semantic search over documents.
LangChain & LCEL Runnables: Composes LangChain Runnables with LCEL pipe syntax, streaming, batching, and configurable runtime fields.
LLM API Integration: Calls OpenAI, Anthropic, and Gemini APIs with auth, error handling, and response parsing.
LLM Cost & Resilience Optimization: Tracks token costs, applies retry with exponential backoff, and tunes prompts for budget.
LLM Function Calling & Tool Use: Defines tool schemas in JSON Schema and orchestrates multi-turn function calling across providers.
LLM Sampling & Structured Output: Controls LLM outputs with temperature, top-p, stop sequences, JSON mode, and structured schemas.
Multi-Provider Prompt Engineering: Builds versioned prompts with Jinja2, few-shot examples, and chain-of-thought across providers.
RAG Pipeline Fundamentals: Builds retrieval-augmented generation pipelines with chunking, retrieval, and citation.
Transformer Architecture Internals: Implements scaled dot-product attention and reasons about KV-cache memory, FFN dimensions, and quantization trade-offs to choose inference strategies.
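
Transformer Architecture Internals names scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V. A dependency-free sketch on plain Python lists (one query, key, and value row per token) follows; real implementations use batched tensor operations, causal masking, and multiple heads, all omitted here.

```python
import math

def softmax(xs: list[float]) -> list[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q: list[list[float]], K: list[list[float]], V: list[list[float]]) -> list[list[float]]:
    """Single-head scaled dot-product attention over lists of row vectors."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Output row is the attention-weighted average of the value rows.
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out

# Identical keys give equal weights, so the output is the mean of the value rows.
print(attention([[1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]], [[1.0, 2.0], [3.0, 4.0]]))  # [[2.0, 3.0]]
```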

Security (8)

AI IAM & Secrets Management: Configures IAM, Workload Identity / IRSA, KMS, and short-lived credentials for AI workloads; rotates and audits secrets.
Compliance Frameworks for AI: Maps AI systems to SOC 2, ISO 27001, HIPAA, and EU AI Act controls; produces evidence and audit-ready documentation.
Model Supply-Chain Security: Verifies model provenance, signed weights, SBOMs, and dependency integrity for open-weight and hosted models.
Output Filtering & Data-Loss Prevention: Builds output-side DLP for PII, secrets, and proprietary IP, with deterministic filters layered with model-based classifiers.
Prompt Injection Defense: Identifies direct and indirect prompt-injection vectors and implements input filtering, isolation, and least-privilege tool gating.
Threat Modeling for AI Systems: Applies STRIDE / PASTA threat modeling to AI architectures including model, data, and agent-tool boundaries.
Vulnerability Scanning for AI Stacks: Runs SCA, SAST, container, and model-asset scanning in CI; triages and remediates findings with appropriate severity gates.
Zero-Trust Networking for AI: Enforces network isolation, egress allowlists, mTLS, and zero-trust policies for AI inference and training workloads.
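
The deterministic layer described under Output Filtering & Data-Loss Prevention can be as simple as a set of regular expressions applied to model output before it leaves the service. The patterns below (email addresses, AWS access key IDs, the US SSN format) are illustrative assumptions, far from exhaustive, and would be paired with model-based classifiers in practice.

```python
import re

# Illustrative deterministic rules; production DLP layers these with ML classifiers.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "AWS_ACCESS_KEY": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(text: str) -> tuple[str, list[str]]:
    """Replace each match with a [LABEL] placeholder; report which labels fired."""
    found = []
    for label, pattern in PATTERNS.items():
        text, n = pattern.subn(f"[{label}]", text)
        if n:
            found.append(label)
    return text, found

print(redact("Contact jane.doe@example.com, key AKIAABCDEFGHIJKLMNOP"))
```

Returning the fired labels alongside the cleaned text lets the caller log or alert on a leak attempt instead of only masking it.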

Web APIs (8)

API Authentication & Authorization: Implements OAuth2, JWT, API keys, and role-based access control on FastAPI endpoints.
API Gateway & Routing: Builds reverse-proxy gateways with path routing, load balancing, and response aggregation.
API Observability: Instruments APIs with Prometheus metrics, OpenTelemetry traces, structured logging, and Grafana dashboards.
API Resilience Patterns: Applies rate limiting, circuit breakers, retries with backoff, and bulkhead isolation to API services.
API Testing & Versioning: Tests async endpoints with pytest and httpx, and manages API versions with deprecation strategies.
Async Databases with SQLAlchemy & Alembic: Models data with async SQLAlchemy ORM, manages migrations with Alembic, and applies the repository pattern.
FastAPI REST API Development: Builds production REST APIs with FastAPI using Pydantic validation, dependency injection, and async handlers.
Real-Time Streaming with SSE & WebSockets: Streams LLM responses with SSE and manages WebSocket connection lifecycles for real-time apps.
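
Rate limiting, listed under API Resilience Patterns, is frequently implemented as a token bucket: requests spend tokens, tokens refill at a fixed rate, and the capacity bounds bursts. This single-process sketch injects the clock so it can be tested deterministically; a multi-replica service would keep bucket state in a shared store, which is not shown.

```python
import time
from typing import Callable

class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity  # start full so an initial burst is allowed
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # caller would typically respond 429 Too Many Requests
```

A per-client limiter keeps one bucket per API key; the `cost` parameter allows charging expensive endpoints, such as long LLM generations, more than cheap ones.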

Cite this taxonomy

Released under CC-BY-4.0 — cite it, link it, build on it.

GenBodha. "The GenAI Engineering Role Taxonomy v1.0." genbodha.ai, 2026-06-14. https://genbodha.ai/taxonomy (CC-BY-4.0).