Published taxonomy · v1.0 · 2026-06-14 · CC-BY-4.0
The GenAI Engineering Role Taxonomy
12 disciplines · 102 skills · 14 categories. Roles curated from live GenAI job descriptions and vetted by practicing engineers; each role's responsibilities map to a skill ladder assessed by graded labs. The role is the unit; the skill is what gets measured.
Disciplines & responsibilities
GenAI Application Engineering
Build production RAG & prompt chain applications, design streaming chat UIs, implement guardrails & evaluation, optimize LLM inference costs, and deploy on Kubernetes.
- Design and build production GenAI features (chatbots, search, summarization) for web applications
- Implement RAG pipelines with vector databases for enterprise search and knowledge retrieval
- Optimize LLM inference for latency, cost, and reliability across multiple providers
- Integrate LLM APIs (OpenAI, Gemini, Anthropic) into existing applications with error handling
- Build GenAI agent features with tool calling, function execution, and human-in-the-loop workflows
- Evaluate model outputs using automated metrics and LLM-as-judge for production quality
- Containerize GenAI applications and deploy them on Kubernetes with CI/CD
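The error handling named in the provider-integration responsibility above often comes down to retrying transient failures. The following is a minimal sketch under stated assumptions: `call_provider` and `TransientError` are hypothetical stand-ins, not part of any real provider SDK, and the delay values are illustrative.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical stand-in for a rate-limit or timeout error from a provider."""


def call_with_backoff(call_provider, prompt, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Retry call_provider(prompt) on TransientError with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return call_provider(prompt)
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # The base delay doubles each attempt; jitter spreads out concurrent retries.
            delay = base_delay * (2 ** attempt) * (1 + random.random())
            sleep(delay)
```

Passing `sleep` as a parameter keeps the function testable without real waiting; a production version would likely also cap the total delay and distinguish retryable from non-retryable errors.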
GenAI Agent Engineering
Build autonomous multi-agent systems with planning, reasoning, tool use, memory, MCP/A2A protocols, safety boundaries, and production evaluation.
- Design autonomous GenAI agents using state machines with tool calling, memory, and planning
- Build multi-agent systems with supervisor/worker hierarchies, delegation, and parallel execution
- Implement MCP servers and clients for standardized tool integration
- Enable agent-to-agent communication using A2A protocol for cross-framework interoperability
- Build production RAG agents with iterative retrieval, self-verification, and query decomposition
- Implement guardrails and safety controls within agent workflows
- Evaluate agent performance with trajectory analysis and cost tracking
- Design context engineering: the systematic composition of prompts, memory, tools, and history
GenAI Inference Engineering
Architect multi-provider LLM gateways, implement semantic caching and batch optimization, monitor provider SLAs, and optimize inference costs.
- Design LLM gateway infrastructure routing requests across providers
- Optimize request latency through caching, batching, and streaming
- Implement structured output extraction from LLMs with type safety
- Build cost attribution and FinOps dashboards tracking token spend
- Monitor inference quality metrics in production
- Implement intelligent routing: route queries to model tiers based on complexity
- Manage API rate limits and quotas across providers
- Deploy inference services on K8s with scaling and health checks
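Routing queries to model tiers by complexity can be illustrated with a small heuristic router. This is only a sketch: the tier names, thresholds, keyword list, and scoring rule are all assumptions made for illustration, and a production router would more likely use a trained classifier governed by quality targets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str            # hypothetical tier label, not a real model id
    max_complexity: int  # highest score this tier should handle


# Ordered cheapest first; thresholds are illustrative assumptions.
TIERS = [Tier("small", 2), Tier("medium", 5), Tier("large", 10**9)]

REASONING_HINTS = ("why", "compare", "prove", "step by step", "trade-off")


def complexity_score(query: str) -> int:
    """Crude heuristic: longer queries and reasoning keywords score higher."""
    text = query.lower()
    score = len(text.split()) // 20  # one point per 20 words
    score += 2 * sum(hint in text for hint in REASONING_HINTS)
    return score


def route(query: str) -> str:
    """Return the name of the cheapest tier whose threshold covers the query."""
    score = complexity_score(query)
    for tier in TIERS:
        if score <= tier.max_complexity:
            return tier.name
    return TIERS[-1].name
```

Keeping the tiers in a cheapest-first list means the router falls through to a more expensive model only when the score exceeds the cheaper tier's threshold.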
GenAI Platform Engineering
Build internal GenAI developer platforms with self-service capabilities, multi-tenancy, RBAC, and CI/CD for model, prompt, and guardrail pipelines.
- Build the internal GenAI platform that lets developers deploy LLM applications self-service
- Design multi-tenant infrastructure with namespace isolation and RBAC
- Implement CI/CD pipelines with GitOps for GenAI applications
- Manage data infrastructure (databases, caches, message queues) on K8s
- Build autoscaling for GenAI workloads using event-driven scaling and batch job queuing
- Provision infrastructure-as-code using K8s-native tooling
- Implement full-stack observability across the GenAI platform
- Operate LLM gateways as platform infrastructure
Forward Deployed GenAI Engineering
Rapid-prototype GenAI solutions on customer infrastructure, integrate GenAI with customer data and workflows, and scope solutions with a delivery methodology.
- Embed on-site with clients to discover GenAI opportunities and scope projects
- Build rapid prototypes that demonstrate GenAI value within weeks
- Integrate GenAI into client data systems: databases, APIs, and legacy systems
- Customize LLM applications for client-specific domains (healthcare, finance, legal)
- Deploy solutions as packaged Helm charts clients can operate independently
- Build GenAI agent workflows tailored to client business processes
- Manage LLM provider costs and build FinOps models for client engagements
- Configure enterprise guardrails to meet client compliance requirements
LLMOps Engineering
Monitor hallucination rates and token costs, operate guardrails and eval gates, and manage prompt versioning and canary deployments.
- Design CI/CD pipelines for LLM application deployment
- Monitor LLM systems in production: latency, errors, costs, quality
- Manage LLM gateway operations: key rotation, failover, quota management
- Implement FinOps practices: cost attribution, budgets, and optimization
- Build continuous evaluation pipelines for production LLM quality
- Detect and respond to prompt attacks and safety incidents in production
- Manage data quality for RAG systems: freshness, drift, accuracy
- Implement capacity planning: predict demand and right-size deployments
GenAI Safety & Evaluation Engineering
Design automated LLM evaluation pipelines, red-team GenAI systems, build bias detection and fairness benchmarks, and implement guardrails.
- Build automated evaluation pipelines to continuously measure LLM output quality
- Conduct red-team exercises that probe LLMs for vulnerabilities
- Implement production guardrails: content filters, PII detection, jailbreak prevention
- Design GenAI governance frameworks aligned with regulations
- Evaluate GenAI agent behavior: trajectory quality, tool selection accuracy
- Monitor bias, fairness, and hallucination rates in production
- Build safety incident response processes for deployed GenAI systems
- Design LlamaFirewall policies for agent safety
GenAI Security Engineering
Engineer defenses against prompt injection, jailbreaks, and data exfiltration. Implement PII leakage detection, content safety, and compliance.
- Conduct adversarial red-team testing of LLM systems
- Implement defense-in-depth guardrails: input validation, output filtering, content safety
- Threat-model GenAI agent systems: analyze attack surfaces across tools, memory, and inter-agent communication
- Build PII protection: detect, classify, and redact sensitive data in LLM pipelines
- Design compliance programs aligned with OWASP LLM Top 10, MITRE ATLAS, EU AI Act
- Build security monitoring for GenAI systems
- Implement incident response for GenAI security events
- Secure GenAI supply chain: model provenance, dependency scanning, container security
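The detect, classify, and redact steps named in the PII protection responsibility can be sketched with pattern matching. This is a minimal illustration, assuming only two hand-picked patterns (email addresses and US Social Security numbers); the labels and regular expressions are assumptions, and real pipelines typically layer such deterministic filters with model-based detectors.

```python
import re

# Illustrative patterns only; real deployments need broader, locale-aware detection.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text: str) -> tuple[str, dict[str, int]]:
    """Replace each match with a [LABEL] placeholder and count matches per label."""
    counts: dict[str, int] = {}
    for label, pattern in PII_PATTERNS.items():
        text, n = pattern.subn(f"[{label}]", text)
        if n:
            counts[label] = n
    return text, counts
```

Returning the per-label counts alongside the redacted text gives a monitoring system something to log without retaining the sensitive values themselves.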
GenAI Solutions Architecture
Design enterprise GenAI reference architectures, create ADRs and technical standards, and bridge GenAI with enterprise workflows.
- Define enterprise GenAI architecture with proper documentation and governance
- Design scalable RAG systems at enterprise scale
- Architect multi-agent systems with MCP mesh and A2A network topology
- Lead PoC development and production rollouts with model selection and cost estimation
- Design GenAI governance architecture: RBAC, audit trails, and compliance
- Oversee operational architecture: observability, FinOps, SLA management
- Integrate GenAI with enterprise data platforms: pipelines, knowledge graphs, streaming
- Present architecture decisions with cost/risk analysis to leadership
GenAI Solutions & Delivery
Scope GenAI solutions with estimation, risk, and success criteria. Orchestrate delivery teams and manage client relationships.
- Lead end-to-end GenAI project delivery from discovery through production handoff
- Design GenAI architecture for client engagements
- Build agent-based solutions for client business processes
- Customize enterprise LLM deployments: gateways, RAG, domain adaptation
- Manage FinOps for client GenAI projects
- Scope project timelines and team requirements
- Package solutions as deployable artifacts for client operations teams
- Advise clients on technology roadmaps with emerging GenAI patterns
GenAI Engineering Leader
Hire and build GenAI engineering teams, design team structures for GenAI, and set engineering quality frameworks.
- Hire and build GenAI engineering teams
- Define engineering processes for GenAI development, including eval-driven workflows
- Manage quality and team performance for GenAI outputs
- Understand the technical stack deeply enough to unblock teams
- Operate and budget for GenAI infrastructure: FinOps and capacity
- Design organization structure for GenAI engineering teams
- Drive technical strategy: evaluate new tools and plan migrations
- Ensure responsible AI practices across the team
GenAI Data Engineering
Build RAG data pipelines for ingestion, chunking, embedding, and indexing. Manage vector store operations and embedding model lifecycle.
- Build embedding pipelines: ingest, chunk, embed, and store in vector databases
- Design RAG data infrastructure: hybrid search and reranking
- Build knowledge graph pipelines using Neo4j
- Process documents at scale: parsing, chunking, and quality filtering
- Implement data quality controls: PII, dedup, compliance filtering
- Orchestrate data pipelines with scheduling and failure recovery
- Monitor pipeline health: freshness, quality scores, embedding drift
- Design multi-tenant data isolation for enterprise RAG
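The chunk step of the ingest, chunk, embed, and store pipeline above can be sketched as fixed-size windows with overlap. This is one simple strategy among several, shown under assumptions: the function name, the word-based splitting, and the default sizes are illustrative and not taken from the taxonomy.

```python
def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into word windows of `size` that share `overlap` words with the previous chunk."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks
```

The overlap keeps sentences that straddle a boundary retrievable from either neighboring chunk, at the cost of embedding some words twice.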
Skill ladder (102 skills)
- Agent Memory Systems: Implements short-term sliding windows, semantic memory, and context optimization for agents.
- Agent State-Graph Patterns: Designs agent state-graph pipelines with typed schemas, nodes, edges, conditional routing, compilation, and state-transition debugging.
- Agent Tool Design & Validation: Designs typed agent tools with Pydantic schemas, docstring parsing, and runtime validation.
- Agentic RAG & Knowledge Graphs: Builds Self-RAG, Corrective RAG, and GraphRAG pipelines with adaptive retrieval and entity graphs.
- Enterprise Vertical Agent Patterns: Builds document-processing, triage, and code-review agents with domain-specific tool sets and human handoff points.
- LangGraph Framework Usage: Builds agents with the LangGraph library: StateGraph, conditional edges, MessagesState, ToolNode integration.
- Multi-Agent Orchestration: Builds supervisor, hierarchical, and reflector multi-agent patterns with handoffs and result aggregation.
- Multimodal & Computer-Use Agents: Builds vision, voice, computer-use, and code agents with multimodal models and desktop automation.
- ReAct & Planning Agent Loops: Builds ReAct agent loops with thought-action-observation, planning, and dynamic replanning.
- Web Browsing Agents: Builds agents that navigate web pages with Playwright, extract structured data, and submit forms.
- Agent Cost Control & Model Routing: Tracks per-agent token spend, routes tasks to cost-appropriate models, and enforces budget limits.
- Agent Load Testing & Capacity Planning: Runs concurrent-load benchmarks with k6 or Locust, identifies bottlenecks, and plans capacity for production agents.
- Agent Observability & Tracing: Instruments agents with OpenTelemetry, Langfuse, fleet dashboards, and tool-use debugging.
- Agent Release Management: Manages agent config versions with canary rollout, automated rollback, and config drift detection.
- Production Agent Deployment: Serves agents via FastAPI on Kubernetes with Postgres/Redis state, horizontal scaling, and CI/CD pipelines.
- A2A Protocol & Agent Networks: Implements Agent-to-Agent protocol for discovery, authentication, and remote task delegation across agent fleets.
- MCP Protocol Servers & Clients: Builds and consumes MCP servers using JSON-RPC 2.0 over stdio and SSE with tool, resource, and prompt exposure.
- Agent Evaluation & Benchmarking: Builds golden datasets, LLM-as-judge pipelines, trajectory scoring, and CI-gated regression testing for agents.
- Agent Safety Guardrails & Injection Defense: Implements input/output guardrails, jailbreak detection, prompt injection defense, and safety boundaries.
- Enterprise Agent Governance & Audit: Enforces audit trails, escalation policies, human-in-the-loop checkpoints, and compliance reporting on production agents.
- Caching Strategies for Cost: Designs prompt, semantic, and tool-call caches with appropriate TTLs and invalidation, quantifying cost-per-hit and quality impact.
- Cost Anomaly Monitoring: Instruments cost telemetry per feature/tenant and detects anomalies via baselines or statistical detectors before they become bills.
- Cost-Aware Model Routing: Builds cascade and routing strategies that send easy queries to cheap models and hard queries to expensive ones, governed by quality SLOs.
- GPU Capacity Planning: Plans GPU capacity using spot/reserved/on-demand mix, autoscaling envelopes, and queue-based load shedding to hit SLO at target cost.
- LLM Cost Modeling: Models per-request token economics, p99 cost, and unit economics for LLM features; compares hosted vs. self-hosted total cost.
- Continued Pretraining for Domain Adaptation: Performs domain-adaptive continued pretraining on curated corpora and measures downstream-task improvement vs. base model.
- Distributed Training Infrastructure: Configures distributed training with DeepSpeed, FSDP, or accelerate; understands ZeRO stages, gradient checkpointing, and mixed precision.
- Few-Shot & In-Context Learning Design: Designs few-shot exemplar selection (k, ordering, similarity-based retrieval) and measures in-context learning quality.
- Fine-Tuning Evaluation: Builds evaluation harnesses to compare base vs. fine-tuned models on task suites, regression sets, and held-out human preference data.
- Preference Optimization (DPO/RLHF): Aligns models with human or AI preferences using DPO, IPO, KTO, or RLHF/RLAIF pipelines, including reward modeling fundamentals.
- Production Prompt Template Engineering: Authors versioned production prompts with structured outputs, ablations, and prompt-variant A/B tests under load.
- Supervised Fine-Tuning (LoRA/QLoRA): Fine-tunes open-weight LLMs with LoRA, QLoRA, and full SFT, manages training hyperparameters, and evaluates instruction-following gains.
- Training Dataset Curation: Curates, deduplicates, and decontaminates training datasets; balances domain mixtures and applies quality filters.
- Chunking Strategies for RAG: Selects chunking strategies (fixed, recursive, semantic, hierarchical, late-chunking) per document class and measures retrieval impact.
- Data Lake & Warehouse for AI: Models AI feature and event tables in BigQuery, Snowflake, or open-table formats (Iceberg, Delta) with appropriate partitioning and clustering.
- Data Pipeline Orchestration: Designs idempotent batch and incremental pipelines using Airflow, Dagster, or Prefect, with retries, lineage, and SLAs.
- Data Quality & Validation: Encodes data contracts, schema checks, drift detection, and quality SLOs using Great Expectations, dbt tests, or equivalent tooling.
- Document Parsing & Extraction: Extracts structured content from PDF, DOCX, HTML, and scanned images using Unstructured, Docling, or comparable tooling, including layout-aware parsing.
- Hybrid Retrieval & Reranking: Combines lexical (BM25) and dense retrieval, applies cross-encoder rerankers, and tunes retrieval-quality metrics (recall, MRR, nDCG).
- Knowledge Graph Construction: Builds knowledge graphs from unstructured corpora, covering entity extraction, linking, deduplication, and graph schema design for retrieval.
- PII & Data Governance: Detects and redacts PII, enforces data residency and retention policies, and tracks lineage for AI training and inference data.
- Streaming Data with Kafka/Pulsar: Builds event-driven AI pipelines with Kafka or Pulsar, covering partitioning, consumer groups, exactly-once semantics, and schema evolution.
- Vector Database Operations: Operates production vector DBs (Pinecone, Weaviate, Qdrant, pgvector), covering index tuning, sharding, hybrid filters, and capacity planning.
- Agent Trajectory Evaluation: Evaluates end-to-end agent task success, covering tool-call correctness, intermediate state validation, and trace-based replay scoring.
- Bias, Fairness & Toxicity Testing: Audits models for demographic bias, fairness gaps, and toxicity using accepted suites and reports impact in plain terms.
- Domain Benchmark Design: Designs domain-specific benchmarks with held-out splits, contamination checks, and diverse failure-mode coverage.
- Factuality & Grounding Evaluation: Quantifies hallucination rate and grounding fidelity for RAG and agent outputs using span-level annotators or reference-based metrics.
- LLM Evaluation Harnesses: Runs evaluations using lm-evaluation-harness, Inspect, OpenAI Evals, or custom harnesses with reproducible task specs.
- LLM Regression Testing in CI: Wires evaluation suites into CI gates with golden-set tracking, drift alerts, and statistically valid regression thresholds.
- LLM-as-Judge Evaluation: Designs LLM-judge rubrics with calibration, debiasing, and inter-judge agreement checks; knows when judges are unreliable.
- Red-Teaming & Jailbreak Testing: Generates adversarial prompts, tests jailbreak resistance, and reports findings with severity and reproduction steps.
- Async Python with asyncio: Writes concurrent async/await code with asyncio, gather, semaphores, and async HTTP clients.
- Configuration & Secrets Management: Manages environment variables, .env files, and secrets safely with python-dotenv and decorators.
- File I/O, JSON & Exception Handling: Reads and writes files, parses JSON, and handles errors with try/except and custom exceptions.
- Python Classes & Dataclasses: Models data with classes, dataclasses, methods, and inheritance for structured Python code.
- Python Core Programming: Writes Python programs using variables, control flow, functions, modules, and packages.
- Python Data Pipelines with Polars: Builds data transformation pipelines with Polars, generators, and lazy evaluation for tabular data.
- Python Data Structures & Comprehensions: Manipulates lists, tuples, dictionaries, and sets using slicing, iteration, and comprehensions.
- Python Testing with pytest: Writes unit and integration tests with pytest fixtures, assertions, and mocking patterns.
- Type Hints & Pydantic Models: Builds typed Python data models with type hints, generics, protocols, and Pydantic validation.
- Continuous Batching & Inference Serving: Implements continuous and dynamic batching for high-throughput LLM serving using vLLM, TGI, or comparable engines.
- GPU Kernel Programming Basics: Reads and authors basic Triton or CUDA kernels for custom ops, understands occupancy and memory coalescing fundamentals.
- GPU Memory Management: Profiles CUDA memory, sizes batches to fit available VRAM, handles OOM gracefully, and uses gradient checkpointing or offloading for memory-bound workloads.
- Inference Latency Profiling: Profiles p50/p95/p99 token-generation latency, isolates bottlenecks across tokenizer, attention, and decode phases, and reports actionable findings.
- KV Cache Optimization: Tunes transformer decoder KV cache for throughput and memory; understands prefix caching, paged-attention, and cache eviction strategies.
- Model Distillation & Pruning: Compresses large models via knowledge distillation and structured/unstructured pruning while preserving target metrics.
- Model Quantization: Applies INT8/INT4/FP8 post-training quantization (GPTQ, AWQ, GGUF, bitsandbytes) and measures quality vs. throughput trade-offs.
- Model Serving Frameworks: Deploys LLMs via vLLM, TGI, TensorRT-LLM, or SGLang with appropriate engine flags, schedulers, and runtime configuration.
- Multi-GPU Tensor & Pipeline Parallelism: Configures tensor-parallel and pipeline-parallel sharding across multiple GPUs to serve models that exceed single-GPU memory.
- Speculative Decoding: Implements speculative-decoding strategies (draft models, Medusa, lookahead) to reduce decoder latency while preserving output distribution.
- Docker Containerization for LLM Apps: Writes Dockerfiles with multi-stage builds, manages images, and runs containers with Compose.
- Helm & Kustomize Packaging: Packages Kubernetes apps with Helm charts and manages environment overlays with Kustomize.
- K8s Health Probes & Autoscaling: Configures liveness, readiness, startup probes, HPA, and PodDisruptionBudgets for resilient services.
- Kubernetes Ingress, TLS & NetworkPolicy: Exposes services via Ingress with TLS termination and isolates traffic with NetworkPolicies.
- Kubernetes RBAC & Troubleshooting: Applies RBAC, Pod Security Standards, and SecurityContext while debugging CrashLoopBackOff and OOMKilled pods.
- Kubernetes Workloads, Pods & Services: Deploys pods, services, and Deployments to Kubernetes with rolling updates and DNS-based discovery.
- Sandboxed Agent Code Execution: Isolates agent-generated code in containers with timeouts, cgroup resource limits, and input sanitization to prevent escape.
- Embeddings & Semantic Search: Generates embeddings, computes cosine similarity, and builds semantic search over documents.
- LangChain & LCEL Runnables: Composes LangChain Runnables with LCEL pipe syntax, streaming, batching, and configurable runtime fields.
- LLM API Integration: Calls OpenAI, Anthropic, and Gemini APIs with auth, error handling, and response parsing.
- LLM Cost & Resilience Optimization: Tracks token costs, applies retry with exponential backoff, and tunes prompts for budget.
- LLM Function Calling & Tool Use: Defines tool schemas in JSON Schema and orchestrates multi-turn function calling across providers.
- LLM Sampling & Structured Output: Controls LLM outputs with temperature, top-p, stop sequences, JSON mode, and structured schemas.
- Multi-Provider Prompt Engineering: Builds versioned prompts with Jinja2, few-shot examples, and chain-of-thought across providers.
- RAG Pipeline Fundamentals: Builds retrieval-augmented generation pipelines with chunking, retrieval, and citation.
- Transformer Architecture Internals: Implements scaled dot-product attention and reasons about KV-cache memory, FFN dimensions, and quantization tradeoffs to choose inference strategies.
- AI IAM & Secrets Management: Configures IAM, Workload Identity / IRSA, KMS, and short-lived credentials for AI workloads; rotates and audits secrets.
- Compliance Frameworks for AI: Maps AI systems to SOC 2, ISO 27001, HIPAA, and EU AI Act controls; produces evidence and audit-ready documentation.
- Model Supply-Chain Security: Verifies model provenance, signed weights, SBOMs, and dependency integrity for open-weight and hosted models.
- Output Filtering & Data-Loss Prevention: Builds output-side DLP for PII, secrets, and proprietary IP, with deterministic filters layered with model-based classifiers.
- Prompt Injection Defense: Identifies direct and indirect prompt-injection vectors and implements input filtering, isolation, and least-privilege tool gating.
- Threat Modeling for AI Systems: Applies STRIDE / PASTA threat modeling to AI architectures including model, data, and agent-tool boundaries.
- Vulnerability Scanning for AI Stacks: Runs SCA, SAST, container, and model-asset scanning in CI; triages and remediates findings with appropriate severity gates.
- Zero-Trust Networking for AI: Enforces network isolation, egress allowlists, mTLS, and zero-trust policies for AI inference and training workloads.
- API Authentication & Authorization: Implements OAuth2, JWT, API keys, and role-based access control on FastAPI endpoints.
- API Gateway & Routing: Builds reverse-proxy gateways with path routing, load balancing, and response aggregation.
- API Observability: Instruments APIs with Prometheus metrics, OpenTelemetry traces, structured logging, and Grafana dashboards.
- API Resilience Patterns: Applies rate limiting, circuit breakers, retries with backoff, and bulkhead isolation to API services.
- API Testing & Versioning: Tests async endpoints with pytest and httpx, and manages API versions with deprecation strategies.
- Async Databases with SQLAlchemy & Alembic: Models data with async SQLAlchemy ORM, manages migrations with Alembic, and applies the repository pattern.
- FastAPI REST API Development: Builds production REST APIs with FastAPI using Pydantic validation, dependency injection, and async handlers.
- Real-Time Streaming with SSE & WebSockets: Streams LLM responses with SSE and manages WebSocket connection lifecycles for real-time apps.
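Several ladder entries name concrete metrics. As one illustration, two of the retrieval-quality metrics listed under Hybrid Retrieval & Reranking (recall and MRR) can be computed as in the sketch below. The function names and the use of string document ids are assumptions made for this example, not part of the taxonomy or its graded labs.

```python
def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of relevant ids that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank(ranked: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: list[tuple[list[str], set[str]]]) -> float:
    """Average reciprocal rank over (ranked_ids, relevant_ids) pairs."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(ranked, rel) for ranked, rel in runs) / len(runs)
```

Recall rewards finding every relevant document within the cutoff, while MRR rewards placing the first relevant document near the top, so the two can move in different directions when a reranker is tuned.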
Cite this taxonomy
Released under CC-BY-4.0 — cite it, link it, build on it.
GenBodha. "The GenAI Engineering Role Taxonomy v1.0." genbodha.ai, 2026-06-14. https://genbodha.ai/taxonomy (CC-BY-4.0).