GenAI Safety & Evaluation Engineering
Design automated LLM evaluation pipelines, red-team GenAI systems, build bias detection and fairness benchmarks, implement guardrails.
Verifiable skill graph
12 skill groups · each becomes a signed node on your graph.
Every lab you pass issues a signed W3C Verifiable Credential on your public skill graph. Completing the labs in each group below mints one node on that graph. The badge you walk away with is a cryptographic record of what you can ship, not a completion certificate.
Share the URL on your résumé or with a hiring manager. They click; they see the discipline, the labs you passed, and the verification signature. No honor system, no broker.
LLM-as-judge rubric design, position-bias correction, calibration against human raters, scoring functions, judge reproducibility, multi-judge ensembling. The core automated-evaluation technique.
Automated eval pipelines: eval-driven CI/CD, quality gates that block deploys, regression suites, champion/challenger A/B, prompt-variant testing — plus eval-pipeline observability, tracing (Langfuse/OpenTelemetry) and cost governance for judge runs.
Golden-set curation, dataset versioning, prompt-variant generation, edge-case mining, stratified sampling — and creating standardized safety benchmarks: jailbreak/refusal taxonomies, harm-category coverage, threshold-gated safety suites.
Adversarial robustness evaluation: systematic red-team test-suite generation, jailbreak/injection probe batteries, automated red teaming, robustness scoring and attack-success-rate metrics, OWASP LLM Top 10 + MITRE ATLAS. The measured-verdict eval slice, not offensive exploitation.
Statistical fairness evaluation: demographic-parity and equalized-odds tests, disparate-impact ratios, subgroup and intersectional slicing, counterfactual-token swaps (BBQ/CrowS-Pairs/WinoBias), and fairness-benchmark construction. Distinct from factual grounding.
EU AI Act / NIST AI RMF / SOC 2 / HIPAA / GDPR control mapping, governance-artifact generation (model cards, eval policies, audit trails, sign-off gates), AI risk classification and governance workflows. Implemented on top of the eval capabilities.
Factual-grounding evaluation: hallucination detectors, faithfulness/groundedness scoring, NLI-based entailment, attribution and citation precision against source context. Distinct from statistical fairness.
Step-level agent evaluation: trajectory scoring, tool-call accuracy, plan-quality assessment, human-in-the-loop review gates, golden-trajectory datasets.
Runtime safety enforcement: content moderation, guardrails, PII detection + redaction, toxicity classifiers, output sanitization, regulated-content classification. The inference-time defense layer (vs. building the eval suite in G3).
RAGAS + DeepEval + TruLens pipelines, retrieval relevance + faithfulness + answer-relevancy metrics, cross-model comparison harnesses and arena/pairwise model ranking.
Provider SDK integration in eval and safety code: judge models, multi-provider scoring, cross-model evaluation harnesses, multi-provider abstraction. Prerequisite plumbing.
Production-grade Python for eval tooling: async/await, Pydantic models for eval rubrics, typing, dataclasses, pytest harnesses, parametrized testing. Prerequisite.
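The position-bias correction named in the LLM-as-judge group can be illustrated with a small sketch. One common approach is to query the judge twice with the answer order swapped and accept a winner only when both orderings agree. The `Judge` callable signature, the `PairwiseResult` type, and the `debiased_compare` helper below are hypothetical names introduced for illustration; they are not part of any course material or library.

```python
from dataclasses import dataclass
from typing import Callable, Literal

Verdict = Literal["A", "B", "tie"]

# Hypothetical judge signature (an assumption for this sketch):
# (prompt, answer_shown_first, answer_shown_second) -> "A" | "B" | "tie",
# where "A" means the answer shown first was preferred.
Judge = Callable[[str, str, str], Verdict]


@dataclass(frozen=True)
class PairwiseResult:
    winner: Verdict    # expressed in terms of the original (answer_a, answer_b) labels
    consistent: bool   # True when both orderings produced the same winner


def debiased_compare(judge: Judge, prompt: str, answer_a: str, answer_b: str) -> PairwiseResult:
    """Run the judge in both orders and keep the verdict only if they agree."""
    first = judge(prompt, answer_a, answer_b)
    swapped = judge(prompt, answer_b, answer_a)
    # In the swapped call, "A" refers to answer_b, so map it back to original labels.
    flip = {"A": "B", "B": "A", "tie": "tie"}
    second = flip[swapped]
    if first == second:
        return PairwiseResult(winner=first, consistent=True)
    # Disagreement suggests the verdict depended on position; fall back to a tie.
    return PairwiseResult(winner="tie", consistent=False)
```

A judge that always prefers whichever answer is shown first comes back as an inconsistent tie, while a judge with a genuine preference yields the same winner in both orders. Tracking the share of inconsistent results is one possible way to monitor how position-sensitive a judge is.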
What you'll ship in production
Core responsibilities this discipline prepares you for.
1. Build automated evaluation pipelines to continuously measure LLM output quality
- Design evaluation harnesses with RAGAS, DeepEval, and NeMo Evaluator SDK for multi-metric scoring
- Create evaluation datasets with ground-truth annotations and run cross-provider comparisons
- Wire CI gates that automatically block deployments when faithfulness or relevance scores degrade
2. Conduct red-team exercises: probe LLMs for vulnerabilities
- Automate adversarial testing with Garak for prompt injection, jailbreak, and data extraction probes
- Run multi-turn adversarial campaigns with Meta GOAT and DeepTeam for agent vulnerability testing
- Execute red-team campaigns against realistic systems, discover vulnerabilities, and write actionable findings
3. Implement production guardrails: content filters, PII detection, jailbreak prevention
- Configure NeMo Guardrails with Colang policy language, Llama Guard 4, and Prompt Guard 2
- Add Presidio for PII detection/redaction and Model Armor for Google-native content safety
- Layer multiple defenses, test against comprehensive attack suites, and quantify safety-vs-helpfulness tradeoffs
4. Design GenAI governance frameworks aligned with regulations
- Map EU AI Act risk classification and implement NIST AI RMF control frameworks
- Build OWASP LLM Top 10 mitigation strategies mapped to technical controls
- Create governance artifacts, conduct risk assessments, and build automated audit trail pipelines
5. Evaluate GenAI agent behavior: trajectory quality, tool selection accuracy
- Build trajectory scoring systems measuring tool selection accuracy and task completion quality
- Design human preference alignment tests and regression test suites for agent workflows
- Evaluate multi-step agent executions to identify failure modes and build targeted regression tests
6. Monitor bias, fairness, and hallucination rates in production
- Detect bias across protected attributes using statistical fairness metrics and disparity analysis
- Measure hallucination rates through ground-truth comparison and citation verification
- Implement continuous bias scanning, hallucination detection, and alerting for metric drift
7. Build safety incident response processes for deployed GenAI systems
- Design safety monitoring dashboards with severity-based alert routing and escalation paths
- Build incident triage workflows with containment procedures and post-incident reporting templates
- Simulate safety incidents end-to-end and practice the full detection-to-resolution workflow
8. Design LlamaFirewall policies for agent safety
- Configure LlamaFirewall middleware for controlling agent tool access and output filtering rules
- Set up multi-agent safety boundaries with policy-based execution constraints
- Validate firewall policies against adversarial scenarios where agents attempt to bypass controls
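As an illustration of the CI gates described under the first responsibility (blocking deployments when faithfulness or relevance scores degrade), here is a minimal sketch. It assumes metric scores have already been computed by some evaluation harness and are available as plain dictionaries; the function names, the metric names, and the `max_drop` tolerance are hypothetical choices for this example, not values from the course.

```python
from typing import Mapping


def regressions(
    baseline: Mapping[str, float],
    candidate: Mapping[str, float],
    max_drop: float = 0.02,
) -> dict[str, float]:
    """Return each metric whose score fell by more than max_drop versus baseline."""
    drops: dict[str, float] = {}
    for name, base in baseline.items():
        # Assumption: a metric missing from the candidate run counts as a score of 0.
        drop = base - candidate.get(name, 0.0)
        if drop > max_drop:
            drops[name] = round(drop, 4)
    return drops


def gate(
    baseline: Mapping[str, float],
    candidate: Mapping[str, float],
    max_drop: float = 0.02,
) -> int:
    """Return a process exit code: 1 if any metric regressed, otherwise 0."""
    failed = regressions(baseline, candidate, max_drop)
    for name, drop in failed.items():
        print(f"FAIL {name}: dropped {drop:.3f} (allowed {max_drop:.3f})")
    return 1 if failed else 0
```

In a CI job, the return value of `gate` could be passed to `sys.exit` so that a nonzero code fails the pipeline step and blocks the deploy. A fixed tolerance is the simplest design; a statistical test over per-example scores would be a reasonable alternative when evaluation sets are small or noisy.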
Curriculum
7 courses · each builds on previous goals