Principles of Generative AI Model Testing and Risk Management
Generative AI requires testing approaches that extend beyond traditional model evaluation, particularly for agentic AI systems: autonomous agents that perceive, reason, plan, act, and reflect through multi-step, tool-using trajectories. This session presents practical, structured methods that both model developers and validators can use to assess large language models and GenAI systems. Key focus areas include use-case segmentation for risk tiering; validation under established frameworks such as SR 11-7; assessment of conceptual soundness and interpretability; automated test-case generation; evaluation with functional metrics covering retrieval quality, hallucination, incompleteness, and answer relevance; and safety metrics covering toxicity, bias, and privacy. The session also addresses agentic AI model risk management, including behavioral-contract approaches that assess the agent's decision-making process rather than output correctness alone, benchmarking, outcome analysis to identify model weaknesses, and strategies for continuous performance monitoring.
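
To make the functional metrics concrete, the sketch below (an illustration, not material from the session itself) shows simple lexical versions of a grounding check, a cheap hallucination proxy, and an answer-relevance score for a retrieval-augmented answer. Production evaluations typically rely on embedding similarity or LLM-as-judge scoring instead; all function names and the scoring scheme here are assumptions for illustration.

```python
# Minimal sketch of functional-metric checks for a RAG answer: lexical
# grounding (a hallucination proxy) and answer relevance. Names and
# scoring are illustrative assumptions, not the session's tooling.
import re


def _tokens(text: str) -> set[str]:
    """Lowercased word tokens, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def grounding_score(answer: str, context: str) -> float:
    """Fraction of answer tokens supported by the retrieved context.

    A low score is a cheap hallucination signal: the answer uses many
    terms that never appear in the evidence it was supposed to cite.
    """
    answer_toks, context_toks = _tokens(answer), _tokens(context)
    if not answer_toks:
        return 0.0
    return len(answer_toks & context_toks) / len(answer_toks)


def relevance_score(answer: str, question: str) -> float:
    """Overlap between question terms and the answer (answer relevance)."""
    q_toks, a_toks = _tokens(question), _tokens(answer)
    if not q_toks:
        return 0.0
    return len(q_toks & a_toks) / len(q_toks)


if __name__ == "__main__":
    question = "What is the capital of France?"
    context = "Paris is the capital and largest city of France."
    answer = "The capital of France is Paris."
    print(f"grounding: {grounding_score(answer, context):.2f}")
    print(f"relevance: {relevance_score(answer, question):.2f}")
```

In an automated test harness, scores like these would be computed over a generated test-case suite and thresholded to flag candidate failures for human review.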
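The behavioral-contract idea can likewise be sketched in a few lines: instead of grading only the final answer, a validator replays a recorded agent trajectory against process-level rules. The tool names and rules below are hypothetical examples chosen for illustration, under the assumption that each step of the trajectory is logged as a tool call with its arguments.

```python
# Minimal sketch of a behavioral-contract check on a recorded agent
# trajectory: the contract constrains the *process* (which tools may be
# called, and in what order) rather than grading only the final output.
# Tool names and rules are hypothetical examples.
from dataclasses import dataclass


@dataclass
class Step:
    tool: str   # tool invoked at this step, e.g. "search"
    args: dict  # arguments passed to the tool


def check_contract(trajectory: list[Step]) -> list[str]:
    """Return a list of contract violations (empty means compliant)."""
    violations = []
    tools_used = [s.tool for s in trajectory]

    # Rule 1: the agent must retrieve evidence before answering.
    if "answer" in tools_used and "search" not in tools_used[: tools_used.index("answer")]:
        violations.append("answered without retrieving evidence first")

    # Rule 2: high-risk tools require a prior confirmation step.
    for i, step in enumerate(trajectory):
        if step.tool == "execute_trade" and "confirm" not in tools_used[:i]:
            violations.append(f"step {i}: execute_trade without prior confirmation")

    # Rule 3: bound trajectory length to catch runaway loops.
    if len(trajectory) > 20:
        violations.append("trajectory exceeds 20 steps")

    return violations


if __name__ == "__main__":
    run = [Step("search", {"q": "EUR/USD rate"}),
           Step("execute_trade", {"pair": "EUR/USD", "amount": 100})]
    print(check_contract(run))  # ['step 1: execute_trade without prior confirmation']
```

Checks of this kind pair naturally with continuous monitoring: contract violations observed in production trajectories become alerts and regression test cases.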
