How do you test an agent’s behavior on ambiguous instructions?

Question

Accepted Answer

Testing an agent's behavior on ambiguous instructions means probing how it interprets underspecified requests and where that interpretation breaks down. First, we design test cases whose instructions are deliberately vague, admit several reasonable readings, or contain conflicting requirements (for example, "make it shorter but keep every detail"). In each scenario the agent can commit to one valid interpretation, ask the user for clarification, or infer the user's intent from context. For every case we decide in advance which of these responses are acceptable; some instructions are ambiguous enough that only a clarifying question should pass. These tests show how the agent handles uncertainty and whether it defaults to a safe, reasonable action.

Second, because the "correct" answer is often subjective, human evaluators are needed to judge whether a response is relevant and matches what a reasonable person would have done. Ultimately, the goal is to determine whether the agent consistently seeks clarification, makes logical assumptions, or fails gracefully, rather than producing irrelevant or harmful output.
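A minimal sketch of such a test harness in Python, assuming the agent is a plain text-in/text-out function. Every name here (`AmbiguousCase`, `classify`, `passed`, `pass_rate`, `stub_agent`, `CLARIFY_MARKERS`) is hypothetical, and the keyword matching is only a crude stand-in for the human judgment described above. It labels each reply as a clarifying question, a valid interpretation, or an unsupported answer, and repeats each instruction to check consistency.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Outcome(Enum):
    ASKED_CLARIFICATION = "asked_clarification"
    VALID_INTERPRETATION = "valid_interpretation"
    UNSUPPORTED = "unsupported"  # irrelevant or off-target reply


@dataclass
class AmbiguousCase:
    instruction: str
    # Reading name -> phrases that signal the agent chose that reading.
    valid_readings: dict[str, list[str]]
    # True when no reading is safe to assume and the agent should ask.
    clarification_expected: bool = False


# Hypothetical heuristic markers for a clarifying question (not a standard list).
CLARIFY_MARKERS = ("did you mean", "could you clarify", "which one", "do you want")


def classify(response: str, case: AmbiguousCase) -> tuple[Outcome, str | None]:
    """Label one response; a clarifying question takes precedence over a reading."""
    text = response.lower()
    if "?" in text and any(marker in text for marker in CLARIFY_MARKERS):
        return Outcome.ASKED_CLARIFICATION, None
    for reading, phrases in case.valid_readings.items():
        if any(phrase in text for phrase in phrases):
            return Outcome.VALID_INTERPRETATION, reading
    return Outcome.UNSUPPORTED, None


def passed(outcome: Outcome, case: AmbiguousCase) -> bool:
    """Pass if the agent asked when required, otherwise if it stayed on a valid reading."""
    if case.clarification_expected:
        return outcome is Outcome.ASKED_CLARIFICATION
    return outcome is not Outcome.UNSUPPORTED


def pass_rate(agent: Callable[[str], str], case: AmbiguousCase, trials: int = 5) -> float:
    """Repeat the same instruction to measure consistency across runs."""
    hits = sum(passed(classify(agent(case.instruction), case)[0], case) for _ in range(trials))
    return hits / trials


def stub_agent(instruction: str) -> str:
    """Hypothetical deterministic stand-in for a real agent call."""
    if "report" in instruction.lower():
        return "Did you mean the weekly sales report or the incident report?"
    return "Sorted the files by modification date."


CASES = [
    AmbiguousCase("Send me the report",
                  {"sales": ["sales report"], "incident": ["incident report"]},
                  clarification_expected=True),
    AmbiguousCase("Sort the files",
                  {"by_name": ["by name", "alphabetical"], "by_date": ["by modification date"]}),
    AmbiguousCase("Clean up the table",
                  {"dedupe": ["duplicate"], "reformat": ["format"]}),
]

if __name__ == "__main__":
    for case in CASES:
        outcome, reading = classify(stub_agent(case.instruction), case)
        print(f"{case.instruction!r}: {outcome.value}, reading={reading}, "
              f"pass_rate={pass_rate(stub_agent, case):.2f}")
```

With the stub, the "report" case passes by asking, "Sort the files" passes by committing to the date reading, and "Clean up the table" fails because the reply matches neither reading. Against a real, non-deterministic agent, `pass_rate` is where inconsistency would show up, and the `classify` step is where human evaluators would replace the keyword heuristic for subjective cases.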