Kaleidoscope: Semantically-grounded, Context-specific ML Model Evaluation

Harini Suresh, MIT CSAIL

Divya Shanmugam, MIT CSAIL

Tiffany Chen, MIT CSAIL

Annie Bryan, MIT CSAIL

Alexander D'Amour, Google Research

John V. Guttag, MIT CSAIL

Arvind Satyanarayan, MIT CSAIL

ACM Human Factors in Computing Systems (CHI), 2023

Kaleidoscope’s workflow consists of identifying meaningful examples, generalizing them into larger, diverse sets representing important concepts, and using these concepts to specify and test model behavior.
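The three workflow steps can be illustrated with a minimal sketch. This is not Kaleidoscope's implementation: the names `Concept`, `generalize`, `check_concept`, `shares_word`, and `toy_model` are all hypothetical, the word-overlap similarity stands in for whatever similarity measure a real system would use, and the data is a toy invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Concept:
    """A named set of examples standing for one idea (hypothetical structure)."""
    name: str
    examples: List[str]


def generalize(seeds: List[str], pool: List[str],
               is_similar: Callable[[str, str], bool]) -> List[str]:
    """Step 2: grow a few seed examples into a larger set drawn from a pool."""
    matches = [x for x in pool if any(is_similar(x, s) for s in seeds)]
    # Keep the seeds first, preserve order, and drop duplicates.
    return list(dict.fromkeys(list(seeds) + matches))


def check_concept(concept: Concept, model: Callable[[str], str],
                  expected: str) -> float:
    """Step 3: fraction of the concept's examples where the model gives `expected`."""
    hits = sum(model(x) == expected for x in concept.examples)
    return hits / len(concept.examples)


def shares_word(a: str, b: str) -> bool:
    """Assumed stand-in similarity: two texts are similar if they share a word."""
    return bool(set(a.lower().split()) & set(b.lower().split()))


def toy_model(text: str) -> str:
    """Assumed stand-in for the model under evaluation."""
    return "negative" if "terrible" in text.lower() else "positive"


# Step 1: a person identifies a meaningful example.
seeds = ["terrible service"]
pool = ["The service was slow", "terrible food", "great view", "service with a smile"]

concept = Concept("mentions of bad service", generalize(seeds, pool, shares_word))
pass_rate = check_concept(concept, toy_model, expected="negative")
print(concept.examples)
print(pass_rate)
```

In this toy run the crude similarity also pulls in "service with a smile", which shows why a generalized set would likely need human review before it is used as a behavioral specification.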


To ensure accountability and mitigate harm, it is critical that diverse stakeholders can interrogate black-box automated systems and find information that is understandable, relevant, and useful to them. In this paper, we eschew prior expertise- and role-based categorizations of interpretability stakeholders in favor of a more granular framework that decouples stakeholders’ knowledge from their interpretability needs. We characterize stakeholders by their formal, instrumental, and personal knowledge and how it manifests in the contexts of machine learning, the data domain, and the general milieu. We additionally distill a hierarchical typology of stakeholder needs that distinguishes higher-level domain goals from lower-level interpretability tasks. In assessing the descriptive, evaluative, and generative powers of our framework, we find our more nuanced treatment of stakeholders reveals gaps and opportunities in the interpretability literature, adds precision to the design and comparison of user studies, and facilitates a more reflexive approach to conducting this research.