Explanation for Whom? Hospitable Interpretability for Machine Learning
A feature-attribution plot can be faithful and still answer the wrong question. Interpretability needs hospitable records that mark which why-question each artifact answers and keep divergent readings attached to the same decision.
A developer and a regulator can read the same feature-attribution plot and still disagree about whether anything has been explained. The properties by which interpretability evaluates its artifacts — faithfulness, sparsity, stability — describe how an artifact relates to a model. They say less about its relation to the question someone has come to ask.
The failure of the deductive-nomological account of explanation exposed the same problem: explanatory force depends not only on the adequacy of an answer, but on the why-question to which it is addressed. For systems making consequential decisions, this shifts attention away from the search for a single better artifact, or for separate artifacts tailored to separate audiences, and toward hospitality: a property of the record in which several legitimate questions about a case each receive a marked answer drawn from the same decision, with the places they diverge kept visible rather than smoothed into one verdict.
A developer reviewing a loan model opens the explanation attached to a denial. The feature-attribution plot is reassuring. Income, debt load, recent delinquencies, employment history — the weights are roughly what she would have predicted, and nothing spurious is happening. She closes the page knowing more than she did when she opened it.
A regulator reads the same plot and learns almost nothing she came to learn. The graph shows which features the model relied on. It does not show whether the institution was entitled to rely on them, whether one of them is acting as a proxy for race, whether this case should have been routed to a human, or whether the thresholding policy is the real object of scrutiny. She has read the plot correctly, but it has answered a different question than hers.
It is tempting to call this a problem of the plot. The plot, after all, is what the two of them looked at. Papers on interpretability assess artifacts of this kind by properties like faithfulness, sparsity, stability, and local accuracy, and these properties are not empty. A local surrogate that misrepresents the model is a bad surrogate. An attribution that flips under irrelevant perturbations is hard to trust. But each property describes a relation among the model, an input, a perturbation class, and an artifact. None describes the reader, or the question she brought. So none can settle whether the artifact has explained anything to anyone in particular.
Philosophers of science met a version of this difficulty before, and it broke one of their most influential theories of explanation. The deductive-nomological account treated explanation as the derivation of a fact from laws and antecedent conditions (Hempel and Oppenheim 1948). The picture was tidy: if the derivation was sound and the premises true, the explanation was in hand. But a sound derivation can leave its addressee cold, because explanation is not a relation between sentences alone. What was being asked, and what would count as a relevant answer, are part of what an explanation is. Interpretability research has reproduced the older problem in a new setting. Its artifacts can be technically well-formed and still answer a question other than the one in front of them.
Why-Questions
Van Fraassen’s account is useful because it changes the unit of analysis. The basic relation is not between an explanans and an explanandum, but between a why-question and the answer offered to it (van Fraassen 1980). A why-question has three working parts: a topic, a contrast class, and a relevance relation. The topic is what the question is about. The contrast class is the set of alternatives against which the topic is being considered. The relevance relation determines what kind of consideration could count as bearing on the answer.
Consider “Why did the model deny this applicant?” English makes this look like a single question. It is not. The contrast might be denial rather than approval; denial rather than a smaller loan; automatic denial rather than escalation to a human; this institution’s decision rather than the decision a different lawful policy would have produced. The relevance relation could be model-internal feature dependence, feasible recourse, compliance with lending rules, the provenance of the training data, or the decision to automate cases of this kind in the first place. A good answer under one pairing is empty under another. When a plot is said to “explain the denial,” a pairing has already been chosen, silently, and the choice has been made on someone’s behalf.
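The ambiguity can be made concrete in code. What follows is a minimal sketch in Python, with illustrative names of my own rather than any standard vocabulary: the same surface sentence decomposes into distinct triples of topic, contrast class, and relevance relation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WhyQuestion:
    topic: str      # what the question is about
    contrast: str   # the alternative that makes the question live
    relevance: str  # what kind of consideration counts as bearing on the answer

# One English sentence, several distinct why-questions.
developer_q = WhyQuestion(
    topic="this applicant's denial",
    contrast="nearby model outputs under small input changes",
    relevance="model-internal feature dependence",
)
regulator_q = WhyQuestion(
    topic="this applicant's denial",
    contrast="decisions a different lawful policy would have produced",
    relevance="compliance with lending rules and permitted variables",
)
applicant_q = WhyQuestion(
    topic="this applicant's denial",
    contrast="approval",
    relevance="feasible recourse available to this applicant",
)

# Same topic, three different questions: no single artifact answers them all.
assert developer_q != regulator_q and regulator_q != applicant_q
```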
The word these debates reach for at this point is context, and the word is too soft to bear the weight being placed on it. Context is what a formal vocabulary calls in when it has run out of vocabulary of its own; it gestures at what it cannot describe and is taken to have described it. Van Fraassen’s apparatus is sharper because it says exactly what has to be specified: what is being asked about, which alternatives make the question live, and what kind of consideration the questioner is entitled to count.
Once those are explicit, the loan case looks different. The plot was built for one why-question: why this output rather than nearby outputs, given how the model depends on its inputs. That was the developer’s question precisely. The regulator’s question had a different topic, a different contrast class, and a different relevance relation. She was asking about the institution, the other decisions the rules permitted, and the authority to use proxies, thresholds, and automated routing. The artifact could only have settled one question.
Peter Lipton makes the same point through contrast. Many explanatory requests have a fact-foil structure: to explain why a fact obtains is often to explain why it obtains rather than some salient alternative (Lipton 2004). Jones’s syphilis explains why Jones rather than Smith developed paresis, if Smith did not have syphilis. It does not explain why Jones rather than Doe, if Doe also had syphilis and only Jones left his untreated. The medical facts are the same in both cases. The explanatory work being asked of them is not.
Machine-learning explanations have this structure whether or not their producers acknowledge it. LIME advertises the ability to explain the predictions of any classifier “in an interpretable and faithful manner,” and the warrant for the advertisement is a locally faithful interpretable surrogate (Ribeiro, Singh, and Guestrin 2016). SHAP offers a unified framework for additive attribution and proves a uniqueness result over local accuracy, missingness, and consistency (Lundberg and Lee 2017). I do not deny the value of either result. A misleading attribution is worse than no attribution, and getting these properties right is real work the field has done well. The point is that both contributions fix the contrast class to nearby model outputs and the relevance relation to model-internal dependence, and do so silently. The artifact then arrives with no tag for the question it was built to answer. A reader with a different question can take the artifact as a verdict it never offered.
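Where the contrast class gets fixed is easiest to see in a from-scratch sketch of the LIME idea, not the library’s implementation; the sampling scheme, kernel width, and regularized fit below are assumptions of the sketch. The perturbation step is where “nearby outputs” is decided, and the surrogate’s coefficients answer only the question that step has already framed.

```python
import numpy as np
from sklearn.linear_model import Ridge

def local_surrogate(predict_fn, x, n_samples=5000, kernel_width=0.75, seed=0):
    """Fit a locally weighted linear surrogate around x (a LIME-style sketch).

    The contrast class is chosen here, silently from the artifact's point of
    view: the perturbations define "nearby outputs," and the coefficients
    answer only "why this output rather than those."
    """
    rng = np.random.default_rng(seed)
    # Perturb the instance: this sampling scheme *is* the contrast class.
    Z = x + rng.normal(scale=0.5, size=(n_samples, x.shape[0]))
    y = predict_fn(Z)  # predict_fn is assumed vectorized over rows
    # Weight samples by proximity to x with an exponential kernel.
    dist = np.linalg.norm(Z - x, axis=1)
    weights = np.exp(-(dist ** 2) / kernel_width ** 2)
    # The surrogate is faithful only to the neighborhood just constructed.
    surrogate = Ridge(alpha=1.0).fit(Z, y, sample_weight=weights)
    return surrogate.coef_  # per-feature attributions, relative to x's neighborhood
```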
Other interventions have pressed against the dominant view without quite displacing it. Zachary Lipton argued that interpretability was being used as though it named a single property, when in fact it covers several, and that claims of interpretability were more often asserted than earned (Lipton 2018). Miller brought pragmatic accounts of explanation directly into XAI, arguing that explanations are contrastive, selective, and social, and that the field had been ignoring decades of social-scientific evidence about how human explanation actually works (Miller 2019). Selbst and Barocas distinguish failing to describe a model’s rules from failing to give a satisfying account of why the rules are what they are; the second failure usually requires looking past the model (Selbst and Barocas 2018). Rudin argues, from another direction, that high-stakes decisions should use interpretable models rather than patch black boxes after the fact (Rudin 2019).
Adjacent work has also pressed harder on the choice of contrast and relevance without quite naming it. Wachter, Mittelstadt, and Russell argue that counterfactual explanations are well-suited to one specific question: what change to an applicant’s situation would have produced a different decision, and ground that question in the legal frame of automated decision-making under the GDPR (Wachter, Mittelstadt, and Russell 2018). The contrast is denial rather than approval. The relevance relation is feasible recourse. Both are fixed deliberately and visibly. Barocas, Selbst, and Raghavan go further: even feature-highlighting explanations of this kind rest on hidden assumptions about which feature changes map to real-world actions, which features can be made commensurable, and which considerations are taken to be relevant at all (Barocas, Selbst, and Raghavan 2020). The choice of contrast and relevance is rarely as clean as the artifact suggests, and someone is making the choice whether or not the artifact acknowledges it. These two papers are pieces of the same observation: an artifact does not select its own question, and pretending that it does transfers authority over the question to whoever produces the artifact.
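The same deliberateness can be shown in miniature. The sketch below brute-forces the Wachter-style question under assumed action costs; the feature indices, deltas, costs, and threshold are hypothetical, and real recourse methods treat this as a constrained optimization rather than enumeration.

```python
import itertools

def cheapest_recourse(predict_fn, x, actions, threshold=0.5, max_steps=2):
    """Find the smallest-cost combination of actions that flips denial to approval.

    `actions` maps a feature index to a (delta, cost) pair. The contrast is
    fixed to denial-rather-than-approval; the relevance relation is feasible
    change available to the applicant. Both choices are made here, visibly.
    """
    best = None
    for k in range(1, max_steps + 1):
        for combo in itertools.combinations(actions.items(), k):
            x_new = x.copy()
            cost = 0.0
            for idx, (delta, c) in combo:
                x_new[idx] += delta
                cost += c
            if predict_fn(x_new) >= threshold and (best is None or cost < best[0]):
                best = (cost, x_new)
    return best  # None means no listed action flips the decision

# Hypothetical usage:
# actions = {0: (+5_000.0, 1.0),   # raise annual income
#            2: (-1.0, 3.0)}       # clear one delinquency
# result = cheapest_recourse(model_score, x_applicant, actions)
```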
Miller’s intervention was the right one, and the position I want to defend here is downstream of it. But the slogan that XAI should be pragmatic understates what is at stake. Taken on its own, pragmatic pluralism is satisfied by giving the developer one artifact, the regulator another, and the affected person a third, each tuned in isolation to its audience. The arrangement has obvious appeal: nobody is asked to read what she cannot use, and each reader receives something fitted to her concerns. It is also, if we let it run, fragmentation in a more considerate register. The missing element is not a user-modeling step grafted onto the existing pipeline. It is the questioner herself, and a structure in which her question has a recognized place alongside other people’s.
Even an inherently interpretable model is interpretable to someone, for some purpose, against some alternative. A sparse scoring rule may be transparent to the engineer checking implementation, unsatisfying to the person denied a benefit, and beside the point for a regulator asking whether the institution should have been allowed to encode that consideration in the first place. The model’s simplicity makes an explanation easier to produce, not more complete. It does not decide which question the explanation has to answer.
Hospitality
What would have prevented the developer and the regulator from talking past each other? It cannot have been that each was given an artifact she found satisfying. That is just the fragmentation already described, with the further indignity that each party has been encouraged to think the case has been explained to her. Nor would it have been enough that each artifact be faithful in its own terms. Faithfulness is part of the problem when it allows an artifact to pose as a verdict on the whole case.
The minimum the system has to support is more pedestrian: a single record of the decision that both readers are reading. Without that, they are looking at separate institutional fictions of the same case, and any apparent agreement between them is only a coincidence of vocabulary.
A common record is necessary, but not sufficient. The feature-attribution plot misled the regulator because it arrived unmarked. It did not say what question it was fit to answer, what contrast it had been drawn against, or what relevance relation it presupposed. Held outside its explanatory setting, a faithful artifact can acquire an authority it has not earned. It looks as though it has explained the decision when in fact it has explained one aspect of the model’s behavior under one assumed contrast. The marking condition is what prevents this inflation. An artifact should not merely be attached to a decision. It should declare the why-question for which it is an answer.
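In code, the marking condition is a small addition, continuing the hypothetical `WhyQuestion` sketch from earlier: the artifact does not travel alone, it travels with the question it was built to answer.

```python
from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class MarkedArtifact:
    decision_id: str       # ties the artifact to one concrete decision
    question: WhyQuestion  # the why-question this artifact answers
    payload: Any           # the attribution, counterfactual, trace, ...
    method: str            # how the payload was produced

attribution = MarkedArtifact(
    decision_id="loan-2024-00317",  # hypothetical case identifier
    question=developer_q,           # from the earlier sketch
    payload={"income": 0.42, "debt_load": -0.31},
    method="locally weighted linear surrogate",
)
```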
Even this is not enough. Several marked artifacts can sit in the same record and still fail to show how their answers bear on one another. A feature attribution may be locally faithful while the institution’s exposure lies in its choice to threshold automatically rather than escalate borderline cases. A counterfactual may tell the applicant that a higher income would have changed the outcome while leaving untouched the regulator’s question about proxies. These are not failures of any individual artifact. They are signs that the artifacts have not been coordinated. The system has answered several questions separately, but has not made visible where the answers diverge, where one leaves another open, or where the institution must choose between them.
Call the missing property hospitality. The closest analogy is from accessible design. An accessible interface supports different routes to the same underlying task, and none of them is the real route with the others as concessions. The screen-reader user and the sighted user need not encounter the same surface. They need access to the same underlying work. Hospitality is the analogue for explanation: a property not of any artifact, but of the record in which artifacts sit.
Stated as a condition: a record is hospitable to a set of legitimate questions when each question has a marked answer drawn from the same underlying decision, the answers share a common substrate that prevents them from describing different cases, and the places where the answers diverge are themselves visible rather than papered over into a single story.
Consider the loan case again. A hospitable record would include the model version, applicant information, score, threshold, decision rule, and institutional policy behind the denial. It might then contain a feature attribution marked as answering one question: why this score rather than nearby scores, given the model’s dependence on its inputs? It might contain a recourse explanation marked as answering another: why denial rather than approval, given changes available to this applicant? It might contain an escalation trace answering a third: why automatic denial rather than human review, given the institution’s thresholds and delegated authorities? And it might contain an audit note answering a fourth: whether the decision depended on variables or proxies the institution was not entitled to use. The result is four different answers to four different why-questions, coordinated in one record.
The important point is not that the record contains more information. More information can make a system less explanatory if it leaves the reader to infer what each artifact is for. The point is that each component has a declared scope. The attribution does not pretend to settle recourse. The recourse path does not pretend to settle legality. The escalation trace does not pretend to settle model dependence. If the applicant-facing recourse path says that a higher income would have changed the decision, while the audit note shows that the income variable is entangled with a suspect proxy, the record should not smooth this into a single reassuring narrative. It should preserve the tension as a feature of the case. The institution may still have to decide what follows, but the disagreement will be visible as a disagreement rather than hidden as a limitation of the reader.
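One way such a record could be structured, continuing the hypothetical classes above, is sketched below: several marked artifacts attached to a single decision substrate, with divergences stored as entries in their own right rather than resolved away.

```python
from dataclasses import dataclass, field

@dataclass
class DecisionRecord:
    decision_id: str
    substrate: dict                                # model version, inputs, score, threshold, policy
    artifacts: list = field(default_factory=list)  # MarkedArtifact entries
    tensions: list = field(default_factory=list)   # visible divergences, not smoothed over

    def add(self, artifact):
        # One case, one substrate: every artifact must describe this decision.
        assert artifact.decision_id == self.decision_id
        self.artifacts.append(artifact)

    def note_tension(self, between, description):
        # Divergence between answers is recorded as a feature of the case.
        self.tensions.append((between, description))

record = DecisionRecord(
    decision_id="loan-2024-00317",
    substrate={"model": "credit-v4.2", "score": 0.44, "threshold": 0.5,
               "policy": "auto-deny below threshold"},
)
record.add(attribution)  # the marked artifact from the earlier sketch
record.note_tension(("recourse", "audit"),
                    "recourse path cites income; audit flags income as proxy-entangled")
```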
This is why hospitality is not audience segmentation in better software. Segmentation gives the developer the real account, the applicant a simplified moral story, and the regulator a compliance summary detached from the model behavior that actually produced the decision. That arrangement is efficient, but it also ensures that no two readers quite read the same case. A hospitable record is different. The developer, the regulator, and the affected person may enter through different questions, but they are not assigned different realities. They read the same decision along different axes, with the axes marked and their points of divergence traceable.
Some components of such a record already exist in partial form. Model cards (Mitchell et al. 2019) and datasheets for datasets (Gebru et al. 2021) attach to models and data the kinds of provenance, intended-use, and population information that a faithful artifact alone cannot supply. Counterfactual recourse (Wachter, Mittelstadt, and Russell 2018) answers a specific applicant-facing question and is most useful when its scope is declared rather than assumed. Audit trails and policy documentation answer questions about institutional authority and the rules in play. A hospitable record is not a new explanatory artifact alongside these. It is a structure that holds existing components together, with each tagged by the question it answers, the contrast it presupposes, and the relevance relation it encodes.
None of this implies that every dissatisfaction is a failure of explanation. There are illegitimate questions, and there are questioners without standing. A competitor demanding access to a proprietary scoring procedure may have no claim on the institution at all. A person who expects explanation to deliver a remedy the institution has no authority to give is asking for something explanation cannot supply. But which questions count, and who has standing to ask them, is itself part of the design and governance of an explanatory system. It is not settled by the technical adequacy of an artifact.
A reader might object that I have only moved the artifact-style standard up one level. Faithfulness was a property of an artifact; hospitality looks like a property of a record. The objection is fair. A record can be hospitable to one set of readers and inhospitable to another, and reasonable people will disagree about which exclusions are tolerable. The point of hospitality is not to be a property that settles those disagreements. It is meant to locate them where they belong: in the question of which readings the system has been built to support, rather than in the technical adequacy of an artifact that was never designed to answer them.
The Authority to Finish
The artifact view quietly assigns the authority to finish explanation to the producer of the artifact. Once explanation is treated as a property that can be measured against the model, completion belongs to whoever can show that the artifact has the right technical virtues. The attribution is faithful, the surrogate is locally accurate, the counterfactual is valid; the explanatory work appears to be done. A reader who remains dissatisfied then becomes a residual problem. She can be treated as confused, insufficiently technical, or as asking for something outside the proper domain of explanation. This is a comfortable view for the producer of artifacts, since on it the question of whether an explanation has occurred is settled in his own office.
The position I have defended changes the location of that authority. Explanation is completed only relative to a why-question, and the relevant why-question is not chosen by the model builder alone. A developer may ask why this score rather than nearby scores was produced by the model. An applicant may ask why denial occurred rather than approval under feasible changes to her situation. A regulator may ask why automatic denial occurred rather than escalation to a human, or why this institutional action was permitted under the rules. Each question brings its own contrast class and relevance relation. Technical success with respect to one gives no general license to declare the others answered.
This point matters because many of the questions artifacts fail to answer are not confused versions of the questions they do answer. A person asking why she was denied rather than escalated is asking about institutional routing, threshold policy, and delegated authority. A regulator asking whether a decision depended on an impermissible proxy is asking about entitlement, not merely dependence. These questions may be inconvenient for the system as built. That inconvenience should be recorded as an explanatory limit of the system, not as a defect in the questioner.
This changes what I think we should ask of interpretability methods. LIME, SHAP, counterfactual recourse, interpretable models, audit trails, model cards, and datasheets answer different questions under different contrasts and relevance relations. Their internal adequacy still matters. A misleading attribution, infeasible recourse path, or incomplete audit trail remains a failure. But evaluation cannot stop at the artifact. We also have to ask whether the record marks the scope of each component, ties the components to the same underlying decision, and preserves the places where their answers pull apart.
The developer and the regulator I began with were both reading correctly. The plot made one aspect of the decision visible and left another untouched. A hospitable system would preserve that difference rather than force it into a single verdict about whether the plot was a good explanation. It would keep the readings attached to the same case, mark the question each artifact answers, and make the points of divergence traceable. Interpretability begins by holding open the difference between those readings, because that is where the authority to finish explanation is properly contested.
References
- Solon Barocas, Andrew D. Selbst, and Manish Raghavan. 2020. "The Hidden Assumptions Behind Counterfactual Explanations and Principal Reasons." In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, 80-89.
- Timnit Gebru, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman Vaughan, Hanna Wallach, Hal Daumé III, and Kate Crawford. 2021. "Datasheets for Datasets." Communications of the ACM 64, 12, 86-92.
- Carl G. Hempel and Paul Oppenheim. 1948. "Studies in the Logic of Explanation." Philosophy of Science 15, 2, 135-175.
- Peter Lipton. 2004. Inference to the Best Explanation. 2nd ed. Routledge.
- Zachary C. Lipton. 2018. "The Mythos of Model Interpretability." Communications of the ACM 61, 10, 36-43.
- Scott M. Lundberg and Su-In Lee. 2017. "A Unified Approach to Interpreting Model Predictions." In Advances in Neural Information Processing Systems 30, 4765-4774.
- Tim Miller. 2019. "Explanation in Artificial Intelligence: Insights from the Social Sciences." Artificial Intelligence 267, 1-38.
- Margaret Mitchell, Simone Wu, Andrew Zaldivar, Parker Barnes, Lucy Vasserman, Ben Hutchinson, Elena Spitzer, Inioluwa Deborah Raji, and Timnit Gebru. 2019. "Model Cards for Model Reporting." In Proceedings of the Conference on Fairness, Accountability, and Transparency, 220-229.
- Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. "'Why Should I Trust You?': Explaining the Predictions of Any Classifier." In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1135-1144.
- Cynthia Rudin. 2019. "Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead." Nature Machine Intelligence 1, 5, 206-215.
- Andrew D. Selbst and Solon Barocas. 2018. "The Intuitive Appeal of Explainable Machines." Fordham Law Review 87, 3, 1085-1139.
- Bas C. van Fraassen. 1980. The Scientific Image. Oxford University Press.
- Sandra Wachter, Brent Mittelstadt, and Chris Russell. 2018. "Counterfactual Explanations without Opening the Black Box: Automated Decisions and the GDPR." Harvard Journal of Law & Technology 31, 2, 841-887.