Keywords: Explainable Artificial Intelligence; Interpretable Machine Learning; Metafeatures; Comprehensibility; Global Explanations; Rule-Extraction; Classification; Big Behavioral Data; Textual Data
Machine learning models built on behavioral and textual data can yield highly accurate predictions but are often very difficult to interpret. Linear models require investigating thousands of coefficients, while the opaqueness of nonlinear models makes things worse.

Rule-extraction techniques have been proposed to combine the desired predictive accuracy of complex “black-box” models with global explainability. However, rule-extraction is challenging for high-dimensional, sparse data, where many features are relevant to the predictions: replacing the black-box model with a large number of rules leaves the user, once again, with an incomprehensible explanation.

To address this problem, the authors develop and test a rule-extraction methodology based on higher-level, less-sparse “metafeatures”. They empirically validate the quality of the extracted explanation rules in terms of fidelity, stability, and accuracy over a collection of data sets, and benchmark their performance against rules extracted from the fine-grained behavioral and textual features.

A key finding of their analysis is that metafeatures-based explanations are better at mimicking the behavior of the black-box prediction model, as measured by the fidelity of the explanations.
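To make the underlying idea concrete, the following is a minimal sketch (not the authors' implementation) of global rule-extraction via a surrogate decision tree: a shallow tree is trained to mimic the predictions of a "black-box" model, and fidelity is measured as the agreement between the surrogate's rules and the black-box predictions on held-out data. The model choices and dataset here are illustrative assumptions only.

```python
# Sketch of surrogate-based rule-extraction and fidelity measurement.
# All modeling choices (random forest as the black box, a depth-3 tree
# as the rule set, synthetic data) are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# "Black-box" prediction model.
black_box = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Surrogate: a shallow tree fit to the black box's predictions (not the
# true labels), yielding a small global set of if-then rules.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X_train, black_box.predict(X_train))

# Fidelity: how often the extracted rules agree with the black box
# on unseen data.
fidelity = accuracy_score(black_box.predict(X_test),
                          surrogate.predict(X_test))
print(f"fidelity: {fidelity:.2f}")
print(export_text(surrogate))
```

With sparse behavioral or textual features, a tree this shallow typically cannot reach high fidelity, which is the problem that grouping features into higher-level metafeatures is meant to mitigate.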