Explanation Feedback

LEXplain: Improving Model Explanations via Lexicon Supervision

Model explanations that shed light on the model's predictions are becoming a desired additional output of NLP models, alongside their predictions. Challenges in creating these explanations include making them trustworthy and faithful to the model's predictions. In this work, we propose a novel framework for guiding model explanations by supervising them explicitly. To this end, our method, LEXplain, uses task-related lexicons to directly supervise model explanations. This approach consistently improves the model's explanations without sacrificing performance on the task, as we demonstrate on sentiment analysis and toxicity detection.
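
To make the idea of lexicon supervision concrete, the following is a minimal sketch of one way such a signal could be combined with a task loss. It is not the LEXplain implementation: the abstract does not specify the loss, so the binary cross-entropy form, the weighting term, the toy lexicon, and all names here (`SENTIMENT_LEXICON`, `lexicon_targets`, `explanation_loss`, `combined_loss`) are assumptions made purely for illustration.

```python
import math

# Hypothetical toy sentiment lexicon (assumption); a real lexicon is far larger.
SENTIMENT_LEXICON = {"great", "awful", "love", "terrible"}


def lexicon_targets(tokens, lexicon):
    """Return 1.0 for tokens found in the lexicon and 0.0 otherwise."""
    return [1.0 if tok.lower() in lexicon else 0.0 for tok in tokens]


def explanation_loss(scores, targets, eps=1e-9):
    """Mean binary cross-entropy between sigmoid(scores) and lexicon targets.

    `scores` are per-token importance logits produced by some explanation
    method (assumed to be given). Expects at least one token.
    """
    total = 0.0
    for s, t in zip(scores, targets):
        p = 1.0 / (1.0 + math.exp(-s))  # squash the logit into (0, 1)
        total -= t * math.log(p + eps) + (1.0 - t) * math.log(1.0 - p + eps)
    return total / len(scores)


def combined_loss(task_loss, expl_loss, weight=0.5):
    """Add the weighted explanation term to the task loss (assumed form)."""
    return task_loss + weight * expl_loss


tokens = ["The", "movie", "was", "great"]
targets = lexicon_targets(tokens, SENTIMENT_LEXICON)

# Explanation that highlights the lexicon word "great".
aligned = explanation_loss([-3.0, -3.0, -3.0, 3.0], targets)
# Explanation that highlights "The" instead.
misaligned = explanation_loss([3.0, -3.0, -3.0, -3.0], targets)
```

Under these assumptions, an explanation that assigns high importance to the lexicon word yields a lower explanation loss than one that highlights an unrelated token, so minimizing the combined objective would push explanations toward lexicon entries while the task loss is still optimized.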