Actionable Directions for Reporting and Mitigating Language Model Harms

Abstract

Recent advances in the capacity of large language models to generate human-like text have led to their increased adoption in user-facing settings. In parallel, these improvements have prompted a heated discourse around the risks of societal harms they introduce, whether inadvertent or malicious. Several studies have identified potential causes of these harms and called for their mitigation through the development of safer and fairer models. Going beyond enumerating the risks, I will present some of my recent work on high-level directions for mitigating language model harms. First, I will present a survey of practical methods for addressing potential threats and societal harms from language generation models. This survey serves as a practical guide for both LM researchers and practitioners, covering different mitigation strategies, their limitations, and open problems for future research. Second, I will present a framework for structured assessment and documentation of risks associated with an application of language models. This risk-centric framework maps risks and harms to a specific model or its application scenario, ultimately contributing to a better, safer, and shared understanding of the risk landscape. I will conclude with some of my recent research directions and highlight challenges that warrant attention from the community.

Date
Event
Center for Security and Emerging Technology, Georgetown University
Location
Remote
Vidhisha Balachandran
Graduate Student at the Language Technologies Institute