Internet users today have access to an ocean of information via the Web, ranging from news articles to blogs to textbooks. To consume this ever-growing pool of information, users urgently need tools that concisely and accurately summarize important content from these sources. As data-driven NLP tools for automatic summarization are increasingly deployed as primary media for information consumption, they also risk inadvertently propagating factual inaccuracies and overfitting to spurious data artifacts. Our work aims to build transparent and reliable tools for automatic summarization, and this talk presents novel work toward that goal. I will first introduce StructSum, an interpretable summarization framework that leverages the narrative structure of the document to produce higher-quality summaries with reduced reliance on dataset artifacts. I will then present FRANK, a fine-grained, factuality-focused evaluation benchmark that uses a linguistically motivated typology to elicit human annotations of factual errors made by many state-of-the-art summarization models. We present a two-fold analysis: (i) we analyze various summarization models and characterize their strengths and weaknesses in producing factually consistent summaries, and (ii) we test recently proposed factuality metrics and offer a fine-grained understanding of which kinds of errors they detect well. Finally, I will conclude with some thoughts on future directions for our work and for the field of automatic summarization.