The Canterbury Tales is widely known as one of the most important works of English literature. Many high school students read the whole book, or at least a few of the tales. In high school, the book's broad themes were discussed, and students tried to follow along after skimming the CliffsNotes ten minutes before class. This project takes an in-depth look at one topic in particular: women. Women are often left out of prominent literature, but that is not the case in The Canterbury Tales. Women narrate three of the tales (those of the Wife of Bath, the Prioress, and the Second Nun) and are mentioned throughout almost all of the tales.

In the past, analysis of the tales has focused on close reading. A reader would form a hypothesis about the text and then, through close reading, find specific passages that either supported or contradicted it. I chose instead to approach the text with as little subjectivity as possible. Using R, I applied distant reading: rather than interpreting individual passages, I used computational methods to measure patterns, such as word counts and sentiment, across the entire text.

One piece of scholarship focuses on how women respond to the patriarchal society depicted in The Canterbury Tales. At the time, women were expected to be submissive to men, and the tales reflect this expectation; the author supports this point with an example from the Wife of Bath. This was typical of the period: in the fourteenth century, women were considered the property of either their husbands or their fathers. Some of the women in the tales nevertheless defy these expectations; the wife in the Wife of Bath's Tale, for example, is not submissive to her husband. As Lipton (2017) notes, “Not only did the medieval legal system treat wives as inferior; there was a colorful genre of ‘anti-matrimonial’ writing that advised men not to marry on the grounds that wives were intolerable.” This context is important when discussing the roles of women in The Canterbury Tales.

Materials and Methods

To import the data into R, I used the gutenbergr package and its gutenberg_download() function. Project Gutenberg is an online repository that contains data files of many books in different formats. Importing through this function makes the text easier to clean, because it arrives already formatted as a data frame with one line of text per row. From there, R supports many kinds of text analysis; I focused on word frequencies and sentiment analysis. While looking for a good resource for the R analysis, I found the book Text Mining with R: A Tidy Approach by Julia Silge and David Robinson, which provided code and context for the analyses I ran.
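In the tidytext workflow the book describes, the imported data frame is then split with unnest_tokens() into a "tidy" table with one word per row, lowercased and stripped of punctuation. The sketch below mimics that step in Python so the idea is visible without an R session. It is not the code used for this project, and the helper name to_tidy_tokens and the sample lines (adapted from the General Prologue's description of the Wife of Bath; spelling varies by edition) are illustrative assumptions.

```python
import re

def to_tidy_tokens(lines):
    """Split text lines into (line_number, word) rows, one word per row.

    Hypothetical helper that mimics a tidy "one token per row" layout:
    words are lowercased and punctuation is dropped.
    """
    rows = []
    for line_number, line in enumerate(lines, start=1):
        # Keep runs of letters (and apostrophes); everything else is a separator.
        for word in re.findall(r"[a-z']+", line.lower()):
            rows.append((line_number, word))
    return rows

# Illustrative sample lines, not the full imported text.
sample = [
    "A good WIF was ther of biside BATHE,",
    "But she was somdel deef, and that was scathe.",
]

for row in to_tidy_tokens(sample)[:4]:
    print(row)
```

Each row keeps its line number, so later steps can still trace a word back to where it occurred.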


The first aspect I will discuss is how often certain words appear in the text. The copy of the text I used was in Chaucer's Middle English, so before searching for word frequencies I had to familiarize myself with Chaucer's word choices. To do this, I read a few of the tales and compiled a list of words that could be used to refer to women. R then generated a table of how often each of these words appears in the text. For instance, the word woman appears a total of 260 times across the entire Canterbury Tales. The word wife is used almost 100 times more, for a total of 359 mentions. Lady is used 113 times, queen 20 times, and dame 71 times. These counts show that women are indeed visible throughout the stories. This is, of course, already known; there is plenty of scholarship on women's presence in The Canterbury Tales, but as far as I could find, that presence has not been quantified.
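The frequency step amounts to counting tokens that match a hand-built word list. In R this is typically a filter on the word column followed by dplyr's count(); the Python sketch below shows the same logic on a made-up sentence. The term list WOMEN_TERMS and the function count_terms are hypothetical, and the output is not data from the tales.

```python
import re
from collections import Counter

# Hypothetical term list; the actual list was compiled by reading the tales.
WOMEN_TERMS = ["woman", "wife", "lady", "queen", "dame"]

def count_terms(text, terms):
    """Count exact, case-insensitive whole-word matches for each term."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    # Counter returns 0 for terms that never appear.
    return {term: counts[term] for term in terms}

sample = "The wife spak to the queen; the queen spak to the wife and the dame."
print(count_terms(sample, WOMEN_TERMS))
```

Because matching is exact, plurals and variant spellings (wives, wyf) are not counted under wife, which is why a list built from Chaucer's own spellings matters.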

I then performed sentiment analysis using the NRC lexicon, which associates words with ten categories: eight emotions (anger, anticipation, disgust, fear, joy, sadness, surprise, and trust) and two sentiments (positive and negative). These categories can be used to search the text. For instance, the word dame was associated with anger 50 times, and the word harlot was tagged as negative 5 times. The word virgin was associated with trust twice, and spouse was counted 9 times.
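A lexicon-based sentiment count works like an inner join: each token is looked up in a table of (word, category) pairs, and matching pairs are tallied. The Python sketch below illustrates this with a tiny made-up lexicon. Its entries are NOT the real NRC associations, and the names TOY_LEXICON and sentiment_counts are hypothetical.

```python
from collections import Counter

# Tiny made-up lexicon in the NRC style (word -> set of categories).
# Illustrative only; these entries are NOT taken from the real NRC lexicon.
TOY_LEXICON = {
    "dame": {"anger"},
    "harlot": {"negative", "disgust"},
    "virgin": {"trust", "positive"},
}

def sentiment_counts(tokens, lexicon, targets):
    """Tally (word, category) pairs for target words, like an inner join
    of tokens against a word-category lexicon followed by a count."""
    pairs = Counter()
    for token in tokens:
        if token in targets:
            for category in lexicon.get(token, ()):
                pairs[(token, category)] += 1
    return pairs

tokens = ["the", "dame", "was", "wroth", "the", "dame", "spak", "virgin"]
counts = sentiment_counts(tokens, TOY_LEXICON, {"dame", "harlot", "virgin"})
for (word, category), n in sorted(counts.items()):
    print(word, category, n)
```

Because the lexicon tags the word itself, every occurrence of a listed word receives the same categories no matter who is speaking or about whom, so a count like this cannot say who is angry.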


This graph summarizes the results from the Bing lexicon, which classifies words as either positive or negative. It shows which words were most often used with a negative or a positive sentiment.



This graph summarizes the results from the NRC lexicon for the 10 words used in the analysis.


This is a word cloud of the most commonly used words in The Canterbury Tales.


This word cloud shows the words most commonly used with a negative or positive sentiment, as classified by the Bing lexicon.
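The data behind a positive/negative word cloud is a per-sentiment count of each word the Bing lexicon classifies. The Python sketch below computes those counts from a made-up list of tokens. TOY_BING and bing_word_counts are hypothetical, and the toy entries are not taken from the real Bing word list.

```python
from collections import Counter

# Made-up entries in the style of the Bing lexicon (word -> "positive"/"negative").
# Illustrative only; this is NOT the real Bing word list.
TOY_BING = {
    "good": "positive",
    "gentil": "positive",
    "fals": "negative",
    "wikked": "negative",
}

def bing_word_counts(tokens, lexicon):
    """Return {sentiment: Counter(word -> count)} for tokens found in the lexicon."""
    by_sentiment = {"positive": Counter(), "negative": Counter()}
    for token in tokens:
        sentiment = lexicon.get(token)
        if sentiment is not None:  # tokens missing from the lexicon are skipped
            by_sentiment[sentiment][token] += 1
    return by_sentiment

tokens = "a good wif a gentil knyght a fals clerk a good man a wikked fals tonge".split()
result = bing_word_counts(tokens, TOY_BING)
print(result["positive"].most_common())
print(result["negative"].most_common())
```

The real Bing lexicon lists modern English words, so Middle English spellings it does not contain are skipped without warning.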


When graphing the sentiment analysis, I looked for words that would have been associated with women. All of the graphs except the overall word cloud lack any words that can be directly linked to women. In the scholarship on The Canterbury Tales, the general consensus is that women are very well represented in the tales. In terms of sentiment, this does not seem to be the case: either the words associated with women are not picked up by the lexicons I used, or they are absent altogether. The goal of my project was to look directly at women's representation in The Canterbury Tales, and this proved more difficult than I expected, because R does not seem to have a catch-all package or function that can simply search for words and their context. I do know from my word frequency analysis that words associated with women are prevalent in the text. I was not able to graph the results of the word-specific sentiment analysis, but since it provides some evidence, it is worth noting.

After reading the text and doing the sentiment analysis, I realized some of the problems with this method. Saying that the word dame was used with an “angry” sentiment 50 times is interesting, but it lacks context. It leaves open whether the dame herself was angry or whether someone was angry with her. I would be interested in further research in this area.
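One possible next step, sketched here in Python with a made-up sentence, is a keyword-in-context (KWIC) view that prints a few words on either side of each occurrence so a reader can check who is angry. The function name keyword_in_context and the window size are hypothetical choices, not part of the analysis above.

```python
def keyword_in_context(tokens, keyword, window=3):
    """Return each occurrence of `keyword` with up to `window` tokens on each side."""
    hits = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, token, right))
    return hits

# Made-up sentence: in the first hit someone is angry with the dame,
# in the second the dame herself is angry.
tokens = "he was wroth with the dame and the dame was wroth with him".split()
for left, word, right in keyword_in_context(tokens, "dame", window=2):
    print(f"{left} [{word}] {right}")
```

A lexicon count treats both occurrences identically; the context windows are what let a reader tell them apart.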