Words that matter: What text analysis can tell us about the third presidential debate

The final Presidential debate of 2016 was as heated as the previous two, as the following name-calling exchange demonstrates:

CLINTON: ...[Putin would] rather have a puppet as president of the United States.
TRUMP: No puppet. No puppet.
CLINTON: And it's pretty clear...
TRUMP: You're the puppet!
CLINTON: It's pretty clear you won't admit ...
TRUMP: No, you're the puppet.

It is easy to base our opinions of the debate, and of the differences between the Presidential candidates, on excerpts like this and on memorable one-liners. But are small extracts representative of the debate as a whole? Moreover, how can we objectively analyse what was said, who got to say the most, and how the candidates differed in their responses? One approach is to turn to basic text analysis, which is easily implemented in R (using a package such as “quanteda”).

To begin, all we need is a transcript of the debate, with each candidate’s answers saved in a separate text (.txt) file. Once the files are loaded into R as a corpus (i.e. an object that contains both our files), we have all we need to get started.
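The loading step can be sketched in a few lines. This is not the R/quanteda code behind the article: it is a plain Python stand-in, and the function name `load_corpus` and the folder and file names in the usage comment are hypothetical.

```python
from pathlib import Path


def load_corpus(folder):
    """Read every .txt file in `folder` into a dict keyed by file name (without .txt)."""
    corpus = {}
    for path in sorted(Path(folder).glob("*.txt")):
        corpus[path.stem] = path.read_text(encoding="utf-8")
    return corpus


# Hypothetical usage, assuming a folder "debate3" holding clinton.txt and trump.txt:
# corpus = load_corpus("debate3")
# sorted(corpus) would then be ["clinton", "trump"]
```

Keeping one document per candidate is what lets every later step compare the two speakers side by side.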

A simple summary of the texts tells us a few interesting things straight away.

	Words used	Unique words	Sentences
Clinton	8,190	1,493	446
Trump	8,077	1,167	620

We see that both candidates say roughly the same number of words (“Words used”) but that Clinton uses a greater variety of words (“Unique words”) in her responses. In fact, she uses approximately 28 percent more unique terms than her opponent. Clinton also has longer sentences than Trump: the candidates use approximately the same number of words, but Trump spreads his across 174 more sentences, which works out to about 13 words per sentence against Clinton’s 18. Based on this basic evidence, we can say that in the third debate Trump preferred shorter sentences with more repeated words, and Clinton longer sentences with more word variation.
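For readers who want to retrace the numbers, here is a minimal Python sketch. The `summarise` function is a hypothetical, deliberately naive counter (quanteda’s tokeniser and sentence splitter are more careful, so it would not reproduce the table exactly); the figures in `table` are copied from the summary table, and the comparisons are recomputed from them.

```python
import re


def summarise(text):
    """Count words, unique words and sentences using naive rules."""
    # Words: runs of letters and straight apostrophes, case-folded.
    words = re.findall(r"[a-z']+", text.lower())
    # Sentences: non-empty chunks between ., ! and ? characters.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {"words": len(words), "unique": len(set(words)), "sentences": len(sentences)}


# Figures copied from the summary table in the article.
table = {
    "Clinton": {"words": 8190, "unique": 1493, "sentences": 446},
    "Trump": {"words": 8077, "unique": 1167, "sentences": 620},
}

# How many more unique words Clinton uses, relative to Trump.
unique_ratio = table["Clinton"]["unique"] / table["Trump"]["unique"] - 1
# How many more sentences Trump uses.
sentence_gap = table["Trump"]["sentences"] - table["Clinton"]["sentences"]
# Average sentence length for each candidate.
words_per_sentence = {name: row["words"] / row["sentences"] for name, row in table.items()}

print(f"{unique_ratio:.0%}")  # 28%
print(sentence_gap)  # 174
print({name: round(v, 1) for name, v in words_per_sentence.items()})  # {'Clinton': 18.4, 'Trump': 13.0}
```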

Going further, we can see the similarities and differences between the vocabulary used by the two candidates by creating a Venn diagram (using the “venneuler” package in R) of the unique words used by each candidate and the words they have in common.

Roughly 28 percent of the words used in the debate are used by both candidates. Clinton, as mentioned above and as shown by her larger circle in the Venn diagram, uses more unique words than Trump.
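The quantities behind such a diagram are just set sizes. The sketch below is a Python illustration rather than the venneuler code; the function names are hypothetical, and taking the shared share as a fraction of all distinct words used by either speaker is an assumption about how the percentage was computed.

```python
import re


def vocabulary(text):
    """Return the set of distinct lowercase words in a text."""
    return set(re.findall(r"[a-z']+", text.lower()))


def overlap(text_a, text_b):
    """Sizes of the three regions of a two-set Venn diagram."""
    a, b = vocabulary(text_a), vocabulary(text_b)
    shared = a & b
    total = len(a | b)
    return {
        "only_a": len(a - b),
        "only_b": len(b - a),
        "shared": len(shared),
        # Assumption: the share is measured against all distinct words used by either speaker.
        "shared_share": len(shared) / total if total else 0.0,
    }


print(overlap("a puppet as president", "no puppet no puppet"))
# {'only_a': 3, 'only_b': 1, 'shared': 1, 'shared_share': 0.2}
```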

To get a better feeling for what the candidates actually said, we can make a document-feature matrix (DFM). A DFM is a simple matrix that counts the number of times a word or set of words (e.g. every pair of adjacent words, or “bi-grams”) appears in each document. Before a DFM is built, punctuation is usually removed, along with common “stop words” (e.g. the, a, an, by, etc.), and all words are generally changed to lowercase.
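To make the idea concrete, a tiny document-feature matrix can be built by hand. This is a Python sketch under stated assumptions, not quanteda’s implementation: `STOP_WORDS` is a short illustrative list rather than a standard one, and `tokenize` and `build_dfm` are hypothetical names.

```python
import re
from collections import Counter

# Illustrative stop-word list only; real lists are much longer.
STOP_WORDS = {"the", "a", "an", "by", "and", "of", "to", "is", "as"}


def tokenize(text):
    """Lowercase the text, drop punctuation, and remove stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def build_dfm(corpus):
    """Return (features, matrix): one row of counts per document, one column per feature."""
    counts = {doc: Counter(tokenize(text)) for doc, text in corpus.items()}
    features = sorted(set().union(*counts.values()))
    matrix = {doc: [c[f] for f in features] for doc, c in counts.items()}
    return features, matrix


corpus = {"clinton": "A puppet as president.", "trump": "No puppet. No puppet."}
features, matrix = build_dfm(corpus)
print(features)  # ['no', 'president', 'puppet']
print(matrix)  # {'clinton': [0, 1, 1], 'trump': [2, 0, 2]}
```

Each row is a document and each column a word, so comparing the candidates comes down to comparing rows.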

A simple way to visualise the results of a DFM is to plot them as a word cloud. First, Trump’s top words:

Now Clinton’s top words:

The size and darkness of the words correspond to their prominence. For example, the most prominent word in both clouds is “people”. Trump said the word “people” 49 times and Clinton 37 times. However, words out of context do not always mean much. We can instead plot pairs of words or “bi-grams” to get more of a sense of how the words fit together.

By looking at the bi-grams, the context behind the candidates’ words becomes clearer. We can easily pick out “planned parenthood”, “social security” and “women’s rights” in Clinton’s top bi-grams, all phrases whose meaning depends on their component words being paired together. In Trump’s word cloud, the contrasting terms “strong borders” and “open borders” are prominent, along with “make america” and “america great”. Interestingly, none of the word clouds includes the word “puppet”.
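Counting bi-grams is a small extension of counting words. The following Python sketch uses the hypothetical helpers `ngrams` and `top_ngrams`; it is not the code used to draw the word clouds, and for simplicity it keeps stop words and lets n-grams run across sentence boundaries.

```python
import re
from collections import Counter


def ngrams(tokens, n=2):
    """Join every run of n consecutive tokens into one feature."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams(text, n=2, k=5):
    """Return the k most frequent n-grams in a text with their counts."""
    # Simplification: punctuation is dropped first, so n-grams can span sentences.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(ngrams(tokens, n)).most_common(k)


print(ngrams(["make", "america", "great", "again"], 2))
# ['make america', 'america great', 'great again']
print(top_ngrams("Strong borders. Strong borders. Open borders.", n=2, k=1))
# [('strong borders', 2)]
```

Setting `n=3` yields tri-grams, and larger values of `n` work the same way.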

We can go even further and make tri-grams or, for that matter, n-grams of any length to get a better sense of the context in which these words were spoken. Beyond longer n-grams, text analysis should be paired with an understanding of what’s going on between the texts (over time and across space) and behind the words. As Trump quipped at a fundraiser on the 20th of October, “Michelle Obama gives a speech, and everyone loves it. It’s fantastic. They think she is absolutely great. My wife, Melania, gives the exact same speech and people get on her case.”

It was the context behind the re-use of Michelle Obama’s speech by Melania Trump that made that particular case of text re-use controversial. But it is up to the savvy analyst to interpret their findings and to decide the substantive significance of the re-use.

Returning to the case of the third Presidential debate of 2016, text analysis techniques can be used to show the different choices the two candidates made in terms of the words, phrases, and sentences they used. These simple analyses can also be expanded to compare this latest debate with its predecessors to get a broader picture of the candidates’ approaches. Which strategies and talking points will pay off, though? I’ll leave that to Nate Silver to predict.

This article was first published at the Oxford Q-Step Centre’s blog.

Claire Peacock
