Trumped-up vs. Clintonesque: what text analysis can teach us about the US elections

The 2016 United States presidential election—or in John Oliver’s most recent definition: ‘lice-on-a-rat-on-a-horse-corpse-on-fire-2016’—has reached its final leg. As a political scientist and a computational text analyst, I cannot resist sharing my two cents on an election that has certainly broken a model or two.

Following in the footsteps of two colleagues who recently produced two excellent articles (you can read them here and here), in this post I’d like to analyse a few examples of the exceptional language used in this election cycle. Text analysis can help us understand two commonly held beliefs or facts (the distinction has become a bit blurred over the course of this year’s election cycle) about the US elections:

Donald Trump is running a negative campaign based on fear, whereas Hillary Clinton sells a positive message of hope (i.e. “make America great again” because “it is losing bigly”—or rather, “big league” if I’m to believe what the linguists have to say on this—versus Clinton’s message of “stronger together”);

As Politico’s Jack Shafer put it, Donald Trump speaks like a third grader, but his campaign manager Kellyanne Conway has managed to knock some sense—and perhaps political sophistication—into him since she was appointed in August 2016.

Can we “prove”—or rather “show”—these two basic statements with some simple text analysis tools?

There is reason to believe we can. (And also some reason to believe that we can’t; more on that later).

Campaign Speeches

We can look at a wide selection of campaign speeches by Clinton and Trump since their nominations at their respective conventions, including their convention speeches (for the Republican: Cleveland, Ohio, July 18-21; for the Democrat: Philadelphia, July 25-28, 2016). Transcripts of Hillary Clinton’s speeches are available here.

The “speeches section” on Mr Trump’s website only contains paraphrased summaries of the candidate’s speeches and videos, not the full transcripts themselves. The press releases section of “The Donald’s” website contains “remarks as prepared for delivery”. I decided to be generous and assume that these somewhat resemble what Trump actually said at his campaign rallies.

[As an aside, make sure to check out the “crooked Hillary question of the day” posts in the press releases section on Trump’s website; they’re hilarious. Other gems include: “Donald J. Trump’s history of empowering women”, “Hillary Clinton’s lips are moving… she must be lying”, and “Rudy Giuliani’s statement on Donald J. Trump’s debate victory”.]

So, the dataset consists of 25 speeches by Clinton, and 27 by Trump.

Now, one of the key assumptions of all (more accurately: most) text analysis models is the bag-of-words assumption. Simply put, this means that we can chop up sentences into individual terms and throw them into a virtual bag without losing their meaning. We then treat every word as an independent draw from this “bag”. In other words: any draw from the collection does not affect the probability of which word will be drawn next. (This is also known as the “naïve Bayes” assumption.) This is an obviously wrong, but nevertheless useful, assumption about how text is generated.
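To make the bag-of-words idea concrete, here is a minimal sketch in Python. The function name `bag_of_words` and the simple regular-expression tokenizer are my own assumptions for illustration, not the code used for the analysis in this post.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Lower-case the text, split it into word tokens, and count them."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


original = "We will make America great again"
shuffled = "again great America make will We"

# Word order is thrown away, so both sentences end up as the same bag.
print(bag_of_words(original) == bag_of_words(shuffled))

# Only the counts survive: "great" appears three times here.
print(bag_of_words("great, great, really great")["great"])
```

The point of the sketch is that once the text is in the bag, the model only sees how often each word occurs, not where it occurs.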

The bag-of-words assumption bears an uncanny resemblance to the way Donald Trump’s speech writers go about their work. In fact, I’m pretty sure a Poisson process (with word and document parameters) comes pretty close to how Trump’s speeches are written, or at least, delivered.

The assumption outlined above means that when applying any text analysis model, we chop up our texts into words (or sometimes, into bi- or tri-grams) and, subsequently, run our analysis. All the code for what follows below can be found on my GitHub page.

1. Does Trump run a more negative campaign than Clinton?

First up: does Trump really have a more negative campaign than Clinton? To investigate this, I use an unsupervised sentiment analysis algorithm.

Admittedly this is not the best approach. A better one would be to manually classify a subset of speeches (or in this case, individual paragraphs) as negative or positive, fit a model to this data, and subsequently apply the “trained” model to other data. This is the basic logic behind classifiers such as Naïve Bayes, or linear classifiers trained with stochastic gradient descent, which are the same kinds of models used for spam filters on your computer. Such an approach gives the researcher more context-specific training data, and therefore more accurate results.
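For readers who want to see what the supervised route looks like, below is a minimal sketch of a Naïve Bayes classifier with add-one smoothing in plain Python. The four training paragraphs and their labels are hypothetical, invented purely for illustration, and the function names are my own. A real application would need far more hand-labelled data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def train_naive_bayes(labelled_paragraphs):
    """Count words per label; input is a list of (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in labelled_paragraphs:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log-posterior (add-one smoothing)."""
    word_counts, label_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        denominator = sum(word_counts[label].values()) + len(vocabulary)
        for token in tokenize(text):
            if token in vocabulary:  # ignore words never seen in training
                score += math.log((word_counts[label][token] + 1) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical hand-labelled paragraphs, invented for illustration only.
training = [
    ("our country is a disaster and a total failure", "negative"),
    ("crime and terror are destroying our cities", "negative"),
    ("we are stronger together and full of hope", "positive"),
    ("a bright future with good jobs and opportunity", "positive"),
]
model = train_naive_bayes(training)
print(classify("a future full of hope", model))  # positive
print(classify("terror and failure", model))     # negative
```

Note that the classifier relies on exactly the independence assumption described earlier: each word contributes to the score on its own, regardless of its neighbours.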

But in the interest of speed I will use the unsupervised variant.

I use a lexicon-based sentiment classifier, implemented using the Sentlex library in Python. And here it gets a bit wonky. Every word conveys “negative”, “neutral” or “positive” sentiment. This sentiment is on a continuous scale, and is based on the SentiWordNet 3.0 dictionary. This dictionary groups lemma and part-of-speech (PoS) pairs that share the same meaning into so-called “synsets”. Each synset has a numerical score for each of the three categories (positive, negative, neutral) on the interval [0.0, 1.0]. The sentiment scores are generated by a combination of a semi-supervised learning step and a random-walk stage.
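The general logic of lexicon-based scoring can be sketched in a few lines of Python. This is not the Sentlex API, and the scores in `TOY_LEXICON` are hypothetical numbers I made up for illustration; they are not taken from SentiWordNet 3.0. The sketch also skips part-of-speech tagging entirely.

```python
import re

# Hypothetical (positive, negative) scores on [0.0, 1.0]; these numbers are
# invented for illustration and are NOT taken from SentiWordNet 3.0.
TOY_LEXICON = {
    "great": (0.75, 0.0),
    "hope": (0.5, 0.0),
    "strong": (0.5, 0.0),
    "disaster": (0.0, 0.75),
    "crooked": (0.0, 0.5),
    "failing": (0.0, 0.5),
}


def sentiment_totals(text, lexicon=TOY_LEXICON):
    """Sum positive and negative scores over all tokens found in the lexicon."""
    positive = negative = 0.0
    for token in re.findall(r"[a-z']+", text.lower()):
        pos, neg = lexicon.get(token, (0.0, 0.0))  # unknown words are neutral
        positive += pos
        negative += neg
    return positive, negative


def positivity_ratio(text, lexicon=TOY_LEXICON):
    """Ratio of total positive to total negative score; None if no negatives."""
    positive, negative = sentiment_totals(text, lexicon)
    return positive / negative if negative > 0 else None


sentence = "A great country with great hope, not a disaster"
print(sentiment_totals(sentence))  # (2.0, 0.75)
print(positivity_ratio(sentence))
```

The example sentence also shows a known weakness of this approach: “not a disaster” still adds to the negative total, because a bag of words cannot see the negation.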

There are reasons to expect that the sentiment analysis will not perform too well, as one of the two candidates defies some of the assumptions that we might reasonably make about human speech and word usage. For example, Trump seems pretty keen to use the words “great”, “big”, “huge”, “incredible” and other hyperbole not only to describe his hands, but his “policies” too. As these words usually have a positive connotation we might actually see Trump come out as more positive than Clinton.

Fortunately, the model seems to perform well, and is in line with what we would expect. Figure 1 below shows the ratio between the total positive and total negative scores for each speech. It is clear that Trump is consistently more negative in his speeches than Clinton.

**Figure 1** Sentiment in candidate speeches

2. Does Trump actually talk like a third-grader?

Now on to the second question: does Trump have less-than-Shakespearean linguistic ability?

Here, the analysis is more straightforward. Fortunately, the education literature gives us a range of off-the-shelf tools to measure linguistic complexity. Ken Benoit’s “Quanteda” package for R implements these measures, alongside a range of other text analysis tools that are incredibly useful (it even includes a score based on Scrabble!). I use the Flesch-Kincaid grade-level score (based on sentence length and the number of syllables per word), and the Dale-Chall index (based on both sentence length and the percentage of difficult words used). The results for Clinton and Trump speeches are shown in figures 2 and 3 below.
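To give a feel for how such a measure works, here is a rough Python sketch of the Flesch-Kincaid grade-level formula: 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. This is not Quanteda’s implementation. In particular, the syllable counter below is a crude heuristic of my own (it counts groups of vowels), so the absolute numbers will differ from what a proper tool reports.

```python
import re


def count_syllables(word):
    """Crude heuristic: count groups of consecutive vowels, minimum one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level from sentence length and syllables per word."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


# Two invented example texts, not quotes from either candidate.
simple = "We will win. We will win big. It will be great."
complex_ = ("Comprehensive immigration reform necessitates bipartisan "
            "cooperation and sustained institutional commitment.")

# Short sentences with short words score lower than one long, dense sentence.
print(flesch_kincaid_grade(simple) < flesch_kincaid_grade(complex_))
```

Because the formula only looks at sentence and word length, very short, simple text can even produce a score below zero, which is one reason to read the scores as rough indicators rather than literal school grades.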

**Figure 2** Dale-Chall index score for Clinton and Trump speeches

**Figure 3** Flesch-Kincaid index score for Clinton and Trump speeches

The Flesch-Kincaid score and the Dale-Chall index are roughly equivalent to American school grade levels. Interestingly, we see that Trump uses slightly more complicated language, but this result is bound to be driven by the fact that, in contrast to Clinton’s speech transcripts, the transcripts provided by the Trump campaign are not as-delivered, but rather as-prepared. In fact, the outlier for Trump (a 5.2 in figure 3) is the one transcript that is a record of what he actually said rather than a copy of the prepared speech.

(Note that these results hold regardless of which measure I use. See p. 57 of the Quanteda manual for an overview of all measures available.)

This teaches us two things about the current campaign (neither of which is too surprising): i) both candidates try to cater to the electorate by using easy language; and ii) Trump should listen to his speech writers and try to stay on-script.

And, of course, the good news for Trump is that he does not talk like a third-grader; rather, he has the vocabulary of a fifth-grader.

We don’t know what is going to happen on November 8. But we do know that this election cycle has generated a host of speech data that analysts can delve into for years to come. I’ve only shown how text analysis can help us see patterns that we might reasonably suspect are out there. Who knows what future research might reveal.

Niels Goet
