A new forecasting method for the Brexit referendum

Is it possible to have a more accurate prediction by asking people how confident they are that their preferred choice will win the day?

As the Brexit referendum date approaches, uncertainty about its outcome is increasing, and so are concerns about the precision of the polls. The forecasts are, once again, suggesting a very close result. Ever since the general election of May 2015, criticism of pollsters has been rampant. They have been accused of complacency, herding, sampling errors, and even deliberate manipulation of their results.

The UK is hardly the only country where pollsters are swiftly losing their reputation. With the rise of online polls, proper sampling can be extremely difficult. Online polls are based on self-selection of the respondents, making them non-random and hence biased towards a particular voter group (the young, the better educated, the urban population, etc.). On the other hand, the potential sample for traditional telephone (live interview) polls is in sharp decline, making them less and less reliable. Telephone interviews are usually done during the day, biasing the results towards stay-at-home moms, retirees, and the unemployed, while most people, for some reason, do not respond to mobile phone surveys as eagerly as they once did to landline surveys. With all this uncertainty it is hard to gauge which poll(ster) we should trust and to judge the quality of different prediction methods.

However, what if the answer to ‘what is the best prediction method’ lies in asking people not only who they will vote for, but also who they think will win (as ‘citizen forecasters’[1]), and more importantly, how they feel about who other people think will win? Sounds convoluted? It is actually quite simple.

There are a number of scientific methods out there that aim to uncover how people form opinions and make choices. Elections are just one of the many choices people make. When deciding who to vote for, people usually succumb to their standard ideological or otherwise embedded preferences. However, they also carry an internal signal which tells them how much chance their preferred choice has. In other words, they think about how other people will vote. This is why, as game theory teaches us, people tend to vote strategically and do not always pick their first choice, but opt for the second or third, only to prevent their least preferred option from winning.

When pollsters conduct surveys they are only interested in figuring out the present state of people’s ideological preferences. They have no idea why someone made the choice they made. And if the polling results are close, the standard saying is: “the undecided will decide the election”. What if we could figure out how the undecided will vote, even if we do not know their ideological preferences?

One such method, focused on uncovering how people think about elections, is the Bayesian Adjusted Facebook Survey, or BAFS for short. The BAFS method is first and foremost an Internet poll. It uses the social networks between friends on Facebook to conduct a survey among them. The survey asks the participants to express: 1) their vote preference (e.g. Leave or Remain); 2) how much they think their preferred choice will get (in percentages); and 3) how likely they think other people will estimate that Leave or Remain will win the day.

Let’s clarify the logic behind this. Each individual holds some prior knowledge as to what he or she thinks the final outcome will be. This knowledge can be based on current polls, or drawn from the information held by their friends and people they find more informed about politics. Based on this, it is possible to draw upon the wisdom of crowds, where one searches for informed individuals and thus bypasses the need for a representative sample.

However, what if the crowd is systematically biased? For example, many in the UK believed that the 2015 election would yield a hung parliament – even Murr’s (2016) citizen forecasters (although in relative terms the citizen forecaster model was the most precise). In other words, information from the polls creates a distorted perception of reality which is fed back to the crowd, biasing their internal perception. To overcome this, we need to see how much individuals within the crowd diverge from the opinion polls, but also from their internal networks of friends.

Depending on how well respondents estimate the chances of their preferred choices (compared to what the polls are saying), BAFS assesses their predictive power and gives a higher weight to the better predictors (e.g. if the polls are predicting a 52%-48% outcome, a person estimating that one choice will get, say, 80% is given an insignificant weight). Group predictions can be completely wrong of course, as closed groups tend to suffer from confirmation bias. In the aggregate, however, there is a way to get the most out of people’s individual opinions, no matter how internally biased they are. The Internet makes all of them easily accessible for these kinds of experiments, even if the sampling is non-random.
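The weighting idea can be illustrated with a minimal sketch. The article does not give the actual BAFS formula, so everything specific here is an assumption made for illustration: the Gaussian kernel, the `scale` parameter, and the names `Response`, `predictor_weight` and `weighted_forecast` are all hypothetical. The sketch only uses the first two survey questions (vote preference and estimated share) and ignores the third.

```python
import math
from dataclasses import dataclass


@dataclass
class Response:
    vote: str         # "Remain" or "Leave"
    own_share: float  # estimated vote share (%) for the respondent's own side


def remain_estimate(r: Response) -> float:
    """Express every answer as an estimated Remain share (%)."""
    return r.own_share if r.vote == "Remain" else 100.0 - r.own_share


def predictor_weight(estimate: float, poll_benchmark: float, scale: float = 5.0) -> float:
    """Assumed Gaussian kernel: 1.0 at the poll benchmark, shrinking with distance."""
    return math.exp(-0.5 * ((estimate - poll_benchmark) / scale) ** 2)


def weighted_forecast(responses, poll_benchmark: float, scale: float = 5.0) -> float:
    """Weighted mean of Remain estimates, favouring answers close to the polls."""
    if not responses:
        raise ValueError("at least one response is required")
    estimates = [remain_estimate(r) for r in responses]
    weights = [predictor_weight(e, poll_benchmark, scale) for e in estimates]
    return sum(w * e for w, e in zip(weights, estimates)) / sum(weights)


# Polls say Remain 52%; the respondent claiming 80% barely counts.
responses = [
    Response("Remain", 53.0),
    Response("Leave", 49.0),   # implies Remain at 51%
    Response("Remain", 80.0),
]
forecast = weighted_forecast(responses, poll_benchmark=52.0)
```

With these made-up answers, the respondent who expects 80% receives a weight close to zero, so the forecast stays near the two estimates that sit close to the polls.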

References

[1] See Murr, A.E. (2016) “The wisdom of crowds: What do citizens forecast for the 2015 British General Election?” Electoral Studies 41 (2016) 283-288.

Declaration of financial interests: Given the experimental nature of the survey there are no immediate financial interests that Oraclum I.S. Ltd. stands to gain from it. The survey is financed by Oraclum I.S. and, at its current stage, is not profit-oriented.

Oraclum Intelligence Systems

For the upcoming Brexit referendum, a couple of scientists gathered under a start-up company called Oraclum Intelligence Systems will seek to test how this method operates using live electoral data. Still in its R&D phase, Oraclum will engage in experimental testing of several methodological approaches within the BAFS to pick out the best one. The survey itself will be kick-started on Facebook ten days prior to the referendum, on the 13th of June, and will run up until the very last day, when the final forecast will be calculated. The survey will not have access to any data on its participants apart from the actual survey responses. For more details on how the survey will be conducted and its final presentation see here.

Benchmarks

During and after the referendum, the BAFS method will be compared to a series of benchmarks for accuracy. The point is to test whether or not BAFS is indeed the most suitable one for estimating people’s choices and hence making electoral predictions.

Here is a brief summary of the benchmark methods. For more detail on each of these, including how they are weighted and calculated, see here.

Adjusted polling average – combining all the UK pollsters given their sample size, type, timing of the poll, and past performance in predicting the outcomes of the last two general elections (2015 and 2010), and the Scottish independence referendum in 2014. In total we covered 480 polls from 15 different pollsters in the UK across the selected elections. Further details here.

Regular polling average – similar to the above, except without any particular weighting. Only the polls done at least two months before the current date are being considered.

What UK Thinks Poll of polls – an average of only the six most recent polls, compiled by the non-partisan website What UK Thinks, run by the NatCen Social Research agency.

Forecasting polls – polls based on asking people to estimate how much one choice will get over another (see sample questions here, here, here, and here).

Prediction markets – joint weighted estimates from PredictIt, PredictWise, Betfair, Pivit, Hypermind, Ig, and iPredict. Instead of predicted percentages for each option, prediction markets only produce the probability that an outcome will occur.

Prediction models – based on weighted polling averages, such as the ones done by Nate Silver and FiveThirtyEight. However, so far FiveThirtyEight has not made any predictions on Brexit, so the choice has narrowed down to pure socio-economic models (which take no account of polling data, so they are quite different from Silver's) such as the one done by the political scientist Matt Qvortrup.

Superforecaster WoC – utilizing the wisdom of the superforecaster crowd from Philip Tetlock’s Good Judgment Project (GJP). Superforecasters are, however, only a subset of the more than 5000 forecasters who participate in the GJP. Given that we cannot really calculate and average out the performance of the top predictors within that crowd, we have to take the collective consensus forecast.
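To make the adjusted polling average from the list above more concrete, here is a minimal sketch of a weighted poll average. The exact weighting used for the benchmark is described elsewhere and is not reproduced here; the formula below (square root of sample size, exponential decay with poll age, and a past-performance score between 0 and 1) is purely an assumption, as are the names `Poll`, `poll_weight`, `adjusted_average` and the `half_life_days` parameter.

```python
import math
from dataclasses import dataclass


@dataclass
class Poll:
    remain: float          # Remain share in the poll (%)
    leave: float           # Leave share in the poll (%)
    sample_size: int
    age_days: float        # days since the poll was conducted
    pollster_score: float  # hypothetical past-performance score in [0, 1]


def poll_weight(p: Poll, half_life_days: float = 14.0) -> float:
    """Assumed weight: larger, more recent, better-rated polls count more."""
    recency = 0.5 ** (p.age_days / half_life_days)
    return math.sqrt(p.sample_size) * recency * p.pollster_score


def adjusted_average(polls, half_life_days: float = 14.0):
    """Return the weighted (remain, leave) average across polls."""
    weights = [poll_weight(p, half_life_days) for p in polls]
    total = sum(weights)
    if total == 0:
        raise ValueError("all polls have zero weight")
    remain = sum(w * p.remain for w, p in zip(weights, polls)) / total
    leave = sum(w * p.leave for w, p in zip(weights, polls)) / total
    return remain, leave


# Illustrative, made-up polls (not taken from the 480 polls in the article).
polls = [
    Poll(remain=52.0, leave=48.0, sample_size=1000, age_days=0, pollster_score=1.0),
    Poll(remain=50.0, leave=46.0, sample_size=1000, age_days=0, pollster_score=1.0),
]
remain_avg, leave_avg = adjusted_average(polls)
```

When all weights are equal, as in this toy example, the result reduces to the regular (unweighted) polling average. Undecided voters are left out of the sketch, so the two shares need not sum to 100.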

So far, 20 days before the referendum, here is the rundown of the benchmark methods:

Method	Remain (%)	Leave (%)
Adjusted polling average*	50.44	47.76
Regular polling average*	50.81	47.65
Poll of polls	51	49
Prediction models	53.9	46.1
Mean	51.54	47.63

Note: updated as of 2nd June 2016.

The following table expresses it in terms of probabilities:

Method	Remain (probability, %)	Leave (probability, %)
Adjusted polling average	66.27	33.73
Regular polling average	66.36	33.64
Forecasting polls*	69.2	30.8
Prediction markets	72.77	27.23
Superforecaster WoC	80	20
Mean	70.92	29.08

Note: updated as of 2nd June 2016.

* For the adjusted polling average, the regular polling average, and for the forecasting polls the undecided voters have been factored in as well.
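The article does not state how vote shares were turned into the probabilities in the second table. One common approach, assumed here purely for illustration, is to treat the error in the forecast lead (Remain minus Leave) as normally distributed. The function names and the `lead_sd` parameter (the assumed standard deviation of that error, in percentage points) are hypothetical and are not taken from the benchmark methodology.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def win_probabilities(remain_share: float, leave_share: float, lead_sd: float):
    """Return (P(Remain wins), P(Leave wins)) in percent.

    Assumes the true lead is normally distributed around the forecast
    lead with standard deviation `lead_sd` (a hypothetical input).
    """
    if lead_sd <= 0:
        raise ValueError("lead_sd must be positive")
    p_remain = normal_cdf((remain_share - leave_share) / lead_sd)
    return 100.0 * p_remain, 100.0 * (1.0 - p_remain)


# A dead heat is a coin flip; a lead pushes the probability above 50%.
tie = win_probabilities(50.0, 50.0, lead_sd=5.0)
ahead = win_probabilities(51.0, 49.0, lead_sd=5.0)
```

Under this assumption, the same lead translates into a higher win probability the smaller the assumed forecast error is, which is one reason why methods with similar vote shares can report rather different probabilities.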

In conclusion, the polls are currently suggesting a narrow victory for Remain. However, this changes on a daily basis and is sensitive to all sorts of external nudges. Arguably the most important and most predictive polls will be the ones done in the last week before the referendum. The BAFS method will attempt to catch the evolution of voter sentiment over the final ten days prior to the referendum, based on which it will make its daily predictions, comparing them with each of the aforementioned benchmarks.
