When does traditional statistics become machine learning?

When does traditional statistical modelling (TSM) become machine learning (ML)?[i] “Machine learning” has become a buzzword that is applied rather liberally to a wide range of modelling applications. But the distinction is more than a question of semantics: there are fundamental differences between ML and TSM that data practitioners should keep in mind.

Similarities

Let’s start, though, with some commonalities between ML and TSM. In both disciplines our aim is to build a (statistical) model (to use TSM terminology) that minimises loss, that is, that achieves the smallest possible difference between observed values and the values estimated by the model. In so doing, we have to strike a balance between model complexity and generalisability: pick too complex a model and you’ll achieve a great fit on the data that you used to develop it, but your predictive power on unseen data will be limited.
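To make the complexity trade-off concrete, here is a minimal Python sketch using only the standard library. Everything in it is an assumption made for illustration and does not come from a real application: the synthetic data, the seed, and the two toy models (a straight-line fit and a "memoriser" that simply returns the y-value of the nearest training point).

```python
import random

random.seed(0)  # arbitrary seed, only for reproducibility

def make_data(n):
    """Synthetic data (an assumption for this sketch): y = 2x + 1 + Gaussian noise."""
    xs = [random.uniform(0, 10) for _ in range(n)]
    ys = [2 * x + 1 + random.gauss(0, 1) for x in xs]
    return xs, ys

def fit_line(xs, ys):
    """Simple model: least-squares straight line; returns a prediction function."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

def fit_memoriser(xs, ys):
    """Maximally flexible model: predict the y of the nearest training x."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def mse(model, xs, ys):
    """Mean squared error of a prediction function on a dataset."""
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

x_train, y_train = make_data(50)
x_test, y_test = make_data(50)   # "unseen" data from the same process

line = fit_line(x_train, y_train)
memoriser = fit_memoriser(x_train, y_train)

for name, model in [("line", line), ("memoriser", memoriser)]:
    print(name,
          "train MSE:", round(mse(model, x_train, y_train), 3),
          "test MSE:", round(mse(model, x_test, y_test), 3))
```

The memoriser reproduces the training data perfectly, yet we would typically expect it to do worse than the simple line on the held-out set, because it has memorised the noise along with the signal.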

We have successfully achieved our goal when, depending on the application, our model is able to explain a significant proportion of the variation in our dependent variable, achieves a high classification accuracy, or correctly predicts the number of occurrences of a particular phenomenon.

Some of the techniques that we use to achieve our goal – of minimising loss – are similar between ML and TSM too. For example, both rely on a cost function (such as the mean squared error, MSE), and on a way of optimising that cost function.
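Written out, the shared recipe looks roughly as follows. The notation is my own assumption for this sketch: y_i is the i-th observed value, \hat{y}_i(\theta) is the model's estimate for that observation given parameters \theta, and n is the number of observations.

```latex
\mathrm{MSE}(\theta) = \frac{1}{n} \sum_{i=1}^{n} \bigl( y_i - \hat{y}_i(\theta) \bigr)^2,
\qquad
\hat{\theta} = \arg\min_{\theta} \, \mathrm{MSE}(\theta)
```

The first expression is the cost function; the second states the optimisation problem. Where ML and TSM tend to part ways is in how the minimisation is carried out and in what we do with \hat{\theta} afterwards.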

Different philosophical foundations

Yet, in spite of these rather superficial similarities, there are important differences that make ML and TSM distinct disciplines. First, their purpose differs. ML is an algorithmic approach that is mostly interested in prediction. Conversely, TSM practitioners focus on the (statistical) significance of individual features (or rather: parameters). In the latter case, we are interested in explaining why x leads to y, rather than purely in constructing a model that predicts the occurrence of y as accurately as possible. Our conclusions about relationships between these variables, in turn, depend on our ability to construct an appropriate statistical model to represent our data.
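As an illustration of the TSM focus on individual parameters, the sketch below fits a bivariate linear regression and then asks the typical TSM question: is the slope distinguishable from zero? The data are synthetic and the seed, sample size and true coefficients are assumptions chosen purely for this example.

```python
import math
import random

random.seed(1)  # arbitrary seed, only for reproducibility

# Synthetic data (an assumption for this sketch): x affects y with a true slope of 0.5.
n = 100
x = [random.uniform(0, 10) for _ in range(n)]
y = [1.0 + 0.5 * xi + random.gauss(0, 1) for xi in x]

mean_x = sum(x) / n
mean_y = sum(y) / n
s_xx = sum((xi - mean_x) ** 2 for xi in x)
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

# Least-squares estimates of the two parameters.
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

# Residual variance with n - 2 degrees of freedom (two estimated parameters).
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
sigma2 = sum(r ** 2 for r in residuals) / (n - 2)

# Standard error of the slope and the t-statistic for H0: slope = 0.
se_slope = math.sqrt(sigma2 / s_xx)
t_stat = slope / se_slope

print(f"slope = {slope:.3f}, standard error = {se_slope:.3f}, t = {t_stat:.1f}")
```

With a sample of this size, an absolute t-statistic above roughly 2 is the conventional signal (at the 5% level) that the slope differs from zero. An ML workflow would typically skip this step and judge the model on its prediction error for held-out data instead.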

ML and TSM have fundamentally different philosophical roots. As a branch of artificial intelligence[ii] and a sub-field of computer science, ML’s main purpose is to “learn” from data, that is, to optimise the weights on features (or: parameters), and to apply that information to generate new predictions. Its approach is inductive: we let the data give us the answers, and do not rely on ex ante expectations of how feature x may affect phenomenon y. This requires fewer distributional assumptions about our predictor variables, and fewer assumptions about which features are most predictive of our outcome of interest. (Note, however, that this is not to say that ML is not interested in selecting appropriate features to include! In fact, this is a crucial part of any successful ML application.)

By contrast, traditional statistical modelling attempts to use mathematical formulas to formalise the relationship between two or more variables. As such, it is a subfield of mathematics, and is usually applied in a deductive context, that is, a research design where hypotheses are formulated and subsequently put to the test.

Different mechanics

There also are important differences in terms of what we might call the “mechanics” of ML and TSM respectively. Let’s again take the example of linear regression. In linear regression, we aim to find the line (in a bivariate set-up) or the plane in a multidimensional space (in the multivariate variant) that minimises the sum of squared errors. Essentially, we draw the best line of fit through a number of observations, as illustrated in the figure below.

[Figure: interactive scatter plot of 20 observations (X from 1 to 20, Y from 0.0 to 0.9) with the fitted least-squares line drawn in red; a “Generate new data” button redraws the plot.]
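For the bivariate case, the best-fitting line can be computed directly. The sketch below uses five made-up observations (hypothetical values, not the data behind the figure) and then checks that nudging either parameter away from the least-squares solution increases the sum of squared errors.

```python
def sum_squared_errors(x, y, intercept, slope):
    """Sum of squared differences between observed y and the line's predictions."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

def ols_line(x, y):
    """Closed-form least-squares estimates for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
             / sum((xi - mean_x) ** 2 for xi in x))
    return mean_y - slope * mean_x, slope

# Five made-up observations (hypothetical values for illustration only).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = ols_line(x, y)
best = sum_squared_errors(x, y, intercept, slope)

# Moving either parameter away from the OLS solution makes the fit worse.
for d_intercept, d_slope in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05)]:
    worse = sum_squared_errors(x, y, intercept + d_intercept, slope + d_slope)
    print(f"shift ({d_intercept:+.2f}, {d_slope:+.2f}): SSE {worse:.4f} vs {best:.4f}")
```

This is the analytical route: the parameters follow from a formula, with no iteration involved.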

In addition, a linear model in a machine learning set-up would typically rely on a different optimisation algorithm, such as gradient descent (GD). While TSM applications of linear regression can (and do) use GD, it becomes particularly relevant in an ML context. When we have large numbers of predictors (which is usually the case in ML problems, where we may have thousands of features), we need an optimiser that is computationally cheap enough to process large amounts of data. With that many features, gradient descent can save a lot of computation compared with solving for the parameters analytically (note that some optimisation problems do not even have a closed-form solution!). We can reduce computation time further by using GD in mini-batch form or, in particular, in the stochastic variant, where we sample from the data and adjust the model parameters a little after each sample.
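A minimal sketch of both flavours of gradient descent for a bivariate linear model follows. The data are synthetic and deliberately noise-free so that the target parameters are known; the learning rate, batch size and number of epochs are assumptions picked for this toy problem and would need tuning elsewhere.

```python
import random

random.seed(42)  # arbitrary seed, only for reproducibility

# Noise-free synthetic data (an assumption for this sketch): y = 1 + 2x,
# with x evenly spaced on [0, 1].
xs = [i / 99 for i in range(100)]
ys = [1.0 + 2.0 * x for x in xs]
data = list(zip(xs, ys))

def mse_gradient(batch, intercept, slope):
    """Gradient of the mean squared error over `batch` w.r.t. (intercept, slope)."""
    g_intercept = g_slope = 0.0
    for x, y in batch:
        error = (intercept + slope * x) - y
        g_intercept += 2 * error
        g_slope += 2 * error * x
    return g_intercept / len(batch), g_slope / len(batch)

def gradient_descent(data, learning_rate=0.1, epochs=300, batch_size=None):
    """Full-batch GD when batch_size is None, otherwise mini-batch SGD.

    batch_size=1 gives the classic stochastic variant.
    """
    intercept, slope = 0.0, 0.0
    rows = list(data)
    size = batch_size or len(rows)
    for _ in range(epochs):
        random.shuffle(rows)  # visit the data in a new random order each epoch
        for start in range(0, len(rows), size):
            g_i, g_s = mse_gradient(rows[start:start + size], intercept, slope)
            intercept -= learning_rate * g_i  # small step against the gradient
            slope -= learning_rate * g_s
    return intercept, slope

print("full batch:", gradient_descent(data, epochs=3000))
print("mini-batch:", gradient_descent(data, batch_size=10))
```

With the same learning rate, the mini-batch run updates the parameters ten times per pass over the data, against once per pass for the full-batch run, which is why it gets by with far fewer passes in this example.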

This brings me to yet another difference between ML and TSM: scale.

Different scale

A discussion of ML is not complete without mentioning “big data”. And indeed, a data scientist will not readily reach for the ML toolbox unless they are dealing with masses of data that are generated at great speed and that are hugely diverse (i.e. unstructured) (for a discussion of the use of big data in political science, see my earlier blog post). When we are dealing with such “big data”, the standard TSM toolbox no longer applies: our data includes thousands of features, and we need an algorithmic approach to make sense of it.

The importance of assumptions

Finally, in TSM, our modelling strategy depends crucially on a set of assumptions. These include homoscedasticity of error terms, normality of our data-generating distributions, and linearity of functional dependencies. By contrast, ML can be seen as a “distribution-free” approach. We make few assumptions about the way the data is distributed and instead allow the training algorithm to tell us which models best approximate the data-generating process that underlies our data.[iii]
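To give a flavour of what checking such an assumption involves, the sketch below runs a crude homoscedasticity check, loosely in the spirit of a Goldfeld-Quandt comparison but not a formal test. The data are synthetic and constructed, as an assumption for this example, so that the noise grows with x.

```python
import random

random.seed(7)  # arbitrary seed, only for reproducibility

# Synthetic data (an assumption for this sketch) whose noise grows with x,
# which violates the homoscedasticity assumption.
n = 200
x = sorted(random.uniform(1, 10) for _ in range(n))
y = [3.0 + 2.0 * xi + random.gauss(0, 0.5 * xi) for xi in x]

# Ordinary least-squares fit of y on x.
mean_x, mean_y = sum(x) / n, sum(y) / n
slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
intercept = mean_y - slope * mean_x
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

def mean_square(values):
    """Average squared value, used here as a rough measure of residual spread."""
    return sum(v ** 2 for v in values) / len(values)

# Compare the residual spread for the lower and upper halves of x.
# Under homoscedasticity the ratio should be close to 1.
half = n // 2
ratio = mean_square(residuals[half:]) / mean_square(residuals[:half])
print(f"upper-half / lower-half residual variance ratio: {ratio:.2f}")
```

A ratio far from 1 suggests that the standard errors from a plain linear regression should not be taken at face value. A purely predictive ML workflow would rarely run such a check, since it does not rely on those standard errors in the first place.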

Implications

In sum, there are important differences between machine learning and traditional statistical modelling, and your choice between the two should be informed by the data problem that you’re facing. For example, TSM is preferable by far if you have a relatively small dataset that consists of structured data, and if your purpose is to identify if and to what extent a (limited) set of variables affects a phenomenon of interest. In other words: use the TSM toolbox when you are interested in taking a deductive approach to your research problem, i.e. when you start from theory and hypotheses, and use empirical data to confirm or reject your theoretical propositions.

By contrast, ML is your go-to strategy if you are dealing with large volumes of unstructured data, and your goal is to predict the extent or occurrence of a phenomenon as accurately as possible. For example: use ML if you are interested in predicting buyer behaviour in a webshop, churn, or employee retention.

This article was originally published on the Official Inspera Blog.

Notes

[i] I use terms from either discipline rather liberally in this post. For example, whereas ML speaks of “weights”, TSM usually refers to “coefficients”. The same goes for “features” vs. “independent variables”. I also do not venture into a discussion of supervised vs. unsupervised learning, and/or deep learning.

[ii] In turn, a key difference between AI and ML is that the latter does not try to imitate “intelligent” behaviour. Rather, ML is intended to complement and outperform human tasks using the strengths of computers.

[iii] ML is of course not altogether assumption-free. For one, in ML we assume that the training samples we draw from our distribution are i.i.d. (independent and identically distributed).




Niels Goet
