In a recent colloquium, the speaker's abstract claimed they were using machine learning. During the talk, the only thing related to machine learning was that they perform linear regression on their data. After calculating the best-fit coefficients in 5D parameter space, they compared these coefficients in one system to the best-fit coefficients of other systems.

When is linear regression machine learning, as opposed to simply finding a best-fit line? (Was the researcher's abstract misleading?)

With all the attention machine learning has been garnering recently, it seems important to make such distinctions.

My question is like this one, except that that question asks for the definition of "linear regression", whereas mine asks when linear regression (which has a broad number of applications) may appropriately be called "machine learning".

Clarifications

I'm not asking when linear regression is the same as machine learning. As some have pointed out, a single algorithm does not constitute a field of study. I'm asking when it's correct to say that one is doing machine learning when the algorithm one is using is simply a linear regression.

All jokes aside (see comments), one of the reasons I ask this is that it is unethical to claim one is doing machine learning, to add a few gold stars to one's name, when one isn't really doing machine learning. (Many scientists calculate some type of best-fit line for their work, but this does not mean that they are doing machine learning.) On the other hand, there are clearly situations in which linear regression is being used as part of machine learning. I'm looking for experts to help me classify these situations. ;-)

Maybe you want to see the thread: "The Two Cultures: statistics vs. machine learning?". – usεr11852 2 days ago
You should rename your regression as 'machine learning' whenever you want to double the fees on your rate card. – Sycorax 2 days ago
There is a difference. Learning is a process; a best fit is an objective. See my answer below. Frankly, the words do not have the same meaning, although they can appear in the same context. As with "birds fly": one can associate the two, but birds are not flight, and although flying is for the birds, it is for F-18 fighter jets as well. – Carl 2 days ago
@Sycorax and deep learning when you want to quadruple – Franck Dernoncourt 2 days ago
@FranckDernoncourt "I'm a data scientist using deep learning in big data environment to solve machine learning problems" sounds like a nice header for LinkedIn profile ;) – Tim yesterday

Answering your question with a question: what exactly is machine learning? Trevor Hastie, Robert Tibshirani and Jerome Friedman in The Elements of Statistical Learning, Kevin P. Murphy in Machine Learning: A Probabilistic Perspective, Christopher Bishop in Pattern Recognition and Machine Learning, and a number of other machine learning "bibles" mention linear regression as one of the machine learning "algorithms". Machine learning is partly a buzzword for applied statistics, and the distinction between statistics and machine learning is often blurry.

True but they are in large part siloed disciplines with large quantities of nonoverlapping literature, methods and algorithms. For instance, in today's world machine learning, data and computer science grads are way ahead of statistical applicants in terms of funding, grants and job opps, you name it. – DJohnson 2 days ago
@DJohnson so it's applied statistics in a new package, sold at a higher price..? I do not think the fact that it's trendy makes it any less of a buzzword. Bayesian statistics also has its own methods, journals, conferences, handbooks and applications that are partly non-overlapping with classical statistics - does that make it a discipline distinct from statistics? – Tim 2 days ago
@DJohnson there are frequentist statisticians who have little knowledge of Bayesian statistics... ;) – Tim 2 days ago
Yup. I neglected to caveat my observation about ML practitioners with the more general observation that siloed, narrowly focused practitioners are endemic to every field and profession, not just ML. It's a kind of occupational hazard -- read human failing -- that people grow blinders to information outside their immediate needs and interests. CV is no exception to this. – DJohnson 2 days ago
(+1) I agree there is no clear distinction. To the extent I think of differences, I would typically think of ML as more concerned with the predictions, and statistics as more concerned with the parameter inference (e.g. experimental design for response surface modeling would not be typical in ML?). So in that sense, the OP example -- where the regression coefficients seem to be of most concern -- would be more "statistics-like" (?) – GeoMatt22 2 days ago

Linear regression is definitely an algorithm that can be used in machine learning. But, reductio ad absurdum: Anyone with a copy of Excel can fit a linear model.

Even restricting ourselves to linear models, there are a few more things to consider when discussing machine learning:

  • Machine learning on business problems may involve a lot more data. "Big data", if you want to use the buzzword. Cleaning and preparing the data may take more work than the actual modelling. And when the volume of data exceeds the capacity of a single machine to process it, the engineering challenges are as significant as the statistical ones. (Rule of thumb: if it fits in main memory, it's not big data.)
  • Machine learning often involves many more explanatory variables (features) than traditional statistical models. Perhaps dozens, sometimes even hundreds of them, some of which will be categorical variables with many levels. When these features can potentially interact (e.g. in a cross effects model) the number of potential models to be fit grows rapidly.
  • The machine learning practitioner is usually less concerned with the significance of individual features, and more concerned with squeezing as much predictive power as possible out of a model, using whichever combination of features does that. (P-values are associated with explanation, not prediction.)
  • With a large number of features, and various ways of engineering those features, model selection by hand becomes infeasible. In my opinion, the real challenge in machine learning is the automated selection of features (feature engineering) and other aspects of model specification. With a linear model there are various ways of doing this, usually variants of brute force, including stepwise regression, backward elimination, etc., all of which again require significant computing power. (Second rule of thumb: if you are selecting features by hand, you are doing statistics, not machine learning.)
  • When you automatically fit many models with many features, over-fitting is a serious potential issue. Dealing with this problem often involves some form of cross validation: i.e. yet more brute force computation!

The short answer, from my point of view, is that where machine learning deviates from traditional statistical modelling is in the application of brute force and numerical approaches to model selection, especially in domains with a large amount of data and a large number of explanatory variables, with a focus on predictive power, followed by more brute force for model validation.
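To make the brute-force flavour concrete, here is a minimal sketch in Python (the data, the candidate features and the fold scheme are all invented for illustration): each candidate feature is scored by k-fold cross-validated error of a one-variable linear fit, and the feature with the lowest held-out error is selected automatically.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b (one feature)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sxy / sxx
    return a, my - a * mx

def cv_mse(xs, ys, k=5):
    """k-fold cross-validated mean squared prediction error."""
    n = len(xs)
    total = 0.0
    for fold in range(k):
        hold = set(range(fold, n, k))            # every k-th index held out
        tr = [i for i in range(n) if i not in hold]
        a, b = fit_line([xs[i] for i in tr], [ys[i] for i in tr])
        total += sum((ys[i] - (a * xs[i] + b)) ** 2 for i in hold)
    return total / n

# Toy data: y really depends on feature f0; f1 is pure noise.
random.seed(0)
f0 = [random.uniform(0, 10) for _ in range(100)]
f1 = [random.uniform(0, 10) for _ in range(100)]
y = [2 * x + 1 + random.gauss(0, 0.5) for x in f0]

# Brute force: score every candidate feature, keep the best.
scores = {name: cv_mse(xs, y) for name, xs in (("f0", f0), ("f1", f1))}
best = min(scores, key=scores.get)
```

With real feature counts this loop runs over thousands of candidate subsets, which is exactly where the computing power goes.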

I do like this distinction in general. However, is cross-validation ever used in "statistical" models or is this rarely needed as they are normally done by hand? Is feature engineering considered statistics then as it is done by hand? – josh 21 hours ago
@josh, Yes, it can be. But if you look at the cross validation tag, almost all the questions are about predictive modelling. – david25272 8 hours ago
@david25272 I'd be curious as to how you think of the bootstrap, .632+ bootstrap, and permutation tests -- I've always thought of them as more "applied statistics" than "machine learning" because of how they're motivated, but they're similarly "brute-force" to k-fold or leave-k-out cross-validation. I think L1 regularization can also be thought of as a type of feature selection within a statistical framework... – Patrick B. 6 hours ago
@Patrick stats.stackexchange.com/questions/18348 is a better answer on the uses of bootstapping for model validation than I could give. – david25272 5 hours ago

I think Mitchell's definition provides a helpful way to ground the discussion of machine learning, a sort of first principle. As reproduced on Wikipedia:

A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.

This is helpful in a few ways. First, to your immediate question: Regression is machine learning when its task is to provide an estimated value from predictive features in some application. Its performance should improve, as measured by mean squared (or absolute, etc.) held out error, as it experiences more data.
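Mitchell's criterion can be sketched in a few lines of Python (the data is invented and deterministic, and the split sizes are arbitrary, purely for illustration): the same least-squares fit, given more experience E, does better on the held-out performance measure P.

```python
import math

# Invented data: y = 2x + 1 plus a small deterministic "noise" term.
xs = [i * 0.05 for i in range(240)]
ys = [2 * x + 1 + 0.3 * math.sin(7 * i) for i, x in enumerate(xs)]

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

def holdout_mse(n_train):
    """Train on the first n_train points; test on the last 40 (never trained on)."""
    a, b = fit_line(xs[:n_train], ys[:n_train])
    test = list(zip(xs[200:], ys[200:]))
    return sum((y - (a * x + b)) ** 2 for x, y in test) / len(test)

mse_small = holdout_mse(5)    # little experience E
mse_large = holdout_mse(200)  # much more experience E
# Performance at the task T, measured by held-out MSE (the measure P),
# improves as the program experiences more data.
```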

Second, it helps delineate machine learning from related terms, and its use as a marketing buzzword. Contrast the task above with a standard, inferential regression, wherein an analyst interprets coefficients for significant relationships. Here the program returns a summary: coefficients, p-values, etc. The program cannot be said to improve this performance with experience; the task is elaborate calculation.

Finally, it helps unify machine learning subfields, both those commonly used in introductory exposition (supervised, unsupervised) and others like reinforcement learning or density estimation. (Each has a task, performance measure and concept of experience, if you think on them enough.) It provides, I think, a richer definition that helps delineate the two fields without unnecessarily reducing either. As an example, "ML is for prediction, statistics for inference" ignores both machine learning techniques outside supervised learning and statistical techniques that focus on prediction.


A common view is that machine learning is made up of four areas:

1) Dimensionality Reduction

2) Clustering

3) Classification

4) Regression

Linear regression is a regression method. Once the model is trained, it can be used for predictions, like any other regression model, say, a Random Forest regression.

There is actually a difference, although linear regression can be solved using machine learning. A common regression target is ordinary least squares, which means that our target loss function, the sum of squared residuals, is to be minimized. Now, "machine learning" would simply refer to the method by which we minimize that loss function. – Carl 2 days ago
Thus, conceptually, linear regression via gradient descent (learning) finds better and better values of the summed squared residuals (the loss function). The basic concepts are the same as those for much more advanced learning algorithms, such as neural networks. These algorithms simply replace the linear model with a much more complex model - and, correspondingly, a much more complex cost function. – Carl 2 days ago
I am not the only person to use this exact same example, see here. On gradient descent for machine learning. – Carl 2 days ago
So the answer to the OP's question, when is linear regression machine learning, as opposed to simply finding a best-fit line? When linear regression is performed using a definable element of machine learning, like gradient descent, it is linear regression performed using machine learning. – Carl 2 days ago
@Carl, the problem here is that "machine learning" is not well defined. To me, if we can use a statistical model and that model has the ability to predict, it is machine learning. And it does not matter what approach was used to find the coefficients of the model. – Akavall 2 days ago
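The gradient-descent view in the comments above can be shown with a short sketch (toy numbers and an illustrative step size, not a tuned implementation): the same least-squares line, fitted iteratively, with the loss decreasing as the algorithm "learns".

```python
# Toy data, roughly y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]

def sse(a, b):
    """Summed squared residuals: the loss function being minimized."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))

a, b, lr = 0.0, 0.0, 0.02   # slope, intercept, learning rate (illustrative)
losses = [sse(a, b)]
for _ in range(2000):
    # Gradients of the summed squared residuals w.r.t. a and b.
    ga = sum(-2 * x * (y - (a * x + b)) for x, y in zip(xs, ys))
    gb = sum(-2 * (y - (a * x + b)) for x, y in zip(xs, ys))
    a, b = a - lr * ga, b - lr * gb
    losses.append(sse(a, b))

# After enough iterations, (a, b) converges to the ordinary
# least-squares solution (about 1.96 and 1.10 for this data),
# and losses is a decreasing sequence: the "learning".
```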

There's no law that says that a cabinet maker can't use a barrel maker's saw.

Machine learning and statistics are vague labels, and even when well defined there is a lot of overlap between statistics and machine learning. This goes for the methods of the two areas as well as (and separately) for the people who label themselves with them.

Linear regression is a very well defined mathematical procedure. I tend to associate it with the area of statistics and with people who call themselves "statisticians" and come out of academic programs with labels like "statistics". SVM (Support Vector Machines) is likewise a very well defined mathematical procedure that has very similar inputs and outputs and solves similar problems. I tend to associate it, however, with the area of machine learning and with people who call themselves computer scientists (or who work in artificial intelligence or machine learning, which tend to be considered part of computer science as a discipline).

But some statisticians might use SVMs, and some AI people use logistic regression. (And just to be clear, it is more likely that a statistician or AI researcher would develop a method than actually put it to practical use.)

Frankly, I would put all the methods of machine learning squarely inside the domain of statistics, even such recent things as deep learning, RNNs, CNNs, LSTMs and CRFs. These are all predictive modelling methods, usually labeled "machine learning" and rarely associated with statistics. But they are predictive models, with the allowance that they can be judged using statistical methods.

But, yes, I see and sometimes share your distaste for the misapplication of these words. Linear regression is such a fundamental part of things called statistics that it feels very strange and misleading to call its use 'machine learning'.

To illustrate, Logistic regression is identical mathematically to a Deep Learning network with no hidden nodes and the logistic function as the activation function for the single output node. I wouldn't call logistic regression a machine learning method, but it is certainly used in machine learning contexts.

It's like saying, when washing a window with water, that you're using quantum chemistry. Well, yeah, sure, that's not technically wrong, but you're implying a lot more than what's needed.

Logistic regression is also very similar, both practically and theoretically, to SVMs: web.stanford.edu/~hastie/Papers/svmtalk.pdf – Patrick B. 11 hours ago

I'll argue that the distinction between machine learning and statistical inference is clear. In short, machine learning = prediction of future observations; statistics = explanation.

Here is an example from my field of interest (medicine): when developing a drug, we search for the gene(s) which best explain a disease state, with the goal of targeting it/them with the drug. We use statistics for that. In contrast, when developing diagnostic tests, for example predicting whether the drug will help a patient, the goal is strictly finding the best predictor of the future outcome, even if it comprises many genes and is too complicated to understand. We use machine learning for this purpose. There are multiple published examples [1], [2], [3], [4] showing that the presence of the drug target is not a good predictor of the treatment outcome, hence the distinction.

Based on this, it is fair to say that one is doing machine learning when the goal is strictly predicting outcome of future/previously unseen observations. If the goal is understanding a particular phenomenon, then that is statistical inference, not machine learning. As others have pointed out, this is true regardless of the method involved.

To answer your question: in the specific research that you describe, the scientists were comparing the factor roles (weights) in different linear regression models, not comparing the model accuracies. Therefore, it is not accurate to call their inference machine learning.

[1] Messersmith WA, Ahnen DJ. Targeting EGFR in Colorectal Cancer. The New England Journal of Medicine; 2008; 359; 17.

[2] Pogue-Geile KL et al. Predicting Degree of Benefit From Adjuvant Trastuzumab in NSABP Trial B-31. J Natl Cancer Inst; 2013; 105:1782-1788.

[3] Pazdur R. FDA Approval for Vemurafenib. https://www.cancer.gov/about-cancer/treatment/drugs/fda-vemurafenib. Updated July 3, 2013.

[4] Ray T. Two ASCO Studies Show Challenge of Using MET Signaling as Predictive Marker in NSCLC Drug Trials. GenomeWeb, June 11, 2014.

I agree that machine learning research has a much heavier emphasis on predictions over parameter estimation. But that's not a clear dividing line: statistics research is rich with predictive methods. – Cliff AB yesterday
So what about statisticians that made predictions before computers existed (or were widely available)? Were they applying paper-and-pencil machine learning?! – Tim 22 hours ago
I believe the distinction remains clear if the emphasis is put on predicting future observations. I will edit my answer accordingly. Thanks for your comment @Cliff AB. – ljubomir 20 hours ago
@Tim: very fine argument. I believe the answer is yes if they were focused on future observations, though I acknowledge in those (rare) cases the name statistical learning would be more appropriate. With the advent of computers, the term machine learning became more fashionable. The point is not the name, nor the use of computers; it is the clarity of purpose. In my view, it is almost impossible to successfully optimize both accurate prediction of previously unseen observations, and understanding of the phenomenon. Better to focus appropriately. – ljubomir 19 hours ago

Linear regression is a technique, while machine learning is a goal that can be achieved through different means and techniques.

So regression performance is measured by how closely it fits an expected line/curve, while machine learning is measured by how well it can solve a certain problem, with whatever means necessary.


It can be useful to call linear regression machine learning because doing so generally implies a couple important things about how you went about solving your problem:

  1. You decided it wasn't necessary to check causal assumptions and prior theory behind your explanatory variables. It signals that your model was not intended to explain but to predict. This is perfectly reasonable in a lot of settings, for example, predicting email spam based on keywords. There isn't really a lot of literature on which words predict spam, and there are so many words that it doesn't make sense to think through the theoretical significance of each one.
  2. You didn't check for variable significance or use p-values, but instead likely opted for a holdout set or cross validation to assess out-of-sample predictive performance. This can be perfectly valid if - back to the email spam example - all you really care about is producing a model that effectively predicts spam, even if this comes at the cost of including variables that might not pass traditional significance tests.
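Point 2 can be sketched in a few lines (synthetic data and an arbitrary 80/20 split, purely for illustration): the fitted model is judged by its error on observations it never saw, not by significance tests on its coefficients.

```python
import random

# Synthetic data: y = 3x - 2 plus unit-variance Gaussian noise.
random.seed(1)
data = [(x, 3 * x - 2 + random.gauss(0, 1))
        for x in (random.uniform(0, 10) for _ in range(100))]
train, hold = data[:80], data[80:]   # simple 80/20 holdout split

# Ordinary least squares fitted on the training portion only.
n = len(train)
mx = sum(x for x, _ in train) / n
my = sum(y for _, y in train) / n
a = (sum((x - mx) * (y - my) for x, y in train)
     / sum((x - mx) ** 2 for x, _ in train))
b = my - a * mx

# The model is assessed purely on out-of-sample predictive error.
holdout_mse = sum((y - (a * x + b)) ** 2 for x, y in hold) / len(hold)
```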

However, if your model is more intended to explain than predict, and you do rigorously check your model's theoretical causal assumptions, then yes, it is rather silly to call it machine learning.


It's the Machine, Stupid!

I am neither a statistician nor a Big Data(TM) expert. However, I would say that the essential distinction is that "machine learning" requires "a machine". In particular, it implies agency. The result will not be consumed leisurely by a human. Rather, the result will be the input to a closed cycle whereby an automated system improves its performance.

Closed System

This is very much in line with Sean Easter's answer, but I just want to emphasize that in commercial applications, a machine is looking at the results and acting on them. A classic example is the CineMatch algorithm, which was the target of the Netflix Prize. A human could look at the output of CineMatch and learn interesting features about movie viewers. But that is not why it exists. The purpose of CineMatch is to provide a mechanism whereby Netflix servers can suggest movies to customers that they will enjoy. The output of the statistical model goes into the recommender service, which ultimately produces more input as customers rate movies, some of which were selected on the advice of CineMatch.

Open System

On the other hand, if a researcher uses an algorithm to produce statistical results which are displayed in a presentation to other humans, then that researcher is most decidedly not engaging in machine learning. This is, quite obviously to me, human learning. The analysis is performed by a machine, but it is not a machine that is doing the learning, per se. Now, it is "machine learning" to the extent that a human brain did not experience all of the sample inputs and derive the statistical results "biologically". But I would call it "statistics" because this is exactly what statisticians have been doing since the field was invented.

Conclusion

Thus, I would answer this question by asking: "Who consumes the results?" If the answer is: "humans", then it's "statistics". If the answer is: "software", then it's "machine learning." And when we say that "software consumes the results", we don't mean that it stores it somewhere for later retrieval. We mean that it performs behavior which is determined by the results in a closed loop.

This is a reasonable point, but I think in practice ML models are often handed off to people to interpret & work with. – gung 10 hours ago
