How do you explain Machine Learning and Data Mining to non Computer Science people?

43 Answers
Pararth Shah, ML Enthusiast
Mango Shopping

Suppose you go shopping for mangoes one day. The vendor has laid out a cart full of mangoes. You can handpick the mangoes, the vendor will weigh them, and you pay according to a fixed Rs per Kg rate (typical story in India).


Obviously, you want to pick the sweetest, most ripe mangoes for yourself (since you are paying by weight and not by quality). How do you choose the mangoes?

You remember your grandmother saying that bright yellow mangoes are sweeter than pale yellow ones. So you make a simple rule: pick only from the bright yellow mangoes. You check the color of the mangoes, pick the bright yellow ones, pay up, and return home. Happy ending?

Not quite.

Life is complicated

Suppose you go home and taste the mangoes. Some of them are not as sweet as you'd like. You are worried. Apparently, your grandmother's wisdom is insufficient. There is more to mangoes than just color.

After a lot of pondering (and tasting different types of mangoes), you conclude that the bigger, bright yellow mangoes are guaranteed to be sweet, while the smaller, bright yellow mangoes are sweet only half the time (i.e. if you buy 100 bright yellow mangoes, out of which 50 are big in size and 50 are small, then the 50 big mangoes will all be sweet, while out of the 50 small ones, on average only 25 mangoes will turn out to be sweet).

You are happy with your findings, and you keep them in mind the next time you go mango shopping. But next time at the market, you see that your favorite vendor has gone out of town. You decide to buy from a different vendor, who supplies mangoes grown in a different part of the country. Now, you realize that the rule you had learnt (that big, bright yellow mangoes are the sweetest) is no longer applicable. You have to learn from scratch. You taste a mango of each kind from this vendor, and realize that the small, pale yellow ones are in fact the sweetest of all.

Now, a distant cousin visits you from another city. You decide to treat her to mangoes. But she mentions that she doesn't care about the sweetness of a mango, she only wants the juiciest ones. Once again, you run your experiments, tasting all kinds of mangoes, and realize that the softer ones are juicier.

Now, you move to a different part of the world. Here, mangoes taste surprisingly different from your home country. You realize that the green mangoes are in fact tastier than the yellow ones.

You marry someone who hates mangoes. She loves apples instead. You go apple shopping. Now, all your accumulated knowledge about mangoes is worthless. You have to learn everything about the correlation between the physical characteristics and the taste of apples, by the same method of experimentation. You do it, because you love her.

Enter computer programs

Now, imagine that all this while, you were writing a computer program to help you choose your mangoes (or apples). You would write rules of the following kind:

if (color is bright yellow and size is big and sold by favorite vendor): mango is sweet.
if (soft): mango is juicy.
etc.

You would use these rules to choose the mangoes. You could even send your younger brother with this list of rules to buy the mangoes, and you would be assured that he will pick only the mangoes of your choice.
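
As a rough illustration, the hand-written rules above might look like this in Python. The field names (color, size, vendor, firmness) and the sample cart are assumptions made up for this sketch, not part of the original story.

```python
# Hand-written rules, one function per quality we care about.
# The field names and the sample cart below are illustrative assumptions.

def is_sweet(mango):
    """Rule learned from the favorite vendor: big, bright yellow mangoes are sweet."""
    return (
        mango["color"] == "bright yellow"
        and mango["size"] == "big"
        and mango["vendor"] == "favorite"
    )

def is_juicy(mango):
    """Rule learned for the cousin: soft mangoes are juicy."""
    return mango["firmness"] == "soft"

cart = [
    {"color": "bright yellow", "size": "big", "vendor": "favorite", "firmness": "soft"},
    {"color": "pale yellow", "size": "small", "vendor": "favorite", "firmness": "firm"},
    {"color": "bright yellow", "size": "small", "vendor": "other", "firmness": "soft"},
]

sweet_picks = [m for m in cart if is_sweet(m)]
juicy_picks = [m for m in cart if is_juicy(m)]
print(len(sweet_picks), "sweet,", len(juicy_picks), "juicy")  # prints: 1 sweet, 2 juicy
```

Every new observation (a new vendor, a new country, apples instead of mangoes) means editing these functions by hand.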

But every time you make a new observation from your experiments, you have to manually modify the list of rules. You have to understand the intricate details of all the factors affecting the quality of mangoes. If the problem gets complicated enough, it can get really difficult to make accurate rules by hand that cover all possible types of mangoes. Your research could earn you a PhD in Mango Science (if there is one).

But not everyone has that kind of time.

Enter Machine Learning algorithms

ML algorithms are an evolution over normal algorithms. They make your programs "smarter", by allowing them to automatically learn from the data you provide.

You take a randomly selected sample of mangoes from the market (training data), make a table of all the physical characteristics of each mango, like color, size, shape, which part of the country it was grown in, which vendor sold it, etc. (features), along with the sweetness, juiciness and ripeness of that mango (output variables). You feed this data to the machine learning algorithm (classification/regression), and it learns a model of the correlation between an average mango's physical characteristics and its quality.

Next time you go to the market, you measure the characteristics of the mangoes on sale (test data), and feed them to the ML algorithm. It will use the model computed earlier to predict which mangoes are sweet, ripe and/or juicy. The algorithm may internally use rules similar to the rules you manually wrote earlier (e.g., a decision tree), or it may use something more involved, but to a large extent you don't need to worry about that.
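
The train-then-predict loop described above can be sketched with a deliberately tiny learner. This is not a full decision tree: it is a one-rule learner (a one-level tree) that picks the single feature that best predicts sweetness. The function names and the six training mangoes are hypothetical.

```python
from collections import Counter, defaultdict

def train_one_rule(rows, features, target):
    """Pick the one feature whose value -> majority-label table makes the fewest training errors."""
    best = None
    for feature in features:
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[feature]][row[target]] += 1
        # For each feature value, predict the label seen most often with it.
        table = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(table[row[feature]] != row[target] for row in rows)
        if best is None or errors < best[0]:
            best = (errors, feature, table)
    _, feature, table = best
    # Fallback label for feature values never seen during training.
    default = Counter(row[target] for row in rows).most_common(1)[0][0]
    return {"feature": feature, "table": table, "default": default}

def predict(model, row):
    return model["table"].get(row[model["feature"]], model["default"])

# Hypothetical training data: mangoes you bought and tasted.
training = [
    {"color": "bright yellow", "size": "big", "sweet": True},
    {"color": "bright yellow", "size": "big", "sweet": True},
    {"color": "bright yellow", "size": "small", "sweet": False},
    {"color": "pale yellow", "size": "big", "sweet": True},
    {"color": "pale yellow", "size": "small", "sweet": False},
    {"color": "bright yellow", "size": "small", "sweet": True},
]

model = train_one_rule(training, ["color", "size"], "sweet")
print(model["feature"])  # the feature the learner found most useful
print(predict(model, {"color": "pale yellow", "size": "big"}))
```

Nothing in `train_one_rule` mentions mangoes, so the same function could be handed a table of apples with different features and a different target column.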

Voila, you can now shop for mangoes with great confidence, without worrying about the details of how to choose the best mangoes. And what's more, you can make your algorithm improve over time (incremental, or online, learning), so that its accuracy improves as it sees more training data and it corrects itself when it makes a wrong prediction. But the best part is, you can use the same algorithm to train different models, one each for predicting the quality of apples, oranges, bananas, grapes, cherries and watermelons, and keep all your loved ones happy :)

And that is Machine Learning for you. Tell me if it isn't cool.

Machine Learning: Making your algorithms smart, so that you don't need to be. ;)
Kevin Lin, works at Google
I'll talk about one aspect/technique of machine learning/data mining.

Let me begin with an admittedly contrived situation. Suppose there are a bunch of tiny balls that are magically floating in a room. (Bear with me...) We'd like to know whether there's any particular structure to the positions of the balls. For example, do the balls tend to cluster together in certain areas? Do the balls avoid certain spots? Are they evenly distributed everywhere?

However, the room is completely dark, so we can't see anything. But we do have a flash camera that allows us to take pictures of the floating balls in the room.

So we take a photo, and it looks like this:
From this photo we're not able to discern much structure, if there even is any, in the positions of the balls. The balls look more or less evenly distributed from this perspective. So we try moving laterally and taking another photo from that new vantage point.
The balls still look pretty much randomly distributed, with no particular patterns. Let's try taking a photo from a higher angle.

Hmm, still nothing notable here. Okay, let's try it one last time, lowering our perspective.
Ah-ha! We've just discovered something interesting: it looks like the balls are either located near the ground or near the ceiling of the room, and there are no balls that are located in between those two clusters. In order to discover this structure, we needed to take a photo of the room from a "good" angle. The structure could not have been discovered from the previous "bad" angles.

In the situation I've just described, we are looking at 3-dimensional data points: the positions of our floating balls are described by a collection of 3 numbers (x coordinate, y coordinate, and z coordinate). But there are problems in which our data points are described by much larger collections of numbers. For example, a medical record for a hospital patient may consist of 500 numbers: date of birth, height, weight, blood pressure, date of last hospital visit, cholesterol, and so on. We may be interested in figuring out whether these data points have any structure. For example, are heart attack sufferers' data points clustered together in any way? If so, and if in the future we identify a new hospital patient's data point as being close to that cluster, then we may label them as being at risk for a heart attack. (NB: In reality it probably wouldn't be so simple, of course.)

The data in this case is difficult or impossible for a human to visualize. How can we possibly visualize 500 dimensions? Just as we could not see anything in the contrived "dark room" example above, we similarly cannot "see" data points in 500 dimensions. In my previous example, we were taking 2-dimensional photographs of 3-dimensional data points --- and we can just as well take lower dimensional "photographs" of 500 dimensional data points in an analogous way.

So by taking these "photographs" from appropriate "angles", we can find structures and patterns in the data that could be difficult to find otherwise. This is an example of what people are talking about when they talk about the question of "finding insights" in "big data".

For the experts: I've attempted to describe/motivate principal component analysis for laypeople. The graphics above were made using matplotlib.
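
For readers who want to see the "good angle" computed rather than guessed, here is a minimal sketch in plain Python. It assumes a hypothetical tall room (4 x 4 floor, 10 high) with balls hovering near the floor or the ceiling, and it finds only the first principal component, using power iteration on the covariance matrix. A real analysis would more likely use a linear algebra library.

```python
import random

random.seed(0)

# Hypothetical room: x and y are spread evenly, z sits near the floor or the ceiling.
balls = [
    (random.uniform(0, 4), random.uniform(0, 4), random.choice([0.5, 9.5]) + random.gauss(0, 0.3))
    for _ in range(200)
]

def mean_center(points):
    """Shift the points so that each coordinate has mean zero."""
    n = len(points)
    means = [sum(p[i] for p in points) / n for i in range(3)]
    return [[p[i] - means[i] for i in range(3)] for p in points]

def covariance(centered):
    """3 x 3 sample covariance matrix of mean-centered points."""
    n = len(centered)
    return [[sum(p[i] * p[j] for p in centered) / (n - 1) for j in range(3)] for i in range(3)]

def top_component(cov, iterations=200):
    """Power iteration: repeatedly multiply by the covariance matrix and renormalize."""
    v = [1.0, 1.0, 1.0]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(3)) for i in range(3)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

centered = mean_center(balls)
direction = top_component(covariance(centered))

# The 1-dimensional "photograph": each ball's position along the best direction.
projection = [sum(p[i] * direction[i] for i in range(3)) for p in centered]
print([round(d, 2) for d in direction])
```

The direction found points almost straight up, and the projected values fall into two separate groups with a gap in between, which is the floor/ceiling structure from the story.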
Analogy 1 : Growing up in the World

Since childhood you have been meeting, observing and interacting with people. Their behavior and the impression they make on you get stored in your brain. Your brain becomes a huge data center. You keep on adding more data as you meet new people. Soon you are able to guess how your experience will be with the next person you meet. The person smiles well, wears spectacles and has short hair. You become friendly with him because other smiling people who wear specs have been good to you. Then a big 6' man with a beard and a broken tooth comes along and, as a kid, you run away. This is all part of Data Mining within your brain.

As you grow up, you realize that spectacles, beards and size are not the only things that can tell you what people are like. You begin to see their position in society and their behavior in new situations. So the relevant attributes may change. Your algorithms improve by themselves. This is machine learning.

Analogy 2 : Belief in Astrology

This is all my supposition and I'm working to verify my belief. Your birth date has a sum that psychics, astrologers and numerologists call a Birth Number. Hundreds of years ago, they'd have noticed that there are patterns linking a person's personality and their birth number. For example, people with birth number X are good at making up weird yet interesting analogies (like this one). People with birth number X are bad at relationships. As they met more people with birth number X who were in poor relationships, it added to the "support" and "confidence" of their data. Then, after meeting a person with birth number X, happily married for 20 years to a person with a certain birth number Y, they adapted their rules of prediction. This kept improving until they were able to predict personality types up to 99% of the time.

So this is again a combination of Data Mining and Machine Learning.

P.S. I still don't believe in astrology.

Analogy 3 : Business Management
<a deeper explanation>

You collect a lot of data from the processes of your large retail store. Whenever someone makes a purchase, the computer at the billing counter adds a record to your database. I'm a regular visitor at your store and today I bought an expensive pen and a writing pad. Two records are created:
<Aditya bought Item # 12220 (Pen) today as a part of transaction # 222333>
<Aditya bought Item # 12243 (Pad) today as a part of transaction # 222333>

Now the first part is organizing the data. You will see if there are any records like
<Aditya bought Item # 000000 (nothing) today as a part of transaction # 222333>
You will get rid of them. This is called CLEANING.
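
A minimal sketch of this cleaning step, assuming each billing record is stored as a small dictionary (the field names are made up for illustration):

```python
# Each record mirrors one billing-counter entry from the example above.
# The field names are assumptions made for this sketch.
records = [
    {"customer": "Aditya", "item_id": "12220", "item": "Pen", "transaction": 222333},
    {"customer": "Aditya", "item_id": "12243", "item": "Pad", "transaction": 222333},
    {"customer": "Aditya", "item_id": "000000", "item": "nothing", "transaction": 222333},
]

def is_valid(record):
    """Keep a record only if it refers to a real item."""
    return record["item_id"] != "000000"

cleaned = [r for r in records if is_valid(r)]
print([r["item"] for r in cleaned])  # prints: ['Pen', 'Pad']
```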

The records in the database will be stored in a Data Warehouse, which is a large database arranged in a fashion that will simplify the process of finding good results.

Now you perform CLUSTERING.
You will find the transactions, people or products that belong to a group. For example, the people who buy stationery from your store will be in one cluster. You can use this to see cool information, such as: people who buy expensive pads buy cheap pens, and don't buy anything else which a regular housewife buys.

Then comes CLASSIFICATION
If a person who is 23 years old, doesn't have a job and doesn't earn much comes to your store to look for a new computer, he's not going to buy it.

ASSOCIATION MINING
A person who bought an expensive $100 pen usually purchases a few novels too. You can use this information to rearrange your store so that the pens section is far from the novels section. The person will have to walk a long way across multiple aisles, which may cause them to buy something else on the way.
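
The "support" and "confidence" of a rule such as "expensive pen implies novel" can be computed directly from the transactions. The five baskets below are invented for the sketch. Support is the fraction of all transactions containing the items, and confidence is the fraction of pen-buying transactions that also contain a novel.

```python
# Hypothetical baskets, one set of items per transaction.
transactions = [
    {"expensive pen", "novel", "pad"},
    {"expensive pen", "novel"},
    {"expensive pen", "pad"},
    {"novel", "milk"},
    {"milk", "bread"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the transactions containing antecedent, the fraction that also contain consequent."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

pen, novel = {"expensive pen"}, {"novel"}
print(support(pen | novel, transactions))               # prints: 0.4
print(round(confidence(pen, novel, transactions), 2))   # prints: 0.67
```

A store would usually keep only the rules whose support and confidence both exceed thresholds it chooses.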

REGRESSION
When you put on a Christmas sale last year with a 20% discount on all products for kids, you made a surplus profit of $50,000. Depending on the products and their quantities/costs available right now, this year you can earn up to $60,000 of profit even with a discount offer of 30%.

There's lots of such stuff. But by now anyone new to Data Mining will have seen enough to be interested in learning how this all works.
Shehroz Khan, ML Researcher, Postdoc @U of Toronto
Let's be short, sweet and non-bookish.

Machine Learning (ML), as the name suggests, is about enabling machines/computers to learn (over time) to make decisions. What those decisions are, and how they are made, forms the core of ML, and there is a lot of fancy maths to support it.

In general, ML is used to perform three types of actions, viz. classification, clustering and prediction. In classification, we normally have abundant 'labelled' data and the scope of our decision space is known a priori. For example, if a classifier has only been exposed to labelled pictures of cats and dogs and a new image is shown, it will classify it as one of those two categories, even if it is a cow. However, classification is useful in many situations where the domain of decisions is pre-set, e.g. credit card fraud detection, where you only want to know if a transaction is legitimate or fraudulent.

Clustering deals with cases when you have data but no labels for it. Imagine you are shown images but they are not labelled, so it is for the algorithm (or you) to form certain clusters or groups based on some notion of similarity. For example, among a set of pictures you may build clusters based on whether they are animals (cat, dog, cow) or birds (crow, sparrow), or another set as herbivorous (cow, sparrow) and carnivorous (cat, dog, crow). Here the notion of similarity plays a major role, because we don’t know what we are looking at beforehand. Our brain does this clustering and grouping of objects all the time based on our perceived notions about things and objects.
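
To make "grouping by similarity without labels" concrete, here is a minimal k-means sketch. It assumes each picture has already been reduced to two numeric features (the numbers are invented), and it uses squared Euclidean distance as the notion of similarity. Other distance measures would produce other groupings.

```python
def kmeans(points, k, iterations=10):
    """Plain k-means on tuples of numbers; the first k points seed the centroids."""
    centroids = [points[i] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical unlabelled data: two made-up features per picture.
near = [(1.0, 1.2), (0.8, 1.0), (1.2, 1.1)]
far = [(5.0, 5.2), (5.1, 4.8), (4.9, 5.0)]

centroids, clusters = kmeans(near + far, k=2)
for cluster in clusters:
    print(cluster)
```

The algorithm is never told which points belong together. It recovers the two groups purely from the distances between them.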

Finally, prediction means learning from historical data to predict the future, or at least the present. Weather forecasts, stock price movements, customer churn rates etc. are some examples.

However, ML is not magic or rocket science. There is no 'one-size-fits-all' algorithm, as stated by the No Free Lunch theorems. ML algorithms are data dependent, parameter dependent, and objective dependent. However, it is amazing when you start to observe its connection with the actual decision making of our mind, which we generally take for granted.
Peter Johnston, I run DataScience Oxford and GDG Oxford (GDG is Google Developer Group)

First thing is machine learning isn't about mangoes. It is about the brain.

If you are asked to go to the shop and get 3 things, you'll probably remember them. But if you are asked to get 5 or 6, you may forget 1 or 2. And if you are asked to get 15 - weeeell, let's say the accuracy isn't gonna be that high, unless there's a connection.

Same with tasks. The old jokes about multi-tasking are wrong. None of us multi-task, we just do things sequentially and break them into small chunks, so it looks like multi-tasking.

Let's take a simple task. Shoot all the aliens in Space Invaders, for example.

There are hand-eye co-ordination issues to work on first, with a lag between what we see and our reaction. We calculate trajectories, both theirs and ours, making complex calculations to allow for this lag.

Then we try to remember them - we manage one or two, but then we hit our memory problem. We also hit our sequential action problem, having to worry about the one firing at us, the spaceship worth points at the top and the targets we are firing at. As it gets faster, it gets beyond us - we miss more, we get hit more often - we can't watch everything at once.

Game over. Next time, of course, we've learned a little. We do a little bit better. And we get a dopamine rush for doing so. That's why games are addictive.

Now imagine the computer doing the same task. It has a memory which is, for these purposes, infinite. It can handle millions of transactions every second. And it can calculate trajectories from many more datapoints, using algorithms.

So it starts by losing lives, missing targets and not seeing the spaceships. But it amasses data. Soon it has hundreds, then thousands, then millions of movements as its database. It has tried hundreds of actions too, learning from each one which works and which doesn't. Simply by doing more of the stuff which works and less of the stuff which doesn't, it quickly creates a highly evolved, complex set of algorithms to cover every possibility, all of which say "If this happens on screen do this".

Now we call this experiential learning. It is what we try to accelerate in schools, for humans. It is also known as reinforcement learning - each experience reinforces what is known to work, so it focuses on those actions. We call that training.
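
"Doing more of the stuff which works and less of the stuff which doesn't" can be sketched with an epsilon-greedy learner choosing between three actions. This is a toy far simpler than what a game-playing system needs: the action names and their success rates are invented, and the learner only tracks an average reward per action.

```python
import random

random.seed(42)

# Hypothetical chance that each action scores a point (unknown to the learner).
success_rate = {"move left": 0.2, "move right": 0.5, "fire": 0.8}

values = {action: 0.0 for action in success_rate}  # running average reward per action
counts = {action: 0 for action in success_rate}    # how often each action was tried
epsilon = 0.1                                      # fraction of turns spent exploring

for _ in range(2000):
    if random.random() < epsilon:
        action = random.choice(list(success_rate))  # explore: try something at random
    else:
        action = max(values, key=values.get)        # exploit: repeat what has worked best
    reward = 1.0 if random.random() < success_rate[action] else 0.0
    counts[action] += 1
    # Incremental mean: nudge the estimate toward the reward just observed.
    values[action] += (reward - values[action]) / counts[action]

best_action = max(values, key=values.get)
print(best_action, counts)
```

After enough turns the learner spends most of its time on the action with the highest estimated reward, while the occasional random try keeps its estimates for the other actions from going stale.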

The difference is in the scale, between people and computers. Just by leaving the Space Invaders system to play itself, it learned from hundreds of games.

This is exactly what the team at Google DeepMind have been doing to create Artificial General Intelligence. This differs from a program in that the system learns for itself what is to be done and how to do it well, rather than simply being programmed by a person. The recent Strachey lecture by Demis Hassabis explains this with excellent simplicity - the Space Invaders part is 18 minutes in.

Then Hassabis' team moved to Go. Here Data Mining comes into play. DeepMind looked at the main online game platforms - millions of people worldwide who play games against each other online. They downloaded 100,000 games to begin with - by analysing what worked in each of these games the system was able to learn very fast indeed. It was then set to play against itself and improved exponentially. Soon they had analysed 90 million games - more than any person could experience in their lifetime.

Within months it had reached the level of European Champion, winning 5-0. By later this week, Demis believes it will have reached World Champion level - he is putting it to the test with a match against the top Go player over the last decade - Lee Sedol. Watch it here from Wednesday (9th March 2016): https://www.youtube.com/channel/...

Some takeaways:

  1. Machines may be slower to learn than humans, but machine time has minimal cost. Thus they can be left to play against themselves, or even do so in the background. They thus learn from much more data than humans do.
  2. Machines have a practically infinite amount of memory and with fast retrieval, they can access all of it within reasonable thinking time. They can access open data from a host of tasks worldwide and access millions of datapoints in milliseconds.
  3. Machines can hold multiple actions in memory and, while still currently sequential, the cycle time is much faster than humans. They can thus, it would appear, multi-task through many more actions than humans.
  4. This machine learning is not subject to human biases such as wishing for a particular action to be correct, eliminating suggestions from other humans, subjective weighted evaluation (setting the judging parameters so yours would win) and extrapolation from small sample sizes.

A final thought...

Machines don't stop learning when they reach the best we can do. They continue to leverage their strengths of more data, faster processor and fewer biases to go way beyond human capabilities. New technologies such as quantum computers which truly multi-task will transform capability.

"When the car came along, it wasn't enough to be a faster horse, or a stronger horse - all horses were made obsolete."

"In the first industrial revolution, we removed the limitations of human muscle. In the second, we remove all the limitations of the human mind" Erik Brynjolfsson
