
Many organizations that try to implement machine learning are guilty of putting the cart before the horse. They build a machine learning team before they have any idea of what they will actually do with machine learning. They have no specific problems to solve or questions to answer that they cannot already solve or answer with their existing tools — their business intelligence software or spreadsheet application. They end up building a knowledgeable data science team, but all the team ends up doing is playing with the technology. Nobody on the team knows enough about the organization to identify areas in which machine learning could be of practical use. They need to start to ask data science questions.

How Not to Launch a Machine Learning Initiative

I once worked for an organization whose leadership was committed to machine learning and made a considerable investment in it. They assembled a team of machine learning experts from a local university and provided them access to the organization’s data warehouse. The team built the infrastructure it needed to implement machine learning, but it quickly reached a dead end. Nobody in the organization had given much thought to how this amazing new technology would benefit the organization.

When the team began to ask, “What questions do you need answered?,” “What problems do you need to solve?,” and “What insights would help drive the business?,” nobody had an answer. In fact, nobody in the organization had ever imagined asking such questions. The organization had a strong control culture in place, so employees generally did what they were told. They were not rewarded for asking interesting questions and often felt discouraged from doing so. When they did ask a question, it was something like, "What types of promotions do our customers like?," which is a question that could be answered with traditional database or spreadsheet tools.

The members of the machine learning team felt as though they had built a Formula One race car that was just sitting in a garage.

Nurturing a Culture of Curiosity

Whether you have a data science team in place or are planning to create such a team, the first step is to build a culture of curiosity. Start by educating everyone in the organization about machine learning, so that, at the very least, they can recognize various ways it can be applied. Next, encourage everyone in the organization to start asking questions, looking for problems to solve, and sharing their ideas. Machine learning can benefit every team in your organization — including research and development, manufacturing, shipping and receiving, marketing, sales, and customer service. Have each department maintain a list of problems, questions, and desired insights; prioritize the items on the list; and then consider which technology would be the most effective for addressing each item. Keep in mind that the best technology isn't necessarily machine learning; sometimes, all you need is a data warehouse and business intelligence software.

Of course, questions, problems, and desired insights vary depending on the organization, but here are a few sample questions to get you thinking:

Here are a couple concrete ways to encourage people in your organization to start asking interesting questions:

Asking questions and calling attention to problems seems like a no-brainer. For any organization to survive and thrive, innovation is essential, and what sparks innovation are compelling questions and difficult problems. Unfortunately, many organizations have a strong control culture in which people are not rewarded and are often punished for asking questions and challenging the status quo. If that sounds like your organization, you need to find a way to break it free from its control culture and make everyone in the organization feel free to share their ideas and concerns.

In one of my previous articles, What Is Machine Learning?, I explained the basics of machine learning (ML) and pointed out that big data plays a key role in ML. However, I stopped short of explaining the connection between ML and big data in detail. In this article, I take a deeper dive into the important role that big data plays in machine learning.

Key Machine Learning Components

Machine learning requires the following four key components:

  1. An input device to bring data from the outside world into the machine in a digital format.
  2. One or more (usually many) powerful computer processors running in parallel.
  3. A number of machine learning algorithms that run on those processors. (An algorithm is a process or set of rules to be followed in calculations or other problem-solving operations.) The combination of processing power and algorithm constitutes an artificial neuron — the smallest unit in an artificial neural network. These neurons must be arranged in layers — an input layer, one or more hidden layers, and an output layer (see the sketch following this list).
  4. Data sets that the machine can use to identify patterns in the data. The more high-quality data the machine has, the more it can fine-tune its ability to identify patterns and anything that diverges from those patterns.
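
To make the layered arrangement in item 3 concrete, here is a minimal sketch (my own illustration, not code from any particular system) of a tiny feedforward network with an input layer, one hidden layer, and an output layer; the layer sizes, weights, and sample data are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0, x)          # activation for the hidden layer

def sigmoid(x):
    return 1 / (1 + np.exp(-x))      # squashes the output to a 0-1 score

# Hypothetical sizes: 4 input features, 8 hidden neurons, 1 output neuron.
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)   # input layer -> hidden layer
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)   # hidden layer -> output layer

def forward(x):
    hidden = relu(x @ W1 + b1)        # each hidden neuron weighs its inputs
    return sigmoid(hidden @ W2 + b2)  # the output layer produces a prediction

sample = rng.normal(size=(1, 4))      # one made-up data record
print(forward(sample))                # untrained, so the result is arbitrary
```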

Three Types of Machine Learning

Machines learn in the following three ways:

  1. Supervised learning: With supervised learning, a trainer feeds a set of labeled data into the computer. This enables the computer to then identify patterns in that data and associate them with the labels provided. For example, a trainer may feed in photographs of 20 cats and tell the machine, "These are cats." She then feeds in 20 photos of dogs and tells the machine, "These are dogs." Finally, she feeds the machine 20 random photographs of dogs and cats without telling the machine whether each picture is of a dog or a cat. When the machine makes an error, the trainer corrects it, so the neural network can tune itself to greater accuracy.
  2. Unsupervised learning: With unsupervised learning, data that is neither classified nor labeled is fed into the system. The system then identifies hidden patterns in the data that humans may be unable to detect or may have overlooked. Unsupervised learning is primarily used for clustering. For example, you'd feed in 100 photos of animals and tell the machine to divide them into five groups. The machine then looks for matching patterns in the photos and creates five groups based on similarities and differences in the photos. These may be groups a human would recognize, such as cat, dog, snake, octopus, and elephant, or they may be groups you would never imagine, such as snakes, dogs, and cats all being in the same group because the neural network focused on the cat and dog tails instead of other features.
  3. Semi-supervised learning: This is a cross between supervised and unsupervised learning. Supervised learning is used initially to train the system on a small data set; then a large amount of unlabeled data is fed into the system to increase its accuracy. (A code sketch contrasting all three styles follows this list.)
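
Below is a hedged sketch of the three learning styles on a synthetic data set using scikit-learn; the data, labels, and model choices are illustrative assumptions, not a prescription.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans
from sklearn.semi_supervised import SelfTrainingClassifier

X, y = make_blobs(n_samples=200, centers=2, random_state=0)  # toy two-class data

# 1. Supervised: every example carries a label.
supervised = LogisticRegression().fit(X, y)

# 2. Unsupervised: no labels; the machine forms its own groups.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# 3. Semi-supervised: a handful of labels plus many unlabeled examples (-1).
labeled = np.concatenate([np.where(y == 0)[0][:10], np.where(y == 1)[0][:10]])
y_partial = np.full_like(y, -1)
y_partial[labeled] = y[labeled]
semi = SelfTrainingClassifier(LogisticRegression()).fit(X, y_partial)

print(supervised.predict(X[:5]), clusters[:5], semi.predict(X[:5]))
```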

Big Data Fuels Machine Learning

As you can see, data is important for machine learning, but that is no surprise; data also drives human learning and understanding. Imagine trying to learn anything while floating in a deprivation tank; without sensory, intellectual, or emotional stimulation, learning would cease. Likewise, machines require input to develop their ability to identify patterns in data.

The availability of big data (massive and growing volumes of diverse data) has driven the development of machine learning by providing computers with the volume and types of data they need to learn and perform specific tasks. Just think of all the data that is now collected and stored — from credit and debit card transactions, user behaviors on websites, online gaming, published medical studies, satellite images, online maps, census reports, voter records, financial reports, and electronic devices (machines equipped with sensors that report the status of their operation).

This treasure trove of data has given neural networks a huge advantage over the physical-symbol-systems approach to machine learning. Having a neural network chew on gigabytes of data and report on it is much easier and quicker than having an expert identify and input patterns and reasoning schemas to enable the computer to deliver accurate responses (as is done with the physical symbol systems approach to machine learning).

The Evolution of Machine Learning

In some ways, the evolution of machine learning is similar to how online search engines developed over time. Early on, users would consult website directories such as Yahoo! to find what they were looking for — directories that were created and maintained by humans. Website owners would submit their sites to Yahoo! and suggest the categories in which to place them. Yahoo! personnel would then review the submissions and either add the sites to the directory or deny the requests. The process was time-consuming and labor-intensive, but it worked well when the web had relatively few websites. When thousands of websites proliferated into millions and the total eventually crossed the one billion threshold, the system broke down fairly quickly. Human beings couldn’t work quickly enough to keep the Yahoo! directories current.

In 2000, Yahoo! partnered with a then-smaller company called Google that had developed a search engine to locate and categorize web pages. Google’s first search engine examined backlinks (pages that linked to a given page) to determine each page's relevance and relative importance. Since then, Google has developed additional algorithms to determine a page’s rank; for example, the more users who enter the same search phrase and click the same link, the higher the ranking that page receives. With the addition of machine learning algorithms, the accuracy of such systems increases in proportion to the volume of data they have to draw on.

So, what can we expect for the future of machine learning? The growth of big data isn't expected to slow down any time soon. In fact, it is expected to accelerate. As the volume and diversity of data expand, you can expect to see the applications for machine learning grow substantially, as well.

In one of my previous articles, Supervised and Unsupervised Machine Learning, I pointed out that machine learning can be used to analyze data in four different ways — two of which are predictive, and two of which are descriptive:

  1. Classification: Assigning items to different labeled classes
  2. Regression: Identifying the connection between a dependent variable and one or more independent variables
  3. Clustering: Creating groups of like things
  4. Association: Identifying associations between things

Understanding Regression Analysis

To understand machine learning regression analysis, imagine those tube-shaped balloons you see at children's parties. You squeeze one end, and the other end expands. Release, and the balloon returns to normal. Squeeze both ends, and the center expands. Release one end, and the expanded area moves to the opposite end. Each squeeze is an independent variable. Each bulge is a dependent variable; it differs depending on where you squeeze.

Now imagine a talented balloon sculptor twisting together five or six of these balloons to form a giraffe. Now the relationship between squeezing and expanding is more complex. If you squeeze the body, maybe the tail expands. If you squeeze the head, maybe two legs expand. Each change to the independent variable results in a change to one or more dependent variables. Sometimes that relationship is easy to predict; other times it may be very difficult.

Business Applications of Regression Analysis

Regression analysis is commonly used in the financial industry to analyze risk. For example, I once worked for a credit card company that was looking for a way to predict which customers would struggle to make their monthly payments. They used a regression algorithm to identify relationships between different variables and discovered that many customers start to use their credit card to pay for essentials just before they have trouble paying their bills. A customer who typically used their card only for large purchases, such as a television or computer, would suddenly start using it to buy groceries and gas and pay their electric bill. The company also discovered that people who had a lot of purchases of less than five dollars were likely to struggle with their monthly payments. 

The dependent variable was whether the person would have enough money to cover the monthly payment. The independent variables were the items the customer purchased and the purchase amounts. Based on the results of the analysis, the credit card company could then decide whether to suspend the customer's account, reduce the account's credit line, or maintain the account's current status in order to limit the company's exposure to risk.
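
As a rough illustration of that setup, here is a minimal sketch with invented numbers: the features (share of spending on essentials, count of purchases under five dollars) stand in for the signals described above, and a simple logistic regression model is my assumption rather than whatever the company actually used.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row: [share_of_spending_on_essentials, purchases_under_5_dollars]
X = np.array([[0.10, 1], [0.15, 0], [0.20, 2], [0.25, 3],
              [0.65, 7], [0.70, 9], [0.80, 12], [0.90, 15]])
# 1 = struggled to make the monthly payment, 0 = paid on time (all made up).
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

model = LogisticRegression().fit(X, y)

# Estimated probability that a new customer with this pattern will struggle.
new_customer = np.array([[0.75, 10]])
print(model.predict_proba(new_customer)[0, 1])
```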

Businesses often use regression analysis to identify which factors contribute most to sales. For example, a company may want to know how it can get the most bang for its buck in terms of advertising: should it spend more money on its website, on social media, on television advertising, on pay-per-click (PPC) advertisements, and so on? Regression analysis can identify which of these items contribute the most and which contribute little or nothing at all. The company can then use the results of that analysis to predict how its various advertising investments will perform.
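
A hedged sketch of that kind of analysis, with invented spend and sales figures, might look like this; the fitted coefficients suggest which channels contribute the most.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Columns: monthly spend on [website, social media, TV, PPC] in $1,000s (made up).
X = np.array([[10, 5, 20, 8], [12, 6, 18, 9], [9, 7, 25, 7],
              [15, 4, 22, 10], [11, 8, 19, 6], [14, 5, 24, 11]])
y = np.array([210, 215, 230, 240, 205, 250])   # monthly sales in $1,000s

model = LinearRegression().fit(X, y)
for channel, coef in zip(["website", "social media", "TV", "PPC"], model.coef_):
    print(f"{channel}: {coef:+.1f} estimated sales lift per $1k of spend")

print(model.predict([[13, 6, 21, 9]]))   # predict sales for a proposed budget
```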

Identifying the Dependent and Independent Variables

When performing regression analysis, the first step is to identify the dependent and independent variables. The dependent variable is the outcome you want to predict or explain; the independent variables are the factors you suspect influence that outcome.

An Important Reminder

Keep in mind that correlation does not prove causation. Just because regression analysis shows a correlation between an independent and a dependent variable, that does not mean that a change in the independent variable caused the change observed in the dependent variable, so avoid the temptation to assume it does.

Instead, perform additional research to support or rule out a causal link, or dig deeper to find out what's really going on. For example, regression analysis may show a correlation between the use of certain colors on a web page and the amount of time users spend on those pages, but other, unidentified factors may be contributing, perhaps to a greater degree. A web designer would be wise to run one or more experiments before making any changes.

While regression analysis is very useful for identifying relationships between a dependent variable and one or more independent variables, use these relationships as a starting point for gathering more data and developing deeper insight. Ask what the results mean and what else could be driving them before drawing any hard-and-fast conclusions.

When you’re working with data (regardless of the size of your data sets), you’re likely to encounter two terms that are often confused: data mining and machine learning. Data mining is the overall process of extracting meaning from data using a broad toolset, whereas machine learning is one tool in that set, one that finds patterns in data and uses them to make predictions.

In short, data mining is much broader than machine learning, but it certainly includes machine learning. 

More About Data Mining

Data mining uses a very broad toolset to extract meaning from data. This toolset includes data warehouses and data lakes to store and manage data; extract, transform, and load (ETL) processes to bring data into the data warehouse; and business intelligence (BI) and visualization tools, which provide an easy means to combine, filter, sort, summarize, and present data in ways similar to (though more sophisticated than) what a spreadsheet application can do.

Visualizations are particularly useful because they reveal patterns in the data that might otherwise go unnoticed.
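
For example, a simple scatter plot (sketched below with synthetic data; the library and data are my assumptions) can make groupings jump out that a raw table of numbers would hide.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs

# Synthetic two-dimensional data with three natural groupings.
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

plt.scatter(X[:, 0], X[:, 1], s=12)
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.title("Groupings that are hard to spot in a table of numbers")
plt.show()
```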

More About Machine Learning

In the context of data mining, machine learning harnesses the computational power of a computer to find patterns, associations, and anomalies in large data sets and to use those patterns to make predictions. While BI and visualization tools enable humans to more readily identify patterns in data, machine learning largely automates the process and often goes one step further to act on the meaning extracted from the data. For example, machine learning may identify patterns in credit card transaction data that are indicative of fraud, use this insight to classify future transactions as fraudulent or not, and block any suspected fraudulent transactions.
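
As a rough sketch of that idea (my own example, not a production fraud system), an anomaly-detection model can be trained on typical transactions and then flag new ones that diverge from the learned pattern:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(1)
# Typical transactions (synthetic): modest amounts, mostly daytime hours.
normal = np.column_stack([rng.normal(50, 15, 500),    # amount in dollars
                          rng.normal(14, 3, 500)])    # hour of day
model = IsolationForest(random_state=1).fit(normal)

# Score new transactions: -1 flags a suspected outlier to block or review.
new = np.array([[45.0, 13.0], [4800.0, 3.0]])
print(model.predict(new))   # likely [ 1 -1 ]
```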

Machine learning is also useful for clustering — grouping like items in a data set to reveal patterns in the data that humans may have overlooked or never imagined looking for. For example, machine learning has been used in medicine to identify patterns in medical images that help to distinguish different forms of cancer with a high level of accuracy. 

Choosing the Right Approach

When your goal is to extract meaning from data, don't get hung up on the terminology or the differences between data mining and machine learning. Focus instead on the question you’re trying to answer or the problem you’re trying to solve, and team up with or consult a data scientist to determine the best approach. As a couple of general guidelines: if a straightforward query, report, or visualization in BI software or a spreadsheet can answer the question, start there; if you're looking for patterns you can't specify in advance, or the data is too large and varied to explore by hand, consider machine learning.

Think of it this way: Imagine you manage a hospital and you're trying to determine why certain patients have better outcomes than others. You could approach this challenge from several different angles, including these two:

  1. Use BI and visualization software to query and explore the data yourself, compare patient groups, and draw your own conclusions.
  2. Feed the data into a machine learning system, such as an artificial neural network, and let it surface patterns on its own.

Each of these approaches has its own advantages and disadvantages. With the BI software approach, you would probably develop a deeper knowledge of the data and be able to explain the reasoning that went into the conclusions you've drawn. The process might even lead you to ask more interesting questions. Machine learning with an artificial neural network is more likely to identify unexpected patterns; the machine views the data in a different way than humans typically do. This approach can also find non-interpretable patterns, which may make sense to the machine but not to humans.

What's important is that you consider your options carefully. Avoid the common temptation to choose machine learning solely because it is the latest, greatest technology. Sometimes, Excel is all you need to answer a simple question.

Like people, machines can learn through supervised and unsupervised machine learning, but human learning differs from machine learning. With humans, supervised learning consists of formal education. An instructor presents the material, students study it and are tested on it, and areas of weakness are addressed, hopefully to the point at which students achieve mastery in that given subject area. Unsupervised learning is experiential, often referred to as "common sense." You venture out in the world and engage in daily activities, learning on your own and from making mistakes.

Machine learning differs in that it involves only a couple of forms of learning, and those are determined by what you want the machine to do:

Supervised Learning

With supervised learning, a human trainer labels items in a small data set, often referred to as the training data set. The machine has the advantage of knowing how the human trainer has classified the data. For example, suppose you want to train a machine to distinguish between spam and not-spam email messages. You feed several examples of spam messages into the machine and tell the machine, "These are spam." Then, you feed several examples of not-spam messages into the machine and tell it, "These are not spam."

The machine identifies patterns in both message groups — certain patterns that are characteristic of spam and other patterns characteristic of not-spam. Now, when you feed a message into the machine that is not labeled spam or not-spam, the machine should be able to tell whether the message is or is not spam.
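
A minimal sketch of this spam/not-spam training follows; the example messages are invented, and a bag-of-words model with a Naive Bayes classifier is my assumption, not any specific product's method.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

messages = ["win a free prize now", "claim your free vacation today",
            "meeting moved to 3pm", "can we reschedule lunch tomorrow?"]
labels = ["spam", "spam", "not spam", "not spam"]

# Turn each message into word counts, then learn which words mark each class.
model = make_pipeline(CountVectorizer(), MultinomialNB()).fit(messages, labels)

# An unlabeled message: the model predicts the group whose patterns it matches.
print(model.predict(["a free prize is waiting for you"]))   # likely ['spam']
```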

Unfortunately, machines make mistakes. A certain message may not have a clear pattern that characterizes it as either spam or not-spam, so it may send some messages that are not spam to the Spam folder and allow some spam messages to reach your Inbox. Your machine clearly needs more training.

Additional training occurs when you mark a message in the Spam folder as "not spam" or when you move a spam message from your Inbox to the Spam folder. This provides the machine with valuable feedback that enables it to fine-tune its neural network, increasing its accuracy.

Supervised learning tends to be more useful than unsupervised learning in the following applications:

  1. Classification: Assigning items to different labeled classes
  2. Regression: Identifying the connection between a dependent variable and one or more independent variables

Classification and regression are both considered to be predictive because they can be used to forecast the probability that a given input will result in a given output. For example, if you use a regression algorithm to identify a relationship between family income and high school graduation rates, you can use that relationship to predict a student's likelihood of graduating by looking at the student's family income.

Unsupervised Learning

With unsupervised learning, you feed the machine a data set and instruct it to group like items without providing it with labeled groups; the machine must determine the groups based on similarities and differences among the items in the data set.

For example, you might feed 1000 medical images into a machine and have it group the images based on patterns it detects in those images. The machine creates 10 groups and assigns images to each group. A doctor can then examine the different groups in an attempt to figure out why the machine grouped the images as it did. The benefit here is that the machine may identify patterns that doctors never thought to look for — patterns that may provide insights into diagnosis and treatment options.
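
A hedged sketch of that grouping step follows; real images would first be reduced to numeric feature vectors, so random vectors stand in for the 1,000 images here, and k-means is just one reasonable clustering choice.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
image_features = rng.normal(size=(1000, 64))   # stand-in: 1000 images, 64 features each

kmeans = KMeans(n_clusters=10, n_init=10, random_state=0)
group = kmeans.fit_predict(image_features)     # assign each image to one of 10 groups

# How many images landed in each group; a doctor could then review each group.
print(np.bincount(group))
```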

Unsupervised learning tends to be more useful than supervised learning in the following applications:

  1. Clustering: Creating groups of like things
  2. Association: Identifying associations between things

Clustering and association are considered to be descriptive, as opposed to predictive, because they identify patterns that reveal insights into the data.

When starting your own artificial intelligence project, carefully consider the available data and what you want to do with that data. If you already have well defined categories that you want the machine to use to classify input, you probably want to stick with supervised learning. If you’re unsure how to group and categorize the data or you want to look at the data in a new way, unsupervised learning is probably the better approach because it enables the computer to identify similarities and differences you may have never considered otherwise.

Like people, machines can learn through supervised or unsupervised learning. With supervised learning, a human labels the data, so the machine has the advantage of knowing the human's definition of the data. The human trainer gives the machine a stack of cat pictures and tells the machine, “These are cats.” With unsupervised learning, the machine figures out on its own how to cluster the data.

Consider the earlier example of the marching band neural network. Suppose you want the band to be able to classify whatever music it’s presented, and the band is unfamiliar with the different genres. If you give the band music by Merle Haggard, you want the band to identify it as country music. If you give the band a Led Zeppelin album, it should recognize it as rock.

To train the band using supervised learning, you give it a random subset of data called a training set. In this case, you provide two training sets — one with several country music songs and the other with several rock songs. You also label each training set with the category of songs — country and rock. You then provide the band with additional songs in each category and instruct it to classify each song. If the band makes a mistake, you correct it. Over time, the band (the machine) learns how to classify new songs accurately in these two categories.

But let's say that not all music can be so easily categorized. Some old rock music sounds an awful lot like folk music. Some folk music sounds a lot like the blues. In this case, you may want to try unsupervised learning. With unsupervised learning you give the band a large variety of songs — classical, folk, rock, jazz, rap, reggae, blues, heavy metal and so forth. Then you tell the band to categorize the music.

The band won't use terms like jazz, country, or classical. Instead it groups similar music together and applies its own labels, but the labels and groupings are likely to differ from the ones that you’re accustomed to. For example, the marching band may not distinguish between jazz and blues. It may also divide jazz music into two different categories, such as cool and classic.

Having your marching band create its own categories has advantages and disadvantages. The band may create categories that humans never imagined, and these categories may actually be much more accurate than existing categories. On the other hand, the marching band may create far too many categories or far too few for its system to be of use.

When starting your own AI project, think about how you'd like to categorize your data. If you already have well defined categories that you want the machine to use to classify input, you probably want to stick with supervised learning. If you’re unsure how to group and categorize the data or you want to look at the data in a new way, unsupervised learning is probably the better approach; it’s likely to enable the computer to identify similarities and differences you would probably overlook.

The symbolic systems approach and AI planning work great for applications that have a limited number of matching patterns; for example, a program that helps you complete your tax return. The IRS provides a limited number of forms and a collection of rules for reporting tax-relevant data. Combine the forms and instructions with the capability to crunch numbers and some heuristic reasoning, and you have a tax program that can step you through the process. With heuristic reasoning, introduced in the previous chapter, you can limit the number of patterns; for example, if you earned money from an employer, you complete a W-2 form. If you earned money as a sole proprietor, you complete Schedule C.
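
Here is a toy sketch of that rule-based style; the rules mirror the W-2 and Schedule C example in the text, and everything else is deliberately simplified.

```python
def forms_needed(income_sources):
    """Map income sources to tax forms using explicit, hand-written rules."""
    forms = []
    if "employer" in income_sources:         # earned money from an employer
        forms.append("W-2")
    if "sole_proprietor" in income_sources:  # earned money as a sole proprietor
        forms.append("Schedule C")
    return forms

print(forms_needed({"employer"}))                     # ['W-2']
print(forms_needed({"employer", "sole_proprietor"}))  # ['W-2', 'Schedule C']
```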

The limitation of this approach is that the database is difficult to manage, especially when rules and patterns change. For example, malware (viruses, spyware, computer worms, and so forth) evolves too quickly for anti-malware companies to manually update their databases. Likewise, digital personal assistants, such as Siri and Alexa, need to constantly adapt to unfamiliar requests from their owners.

To overcome these limitations, early AI researchers started to wonder whether computers could be programmed to learn new patterns. Their curiosity led to the birth of machine learning — the science of getting computers to do things they weren't specifically programmed to do.

Machine learning got its start very shortly after the first AI conference. In 1959, AI researcher Arthur Samuel created a program that could play checkers. This program was different. It was designed to play against itself so it could learn how to improve. It learned new strategies from each game it played and after a short period of time began to consistently beat its own programmer.

A key advantage of machine learning is that it doesn't require an expert to create symbolic patterns and list out all the possible responses to a question or statement. On its own, the machine creates and maintains the list, identifying patterns and adding them to its database.

Imagine machine learning applied to the Chinese room experiment. The computer would observe the passing of notes between itself and the person outside the room. After examining thousands of exchanges, the computer identifies a pattern of communication and adds common words and phrases to its database. Now it can use its collection of words and phrases to decipher the notes it receives more quickly and to assemble a response from these words and phrases instead of building one from a collection of individual characters. It may even create its own dictionary based on these matching patterns, so it has a complete response ready for certain notes it receives.
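
A loose sketch of that idea (entirely illustrative, using romanized placeholder phrases) is to tally which reply most often followed each incoming note and reuse it:

```python
from collections import Counter

# Exchanges the system has observed: (incoming note, reply that followed).
observed = [("ni hao", "ni hao"), ("xie xie", "bu ke qi"),
            ("ni hao", "ni hao ma"), ("xie xie", "bu ke qi")]

tallies = {}
for note, reply in observed:
    tallies.setdefault(note, Counter())[reply] += 1

# The learned "dictionary": the most common reply for each known note.
phrasebook = {note: c.most_common(1)[0][0] for note, c in tallies.items()}

def respond(note):
    return phrasebook.get(note, "<no matching pattern>")

print(respond("xie xie"))   # 'bu ke qi'
```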

Machine learning still qualifies as weak AI, because the computer doesn't understand what's being said; it only matches symbols and identifies patterns. The big difference is that instead of having an expert provide the patterns, the computer identifies patterns in the data. Over time, the computer becomes "smarter."

Machine learning has become one of the fastest growing areas in AI, primarily because the cost of data storage and processing has dropped dramatically. We are currently in the era of data science and big data — extremely large data sets that can be analyzed computationally to reveal patterns, trends, and associations. Organizations are collecting vast amounts of data, and the big challenge is figuring out what to do with all of it. Answering that challenge is machine learning, which can identify patterns even when you don't really know what you're looking for. In a sense, machine learning enables computers to find out what's inside your data and let you know what it found.

Machine learning moves past the limitations of symbolic systems. Instead of memorizing symbols, a computer system uses machine learning algorithms to create models of abstract concepts, detecting statistical patterns in massive amounts of data.

So a machine learning algorithm looks at the eight pictures of different dogs. It breaks these pictures down into individual dots, or pixels, and then examines those pixels to detect patterns. Maybe it sees a pattern of all of these animals having hair. Maybe it sees a pattern in the noses or ears. It could even see a pattern that humans are unable to perceive. Collectively, the patterns create what might be considered a statistical expression of “dogness.”

Sometimes humans can help machines learn. We can feed the machine millions of pictures that we’ve already determined contain dogs, so the machine doesn’t have to worry about excluding images of cats, horses, or airplanes. This is called supervised learning, and the data, consisting of the label “dog” and the millions of pictures of dogs, is called a training set. Using the training set, a human being teaches the machine that all of the patterns it identifies are characteristics of “dog.”

Machines can also learn completely on their own. We just feed massive amounts of data into the machine and let it find its own patterns. This is called unsupervised learning.

Imagine a machine examining all the pictures of people on your smartphone. It might not know whether someone is your husband, wife, boyfriend, or girlfriend, but it could create clusters of the people it sees closest to you.
