Prior to starting an AI project, the first choice you need to make is whether to use an expert system (a rules-based system) or machine learning. Basically, the choice comes down to the amount of data, the variation in that data and whether you have a clear set of steps for extracting a solution from that data. An expert system is best when you have a sequential problem and there are finite steps to find a solution. Machine learning is best when you want to move beyond memorizing sequential steps, and you need to analyze large volumes of data to make predictions or to identify patterns that you may not even know would provide insight. In other words, machine learning fits problems that contain a certain level of uncertainty.

Think about it in terms of an automated phone system.

Automated Phone System

Older phone systems are sort of like expert systems; a message tells the caller to press 1 for sales, 2 for customer service, 3 for technical support and 4 to speak to an operator. The system then routes the call to the proper department based on the number that the caller presses.
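Here is a minimal Python sketch of a menu-driven router that makes the contrast concrete. The `MENU` table, the `route_call` function and the fallback to an operator are illustrative assumptions and are not taken from any real phone system. The point is that every possible input and its outcome must be written down in advance.

```python
# Hypothetical menu table: each key the caller can press is hard-wired
# to exactly one department.
MENU = {
    "1": "sales",
    "2": "customer service",
    "3": "technical support",
    "4": "operator",
}


def route_call(keypress: str) -> str:
    """Route a call by looking the caller's keypress up in a fixed rule table."""
    # Assumed fallback rule: any key outside the menu goes to a human operator.
    return MENU.get(keypress, "operator")


print(route_call("3"))  # technical support
print(route_call("9"))  # operator
```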

Newer, more advanced phone systems use natural language processing. When someone calls in, the message tells the caller to say what they’re calling about. A caller may say something like, “I’m having a problem with my Android smart phone,” and the system routes the call to technical support. If, instead, the caller said something like, “I want to upgrade my smartphone,” the system routes the call to sales.

The challenge with natural language processing is that what callers say, and how they say it, is unpredictable. An angry caller may say something like “That smart phone I bought from you guys three days ago is a piece of junk.” You can see that this is a more complex problem. The automated phone system would need accurate speech recognition and then be able to infer the meaning of that statement so that it could direct the caller to the right department.

With an expert system, you would have to manually input all the possible statements and questions, and the system would still run into trouble when a caller mumbled or spoke with an accent or spoke in another language.

In this case, machine learning would be the better choice. With machine learning, the system would get smarter over time as it created its own patterns. If someone called in and said something like, “I hate my new smart phone and want to return it,” and they were routed to sales and then transferred to customer service, the system would know that the next time someone called and mentioned the word “return,” that call should be routed directly to customer service, not sales.
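Simple word counts can approximate the learning behavior described above. The sketch below is a toy built on a hypothetical `LearningRouter` class. It records which department each call was finally resolved in, and it lets the words of a new call vote on where to send it. Production speech systems use far more sophisticated models than this.

```python
import re
from collections import Counter, defaultdict


class LearningRouter:
    """Hypothetical call router that learns which words predict which department."""

    def __init__(self, default: str = "sales") -> None:
        self.default = default
        # word -> Counter of departments where calls containing that word ended up
        self.word_counts = defaultdict(Counter)

    @staticmethod
    def _words(utterance: str) -> list:
        return re.findall(r"[a-z']+", utterance.lower())

    def route(self, utterance: str) -> str:
        votes = Counter()
        for word in self._words(utterance):
            votes.update(self.word_counts.get(word, {}))
        # With no learned evidence yet, fall back to the default department.
        return votes.most_common(1)[0][0] if votes else self.default

    def learn(self, utterance: str, final_department: str) -> None:
        """Record where a call was finally handled, e.g. after a transfer."""
        for word in self._words(utterance):
            self.word_counts[word][final_department] += 1


router = LearningRouter()
print(router.route("I want to return my phone"))  # sales (nothing learned yet)
router.learn("I hate my new smart phone and want to return it", "customer service")
print(router.route("I need to return this charger"))  # customer service
```

Every word gets a vote, so filler words such as "to" count as much as "return." That weakness is one reason real systems need large amounts of data and better features.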

When you start an AI project, consider which approach is best for your specific use case. If you can draw a decision tree or flow chart to describe a specific task the computer must perform based on limited inputs, then an expert system is probably the best choice. It may be easier to set up and deploy, saving you time, money and the headaches of dealing with more complex systems. If, however, you’re dealing with massive amounts of data and a system that must adapt to changing inputs, then machine learning is probably the best choice.

Some AI experts mix these two approaches. They use an expert system to define some constraints and then use machine learning to experiment with different answers. So you have three choices — an expert system, machine learning or a combination of the two.

Fueling the rise of machine learning and deep learning is the availability of massive amounts of data, often referred to as big data. If you wanted to create an AI program to identify pictures of cats, you could access millions of cat images online. The same is true, perhaps even more so, of other types of data. Various organizations have access to vast amounts of data, including charge card transactions, user behaviors on websites, data from online games, published medical studies, satellite images, online maps, census reports, voter records, economic data and machine-generated data (from machines equipped with sensors that report the status of their operation and any problems they detect).

This treasure trove of data has given machine learning a huge advantage over symbolic systems. Having a neural network chew on gigabytes of data and report on it is much easier and quicker than having an expert identify and input patterns and reasoning schemas to enable the computer to deliver accurate responses.

In some ways the evolution of machine learning is similar to how online search engines evolved. Early on, users would consult website directories such as Yahoo! to find what they were looking for — directories that were created and maintained by humans. Website owners would submit their sites to Yahoo! and suggest the categories in which to place them. Yahoo! personnel would then vet the sites and add them to the directory or deny the request. The process was time-consuming and labor-intensive, but it worked well when the web had relatively few websites. When the thousands of websites proliferated into millions and then crossed the one billion threshold, the system broke down fairly quickly. Human beings couldn’t work quickly enough to keep the Yahoo! directories current.

In 2000, Yahoo! partnered with a smaller company called Google that had developed a search engine to locate and categorize web pages. Google’s first search engine examined backlinks (pages that linked to a given page) to determine the relevance and authority of the given page and rank it accordingly in its search results. Since then, Google has developed additional algorithms to determine a page’s rank (or relevance); for example, the more users who enter the same search phrase and click the same link, the higher the ranking that page receives. This approach is similar to the way neurons in an artificial neural network strengthen their connections.
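Ranking pages by their backlinks is best known as PageRank, the link-analysis method Google's founders published. What follows is a simplified PageRank-style calculation, not Google's production algorithm. The four-page `web`, the damping factor of 0.85 and the fixed 50 iterations are assumptions chosen for illustration. A page scores highly when pages that themselves score highly link to it.

```python
from __future__ import annotations


def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Simplified PageRank: a page matters if pages that matter link to it."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page keeps a small baseline share (the "random jump").
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            targets = outgoing or pages  # a page with no links spreads its rank evenly
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web; each list holds the pages that a page links to.
web = {
    "home": ["about"],
    "about": ["home"],
    "blog": ["home", "about"],
    "shop": ["home"],
}
ranks = pagerank(web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page:>5}: {score:.3f}")
```

This ranks `home` first because every other page links to it, and the scores always sum to 1.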

The fact that Google is one of the companies most enthusiastic about AI is no coincidence. The entire business has been built on using machines to interpret massive amounts of data. Rosenblatt's perceptrons could look through only a couple of grainy images. Now we have processors that are at least a million times faster, sorting through massive amounts of data to find the content that’s most likely to be relevant to whatever a user searches for.

Deep learning architecture adds even more power, enabling machines to identify patterns in data that just a few decades ago would have been nearly imperceptible. With more layers in the neural network, it can perceive details that would go unnoticed by most humans. These deep learning artificial networks look at so much data and create so many new connections that it’s not even clear how these programs discover the patterns.

A deep learning neural network is like a black box swirling together computation and data to determine what it means to be a cat. No human knows how the network arrives at its decision. Is it the whiskers? Is it the ears? Or is it something about all cats that we humans are unable to see? In a sense, the deep learning network creates its own model for what it means to be a cat, a model that as of right now humans can only copy or read, but not understand or interpret.

In 2012, the Google Brain project did just that. Developers fed 10 million random images from YouTube videos into a network that had over 1 billion neural connections running on 16,000 processor cores. They didn’t label any of the data, so the network didn’t know what it meant to be a cat, a human or a car. Instead, the network just looked through the images and came up with its own clusters. It found that many of the videos contained a very similar cluster. To the network, this cluster looked like this.

Cat Detection

A “cat” from “Building high-level features using large scale unsupervised learning”

Now, as a human, you might recognize this as the face of a cat. To the neural network, this was just a very common something that it saw in many of the videos. In a sense, it invented its own interpretation of a cat. A human might go through and tell the network that this is a cat, but this isn’t necessary for the network to find cats in these videos. In fact, the network was able to identify a “cat” 74.8% of the time. In a nod to Alan Turing, the Cato Institute’s Julian Sanchez called this the “Purring Test.”

If you decide to start working with AI, accept the fact that your network might be sensing things that humans are unable to perceive. Artificial intelligence is not the same as human intelligence, and even though we may reach the same conclusions, we’re definitely not going through the same process.

The symbolic approach and AI planning work great for applications that have a limited number of matching patterns; for example, a program that helps you complete your tax return. The IRS provides a limited number of forms and a collection of rules for reporting tax-relevant data. Combine the forms and instructions with the capability to crunch numbers and some heuristic reasoning, and you have a tax program that can step you through the process. With heuristic reasoning, introduced in the previous chapter, you can limit the number of patterns; for example, if you earned money from an employer, you report the wages shown on your W-2 form. If you earned money as a sole proprietor, you complete Schedule C.

The limitation of this approach is that the database is difficult to manage, especially when rules and patterns change. For example, malware (viruses, spyware, computer worms and so forth) evolves too quickly for anti-malware companies to update their databases manually. Likewise, digital personal assistants, such as Siri and Alexa, need to constantly adapt to unfamiliar requests from their owners.

To overcome these limitations, early AI researchers started to wonder whether computers could be programmed to learn new patterns. Their curiosity led to the birth of machine learning — the science of getting computers to do things they weren't specifically programmed to do.

Machine learning got its start very shortly after the first AI conference. In 1959, AI researcher Arthur Samuel published a paper describing a program that could play checkers. This program was different: it was designed to play against itself so it could learn how to improve. It learned new strategies from each game it played and, after a short period of time, began to consistently beat its own programmer.

A key advantage of machine learning is that it doesn't require an expert to create symbolic patterns and list out all the possible responses to a question or statement. On its own, the machine creates and maintains the list, identifying patterns and adding them to its database.

Imagine machine learning applied to the Chinese room experiment. The computer would observe the passing of notes between itself and the person outside the room. After examining thousands of exchanges, the computer identifies a pattern of communication and adds common words and phrases to its database. Now it can use its collection of words and phrases to match the notes it receives more quickly. It can also assemble a response from these words and phrases instead of building one from individual characters. It may even create its own dictionary based on these matching patterns, so it has a complete response ready for certain notes it receives.
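One crude way to show this pattern-finding is to count which character sequences recur across the notes. The function `common_phrases` and the sample notes are hypothetical. Counting two-character sequences (bigrams) is one simple choice among many, and nothing in the code understands what the characters mean.

```python
from collections import Counter


def common_phrases(notes, n=2, min_count=2):
    """Return n-character sequences that occur at least `min_count` times."""
    counts = Counter()
    for note in notes:
        for i in range(len(note) - n + 1):
            counts[note[i:i + n]] += 1
    # most_common() lists the most frequent sequences first.
    return [phrase for phrase, count in counts.most_common() if count >= min_count]


# Hypothetical notes passed through the slot; the program never learns what they mean.
notes = ["你好吗", "你好", "我很好", "你好吗谢谢"]
print(common_phrases(notes))  # ['你好', '好吗']
```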

Machine learning still qualifies as weak AI, because the computer doesn't understand what's being said; it only matches symbols and identifies patterns. The big difference is that instead of having an expert provide the patterns, the computer identifies patterns in the data. Over time, the computer becomes "smarter."

Machine learning has become one of the fastest-growing areas in AI primarily because the cost of data storage and processing has dropped dramatically. We are currently in the era of data science and big data: extremely large data sets that can be analyzed by computer to reveal patterns, trends and associations. Organizations are collecting vast amounts of data. The big challenge is to figure out what to do with all this data. Answering that challenge is machine learning, which can identify patterns even when you really don't know what you're looking for. In a sense, machine learning enables computers to find out what's inside your data and let you know what it found.

Machine learning moves past the limitations of symbolic systems. Instead of memorizing symbols, a computer system uses machine learning algorithms to create models of abstract concepts. It detects statistical patterns by running these algorithms on massive amounts of data.

Statistical Dog

So a machine learning algorithm looks at the eight pictures of different dogs. Then it breaks down these pictures into individual dots, or pixels. Then it looks at these pixels to detect patterns. Maybe it sees a pattern: all of these animals have hair. Maybe it sees a pattern for noses or ears. It could even see a pattern that humans are unable to perceive. Collectively, the patterns create what might be considered a statistical expression of “dogness.”

Sometimes humans can help machines learn. We can feed the machine millions of pictures that we’ve already determined contain dogs, so the machine doesn’t have to worry about excluding images of cats, horses or airplanes. This is called supervised learning, and the data, consisting of the label “dog” and the millions of pictures of dogs, is called a training set. Using the training set, a human being is teaching the machine that all of the patterns it identifies are characteristics of “dog.”
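A simple way to turn labeled examples into a "statistical expression" of a category is a nearest-centroid classifier. It averages the features of each labeled group and assigns a new picture to the nearest average. This sketch assumes each picture has already been reduced to three hypothetical features. It also adds a second label, "cat," so there is something to compare against. Real image systems learn their features from raw pixels.

```python
import math


def train_centroids(examples):
    """Average the feature vectors for each human-supplied label."""
    sums, counts = {}, {}
    for features, label in examples:
        total = sums.setdefault(label, [0.0] * len(features))
        sums[label] = [t + f for t, f in zip(total, features)]
        counts[label] = counts.get(label, 0) + 1
    return {label: [t / counts[label] for t in total] for label, total in sums.items()}


def classify(centroids, features):
    """Pick the label whose average (its "statistical expression") is nearest."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


# Hypothetical training set: [furriness, snout length, ear floppiness] per picture.
training_set = [
    ([0.90, 0.80, 0.70], "dog"),
    ([0.80, 0.90, 0.60], "dog"),
    ([0.95, 0.70, 0.80], "dog"),
    ([0.90, 0.20, 0.10], "cat"),
    ([0.85, 0.30, 0.20], "cat"),
]
centroids = train_centroids(training_set)
print(classify(centroids, [0.90, 0.75, 0.65]))  # dog
```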

Machines can also learn completely on their own. We just feed massive amounts of data into the machine and let it find its own patterns. This is called unsupervised learning.

Imagine a machine examining all the pictures of people on your smart phone. It might not know if someone was your husband, wife, boyfriend or girlfriend. But it could create clusters of the people who appear to be closest to you.
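k-means, one common unsupervised algorithm, can serve as a sketch of clustering without labels. It alternates between two steps: assigning each point to its nearest cluster center, then moving each center to the average of its members. The two features per person are hypothetical, and starting from the first `k` points is a simplification.

```python
import math


def kmeans(points, k, iterations=10):
    """Group points into k clusters without any labels (unsupervised)."""
    centroids = [list(p) for p in points[:k]]  # naive start: the first k points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # move each centroid to the mean of its members
                centroids[i] = [sum(axis) / len(members) for axis in zip(*members)]
    return clusters


# Hypothetical per-person features: (photos with you per month,
# share of those photos in which the person stands next to you).
people = [(30, 0.9), (2, 0.1), (28, 0.8), (3, 0.2), (25, 0.85), (1, 0.15)]
for cluster in kmeans(people, k=2):
    print(cluster)
```

The features are not scaled here, so the first number dominates the distance. A real system would normalize its features first.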

Combining Memorization with Generalization and Specification

In one of my previous posts "The General Problem Solver," I discussed an approach to artificial intelligence (AI) referred to as the physical symbol system hypothesis (PSSH). The theory behind this approach is that human intelligence consists of the ability to take symbols (recognizable patterns), combine them into structures (expressions), and manipulate them using various processes to produce new expressions.

As philosopher John Searle pointed out with his Chinese room argument, this ability, in and of itself, does not constitute intelligence, because it requires no understanding of the symbols or expressions. For example, if you ask your virtual assistant (Siri, Alexa, Bixby, Cortana, etc.) a question, it searches through a list of possible responses, chooses one, and provides that as the answer. It doesn't understand the question and has no desire to answer the question or to provide the correct answer.

AI built on the PSSH is proficient at storing lots of data (memorization) and pattern-matching, but it is not so good at learning. Learning requires the ability not only to memorize but also to generalize and specify.

Memorization + Generalization + Specification = Learning

Human evolution has made us experts at memorization, generalization, and specification. To a large degree, it is how we learn. We record (memorize) details about our environment, experiences, thoughts, and feelings; form generalizations that enable us to respond appropriately to similar environments and experiences in the future; and then fine-tune our impressions through specification. For example, if you try to pet a dog, and it snaps at you, you may generalize to avoid dogs in the future. However, over time, you develop more nuanced thoughts about dogs—that not all dogs snap when you try to pet them and that there are certain ways to approach dogs that make them less likely to snap at you.

The combination of memorization, generalization, and specification is a valuable survival skill. It also plays a key role in enabling machine learning — providing machines with the ability to recognize unfamiliar patterns based on what they already "know" about familiar patterns.


Look at the following eight images. You can tell that the eight images represent different breeds of dogs, even though these aren't photographs of actual dogs.

Eight Dogs

You know that these are all pictures of dogs, because you have encountered many dogs in your life (in person, in photos and drawings, and in videos), and you have formed in your mind an abstract idea of what a dog looks like.

In their early stages of learning, children often overgeneralize. For example, upon learning the word "dog," they call all furry creatures with four legs "dog." To these children, cats are dogs, cows are dogs, sheep are dogs, and so on. This is where specification comes into play. As children encounter different species of four-legged mammals, they begin to identify the qualities that make them distinct.

Computers are far better than humans at memorization and far worse at generalization and specification. The computer could easily memorize the eight images of dogs, but if you fed the computer a ninth image of a dog, as shown below, it would likely struggle to match it to one of the existing images and identify it as an image of a dog. In other words, the computer would quickly master memorization and pattern-matching but struggle to learn due to its inability to generalize and specify.
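A few numbers standing in for images are enough to show the difference between memorizing and generalizing. In this sketch, each image is reduced to four hypothetical brightness values. `recognize_by_memory` accepts only exact matches, while `recognize_by_similarity` accepts anything within an assumed distance threshold of a known dog. The threshold is only a crude stand-in for generalization and does nothing for specification.

```python
import math

# Hypothetical "memorized" dog images, each reduced to four brightness values.
memorized_dogs = {
    (0.2, 0.8, 0.5, 0.9): "dog",
    (0.3, 0.7, 0.6, 0.8): "dog",
}


def recognize_by_memory(image):
    # Pure memorization: only an exact match counts.
    return memorized_dogs.get(image, "unknown")


def recognize_by_similarity(image, threshold=0.3):
    # Crude generalization: accept anything close enough to a known dog.
    closest = min(memorized_dogs, key=lambda known: math.dist(known, image))
    return "dog" if math.dist(closest, image) <= threshold else "unknown"


ninth_dog = (0.25, 0.75, 0.55, 0.85)
print(recognize_by_memory(ninth_dog))      # unknown
print(recognize_by_similarity(ninth_dog))  # dog
```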

For this same reason, language translation programs have always struggled with accuracy. Developers have created physical symbol systems that translate words and phrases from one language to another, but these programs never really learn the language. Instead, they function merely as an old-fashioned foreign language dictionary and phrasebook — quickly looking up words and phrases in the source language to find the matching word or phrase in the destination language and then stitching together the words and phrases to provide the translation. These translation programs often fail when they encounter unfamiliar words, phrases, and even syntax (word order).
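A phrasebook-style translator is easy to sketch, and so is its failure mode. The tiny `PHRASEBOOK` is a hypothetical English-to-Spanish sample. The `translate` function greedily matches the longest known phrase and flags anything it cannot look up.

```python
# Hypothetical English-to-Spanish phrasebook; entries are illustrative only.
PHRASEBOOK = {
    "where is": "dónde está",
    "the station": "la estación",
    "thank you": "gracias",
}


def translate(sentence):
    words = sentence.lower().split()
    output, i = [], 0
    while i < len(words):
        # Greedily try the longest phrase starting at word i, then shorter ones.
        for length in range(len(words) - i, 0, -1):
            phrase = " ".join(words[i:i + length])
            if phrase in PHRASEBOOK:
                output.append(PHRASEBOOK[phrase])
                i += length
                break
        else:
            output.append(f"[{words[i]}?]")  # unfamiliar word: nothing to look up
            i += 1
    return " ".join(output)


print(translate("Where is the station"))  # dónde está la estación
print(translate("Where is the museum"))   # dónde está [the?] [museum?]
```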

Making Machines That Can Learn

Machines already have some ability to learn. The big challenge in the future of artificial intelligence and machine learning will be to enable machines to do a better job of generalizing and specifying. Instead of merely matching memorized symbols, newer machines will create abstract models based on the patterns they observe (the patterns they are fed). These models will have the potential to help these machines learn more effectively, so they can more accurately interpret unfamiliar future input.

As machines become better at generalizing and specifying, they will achieve greater levels of intelligence. It remains to be seen, however, whether machines will ever have the capacity to develop self-awareness and self-determination, key characteristics of human intelligence.

Artificial Intelligence Planning

Artificial intelligence planning is a branch of AI whose purpose is to identify strategies and action sequences that will, with a reasonable degree of confidence, enable the AI program to deliver the correct answer, solution, or outcome.

As I explained in a previous post "The General Problem Solver," one of the limitations of early AI, which was based on the physical symbol system hypothesis (PSSH), is combinatorial explosion — a mathematical phenomenon in which the number of possible combinations increases beyond the computer's capability to explore all of them in a reasonable amount of time.

Heuristic Reasoning

AI planning attempts to solve the problem of combinatorial explosion by using something called heuristic reasoning — an approach that attempts to give artificial intelligence a form of common sense. Heuristic reasoning enables an AI program to rule out a large number of possible combinations by identifying them as impossible or highly unlikely. This approach is sometimes referred to as "limiting the search space."

A heuristic is a mental shortcut or rule-of-thumb that enables people to solve problems and make decisions quickly. For example, the Rule of 72 is a heuristic for estimating the number of years it would take an investment to double your money. You divide 72 by the rate of return, so an investment with a 6% rate of return would double your money in about 72/6 = 12 years.
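The Rule of 72 can be checked against the exact doubling time. Assuming annual compounding, the exact figure solves (1 + r)^t = 2 for t, giving t = ln 2 / ln(1 + r). The sketch compares the shortcut with the exact figure for a few sample rates.

```python
import math


def rule_of_72(rate_percent):
    """Heuristic estimate of years to double at a given annual percentage return."""
    return 72 / rate_percent


def exact_doubling_time(rate_percent):
    """Solve (1 + r)^t = 2 for t, assuming returns compound once a year."""
    return math.log(2) / math.log(1 + rate_percent / 100)


for rate in (3, 6, 9, 12):
    print(f"{rate:>2}%: rule of 72 -> {rule_of_72(rate):4.1f} years, "
          f"exact -> {exact_doubling_time(rate):4.1f} years")
```

At 6%, the shortcut gives 12 years and the exact answer is about 11.9. That is close enough for a quick decision, which is exactly what a heuristic is for.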

Heuristic reasoning is common in innovation. Inventors rarely consider all the possibilities for solving a particular problem. Instead, they start with an idea, a hypothesis, or a hunch based on their knowledge and prior experience, then they start experimenting and exploring from that point forward. If they were to consider all the possibilities, they would waste considerable time, effort, energy, and expertise on futile experiments and research.

Heuristic Reasoning Combined with a Physical Symbol System

With AI planning, you might combine heuristic reasoning with a physical symbol system to improve performance. For example, imagine heuristic reasoning applied to the Chinese room experiment I introduced in my previous post on the general problem solver.

In the Chinese room scenario, you, an English-only speaker, are locked in a room with a narrow slot on the door through which notes can pass. You have a book filled with long lists of statements in Chinese, and the floor is covered in Chinese characters. You are instructed that upon receiving a certain sequence of Chinese characters, you are to look up a corresponding response in the book and, using the characters strewn about the floor, formulate your response.


What you do in the Chinese room is very similar to how AI programs work. They simply identify patterns, look up entries in a database that correspond to those patterns, and output the entries in response.

With the addition of heuristic reasoning, AI could limit the possibilities for the first note. For example, you could program the software to expect a message such as "Hello" or "How are you?" In effect, this would limit the search space, so that the AI program would have to search only a limited number of records in its database to find an appropriate response. It wouldn't get bogged down searching the entire database to consider all possible messages and responses.

The only drawback is that if the first message was not one of those that was anticipated, the AI program would need to search its entire database.
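The following sketch checks a short list of expected openers before falling back to the whole rulebook. The rulebook entries, `EXPECTED_OPENERS` and the functions are hypothetical. The functions report how many entries were examined, so both the saving and the cost of a wrong guess are visible.

```python
# Hypothetical rulebook mapping an incoming note to a prepared reply.
RULEBOOK = {
    "你好": "你好！",
    "你好吗？": "我很好，谢谢。",
    "你叫什么名字？": "我没有名字。",
    # A real rulebook would hold many thousands of entries.
}

# Heuristic assumption: the first note is almost always a greeting.
EXPECTED_OPENERS = ["你好", "你好吗？"]


def scan(note, candidates):
    """Check candidate keys one by one; return (reply or None, entries checked)."""
    checked = 0
    for key in candidates:
        checked += 1
        if key == note:
            return RULEBOOK[key], checked
    return None, checked


def reply_to_first_note(note):
    # Search the small, expected set first (a limited search space) ...
    reply, checked = scan(note, EXPECTED_OPENERS)
    if reply is not None:
        return reply, checked
    # ... and fall back to the whole rulebook only when the guess was wrong.
    reply, more = scan(note, RULEBOOK)
    return reply, checked + more


print(reply_to_first_note("你好吗？"))        # ('我很好，谢谢。', 2)
print(reply_to_first_note("你叫什么名字？"))  # ('我没有名字。', 5)
```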

A Real-World Example

Heuristic reasoning is commonly employed in modern AI applications. For example, if you enter your location and destination in a GPS app, the app doesn't search its vast database of source data, which consists of satellite and aerial imagery; state, city, and county maps; US Geological Survey data; traffic data; and so on. Instead, it limits the search space to the area that encompasses the location and destination you entered. In addition, it limits the output to the fastest or shortest route (not both), depending on which setting you have selected, and it likely omits a great deal of detail from its maps to further expedite the process.

The goal is to deliver an accurate map and directions, in a reasonable amount of time, that lead you from your current location to your desired destination as quickly as possible. Without the shortcuts to the process provided by heuristic reasoning, the resulting combinatorial explosion would leave you waiting for directions . . . possibly for the rest of your life.
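Real navigation apps use their own, more elaborate methods. The core idea of heuristic search can still be sketched with A*, a classic algorithm that steers the search toward the goal using an estimate of the remaining distance. The grid, the wall and the Manhattan-distance heuristic are assumptions for illustration. Turning the heuristic off gives Dijkstra's algorithm, which explores in every direction.

```python
import heapq


def grid_search(grid, start, goal, use_heuristic=True):
    """A* search on a grid of '.' (open) and '#' (wall) cells.

    With use_heuristic=False the estimate is always 0, which turns A* into
    Dijkstra's algorithm. Returns (path length, number of cells expanded).
    """
    rows, cols = len(grid), len(grid[0])

    def estimate(cell):
        # Manhattan distance never overestimates the remaining steps here.
        if not use_heuristic:
            return 0
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    # Queue entries: (estimated total, estimated remaining, steps so far, cell).
    frontier = [(estimate(start), estimate(start), 0, start)]
    best = {start: 0}
    expanded = 0
    while frontier:
        _, _, steps, cell = heapq.heappop(frontier)
        if steps > best[cell]:
            continue  # a shorter route to this cell was already found
        expanded += 1
        if cell == goal:
            return steps, expanded
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == ".":
                if steps + 1 < best.get(nxt, float("inf")):
                    best[nxt] = steps + 1
                    h = estimate(nxt)
                    heapq.heappush(frontier, (steps + 1 + h, h, steps + 1, nxt))
    return None, expanded


# Hypothetical 20 x 20 map with a wall down column 10, open only at rows 15-19.
grid = [["."] * 20 for _ in range(20)]
for row in range(15):
    grid[row][10] = "#"

print(grid_search(grid, (0, 0), (0, 19)))                       # A*
print(grid_search(grid, (0, 0), (0, 19), use_heuristic=False))  # Dijkstra
```

Both searches find the same 49-step route around the wall, but the heuristic version expands far fewer cells. That is the "limiting the search space" effect in miniature.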

Good Old-Fashioned AI

Even though many modern AI applications are built on what are now considered old-fashioned methods, AI planning allows for the intelligent combination of these methods, along with newer ones, to build AI applications that deliver the desired output. The resulting applications can certainly make computers appear to be intelligent beings: providing real-time guidance from point A to point B, analyzing contracts, automating logistics, and even building better video games.

If you're considering a new AI project, don't be quick to dismiss the benefits of good old-fashioned AI (GOFAI). Newer approaches may not be the right fit.

The General Problem Solver

In a previous post entitled "Playing the Imitation Game," I discussed Alan Turing's vision, published in 1936, of a single, universal machine that could be programmed to solve any particular problem. In 1959, Allen Newell and Herbert A. Simon took a different approach. Their goal was to develop a computer program that could function as a universal problem solver.

Newell and Simon

Newell and Simon - Courtesy Carnegie Mellon University Libraries

In theory, their general problem solver (GPS) would be able to solve any problem that could be expressed as a set of well-formed formulas in formal logic, the kind of statements that are useful in programming logic. This type of problem would include geometric proofs, which start with definitions, axioms (statements accepted as fact), postulates, and previously proven theorems, and use logic to arrive at reasoned conclusions.

The Tower of Hanoi

One of the problems GPS solved was the Tower of Hanoi — a game or puzzle consisting of three rods and a number of disks of different sizes, which can slide onto any rod.

Tower of Hanoi

When you start, all the disks are on one rod, ordered from largest to smallest from the bottom up. The goal is to move the entire stack to another rod in the fewest possible moves, following these rules:

● Move only one disk at a time.

● Do not place a larger disk on top of a smaller one.

● Each move consists of taking the top disk from one stack and placing it on an empty rod or on the top of an existing stack.

The minimum number of moves to solve the Tower of Hanoi is $2^n - 1$, where $n$ is the number of disks, so for three disks, the minimum number of moves is $2^3 - 1 = (2 \times 2 \times 2) - 1 = 7$.
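The standard recursive solution shows where the 2^n - 1 count comes from. To move n disks, move the n - 1 smaller disks out of the way, move the largest disk, then move the n - 1 disks back on top. Each level of recursion doubles the work and adds one move. This sketch is not a reconstruction of GPS, which used a more general search method, and the rod names A, B and C are arbitrary.

```python
def hanoi(n, source="A", target="C", spare="B", moves=None):
    """Return the list of (from_rod, to_rod) moves that transfers n disks."""
    if moves is None:
        moves = []
    if n > 0:
        hanoi(n - 1, source, spare, target, moves)  # clear the smaller disks away
        moves.append((source, target))              # move the largest disk
        hanoi(n - 1, spare, target, source, moves)  # restack the smaller disks on top
    return moves


moves = hanoi(3)
print(len(moves))  # 7
print(moves)
```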

The Physical Symbol System Hypothesis

One of the key parts of the general problem solver was what Newell and Simon called the physical symbol system hypothesis (PSSH). According to Newell and Simon, "A physical symbol system has the necessary and sufficient means for general intelligent action." Such a system would be able to take patterns (symbols), combine them into structures (expressions), and manipulate them using various processes to produce new expressions.

Newell and Simon believed that human intelligence was no more than a complex physical symbol system. They thought that a key part of human reasoning consisted merely of connecting symbols — that our language, ideas, and concepts were just broad groupings of interconnected symbols. For example, when we see a chair or a picture of a chair, we associate it with the act of sitting. When we smell smoke, we associate it with fire, which is associated with danger, which may trigger a fight-or-flight response.

Symbolic Reasoning

Newell and Simon argued that by feeding a machine enough physical symbols, creating a sufficient number of associations, and putting rules in place for combining symbols into structures and manipulating them to create new expressions, machines could be made to "think" like we humans do. This theory forms the basis of what drives most of machine learning and artificial intelligence to this day.
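A miniature physical symbol system can be sketched as a set of association rules applied repeatedly until no new symbols appear, a technique known as forward chaining. The rules are hypothetical and drawn from the smoke, fire and danger example. The program derives new symbols purely by matching, with no notion of what any of them mean.

```python
# Hypothetical association rules: if the first symbol is known, add the second.
RULES = [
    ("smoke", "fire"),
    ("fire", "danger"),
    ("danger", "fight-or-flight"),
    ("chair", "sitting"),
]


def forward_chain(facts):
    """Repeatedly apply the rules to derive new symbols until nothing changes."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for condition, conclusion in RULES:
            if condition in known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


print(sorted(forward_chain({"smoke"})))  # ['danger', 'fight-or-flight', 'fire', 'smoke']
```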

Refuting the Theory: The Chinese Room Argument

Not everyone buys into the notion that a physical symbol system is necessary and sufficient for human intelligence. In 1980, philosopher John Searle argued that merely connecting symbols could not be considered intelligence. To support his argument against the idea that manipulating physical symbols constituted intelligence, he presented what is commonly referred to as the Chinese room argument.

Imagine yourself, an English-only speaker, locked in a room with a narrow slot on the door through which you can pass notes. You have a book filled with long lists of statements in Chinese, and the floor is covered in Chinese characters. You are instructed that upon receiving a certain sequence of Chinese characters, you are to look up a corresponding response in the book and, using the characters strewn about the floor, formulate your response.

Chinese Room Experiment

Someone outside the room who speaks and writes fluent Chinese writes a note on a sheet of paper and passes it to you through the slot on the door. Following the instructions you were given, you look up a response in the book, copy the response using characters from the floor to create your note, and pass it through the slot to the person who delivered the original message.

The native speaker may believe that the two of you are communicating and that you know the language. However, Searle argues that this is no proof of intelligence, because you have no understanding of the messages you are receiving or sending.

You can try a similar experiment with your smart phone. If you ask Siri or Alexa how she's feeling, she will answer your question even though she feels nothing at all. She doesn't even understand the question. This artificially "intelligent" being is merely matching your question to what is considered an acceptable answer and delivering that answer to you.

Combinatorial Explosion: A Major Obstacle

A huge obstacle to achieving artificial intelligence through a physical symbol system is what's known as combinatorial explosion: the rapid growth of symbol combinations that makes pattern-matching increasingly difficult. Combinatorial explosion is far more dramatic than exponential growth. The formula for exponential growth can be expressed as $y = 2^x$, whereas the formula for combinatorial explosion is $y = x!$ (the factorial of $x$). For example, if $x = 20$, then

exponential growth: $y = 2^{x} = 2^{20} = \underbrace{2 \times 2 \times \cdots \times 2}_{20\ \text{factors}} = 1{,}048{,}576$

combinatorial explosion: $y = x! = 20! = 1 \times 2 \times 3 \times \cdots \times 20 = 2{,}432{,}902{,}008{,}176{,}640{,}000$

With each added symbol, the number of combinations increases dramatically. Considering all the possible combinations would require immense computational resources over a considerable amount of time.
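The figures for x = 20 are easy to reproduce, and a few more rows show how quickly the gap widens. The helper `growth_table` is an illustrative name.

```python
import math


def growth_table(xs):
    """Pair each x with 2**x (exponential) and x! (factorial) for comparison."""
    return [(x, 2 ** x, math.factorial(x)) for x in xs]


for x, exponential, factorial in growth_table([5, 10, 15, 20]):
    print(f"x = {x:2d}   2^x = {exponential:>9,}   x! = {factorial:>25,}")
```

For very small values the exponential is actually larger (2^3 = 8 versus 3! = 6). The factorial overtakes it at x = 4 (16 versus 24) and stays ahead from then on.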

Even with these challenges, pattern-matching has remained the cornerstone of artificial intelligence, regardless of whether it is even in the same ballpark as human intelligence.