The Perceptron

In my previous post, "The Hidden Layers of a Neural Network," I presented a simple example of how multi-layer artificial neural networks learn. At a more basic level is the perceptron — a single-layer neural network. The perceptron is worth looking at because it sheds light on how individual neurons within a neural network function. If you know how a perceptron functions, you know how an artificial neuron functions.

A Perceptron's Structure and Function

A perceptron consists of five components:

● Inputs

● Weights

● Weighted sum

● Linear/binary activation function

● Bias


Basically, here's how a perceptron works:

  1. Inputs are multiplied by weights. Weights enable the perceptron to assign more importance to some inputs than others.
  2. The weighted values are totaled to create the weighted sum.
  3. Bias is added to the weighted sum as another adjustment. It shifts the threshold that the weighted sum must reach before the perceptron fires, which helps ensure accurate output.
  4. Based on the weighted sum plus the bias, the activation function delivers the perceptron's output, which is linear or binary (yes or no, 1 or 0, cat or dog, etc.).
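The four steps above fit in a few lines of Python. This is a minimal sketch, not code from any particular library. The function names, the step-function threshold of 0, and the example weights and bias are all assumptions chosen for clarity.

```python
def step(z):
    """Binary activation function: output 1 if z reaches the threshold of 0, else 0."""
    return 1 if z >= 0 else 0


def perceptron_output(inputs, weights, bias):
    # Steps 1 and 2: multiply each input by its weight, then total the results.
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    # Step 3: add the bias to the weighted sum.
    # Step 4: the activation function turns the total into a binary output.
    return step(weighted_sum + bias)


# Hypothetical example: two inputs with hand-picked weights and bias.
print(perceptron_output([0.9, 0.2], [1.5, -2.0], -0.5))  # 0.95 - 0.5 = 0.45, so 1
print(perceptron_output([0.1, 0.8], [1.5, -2.0], -0.5))  # -1.45 - 0.5 = -1.95, so 0
```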

Weights and Bias

Weights and bias are primarily responsible for enabling machine learning in a neural network. The neural network can adjust the weights of the various inputs and the bias to improve the accuracy of its binary classification system.

For example, the figure below illustrates how the output function of a perceptron might draw a line to distinguish between pictures of cats and dogs. If one or more dog pictures ended up on the line or slightly below the line, bias could be used to adjust the position of the line so it more precisely separated the two groups.
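The adjustment process can be sketched with the classic perceptron learning rule. Whenever the perceptron misclassifies an example, each weight and the bias move a small step in the direction that would have produced the right answer. The two features (ear pointiness and snout length), their values, the learning rate, and the epoch count are hypothetical stand-ins for whatever a real system would extract from a picture.

```python
def train_perceptron(samples, labels, learning_rate=0.1, epochs=20):
    """Classic perceptron learning rule: nudge the weights and bias after each mistake."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(samples, labels):
            z = sum(xi * wi for xi, wi in zip(x, weights)) + bias
            prediction = 1 if z >= 0 else 0
            error = target - prediction  # -1, 0, or +1
            # Move the decision line toward the misclassified example (no change if correct).
            weights = [wi + learning_rate * error * xi for wi, xi in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias


def predict(x, weights, bias):
    return 1 if sum(xi * wi for xi, wi in zip(x, weights)) + bias >= 0 else 0


# Hypothetical features: [ear pointiness, snout length], both scaled to 0..1.
# Label 0 = cat, 1 = dog.
samples = [[0.9, 0.2], [0.8, 0.3], [0.7, 0.1], [0.3, 0.8], [0.2, 0.9], [0.4, 0.7]]
labels = [0, 0, 0, 1, 1, 1]
weights, bias = train_perceptron(samples, labels)
print([predict(x, weights, bias) for x in samples])  # [0, 0, 0, 1, 1, 1]
```

After training, the weight on snout length is positive and the weight on ear pointiness is negative, so the learned line separates the two groups in this toy data.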

Linear Problem

The Birth of the Perceptron

Frank Rosenblatt invented the perceptron in 1958 while working at the Cornell Aeronautical Laboratory. He then used it to build a machine, called the Mark I Perceptron, which was designed for image recognition. The machine had an array of photocells connected randomly to its artificial neurons. Potentiometers encoded the weights, and electric motors turned the potentiometers to update the weights during the learning phase.

Rosenblatt's goal was to train the machine to distinguish between two images. Unfortunately, it took thousands of tries, and even then the Mark I struggled to distinguish between distinctly different images.

The Fall and Rise of the Perceptron

While Rosenblatt was working on his Mark I Perceptron, MIT professor Marvin Minsky was pushing hard for a symbolic approach. Minsky and Rosenblatt debated passionately about which was the better approach to AI. The debates were almost like family arguments: the two had attended the same high school and had known each other for decades.

In 1969 Minsky co-authored a book with Seymour Papert called Perceptrons: An Introduction to Computational Geometry. In it they argued decisively against the perceptron, showing that a single-layer perceptron can solve only linearly separable problems, those in which a single straight line (or, with more inputs, a flat plane) can separate the two classes. It cannot, for example, compute the XOR (exclusive or) function. Minsky and Papert also, mistakenly, claimed that the research being done on the perceptron was doomed to fail because of these limitations.
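The linear-separability limit is easy to demonstrate. The sketch below trains a single perceptron with the classic perceptron learning rule on two logic functions. The first is AND, which one line can separate. The second is XOR (exclusive or, which outputs 1 when exactly one input is 1), the example most often cited for what a single-layer perceptron cannot compute. Integer inputs and a learning rate of 1 keep the arithmetic exact. The epoch count is an arbitrary choice.

```python
def train_perceptron(samples, labels, learning_rate=1, epochs=100):
    """Perceptron learning rule for two inputs; integer inputs and rate keep the arithmetic exact."""
    w1, w2, bias = 0, 0, 0
    for _ in range(epochs):
        for (x1, x2), target in zip(samples, labels):
            prediction = 1 if w1 * x1 + w2 * x2 + bias >= 0 else 0
            error = target - prediction
            w1 += learning_rate * error * x1
            w2 += learning_rate * error * x2
            bias += learning_rate * error
    return w1, w2, bias


def accuracy(samples, labels, w1, w2, bias):
    predictions = [1 if w1 * x1 + w2 * x2 + bias >= 0 else 0 for x1, x2 in samples]
    return sum(p == t for p, t in zip(predictions, labels)) / len(labels)


inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
tasks = {
    "AND": [0, 0, 0, 1],  # linearly separable: one line splits the 1s from the 0s
    "XOR": [0, 1, 1, 0],  # not linearly separable: no single line works
}
results = {}
for name, labels in tasks.items():
    results[name] = accuracy(inputs, labels, *train_perceptron(inputs, labels))
    print(name, results[name])
```

AND reaches 100 percent accuracy. XOR never does, however many epochs you allow, because no single line can put (0, 1) and (1, 0) on one side and (0, 0) and (1, 1) on the other.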

Sadly, two years after the book was published, Rosenblatt died in a boating accident. Without Rosenblatt to defend perceptrons and with many experts in the field believing that research into the perceptron would be unproductive, funding for and interest in Rosenblatt's perceptron dried up for over a decade.

Interest in the perceptron did not revive until the 1980s, when researchers added hidden layers to neural networks, enabling these multi-layer networks to solve more complex problems.
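The sketch below shows why a hidden layer helps. It wires three step-function perceptrons into two layers, and together they compute XOR (output 1 when exactly one input is 1), which no single perceptron can represent. The weights are hand-picked for illustration rather than learned. In practice, training methods such as backpropagation find weights like these automatically.

```python
def step(z):
    return 1 if z >= 0 else 0


def xor_network(x1, x2):
    # Hidden layer: two perceptrons, each drawing its own line.
    h_or = step(x1 + x2 - 0.5)     # fires if at least one input is 1 (OR)
    h_nand = step(-x1 - x2 + 1.5)  # fires unless both inputs are 1 (NAND)
    # Output layer: fires only when both hidden neurons fire (AND).
    return step(h_or + h_nand - 1.5)


for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, xor_network(x1, x2))
```

Each hidden neuron is still a linear separator. Combining their outputs in a second layer is what lets the network carve out a region that one line cannot.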

Like people, machines can learn through supervised or unsupervised learning. With supervised learning, a human labels the data. So the machine has the advantage of knowing the human definition of the data. The human trainer gives the machine a stack of cat pictures and tells the machine, “These are cats.” With unsupervised learning, the machine figures out on its own how to cluster the data.

Consider the earlier example of the marching band neural network. Suppose you want the band to be able to classify whatever music it’s presented, and the band is unfamiliar with the different genres. If you give the band music by Merle Haggard, you want the band to identify it as country music. If you give the band a Led Zeppelin album, it should recognize it as rock.

To train the band using supervised learning, you give it a random subset of data called a training set. In this case, you provide two training sets — one with several country music songs and the other with several rock songs. You also label each training set with the category of songs — country and rock. You then provide the band with additional songs in each category and instruct it to classify each song. If the band makes a mistake, you correct it. Over time, the band (the machine) learns how to classify new songs accurately in these two categories.
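Here is a minimal sketch of the supervised setup. It assumes each song has already been reduced to two hypothetical numeric features. The classifier is a nearest-centroid classifier, one of the simplest supervised methods, chosen for brevity rather than because the text prescribes it. Training averages the labeled examples of each genre, and a new song gets the label of the closest average.

```python
from math import dist


def train_centroids(songs, labels):
    """Average the feature vectors of each labeled genre."""
    sums, counts = {}, {}
    for features, label in zip(songs, labels):
        total = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            total[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in total] for label, total in sums.items()}


def classify(features, centroids):
    """Pick the genre whose average is closest (Euclidean distance) to the new song."""
    return min(centroids, key=lambda label: dist(features, centroids[label]))


# Hypothetical features: [steel-guitar twang, guitar distortion], scaled 0..1.
training_songs = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.1], [0.1, 0.9], [0.2, 0.8], [0.3, 0.9]]
training_labels = ["country", "country", "country", "rock", "rock", "rock"]
centroids = train_centroids(training_songs, training_labels)
print(classify([0.85, 0.15], centroids))  # a new song with lots of twang
print(classify([0.15, 0.95], centroids))  # a new song with lots of distortion
```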

But let's say that not all music can be so easily categorized. Some old rock music sounds an awful lot like folk music. Some folk music sounds a lot like the blues. In this case, you may want to try unsupervised learning. With unsupervised learning you give the band a large variety of songs — classical, folk, rock, jazz, rap, reggae, blues, heavy metal and so forth. Then you tell the band to categorize the music.

The band won't use terms like jazz, country, or classical. Instead it groups similar music together and applies its own labels, but the labels and groupings are likely to differ from the ones that you’re accustomed to. For example, the marching band may not distinguish between jazz and blues. It may also divide jazz music into two different categories, such as cool and classic.
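The unsupervised version can be sketched with k-means clustering. This is an assumption, since the text names no algorithm, but k-means is a common choice. The output is numbered groups, not genre names: as with the band, any labels have to come from a human afterward. The features are hypothetical. The deterministic farthest-first start is a simplification of the randomized starts most libraries use.

```python
from math import dist


def farthest_first(points, k):
    """Deterministic starting centers: each new one is the point farthest from those already chosen."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    return centroids


def k_means(points, k, iterations=10):
    centroids = farthest_first(points, k)
    for _ in range(iterations):
        # Assignment step: each song joins the group whose center it is closest to.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the average of its group (keep it if the group is empty).
        centroids = [
            [sum(coords) / len(cluster) for coords in zip(*cluster)] if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return clusters


# Unlabeled songs described by hypothetical features: [tempo, distortion], scaled 0..1.
songs = [[0.1, 0.2], [0.15, 0.1], [0.2, 0.15], [0.9, 0.8], [0.85, 0.9], [0.8, 0.85]]
for number, group in enumerate(k_means(songs, k=2)):
    print(f"group {number}: {group}")
```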

Having your marching band create its own categories has advantages and disadvantages. The band may create categories that humans never imagined, and these categories may actually be much more accurate than existing categories. On the other hand, the marching band may create far too many categories or far too few for its system to be of use.

When starting your own AI project, think about how you'd like to categorize your data. If you already have well-defined categories that you want the machine to use to classify input, you probably want to stick with supervised learning. If you’re unsure how to group and categorize the data or you want to look at the data in a new way, unsupervised learning is probably the better approach; it’s likely to enable the computer to identify similarities and differences you would probably overlook.

Prior to starting an AI project, the first choice you need to make is whether to use an expert system (a rules-based system) or machine learning. Basically, the choice comes down to the amount of data, the variation in that data and whether you have a clear set of steps for extracting a solution from that data. An expert system is best when you have a sequential problem and there are finite steps to find a solution. Machine learning is best when you want to move beyond memorizing sequential steps, and you need to analyze large volumes of data to make predictions or to identify patterns that you may not even know would provide insight; that is, when your problem contains a certain level of uncertainty.

Think about it in terms of an automated phone system.

Automated Phone System

Older phone systems are sort of like expert systems; a message tells the caller to press 1 for sales, 2 for customer service, 3 for technical support and 4 to speak to an operator. The system then routes the call to the proper department based on the number that the caller presses.

Newer, more advanced phone systems use natural language processing. When someone calls in, the message tells the caller to say what they’re calling about. A caller may say something like, “I’m having a problem with my Android smartphone,” and the system routes the call to technical support. If, instead, the caller said something like, “I want to upgrade my smartphone,” the system routes the call to sales.

The challenge with natural language processing is that what callers say and how they say it is uncertain. An angry caller may say something like, “That smartphone I bought from you guys three days ago is a piece of junk.” You can see that this is a more complex problem. The automated phone system would need accurate speech recognition and then be able to infer the meaning of that statement so that it could direct the caller to the right department.

With an expert system, you would have to manually input all the possible statements and questions, and the system would still run into trouble when a caller mumbled or spoke with an accent or spoke in another language.

In this case, machine learning would be the better choice. With machine learning, the system would get smarter over time as it created its own patterns. If someone called in and said something like, “I hate my new smartphone and want to return it,” and they were routed to sales and then transferred to customer service, the system would know that the next time someone called and mentioned the word “return,” that call should be routed directly to customer service, not sales.
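The self-improving router described above can be sketched as a word-counting model that learns from transfers. Everything here is hypothetical: the class, the stop-word list, and the department names are illustrations. A real system would sit behind speech recognition and use a far more capable language model.

```python
import re
from collections import Counter, defaultdict

# Hypothetical filler words the router ignores.
STOP_WORDS = {"i", "my", "to", "and", "the", "a", "it", "want", "this", "new", "can"}


def words(utterance):
    """Lowercase the utterance, split it into words, and drop filler words."""
    return [w for w in re.findall(r"[a-z']+", utterance.lower()) if w not in STOP_WORDS]


class LearningRouter:
    """Hypothetical call router that learns which words point to which department."""

    def __init__(self, default="sales"):
        self.default = default
        self.word_counts = defaultdict(Counter)  # word -> how often it led to each department

    def route(self, utterance):
        votes = Counter()
        for w in words(utterance):
            votes.update(self.word_counts.get(w, Counter()))
        return votes.most_common(1)[0][0] if votes else self.default

    def learn(self, utterance, correct_department):
        # Called when a call had to be transferred: remember where it should have gone.
        for w in words(utterance):
            self.word_counts[w][correct_department] += 1


router = LearningRouter()
call = "I hate my new smart phone and want to return it"
print(router.route(call))  # no experience yet, so it falls back to "sales"
router.learn(call, "customer service")  # the caller was transferred
print(router.route("Can I return this phone?"))  # now routed to customer service
```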

When you start an AI program, consider which approach is best for your specific use case. If you can draw a decision tree or flow chart to describe a specific task the computer must perform based on limited inputs, then an expert system is probably the best choice. It may be easier to set up and deploy, saving you time, money and the headaches of dealing with more complex systems. If, however, you’re dealing with massive amounts of data and a system that must adapt to changing inputs, then machine learning is probably the best choice.

Some AI experts mix these two approaches. They use an expert system to define some constraints and then use machine learning to experiment with different answers. So you have three choices — an expert system, machine learning or a combination of the two.

Fueling the rise of machine learning and deep learning is the availability of massive amounts of data, often referred to as big data. If you wanted to create an AI program to identify pictures of cats, you could access millions of cat images online. The same is true, or more true, of other types of data. Various organizations have access to vast amounts of data, including charge card transactions, user behaviors on websites, data from online games, published medical studies, satellite images, online maps, census reports, voter records, economic data and machine-generated data (from machines equipped with sensors that report the status of their operation and any problems they detect).

This treasure trove of data has given machine learning a huge advantage over symbolic systems. Having a neural network chew on gigabytes of data and report on it is much easier and quicker than having an expert identify and input patterns and reasoning schemas to enable the computer to deliver accurate responses.

In some ways the evolution of machine learning is similar to how online search engines evolved. Early on, users would consult website directories such as Yahoo! to find what they were looking for — directories that were created and maintained by humans. Website owners would submit their sites to Yahoo! and suggest the categories in which to place them. Yahoo! personnel would then vet the sites and add them to the directory or deny the request. The process was time-consuming and labor-intensive, but it worked well when the web had relatively few websites. When the thousands of websites proliferated into millions and then crossed the one billion threshold, the system broke down fairly quickly. Human beings couldn’t work quickly enough to keep the Yahoo! directories current.

In 2000, Yahoo! partnered with a smaller company called Google, which had developed a search engine to locate and categorize web pages. Google’s first search engine examined backlinks (pages that linked to a given page) to determine the relevance and authority of the given page and rank it accordingly in its search results. Since then, Google has developed additional algorithms to determine a page’s rank (or relevance); for example, the more users who enter the same search phrase and click the same link, the higher the ranking that page receives. This approach is similar to the way neurons in an artificial neural network strengthen their connections.
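Google's original backlink-based ranking was PageRank. The sketch below is the simplified textbook formulation, computed by power iteration. It is not Google's production algorithm. The damping factor of 0.85 is the value usually quoted from the original PageRank paper, and the four-page web is hypothetical.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration; links maps each page to the pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1 / n for p in pages}
    for _ in range(iterations):
        # Every page keeps a small baseline share of rank.
        new_rank = {p: (1 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page passes its rank in equal shares to the pages it links to.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical four-page web: A, B, and D all link to C; C links back to A.
web = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(web)
print(max(ranks, key=ranks.get), ranks)
```

Page C collects the most backlinks and ends up with the highest rank. Page A ranks second because its single backlink comes from the highly ranked C.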

The fact that Google is one of the companies most enthusiastic about AI is no coincidence. The entire business has been built on using machines to interpret massive amounts of data. Rosenblatt's perceptrons could look through only a couple of grainy images. Now we have processors that are at least a million times faster sorting through massive amounts of data to find content that’s most likely to be relevant to whatever a user searches for.

Deep learning architecture adds even more power, enabling machines to identify patterns in data that just a few decades ago would have been nearly imperceptible. With more layers in the neural network, it can perceive details that would go unnoticed by most humans. These deep learning networks look at so much data and create so many new connections that it’s not even clear how these programs discover the patterns.

A deep learning neural network is like a black box swirling together computation and data to determine what it means to be a cat. No human knows how the network arrives at its decision. Is it the whiskers? Is it the ears? Or is it something about all cats that we humans are unable to see? In a sense, the deep learning network creates its own model for what it means to be a cat, a model that as of right now humans can only copy or read, but not understand or interpret.

In 2012, a Google research team, in the project that became known as Google Brain, did just that. Developers fed 10 million images taken from YouTube videos into a network with about 1 billion connections, running on 16,000 processor cores. They didn’t label any of the data, so the network didn’t know what it meant to be a cat, a human, or a car. Instead the network just looked through the images and came up with its own clusters. It found that many of the videos contained a very similar cluster. To the network this cluster looked like this.

Cat Detection

A “cat” from “Building high-level features using large scale unsupervised learning”

Now as a human you might recognize this as the face of a cat. To the neural network this was just a very common something that it saw in many of the videos. In a sense it invented its own interpretation of a cat. A human might go through and tell the network that this is a cat, but this isn’t necessary for the network to find cats in these videos. In fact the network was able to identify a “cat” 74.8% of the time. In a nod to Alan Turing, the Cato Institute’s Julian Sanchez called this the “Purring Test.”

If you decide to start working with AI, accept the fact that your network might be sensing things that humans are unable to perceive. Artificial intelligence is not the same as human intelligence, and even though we may reach the same conclusions, we’re definitely not going through the same process.

The Hidden Layers of a Neural Network

In my previous post, "Artificial Neural Networks: The Basics," I explained what an artificial neural network (or simply a neural network) is and what it does. I also pointed out that what enables a neural network to perform its magic is the layering of neurons. A neural network consists of three types of layers: an input layer, one or more hidden layers, and an output layer.

The input and output layers are fairly self-explanatory. The input layer receives data from the outside world and passes it to the hidden layer(s) for processing. The output layer receives the processed data from the hidden layer(s) and conveys it in some way to the outside world.

However, what goes on in the hidden layer(s) is more mysterious.

The Purpose of the Hidden Layer

Early neural networks lacked a hidden layer. As a result, they were able to solve only linearly separable problems. For example, suppose you needed a neural network to distinguish cats from dogs. A neural network without a hidden layer could perform this task. It could create a linear model like the one shown below and classify all input that characterizes a cat on one side of the line and all input that characterizes a dog on the other.

Linear Problem

However, if you had a more complex problem, such as distinguishing different breeds of dogs, this linear neural network would fail the test. You would need several layers to examine the various characteristics of each breed.

What Goes on in the Hidden Layers?

Suppose you have a neural network that can identify a dog's breed simply by "looking" at a picture of a dog. A neural network capable of learning to perform this task could be structured in many different ways, but consider the following (admittedly oversimplified) example of a neural network with several layers, each containing 20 neurons (or nodes).

When you feed a picture of a dog into this fictional neural network, the input layer creates a map of the pixels that comprise the image, recording their positions and grayscale values (zero for black, one for white, and between zero and one for different shades of gray). It then passes this map along to the 20 neurons that comprise the first hidden layer.

The 20 neurons in the first hidden layer look for patterns in the map that identify certain features. One neuron may identify the size of the dog; another, its overall shape; another, its eyes; another, its ears; another, its tail; and so forth. The first hidden layer then passes its results along to the 20 neurons in the second hidden layer.

The neurons in the second hidden layer are responsible for associating the patterns found in the first layer with features of the different breeds. The neurons in this layer may assign a percentage to reflect the probability that a certain feature in the image corresponds to different breeds. For example, based solely on the ears in the image, the breed is 20% likely to be a Doberman, 30% likely to be a poodle, and 50% likely to be a Labrador retriever. The second hidden layer passes its results along to the third hidden layer.

The neurons in the third hidden layer compile and analyze the results from the second hidden layer and, based on the collective probabilities of the dog being a certain breed, determine what that breed is most likely to be. This final determination is then delivered to the output layer, which presents the neural network's determination.
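The layer-by-layer flow described above can be sketched as a forward pass through fully connected layers. This is an untrained toy with random weights, so its prediction is meaningless. Several details are assumptions, since the example does not specify them: the 8x8 input size, the ReLU activation in the hidden layers, and the softmax that turns the final scores into breed probabilities. In a real network, what each hidden neuron ends up detecting is learned during training rather than assigned.

```python
import math
import random

BREEDS = ["Doberman", "poodle", "Labrador retriever"]


def dense_layer(inputs, weights, biases, activation):
    """Fully connected layer: each neuron weighs every input, adds its bias, and applies the activation."""
    return [
        activation(sum(x * w for x, w in zip(inputs, neuron_weights)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


def relu(z):
    return max(0.0, z)


def softmax(scores):
    """Turn raw output scores into probabilities that sum to 1."""
    largest = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_layer(n_inputs, n_neurons, rng):
    weights = [[rng.gauss(0, 1 / math.sqrt(n_inputs)) for _ in range(n_inputs)] for _ in range(n_neurons)]
    return weights, [0.0] * n_neurons


rng = random.Random(42)
# Hypothetical sizes: an 8x8 grayscale image, three hidden layers of 20 neurons, one output per breed.
LAYER_SIZES = [64, 20, 20, 20, len(BREEDS)]
layers = [random_layer(a, b, rng) for a, b in zip(LAYER_SIZES, LAYER_SIZES[1:])]


def classify(pixels):
    activations = pixels
    for weights, biases in layers[:-1]:  # the three hidden layers
        activations = dense_layer(activations, weights, biases, relu)
    out_weights, out_biases = layers[-1]
    scores = dense_layer(activations, out_weights, out_biases, lambda z: z)
    return dict(zip(BREEDS, softmax(scores)))


image = [rng.random() for _ in range(64)]  # 0 = black, 1 = white, in between = gray
probabilities = classify(image)
print(max(probabilities, key=probabilities.get), probabilities)
```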

The Well-Connected Neurons

While the example I presented focuses on the layers of the neural network and the neurons (nodes) that comprise those layers, the connections between the neurons play a very important role in how the neural network learns and performs its task.

Every neuron in one layer is connected to every neuron in its neighboring layer. In the example I presented, that's 400 connections (20 × 20) between each pair of adjacent hidden layers. The strength of each connection can be dialed up or down to change the relative importance of input from one neuron to another. For example, each neuron in the first hidden layer can dial up or down its connection with each neuron in the input layer to determine what it needs to focus on in the image, just as you might focus on different parts of an image.
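The connection count is simple arithmetic. In a fully connected network, two adjacent layers with a and b neurons are joined by a × b weighted connections. The 64-pixel input and three-breed output below are hypothetical sizes. Only the three 20-neuron hidden layers come from the example.

```python
def count_connections(layer_sizes):
    """Weighted connections between each pair of adjacent, fully connected layers."""
    return [a * b for a, b in zip(layer_sizes, layer_sizes[1:])]


sizes = [64, 20, 20, 20, 3]  # input, three hidden layers, output
per_layer = count_connections(sizes)
print(per_layer, "connections:", sum(per_layer), "neurons:", sum(sizes))
```

That gives 1,280 + 400 + 400 + 60 = 2,140 connections among just 127 neurons.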

When the neural network is being trained with a set of training data, it is given the answers: it is shown pictures of each breed, each labeled with the name of the breed. During this training session, the neural network makes adjustments within the nodes and between the nodes (the connections). As the neural network is fed more and more images of dogs, it fine-tunes its connections and makes other adjustments to improve its accuracy over time.
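The adjustment step can be made concrete with backpropagation and gradient descent, the standard way such networks are trained. The text does not name a specific method, so this is an assumption. To keep the sketch short and dependency-free, it trains a tiny network on the XOR problem rather than on dog pictures: two inputs, three hidden neurons, and one output. The layer sizes, learning rate, epoch count, and seed are arbitrary choices. With other seeds, gradient descent can occasionally stall in a poor solution.

```python
import math
import random


def sigmoid(z):
    return 1 / (1 + math.exp(-z))


def train(samples, hidden=3, learning_rate=0.5, epochs=2000, seed=1):
    """Train a two-layer network with backpropagation and per-example gradient descent."""
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    # w_hidden[j][i] is the connection from input i to hidden neuron j.
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(hidden)]
    b_hidden = [0.0] * hidden
    w_out = [rng.uniform(-1, 1) for _ in range(hidden)]
    b_out = 0.0

    def forward(x):
        h = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b) for ws, b in zip(w_hidden, b_hidden)]
        y = sigmoid(sum(w * hj for w, hj in zip(w_out, h)) + b_out)
        return h, y

    def mean_squared_error():
        return sum((forward(x)[1] - t) ** 2 for x, t in samples) / len(samples)

    before = mean_squared_error()
    for _ in range(epochs):
        for x, t in samples:
            h, y = forward(x)
            # Error signal at the output (gradient of 0.5 * (y - t)**2 through the sigmoid).
            delta_out = (y - t) * y * (1 - y)
            # Each hidden neuron's share of the blame, passed back through its connection.
            delta_hidden = [delta_out * w_out[j] * h[j] * (1 - h[j]) for j in range(hidden)]
            # Adjust the hidden-to-output connections and the output bias.
            for j in range(hidden):
                w_out[j] -= learning_rate * delta_out * h[j]
            b_out -= learning_rate * delta_out
            # Adjust the input-to-hidden connections and the hidden biases.
            for j in range(hidden):
                for i in range(n_in):
                    w_hidden[j][i] -= learning_rate * delta_hidden[j] * x[i]
                b_hidden[j] -= learning_rate * delta_hidden[j]
    return before, mean_squared_error(), forward


xor = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
before, after, forward = train(xor)
print(f"mean squared error before training: {before:.3f}, after: {after:.3f}")
for x, t in xor:
    print(x, "target", t, "output", round(forward(x)[1], 2))
```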

Again, this example is oversimplified, but it gives you a general idea of how artificial neural networks operate. The key points to keep in mind are that artificial neural networks contain far more connections than they contain neurons, and that they learn by making adjustments within and between neurons.

Artificial Neural Networks: The Basics

An artificial neural network (often referred to simply as a neural network) is a computer system modeled after the structure of a biological brain that facilitates machine learning.

The human brain is composed of about 100 billion neurons that communicate with one another electrochemically across minute gaps called synapses. A single neuron can have up to 10,000 connections with other neurons. Working together, neurons are responsible for receiving sensory input from the external world, regulating bodily functions, controlling muscle movement, forming and recording memories and thoughts, and more.

Neurons increase the strength of their connections based on learning and practice. Whether you're studying a new language, learning to play a musical instrument, or training for the World Cup, your neurons strengthen existing connections and create new connections for developing the requisite knowledge and skills. That's why the more you practice the better you get; selected neurons build new and more efficient paths between and among one another. Eventually, with enough study and practice, you perform certain tasks with little to no conscious effort.

How an Artificial Neural Network Is Structured

Instead of being made up of neurons, an artificial neural network consists of nodes. Each node receives input from one or more other nodes or from an external source and computes an output. A node's output is then sent to one or more other nodes in the neural network or is communicated to the outside world. This communication might be as the answer to a question or as the solution to a problem.

Nodes are arranged in layers: an input layer, hidden layers, and an output layer. Data (such as a spoken word or phrase, an image, or a question) enters the input layer, is processed in the hidden layers, and the result is delivered via the output layer.

Artificial Neural Network

The Marching Band Analogy

Picture nodes in a neural network as players in a marching band and each row of band members as a layer. Assume that none of the players knows the music to be played or how to move during the performance. Only the front row of band members can see the band leader (the drum major). The drum major gives the first row a signal that's passed through the remaining rows (layers), enabling all players to coordinate their movements and the playing of their instruments.

Artificial Neural Network Band

At first, players would be bumping into one another and playing the wrong notes, but with more and more practice, the players would get in sync and perform as a unit. They would learn.

To smooth the learning curve, the band creates a system that enables band members to provide feedback. As they move and play, the band members choose numbers that indicate their level of confidence (say from 0 to 100 percent) that they are doing it right. Based on each band member's confidence level, neighboring band members make small adjustments and then check to see whether their adjustments increased or decreased their neighbor's confidence level. The goal is to achieve a 100 percent confidence level for all band members.

The idea here is that this neural marching band network will learn on its own without additional input or correction from an outside source. Theoretically, at least, the nodes will eventually make enough small adjustments to produce the correct output (a stellar performance) through trial and error, learning from their mistakes.
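The band's feedback loop is a form of trial-and-error search. The sketch below shows the simplest version, random hill climbing: nudge one setting at random and keep the change only if the confidence score goes up. The score function, target values, and step size are hypothetical. Real neural networks are usually trained with gradient-based methods rather than blind trial and error.

```python
import random


def hill_climb(score, params, step_size=0.1, tries=2000, seed=0):
    """Trial and error: nudge one parameter at random; keep the change only if the score improves."""
    rng = random.Random(seed)
    best = score(params)
    for _ in range(tries):
        candidate = list(params)
        i = rng.randrange(len(candidate))
        candidate[i] += rng.uniform(-step_size, step_size)
        candidate_score = score(candidate)
        if candidate_score > best:  # confidence went up: keep the adjustment
            params, best = candidate, candidate_score
    return params, best


# Hypothetical "confidence" score that peaks at 100 when every setting hits its target.
TARGET = [0.3, -1.2, 0.8]


def confidence(settings):
    return 100 - sum((s - t) ** 2 for s, t in zip(settings, TARGET))


start = [0.0, 0.0, 0.0]
settings, final = hill_climb(confidence, start)
print(round(confidence(start), 2), "->", round(final, 2))
```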

Expediting the Learning Process

As you can imagine, learning by trial and error can be very chaotic and time-consuming, especially when you have multiple entities making their own adjustments based on input from numerous other entities. In the case of our fictional marching band, band members would be bumping into one another and playing the wrong notes for hours, days, or weeks before they actually coordinated their efforts.

To overcome this challenge, AI developers attempt to tweak the network to make it more efficient. For example, suppose you gave more weight to feedback from the drummers because they set the rhythm. Perhaps you give their confidence level four times as much importance as that of the other band members. Now, when the band members make adjustments, they look more to the drummers to determine the net impact of the adjustments they made, and the marching band learns much faster.
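Giving the drummers' feedback four times the importance amounts to a weighted average. The band members and their confidence numbers below are hypothetical.

```python
def weighted_confidence(members):
    """Weighted average of confidence levels; members is a list of (confidence, weight) pairs."""
    total_weight = sum(weight for _, weight in members)
    return sum(conf * weight for conf, weight in members) / total_weight


# Hypothetical band: two drummers (weight 4) and three other players (weight 1).
band = [(90, 4), (80, 4), (50, 1), (60, 1), (40, 1)]
print(weighted_confidence(band))
```

Here the weighted confidence is 830 / 11, about 75.5, versus a plain average of 64. The drummers' higher confidence dominates the overall signal.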

Eventually, the band delivers a nicely choreographed and well-orchestrated performance to the output layer. If this were a neural network, the output could then be stored, and whenever instructed to do so, it could repeat its performance. In addition, the strengthened connections between certain neurons might make learning new musical arrangements easier.

Real-World Examples of Machine Learning Applications

In a previous post "What Is Machine Learning?" I discuss how machine learning developed as a way to overcome certain limitations in the early days of artificial intelligence. Without machine learning, machines would be able to do only what they were told or were programmed to do. Machine learning expands their capabilities beyond what they were merely programmed to do.

As shown below, machine learning has real-world applications across a wide variety of fields ranging from data security and software development to investing and healthcare.

Practical Applications of ML

One of the best ways to understand machine learning is to look at the various applications of machine learning in the real world:

Data security: Malware (viruses, worms, etc.) is constantly evolving to avoid detection, but changes to malware code typically constitute only about two to ten percent of code; the rest of the code remains unchanged. With machine learning, security software can identify patterns in the code and distinguish what has changed from what hasn't. This enables the software to identify new versions of malware. Machine learning is also useful for detecting early warning signs of infection from unknown malware, such as an unexplained drop in available system resources.

Investing: Machine learning algorithms drive about 70 percent of all trading volume on the U.S. stock exchanges. With machine learning, computers can process vast amounts of financial data and quickly analyze stocks, bonds, trading trends, and other information to identify which investments have the greatest potential for positive returns. Computers are also capable of executing trades faster than humanly possible, which may provide investors with another advantage.

Online software development: Software developers can use machine learning to create software that automatically adapts to user behaviors. For example, as someone who plays an online game becomes more skilled, the game can make itself more challenging. Developers can also use machine learning to identify ideas for new features and new ways to monetize the software.

Healthcare: It is highly unlikely that machines will replace doctors anytime soon, but machine learning has become a valuable tool in the healthcare field. Machine learning can identify patterns in medical images or symptoms to improve the accuracy of diagnoses and treatments. Machines may also be better at reviewing the medications a patient is taking and alerting the patient or pharmacist to possible drug interactions.

Personalized marketing: Companies have been using machine learning for some time to market their products and services to consumers. For example, Google and Amazon keep track of your search and purchase history in order to make targeted product recommendations. Netflix and Spotify use machine learning to recommend movies and music based on your viewing or listening history.

Fraud detection and prevention: Credit card companies keep track of where cardholders use their cards, what they buy, the average transaction amount, and more. These companies then use machine learning algorithms to identify any transactions that break the cardholder's usage patterns. Any suspicious activity triggers a fraud alert and possibly an automatic suspension of the account. The cardholder may then be required to call the credit card company to have the suspension lifted.

Online searches: Google, Bing, Yahoo!, and other search engines use machine learning to rank items in their search results, which is why search results typically differ based on several factors, including your browser's search history, your current geographical location, and the relevance of various websites to the search word or phrase. If you use your smartphone to search for "grocery store," for example, you're likely to be presented a list of grocery stores in your general vicinity.

Smart devices: Smart devices collect data regarding their usage, then personalize their operation based on those patterns. For example, a smart home may learn that whenever you unlock the front door at a certain time in the evening, it means you have returned home from work. The smart lock then signals the smart thermostat to adjust the temperature accordingly. Smart devices may even use facial recognition technology and security cameras to identify a home's residents and then warn the homeowner (or notify police) if someone other than a resident approaches or enters the home at certain times.

Self-driving cars: Self-driving cars have made the transition from science fiction to the real world. By combining machine learning, video, GPS, robotics, and a host of other technologies, cars can now drive themselves, although some mishaps have occurred.

These are only a few of the vast number of machine learning applications that are possible. As machine learning matures, you are likely to see many more real-world applications and consumer products and services driven by machine learning.
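The fraud-detection application boils down to spotting transactions that break a cardholder's usual pattern. The sketch below is a minimal statistical stand-in for the learned models card issuers actually use. It flags any amount more than three standard deviations from the cardholder's average. The threshold and the transaction history are hypothetical.

```python
from statistics import mean, stdev


def is_suspicious(history, amount, threshold=3.0):
    """Flag an amount more than `threshold` standard deviations from the cardholder's average."""
    avg = mean(history)
    spread = stdev(history)  # needs at least two past transactions
    if spread == 0:
        return amount != avg
    return abs(amount - avg) / spread > threshold


# Hypothetical recent transaction amounts for one cardholder.
history = [42.10, 38.75, 55.00, 47.30, 40.20, 51.90, 44.60]
print(is_suspicious(history, 49.99))    # in line with past spending
print(is_suspicious(history, 1250.00))  # far outside the usual pattern
```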

The symbolic approach and AI planning work great for applications that have a limited number of matching patterns; for example, a program that helps you complete your tax return. The IRS provides a limited number of forms and a collection of rules for reporting tax-relevant data. Combine the forms and instructions with the capability to crunch numbers and some heuristic reasoning, and you have a tax program that can step you through the process. With heuristic reasoning, introduced in the previous chapter, you can limit the number of patterns; for example, if you earned wages from an employer, you report the income shown on the W-2 form your employer sends you. If you earned money as a sole proprietor, you complete Schedule C.
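In an expert system like the tax program described above, a human writes the rules and the program simply applies them. The two rules below are deliberately simplified, hypothetical illustrations of that structure. They are not a complete or authoritative statement of IRS requirements.

```python
# Each rule pairs a condition on the taxpayer's facts with the form it calls for.
# Hypothetical, simplified rules written by a human expert.
RULES = [
    (lambda facts: facts.get("wages", 0) > 0, "Form 1040: report wages shown on Form W-2"),
    (lambda facts: facts.get("sole_proprietor_income", 0) > 0, "Schedule C"),
]


def forms_needed(facts):
    """Apply every rule in order and collect the forms whose conditions are met."""
    return [form for condition, form in RULES if condition(facts)]


print(forms_needed({"wages": 52000}))
print(forms_needed({"wages": 30000, "sole_proprietor_income": 12000}))
```

The program never learns anything. When the rules change, someone has to edit `RULES` by hand, which is exactly the maintenance burden the next paragraph describes.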

The limitation with this approach is that the database is difficult to manage, especially when rules and patterns change. For example, malware (viruses, spyware, computer worms and so forth) evolves too quickly for anti-malware companies to update their databases manually. Likewise, digital personal assistants, such as Siri and Alexa, need to constantly adapt to unfamiliar requests from their owners.

To overcome these limitations, early AI researchers started to wonder whether computers could be programmed to learn new patterns. Their curiosity led to the birth of machine learning — the science of getting computers to do things they weren't specifically programmed to do.

Machine learning got its start very shortly after the first AI conference. In 1959, AI researcher Arthur Samuel published a description of a program that could play checkers. This program was different. It was designed to play against itself so it could learn how to improve. It learned new strategies from each game it played and after a short period of time began to consistently beat its own programmer.

A key advantage of machine learning is that it doesn't require an expert to create symbolic patterns and list out all the possible responses to a question or statement. On its own, the machine creates and maintains the list, identifying patterns and adding them to its database.

Imagine machine learning applied to the Chinese room experiment. The computer would observe the passing of notes between itself and the person outside the room. After examining thousands of exchanges, the computer identifies patterns of communication and adds common words and phrases to its database. Now it can use this collection of words and phrases to decipher the notes it receives more quickly and to assemble responses from whole words and phrases instead of building each one from a collection of individual characters. It may even create its own dictionary based on these matching patterns, so it has a complete response ready for certain notes it receives.
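The "learning" step in this thought experiment can be sketched in a few lines of Python. This is a minimal illustration, not how any particular system works: it assumes the notes have already been split into words (real Chinese text would need a word segmenter first), uses English placeholder notes, and treats any two-word phrase seen at least `min_count` times as worth remembering. The function name `learn_phrases` and the threshold are hypothetical.

```python
from collections import Counter

def learn_phrases(notes, min_count=2):
    """Count adjacent word pairs (bigrams) across all observed notes and
    keep the ones that occur at least min_count times."""
    counts = Counter()
    for note in notes:
        words = note.lower().split()
        # zip pairs each word with the word that follows it
        counts.update(zip(words, words[1:]))
    return {" ".join(pair): n for pair, n in counts.items() if n >= min_count}

# Hypothetical observed exchanges (placeholders for the Chinese notes)
notes = [
    "how are you today",
    "how are you",
    "thank you very much",
    "thank you",
]
print(learn_phrases(notes))  # {'how are': 2, 'are you': 2, 'thank you': 2}
```

No expert supplied these phrases; they emerge from the frequency counts, which is the key difference from the hand-built symbol database.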

Machine learning still qualifies as weak AI, because the computer doesn't understand what's being said; it only matches symbols and identifies patterns. The big difference is that instead of having an expert provide the patterns, the computer identifies patterns in the data. Over time, the computer becomes "smarter."

Machine learning has become one of the fastest-growing areas in AI, primarily because the cost of data storage and processing has dropped dramatically. We are currently in the era of data science and big data: extremely large data sets that can be analyzed by computer to reveal patterns, trends, and associations. Organizations are collecting vast amounts of data, and the big challenge is figuring out what to do with it all. Machine learning answers that challenge, because it can identify patterns even when you don't really know what you're looking for. In a sense, machine learning enables computers to find out what's inside your data and let you know what it found.

Machine learning moves past the limitations of symbolic systems. Instead of memorizing symbols, a computer system uses machine learning algorithms to create models of abstract concepts, detecting statistical patterns in massive amounts of data.

Statistical Dog

So a machine learning algorithm looks at the eight pictures of different dogs. It breaks these pictures down into individual dots, or pixels, and then examines those pixels to detect patterns. Maybe it sees that all of these animals have hair. Maybe it sees a pattern for noses or ears. It could even see a pattern that humans are unable to perceive. Collectively, the patterns create what might be considered a statistical expression of “dogness.”
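One crude way to picture a "statistical expression" of a concept is to average many examples pixel by pixel, an idea close to a nearest-centroid classifier. The sketch below assumes each picture has already been shrunk to a tiny list of grayscale pixel values between 0 and 1. The four-pixel "images" and the `prototype` function are made up for illustration, and real image-recognition systems learn far richer features than raw pixel averages.

```python
import math

def prototype(images):
    """Average each pixel position across all images to get one
    'typical' image: a crude statistical summary of the class."""
    n = len(images)
    return [sum(pixels) / n for pixels in zip(*images)]

# Hypothetical 4-pixel grayscale "dog" pictures (0.0 = black, 1.0 = white)
dogs = [
    [0.9, 0.8, 0.2, 0.1],
    [0.8, 0.9, 0.1, 0.2],
    [1.0, 0.7, 0.3, 0.1],
]
dogness = prototype(dogs)
print([round(p, 2) for p in dogness])  # [0.9, 0.8, 0.2, 0.13]

# A new picture counts as more "dog-like" the closer it lies to the prototype
new_picture = [0.85, 0.8, 0.25, 0.15]
other_picture = [0.1, 0.2, 0.9, 0.9]
print(math.dist(new_picture, dogness) < math.dist(other_picture, dogness))  # True
```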

Sometimes humans can help machines learn. We can feed the machine millions of pictures that we’ve already determined contain dogs, so the machine doesn’t have to worry about excluding images of cats, horses, or airplanes. This is called supervised learning, and the data, consisting of the label “dog” and the millions of pictures of dogs, is called a training set. Using the training set, a human being is teaching the machine that all of the patterns it identifies are characteristics of “dog.”
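Supervised learning can be illustrated with a perceptron, a single-layer neural network trained with the classic perceptron learning rule. The sketch assumes each picture has already been reduced to two made-up numeric features and labeled by a human: 1 for "dog" and 0 for "not dog." Although the paragraph above describes a training set of dog pictures only, the sketch adds a few "not dog" examples, because a classifier that has only ever seen one label has no boundary to learn. The feature values, function names, and learning rate are hypothetical.

```python
def predict(weights, bias, features):
    """Step activation: 1 ("dog") if the weighted sum plus bias is positive, else 0."""
    total = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if total > 0 else 0

def train(training_set, epochs=20, learning_rate=1.0):
    """Perceptron learning rule: nudge the weights and bias toward each
    misclassified example until the whole training set is correct."""
    weights = [0.0] * len(training_set[0][0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for features, label in training_set:
            error = label - predict(weights, bias, features)  # -1, 0, or 1
            if error != 0:
                mistakes += 1
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, features)]
                bias += learning_rate * error
        if mistakes == 0:  # converged: every labeled example is classified correctly
            break
    return weights, bias

# Hypothetical labeled training set: (features, label), 1 = dog, 0 = not dog
training_set = [
    ((2, 3), 1), ((3, 3), 1), ((3, 2), 1),
    ((0, 0), 0), ((1, 0), 0), ((0, 1), 0),
]
weights, bias = train(training_set)
print(weights, bias)                       # [1.0, 2.0] -2.0
print(predict(weights, bias, (2.5, 2.5)))  # 1
print(predict(weights, bias, (0.5, 0.2)))  # 0
```

The human contribution is entirely in the labels; the weights that separate "dog" from "not dog" are found by the training loop.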

Machines can also learn completely on their own. We just feed massive amounts of data into the machine and let it find its own patterns. This is called unsupervised learning.

Imagine a machine examining all the pictures of people on your smartphone. It might not know whether someone is your husband, wife, boyfriend, or girlfriend, but it could group the faces into clusters and identify the people who appear to be closest to you.
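The text doesn't name a specific unsupervised algorithm, so the sketch below swaps in k-means clustering, one of the simplest. It assumes each face has already been turned into a pair of numbers (real systems use much longer feature vectors), and it uses a naive initialization (the first k points) where production code would typically use something like k-means++. The points and k = 2 are hypothetical. Note that no labels are supplied: the algorithm only discovers that the points fall into two groups.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of its assigned points; repeat until stable."""
    centroids = [list(p) for p in points[:k]]  # naive init: first k points
    labels = []
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for each point
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: recompute each centroid from its members
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster in place
        if new_centroids == centroids:  # nothing moved: converged
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical 2-number "face features" with no names or labels attached
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```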

Combining Memorization with Generalization and Specification

In one of my previous posts, "The General Problem Solver," I discussed an approach to artificial intelligence (AI) referred to as the physical symbol system hypothesis (PSSH). The theory behind this approach is that human intelligence consists of the ability to take symbols (recognizable patterns), combine them into structures (expressions), and manipulate them using various processes to produce new expressions.

As philosopher John Searle pointed out with his Chinese room argument, this ability, in and of itself, does not constitute intelligence, because it requires no understanding of the symbols or expressions. For example, if you ask your virtual assistant (Siri, Alexa, Bixby, Cortana, etc.) a question, it searches through a list of possible responses, chooses one, and provides that as the answer. It doesn't understand the question and has no desire to answer the question or to provide the correct answer.

AI built on the PSSH is proficient at storing lots of data (memorization) and pattern-matching, but it is not so good at learning. Learning requires the ability not only to memorize but also to generalize and specify.

Memorization + Generalization + Specification = Learning

Human evolution has made us experts at memorization, generalization, and specification. To a large degree, it is how we learn. We record (memorize) details about our environment, experiences, thoughts, and feelings; form generalizations that enable us to respond appropriately to similar environments and experiences in the future; and then fine-tune our impressions through specification. For example, if you try to pet a dog, and it snaps at you, you may generalize to avoid dogs in the future. However, over time, you develop more nuanced thoughts about dogs—that not all dogs snap when you try to pet them and that there are certain ways to approach dogs that make them less likely to snap at you.

The combination of memorization, generalization, and specification is a valuable survival skill. It also plays a key role in enabling machine learning — providing machines with the ability to recognize unfamiliar patterns based on what they already "know" about familiar patterns.


Look at the following eight images. You can tell that the eight images represent different breeds of dogs, even though these aren't photographs of actual dogs.

Eight Dogs

You know that these are all pictures of dogs, because you have encountered many dogs in your life (in person, in photos and drawings, and in videos), and you have formed in your mind an abstract idea of what a dog looks like.

In their early stages of learning, children often overgeneralize. For example, upon learning the word "dog," they call all furry creatures with four legs "dog." To these children, cats are dogs, cows are dogs, sheep are dogs, and so on. This is where specification comes into play. As children encounter different species of four-legged mammals, they begin to identify the qualities that make them distinct.

Computers are far better than humans at memorization and far worse at generalization and specification. The computer could easily memorize the eight images of dogs, but if you fed the computer a ninth image of a dog, as shown below, it would likely struggle to match it to one of the existing images and identify it as an image of a dog. In other words, the computer would quickly master memorization and pattern-matching but struggle to learn due to its inability to generalize and specify.
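The gap between memorizing and generalizing can be shown with a toy example. The first function below recognizes only a picture it has stored exactly; the second accepts any picture within a fixed distance of a stored one, which is a very crude stand-in for generalization. The pixel values, the 0.3 threshold, and both function names are assumptions for illustration, and distance in raw pixel space is far too simplistic for real photos.

```python
import math

# Hypothetical memorized "dog" images, stored as tiny pixel tuples
memorized = {
    (0.9, 0.8, 0.2, 0.1),
    (0.8, 0.9, 0.1, 0.2),
    (1.0, 0.7, 0.3, 0.1),
}

def recognize_by_memory(image):
    """Pure memorization: succeeds only on an exact match."""
    return tuple(image) in memorized

def recognize_by_similarity(image, threshold=0.3):
    """Crude generalization: accept the image if it is close enough
    to any memorized example."""
    return any(math.dist(image, known) <= threshold for known in memorized)

ninth_dog = (0.85, 0.8, 0.25, 0.15)  # a new image, never seen before
print(recognize_by_memory(ninth_dog))      # False
print(recognize_by_similarity(ninth_dog))  # True
```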

For this same reason, language translation programs have always struggled with accuracy. Developers have created physical symbol systems that translate words and phrases from one language to another, but these programs never really learn the language. Instead, they function merely as an old-fashioned foreign language dictionary and phrasebook — quickly looking up words and phrases in the source language to find the matching word or phrase in the destination language and then stitching together the words and phrases to provide the translation. These translation programs often fail when they encounter unfamiliar words, phrases, and even syntax (word order).
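A phrasebook-style translator can be sketched in a few lines. This toy English-to-Spanish version is hypothetical (the tiny dictionaries and the function name are made up): it tries a whole-phrase lookup first, falls back to word-by-word lookup, and flags unknown words. The second example shows the word-order problem: Spanish normally places this adjective after the noun ("el coche rojo"), but a word-by-word lookup keeps the English order.

```python
# Hypothetical English-to-Spanish phrasebook and dictionary
PHRASES = {"how are you": "cómo estás"}
WORDS = {"the": "el", "red": "rojo", "car": "coche", "is": "es", "fast": "rápido"}

def phrasebook_translate(sentence):
    """Look up the whole phrase first, then fall back to word-by-word
    lookup in the source word order. Unknown words are flagged."""
    text = sentence.lower().strip("?!. ")
    if text in PHRASES:
        return PHRASES[text]
    return " ".join(WORDS.get(word, f"[{word}?]") for word in text.split())

print(phrasebook_translate("How are you?"))   # cómo estás
print(phrasebook_translate("The red car"))    # el rojo coche  (correct: el coche rojo)
print(phrasebook_translate("The red truck"))  # el rojo [truck?]
```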

Making Machines That Can Learn

Currently, machines have the ability to learn. The big challenge in the future of artificial intelligence and machine learning will be to enable machines to do a better job of generalizing and specifying. Instead of merely matching memorized symbols, newer machines will create abstract models based on the patterns they observe (the patterns they are fed). These models will have the potential to help these machines learn more effectively, so they can more accurately interpret unfamiliar future input.

As machines become better at generalizing and specifying, they will achieve greater levels of intelligence. It remains to be seen, however, whether machines will ever have the capacity to develop self-awareness and self-determination, which are key characteristics of human intelligence.

Artificial Intelligence Planning

Artificial intelligence planning is a branch of AI whose purpose is to identify strategies and action sequences that will, with a reasonable degree of confidence, enable the AI program to deliver the correct answer, solution, or outcome.

As I explained in a previous post, "The General Problem Solver," one of the limitations of early AI, which was based on the physical symbol system hypothesis (PSSH), is combinatorial explosion: a mathematical phenomenon in which the number of possible combinations increases beyond the computer's capability to explore all of them in a reasonable amount of time.
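A quick calculation shows how fast the number of possibilities grows. The sketch counts the orders in which a program could visit n stops (n factorial) and estimates how long checking every order would take at an assumed rate of one billion orderings per second. That rate is a round number chosen for illustration, not a benchmark.

```python
import math

CHECKS_PER_SECOND = 1_000_000_000  # assumed speed, for illustration only
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

def orderings(n):
    """Number of distinct orders in which n stops can be visited: n!"""
    return math.factorial(n)

def years_to_check_all(n):
    """Time to examine every ordering at the assumed speed, in years."""
    return orderings(n) / CHECKS_PER_SECOND / SECONDS_PER_YEAR

for n in (5, 10, 15, 20):
    print(n, orderings(n))
# 5 120
# 10 3628800
# 15 1307674368000
# 20 2432902008176640000

print(round(years_to_check_all(20)))  # 77
```

Going from 5 stops to 20 turns a trivial task into one that would take decades at this speed.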

Heuristic Reasoning

AI planning attempts to solve the problem of combinatorial explosion by using something called heuristic reasoning — an approach that attempts to give artificial intelligence a form of common sense. Heuristic reasoning enables an AI program to rule out a large number of possible combinations by identifying them as impossible or highly unlikely. This approach is sometimes referred to as "limiting the search space."

A heuristic is a mental shortcut or rule of thumb that enables people to solve problems and make decisions quickly. For example, the Rule of 72 is a heuristic for estimating how many years it will take an investment to double your money. You divide 72 by the annual rate of return (expressed as a percentage), so an investment with a 6% annual return would double your money in about 72/6 = 12 years.
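The heuristic can be compared with the exact calculation. Assuming annual compounding, money doubles when (1 + r)^t = 2, so t = ln 2 / ln(1 + r). The sketch below (the function names are hypothetical) shows the shortcut landing within about a tenth of a year of the exact answer at 6%.

```python
import math

def rule_of_72(rate_percent):
    """Heuristic: years to double is roughly 72 / annual rate (in percent)."""
    return 72 / rate_percent

def exact_doubling_time(rate_percent):
    """Exact answer with annual compounding: solve (1 + r)^t = 2 for t."""
    return math.log(2) / math.log(1 + rate_percent / 100)

print(rule_of_72(6))                     # 12.0
print(round(exact_doubling_time(6), 2))  # 11.9
```

The heuristic skips the logarithms entirely and is still close enough for a quick decision, which is the point of a rule of thumb.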

Heuristic reasoning is common in innovation. Inventors rarely consider all the possibilities for solving a particular problem. Instead, they start with an idea, a hypothesis, or a hunch based on their knowledge and prior experience, then they start experimenting and exploring from that point forward. If they were to consider all the possibilities, they would waste considerable time, effort, energy, and expertise on futile experiments and research.

Heuristic Reasoning Combined with a Physical Symbol System

With AI planning, you might combine heuristic reasoning with a physical symbol system to improve performance. For example, imagine heuristic reasoning applied to the Chinese room experiment I introduced in my previous post on the general problem solver.

In the Chinese room scenario, you, an English-only speaker, are locked in a room with a narrow slot on the door through which notes can pass. You have a book filled with long lists of statements in Chinese, and the floor is covered in Chinese characters. You are instructed that upon receiving a certain sequence of Chinese characters, you are to look up a corresponding response in the book and, using the characters strewn about the floor, formulate your response.


What you do in the Chinese room is very similar to how AI programs work. They simply identify patterns, look up entries in a database that correspond to those patterns, and output the entries in response.

With the addition of heuristic reasoning, AI could limit the possibilities for the first note. For example, you could program the software to expect a message such as "Hello" or "How are you?" In effect, this would limit the search space, so that the AI program would have to search only a limited number of records in its database to find an appropriate response. It wouldn't get bogged down searching the entire database to consider all possible messages and responses.

The only drawback is that if the first message were not one of those anticipated, the AI program would need to search its entire database.
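The heuristic described above can be sketched as a two-stage lookup. Everything here is hypothetical: the tiny response database, the list of expected openers, and the function name. A linear scan stands in for an expensive search through a large database (a real Python dict would look keys up directly), and the function also returns how many entries it examined so the savings are visible.

```python
# Hypothetical response database: incoming note -> reply
DATABASE = {
    "hello": "hi",
    "how are you?": "fine, thanks",
    "what is your name?": "I have no name",
    # ...imagine thousands more entries here
}

# Heuristic: the first note is probably a greeting
EXPECTED_OPENERS = ["hello", "how are you?"]

def respond(note, first_note=True):
    """Return (reply, entries_examined). For a first note, check only the
    expected openers; fall back to scanning the whole database if needed."""
    examined = 0
    if first_note:
        for key in EXPECTED_OPENERS:  # small, limited search space
            examined += 1
            if key == note:
                return DATABASE[key], examined
    for key, reply in DATABASE.items():  # fallback: search everything
        examined += 1
        if key == note:
            return reply, examined
    return None, examined

print(respond("hello"))               # ('hi', 1)
print(respond("what is your name?"))  # ('I have no name', 5)
```

The anticipated greeting is found after a single comparison, while the unexpected question triggers the full search the paragraph warns about.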

A Real-World Example

Heuristic reasoning is commonly employed in modern AI applications. For example, if you enter your location and destination in a GPS app, the app doesn't search its vast database of source data, which consists of satellite and aerial imagery; state, city, and county maps; US Geological Survey data; traffic data; and so on. Instead, it limits the search space to the area that encompasses the location and destination you entered. In addition, it limits the output to the fastest or shortest route (not both) depending on which setting is in force, and it likely omits a great deal of detail from its maps to further expedite the process.

The goal is to deliver an accurate map and directions, in a reasonable amount of time, that lead you from your current location to your desired destination as quickly as possible. Without the shortcuts to the process provided by heuristic reasoning, the resulting combinatorial explosion would leave you waiting for directions . . . possibly for the rest of your life.

Good Old-Fashioned AI

Even though many of the modern AI applications are built on what are now considered old-fashioned methods, AI planning allows for the intelligent combination of these methods, along with newer methods, to build AI applications that deliver the desired output. The resulting applications can certainly make computers appear to be intelligent beings — providing real-time guidance from point A to point B, analyzing contracts, automating logistics, and even building better video games.

If you're considering a new AI project, don't be quick to dismiss the benefits of good old-fashioned AI (GOFAI). Newer approaches may not be the right fit.