The Role of Big Data in Machine Learning

Published August 5, 2021
Doug Rose
Author | Agility | Artificial Intelligence | Data Ethics

In a previous article, "What Is Machine Learning?", I explain the basics of machine learning (ML) and point out that big data plays a key role in it. However, I stopped short of explaining the connection between ML and big data in detail. In this article, I take a closer look at the role that big data plays in machine learning.

Key Machine Learning Components

Machine learning requires the following four key components:

  1. An input device to bring data from the outside world into the machine in a digital format.
  2. One or (usually) more powerful computer processors running in parallel.
  3. A number of machine learning algorithms that run on those processors. (An algorithm is a process or set of rules to be followed in calculations or other problem-solving operations.) In an artificial neural network, these algorithms are organized into artificial neurons, the smallest units of the network. The neurons are arranged in layers: an input layer, one or more hidden layers, and an output layer.
  4. Data sets that the machine can use to identify patterns in the data. The more high-quality data the machine has, the more it can fine-tune its ability to identify patterns and anything that diverges from those patterns.
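
To make the third component more concrete, here is a minimal sketch of an artificial neuron and a layered network. It assumes a common textbook formulation (a weighted sum of inputs passed through a sigmoid function); the function names and the hand-picked weights are hypothetical and only for illustration. In a real system, the weights would be learned from data rather than typed in.

```python
import math


def neuron(inputs, weights, bias):
    """One artificial neuron: a weighted sum of the inputs, squashed by a sigmoid."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-total))


def layer(inputs, weight_rows, biases):
    """A layer is a list of neurons that all read the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]


def forward(inputs, network):
    """Pass the inputs through each layer in turn and return the final outputs."""
    activations = inputs
    for weight_rows, biases in network:
        activations = layer(activations, weight_rows, biases)
    return activations


# Hypothetical, hand-picked weights: 2 inputs -> 2 hidden neurons -> 1 output.
network = [
    ([[0.5, -0.5], [-0.5, 0.5]], [0.0, 0.0]),  # hidden layer (two neurons)
    ([[1.0, 1.0]], [-1.0]),                     # output layer (one neuron)
]

output = forward([1.0, 0.0], network)
print(output)
```

The input list plays the role of the input layer, and each `(weights, biases)` pair in `network` is one further layer. Training a network means adjusting those numbers until the output matches the examples in the data set.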

Three Types of Machine Learning

Machines learn in the following three ways:

  1. Supervised learning: With supervised learning, a trainer feeds a set of labeled data into the computer. This enables the computer to then identify patterns in that data and associate them with the labels provided. For example, a trainer may feed in photographs of 20 cats and tell the machine, "These are cats." She then feeds in 20 photos of dogs and tells the machine, "These are dogs." Finally, she feeds the machine 20 random photographs of dogs and cats without telling the machine whether each picture is of a dog or a cat. When the machine makes an error, the trainer corrects it, so the neural network can tune itself to greater accuracy.
  2. Unsupervised learning: With unsupervised learning, data that is neither classified nor labeled is fed into the system. The system then identifies hidden patterns in the data that humans may be unable to detect or may have overlooked. Unsupervised learning is primarily used for clustering. For example, you'd feed in 100 photos of animals and tell the machine to divide them into five groups. The machine then looks for matching patterns in the photos and creates five groups based on similarities and differences in the photos. These may be groups a human would recognize, such as cat, dog, snake, octopus, and elephant, or they may be groups you would never imagine, such as snakes, dogs, and cats all being in the same group because the neural network focused on the cat and dog tails instead of other features.
  3. Semi-supervised learning: This is a cross between supervised and unsupervised learning. Supervised learning is used initially to train the system on a small labeled data set. A large amount of unlabeled data is then fed into the system to increase its accuracy.
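
The three approaches above can be sketched in a few lines. This is a toy illustration under stated assumptions, not how a production system or a neural network would do it: each photo is reduced to two made-up numeric features, the supervised model is a simple nearest-centroid classifier, the unsupervised step is basic k-means clustering, and the semi-supervised step is self-training (one of several possible techniques). All names and data values are hypothetical.

```python
def mean(points):
    """Average a list of equal-length feature vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Supervised: learn one centroid per label from labeled examples.
def fit_centroids(examples):
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {label: mean(pts) for label, pts in by_label.items()}


def predict(centroids, features):
    """Return the label whose centroid is closest to the features."""
    return min(centroids, key=lambda label: dist2(centroids[label], features))


# Unsupervised: k-means groups unlabeled points into k clusters.
def kmeans(points, k, rounds=10):
    centers = [list(p) for p in points[:k]]  # naive start: the first k points
    groups = []
    for _ in range(rounds):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist2(centers[i], p))
            groups[nearest].append(p)
        # Keep the old center if a group ends up empty.
        centers = [mean(g) if g else centers[i] for i, g in enumerate(groups)]
    return groups


# Semi-supervised (self-training): label the unlabeled points with the
# model trained on the small labeled set, then refit on everything.
def self_train(labeled, unlabeled):
    centroids = fit_centroids(labeled)
    pseudo = [(p, predict(centroids, p)) for p in unlabeled]
    return fit_centroids(labeled + pseudo)


# Made-up features for illustration only.
labeled = [([4.0, 6.0], "cat"), ([5.0, 7.0], "cat"),
           ([20.0, 10.0], "dog"), ([30.0, 12.0], "dog")]
unlabeled = [[4.5, 6.5], [25.0, 11.0], [3.5, 6.0], [28.0, 12.0]]

model = fit_centroids(labeled)
print(predict(model, [6.0, 7.0]))
print(kmeans(unlabeled, 2))
print(self_train(labeled, unlabeled))
```

Note that `kmeans` never sees a label: it only reports which points belong together, which is why the resulting groups may or may not match the categories a human would choose.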

Big Data Fuels Machine Learning

As you can see, data is important for machine learning, but that is no surprise; data also drives human learning and understanding. Imagine trying to learn anything while floating in a deprivation tank; without sensory, intellectual, or emotional stimulation, learning would cease. Likewise, machines require input to develop their ability to identify patterns in data.

The availability of big data (massive and growing volumes of diverse data) has driven the development of machine learning by providing computers with the volume and types of data they need to learn and perform specific tasks. Just think of all the data that is now collected and stored — from credit and debit card transactions, user behaviors on websites, online gaming, published medical studies, satellite images, online maps, census reports, voter records, financial reports, and electronic devices (machines equipped with sensors that report the status of their operation).

This treasure trove of data has given neural networks a huge advantage over the physical symbol systems approach to artificial intelligence. Having a neural network chew on gigabytes of data and report on it is much easier and quicker than having an expert identify and input patterns and reasoning schemas to enable the computer to deliver accurate responses (as is done with the physical symbol systems approach).
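
The contrast between the two approaches can be illustrated with a deliberately tiny example. It is a sketch under stated assumptions: the card-transaction amounts, the expert's threshold, and the function names are all hypothetical, and a single learned threshold stands in for what a neural network would do with far more data and far more parameters.

```python
# Symbolic approach: an expert writes the rule by hand.
def expert_rule(amount):
    """Flag a transaction using a threshold a person chose (hypothetical value)."""
    return amount > 1000.0


# Learning approach: pick the threshold that makes the fewest mistakes
# on labeled examples.
def learn_threshold(examples):
    candidates = sorted(amount for amount, _ in examples)

    def errors(threshold):
        return sum((amount > threshold) != flagged for amount, flagged in examples)

    return min(candidates, key=errors)


# Made-up (amount, was_flagged) pairs for illustration only.
examples = [(20.0, False), (75.0, False), (310.0, False),
            (450.0, True), (900.0, True), (2500.0, True)]

threshold = learn_threshold(examples)
print(threshold)
print(expert_rule(450.0), 450.0 > threshold)
```

If the underlying behavior changes, the learned threshold can be recomputed from new data, whereas the hand-written rule stays wrong until someone notices and edits it.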

The Evolution of Machine Learning

In some ways, the evolution of machine learning is similar to how online search engines developed over time. Early on, users would consult website directories such as Yahoo! to find what they were looking for. These directories were created and maintained by humans. Website owners would submit their sites to Yahoo! and suggest the categories in which to place them. Yahoo! personnel would then review the submissions and either add the sites to the directory or deny the requests. The process was time-consuming and labor-intensive, but it worked well when the web had relatively few websites. When the thousands of websites proliferated into millions (on the way to more than a billion today), the system broke down fairly quickly. Human beings couldn't work quickly enough to keep the Yahoo! directories current.

In 2000, Yahoo! partnered with a smaller company called Google, which had developed a search engine to locate and categorize web pages. Google's first search engine examined backlinks (pages that linked to a given page) to determine each page's relevance and relative importance. Since then, Google has developed additional algorithms to determine a page's rank; for example, the more users who enter the same search phrase and click the same link, the higher the ranking that page receives. With the addition of machine learning algorithms, the accuracy of such systems tends to improve as the volume of data they have to draw on grows.
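
One well-known way to score pages by their backlinks is a PageRank-style iteration, in which each page repeatedly passes a share of its own score to the pages it links to. The sketch below is a simplified toy version under stated assumptions, not Google's production algorithm: the four-page web, the function name, the damping value of 0.85, and the fixed number of rounds are all choices made for illustration.

```python
def rank_pages(links, damping=0.85, rounds=50):
    """Score pages by backlinks. `links` maps each page to the pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal scores
    for _ in range(rounds):
        # Every page keeps a small baseline score.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's score evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical four-page web: three pages link to "home".
links = {
    "home": ["about"],
    "about": ["home"],
    "blog": ["home"],
    "shop": ["home", "blog"],
}

ranks = rank_pages(links)
best = max(ranks, key=ranks.get)
print(best)
```

A page's score depends on both how many pages link to it and how highly those linking pages are scored themselves, so no human has to review or categorize the pages by hand.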

So, what can we expect for the future of machine learning? The growth of big data isn't expected to slow down any time soon. In fact, it is expected to accelerate. As the volume and diversity of data expand, you can expect to see the applications for machine learning grow substantially, as well.
