
Choosing the Right Machine Learning Algorithm

Published August 8, 2021
Doug Rose
Author | Agility | Artificial Intelligence | Data Ethics

In my previous article, Machine Learning Algorithms, I explain what machine-learning algorithms are and describe the following commonly used algorithms:

  • Decision trees
  • K-nearest neighbor
  • K-means clustering
  • Regression analysis
  • Naïve Bayes

Based on the descriptions of the machine learning algorithms I presented in that post, you could already start to figure out which algorithm would be best for answering a certain type of question or solving a certain type of problem. In this article, I provide some additional guidance.

General Guidelines

Your choice of algorithm generally depends on what you want the algorithm to do:

  • Decision: If you want the machine to make a decision, choose the best course of action, or draw a conclusion based on the evidence provided, a decision tree algorithm is probably the best choice.
  • Classification and Clustering: If you want the machine to classify or categorize items into known classes, consider a classification algorithm, such as K-nearest neighbor or Naïve Bayes. If you want it to group unlabeled items, consider a clustering algorithm, such as K-means clustering.
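  • Prediction/Estimation: If you want the machine to predict a value in a continuous range of values, a regression algorithm, such as linear regression, is best. Logistic regression, despite its name, estimates the probability that an item belongs to a class, so it is better suited to classification problems.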

When choosing an algorithm, consider a more empirical (experimental) approach. After narrowing your choice to two or more algorithms, you can train and test the machine using each algorithm with the data you have and see which one delivers the most accurate results. For example, if you're looking at a classification problem, you can run your training data on K-nearest neighbor and Naïve Bayes and then run your test data through each of them to see which one is best able to accurately predict which class a particular unclassified entity belongs to.
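
The comparison described above can be sketched in a few lines of Python. This is a minimal, dependency-free illustration rather than a recommended implementation: the two-blob data set, the 150/50 train/test split, the choice of k = 5, and all function names are assumptions made up for this example, and the Naïve Bayes variant shown is the Gaussian one.

```python
import math
import random
from collections import Counter, defaultdict

def make_data(n_per_class, rng):
    """Synthetic two-feature data: two Gaussian blobs labeled 'a' and 'b' (made up)."""
    centers = {"a": (0.0, 0.0), "b": (3.0, 3.0)}
    rows = []
    for label, (cx, cy) in centers.items():
        for _ in range(n_per_class):
            rows.append(((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)), label))
    rng.shuffle(rows)
    return rows

def knn_predict(train, x, k=5):
    """K-nearest neighbor: majority vote among the k training points closest to x."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def fit_gaussian_nb(train):
    """Gaussian naive Bayes: per-class prior plus per-feature mean and variance."""
    by_class = defaultdict(list)
    for features, label in train:
        by_class[label].append(features)
    model = {}
    for label, rows in by_class.items():
        stats = []
        for column in zip(*rows):
            mean = sum(column) / len(column)
            # Small constant keeps the variance strictly positive.
            var = sum((v - mean) ** 2 for v in column) / len(column) + 1e-9
            stats.append((mean, var))
        model[label] = (len(rows) / len(train), stats)
    return model

def nb_predict(model, x):
    """Pick the class with the highest log posterior."""
    def log_posterior(label):
        prior, stats = model[label]
        score = math.log(prior)
        for value, (mean, var) in zip(x, stats):
            score += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)

def accuracy(predict, test):
    """Fraction of test rows whose predicted label matches the true label."""
    return sum(predict(x) == label for x, label in test) / len(test)

rng = random.Random(0)          # fixed seed so the run is repeatable
data = make_data(100, rng)
train, test = data[:150], data[150:]

nb_model = fit_gaussian_nb(train)
results = {
    "k-nearest neighbor": accuracy(lambda x: knn_predict(train, x), test),
    "naive Bayes": accuracy(lambda x: nb_predict(nb_model, x), test),
}
best = max(results, key=results.get)
print(results, "->", best)
```

On real data, the important part is the structure: both algorithms are trained on the same training rows and scored on the same held-out test rows, so the accuracy figures are directly comparable.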

Taking a More Systematic Approach

There is a more formal method for choosing a machine-learning algorithm, as presented in the following sections.

Step 1: Categorize the Problem

The first step is to figure out the nature of the problem you are trying to solve via machine learning. Categorize the problem by both input and output:

1. Categorize the problem by input:

  • Supervised learning, if the data is labeled.
  • Unsupervised learning, if the data is unlabeled and your goal is to discover hidden patterns in the data.
  • Reinforcement learning, if your goal is to optimize an objective, such as a cumulative reward, by having the machine interact with a given environment, for example by learning to play a game.

2. Categorize the problem by output:

  • Regression, if the model’s output is a number.
  • Classification, if the model’s output is a class.
  • Clustering, if the model’s output is a set of groups.
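
The two categorizations above amount to a small lookup, which could be written down as follows. This is only a sketch: the function name `categorize_problem`, its parameters, and the output labels "number", "class", and "groups" are hypothetical names chosen for this example.

```python
def categorize_problem(has_labels, learns_by_interaction, output_kind):
    """Return (learning_type, task_type) following the two lists above.

    output_kind is one of "number", "class", or "groups"
    (hypothetical labels chosen for this sketch).
    """
    # Categorize by input.
    if learns_by_interaction:
        learning_type = "reinforcement"
    elif has_labels:
        learning_type = "supervised"
    else:
        learning_type = "unsupervised"

    # Categorize by output.
    task_by_output = {
        "number": "regression",
        "class": "classification",
        "groups": "clustering",
    }
    if output_kind not in task_by_output:
        raise ValueError(f"unknown output kind: {output_kind!r}")
    return learning_type, task_by_output[output_kind]

print(categorize_problem(True, False, "number"))   # ('supervised', 'regression')
print(categorize_problem(False, False, "groups"))  # ('unsupervised', 'clustering')
```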

Step 2: Examine Your Data

The data you have also informs your choice of machine-learning algorithm:

  • Data quantity: Some algorithms perform well on small data sets, whereas others require very large data sets. For example, linear/logistic regression and naïve Bayes algorithms (with only a few parameters) may work with certain small data sets. Reinforcement learning may also work well with a small data set, because the machine will generate the data it needs through trial and error. In contrast, using unsupervised learning to solve a clustering problem typically requires a very large data set.
  • Descriptive (summary) statistics: Statistics that describe your data, such as percentiles, averages, medians, and correlations, can be valuable in identifying the right machine-learning algorithm. For example, if two variables have a strong linear correlation, a linear regression algorithm is a natural candidate.
  • Data visualizations: Chart (graph) your data in various ways to identify relationships, spreads, and outliers. For example, a scatter plot may reveal several groupings of data points, which would suggest that K-means clustering (if the data is unlabeled) or K-nearest neighbor (if the data is labeled) is likely to be an effective algorithm.
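
As a rough illustration of the descriptive-statistics check, the sketch below summarizes two variables and computes their Pearson correlation using only Python's standard library. The `hours` and `score` values are made-up example data, and the 0.8 cutoff for calling a correlation "strong" is an arbitrary assumption for this example, not a rule.

```python
import math
import statistics

def summarize(values):
    """Mean, median, and first/third quartiles for one variable."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "q1": q1,
        "q3": q3,
    }

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Made-up example data: hours studied and exam score.
hours = [1, 2, 3, 4, 5, 6, 7, 8]
score = [52, 55, 61, 64, 70, 73, 79, 82]

r = pearson(hours, score)
print(summarize(hours))
print(summarize(score))
print(f"correlation: {r:.3f}")

# Assumed threshold: treat |r| >= 0.8 as a hint that linear regression is worth trying.
if abs(r) >= 0.8:
    print("strong linear correlation: linear regression is a reasonable candidate")
```

A high correlation only suggests a linear relationship; plotting the data, as described in the last bullet, remains the better way to spot curves, clusters, and outliers.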

Step 3: Consider the Constraints

Conditions beyond your control may influence your choice of machine-learning algorithm. For example:

  • Limited storage or compute capacity may prevent the use of very large data sets.
  • The speed at which the machine needs to be able to learn may require an algorithm that trains quickly. For example, you may need to retrain the same model frequently on different data sets.
  • The speed at which the machine needs to make predictions can also influence your choice of algorithm. For example, a driverless car needs to be able to make split-second decisions.
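
One way to check the two speed constraints above is to time training and prediction separately for each candidate. The sketch below does this for two toy models written for this example (a nearest-neighbor lookup and a nearest-centroid model); the models, the synthetic data, and the helper name `timed` are all assumptions for illustration, and the measured times will vary from machine to machine.

```python
import math
import random
import time

def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

# Toy model 1: nearest neighbor. "Training" only stores the rows;
# each prediction scans every stored row.
def train_nearest(rows):
    return list(rows)

def predict_nearest(model, x):
    return min(model, key=lambda row: math.dist(row[0], x))[1]

# Toy model 2: nearest centroid. Training averages each class once;
# each prediction compares x against one centroid per class.
def train_centroid(rows):
    sums = {}
    for (fx, fy), label in rows:
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + fx, sy + fy, n + 1)
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}

def predict_centroid(model, x):
    return min(model, key=lambda label: math.dist(model[label], x))

# Synthetic data: 2,000 points around (0, 0) labeled "a", 2,000 around (4, 4) labeled "b".
rng = random.Random(1)
rows = [((rng.gauss(c, 1.0), rng.gauss(c, 1.0)), label)
        for label, c in (("a", 0.0), ("b", 4.0)) for _ in range(2000)]
queries = [(0.2, -0.1), (3.9, 4.2)] * 50

timings = {}
for name, train, predict in [("nearest neighbor", train_nearest, predict_nearest),
                             ("nearest centroid", train_centroid, predict_centroid)]:
    model, train_s = timed(train, rows)
    preds, predict_s = timed(lambda: [predict(model, q) for q in queries])
    timings[name] = (train_s, predict_s, preds[:2])
    print(f"{name}: train {train_s:.4f}s, predict {predict_s:.4f}s")
```

The point of the exercise is that an algorithm that is cheap to train can still be expensive at prediction time, and the reverse, so both numbers should be measured against the constraints of the application.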

Also, ask the following questions:

  • How accurate does the model need to be?
  • How complex is the model?
  • How scalable is the model?

Step 4: Choose an Algorithm

The final step involves making your choice. The following table provides a list of algorithms along with specific use cases in which each algorithm may be most suitable, as well as the pros and cons of each.

Remember, prior to building a machine learning model, it is always wise to consult others on your data science team, particularly your resident data scientist, if you are fortunate enough to have one. Choosing a machine learning algorithm is a combination of art and science, so you’re likely to benefit by having someone look at the problem from another perspective.
