Machine Learning Dos and Don'ts

Published August 9, 2021

Doug Rose

Author | Agility | Artificial Intelligence | Data Ethics

I have worked with several organizations over the years, helping them implement machine learning, often after their own attempts had failed. The organizations that fail usually do so by making a handful of common mistakes, and the ones that succeed tend to avoid them. In this post, I present machine learning dos and don’ts to increase your chances of achieving a successful launch of your machine learning initiative.

Do Start by Asking Relevant, Compelling Questions

Before you even start to introduce machine learning to your organization, you need to find a way to connect your organization’s business needs to machine-learning technology. Otherwise, you are likely to put a program in place and assemble a team with the requisite technical expertise only to find them spending all their time just playing around with the technology.

To avoid this common mistake, take the following steps:

Educate C-level executives and all managers on the topic of machine learning with an emphasis on use cases that are relevant to the industry in which your organization operates. Case studies are a great way to get leadership thinking about practical applications of machine learning in the organization.
Schedule regular meetings, encouraging participants to bring to the meeting relevant and compelling questions they are struggling to answer, problems they are trying to solve, or business insights that would make the organization more competitive.
Create a list of problems, questions, and desired insights; prioritize the items on the list; and then consider which technology would be the most effective for addressing each item. Flag items on the list which are likely to benefit from machine learning.
Create another list of processes or procedures that may benefit from machine-learning-driven automation — tasks such as preventing unauthorized access to the organization’s information system or weeding out unqualified job applicants.

Keep in mind that the best technology isn't necessarily machine learning. Your organization may be able to answer most questions, solve most problems, and gain valuable insights with the use of a data warehouse and good business intelligence (BI) software. It may not need a dedicated machine-learning team.

Don't Mix Training Data with Test Data

Machine learning often involves supervised learning — feeding the machine labeled data, so the machine can learn the connection between the labels and the data inputs. A common mistake is to mix some of the training data into the test data, which is often tempting when the availability of relevant data is limited. To avoid this mistake, before you engage in supervised learning, create two separate data sets:

Training data helps the machine figure out the relationships between inputs and outputs — between the data that is input into the system and the label you want the machine to output. For example, if you are creating a machine that will be able to identify different pieces of fruit, you may show it a banana and tell it, “This is a banana.” You show it an apple and say “This is an apple.” During the training session, the machine learns to associate the label “banana” with images of bananas, “apple” with images of apples, and so on.
Test data gauges the ability of the machine to make accurate predictions of outcomes based on unfamiliar inputs. For example, if you show the machine a pear, and it identifies the pear as an apple, that error tells you the model needs more training or adjustment. Any data you use to correct the machine effectively becomes training data, so keep a fresh set of test data in reserve for the final check.

If, after training the machine, you mix some of your training data in with your test data, you won't have a clear picture of how well the machine performed on the test. It would be like giving students a sheet of paper with some of the test questions and their correct answers just before they take the test. The test results wouldn't accurately represent what they had learned or where they were struggling.
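One way to keep the two data sets apart is to split the labeled data once, up front, and then confirm that no example appears in both sets. The following is a minimal sketch in Python using only the standard library. The function name split_train_test, the 20 percent test fraction, and the file names are assumptions made for illustration, not part of any particular tool.

```python
import random


def split_train_test(examples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the labeled examples and cut it into two disjoint lists."""
    shuffled = list(examples)               # copy, so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)   # fixed seed keeps the split repeatable
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]   # (training data, test data)


# Hypothetical labeled data: (image file name, label) pairs.
examples = [(f"img_{i:03d}.png", label)
            for i, label in enumerate(["banana", "apple", "pear"] * 10)]

train, test = split_train_test(examples)

# Guard against the mistake described above: nothing may sit in both sets.
overlap = set(train) & set(test)
assert not overlap, f"{len(overlap)} examples appear in both sets"

print(len(train), "training examples,", len(test), "test examples")
```

Running the overlap check every time the data is regenerated is a cheap way to catch accidental mixing before it distorts the test results.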

Do Know Your Algorithms and Functions

Algorithms and functions are the engines that drive machine learning, and data is the fuel. Although the machine does the learning and ultimately creates the model that the computer follows to perform the desired task, it is up to you to construct a “brain” that enables the learning process. The building blocks you have to work with are algorithms and functions:

Algorithm: An algorithm is a process (a series of steps) for performing a task or calculation. Algorithms are often conceptual (not concrete). They can be generalized to work with a variety of inputs.
Function: A function is an implementation of an algorithm that shows the relationship between inputs and outputs. By definition, a function is a special mathematical relationship in which every input has a single output. In contrast to algorithms, functions are concrete, such as a block of code. In addition, functions can be graphed to show the relationships between inputs and outputs, as shown below.

When you are building a machine that can learn, you need to be familiar with a wide variety of algorithms and functions, so you will know which ones to choose and how to arrange them.
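To make the distinction concrete, here is a small Python sketch under simple assumptions. The function linear maps each input to exactly one output, and fit_line follows an algorithm (ordinary least squares, chosen here only as a familiar example) that works for any list of points. Both names are hypothetical and chosen for illustration.

```python
def linear(x, slope, intercept):
    """A function: every input x has exactly one output."""
    return slope * x + intercept


def fit_line(xs, ys):
    """An algorithm: a series of steps (least squares) that finds the slope
    and intercept of the line that best fits any list of points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Step 1: how much x varies, and how x and y vary together.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Step 2: the slope is their ratio; the intercept follows from the means.
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up points that lie on the line y = 2x + 1.
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
prediction = linear(4, slope, intercept)
print(slope, intercept, prediction)
```

The same fit_line steps apply to any data set, while linear with a particular slope and intercept is one concrete relationship between inputs and outputs.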

Do Manage Expectations

After training a machine, you may be tempted to show it off — to demonstrate that your model can actually do something useful and perhaps even amazing. You collect your test data and schedule a presentation to demonstrate the power and precision of your new machine-learning model.

Whoa! This irrational exuberance can end in disaster, maybe not during the presentation but afterward, when someone in your audience uses the model and it misses the mark.

You can avoid the potential embarrassment by running your new model on test data first, so you can see where it falls short and retrain or adjust it, if necessary, to improve its accuracy. Several rounds of testing (each with different test data) and adjustments may be required before your model is ready for prime time.
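A simple way to set expectations is to report a measured score from each round of testing instead of a hand-picked demo. The sketch below, in Python, computes accuracy on two separate batches of test data. The toy model predict_fruit, which guesses a fruit from its color, and the test batches are invented for illustration only.

```python
def accuracy(predict, labeled_examples):
    """Fraction of test examples for which the prediction matches the label."""
    correct = sum(1 for features, label in labeled_examples
                  if predict(features) == label)
    return correct / len(labeled_examples)


def predict_fruit(color):
    """Hypothetical stand-in for a trained model: it has never learned 'pear'."""
    return {"yellow": "banana", "red": "apple"}.get(color, "apple")


# Two rounds of testing, each with different (made-up) test data.
test_rounds = [
    [("yellow", "banana"), ("red", "apple"), ("green", "pear"), ("red", "apple")],
    [("green", "pear"), ("yellow", "banana"), ("green", "pear"), ("red", "apple")],
]

scores = [accuracy(predict_fruit, batch) for batch in test_rounds]
for round_number, score in enumerate(scores, start=1):
    print(f"Round {round_number}: {score:.0%} correct")
```

If the scores vary widely from round to round, as they do here, that is a sign the model needs more work before anyone outside the team relies on it.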

Of course, there are other pitfalls that you would be wise to avoid when starting out with machine learning, but by steering clear of the common pitfalls covered in this post, you will be far ahead of the game!
