Headlines about advancements in machine learning are appearing more and more frequently, bringing the dreams of science fiction fans to reality. For those less interested in the "wow" factor and more in applying smarter AI to make work more efficient and ease various aspects of human life, the terminology around machine learning can be confusing. There are different approaches to training an AI to complete the tasks humans give it, depending on what the AI is (a physical system, like a robot, or a model, like ChatGPT). This article will briefly introduce the basic types of machine learning and then examine model-based reinforcement learning in more detail.

Basic machine learning types

Data is at the center of machine learning and forms the basis of how models learn to interact with their environments, whether that environment is a sidewalk or a stream of text inputs. In both cases, researchers feed data to the model to teach it to make the best decisions. The central question is what state that data is in when the model learns from it, and the answer distinguishes the three basic types of machine learning. All of these approaches mirror human learning to some degree, as you'll see in the explanations below.

Unsupervised learning

In unsupervised learning, a model analyzes unlabeled data, that is, data without any predefined definitions or tags, to identify internal patterns, clusters, or hidden factors that may be present in it. The goal of this approach is to understand the data: to determine its structure and the relationships between its objects and features. The patterns and structures discovered in the process benefit further data analysis, decision-making, and other tasks, including supervised learning.

This is similar to how a child may learn a new language. They don’t have a teacher or dictionary to teach them how to use the language in their environment. Instead, they observe and listen, trying to establish connections and understand the rules independently. At first, the child notices tone and emotion in the voices of those around them—some phrases are spoken with great excitement, others with pleasure, and some with sadness. After building these initial associations, the child notices the contexts in which these words and phrases are typically used. They begin to understand that certain words and phrases are used only in specific situations or by certain people. They’ve gathered unlabeled data and have made conclusions to complete their understanding. This is unsupervised learning.
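The idea of finding structure in unlabeled data can be made concrete with a tiny clustering sketch. The example below is a minimal k-means implementation written from scratch for illustration (the data, the `kmeans` helper, and its parameters are all hypothetical, not from any particular library): given raw numbers with no labels, it discovers that they fall into two natural groups.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Cluster 1-D points into k groups with a tiny k-means loop."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: abs(p - centers[i]))
            clusters[idx].append(p)
        # Update step: move each center to the mean of its cluster.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Two obvious groups hiding in unlabeled numbers.
data = [1.0, 1.2, 0.8, 9.0, 9.3, 8.7]
centers, clusters = kmeans(data, k=2)
print(sorted(round(c, 1) for c in centers))  # roughly [1.0, 9.0]
```

No one told the algorithm what the groups mean; like the child listening to a new language, it only noticed that some points sit close together.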

Unsupervised machine learning

Supervised learning

In supervised learning, the model is trained based on labeled data. The data the model encounters corresponds to a known correct label. The model uses this information to extract patterns and create connections between input data and their corresponding labels to learn how to predict them when the model encounters new, unlabeled input data.

Returning to our child from above, supervised learning is applied when it’s time to teach them certain skills or concepts. You provide examples and explanations of correct answers, and the child generalizes these examples and applies them to new situations. When teaching a child to read, you show them the correct pronunciation of letters and words and explain their correct use in a particular context. Then they apply this knowledge when reading new texts or writing their own.
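A supervised learner generalizes from labeled examples in much the same way. Here is a deliberately simple sketch using a 1-nearest-neighbour rule on made-up data (the training pairs and the `predict` helper are illustrative assumptions, not a real dataset): each training input comes with a known correct label, and a new input is labeled by analogy to the closest example seen before.

```python
# Labeled training data: hours studied -> pass (1) / fail (0).
train = [(1.0, 0), (2.0, 0), (3.0, 0), (6.0, 1), (7.0, 1), (8.0, 1)]

def predict(x):
    """1-nearest-neighbour: copy the label of the closest training example."""
    nearest = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest[1]

print(predict(2.5))  # 0 -- close to the "fail" examples
print(predict(7.5))  # 1 -- close to the "pass" examples
```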

Supervised machine learning

To delve deeper into these two paradigms, check out our article on machine learning.

Reinforcement learning

The unsupervised and supervised learning approaches typically involve static datasets, but when models need to deal with a dynamic environment, researchers use another method, reinforcement learning. The goal of this approach is to find the best sequence of actions that will generate the optimal outcome, which in reinforcement learning means to collect the most reward.

In this technique, an agent explores and interacts with an environment and receives positive or negative rewards that help it understand which actions to take in the future. The agent's "brain," called a policy, takes inputs or observations from the environment and maps them to outputs, the actions the agent takes. The policy is often represented by a neural network, and the reinforcement learning algorithm updates it by analyzing the agent's actions, its observations of the environment, and the rewards it collects, steering the agent toward the best course of action.

This is similar to how humans interact with our environment. We take an action, experience a state or make observations, and receive a reward. For example, if we go to bed early and get a full night’s rest, we experience feeling energetic and our reward is being productive, enjoying the day, etc. We learn to get a full night's sleep. If we go to bed late and sleep less than we need to, we receive a negative reward, being unproductive. We also, hopefully, learn from this experience and decide not to stay up late anymore.
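The agent-observation-action-reward cycle described above can be sketched in a few lines. Everything here is a toy assumption for illustration: a made-up `ChainEnv` corridor environment, a random `policy` standing in for the agent's brain, and a simple loop that collects reward.

```python
import random

class ChainEnv:
    """A tiny environment: positions 0..4; reaching position 4 gives a reward."""
    def __init__(self):
        self.state = 0
    def step(self, action):          # action: -1 (left) or +1 (right)
        self.state = max(0, min(4, self.state + action))
        reward = 1.0 if self.state == 4 else 0.0
        done = self.state == 4
        return self.state, reward, done

def policy(observation):
    """The agent's 'brain': map an observation to an action (here, random)."""
    return random.choice([-1, +1])

env, total = ChainEnv(), 0.0
obs = env.state
for _ in range(100):                 # the agent-environment loop
    action = policy(obs)             # act...
    obs, reward, done = env.step(action)  # ...observe, and collect reward
    total += reward
    if done:
        break
print("collected reward:", total)
```

A learning algorithm's job is to replace the random `policy` with one that uses the collected rewards to pick better actions.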

📖 To learn about how a model learns by comparing different properties of one object to another, take a look at this article.

Reinforcement machine learning

Reinforcement learning: breakdown

There are two types of reinforcement learning: model-free and model-based. We'll explore both of these in more detail, but first, let's examine three terms that are important to understanding machine learning in general and reinforcement learning in particular.

  • Agent: The agent is whatever is interacting with the environment.
  • Environment: The environment doesn’t have to be a real, physical place. In reinforcement learning, it refers to everything external to the agent.
  • Model: A model is a simulated experience that an agent can use to understand an environment without taking any action, or without having to take every possible action, within it.

Here's an example using human learning. Imagine you, the agent, want to go for a walk outside, the environment. You look at the weather forecast, and it shows there’s a 50% chance of rain. Now, here’s where the models come into play. If you've been told about rain and how unpleasant it is to get wet, you’ll take your umbrella with you on your walk. In this example, you haven't experienced rain before, but through a simulation (someone's told you about it, and you imagine it), you understand that the rain could spoil your time outside. You understand what the environment will be like if it rains without having to experience it at this moment.

This is the model-based approach in reinforcement learning. A model-free approach would look like this:

You don't know what rain is, and no one has ever told you about it. So you go for a walk, it starts to rain, and you begin to collect rewards from the environment: you get wet, you get cold, you feel bad. You’re learning through direct interaction with the environment.

To sum up, reinforcement learning involves an agent exploring and interacting with an environment and receiving rewards that help it choose the optimal course of action. Model-free reinforcement learning implies direct interaction with the environment. It's a straightforward approach that works best with software systems: since models can't always accurately represent an environment, it's sometimes easier to let the agent do the exploring itself, using the environment to check its policy and correct it if necessary.
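As a concrete sketch of the model-free idea, the snippet below runs tabular Q-learning, a classic model-free algorithm, on a hypothetical 5-state corridor (the corridor, states, and hyperparameters are illustrative assumptions). The agent never sees the transition rules; it learns purely from experienced (state, action, reward, next state) samples.

```python
import random

# Model-free Q-learning on a 5-state corridor: the agent never sees the
# transition rules, only the samples it experiences by acting.
N_STATES, ACTIONS = 5, (-1, +1)
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.2
rng = random.Random(1)

for episode in range(200):
    s = 0
    while s != N_STATES - 1:
        # Epsilon-greedy: mostly exploit the policy, sometimes explore.
        if rng.random() < epsilon:
            a = rng.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2 = max(0, min(N_STATES - 1, s + a))
        r = 1.0 if s2 == N_STATES - 1 else 0.0
        # Update the value estimate from direct experience.
        best_next = max(Q[(s2, act)] for act in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2

# The learned policy prefers moving right (toward the reward) in every state.
greedy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)]
print(greedy)  # [1, 1, 1, 1]
```

All of the learning signal here comes from getting "rained on": acting, seeing what happens, and adjusting.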

Model-based reinforcement learning

The difference in model-based reinforcement learning is that the agent works with a model of the environment rather than learning solely through direct interaction with it. As in our example above, you have a model of what rain is, so you take your umbrella with you when you go outside. Model-based reinforcement learning offers some benefits over the model-free approach. These include:

  • Sample efficiency: the agent needs far fewer real interactions, since much of its experience comes from the model
  • Safety for hardware: since training happens in a simulation, the environment can't harm a physical AI system
  • Speed: models reduce learning time
  • Any environment is possible: researchers can model environments that are difficult or impossible to create in the real world

These benefits, especially those involving the environment, make model-based reinforcement learning the better option for hardware-based systems. For example, a lab testing a robot or a self-driving car would choose this approach to avoid damaging the hardware, or anyone around it, during the initial learning process: the robot won't fall, and the car won't crash. The major drawback of this approach, however, is the tremendous resources required to create and run a model (simulation).
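To contrast with the model-free sketch earlier, here is a minimal model-based sketch on the same kind of hypothetical 5-state corridor (the `model` function and parameters are illustrative assumptions). The agent is handed a simulator of the environment and plans entirely inside it with value iteration; no real-world steps are ever taken, which is exactly why a physical robot would stay safe.

```python
# Model-based sketch: plan inside a simulated model, never in the real world.
N_STATES, ACTIONS = 5, (-1, +1)
gamma = 0.9

def model(state, action):
    """The simulated environment: predicts the next state and reward."""
    nxt = max(0, min(N_STATES - 1, state + action))
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward

# Value iteration over the model's predictions.
V = [0.0] * N_STATES
for _ in range(50):
    for s in range(N_STATES - 1):            # last state is terminal
        V[s] = max(r + gamma * V[nxt]
                   for nxt, r in (model(s, a) for a in ACTIONS))

# Extract a plan: act greedily with respect to the model's values.
plan = [max(ACTIONS, key=lambda a: model(s, a)[1] + gamma * V[model(s, a)[0]])
        for s in range(N_STATES - 1)]
print(plan)  # [1, 1, 1, 1]
```

The plan matches what the model-free agent learned, but it was computed from the simulation alone, without collecting a single real sample.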

Sub-approaches to model-based reinforcement learning

Now that you have a clear understanding of model-based reinforcement learning, let's delve a bit deeper into two sub-approaches that researchers use when training models. The first is agent-based modeling. Comparing agent-based modeling with reinforcement learning in the terms we described above, the key difference is that an agent-based model is a computer simulation used to study the interactions between agents, which in this case can be people, things, places, and even time. Each individual agent is assigned its own attributes and is designed to act in a certain way toward its environment and other agents. This allows researchers to study complex systems in experiments that might be impossible to run in reality due to resource constraints or ethical concerns.
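A toy example makes the idea tangible. The sketch below is a hypothetical agent-based model of disease spread (the `Person` class, contact counts, and probabilities are all made-up assumptions): each agent carries its own state, and the system-level outcome, an outbreak, emerges from many local interactions that would be unethical to stage in reality.

```python
import random

# Agent-based model: individual agents with local rules; the outbreak
# pattern emerges from their interactions, not from a global equation.
rng = random.Random(42)

class Person:
    def __init__(self, infected=False):
        self.infected = infected

def simulate(n_agents=100, n_days=30, contacts_per_day=3, p_transmit=0.3):
    people = [Person(infected=(i == 0)) for i in range(n_agents)]
    for _ in range(n_days):
        newly = []
        for person in people:
            if person.infected:
                # Each infected agent meets a few random others.
                for other in rng.sample(people, contacts_per_day):
                    if not other.infected and rng.random() < p_transmit:
                        newly.append(other)
        for person in newly:
            person.infected = True
    return sum(p.infected for p in people)

print("infected after 30 days:", simulate())
```

Researchers can rerun such a simulation under different assumptions (fewer contacts, lower transmission) and compare outcomes without infecting anyone.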

Another of these sub-approaches is model-based deep reinforcement learning. The types of machine learning we examined earlier all assume human involvement in the organization of the datasets. Even in unsupervised learning, where the data is left unlabeled, a human still manipulates it to match a specific format for the model to use. In deep reinforcement learning, neural networks remove some of the responsibilities of human experts by analyzing unlabeled and unstructured data and extracting what they need from it.

Both sub-approaches follow reinforcement learning principles while applying newer technologies and processes to produce better results. Examples of model-based reinforcement learning connected with these two methods include health researchers using agent-based modeling to study how diseases might spread without infecting anyone, and deep learning being used to increase an agent's learning speed in self-driving cars. More research is required to prove the effectiveness of these applications in real life, though reducing the number of samples needed for testing will speed up the development of better models. Unfortunately, model-based approaches in these scenarios are limited in their ability to account for every situation, which can raise questions about their results.


As systems become more complex, the methods for training machine learning models are adapting to meet the challenge. Model-based reinforcement learning is one iteration of these methods that allows an agent to test itself in a simulated environment that may be too difficult or resource-demanding to create in real life or that would endanger the agent before it's ready to interact with the surroundings. This approach is especially beneficial to hardware-based systems since damage to a physical agent can hamper a project's progress.

This can be a complex process. Luckily, Mad Devs is experienced in machine learning and our experts are ready to help you in robotics, computer vision, natural language processing, and many more fields. We're ready to create and consult. Contact us today!
