Machine learning review

In another article, we briefly outlined the basic machine learning approaches and explored model-free reinforcement learning in detail. There are many more machine learning techniques worth exploring, and in this article we turn to contrastive learning. If you've read our previous articles or others on the topic, you'll meet some familiar terms, like supervised and unsupervised learning, since contrastive learning is combined with these approaches to create variants such as supervised contrastive learning. We will examine contrastive learning in detail and then move beyond the theory to its real-world applications.

Let's do a quick review before we start. There are three overarching approaches to machine learning: unsupervised, supervised, and reinforcement learning. In unsupervised learning, a model receives unlabeled data and identifies internal patterns in it to determine the structure of the data and the relationships between objects and features. Imagine you're in a kitchen, ready to cook. You open your spice drawer and find unlabeled bottles of different spices. You taste each one and sort them by taste, such as salty, sweet, spicy, or bitter. You're performing unsupervised learning.

Unsupervised learning as a human's study of different spices

In supervised learning, the model is trained on labeled data: every training example comes with a known correct label. The model finds patterns and creates connections that it can later use to predict labels for new, unlabeled input data. Back in the kitchen, your tasting has given each spice in the first drawer a label (its taste). You open another drawer and find more spices, but this time they are labeled, and you know their names. Because you recognize them, you can put each one into the proper category from your previous experiment without tasting it.

The third approach, reinforcement learning, involves an agent that explores and interacts with an environment and receives rewards for its actions. These rewards help the agent learn the optimal action for different situations. As the cook, you are the agent, and the kitchen is your environment. While tasting the spices, you encounter ones you like and ones you don't. These reactions are your rewards, and they help you decide what actions to take when cooking: having learned which spices you like, you know what to add to your food.

These three approaches serve as a foundation for training a machine learning model, and researchers apply different variations and methods to solve problems in different fields and situations. One of these methods is contrastive learning.

What is contrastive learning?

Before we dive into the details of contrastive learning, let's define a few terms we'll use to discuss it and its various aspects:

  • Anchor image: the data sample that is the focus of analysis.
  • Positive sample: a data point that belongs to the same distribution as the anchor, meaning it is similar to it (for example, it shows the same kind of object).
  • Negative sample: a data point that belongs to a different distribution, meaning it is dissimilar to the anchor.

Take a Google image search as an example. You upload an image of a car: this is the anchor. Ideally, Google returns positive samples, pictures of other cars. If Google shows you an image of a giraffe, that would be a negative sample.

Anchor, positive, and negative samples

Contrastive learning, then, is a machine learning paradigm that compares data points against each other to teach a model which of them are similar and which are not. Hence the name: "contrastive" comes from "contrast," meaning difference. If you imagine the data points placed in a space (usually an embedding space produced by the model), the goal is to group similar samples together by reducing the distance between them, while pushing dissimilar samples apart by increasing the distance between them.
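To make the idea of distance concrete, here is a minimal Python sketch, assuming each image has already been turned into a short vector (an embedding) by some trained encoder. The vectors and the `rank_by_distance` helper are made up for illustration; real embeddings have hundreds of dimensions. The sketch ranks a small hypothetical gallery by its distance to the anchor, much like the image search example, and contrastive training aims to make such a ranking put positive samples first.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_by_distance(anchor, gallery):
    """Return gallery names sorted from closest to farthest from the anchor."""
    return sorted(gallery, key=lambda name: euclidean(anchor, gallery[name]))

# Hypothetical 3-dimensional embeddings from an already trained encoder.
anchor = [0.9, 0.1, 0.0]  # the uploaded car photo
gallery = {
    "giraffe": [0.0, 0.2, 0.9],  # negative sample
    "car_2": [0.8, 0.2, 0.1],    # positive sample
    "car_3": [0.7, 0.0, 0.2],    # positive sample
}

print(rank_by_distance(anchor, gallery))
print(round(cosine_similarity(anchor, gallery["car_2"]), 3))
```

Euclidean distance and cosine similarity are both common choices; many contrastive methods use cosine similarity because it ignores vector length and compares only direction.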

Different methods are used to accomplish this. One of them is instance discrimination, where the anchor image is copied and transformed in different ways to produce positive samples that vary in color, size, and other features. This teaches the model to match samples by the content of the anchor rather than by its color or size. Common transformations include:

  • Color jittering: the brightness, contrast, and saturation of an image are changed randomly
  • Image rotation: an image is rotated by a random angle from 0 to 90 degrees
  • Image flipping: an image is flipped (mirrored) around its center, either vertically or horizontally
  • Image noising: random noise is added to an image, for example by setting randomly chosen pixels to black or white
  • Random affine: an image is transformed so that straight lines and parallel lines are preserved, but distances and angles may change
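The transformations listed above can be sketched in plain Python. This is a minimal illustration, not a production pipeline: it assumes a grayscale image stored as a list of rows with pixel values between 0.0 and 1.0, uses hypothetical helper names, replaces arbitrary-angle rotation with a 90-degree rotation for simplicity, and omits the random affine transform. Real projects typically rely on an image library's built-in augmentations instead.

```python
import random

def clip(value):
    """Keep a pixel value inside the valid [0.0, 1.0] range."""
    return min(1.0, max(0.0, value))

def color_jitter(img, max_brightness=0.2, max_contrast=0.2, rng=random):
    """Randomly shift brightness and scale contrast around the mean pixel value."""
    shift = rng.uniform(-max_brightness, max_brightness)
    scale = rng.uniform(1 - max_contrast, 1 + max_contrast)
    pixels = [p for row in img for p in row]
    mean = sum(pixels) / len(pixels)
    return [[clip((p - mean) * scale + mean + shift) for p in row] for row in img]

def flip(img, horizontal=True):
    """Mirror the image left-right (horizontal) or top-bottom (vertical)."""
    return [row[::-1] for row in img] if horizontal else [row[:] for row in img[::-1]]

def rotate90(img):
    """Rotate the image 90 degrees clockwise (simplified rotation)."""
    return [list(row) for row in zip(*img[::-1])]

def add_noise(img, amount=0.05, rng=random):
    """Set a random fraction of pixels to pure black (0.0) or white (1.0)."""
    return [[rng.choice([0.0, 1.0]) if rng.random() < amount else p for p in row]
            for row in img]

def random_view(img, rng=random):
    """Chain randomly chosen augmentations to produce one positive sample."""
    view = color_jitter(img, rng=rng)
    if rng.random() < 0.5:
        view = flip(view, horizontal=rng.random() < 0.5)
    if rng.random() < 0.5:
        view = rotate90(view)
    return add_noise(view, rng=rng)

# Usage: two different positive views of the same (tiny, made-up) anchor image.
rng = random.Random(0)
anchor = [[0.1, 0.5, 0.9], [0.2, 0.6, 1.0], [0.3, 0.7, 0.8]]
positives = [random_view(anchor, rng) for _ in range(2)]
```

Passing a seeded `random.Random` instance as `rng` makes the views reproducible, which helps when debugging an augmentation pipeline.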

Another method is image subsampling (patching), where an image is broken into multiple smaller images, or patches, one of which is used as the anchor image. Returning to our Google image search example, each patch of the car picture contains a piece of the car. If a patch with a headlight is used as the anchor image, the model should then return positive samples with cars.
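Patching can be sketched the same way. The following minimal Python example, using a hypothetical `to_patches` helper, splits an image stored as a list of rows into non-overlapping square patches and picks one at random as the anchor. Treating the remaining patches of the same image as positive samples is an assumption made for illustration; methods differ in how they pair patches.

```python
import random

def to_patches(img, size):
    """Split an image (list of rows) into non-overlapping size x size patches.

    Pixels that do not fit into a full patch at the right or bottom edge are dropped.
    """
    height, width = len(img), len(img[0])
    patches = []
    for top in range(0, height - size + 1, size):
        for left in range(0, width - size + 1, size):
            patches.append([row[left:left + size] for row in img[top:top + size]])
    return patches

# A made-up 4x4 "image" whose pixel values are just their positions 0..15.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
patches = to_patches(image, 2)

rng = random.Random(42)
anchor_index = rng.randrange(len(patches))
anchor_patch = patches[anchor_index]
# Assumption: other patches from the same image act as positive samples.
positive_patches = [p for i, p in enumerate(patches) if i != anchor_index]
```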

Data and contrastive learning

Data is the food that machine learning models eat; we even use the word "feed" to mean giving a model data. However, one of the main issues is that while there is plenty of data in the world, much of it is unlabeled. Building on the three machine learning approaches examined at the beginning of the article, researchers have created sub-approaches to meet this challenge, and contrastive learning can be applied to each of them:

  • Supervised contrastive learning
  • Self-supervised contrastive learning
  • Semi-supervised contrastive learning

In supervised contrastive learning, the model is fed labeled data, and samples that share a label are treated as positives while samples with different labels are treated as negatives. To address the problem of unlabeled data, contrastive learning can be applied in a semi-supervised approach, where the model receives both labeled and unlabeled data. Using the labeled data, it begins to learn how to classify the unlabeled data it encounters. In self-supervised contrastive learning, the model is not given a labeled data set at all: positive samples are created by transforming the anchor image, and other images serve as negative samples. In each of these sub-approaches, the principles of contrastive learning remain the same; the difference is in the type of data researchers use.
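Since only the type of data changes, the three sub-approaches can share one rule for deciding which samples count as positives; what changes is the key that groups them. The minimal Python sketch below, with a hypothetical `positive_pairs` helper, treats two samples as a positive pair when their keys match and every other pair as negative. The semi-supervised keying scheme is one possible choice, assumed for illustration.

```python
from itertools import permutations

def positive_pairs(keys):
    """Return ordered index pairs (i, j), i != j, whose keys match.

    Every pair not returned is treated as a negative pair.
    """
    return [(i, j) for i, j in permutations(range(len(keys)), 2) if keys[i] == keys[j]]

# Supervised: keys are human-provided class labels.
labels = ["car", "giraffe", "car", "giraffe"]
supervised = positive_pairs(labels)

# Self-supervised: no labels. Each original image is augmented twice,
# and the key records which original image a view came from.
source_ids = [0, 0, 1, 1]
self_supervised = positive_pairs(source_ids)

# Semi-supervised (one possible scheme): labeled samples are keyed by class,
# unlabeled samples by the original image they were augmented from.
mixed = [("label", "car"), ("label", "car"), ("image", 7), ("image", 7)]
semi_supervised = positive_pairs(mixed)
```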

Labeled and unlabeled data

Loss functions

A contrastive learning model seeks to reduce the distance between the anchor image and the positive sample while increasing the distance between the anchor image and the negative sample. A model, however, doesn't inherently know that it needs to do this, so loss functions are used to define how it should separate the samples and to measure how close it is to success. A loss function turns the distances between samples into a single number that training tries to minimize. Here are the common contrastive learning loss functions:

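  • Contrastive loss: works on pairs of samples, penalizing a model for placing a positive sample far from the anchor image and for placing a negative sample closer than a set margin
  • Triplet loss: compares the distance between the anchor image and the positive sample with the distance between the anchor image and the negative sample, penalizing the model until the negative is farther away than the positive by at least a margin
  • N-pair loss: generalizes triplet loss by comparing the anchor image with one positive sample and several negative samples at once instead of a single negative
  • InfoNCE: treats the task as classification, where the model must pick out the positive sample from a set of negative samples based on their similarity to the anchor, which pulls positive samples closer while pushing negative samples further away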
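The four losses above can be sketched in plain Python for a single anchor. This is a minimal, unbatched illustration, not any library's API: the function names are hypothetical, the margin of 1.0 and temperature of 0.1 are arbitrary example values, contrastive and triplet loss use Euclidean distance, InfoNCE uses cosine similarity, and N-pair loss uses dot products. Deep learning frameworks implement batched versions of these ideas.

```python
import math

def dot(u, v):
    """Inner product of two embeddings."""
    return sum(x * y for x, y in zip(u, v))

def distance(u, v):
    """Euclidean distance between two embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def cosine(u, v):
    """Cosine similarity between two embeddings."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def contrastive_loss(x1, x2, is_positive, margin=1.0):
    """Pairwise loss: squared distance for positives, squared margin violation for negatives."""
    d = distance(x1, x2)
    return d ** 2 if is_positive else max(0.0, margin - d) ** 2

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Zero once the negative is at least `margin` farther from the anchor than the positive."""
    return max(0.0, distance(anchor, positive) - distance(anchor, negative) + margin)

def n_pair_loss(anchor, positive, negatives):
    """log(1 + sum_k exp(a.n_k - a.p)): one positive against several negatives at once."""
    ap = dot(anchor, positive)
    return math.log(1.0 + sum(math.exp(dot(anchor, n) - ap) for n in negatives))

def info_nce(anchor, positive, negatives, temperature=0.1):
    """Cross-entropy of picking the positive out of all candidates by cosine similarity."""
    logits = [cosine(anchor, c) / temperature for c in [positive, *negatives]]
    top = max(logits)  # subtract the max before exp() for numerical stability
    log_sum_exp = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum_exp - logits[0]

# Usage with made-up 2-D embeddings: the positive is near the anchor.
a, p, n = [1.0, 0.0], [0.9, 0.1], [0.0, 1.0]
losses = {
    "contrastive": contrastive_loss(a, p, True) + contrastive_loss(a, n, False),
    "triplet": triplet_loss(a, p, n),
    "n_pair": n_pair_loss(a, p, [n]),
    "info_nce": info_nce(a, p, [n]),
}
```

The last two are closely related: with a temperature of 1 and dot products in place of cosine similarity, InfoNCE reduces algebraically to the N-pair formula.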

Benefits and limitations of contrastive learning

As mentioned above, the main benefit of contrastive learning is its ability to learn from unlabeled data. Unlabeled data is abundant, and putting it to use is one of the primary problems in machine learning. When applied without labels, contrastive learning also tends to produce representations that transfer well to other, related tasks.

There are limitations to this approach too. While contrastive learning is well suited to a world with large quantities of unlabeled data, that data still needs to be of a certain quality, and gathering it requires time and other resources that aren't always available. Applying contrastive learning is also computationally intensive. Finally, each of the image transformations we listed earlier adds choices, such as which transformations to apply and how strongly, that make the model difficult to tune properly. If these choices are not made carefully, the results may suffer.

How is contrastive learning used today?

The abundance of unlabeled data is one of the reasons contrastive learning exists. From medicine to natural language processing (NLP) and social problems, contrastive learning has applications that researchers are exploring and companies are already putting into practice.

Medicine and research

The medical field is an area where the problem of unlabeled data is especially prevalent. Contrastive learning can help identify and diagnose potential medical issues, and it is the focus of various research projects that train models to process medical data.

These include analyzing medical time series, such as health-related signals from electroencephalography (EEG), electrocardiography (ECG), and intensive care unit (ICU) monitoring. Contrastive learning can help find patterns and trends in these signals that could lead to better procedures, assessments, and treatments. Similarly, contrastive learning can be applied to medical image analysis to sift through the enormous amount of imaging data that is unlabeled or labeled inconsistently. In one specific use case, this approach can help diagnose diabetic retinopathy, a condition that causes vision loss in people with diabetes, by analyzing retinal images and allowing faster identification of the condition's early stages.

As these models become more robust, patients could receive faster care, with shorter wait times and fewer unpleasant procedures.


Practical applications

Research like the examples above leads to contrastive learning models that people use daily. Two examples are OpenAI's CLIP, which learns to match images with text descriptions, and DALL-E, which generates images based on text. The contrastive approach helped train these models to interpret the text inputs they receive and to produce better-quality images. Another important factor was for the models to understand what kinds of images are acceptable to produce and which are not, such as those that feature violence or explicit content.
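CLIP, as described in OpenAI's paper, is trained on image-text pairs with a contrastive objective: in each batch, an image and its own caption form the positive pair, and every other caption in the batch acts as a negative (and vice versa). The following minimal Python sketch shows that symmetric loss on tiny made-up embeddings. The encoders are omitted, the function names are hypothetical, and the fixed temperature of 0.07 is an assumption for illustration, since CLIP learns its temperature during training.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def mean_cross_entropy(logits):
    """Average cross-entropy where the correct column for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        top = max(row)
        log_sum_exp = top + math.log(sum(math.exp(z - top) for z in row))
        total += log_sum_exp - row[i]
    return total / len(logits)

def clip_style_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss for a batch where image i matches text i."""
    images = [normalize(v) for v in image_embs]
    texts = [normalize(v) for v in text_embs]
    # logits[i][j]: similarity of image i to text j, divided by the temperature.
    logits = [[sum(a * b for a, b in zip(img, txt)) / temperature for txt in texts]
              for img in images]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-image direction
    return (mean_cross_entropy(logits) + mean_cross_entropy(logits_t)) / 2

# Usage: correctly paired captions should give a lower loss than shuffled ones.
image_embs = [[1.0, 0.1], [0.1, 1.0]]
matched_texts = [[0.9, 0.0], [0.0, 0.8]]
shuffled_texts = [[0.0, 0.8], [0.9, 0.0]]
```

Averaging both directions means the model is trained to find the right caption for each image and the right image for each caption.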

In terms of internet safety, the startup Henrietta AI has developed a model that identifies unsolicited explicit images that users receive in their social media messages and blocks the content before they see it. This would not have been possible without contrastive learning, which trained the model to recognize the characteristics of unsolicited and explicit content so it can properly filter it out of a user's messages. Recently, Henrietta AI took 6th place at the Berlin-Brandenburg business plan competition, showing its potential for continued development.


Contrastive learning is a machine learning technique that compares data points to teach a model which points are similar and which are not. As more data becomes available, labeling all of it becomes too time-consuming and resource-intensive for supervised learning models to be used effectively. Applied in self-supervised or semi-supervised settings, contrastive learning can use whatever labeled data is available for a dataset, if any, to learn how to interpret unlabeled data, making analysis more efficient in many fields, especially medicine.

Understanding which machine learning approach best fits your project can be a challenge. Luckily, Mad Devs has experience in machine learning, and our experts are ready to advise you on robotics, computer vision, natural language processing, and many more fields. Contact us today!

