Updated: Feb 24, 2023

Cloud Cost Optimization: How to Reduce Your Cloud Bill

Maksim Glotov DevOps Engineer

Andrew Sapozhnikov CIO & CTO

64% of survey respondents said that cost management and containment is their biggest concern with running in the cloud.

Pepperdata

This is not surprising because, in 2024, 80% of companies will be unaware of the mistakes in their cloud adoption and will overspend by 20 to 50%. If you haven’t optimized your cloud, you’re probably not spending your money in the right ways.

Cost management and cost optimization best practices have a lot of overlap. When you’re working with the cloud, you should optimize it. Today, we’re going to look at Mad Devs' cloud cost optimization experience and then offer nine best practices for optimizing your cloud costs.

Cloud cost optimization case study

Not long ago, the Mad Devs team started working on a new FinTech project aimed at helping invest funds more profitably. The project came to us with infrastructure tasks.

Our first goal was to help the DevOps engineering team clean up and optimize the infrastructure, and then proceed to the more complex tasks waiting in the queue. The customer simply wanted a quality result.

Therefore, after onboarding, we decided to step back and analyze how resources were used and managed and what approaches to cost optimization were in place. As it turned out, no one was doing this, so we decided to start here and, along the way, get to know the entire infrastructure and its components better.

The cost of cloud infrastructure tends to grow with the size of the team that has access to it. During operation, a large number of services and resources can accumulate and be forgotten. In addition, suboptimal instances or resource types may be in use, leading to excessive costs. In general, there are three large areas where you can work on reducing costs:

Operational Expenses. This covers how you pay for your cloud: whether you use credits, savings plans, or other available means. It also includes the time spent developing, deploying, and maintaining your own services versus using ready-made solutions.
Resource Usage Expenses. This is the cost of CPU, memory, storage, and network resources. The most interesting things often happen here because you need to analyze all applications and services and their real resource consumption, and experiment with different processor types, instance types, and sizes.
Architectural Expenses. Design mistakes ultimately lead to performance and downtime issues, expensive support, slow updates and delivery of new features, a growing support team, and growing infrastructure bills. This area should be addressed globally by the entire team or a platform team. It is the most time-consuming way to save, and it pays off in the future.

How did Mad Devs help optimize the cost of the project?

Disclaimer: Since the project is under an NDA, we cannot share some private information.

Cost optimization for outgoing service traffic between regions

Two types of environments were used inside the project: staging and production. But they were deployed in different geographical regions, for example, region A and region B.

Elasticsearch is used as the storage for metrics and logs, together with its companion products: Metricbeat for collecting metrics, Filebeat for collecting logs, and APM for monitoring application performance. The Elasticsearch cluster is hosted in Elastic Cloud and deployed in region B. We get the following diagram:

The monthly traffic bill was ~$1,400/month.
According to preliminary calculations, running a separate Elasticsearch cluster for the staging environment would cost around $450/month.
Result: ~$1,000/month savings.

We also configured Metricbeat not to send more than 30 duplicates of the same metrics and removed the metrics that are redundant or unnecessary at the moment. The billing picture looks like this:

The green bar is the daily cost of traffic sent between regions (sending logs/metrics).

Cost optimization per monitoring cluster

Once the staging environment had been sorted out, it was time to optimize production.

Elasticsearch allows us to set up several types of nodes for storing indices (hot, warm, and cold data), where each type has its own configuration and cost. For example, hot data nodes get more CPU and memory and high-speed but smaller disks. Therefore, it is necessary to choose the data storage format and the storage lifecycle period carefully, because they directly affect the price of the cluster. To do that, we needed to determine:

What indices are needed and with what parameters.
How long each index needs to be stored on each type of node.

Having collected the necessary data, we used index lifecycle management (ILM) policies in Elasticsearch, which are applied through index templates and let you flexibly manage the movement of data between tiers.
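As a rough illustration of such a policy, here is a minimal sketch that builds a request body in the shape accepted by Elasticsearch's `PUT _ilm/policy/<name>` endpoint. The retention periods, the function name, and the chosen actions are assumptions made for the example; the project's real values are under NDA and are not shown here.

```python
import json

# Hypothetical retention periods; the real ones are not disclosed in the article.
HOT_MAX_AGE = "1d"
WARM_AFTER = "7d"
COLD_AFTER = "30d"
DELETE_AFTER = "90d"


def build_ilm_policy(warm_after: str, cold_after: str, delete_after: str) -> dict:
    """Return a request body for PUT _ilm/policy/<name> with four phases."""
    return {
        "policy": {
            "phases": {
                # New data lands on hot nodes; roll over to a fresh index daily.
                "hot": {
                    "min_age": "0ms",
                    "actions": {"rollover": {"max_age": HOT_MAX_AGE}},
                },
                # Older indices are merged down to one segment per shard.
                "warm": {
                    "min_age": warm_after,
                    "actions": {"forcemerge": {"max_num_segments": 1}},
                },
                # Rarely queried indices get the lowest recovery priority.
                "cold": {
                    "min_age": cold_after,
                    "actions": {"set_priority": {"priority": 0}},
                },
                # Finally, the index is deleted.
                "delete": {"min_age": delete_after, "actions": {"delete": {}}},
            }
        }
    }


policy = build_ilm_policy(WARM_AFTER, COLD_AFTER, DELETE_AFTER)
print(json.dumps(policy, indent=2))
```

The printed JSON can be sent to the cluster with any HTTP client. Whether entering the warm or cold phase also moves an index to the matching node type automatically depends on the Elasticsearch version and cluster setup, so verify that for your deployment.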

Result: ~$2,500/month savings on the current amount of data (production environment).

Optimization of costs for computing resources

When configuring a k8s cluster, you should be careful about the type of instances you use. In addition to the workload that runs in the cluster, a set of utility software runs on each cluster node.

This service load also needs resources. If the chosen instance type is very small, this overhead can consume more than a third of the resources available on the node. Additionally, the node itself may not be fully utilized because a new workload no longer fits there, and you have to start a new node. That's what was happening here as well.
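To make the effect of node size concrete, here is a small sketch that compares the share of a node consumed by per-node overhead on a small and a large instance. All numbers and instance names are hypothetical and chosen only to illustrate the "more than a third" situation described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeType:
    name: str
    cpu_millicores: int
    memory_mib: int


def overhead_share(node: NodeType, overhead_cpu_m: int, overhead_mem_mib: int) -> float:
    """Return the larger of the CPU and memory shares taken by per-node overhead."""
    cpu_share = overhead_cpu_m / node.cpu_millicores
    mem_share = overhead_mem_mib / node.memory_mib
    return max(cpu_share, mem_share)


# Hypothetical per-node overhead: system reservations plus utility pods.
OVERHEAD_CPU_M = 700
OVERHEAD_MEM_MIB = 1536

small = NodeType("small-2cpu-4gib", 2000, 4096)
large = NodeType("large-8cpu-32gib", 8000, 32768)

for node in (small, large):
    share = overhead_share(node, OVERHEAD_CPU_M, OVERHEAD_MEM_MIB)
    print(f"{node.name}: {share:.1%} of the node goes to overhead")
```

With these assumed figures the small node loses more than a third of its capacity to overhead, while the large node loses under a tenth, which is why fewer, larger nodes can end up cheaper for the same workload.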

Result: ~$250/month savings on current workloads (staging environment).

Cost optimization per storage

When infrastructure is “owned” by a large group of people, some things get out of hand. To keep the situation from getting worse and to simplify control over the project, it was decided to use Kafka (with ZooKeeper) as the data exchange bus.

Both applications use persistent storage to keep the data and state of the cluster. Everything would be fine if not for the following picture in the staging environment:

Unfortunately, as the graph shows, each Kafka node had a 1 TB disk, while the actual usage was about 9 GB.

Also, backups had at some point been set up for Kafka/ZooKeeper/Cassandra, which doubled the space occupied.

It was decided to redeploy Kafka/ZooKeeper with a new, optimal configuration and to remove both the backups and the schedule that created them.

Result: $534/month storage savings + $237/month after deleting backups/snapshots = $771/month (staging environment)
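The kind of check that surfaces such disks can be sketched as follows. The volume names, the 20% threshold, and the second volume are hypothetical; only the 9 GB used out of roughly 1 TB comes from the case above.

```python
from typing import Dict, Tuple


def underused_volumes(
    volumes: Dict[str, Tuple[float, float]], max_utilization: float = 0.2
) -> Dict[str, float]:
    """Map volume name to utilization for volumes used below the threshold.

    `volumes` maps a name to a (used_gb, provisioned_gb) pair.
    """
    flagged = {}
    for name, (used_gb, provisioned_gb) in volumes.items():
        utilization = used_gb / provisioned_gb
        if utilization < max_utilization:
            flagged[name] = utilization
    return flagged


# Hypothetical inventory: one Kafka disk as in the article, one healthy disk.
inventory = {
    "kafka-0-data": (9.0, 1000.0),
    "db-0-data": (400.0, 500.0),
}

for name, utilization in underused_volumes(inventory).items():
    print(f"{name}: only {utilization:.1%} of provisioned space is used")
```

In practice the used and provisioned figures would come from your monitoring system or the cloud provider's inventory; the sketch only shows the comparison itself.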

Cost optimization for load balancers

To speed up the development process, almost all services in staging are reachable over VPN.

Kubernetes provides several ways to access a service from outside the cluster:

Setting up a separate load balancer for each service.
Setting up a unique port for the service on each node of the cluster (not a very convenient approach because you need to know the IP addresses of the instances, which is hardly practical in a dynamically scaling environment).
Using one load balancer and an Ingress controller. In this case, you can imagine an Nginx configuration with a bunch of server {} blocks.

Initially, the project used the first option, and the total number of load balancers on staging was more than 20. Therefore, it was decided to migrate to the third option and reduce the number of load balancers as much as possible.
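A minimal sketch of the third option: one Ingress resource that routes several hostnames to different services behind a single load balancer. The hostnames, service names, ports, and the `nginx` ingress class are hypothetical; the manifest fields follow the Kubernetes `networking.k8s.io/v1` Ingress API.

```python
import json
from typing import Dict, Tuple


def build_ingress(
    name: str, routes: Dict[str, Tuple[str, int]], ingress_class: str = "nginx"
) -> dict:
    """Build an Ingress manifest mapping each host to a (service, port) pair."""
    rules = [
        {
            "host": host,
            "http": {
                "paths": [
                    {
                        "path": "/",
                        "pathType": "Prefix",
                        "backend": {
                            "service": {"name": service, "port": {"number": port}}
                        },
                    }
                ]
            },
        }
        for host, (service, port) in routes.items()
    ]
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {"name": name},
        "spec": {"ingressClassName": ingress_class, "rules": rules},
    }


# Hypothetical staging services that previously each had their own load balancer.
manifest = build_ingress(
    "staging-services",
    {
        "api.staging.example.com": ("api", 8080),
        "admin.staging.example.com": ("admin", 3000),
    },
)
print(json.dumps(manifest, indent=2))
```

Each entry in `rules` plays the role of one server {} block, and the printed JSON can be saved to a file and applied with `kubectl apply -f`. The services themselves then no longer need a load balancer of their own.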

Before

After

Result: ~$237/month savings (staging environment).

To summarize, we get monthly savings of about $4,700-4,800 in the project after two weeks of our engineer's work.
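The total can be checked by adding up the results reported in each section. The figures below are taken from the sections above; the labels are ours.

```python
# Approximate monthly savings reported in each section, in USD.
savings = {
    "cross-region traffic (staging)": 1000,
    "monitoring cluster (production)": 2500,
    "computing resources (staging)": 250,
    "storage and backups (staging)": 534 + 237,
    "load balancers (staging)": 237,
}

total = sum(savings.values())
for area, amount in savings.items():
    print(f"{area}: ${amount}/month")
print(f"total: ${total}/month")  # total: $4758/month
```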

Nine ways to optimize cloud costs

These best practices can help you create a cloud cost optimization strategy that links costs to specific business activities, so you can see who, what, why, and how you are spending your cloud budget.

1. Set up your account for monitoring

Set up a master (organization) payer account and make sure the cost data from all other accounts rolls up into it; it is very difficult to track cloud costs down the road when accounts are separate. The next step is to capture context, so that you can get a sense of what's happening in the system. Last but not least, start tracking your costs. By enabling cost and usage reporting, you can review past spending and identify anomalous expenses.
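One simple way to spot anomalous expenses in exported daily cost data is to compare each day with the median. This is only a sketch: the 1.5x factor and the sample figures are assumptions, and real cost reports usually need per-service breakdowns and seasonality handling.

```python
from statistics import median
from typing import List, Sequence


def anomalous_days(daily_costs: Sequence[float], factor: float = 1.5) -> List[int]:
    """Return indexes of days whose cost exceeds `factor` times the median."""
    baseline = median(daily_costs)
    return [i for i, cost in enumerate(daily_costs) if cost > factor * baseline]


# Hypothetical week of daily costs in USD; day 5 has a spike.
costs = [100.0, 102.0, 98.0, 101.0, 99.0, 250.0, 100.0]
print(anomalous_days(costs))  # [5]
```

The median is used instead of the mean so that a single large spike does not raise the baseline it is compared against.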

2. Regularly audit your cloud costs

Many companies are paying too much for their cloud because they haven't audited it in some time. Over the long run, cloud costs can grow organically and are not always attributed to the right items. A cost audit may help you identify areas of overpayment or underpayment, overperformance, or underperformance. As you audit your cloud costs, look at the areas that you feel are gaps or potential overlaps. You may find areas of your cloud development that you need to improve or shore up to become truly effective.

3. Review pricing and billing information

Cloud vendors provide billing details explaining cloud service costs. This information can be used to identify high-cost areas and generate savings. Analyze and prioritize high-spend services. You can avoid paying for redundant resources by understanding the cloud's costs.

4. Create cross-functional teams

Team members, including developers, may have different ideas about where money should be spent. Although they may all have their own views, it is the employees on the ground who are most likely to understand where cloud spending should be focused, with some guidance from administrators and supervisors. If you create cross-functional teams, you will be less likely to waste money on areas where it is not needed and more likely to see where improvement is needed.

5. Enable the right size of the services

By right-sizing, you analyze computing services and modify them for maximum efficiency. Because there are so many possible combinations of memory, graphics, database, storage capacity, and throughput options, it is difficult to size instances manually. Using right-sizing tools, you can get recommendations for changes across instance families. Right-sizing helps reduce cloud costs and optimize cloud usage, maximizing the performance of existing resources.

6. Identify and consolidate idle resources

The next step is to address idle resources. For example, if your CPU utilization is 10% but the provider charges for 100%, you are wasting a significant amount of computing resources. Identifying such instances and consolidating computing jobs into fewer instances is one of the key strategies for optimizing cloud costs.
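As a back-of-the-envelope sketch of consolidation, the function below estimates how many instances would be needed if the combined load were packed up to a target utilization. It assumes that all instances are the same size and that CPU is the only constraint, which is a simplification; the 70% target and the fleet are hypothetical.

```python
import math
from typing import Sequence


def instances_needed(
    cpu_utilizations: Sequence[float], target_utilization: float = 0.7
) -> int:
    """Estimate how many same-sized instances could carry the combined load."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    # Total load expressed in units of "one fully busy instance".
    total_load = sum(cpu_utilizations)
    return max(1, math.ceil(total_load / target_utilization))


fleet = [0.10] * 10  # ten instances, each about 10% busy
print(f"{len(fleet)} instances could be consolidated into about {instances_needed(fleet)}")
```

A real consolidation also has to account for memory, disk, network, and availability requirements, so treat the result as an upper bound on possible savings rather than a plan.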

7. Right-size resources with sizing tools

Modern cloud platforms, such as Microsoft Azure and AWS, offer right-sizing tools that administrators can use. With right-sizing, instances and systems get the right amount of resources, and the tools often do this more efficiently than manual analysis.

8. Analyze system usage with visual tools

Without data visualizations, it can be challenging to understand how a system is being used. In particular, heat maps make it easier to identify potential hotspots before they become disruptive. You can start balancing and adjusting the system load as soon as you see it heading in one direction. Analyzing raw numbers may not be helpful, but looking at a map or graph might. Administrators often cannot make adjustments because data are not presented understandably.

9. Optimize tomorrow’s costs today with a cloud-native design

Rehosting (lift-and-shift migration) is the most common method of migrating to the cloud. On-premises systems are moved into a cloud environment without being modified. While rehosting is fast and cost-effective in terms of team effort, it may result in runaway costs if on-premises inefficiencies are moved to the cloud.

You can still make incremental changes to eliminate inefficiencies if you don't have the time, funds, or skills to refactor your legacy applications and mission-critical workloads.

Three tools of the cost optimization trade

Once you have a firm grasp on how to approach cost optimization in the cloud, it’s time to think about the various tools at your disposal. At a high level, cost management on Google Cloud relies on three broad kinds of tools.

Cost visibility—this includes knowing what you spend in detail, how specific services are billed, and the ability to display how (or why) you spent a specific amount to achieve a business outcome. Here, keep in mind key capabilities such as the ability to create shared accountability, hold frequent cost reviews, analyze trends, and visualize the impact of your actions on a near-real-time basis. Using a standardized strategy for organizing your resources, you can accurately map your costs to your organization's operational structure to create a showback/chargeback model. You can also use cost controls like budget alerts and quotas to keep your costs in check over time.
Resource usage optimization—this is reducing waste in your environment by optimizing usage. The goal is to implement a specific set of standards that draws an appropriate intersection between cost and performance within an environment. This is the lens to look through when reviewing whether there are idle resources, better services on which to deploy an app, or even whether launching a custom VM shape might be more appropriate. Most successful companies that avoid waste optimize resource usage in a decentralized fashion, as individual application owners are usually the best equipped to shut down or resize resources due to their intimate familiarity with the workloads. In addition, you can use Recommender to help detect issues like under- or over-provisioned VM instances or idle resources. Enabling your team to surface these recommendations automatically is the aim of any great optimization effort.
Pricing efficiency—this includes capabilities such as sustained use discounts, committed use discounts, flat-rate pricing, per-second billing, or other volume discounting features that allow you to optimize rates for a specific service. These capabilities are best leveraged by more centralized teams within your company, such as a Cloud Center of Excellence (CCoE) or FinOps team that can lower the potential for waste while optimizing coverage across all business units. This is something to continue to review both pre-cloud migration and regularly once you go live.

Considering people and processes will go a long way toward ensuring your standards are useful and aligned with what your business needs. Similarly, understanding Google Cloud’s cost visibility, resource usage optimization, and pricing efficiency features will give you the tools you need to optimize costs across all your technologies and teams.

Conclusion

Optimizing cloud costs isn’t a checklist, it’s a mindset; you’ll have the best results if you think strategically and establish strong processes to help you stay on track. But you can also take many service-specific steps to get your bill under control. If you feel that you’re not getting enough from your cloud—if you aren’t getting the bang for the buck—then you may need to embark on cloud cost optimization. But that’s not always easy to do alone. Most cloud optimization starts with an audit. A third-party audit is often essential.