Engineering platforms are a relatively new phenomenon, but they attract much attention from leading companies and hold promise for reshaping the technological landscape. This article explores why they emerged and the core concepts behind them, walks through their principles and components, weighs their distinctive value against their challenges, and covers key practices and standards.
Whether you are a business owner or a technical expert, here you will get valuable insights to enhance efficiency and competitiveness in today's rapidly evolving technological environment.
What is platform engineering?
Platform engineering is an approach in which infrastructure, including a set of services, tools, and processes, is designed to coordinate and standardize software development and keep it consistent across teams. The platform gives each team its preferred level of abstraction, so developers can focus on building new features and launching new applications quickly, while the engineering platform handles the operational aspects of the application lifecycle.
What's the difference between DevOps, platform engineering, and Site Reliability Engineering?
Most platform engineering concepts, principles, and features overlap with DevOps, and the approach is easy to confuse with Site Reliability Engineering (SRE). But these are different things that appeared at different times in response to different industry challenges.
Initially, there was the traditional software development model, in which development and operations teams worked separately. This led to coordination issues, slowed down software delivery, and increased the number of errors. DevOps emerged as a response to these problems, erasing the boundaries between development and operations and fostering a culture of collaboration and automation, which together simplify the software delivery process and improve its quality. DevOps also includes practices such as continuous integration (CI) and continuous delivery (CD), along with containerization and orchestration tools.
This shift led to a rapid acceleration of software development, simplifying the creation of increasingly capable and complex systems. However, keeping such systems stable and reliable became more challenging. In response, Google introduced SRE, which is described in several books on the subject. This approach largely overlaps with DevOps but emphasizes system reliability, introducing additional practices such as incident management and continuous process improvement, and tools like Service-Level Indicators (SLIs), Service-Level Objectives (SLOs), and especially Service-Level Agreements (SLAs) for measuring and improving system reliability.
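To make the SLI and SLO terms above more concrete, here is a minimal sketch of how a measured availability indicator might be compared with an objective. The function name `evaluate_slo`, the `SloReport` type, and the sample numbers are illustrative assumptions, not part of any SRE tool; real setups usually derive these figures from a monitoring system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SloReport:
    sli: float               # measured fraction of good requests
    slo_target: float        # agreed objective, e.g. 0.999
    budget_remaining: float  # share of the error budget still unused (negative if overspent)


def evaluate_slo(good_requests: int, total_requests: int, slo_target: float) -> SloReport:
    """Compare a measured availability SLI against an SLO target (hypothetical helper)."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")

    sli = good_requests / total_requests
    # The error budget is the number of failures the SLO tolerates in this window.
    allowed_failures = (1.0 - slo_target) * total_requests
    actual_failures = total_requests - good_requests
    budget_remaining = 1.0 - actual_failures / allowed_failures
    return SloReport(sli, slo_target, budget_remaining)


if __name__ == "__main__":
    report = evaluate_slo(good_requests=999_500, total_requests=1_000_000, slo_target=0.999)
    print(f"SLI={report.sli:.4%}, error budget remaining={report.budget_remaining:.0%}")
```

In this sketch the SLI is what was measured, the SLO is the internal target, and an SLA would be the external commitment built on top of such targets.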
However, the pace of development continued to increase, and so did the number of tools needed to meet the industry's ever-changing demands. In response, the engineering platform emerged, bringing together the current practices and tools needed for consistent, efficient, and reliable development, along with their continuous improvement and standardization for a specific company.
Of course, the flexibility and adaptability of such a platform require cloud infrastructure, which needs to be managed separately so that developers can focus on using specific tools rather than configuring them. In response, GitOps emerged as an approach to managing this infrastructure through code.
These approaches do not cancel but complement each other, sharing common principles, practices, and tools. The difference between them is that they emerged in response to different industry demands, and each focuses on improving specific aspects of software development.
Which concepts are behind platform engineering?
Given accelerating technological advances and growing threats to data, the concepts that platform engineering relies on are becoming increasingly important today, because they make reliable and adaptive software development possible.
- Flexibility is about how the system can adapt to changing requirements and conditions. It includes quickly adding new features, modifying existing ones, and integrating with other systems. This is important as it allows teams to respond quickly to new business requirements or technological trends.
- Scalability means the system can handle increasing load while maintaining performance and reliability. It includes adding additional resources to handle an increasing volume of data or requests and distributing the load among these resources to ensure stable operation. This is critically important for maintaining a high level of service as the business grows.
- Security is about protecting the system and data from threats. It includes modern encryption, authentication, and authorization methods, as well as implementing security policies and monitoring to detect and prevent potential threats. This is important for protecting valuable information and maintaining user trust.
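As a small illustration of the scalability concept above (adding resources as load grows), the following sketch sizes a pool of replicas in proportion to the observed load. The function name, the parameters, and the default limits are assumptions made for this example. The rule itself follows the same proportional idea as the formula documented for the Kubernetes Horizontal Pod Autoscaler, but it is not that implementation.

```python
import math


def desired_replicas(current_replicas: int, current_load: float, target_load: float,
                     min_replicas: int = 1, max_replicas: int = 20) -> int:
    """Proportional scaling: size the pool so each replica sits near target_load.

    current_load and target_load are per-replica figures in the same unit,
    for example average CPU percent or requests per second.
    """
    if current_replicas < 1:
        raise ValueError("current_replicas must be at least 1")
    if target_load <= 0:
        raise ValueError("target_load must be positive")

    wanted = math.ceil(current_replicas * current_load / target_load)
    # Clamp to the configured bounds so a spike cannot scale without limit.
    return max(min_replicas, min(max_replicas, wanted))


if __name__ == "__main__":
    # Four replicas at 90% load each, aiming for 60% per replica.
    print(desired_replicas(current_replicas=4, current_load=90, target_load=60))
```

The upper bound is a deliberate design choice here: it keeps a sudden spike from turning into an unbounded infrastructure bill.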
Engineering platform principles
Now consider the principles of platform engineering, which ensure its capabilities and features.
- Modularity. It goes beyond simply dividing the system into independent services. It includes creating consistent interfaces and contracts between services, ensuring their compatibility and interaction. This allows development teams to implement and update services without the risk of disrupting the operation of other platform components. It also implies the ability to replace or update individual services without changing the entire platform, providing flexibility and resilience in the face of constantly changing requirements and technologies.
- Automation. It is not limited to simply reducing manual labor. It includes automating deployment, monitoring, scaling, and recovery processes for services. This ensures quick and reliable deployment of updates and quick detection and resolution of issues to maintain high effectiveness and availability of services.
- Standardization. It means not only using universally accepted standards and protocols but creating and adhering to internal standards and best practices in the design, development, testing, and deployment of services. This ensures consistency and quality of services throughout their lifecycle and simplifies their integration and interaction.
- Monitoring. It covers tracking the state and performance of services and collecting and analyzing metrics at all platform levels — from infrastructure to applications. This leads to universal optimization of performance and scalability and allows teams to quickly detect and resolve issues and predict and prevent potential problems.
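The modularity principle above, with its consistent interfaces and contracts between services, can be sketched with a typed interface. Everything named here (`ArtifactStore`, `InMemoryStore`, `PrefixedStore`, `publish_build`) is hypothetical and stands in for whatever contracts a real platform would define.

```python
from typing import Protocol


class ArtifactStore(Protocol):
    """Hypothetical contract that every storage service on the platform agrees to."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class InMemoryStore:
    """One implementation of the contract; a cloud-backed store could replace it."""

    def __init__(self) -> None:
        self._items: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._items[key] = data

    def get(self, key: str) -> bytes:
        return self._items[key]


class PrefixedStore:
    """A second implementation that namespaces keys and wraps any other store."""

    def __init__(self, inner: ArtifactStore, prefix: str) -> None:
        self._inner = inner
        self._prefix = prefix

    def put(self, key: str, data: bytes) -> None:
        self._inner.put(f"{self._prefix}/{key}", data)

    def get(self, key: str) -> bytes:
        return self._inner.get(f"{self._prefix}/{key}")


def publish_build(store: ArtifactStore, version: str, payload: bytes) -> str:
    """The caller depends only on the contract, not on a concrete service."""
    key = f"builds/{version}"
    store.put(key, payload)
    return key


if __name__ == "__main__":
    store = PrefixedStore(InMemoryStore(), prefix="team-a")
    key = publish_build(store, "1.2.0", b"binary contents")
    print(key, store.get(key))
```

Because `publish_build` only knows the contract, either store can be swapped in or updated without touching the calling code, which is the property the principle describes.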
Engineering platform components
There are many components that help to implement the above concepts and principles into a single system of continuously working tools. Let's consider them and their purpose.
- Cloud providers supply the basic infrastructure and services that all engineering platform infrastructure builds upon. These include computing resources, storage, networking, and various cloud platform services. So cloud providers offer the raw materials that the platform uses to build and run applications. Examples of cloud providers include Microsoft Azure, Google Cloud Platform, and Amazon Web Services.
- Infrastructure as Code (IaC) is an approach that manages and provisions computing infrastructure through code rather than physical hardware configuration or interactive configuration tools. It allows for the automatic setup, modification, and management of infrastructure and makes it easier to manage and scale the infrastructure that supports your applications. Examples of IaC tools include Pulumi, Terraform, and AWS CloudFormation.
- Infrastructure control plane is responsible for low-level management of infrastructure resources and provides a unified interface for managing all the resources and services provided by the cloud provider, such as computing, storage, and networking, needed to run applications. Examples of infrastructure control planes include Terraform Cloud, Atlantis, and Spacelift.
- Continuous Integration (CI) and Continuous Delivery (CD) are key approaches here. CI enables developers to regularly merge their code changes into a shared repository, ensuring the codebase's integrity and that new features don't disrupt existing functionality. CD follows CI, aiming to promptly build, test, and deploy new features to the platform. Together, CI/CD creates a seamless pipeline for simple, swift, and reliable software releases. Examples of CI/CD tools include CircleCI, Codefresh, Bitbucket Pipelines, GitLab CI, GitHub Actions, AWS CodeBuild, Jenkins, Travis CI, Azure DevOps, and Google Cloud Build.
- Registry stores all build artifacts of the CI/CD process and lets you distribute Docker images with new platform features. Examples of registry tools include Docker Registry, JFrog Artifactory, Harbor, Amazon ECR, Azure Container Registry, and Google Container Registry.
- Kubernetes serves as a comprehensive platform orchestrator, managing containerized applications. It provides a unified interface for maintaining the desired state of a cluster and coordinates the deployment, scaling, and healing of applications, ensuring their efficient lifecycle management across the platform. Examples of managed Kubernetes services include AWS EKS, GCP GKE, Azure AKS, and Red Hat OpenShift.
- Service catalog provides a catalog of services developers can use, a portal for developers to manage their applications, and a user interface for managing the platform. Once the platform orchestrator deploys and manages a feature, it is made available to other developers through the service catalog. Examples of such tools include Backstage, LeanIX, and Port.
- Database & Storage form the data persistence layer for applications, provisioning databases for structured data and storage systems for unstructured data used by platform applications and features. Examples include Aiven, PostgreSQL, Redis, Amazon S3, MariaDB, Kafka, MySQL, Elasticsearch, and MongoDB.
- Messaging enables a system for sending and receiving messages between applications. It allows for asynchronous communication between applications and their features and can be used to decouple senders and receivers, making architectures more scalable. Examples of such tools include RabbitMQ, Kafka, and ActiveMQ.
- Security components include features like authentication, authorization, encryption, and security monitoring. They ensure that applications are secure, that data is protected, and that developing and using new features poses no threat to the platform. Examples of such tools include Azure Sentinel, Snyk, Gremlin, Armo, and Tigera.
- Logging collects, stores, and analyzes log data from applications and infrastructure, which allows teams to monitor and troubleshoot applications and features and delivers insights into their performance and usage. Examples of such tools are Stackdriver, Fluent Bit, Logz.io, and Datadog.
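Among the components above, the Infrastructure as Code idea is perhaps the easiest to show in miniature: infrastructure is described as data, and a tool works out which changes are needed. The sketch below is a simplified illustration under that assumption. The `Plan` type, the `plan_changes` function, and the resource names are invented for this example and do not reflect how Terraform, Pulumi, or CloudFormation are implemented.

```python
from dataclasses import dataclass, field

# Resource name -> its settings, e.g. {"web-vm": {"size": "medium"}}
Resources = dict[str, dict[str, object]]


@dataclass
class Plan:
    create: list[str] = field(default_factory=list)
    update: list[str] = field(default_factory=list)
    delete: list[str] = field(default_factory=list)


def plan_changes(desired: Resources, actual: Resources) -> Plan:
    """Compare declared infrastructure with what exists and list the required changes."""
    plan = Plan()
    for name, settings in desired.items():
        if name not in actual:
            plan.create.append(name)   # declared in code but not yet provisioned
        elif actual[name] != settings:
            plan.update.append(name)   # provisioned, but settings have drifted
    # Anything that exists but is no longer declared gets removed.
    plan.delete = [name for name in actual if name not in desired]
    return plan


if __name__ == "__main__":
    desired = {"web-vm": {"size": "medium"}, "assets-bucket": {"versioning": True}}
    actual = {"web-vm": {"size": "small"}, "old-queue": {}}
    print(plan_changes(desired, actual))
```

Keeping the plan separate from its execution mirrors the review step many teams rely on: the proposed changes can be inspected before anything is applied.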
Values of engineering platforms
Platform engineering is a relatively new approach, but it is rapidly gaining popularity. According to a Gartner article, it is one of the main trends of this and the coming years: company infrastructure includes more and more services and applications, and their effective operation and constant improvement cannot do without platform engineering.
Ensuring uniformity and consistency
It allows organizations to standardize their tools and processes across the entire platform, ensuring uniformity and consistency. This simplifies management and support, reduces the likelihood of errors, and increases deployment speed. Unlike DevOps, where standardization may be limited to individual projects or teams, platform engineering ensures standardization at the level of the entire organization.
Simplifying management and monitoring
It provides centralized tools for management and monitoring, which simplifies performance tracking and helps identify and eliminate downtime. This allows organizations to respond faster to changes and improve the quality of their products and services. Unlike SRE, where the focus is mostly on the reliability of the infrastructure, platform engineering ensures control over the infrastructure.
Increasing the speed of innovation
It lets teams quickly and efficiently experiment and implement new technologies. This stimulates innovation and helps organizations remain competitive in a rapidly changing technological landscape. Unlike DevOps, where the focus is on improving existing processes, platform engineering provides a basis for continuous innovative growth.
Challenges with platform engineering
As a complex and relatively new practice, it presents organizations with challenges during its implementation. Let's explore these challenges in more technical detail.
Inefficient development processes
Its implementation requires organizations to adopt new processes and workflows. This transition can be challenging as teams may initially struggle with adjusting to the new practices, resulting in inefficiencies and productivity bottlenecks. Developers may need time and help to understand and adapt to the platform's tools and methodologies.
Tooling inconsistency
It involves utilizing various tools and technologies to enable efficient infrastructure provisioning, deployment, and management. However, inconsistencies in tooling choices across different teams or projects can lead to compatibility issues, data fragmentation, and difficulty sharing resources and knowledge. Establishing standardized and compatible tooling frameworks is crucial for seamless collaboration and integration within the platform.
Scalability and complexity
It aims to provide a scalable and robust infrastructure for application development. However, managing the complexity of a large-scale platform can be challenging. Scaling infrastructure components, ensuring high availability, and managing dependencies between different services and applications require careful planning and implementation. Organizations may face difficulties maintaining and scaling their platform without proper architectural design and automation.
Governance and compliance
With the introduction of a platform, organizations need to establish effective governance and compliance practices. This involves defining policies, ensuring regulatory compliance, and monitoring adherence to security and data protection standards. Organizations need to invest significantly in robust governance frameworks to mitigate risks and maintain the integrity and security of their platforms.
Skillset and knowledge gap
Introducing new practices may require organizations to upskill their teams or hire new talent with extensive expertise in the relevant technologies and practices. It can pose a challenge in finding qualified professionals, providing necessary training, and bridging the knowledge gap between existing teams and the new platform's requirements. Organizations need to invest in continuous learning and development initiatives to ensure their teams have the necessary skills to utilize and contribute to the platform effectively.
That's why companies need to support an engineering culture that makes professionals interested in learning and applying new technologies, helping and welcoming them to do so properly. You can read more about why and how we do it in Mad Devs Engineering’s Handbook.
By addressing these challenges through proper planning, training, and implementation strategies, organizations can overcome the obstacles associated with platform engineering and harness its benefits for efficient and scalable software development.
How to build a platform engineering team
Building a platform engineering team is a strategic endeavor requiring careful planning and execution. Accordingly, this involves many activities, but let's highlight a few key ones.
Recruitment process
Clearly define the required skill set and experience level for each role within the platform engineering team. Leverage job descriptions and interviews to assess candidates' technical expertise, problem-solving abilities, and familiarity with relevant tools and technologies. Look for individuals who demonstrate strong collaboration skills, adaptability, and a passion for automation and improving development processes.
Roles and responsibilities
Identify key roles within the platform engineering team, such as platform engineer, DevOps Engineer, Infrastructure Engineer, Security Engineer, etc., based on your organization's specific needs. Clearly define the responsibilities and expectations for each role to avoid role ambiguity and facilitate effective collaboration. Consider establishing cross-functional teams with members specializing in different areas, such as infrastructure provisioning, deployment automation, security, and monitoring.
Technical skills and competencies
Seek candidates with a strong background in infrastructure automation, cloud computing, containerization, and configuration management tools (e.g., Terraform, Kubernetes, Ansible). Look for experience building and managing scalable and reliable infrastructure platforms, leveraging modern software engineering principles and practices. Familiarity with CI/CD pipelines and observability tools is crucial for seamless platform development and maintenance.
Collaboration and communication
Emphasize the importance of collaboration and effective communication within the platform engineering team. Encourage using agile methodologies like Scrum or Kanban to foster transparency, accountability, and continuous improvement. Promote a culture of knowledge sharing, documentation, and cross-team collaboration to leverage collective expertise and facilitate learning.
Continuous learning and skill development
Encourage team members to stay updated with the latest industry trends, emerging technologies, and best practices in platform engineering. Provide opportunities for professional growth, such as training programs, workshops, and certifications related to cloud platforms, infrastructure automation, and software development.
Team dynamics and culture
Foster a positive, inclusive team culture that values diversity, encourages creativity, and embraces a growth mindset. Promote a healthy work-life balance, collaboration, and recognition of achievements to maintain team morale and motivation. Create an environment that encourages innovation and experimentation, allowing team members to explore new ideas and approaches.
For instance, in Mad Devs, we pay immense attention to corporate culture, understanding that the environment shapes the individual just as the company shapes the employee. You can read more about this in our Mad Devs Corporate Culture Overview.
Remember, building a platform engineering team is not a one-time effort. It requires ongoing commitment and investment to ensure that the team continues to evolve and improve as the needs of the organization change. Now let's move to the platform engineering practices themselves.
Best practices in platform engineering
Adopting key practices and methodologies is pivotal in maintaining the relevance and efficiency of the platform. These practices ensure that the platform remains up-to-date, highly performant, and capable of supporting the rapid and reliable deployment of applications and services.
- Maintain a DevOps culture
Platform engineering should embrace the roots of DevOps, focusing not only on tools and technology but also on fostering a collaborative culture. This involves encouraging open feedback, facilitating positive dialogue, and considering developers as partners. This is fundamental to creating a high-performing environment and ensuring that the best practices are continuously evolving in response to the needs of the teams.
- Cloud-Native architecture
Cloud-native architecture is an approach to developing and deploying applications that fully leverage the benefits of cloud technologies. This involves creating applications and services that utilize automatic scaling, resource management, and high availability. This allows teams to deploy and update applications rapidly and ensures the platform's high stability and performance.
- Treating the Platform as a Product
When providing tools that are internal-facing, it's important to remember that your developers are your consumers. Therefore, treating the platform as a product, publishing roadmaps, and actively seeking feedback is essential. This will win the trust of your internal teams and ensure the platform aligns with their needs.
- Balanced Tooling Approach
It's essential not to mandate standardized adoption of every tool while still controlling tool sprawl. Striking a balance in tool adoption and reducing rogue tooling is necessary for an efficient team workflow and platform engineering process.
- Improve Developer Experience
Invest in enhancing the developer experience associated with using the platform. This involves reducing friction, seamlessly integrating new capabilities, and ensuring developers find the platform conducive to their workflows.
- Maintain GitOps
GitOps is an approach to managing infrastructure and configurations using version control tools like Git. This means using Git as the "source of truth" for all infrastructure and configurations. This allows teams to easily track changes, make modifications, and restore previous states at the level of the entire engineering platform.
- Observability
Observability is the ability to understand the internal state of a system by observing its output data. This means collecting and analyzing metrics, logs, and traces to understand the performance and state of the platform, which allows teams to quickly identify and resolve issues, as well as optimize system performance.
- Plan-Do-Check-Act
PDCA, also known as the Deming cycle, implies continuous improvement of processes and products based on cyclical analysis and adjustment. PDCA allows teams to quickly adapt to changes, improve their processes and products, and ensure high performance. This makes PDCA an ideal tool for managing the complex and dynamic environment of platform engineering.
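The GitOps practice in the list above treats a repository as the source of truth, so that changes can be tracked and earlier states restored. The following toy model illustrates that idea only. `ConfigRepo` and `reconcile` are hypothetical names, the history is a plain in-memory list rather than Git, and real GitOps tooling works against actual repositories and clusters.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class ConfigRepo:
    """Toy stand-in for a Git repository that holds platform configuration."""

    history: list[tuple[str, dict]] = field(default_factory=list)  # (message, snapshot)

    def commit(self, message: str, config: dict) -> int:
        """Record a new snapshot and return its revision number."""
        self.history.append((message, copy.deepcopy(config)))
        return len(self.history) - 1

    def head(self) -> dict:
        """Return a copy of the latest committed configuration."""
        return copy.deepcopy(self.history[-1][1])

    def revert_to(self, revision: int) -> int:
        """Restore an earlier state by committing it again, so history is preserved."""
        message, snapshot = self.history[revision]
        return self.commit(f"Revert to {revision}: {message}", snapshot)


def reconcile(repo: ConfigRepo, live: dict) -> dict:
    """Return the settings that must change for the live platform to match the repo."""
    wanted = repo.head()
    return {key: value for key, value in wanted.items()
            if key not in live or live[key] != value}


if __name__ == "__main__":
    repo = ConfigRepo()
    first = repo.commit("initial", {"replicas": 2, "image": "app:1.0"})
    repo.commit("scale up", {"replicas": 5, "image": "app:1.1"})
    print(reconcile(repo, live={"replicas": 2, "image": "app:1.1"}))
    repo.revert_to(first)
    print(repo.head())
```

Reverting by adding a new commit, instead of deleting history, keeps the audit trail intact, which is part of what makes the repository usable as a source of truth.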
Regularly revisiting and updating these practices in line with emerging technological trends and standards is crucial to ensure the platform remains at the forefront of industry advancements.
Security practices in platform engineering
Security practices play a crucial role in making the platform safe and reliable. They protect the platform from potential threats and ensure compliance with various security standards and regulations.
Keeping web application security up to date
Web application security requires constant updating, along with the identification and mitigation of security threats. This practice covers a wide range of threats, including injections, security misconfigurations, session handling vulnerabilities, and many more. The OWASP Top 10 helps teams keep critical security measures aligned with common and relevant vulnerabilities, while auditing and certification help ensure the high security of developed and deployed web applications and services.
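Injection is one of the threat classes mentioned above, and a common mitigation is to pass user input as query parameters instead of building SQL strings. The sketch below shows this with Python's built-in `sqlite3` module. The `users` table, the helper names, and the sample data are assumptions made purely for illustration.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create a throwaway in-memory database with a hypothetical users table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    return conn


def find_user(conn: sqlite3.Connection, username: str) -> list[tuple]:
    """Look up a user with a parameterized query.

    The value is bound through the `?` placeholder, so it is treated as data
    and never spliced into the SQL text.
    """
    cursor = conn.execute("SELECT id, name FROM users WHERE name = ?", (username,))
    return cursor.fetchall()


if __name__ == "__main__":
    conn = make_demo_db()
    print(find_user(conn, "alice"))
    # A classic injection attempt is matched literally and finds no user.
    print(find_user(conn, "' OR '1'='1"))
```

A platform can make this kind of safe default part of its shared libraries and templates, so individual teams do not have to rediscover it.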
Application of security standards in cloud computing
Cloud computing forms the backbone of platform engineering. Therefore, security in cloud computing also becomes critical. This includes data encryption, access management, as well as the application of security policies and audit procedures. ISO/IEC 27017 provides guidance on information security for cloud services, while ISO/IEC 27018 establishes principles for protecting personal information in cloud services. Together, these standards help ensure security and privacy, and their certification confirms that the platform is developed and maintained with these aspects of security in mind.
Adhering to these security practices can significantly reduce the risk of security breaches and enhance the overall security posture of the platform. It's important to regularly review and update these practices to keep up with evolving security threats and standards.
Summary
This article discussed what engineering platforms are and what platform engineering is about. Now you know why platform engineering emerged, as well as the core concepts behind it, such as flexibility, scalability, and security. You understand the specific principles of platform engineering and the components through which they are implemented. We've highlighted the key values and warned about the complexities on the path to achieving them. We also discussed a set of key practices for platform engineering, such as cloud-native architecture, GitOps, observability, and the PDCA cycle, and, of course, the importance of security in platform engineering and the corresponding security standards.