[Guide: Assessment of the Planned Amount of Work in ML-Based Projects – Key Aspects]

Introduction

Creating solutions that use machine learning is an unpredictable process with nuances that make it different from regular software development. In this article, I'm going to share my experience and discuss the important factors to consider when evaluating projects in DS/ML/AI. If you consider these nuances when planning deadlines and resources, you can improve your chances of avoiding mistakes. An incorrect evaluation of a project can lead to its closure, financial losses, and many other unpleasant problems for you and your team. Stay alert!

Before evaluating a project

✅ Background context

Check that the client has shared all the important background context along with the project's goals. For example, the client may want to simplify running their business. This influences the approach to the project's PoC/MVP: a more applied solution that focuses on specific tasks and has a simplified interface may be appropriate. In another case, the client may want to develop the project as a separate business and look for investors to support it. Here, the UI may be more important than in the first case. We assume the client has thought this through, but it's best to check. If they haven't, you could spend time working toward goals the client doesn't actually have.

✅ Market research

Ask whether the client has conducted market research to understand their competitors and target audience, and whether they've evaluated the market's potential and trends. Again, we assume they've done this, but it's better to check. Without this step, the project could finish before it even starts.

✅ Budget and finances

Discuss the financial resources (investors, grants, personal funds, etc.) the client plans to use in the project. This affects the project's budget and its realistic scope.

Evaluating the project

When estimating the volume of work in ML-based projects, it's essential to consider the following factors:

Defining the problem and goals:

Clearly define the problem that the customer wants to solve with ML tools. Consider whether other tools could solve the problem more cheaply, more effectively, or faster. If so, propose them as an alternative solution.
Outline the goals, KPIs (key performance indicators), and/or target metrics that will help assess the project's success. Make these points clear to the team that is evaluating the project.

Here are a few more specific examples. If the project involves increasing data processing speed with ML tools, then define which specific processing speed you need to achieve.

Abstract goals, such as "I want a nice end result," should be converted into quantifiable metrics (e.g., the amount of time spent on data processing, the minimum and maximum number of characters in the final text, or the resolution of the final image).

If you can't do this, then plan for extra time to be spent on tailoring the result to the client's abstract goal.

❗️NB: Choose KPIs using the SMART method (Specific, Measurable, Achievable, Relevant, Time-bound) and connect them with the business's goals. It's best to have 5-10 performance indicators to remain focused on the most significant aspects of the project. Here are some examples of KPIs that you can use in an ML project:

A model's accuracy: The percentage of correct predictions compared to the overall number of predictions.
Connection to business goals: An assessment of how well the model's results match the business's strategic goals. This could be, for example, target coefficients for increasing revenue, decreasing spending, or increasing the speed of processing user requests.
Prediction speed: The target time required for receiving a prediction from the model after deploying it.
Model's training time: The target time required for training the model on the given data set.
User satisfaction: An assessment of the quality of the model's predictions or recommendations by users. It's a target coefficient calculated using survey results or ratings (such as NPS).
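
To make the KPI list above concrete, here is a minimal sketch of how measured values could be compared against agreed targets. The function names, the KPI names, and all the numbers are hypothetical and chosen only for illustration. Accuracy is computed exactly as defined above: correct predictions divided by all predictions.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels (0.0 to 1.0)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def meets_targets(measured, targets):
    """Compare measured KPI values with their targets.

    targets maps a KPI name to (direction, threshold):
      "min" means the measured value must be at least the threshold,
      "max" means the measured value must be at most the threshold.
    """
    result = {}
    for name, (direction, threshold) in targets.items():
        value = measured[name]
        if direction == "min":
            result[name] = value >= threshold
        else:
            result[name] = value <= threshold
    return result


# Hypothetical example: 3 of 4 predictions are correct, latency is 0.2 s.
measured = {
    "accuracy": accuracy([1, 0, 1, 1], [1, 0, 0, 1]),
    "latency_s": 0.2,
}
targets = {
    "accuracy": ("min", 0.9),   # assumed target, agreed with the client
    "latency_s": ("max", 0.5),  # assumed target, agreed with the client
}
print(meets_targets(measured, targets))  # {'accuracy': False, 'latency_s': True}
```

Writing the targets down in this form early on makes it easier to see which KPIs the estimate has to budget extra work for.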

Legal and regulatory factors:

Ensure that the client has researched legal and regulatory requirements in AI and reported them to you. If they haven't done this, plan time to conduct this research yourself. Otherwise, you may develop an instrument that can't be legally used.
Check that the client has provided you with the requirements of the applicable laws and standards, both regional and international (such as HIPAA, GDPR, ISO, etc.). If they haven't done this, plan time to conduct this research yourself. Otherwise, you may have to rework the tool or re-estimate the time required for the project.

Collecting and processing data:

It's essential to consider the resources you'll spend on collecting and processing data and to define the criteria you'll use to evaluate the model's performance.

Outline the source of the data and how you'll collect it.
Estimate the volume and quality of data required for training the models.
Determine whether the client has the necessary amount and quality of data for the project or whether you'll need extra time to collect or generate this data for training the models. If there isn't enough data or acquiring it isn't realistic, inform the client of the high risk of not completing the project in the way they expected.
Plan how to clean, annotate, and prepare the data for training.
Create target criteria for the quality of the model's performance that match the KPIs (see above).
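
A quick first check of the client's data can be sketched as follows. This is only an illustration: the function name, the record format (a list of dictionaries), and the thresholds are assumptions, and a real project would need checks tailored to its data.

```python
from collections import Counter


def data_readiness(rows, label_key, min_rows, max_missing_share):
    """Summarize whether a labeled data set looks usable for training.

    rows is a list of dicts; a value of None counts as missing.
    min_rows and max_missing_share are thresholds you agree on in advance.
    """
    total = len(rows)
    # A row is incomplete if any of its fields is missing.
    incomplete = sum(1 for r in rows if any(v is None for v in r.values()))
    missing_share = incomplete / total if total else 1.0
    # Label distribution shows whether some classes are badly underrepresented.
    labels = Counter(
        r[label_key] for r in rows if r.get(label_key) is not None
    )
    return {
        "rows": total,
        "enough_rows": total >= min_rows,
        "missing_share": missing_share,
        "missing_ok": missing_share <= max_missing_share,
        "label_counts": dict(labels),
    }


# Hypothetical sample: four records, one of them with a missing field.
sample = [
    {"text": "a", "label": "x"},
    {"text": None, "label": "y"},
    {"text": "c", "label": "x"},
    {"text": "d", "label": "x"},
]
print(data_readiness(sample, "label", min_rows=3, max_missing_share=0.3))
```

If a check like this fails, that is the signal to add data collection or annotation time to the estimate, or to warn the client about the risk.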

Developing and testing the models:

Creating and training a good model requires a lot of time. First, the model's architecture must be developed, and then it must be trained and tested.

Estimate the amount of work involved in researching and developing the models (the R&D stage). Assess how much time you'll need for experiments, validation, and testing.
The time needed to train the models should be visible in the project's timetable.
Plan for optimizing and improving the models' performance after demonstrating preliminary results.
When creating the estimate, include the time needed to develop a testing strategy for the model and to assess its accuracy, reliability, and performance. Likewise, consider the time needed to find and remove any potential sources of errors or bias.
Estimate the cost of each planned improvement and optimization cycle.
Discuss with the client whether they'll want ongoing support and updates for the models as the user base grows. Include this in your estimate, too.
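
One simple way to keep training time and each improvement cycle visible in the estimate is to list the cycles explicitly and price them separately. The sketch below assumes a flat hourly rate for specialists and a flat hourly price for training hardware. The class, the function, and all the figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cycle:
    name: str
    engineer_hours: float   # people's time: experiments, validation, testing
    training_hours: float   # machine time spent on training runs


def estimate_cycles(cycles, engineer_rate, compute_rate):
    """Return a per-cycle cost breakdown and the total cost."""
    breakdown = []
    for c in cycles:
        cost = c.engineer_hours * engineer_rate + c.training_hours * compute_rate
        breakdown.append((c.name, cost))
    total = sum(cost for _, cost in breakdown)
    return breakdown, total


# Hypothetical plan: a baseline model plus one optimization cycle.
plan = [
    Cycle("baseline model", engineer_hours=80, training_hours=10),
    Cycle("optimization cycle 1", engineer_hours=40, training_hours=20),
]
breakdown, total = estimate_cycles(plan, engineer_rate=50, compute_rate=3)
for name, cost in breakdown:
    print(f"{name}: {cost}")
print(f"total: {total}")  # total: 6090
```

Keeping each cycle as its own line item lets the client decide how many optimization cycles they are willing to pay for.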

Technical requirements and infrastructure:

Creating and training good models requires significant computing resources.

Identify the technology stack already used in the project, and include the cost of integrating with it in the final estimate (more details below).
Decide what technology stack is necessary to develop the ML solution.
Assess the compute, data storage, and infrastructure requirements. Although you are unlikely to guess the load exactly, it's important to have an estimate for the initial stages of the project.
❗️NB: It's best to separate the resources required to run the model in production from those needed to train it. Training uses more resources over a short period, while running the model needs fewer resources over a longer period.
Assess how the project will be scaled and supported. List these factors in the final estimate as a separate set of expenses.
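
The split between training and production resources can be illustrated with some simple arithmetic. The sketch below is only a rough model: it assumes pay-per-hour hardware, a fixed number of training runs, and production instances that run around the clock (about 730 hours per month). All prices and quantities are made-up placeholders.

```python
def training_cost(gpu_count, hours_per_run, runs, gpu_hour_price):
    """Short, intensive usage: several training runs on several GPUs."""
    return gpu_count * hours_per_run * runs * gpu_hour_price


def serving_cost(instances, instance_hour_price, months, hours_per_month=730):
    """Long, lighter usage: instances that serve predictions continuously."""
    return instances * instance_hour_price * hours_per_month * months


# Hypothetical numbers, for illustration only.
train = training_cost(gpu_count=4, hours_per_run=12, runs=5, gpu_hour_price=2.5)
serve = serving_cost(instances=2, instance_hour_price=0.5, months=6)

print(f"training (one-off): {train}")    # training (one-off): 600.0
print(f"serving (6 months): {serve}")    # serving (6 months): 4380.0
```

Even with cheaper hardware, the production line can exceed the training line simply because it runs for months, which is why the two are worth estimating separately.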

Integration and deployment:

ML solutions often need to be integrated with other systems used in the project. This requires extra effort to develop interfaces and maintain compatibility.

Estimate the work necessary to integrate the model into the final product or service.
Assess the deployment and monitoring stages for ML in production.

Security and ethics:

Consider the security and confidentiality of the data. Estimate the amount of work necessary to keep the data secure.
Review the risks related to the ethical use of ML tools and estimate the work required to mitigate them.

Team and resources:

Assess the required number of specialists and their qualifications (engineers, data scientists, data engineers, managers, etc.).

Conclusion

Evaluating the amount of work for ML/AI projects requires that the project manager has a clear understanding of the project's goals and of the tasks that need to be solved with machine learning technologies. These goals should be backed by specific KPIs and metrics. The quality, amount, and accessibility of data play a key role in successfully delivering such projects and require careful planning of data collection and processing. When assessing the technical requirements for the models' infrastructure and architecture, it's crucial to consider the requirements for the production environment separately from those for training the models. Do not ignore regulatory and legal requirements or data security. The project's success also depends on a proper estimate of the team, the resources, and the effort needed to support the models. The final evaluation should include all these factors to avoid possible issues, financial losses, and the project's closure.