Gregory (Scotland Yard detective): Is there any other point to which you would wish to draw my attention?
Holmes: To the curious incident of the dog in the night-time.
Gregory: The dog did nothing in the night-time.
Holmes: That was the curious incident.
How does this expression relate to IT, and when is it used? We use it when something deviates from the usual situation. Normally, it signals a problem: the dogs are not barking although they should be. In IT terms, there are no alerts about issues even though alerts are normally present. In other words, we use this expression for something suspicious, something that normally isn’t there.
- Suddenly, you stop getting notifications about errors. So, your alerting tools, “the dogs”, stopped barking. But it doesn’t mean that suddenly, everything is fixed and runs smoothly. It rather means that something has happened. So, look for an issue: a bug, an error, or even a failure of a monitoring or alerting system.
- Users stopped complaining about delays, app crashes, bugs? It doesn’t mean that the product has suddenly started working perfectly, especially if that wasn’t the case earlier. More likely, the complaints simply do not reach you. If everything is suspiciously calm, look for a failure.
- The traffic has increased suddenly. It means that something is going on. And it doesn’t always mean that something good is going on. So, find out what is behind the traffic growth and act accordingly.
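To make the idea of "silence as a signal" concrete, here is a minimal sketch of a check that treats a sudden absence of alerts as an anomaly in its own right. The function name, the hourly counts, and the thresholds are illustrative assumptions, not part of any particular tool.

```python
from statistics import mean


def is_suspiciously_quiet(hourly_alert_counts, recent_hours=3, min_baseline=1.0):
    """Return True when alerts have stopped although they normally arrive.

    hourly_alert_counts: oldest-to-newest alert counts per hour (assumed input).
    recent_hours and min_baseline are illustrative thresholds.
    """
    if len(hourly_alert_counts) <= recent_hours:
        return False  # not enough history to judge

    history = hourly_alert_counts[:-recent_hours]
    recent = hourly_alert_counts[-recent_hours:]

    # Silence is only suspicious if the system used to be noisy.
    baseline = mean(history)
    return baseline >= min_baseline and sum(recent) == 0


if __name__ == "__main__":
    counts = [4, 6, 5, 7, 3, 5, 0, 0, 0]  # made-up numbers
    print(is_suspiciously_quiet(counts))  # True
```

A system that was always quiet does not trigger the check. Only a drop from a noisy baseline to zero does, which is exactly the "dog that did not bark" case.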
Some examples of what might cause the dogs to stop barking
It is time to have a look at what might cause the dogs to stop barking. Or, in other words, to have a look at what might go wrong.
We use a set of alerting tools to stay informed about any failure.
So, if there are no alerts, it most likely means that one of the alerting tools has stopped working or is misconfigured. Examples? Here we go.
Users are 100% happy with your new product
Yep, we rely on Sentry for alerts about errors that users run into. Let me elaborate on this tool though. It will help me explain how many things depend on it.
Even if your code seems to be clean, even if you did your best to cover it all in tests, testing everything is impossible.
There are PLENTY of new and old browsers across multiple devices: smartphones, PCs, tablets, game consoles, IoT devices, smart watches… Yep, you can run tests on many of them, but I doubt you can test everything. Moreover, your way of using your app is your way. Users might have a completely different approach to using it. They will for sure do something you would never expect. Bugs might appear just because a sequence of tasks is handled in a way that seems illogical to you. But it is logical for another user. And this sequence might cause an error or make the entire app fail.
Do all users report bugs? If they are willing to do so, can they describe what bothers them? In most cases, to describe a bug or an error, they would need at least some technical knowledge. Well, try to guess what % of users have it. So, the majority of errors will either go unreported, or users will limit their reports to “This product doesn’t work”. Then you have to guess why they think the product doesn’t work and what they did to make it crash.
Here is where Sentry and similar alerting tools come in handy. Such a tool eliminates the need to rely on customers to assess and test our products. It collects all errors in real time. Then, depending on the settings, it sends alerts about errors to your Slack chat or email.
So, Sentry alerts us about errors that users have encountered and provides us with information about the errors. By the way, as I have mentioned, it is normal that users find bugs. It is not because your product is bad. It is because every person approaches the app in a different way. So, alerts are inevitable, especially if you have just pushed a new version of the app to production.
Now, imagine that you have just released the product, and it works perfectly. You know that it works perfectly because there are no alerts - Sentry is silent.
Don’t hurry to celebrate though. Do you remember I said that errors are inevitable just because users handle some sequences of tasks in a way that you couldn’t predict? So, if you don’t get any alerts, it might be a case of your dogs not barking.
Check whether your alerting system is integrated properly. Make sure you have inserted the code into your application correctly. If you are frantically checking Slack for alerts, check your email, too - you might have set up Sentry to send alerts to your email, and vice versa.
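One way to run this check is to push a deliberate test alert (a "canary") through the pipeline and see which destinations actually receive it. The sketch below uses in-memory stand-ins for Slack and email, so every class and function name here is hypothetical. It does not call Sentry or any real service.

```python
import uuid


class InMemoryChannel:
    """Stand-in for a real destination such as Slack or email (hypothetical)."""

    def __init__(self, name, connected=True):
        self.name = name
        self.connected = connected
        self.received = []

    def deliver(self, message):
        # A misconfigured channel silently drops the message without raising.
        if self.connected:
            self.received.append(message)


def send_canary(channels):
    """Send a uniquely tagged test alert; return names of channels that missed it."""
    marker = f"canary-{uuid.uuid4().hex}"
    for channel in channels:
        channel.deliver(f"Test alert {marker}")
    return [c.name for c in channels if not any(marker in m for m in c.received)]


if __name__ == "__main__":
    slack = InMemoryChannel("slack", connected=False)  # simulated broken integration
    email = InMemoryChannel("email")
    print(send_canary([slack, email]))  # ['slack']
```

With a real setup, the same idea applies: trigger a known test error right after the release and confirm that it shows up everywhere you expect alerts to arrive.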
In other words, if you don’t get any alerts on errors when the product is released, look for an issue.
Changes are implemented, and not a single notification is received
One of the monitoring tools we use is Datadog. It gives us alerts about issues in infrastructure, applications, and services. We love the Datadog alerts because they are very specific, actionable, and contextual. They enable us to minimize service downtime. Also, alerts provide enough information to prioritize the issues.
But what would you do if your team implemented changes to, say, a website, and you didn’t get a single alert? Well, considering that the tool issues alerts on bugs, website changes, and website performance, the silence isn’t a good sign here.
This is just one example. Along with the website metrics, the tool monitors many other things: backend, frontend, business analytics, etc. For example, with it, you can monitor how your app performs for users, trace API requests from end to end, and much more. Datadog doesn’t just alert on bugs and errors, it also provides a lot of monitoring metrics. So, if you don’t get any notifications from the tool, it means that something doesn’t work. Check the settings, plugins, etc. This tool should send notifications. If it doesn’t, your dogs are not barking.
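The "changes deployed, nothing heard since" situation can be sketched as a simple dead-man's-switch check. This is a minimal illustration under assumptions: the function name, the 30-minute grace period, and the list of notification timestamps are made up, and nothing here uses the Datadog API.

```python
from datetime import datetime, timedelta


def silent_since_deploy(deployed_at, notification_times, now,
                        grace=timedelta(minutes=30)):
    """Return True if no notification has arrived since the deployment.

    grace is an assumed waiting period before silence counts as suspicious.
    """
    if now < deployed_at + grace:
        return False  # too early to draw conclusions

    # Any notification at or after the deployment means the dogs still bark.
    return not any(t >= deployed_at for t in notification_times)


if __name__ == "__main__":
    deployed_at = datetime(2024, 1, 1, 12, 0)
    now = datetime(2024, 1, 1, 13, 0)
    seen = [datetime(2024, 1, 1, 9, 0)]  # last notification was before the deploy
    print(silent_since_deploy(deployed_at, seen, now))  # True
```

A check like this has to run somewhere outside the monitoring tool it watches. Otherwise the watchdog goes silent together with the dog.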
A new feature is introduced, and website traffic hasn’t changed
Your team has introduced a new website feature. What effect do you normally expect from it? Some change in website traffic: a new feature might attract more visitors, it might make the existing visitors use the website more, or - and it can happen, too - it might drive visitors away. Whatever the scenario is, it influences the website traffic. If a change is implemented, users will notice it, and they will want to check it out. In our case, we are talking about a new feature. So, a change in website traffic is inevitable.
What does Google Analytics (or whatever you use to monitor website traffic) say? Has it noticed any changes in:
- The number of sessions and their length
- The number of users who visited the website?
If there are no changes, your dogs are not barking again. Something is wrong either with the service settings or with the service itself.
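As a rough illustration of this comparison, the sketch below takes traffic metrics from before and after the release and flags the case where nothing moved at all. The metric names, the sample numbers, and the 2% threshold are assumptions for the example. The values would come from whatever analytics export you use.

```python
def relative_change(before, after):
    """Relative change between two measurements, e.g. 0.05 for +5%."""
    if before == 0:
        raise ValueError("baseline must be non-zero")
    return (after - before) / before


def traffic_looks_frozen(before, after, min_expected_change=0.02):
    """Return True when every metric changed by less than the threshold.

    min_expected_change is an assumed threshold (2%); tune it to your traffic.
    """
    return all(
        abs(relative_change(before[name], after[name])) < min_expected_change
        for name in before
    )


if __name__ == "__main__":
    # Made-up weekly numbers before and after the feature release.
    before = {"sessions": 12000, "users": 8000, "avg_session_seconds": 95}
    after = {"sessions": 12050, "users": 8010, "avg_session_seconds": 95}
    print(traffic_looks_frozen(before, after))  # True
```

A flat result does not prove that tracking is broken, but it is a good reason to check whether the tracking code still fires on the pages with the new feature.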
Some words to wrap up
Conclusions? We always check what is going on when we are alerted. But when we suddenly stop getting alerts, it doesn’t mean that everything works smoothly. In most cases, it means that something has broken down. Find the failure as soon as possible and fix it!