[Testing Web Applications with Speech and Image Recognition – Implementing Fake Audio and Video Inputs]

Our topic is not an everyday automation case, but it is becoming more common as speech and image recognition capabilities grow. You can see it in various apps: YouTube, for instance, generates closed captions on the fly, and that is only one of the best-known examples. ML models for speech and image recognition are also actively used in medicine and other fields.

As a result, QA specialists all over the world increasingly need to test complex ML apps. This write-up sheds some light on the tools currently available for testing web applications that process speech and video, specifically the browser options that let us pass audio and video files as live microphone or webcam input.

Faking audio input

Before we begin, note the technology stack we are going to use: Python, PyTest, and Playwright. Python dependencies can be installed with the command pip install -r requirements.txt, using the following requirements.txt file:

certifi==2024.2.2
charset-normalizer==3.3.2
colorama==0.4.6
greenlet==3.0.3
idna==3.6
iniconfig==2.0.0
packaging==24.0
playwright==1.42.0
pluggy==1.4.0
pyee==11.0.1
pytest==8.1.1
pytest-base-url==2.1.0
pytest-playwright==0.4.4
python-slugify==8.0.4
requests==2.31.0
text-unidecode==1.3
typing_extensions==4.10.0
urllib3==2.2.1

You may be surprised to learn that the actual magic happens at a deeper level, inside the browser itself. All you need to do is pass particular command-line arguments to the browser when your test launches it (here we use Chrome, via Playwright's Chromium launcher).

Let’s look at the code snippet below and go through the details:

from playwright.sync_api import Playwright
import time

def test_web_mic(playwright: Playwright):
    browser = playwright.chromium.launch(headless=False,
                                         args=[
                                             # use Chrome's fake media streams
                                             "--use-fake-device-for-media-stream",
                                             # bypasses Chrome's cam/mic permissions dialog
                                             "--use-fake-ui-for-media-stream",
                                             # pass in your own custom media
                                             "--use-file-for-fake-audio-capture=C:\\filepath\\audio.wav"]
                                         )
    context = browser.new_context()
    page = context.new_page()
    page.goto("https://mictests.com/")
    page.get_by_role("button", name="Test my mic").click()
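    # let the fake microphone feed the audio file to the page for 40 seconds
    time.sleep(40)
    page.get_by_role("button", name="Stop microphone").click()
    # opens the Playwright Inspector so the result can be checked manually;
    # remove this call for unattended runs
    page.pause()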

    context.close()
    browser.close()

The first argument, "--use-fake-device-for-media-stream", tells the browser to use a fake emulated input device instead of a physical microphone.
The second argument, "--use-fake-ui-for-media-stream", skips the permission dialog: the browser grants access to the microphone/webcam automatically instead of asking.

The final argument, "--use-file-for-fake-audio-capture=C:\\filepath\\audio.wav", specifies the file that will be played through the fake microphone device.
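
To avoid repeating the same three strings in every test, the arguments described above can be assembled by a small helper. This is only a sketch: fake_media_args is a hypothetical function name, not part of Playwright, and the video flag it accepts is the one covered in the video section of this article.

```python
from typing import List, Optional


def fake_media_args(audio_path: Optional[str] = None,
                    video_path: Optional[str] = None) -> List[str]:
    """Build Chrome launch arguments for fake microphone/webcam input.

    Hypothetical helper for illustration; not a Playwright API.
    """
    args = [
        # replace physical devices with emulated ones
        "--use-fake-device-for-media-stream",
        # skip the camera/microphone permission dialog
        "--use-fake-ui-for-media-stream",
    ]
    if audio_path is not None:
        # file played through the fake microphone
        args.append(f"--use-file-for-fake-audio-capture={audio_path}")
    if video_path is not None:
        # file played through the fake webcam
        args.append(f"--use-file-for-fake-video-capture={video_path}")
    return args
```

The result can be passed straight to the launcher, for example playwright.chromium.launch(headless=False, args=fake_media_args(audio_path="C:\\filepath\\audio.wav")).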

Audio input limitations

Please note that supported file formats and sizes can vary from browser to browser, so check the documentation of your target browser to make sure your audio file is valid. We used a WAV file with 1 channel and a 48 kHz sample rate.
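
Before handing a file to the browser, it can be worth checking that it really has the properties mentioned above. The sketch below uses Python's standard wave module, which reads uncompressed PCM WAV files only; the function names are hypothetical, and the mono/48 kHz check simply mirrors the file we used rather than a documented browser requirement.

```python
import wave


def describe_wav(path: str) -> dict:
    """Return basic properties of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_rate": wav.getframerate(),
            "sample_width_bytes": wav.getsampwidth(),
        }


def matches_article_format(path: str) -> bool:
    """True if the file is mono at 48 kHz, the format used in this article."""
    info = describe_wav(path)
    return info["channels"] == 1 and info["sample_rate"] == 48000
```

Calling matches_article_format on a fixture at the start of a test run gives an early, readable failure instead of a silent microphone in the browser.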

The most important thing to remember is that the audio file used for fake audio capture will play in a loop when you run the browser. Consequently, in each individual case, you will need to consider how to organize interaction with the microphone in your tests so that the beginning of the audio file is synchronized with the start of microphone usage.
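
One way to reason about the loop is to know how long a single pass of the file lasts, so that a test can stop recording before the audio starts over. A minimal sketch, assuming a PCM WAV file and using a hypothetical helper name:

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Length of one pass through the WAV file, in seconds."""
    with wave.open(path, "rb") as wav:
        # total frames divided by frames per second
        return wav.getnframes() / wav.getframerate()
```

In a test, the fixed sleep could then be replaced by a wait derived from the file itself, for example page.wait_for_timeout(wav_duration_seconds(path) * 1000). This does not solve synchronization on its own, because the browser may begin playing the file before the page starts listening.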

If you want the audio sample to play only once, append %noloop to the end of the file path in the --use-file-for-fake-audio-capture argument.
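
As an illustration of where the %noloop suffix goes, here is a small sketch that builds the argument string. The function name audio_capture_arg and its loop parameter are hypothetical and exist only for this example.

```python
def audio_capture_arg(path: str, loop: bool = True) -> str:
    """Build the fake audio capture argument, optionally without looping."""
    # %noloop is appended directly to the file path, with no separator
    suffix = "" if loop else "%noloop"
    return f"--use-file-for-fake-audio-capture={path}{suffix}"
```

With loop=False and the path from the snippet above, the resulting argument ends in audio.wav%noloop.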

Faking video input

Faking the video input is very similar to faking the audio, with slightly different settings:

from playwright.sync_api import Playwright
import time


def test_web_cam(playwright: Playwright):
    browser = playwright.chromium.launch(headless=False,
                                         args=[
                                             # use Chrome's fake media streams
                                             "--use-fake-device-for-media-stream",
                                             # bypasses Chrome's cam/mic permissions dialog
                                             "--use-fake-ui-for-media-stream",
                                             # pass in your own custom media
                                             "--use-file-for-fake-video-capture=C:\\filepath\\video.y4m"]
                                         )
    context = browser.new_context()
    page = context.new_page()
    page.goto("https://webcamtests.com/")
    page.get_by_role("button", name="Test my cam").click()
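    # let the fake webcam feed the video file to the page for 10 seconds
    time.sleep(10)
    page.get_by_role("button", name="Stop webcam").click()
    # opens the Playwright Inspector so the result can be checked manually;
    # remove this call for unattended runs
    page.pause()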

    context.close()
    browser.close()

Instead of the “--use-file-for-fake-audio-capture” argument, we use “--use-file-for-fake-video-capture” and pass the relevant file path as the value for it. That's it.

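Chrome's fake capture device does not accept arbitrary video formats: the file must be a Y4M (.y4m) file, as used here, or an MJPEG (.mjpeg) file. Common formats such as MP4 have to be converted first.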
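
A Y4M file begins with a plain-text header line that starts with YUV4MPEG2 and lists the frame width, height, and frame rate, so a fixture can be sanity-checked before the browser is launched. The sketch below reads only those three fields; read_y4m_header is a hypothetical helper name, and the parsing is deliberately minimal.

```python
def read_y4m_header(path: str) -> dict:
    """Parse width, height and frame rate from a Y4M stream header."""
    with open(path, "rb") as f:
        # the stream header is a single ASCII line at the start of the file
        header = f.readline(256).decode("ascii", errors="replace").strip()
    fields = header.split(" ")
    if fields[0] != "YUV4MPEG2":
        raise ValueError(f"{path} does not look like a Y4M file")
    info = {}
    for field in fields[1:]:
        if not field:
            continue
        tag, value = field[0], field[1:]
        if tag == "W":
            info["width"] = int(value)
        elif tag == "H":
            info["height"] = int(value)
        elif tag == "F":
            # frame rate is stored as a ratio, e.g. 30:1
            numerator, denominator = value.split(":")
            info["fps"] = int(numerator) / int(denominator)
    return info
```

A ValueError here points to a file in the wrong container, which is easier to diagnose than a blank webcam preview in the page under test.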

Conclusion

Faking microphone and webcam input can be a useful addition to a QA toolset and applies to a wide range of web applications. The code snippets in this article can serve as a foundation for your own tests, so we encourage you to experiment with them. Happy testing!