Audio Recognition with Embedded Machine Learning

How do Harry Potter, TinyML and an Arduino fit together, you may wonder? Follow along and see how to become a wizard using embedded machine learning.

Recently I became aware of something called tiny machine learning, or TinyML. TinyML is a recent approach for embedded systems that combines deep learning with tiny devices, which opens up a lot of potential. Traditional machine learning often runs on powerful hardware somewhere in the cloud and therefore requires a good amount of resources.

With TinyML it is now possible to run machine learning models on the edge, on devices like the Arduino Nano 33 BLE, which fits almost anywhere and only draws about 20 mA.

The Arduino Nano 33 BLE has an integrated 3-axis acceleration sensor, a microphone and a Bluetooth interface.

The most common application of this technology is the recognition of wake words like „Hey Siri“ or „Alexa“ by well-known smart devices. While those devices are in standby mode, a low-power chip constantly listens for the wake word so that it can power up the device's main CPU.

In December 2019, Pete Warden and Daniel Situnayake brought this to a wider audience by publishing a book that explains machine learning with TensorFlow Lite on Arduino and ultra-low-power microcontrollers. Since then, things have become even easier with the launch of Edge Impulse: TinyML as a service that makes machine learning available to all developers. It provides the full TinyML pipeline, from collecting data through building a model to deploying it to the device. As a result, implementing motion and sound recognition or finding anomalies in sensor data has become more accessible.

So I decided to build a voice recognition model with Edge Impulse!

Because I am quite a big fan of Harry Potter, I thought it would be fun to build my own magic wand. If you are not into Harry Potter, let me tell you that „Lumos“ is the spell for creating light. „Nox“, on the other hand, makes the light go out.

Data acquisition

To get started with any machine learning project, you need some data in store to train your model later on. Because I could not find datasets of people saying „Lumos“ or „Nox“ online, I had to record the data myself. Thanks to Edge Impulse, this was not really a problem. After installing the Edge Impulse firmware on my Arduino and some additional tools on my desktop PC, I was able to record data and save it directly to my Edge Impulse project.

Data acquisition in Edge Impulse

So I took some time and recorded 12 minutes of data, divided into 3 labels. Because I want my model to recognize me saying a spell name, I recorded ~4 minutes each of myself saying „Lumos“ and „Nox“. Because the model also needs to recognize what silence or noise looks like, I recorded ~4 minutes of noise as well.

Building the model

After collecting a sufficient amount of audio data, it was time to build the machine learning model. You do this by adding predefined blocks to your „impulse pipeline“:

Impulse Design

First, I had to configure the time-series data block, which defines the window size and the window increase. The window size is the length of the frame into which your recorded audio is cut. This setting also defines how long the Arduino's microphone records for each classification after deployment. The window increase is the offset between the start of one window and the start of the next, so a single recording is split into several, possibly overlapping, windows. The lower the window increase, the more windows you get for training. 200 milliseconds worked pretty well for me.
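To make the effect of the window increase concrete, here is a minimal sketch in plain Python. The function name `window_starts`, the 3-second recording and the 1-second window are hypothetical values chosen for illustration, not my actual settings, and the sketch assumes that a trailing partial window is simply dropped, which a real tool may handle differently.

```python
from typing import List


def window_starts(sample_ms: int, window_ms: int, increase_ms: int) -> List[int]:
    """Return the start time (in ms) of every full window that fits in the sample."""
    if window_ms <= 0 or increase_ms <= 0:
        raise ValueError("window size and window increase must be positive")
    if sample_ms < window_ms:
        return []  # the sample is too short for even one full window
    # Slide the window forward by `increase_ms` until it no longer fits.
    return list(range(0, sample_ms - window_ms + 1, increase_ms))


# Hypothetical numbers: one 3-second recording cut into 1-second windows.
for increase_ms in (1000, 500, 200):
    n_windows = len(window_starts(3000, 1000, increase_ms))
    print(f"window increase {increase_ms} ms -> {n_windows} windows")
```

With these made-up numbers, shrinking the window increase from 1000 ms to 200 ms turns 3 windows into 11, which is why a lower window increase yields more training data from the same recording.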

After the data has been cut into parts, these window-sized samples are processed by an MFCC block. This is a well-known processing step that generates features from audio using Mel Frequency Cepstral Coefficients. The following picture shows the output for a recorded sample:

DSP result of noise

The MFCC processing converts the raw audio signal into a matrix of frequency buckets, from which features can be extracted to train the model.
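The "Mel" part of MFCC refers to the mel scale, which spaces frequency buckets roughly the way human hearing does: narrow at low frequencies, wide at high ones. The sketch below only illustrates that spacing, using one common variant of the mel formula. It is not the Edge Impulse MFCC block, a full MFCC additionally involves an FFT, a logarithm and a discrete cosine transform, and the 8 filters and the 0 to 8000 Hz range are assumptions made up for the example.

```python
import math
from typing import List


def hz_to_mel(freq_hz: float) -> float:
    """Convert a frequency in Hz to mels (a common variant of the formula)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(n_filters: int, f_min_hz: float, f_max_hz: float) -> List[float]:
    """Center frequencies (Hz) of n_filters bands spaced evenly on the mel scale."""
    mel_min, mel_max = hz_to_mel(f_min_hz), hz_to_mel(f_max_hz)
    step = (mel_max - mel_min) / (n_filters + 1)
    return [mel_to_hz(mel_min + step * (i + 1)) for i in range(n_filters)]


# Hypothetical settings: 8 filters between 0 Hz and 8 kHz.
centers = mel_center_frequencies(8, 0.0, 8000.0)
print([round(c) for c in centers])
```

The printed center frequencies sit close together at the low end and spread out towards the high end, so the lower part of the spectrum, where much of the speech information lies, gets finer resolution.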

Edge Impulse offers a pretty cool view to observe all processed features of your recorded data. Mine looked like this:

Feature explorer view – lumos, nox and noise

You can see that the processed features of each label are more or less separated, which is quite promising for the model.

Finally, I added a Nearest Neighbour classifier as the learning block in the „impulse pipeline“. After adjusting some training parameters, I was pretty much done building my machine learning model. It is time to let the model train!
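For intuition, this is roughly what a nearest-neighbour classifier does: it assigns a new sample the label of the closest training sample in feature space. The sketch below is a generic illustration and not the Edge Impulse learning block. The function name `nearest_neighbour` and the two-dimensional feature vectors are made up, while real MFCC features have many more dimensions.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (feature vector, label)


def nearest_neighbour(train: List[Sample], features: Sequence[float]) -> str:
    """Return the label of the training sample closest to `features`."""
    if not train:
        raise ValueError("training set must not be empty")
    # Euclidean distance between the new sample and every training sample.
    _, label = min(train, key=lambda sample: math.dist(sample[0], features))
    return label


# Made-up 2D features, only to show the idea of separated clusters.
training_set: List[Sample] = [
    ([0.9, 0.1], "lumos"),
    ([0.8, 0.2], "lumos"),
    ([0.1, 0.9], "nox"),
    ([0.2, 0.8], "nox"),
    ([0.0, 0.0], "noise"),
]

print(nearest_neighbour(training_set, [0.7, 0.3]))
print(nearest_neighbour(training_set, [0.05, 0.1]))
```

This also shows why well-separated clusters in the feature explorer are a good sign: the further apart the labels are, the less likely a new sample lands next to the wrong neighbour.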

Training the model

Training performance

Sweet! Those results are looking pretty good. I was aware of the risk of overfitting to my data, but some live classification reassured me that the model works reasonably well. In this article, I am not going over the process of live classification and retraining your model. I just want to mention that this also works quite well.


There are several ways to deploy your generated model to your device. One of them is creating an Arduino library in Edge Impulse and importing it into the Arduino IDE. This library comes with examples for testing your model, which makes it extremely easy to adjust the code for your specific use case.


As I said at the beginning, I wanted to light something up when saying „Lumos“ and turn the light off when saying „Nox“. I also wanted the microphone to record only while a button is pressed. This is why I added some code and did some soldering on my Arduino. I really wanted this project to be fully embedded, so a battery pack is attached to the back of my „wand“. So here it is:

Poor man's wand

Okay, it may not be as pretty and powerful as Dumbledore's Elder Wand, but it is quite functional. 🙂
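The behaviour described above (listen only while the button is pressed, switch the light on for „Lumos“ and off for „Nox“) boils down to a small decision rule. The following is a sketch of that logic in Python, not the code running on my wand. On the Arduino it would be C++ wrapped around the generated library, and the function name `update_light`, the shape of the `scores` dictionary and the 0.8 confidence threshold are all assumptions for illustration.

```python
from typing import Dict


def update_light(light_on: bool, button_pressed: bool,
                 scores: Dict[str, float], threshold: float = 0.8) -> bool:
    """Return the new light state after one classification result.

    `scores` maps each label ("lumos", "nox", "noise") to the model's
    confidence for the most recent audio window (hypothetical format).
    """
    if not button_pressed or not scores:
        return light_on  # nothing was recorded, keep the current state
    label = max(scores, key=scores.get)  # most likely label
    if scores[label] < threshold:
        return light_on  # the model is not confident enough to act
    if label == "lumos":
        return True
    if label == "nox":
        return False
    return light_on  # "noise" or any other label changes nothing


# Example with made-up confidence values.
state = False
state = update_light(state, True, {"lumos": 0.93, "nox": 0.04, "noise": 0.03})
print("light on" if state else "light off")
```

Requiring a minimum confidence before acting is one simple way to keep the light from flickering on uncertain predictions. The right threshold depends on the trained model and would need tuning.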

Let me show you:

Poor man's wand in action

I am sure there are easier ways to detect words or speech in general, but this project was just about trying out TinyML. Just think about how easily and cheaply you could implement predictive maintenance with these tools. I am really excited to see what people are going to build with this technology.

I am a student from Berlin with an affinity for computer technology and a bachelor's degree in electrical engineering. In 2021, I will finish my master's degree in Computer Engineering. Developing hardware and software has been, and still is, extremely fulfilling for me. I am interested in pretty much the whole stack, from embedded to web and everything in between. This is why I do side projects from time to time, which I want to share with everybody. Thanks for coming by :)