Deep Learning, Machine Vision on MCUs - Introduction

Carlos Gustavo Merolla
Dec 6, 2020

Updated: Mar 22, 2021

Introduction

If you have explored the WildEdge site and the description of my main project, Big Cat Brother, you will have noticed that I use a mixture of old school Machine Vision techniques and modern Deep Learning techniques, including tinyML. Before we get into the details of Deep Learning, I would like to explain what is, in my opinion, the biggest difference between the two approaches.

Old School Machine Vision techniques vs. Deep Learning based Machine Vision

Machine Vision refers to the ability of a computer to recognize what is inside an image; Optical Character Recognition (OCR), for instance, recognizes letters. These techniques have been around for a long time.

Some old school Machine Vision techniques require you to know what you are looking for. For instance, if you want to take biometric measurements of a face, you focus on the shadows and lights present in the picture. You look for them in code by checking the color of one pixel and comparing it against the adjacent pixels, a kind of primitive kernel similar to the one displayed in the picture, and you end up with an arbitrary number of expansions, or out maps, according to the features you are looking for. To make it short, with old school machine vision you have one input picture and the logic to search for features. For instance, once you detect that one pixel is lighter than its neighbors you can put a mark to record the finding, and then review all the out maps to detect the most illuminated parts of the picture and draw lines and angles in order to apply geometry functions. The focus is on logic, on code. In some scenarios I find these techniques more appropriate, especially if you want to automatically notify Rangers (always in a bad mood :) ) about some events; I say they are more reliable because you know exactly why your code decided to tag an image the way it did. No data samples, a lot of logic and ... math ... always math, you cannot escape math!
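
To make the pixel-versus-neighbors idea more concrete, here is a minimal Python sketch. It is not the code used in Big Cat Brother; the function names, the 4-neighbor comparison and the `margin` threshold are assumptions made only for illustration. It builds one out map that marks every pixel clearly lighter than its adjacent pixels and then lists the marks.

```python
def brighter_than_neighbors(image, margin=10):
    """Return an out map marking pixels lighter than their 4 adjacent pixels.

    image:  list of rows, each a list of grayscale values (0-255).
    margin: hypothetical threshold, how much lighter a pixel must be.
    """
    height, width = len(image), len(image[0])
    out_map = [[0] * width for _ in range(height)]
    # Border pixels are skipped because they lack some neighbors.
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            center = image[y][x]
            neighbors = (image[y - 1][x], image[y + 1][x],
                         image[y][x - 1], image[y][x + 1])
            if all(center - n >= margin for n in neighbors):
                out_map[y][x] = 1  # put a mark to record the finding
    return out_map


def collect_marks(out_map):
    """List the (x, y) coordinates of every marked pixel."""
    return [(x, y) for y, row in enumerate(out_map)
            for x, value in enumerate(row) if value]


# A tiny 4x4 "picture" with a single bright spot.
sample = [
    [10, 10, 10, 10],
    [10, 200, 10, 10],
    [10, 10, 10, 10],
    [10, 10, 10, 10],
]
marks = collect_marks(brighter_than_neighbors(sample))
print(marks)  # [(1, 1)]
```

Every decision in this sketch can be traced back to an explicit comparison, which is the kind of explainability described above.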

Modern Deep Learning and Convolutional Neural Networks, CNNs for short, do the same job; however, you won't get into those details yourself. Deep Learning, and Machine Learning in general, reverses the paradigm of Machine Vision. With Deep Learning you provide sample pictures tagged with the correct identifications and you let the Neural Network work out (using math and statistical functions) what makes a picture be something; in other words, the CNN provides the rules for determining what is what. In my opinion, this can get somewhat obscure because everything depends on the quality of the data you use for training the networks. As you can see, there is little room for code here: you provide the data, the desired results and the NN model (or use a standard one), and the training process provides the rules for classification, detection or whatever you want to do. This is why I find some obstacles to deploying devices that rely 100% on Deep Learning based AI: explaining why something was tagged the way it was is somewhat problematic. However, it is an effective method and fast to deploy; sometimes I refer to machine learning as the lazy arm of software engineering! Jaguars are lazy, so, why not.

The goals of these posts

As I always write, WildEdge is not meant to be a collection of random tutorials nor of specialized tutorials on anything; in fact, I tend to write as simply as possible. The idea behind WildEdge is to share some knowledge that was useful to me when developing devices to protect wildlife and, I don't have to say it, Jaguars to be specific!

So, these tutorials (all the ones you find in the blog section) want to give you tools to build your own tech4nature solutions. In addition, I am planning to release some non-critical components as open source (e.g. the Smart Connected Collars). When I say non-critical I mean that I won't expose here (or anywhere) anything that can affect the integrity of my project (e.g. the Mesh comm protocol) or ... things that could put Jaguars at risk. I want to give you the foundations for your own things, good things.

Deep Learning Basics

I won't try to be too technically correct. I'll use the term Deep Learning, which is associated with CNNs and multi-layer network models; however, in some scenarios regressions or even fully connected networks can do a good job, especially when the images are generated by the machine itself, as in data plotting. This is important because this kind of image doesn't contain the noise you find in outdoor pictures, such as light differences, distortions caused by moving animals, and a long list of etceteras.

To set the right expectations: in this series of posts we will be defining CNNs (and fully connected nets too) and deploying them to the Sony Spresense in order to understand what is contained in a picture, be it a picture taken with the camera or a picture generated by the board itself as the representation of other sensor data.

The first use case: Applying tinyML for motion pattern detection

tinyML means models optimized to be used on microcontrollers. You have the same work pipelines as in regular ML; however, the resulting models need to be tailored to run on machines with limited resources, such as the amount of available memory (critical) and processor power (less critical).

Most people relate Computer Vision techniques to the identification of objects in a picture, be it a person or a cat. However, almost any data that can be represented in a visual way (sound, motion, etc.) can fall under the domain of computer vision, and this use case is just one example of that.

Big Cat Brother, BCB for short, now includes collars for tracking domestic animals as an attempt to avoid conflicts between Jaguars and humans; if you want to know the details and goals, please refer to the details here. Some of the features of the collar are so general that they can be used for almost anything, and that is the motivation behind my plans to release an open source version and explain the foundations here.

Besides the features based on GPS (some information about GPS and GNSS in general here), the basic idea is to put a collar on cows and detect what they are doing and where: whether they are calm, running, or moving in violent ways. For that purpose, the device (based on a Sony Spresense) has a ROHM accelerometer sensor add-on (and also a magnetometer to be used in tandem with the GPS).

Those sensors provide information about the G forces being applied on three axes: X, Y and Z. The readings are simply floating point numbers ranging between 0.0 and … the maximum force being applied. This information will be consolidated as circular patterns corresponding to each measurement, and those patterns will be used as the "training and validation datasets" for the NN model to specify the Motion Patterns.
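
The exact way the readings are turned into circular patterns is covered with the implementation, so the following Python sketch is only one plausible mapping, written under my own assumptions: each sample is placed at an angle given by its position in the time window and at a radius proportional to the magnitude of the (X, Y, Z) reading. The function name, the 32x32 image size and the `max_g` scale are all hypothetical.

```python
import math


def circular_pattern(samples, size=32, max_g=4.0):
    """Draw acceleration samples as a black and white polar pattern.

    samples: list of (x, y, z) readings in G.
    size:    image width and height in pixels (assumed value).
    max_g:   magnitude mapped to the outer edge of the image (assumed value).
    """
    image = [[0] * size for _ in range(size)]
    center = (size - 1) / 2.0
    for i, (x, y, z) in enumerate(samples):
        magnitude = math.sqrt(x * x + y * y + z * z)
        # Clamp so that very strong readings stay inside the image.
        radius = min(magnitude / max_g, 1.0) * center
        # One full turn per window of samples.
        angle = 2.0 * math.pi * i / len(samples)
        col = int(round(center + radius * math.cos(angle)))
        row = int(round(center + radius * math.sin(angle)))
        image[row][col] = 1
    return image


# A steady 1 G reading draws a small, regular ring around the center.
calm = circular_pattern([(0.0, 0.0, 1.0)] * 16)
for row in calm:
    print("".join("#" if pixel else "." for pixel in row))
```

With a mapping like this, steady readings produce small rounded shapes, while sudden peaks push points toward the border of the image and break the ring.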

First consideration for deploying AI on edge devices

The term tinyML is not a buzzword, I might say. Microcontrollers are getting powerful; however, memory constraints are still important on edge devices. Taking that point into account, you have to try to optimize not only the Neural Networks but the data as well.

When you are in control of image generation, as in this case, you can freely choose the image size, quality, shape, and color scheme. Needless to say, bigger, more colorful images require more memory, and that can be a problem no matter how you tune your application or how many resources the board has. In some scenarios black and white pictures can do a good job; in others, grayscale or a relatively small color space can perform better, depending on the information you are trying to process.
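
To put rough numbers on that trade-off, here is a small Python calculation of the size of one uncompressed image buffer for three color schemes. The 28x28 resolution is an arbitrary example, not the size used in this project, and the figures assume tightly packed pixels; many frameworks store one byte or even one float per pixel, which makes the real footprint larger.

```python
def image_bytes(width, height, bits_per_pixel):
    """Bytes needed to hold one uncompressed, tightly packed image."""
    total_bits = width * height * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


# Hypothetical 28x28 input image in three color schemes.
for name, bpp in (("black and white", 1), ("grayscale", 8), ("RGB", 24)):
    print(name, image_bytes(28, 28, bpp), "bytes")
# black and white 98 bytes
# grayscale 784 bytes
# RGB 2352 bytes
```

The same picture costs 24 times more memory in RGB than in packed black and white, before the network has processed a single pixel.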

I won't ever publish the algorithm I use to detect Jaguar Spot patterns; good people are not the only ones interested in these posts. However, as an example, outdoor wild animal detection will run better using grayscale images, especially because the pictures are taken by a camera and sunlight could saturate black and white images, making the information fuzzy, no matter what auto balance the cameras on your edge devices have.

In this particular case, black and white circular patterns will do the job while keeping the memory requirements pretty low: the network I am going to show you takes less than 40KB on the Spresense (which has 1.5MB for the rest of your code and some built-in features such as the GPS, the audio capabilities in case you use them, etc.).

The goal of the demo is to distinguish between soft or normal motions and violent ones. A cow falling will generate patterns similar to the violent samples presented here, while a cow walking will generate more rounded shapes.

Even though it is not the focus of this site to talk about potential industrial uses of these technologies or techniques, with this technique you can detect almost anything, from whether a person is doing their exercises properly, to your tennis style, to whether some engine has unusual vibrations. The last scenario has a huge monetary impact: if you can detect that some part of a system is about to break, you will save the tons of money usually associated with a sudden failure in a production line.

The result of this demo can be seen in this video. In the next post, I will cover the implementation details.

Stay Tuned!
