Part 2: Creating a Simple Keras Model for Inference on Microcontrollers

Author: Marko Sagadin, student intern at IRNAS

Welcome to the second article about running machine learning algorithms on microcontrollers.
In the previous article, we created and trained a simple Keras model that was able to classify 3 different classes from the CIFAR-10 dataset. We looked at how to prepare that model for running on a microcontroller, quantized it, and saved it to disk as a C file.

It is now time to look at how to set up the development environment and how to run inference on an actual microcontroller.

To follow along, we will need two things. First, we need to clone the MicroML project, which provides us with the example code.

Second, we will need some kind of ARM development board. The example code uses ST’s Nucleo-F767ZI board. For the demonstration we will stick with that board, as the example code runs on it as is, with no modifications needed.

If you do not have that specific board at hand, there is no need to worry! MicroML was created with customization in mind, so we will show you how to modify the code to make it run on the development board of your choice.

One thing before we start: it is recommended to read the prerequisites section of MicroML’s README.md, so you can see what you will need to get this project running on Windows or Linux.

1. Setup

We will start by cloning the MicroML repo recursively and checking out the keras_article branch.

git clone --recurse-submodules https://github.com/SkobecSlo/MicroML.git
cd MicroML
git checkout keras_article

NOTE: MicroML pulls in the whole TensorFlow library, which is around 2.5 GB in size, so it is perfectly normal for this step to take some time.

TensorFlow setup

Before we can even compile a simple example for our target, we need to run a hello_world example that will be executed on our host machine. This is needed because the makefiles written by the TensorFlow team pull in several necessary third-party repositories, which are missing by default. After compilation, these libraries can be found under tensorflow/lite/micro/tools/make/downloads. To get them, we first move inside the tensorflow folder and run make:

cd MicroML/tensorflow
sudo make -f tensorflow/lite/micro/tools/make/Makefile hello_world

This will call several scripts that download the libraries and then compile the source files for the hello_world example. It might take a while, but we have to do this step only once.

In short, the TensorFlow hello_world example feeds values into a model that approximates the sine function and continuously prints both the input and output values. You can see the program in action by running:

./tensorflow/lite/micro/tools/make/gen/linux_x86_64/bin/hello_world

libopencm3 setup

libopencm3 will provide us with libraries for our targets; it will also generate the necessary linker and startup files. Visit its GitHub page and make sure that your target is supported.

To generate all the necessary files, just run the following command inside MicroML’s root directory. Again, this is needed just once.

make -C libopencm3

2. Building the CIFAR-10 project

Before compiling the CIFAR-10 project we need to build the microlite.a and testlite.a files.

This is done with the following two commands, executed from the main directory:

make -C tensorflow/ -f ../archive_makefile PROJECT=cifar_stm32f7
make -C tensorflow/ -f ../archive_makefile PROJECT=cifar_stm32f7 test

These two commands compile all TensorFlow-specific source files with microcontroller-specific flags (by default for the Nucleo-F767ZI) and create the microlite.a and testlite.a files.
The former is used in the linking step when we build the code for our specific target, while the latter is used in the linking step when we build test code that runs on our development machine. Running tests on the development machine while we are building our application is preferable, as it lets us avoid unnecessary flashing cycles on the device.

To run the CIFAR-10 code on our development machine, we move into the projects/cifar_stm32f7 folder and run make test. The code will be compiled and executed, and you will see output similar to this:

$ make test 
SIZE test_build/test_firmware
text data bss dec hex filename
254341 2864 51296 308501 4b515 test_build/test_firmware
Testing TestInvoke
Input:
Dimension: 4
First Dimension: 1
Rows: 32
Columns: 32
Channels: 1
Input type: 9
Output:
Dimension size: 2
First Dimension: 1
Rows: 3
Output type: 1
Picture 0
[[0.000000 0.929688 0.070312]]
Inference time: 12.585000 ms
Picture 1
[[0.000000 0.000000 0.996094]]
Inference time: 7.100000 ms
Picture 2
[[0.000000 0.000000 0.996094]]
Inference time: 7.040000 ms
Picture 3
[[0.000000 0.000000 0.996094]]
Inference time: 6.987000 ms
Picture 4
[[0.996094 0.003906 0.000000]]
Inference time: 6.904000 ms
Picture 5
[[0.812500 0.187500 0.000000]]
Inference time: 7.114000 ms
1/1 tests passed
~~~ALL TESTS PASSED~~~

If you check the output values, they should match the ones from the previous article (see section 14, Testing model with Python interpreter). This means that our neural network is working as expected.

To run the same code on the Nucleo-F767ZI, we can run make flash and open a serial terminal to see the output. The output should be similar to the one above, but the inference times will be longer.

3. Preparing image files

Before jumping into the code explanation, it is important to say a word about how the picture files were changed. When we converted our Keras model to the tflite model, we specified that we want our inputs to be 8-bit signed integers. By default they would be floating-point, which is unnecessary for our purpose. When we fed our picture data to the xxd tool, the pictures were already in int8 format, as we cast them that way. The xxd tool by default puts the data into an unsigned char array, like so:

unsigned char picture0[] = {

It is necessary to change that to:

const signed char picture0[] = {

That way we won’t have problems later when we feed the picture data to the TensorFlow interpreter.

To make our picture arrays available to the main program, we also need to create a header file where we declare our picture arrays. This header file should be included in each picture.c file.
Likewise, the CIFAR model C file should also have its own header file.
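
A minimal sketch of what such a header could look like is shown below. The file name pictures/pictures.h and the picture0 name follow the snippets above; the actual declarations in MicroML may differ.

// pictures/pictures.h: declares the picture arrays defined in the picture*.c files
#ifndef PICTURES_H
#define PICTURES_H

#ifdef __cplusplus
extern "C" {
#endif

extern const signed char picture0[];
// ... one declaration per converted picture

#ifdef __cplusplus
}
#endif

#endif // PICTURES_H

The extern "C" guard lets the picture files, which are compiled as C, link cleanly against the C++ test code.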

4. Code explanation

Now we will explain step by step how the code works. This will be a walk-through of the code in cifar_test.cc, which is the source file we used to run the program on our development machine. We will not be covering the Nucleo-F767ZI version of the code inside main.cpp, as it is the same except for the hardware-specific sections.

We first include several TensorFlow-specific header files. These give us access to a small unit-test environment, the micro interpreter and the operator resolvers. We also include our CIFAR model and pictures. In model_settings.h we specify the size of our picture input.

#include "tensorflow/lite/c/common.h"
#include "tensorflow/lite/micro/kernels/all_ops_resolver.h"
#include "tensorflow/lite/micro/micro_error_reporter.h"
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/testing/micro_test.h"
#include "tensorflow/lite/schema/schema_generated.h"
#include "tensorflow/lite/version.h"#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
#include "tensorflow/lite/micro/kernels/micro_ops.h"#include <stdio.h>
#include <time.h>
#include "cifar_model.h"
#include "pictures/pictures.h"
#include "model_settings.h"

We need to define the size of our tensor arena, which takes some trial and error. The simplest method is to start with a large value and gradually decrease it until the interpreter can no longer allocate its tensors, then step back up. For our specific case, 50 KB of memory was enough.

constexpr int tensor_arena_size = 50 * 1024;
uint8_t tensor_arena[tensor_arena_size];

Below we can see how picture data is loaded as an input.

void load_data(const signed char * data, TfLiteTensor * input)
{
    for (int i = 0; i < input->bytes; ++i)
    {
        input->data.int8[i] = data[i];
    }
}

Because input->data is a union that contains all supported data types, we can reuse the same line to load a different type of data, floating-point for example:

input->data.f[i] = data[i];
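
As an illustration, a float version of the same helper could look like the sketch below. load_data_float is a hypothetical name, not part of MicroML; note that input->bytes counts bytes, so we divide by the element size.

void load_data_float(const float * data, TfLiteTensor * input)
{
    for (size_t i = 0; i < input->bytes / sizeof(float); ++i)
    {
        input->data.f[i] = data[i];
    }
}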

We wrap the program with TF_LITE_MICRO_TESTS_BEGIN and TF_LITE_MICRO_TESTS_END, which enable us to do assert tests (these are extremely useful when you are starting with new, unknown code).

The main program starts with the code below:

TF_LITE_MICRO_TESTS_BEGIN
TF_LITE_MICRO_TEST(TestInvoke) {
    // Set up logging.
    tflite::MicroErrorReporter micro_error_reporter;
    tflite::ErrorReporter* error_reporter = &micro_error_reporter;

    // Map the model into a usable data structure. This doesn't involve any
    // copying or parsing, it's a very lightweight operation.
    const tflite::Model* model = ::tflite::GetModel(cifar_quant_8bit_tflite);
    if (model->version() != TFLITE_SCHEMA_VERSION) {
        TF_LITE_REPORT_ERROR(error_reporter,
                             "Model provided is schema version %d not equal "
                             "to supported version %d.\n",
                             model->version(), TFLITE_SCHEMA_VERSION);
    }

We create an error_reporter instance, which we will use to print out any errors that occur.
We load our tflite model into the model instance and check whether the model's version matches the schema version. This is needed to make sure that our model is interpreted correctly.

Next, we need to prepare operator implementations for the different layers, such as the convolution and fully connected layers, and for activation functions like ReLU or softmax. This can be done either by using the AllOpsResolver, which pulls in all implementations, or by using the MicroOpResolver and specifying each operator individually. In our case, it makes sense to use the MicroOpResolver. This way we register only the operators that we need and therefore save space.

tflite::MicroOpResolver<6> micro_op_resolver;
micro_op_resolver.AddBuiltin(
    tflite::BuiltinOperator_CONV_2D,
    tflite::ops::micro::Register_CONV_2D(),
    3  // version number
);
micro_op_resolver.AddBuiltin(
    tflite::BuiltinOperator_MAX_POOL_2D,
    tflite::ops::micro::Register_MAX_POOL_2D(),
    2  // version number
);
micro_op_resolver.AddBuiltin(
    tflite::BuiltinOperator_RESHAPE,
    tflite::ops::micro::Register_RESHAPE()
);
micro_op_resolver.AddBuiltin(
    tflite::BuiltinOperator_FULLY_CONNECTED,
    tflite::ops::micro::Register_FULLY_CONNECTED(),
    4  // version number
);
micro_op_resolver.AddBuiltin(
    tflite::BuiltinOperator_SOFTMAX,
    tflite::ops::micro::Register_SOFTMAX(),
    2  // version number
);
micro_op_resolver.AddBuiltin(
    tflite::BuiltinOperator_DEQUANTIZE,
    tflite::ops::micro::Register_DEQUANTIZE(),
    2  // version number
);

When we are not sure which exact operators to use, we can create micro_op_resolver without calling the AddBuiltin method at all; when we run the program, the error reporter will tell us which operators we need to add and also which version of each operator.
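
For comparison, the AllOpsResolver route is a single line. This is a sketch based on the kernels/all_ops_resolver.h header included above; in this vintage of TensorFlow the class should live in the tflite::ops::micro namespace, while newer releases moved it to tflite::AllOpsResolver.

// Registers every built-in micro operator, at the cost of a larger binary.
tflite::ops::micro::AllOpsResolver resolver;

The resolver can then be passed to the MicroInterpreter in place of micro_op_resolver.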

After all the setup we can finally create our interpreter and allocate tensors for it:

// Build an interpreter to run the model with.
tflite::MicroInterpreter interpreter(model,
                                     micro_op_resolver,
                                     tensor_arena,
                                     tensor_arena_size,
                                     error_reporter);
interpreter.AllocateTensors();
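
One small check worth adding here (it is not part of the snippet above, but a sketch using the same test macros): verifying the return value of AllocateTensors immediately tells us if the tensor arena is too small.

TfLiteStatus allocate_status = interpreter.AllocateTensors();
TF_LITE_MICRO_EXPECT_EQ(kTfLiteOk, allocate_status);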

The next set of checks is important for verifying that our model is exactly as we expect it to be:

// Get information about the memory area to use for the model's input.
TfLiteTensor* input = interpreter.input(0);

// Make sure the input has the properties we expect.
TF_LITE_MICRO_EXPECT_NE(nullptr, input);
TF_LITE_MICRO_EXPECT_EQ(4, input->dims->size);
TF_LITE_MICRO_EXPECT_EQ(1, input->dims->data[0]);
TF_LITE_MICRO_EXPECT_EQ(kNumRows, input->dims->data[1]);
TF_LITE_MICRO_EXPECT_EQ(kNumCols, input->dims->data[2]);
TF_LITE_MICRO_EXPECT_EQ(kNumChannels, input->dims->data[3]);
TF_LITE_MICRO_EXPECT_EQ(kTfLiteInt8, input->type);

TF_LITE_MICRO_EXPECT_EQ is basically an assert macro which checks whether its two parameters are equal. We know that we are expecting an input tensor of shape [1, 32, 32, 1], which can be checked with the size and data members of dims. The rows, columns and channels are defined in the model_settings.h file. We can also see that the input type has to be int8, as we set it during the conversion to the tflite format. We can do the same check with our output tensor.
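
A matching check for the output tensor could look like the sketch below. kNumClasses is a hypothetical constant for the 3 output classes; the actual name in model_settings.h may differ. The output type is float32, since the DEQUANTIZE operator converts the quantized result back to floating-point (this matches the "Output type: 1" line in the test output above).

TfLiteTensor* output = interpreter.output(0);
TF_LITE_MICRO_EXPECT_NE(nullptr, output);
TF_LITE_MICRO_EXPECT_EQ(2, output->dims->size);
TF_LITE_MICRO_EXPECT_EQ(1, output->dims->data[0]);
TF_LITE_MICRO_EXPECT_EQ(kNumClasses, output->dims->data[1]);
TF_LITE_MICRO_EXPECT_EQ(kTfLiteFloat32, output->type);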

After making sure that everything is as it is supposed to be, we can start executing our model.

load_data(picture1, input);
start = clock();
interpreter.Invoke();
end = clock();
output = interpreter.output(0);
print_result("Picture 1", output, end-start);

Above we load our picture into the input tensor, call the Invoke method and print the result with a helper function, also measuring how long the inference took. We can run this sort of block as many times as we want with different inputs. We then only need to finish the program with the TF_LITE_MICRO_TESTS_END macro.
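
The helper itself is not shown above, but a minimal sketch of what it could look like is below. The print_result, start, end and output names follow the snippet; the real implementation in cifar_test.cc may differ.

void print_result(const char * title, TfLiteTensor * output, clock_t duration)
{
    // Print the picture label, the raw output probabilities and the inference time in ms.
    printf("%s\n", title);
    printf("[[");
    for (int i = 0; i < output->dims->data[1]; ++i)
    {
        printf("%f ", output->data.f[i]);
    }
    printf("]]\n");
    printf("Inference time: %f ms\n", (1000.0 * duration) / CLOCKS_PER_SEC);
}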

5. Porting the CIFAR-10 project to your microcontroller

MicroML was designed with the idea that changing platforms should not be difficult. As long as you plan to use a microcontroller supported by libopencm3, porting should not be hard. The first thing that has to be changed is the project.mk file. At the start of the file there is a DEVICE variable, which in our example is set to stm32f767zi, the microcontroller used on the Nucleo-F767ZI board. This variable can easily be changed to something else (e.g. stm32f405vg).
Changing the DEVICE also requires adjusting all hardware-specific functions which deal with the clock, systick, UART and GPIO setup. Datasheets and other resources available online are your best friends in this process. We then need to rebuild the microlite.a and testlite.a files, as mentioned earlier.
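
As an illustration of what "hardware-specific" means here, a minimal libopencm3 GPIO setup could look like the sketch below. The port, pin and clock names are assumptions for a hypothetical LED; your board will use different ones.

#include <libopencm3/stm32/rcc.h>
#include <libopencm3/stm32/gpio.h>

static void gpio_setup(void)
{
    // Enable the clock for the GPIO port and configure the LED pin as a push-pull output.
    rcc_periph_clock_enable(RCC_GPIOB);
    gpio_mode_setup(GPIOB, GPIO_MODE_OUTPUT, GPIO_PUPD_NONE, GPIO7);
}

Functions like this, together with the clock, systick and UART setup, are the parts of main.cpp that have to be revisited for a new target.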

To get a better sense of how to use MicroML for your specific needs, check the Building your projects section of MicroML’s README.

Final thoughts

This has been quite an involved process from start to finish. We first created and trained our neural network model and converted it into a microcontroller-friendly format. Then we prepared a development environment and tested our model, first on a computer and then on a microcontroller.

The field of machine learning on embedded systems is very new and rapidly changing. What is true today might be obsolete next month. What makes it even more challenging is that it combines two disciplines that used to be separate from each other.

Many machine learning engineers have no experience with constrained embedded environments, where speed and size matter the most. Most embedded programmers, in turn, have never needed tools like Colaboratory or frameworks like TensorFlow for their projects.

With the sharp rise in demand for low-power and low-bandwidth applications (e.g. IoT), machine learning on embedded systems will have to keep pace with their development.

Rapidly changing fields such as this one always require engineers to dig deeper and figure out the intricacies of technology they never had to deal with before, so that things can become simpler and more efficient at a later stage. We are not there yet. Whichever way we look at it, in both the short and the long term, the people who manage to step out of their comfort zones during development will benefit the most.
