Deploying machine learning on resource-constrained devices such as microcontrollers, especially in remote environments, presents significant challenges. We explore these challenges within the TinyMLOps framework, focusing on enabling live model updates and predicting inference latency on STM32 microcontrollers. Because the implementation of a machine learning model depends on the target hardware, we use PyTorch to train and test the model and STM32Cube.AI to deploy it, since our target hardware is an STM32 microcontroller. The implementation process is as follows:
As shown in the above Figure, the process starts by converting the trained model into the ONNX format and passing it to STM32Cube.AI, a framework developed by STMicroelectronics to optimize and deploy machine learning models on STM32 microcontrollers. STM32Cube.AI optimizes the model for memory usage and generates the corresponding C code. We then construct and analyze a call graph of the generated code to gain a deeper understanding of its structure and functionality. The Figure below displays this call graph, which helps us identify how to refactor the code to meet our specific requirements: how the model is represented, how input is fed to it, and how the output is stored.
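To make the structure concrete, the sketch below shows the call sequence that the call graph exposes for a network generated under the default name network: create the network, bind its weight and activation memory, wrap the input and output buffers, and run one inference. The entry points and macros (ai_network_create, ai_network_init, ai_network_run, AI_NETWORK_PARAMS_INIT, and the AI_NETWORK_* size macros) follow the typical code that X-CUBE-AI emits; exact names and signatures depend on the tool version, so this is an illustrative sketch rather than the verbatim generated code.

```c
#include "network.h"        /* generated model API (STM32Cube.AI) */
#include "network_data.h"   /* generated weight table and helpers */

/* Buffer sizes come from macros emitted by the code generator. */
AI_ALIGNED(4) static ai_u8    activations[AI_NETWORK_DATA_ACTIVATIONS_SIZE];
AI_ALIGNED(4) static ai_float in_data[AI_NETWORK_IN_1_SIZE];
AI_ALIGNED(4) static ai_float out_data[AI_NETWORK_OUT_1_SIZE];

static ai_handle network = AI_HANDLE_NULL;

int run_inference(void)
{
    /* Instantiate the network object. */
    ai_error err = ai_network_create(&network, AI_NETWORK_DATA_CONFIG);
    if (err.type != AI_ERROR_NONE)
        return -1;

    /* Bind the weight table and the activation scratch buffer. */
    const ai_network_params params = AI_NETWORK_PARAMS_INIT(
        AI_NETWORK_DATA_WEIGHTS(ai_network_data_weights_get()),
        AI_NETWORK_DATA_ACTIVATIONS(activations));
    if (!ai_network_init(network, &params))
        return -1;

    /* Wrap the raw I/O arrays in ai_buffer descriptors and run once. */
    ai_buffer inputs[AI_NETWORK_IN_NUM]   = AI_NETWORK_IN;
    ai_buffer outputs[AI_NETWORK_OUT_NUM] = AI_NETWORK_OUT;
    inputs[0].data  = AI_HANDLE_PTR(in_data);
    outputs[0].data = AI_HANDLE_PTR(out_data);

    ai_i32 n_batches = ai_network_run(network, inputs, outputs);
    return (n_batches == 1) ? 0 : -1;
}
```

These three calls are the seams at which we refactor the generated code: the input buffer passed to ai_network_run determines how data is fed to the model, and the bound weight buffer determines where the model itself resides.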
After understanding the structure of the code, we deployed the model onto the target hardware, determined how the model weights are stored in the generated code, and identified a procedure for updating them live.
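Because the generated code exposes the weights as a single contiguous buffer bound at initialization time, a live update amounts to overwriting that buffer and re-initializing the network. The sketch below illustrates this under two stated assumptions: receive_weights is a hypothetical transport hook standing in for whatever link delivers the new parameters (e.g. a serial connection), and the weight table, which the code generator normally places in a const flash section, has been relocated to writable RAM so it can be patched at run time.

```c
#include <stddef.h>
#include "network.h"
#include "network_data.h"

/* Hypothetical transport hook: fills buf with len bytes of freshly
 * received weights and returns 0 on success. */
extern int receive_weights(void *buf, size_t len);

/* Assumption: a writable RAM copy of the generated weight table,
 * so parameters can be replaced without reflashing the firmware. */
static ai_u8 weights_ram[AI_NETWORK_DATA_WEIGHTS_SIZE];

int live_update(ai_handle network, ai_u8 *activations)
{
    /* Pull the new weight blob from the host. */
    if (receive_weights(weights_ram, sizeof(weights_ram)) != 0)
        return -1;

    /* Re-bind the network to the updated weights; the graph structure
     * is unchanged, only the parameters are swapped. */
    const ai_network_params params = AI_NETWORK_PARAMS_INIT(
        AI_NETWORK_DATA_WEIGHTS(weights_ram),
        AI_NETWORK_DATA_ACTIVATIONS(activations));
    return ai_network_init(network, &params) ? 0 : -1;
}
```

Since only the parameter buffer is swapped while the generated graph code stays fixed, an update transfers just AI_NETWORK_DATA_WEIGHTS_SIZE bytes, which keeps live updates feasible over the low-bandwidth links typical of remote deployments.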