Wednesday, November 28, 2018

newlib's mallocs

Context

In this post I will mostly be dealing with the two malloc implementations that ship with newlib for ARM platforms with no MMU/OS: newlib/libc/stdlib/mallocr.c and newlib/libc/stdlib/nano-mallocr.c. These are selected by the nosys.specs and nano.specs linker options, respectively.

A Quick Comparison

malloc                              nano-malloc
best-fit                            first-fit
lower fragmentation potential       higher fragmentation potential
128 bins (1 kByte+ RAM overhead)    1 bin (free_list)
16 byte minimum chunk size          8 byte minimum chunk size (for alignment)
~2.4 kByte program space            < 128 bytes program space

What is malloc?

Simply stated, malloc is the routine in the standard C library (libc) to allocate and manage the heap (or “free store”).

Why malloc?

malloc (of one type or another) is required in order to support dynamic memory allocation. In the embedded world, statically allocated memory typically does the heavy lifting, and when it is possible to use stack memory, use it. However, there are instances when object sizes are not known at compile time, object sizes may change after instantiation, or the lifetime of an object must be independent of the scope in which it is instantiated. For this type of object, dynamic memory allocation is needed.

Why Care About Different Implementations of malloc?

Embedded systems are “resource constrained,” meaning they don’t have the speed, data space, or program space that a “desktop” system has. One of the most important implications of this is the lack of virtual memory. When a desktop system runs out of RAM, it can simply swap some of the data currently held in RAM out to other storage (the hard drive, for example). Embedded systems have no such capability, so when an embedded system runs out of RAM, bad things happen. The end result is that an embedded implementation of malloc must be fast, efficient, and control fragmentation.

What is the “Standard” Implementation of malloc?

The newlib malloc implementation is based (very closely) on the memory allocator proposed by Doug Lea. The high-level categorization of this allocator is “best-fit with coalescing”.

The glibc version appears a bit more complex at first glance while still originating with Doug Lea’s design. It seems to be optimized much more heavily for speed, at the expense of complexity and efficiency.

Commonalities

If there are no free chunks large enough to satisfy a request, sbrk() is called. This is a system call from the Unix universe whose purpose is to grow the available heap from system memory. Both of the malloc implementations discussed here use the same sbrk(), and I was shocked to see that it performs no check for the end of the heap memory section. This feels like a very serious, obvious, and easily fixed oversight, but I will be looking for feedback from the community concerning this.
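
For context, a bare-metal _sbrk() for a system like this often looks something like the sketch below. This is my minimal illustration, not newlib's exact code; heap_ptr and the end symbol are stand-ins for whatever the linker script actually provides.

    #include <stddef.h>

    /* Minimal bare-metal _sbrk() sketch. `end` is the linker-provided
     * symbol marking the end of static data (the bottom of the heap). */
    extern char end;
    static char *heap_ptr = NULL;   /* current program break */

    void *_sbrk(ptrdiff_t incr)
    {
        if (heap_ptr == NULL)
            heap_ptr = &end;

        char *prev = heap_ptr;
        /* This is where a check against the top of the heap region (or
         * against the current stack pointer) belongs -- and is missing. */
        heap_ptr += incr;
        return prev;
    }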

What is Different about nano-malloc?

The high-level categorization of nano-malloc is “first-fit with coalescing”. In general, first-fit is fast and small (in code size), but it is less RAM-efficient and increases the likelihood of fragmentation. Additionally, the free list is implemented as a singly-linked list rather than the doubly-linked lists used by the standard implementation.

malloc Analysis

Overhead

  • Uses bins.
    • These bins allow for a faster best-fit algorithm at the expense of RAM.
    • Each of the 128 bins is a doubly-linked list of chunks of a specific size. This requires malloc to hold an array of pointer pairs (generally 4 bytes per pointer) for the head of each bin (sketched below).
    • RAM usage: 1024 bytes
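
A simplified picture of that bookkeeping, assuming a 32-bit MCU (this is my sketch of the idea, not newlib's exact declarations):

    #include <stddef.h>

    /* Each free chunk carries forward/back pointers, dlmalloc style. */
    struct chunk {
        size_t prev_size;   /* size of the previous chunk, if free */
        size_t size;        /* size of this chunk */
        struct chunk *fd;   /* next chunk in the same bin */
        struct chunk *bk;   /* previous chunk in the same bin */
    };

    /* 128 bins, each a head/tail pointer pair:
     * 128 bins * 2 pointers * 4 bytes = 1024 bytes of RAM. */
    static struct chunk *bins[128][2];

Note that the 16 byte minimum chunk size from the comparison table falls out of this header: a free chunk must be able to hold the two size fields plus the two list pointers.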

malloc Routine

A simple algorithm determines which bin should be searched first (the “target bin”) for an appropriately sized free chunk. For bins covering sizes less than 512 bytes, all the chunks within a given bin are exactly the same size, so as long as the target bin holds a free chunk, its first chunk is used immediately. If no free chunk is available in the target bin, the search progresses to the next bin, and so on, until a free chunk is found; that chunk is split if it is large enough for the remainder to form a minimum-sized chunk. For bins covering sizes >= 512 bytes, each bin’s chunks are sorted from smallest to largest, which makes the first chunk found that satisfies the request also the best-fit option.
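
The size-to-bin mapping might look roughly like the sketch below. The small-bin half reflects the scheme described above; the spacing of the large bins is a stand-in, as newlib's real macros differ.

    #include <stddef.h>

    /* Illustrative bin-index mapping. Requests under 512 bytes get
     * exact-size bins spaced 8 bytes apart; larger requests fall into
     * coarser, roughly logarithmically spaced bins whose contents are
     * kept sorted by size. */
    static size_t bin_index(size_t size)
    {
        if (size < 512)
            return size >> 3;   /* one bin per 8-byte size class */

        size_t idx = 70;        /* stand-in values for the large bins */
        for (size_t limit = 1024; size >= limit && idx < 127; limit <<= 1)
            idx += 4;
        return idx;
    }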

free Routine

First, the memory neighboring the to-be-freed chunk is checked for already-free chunks that can be coalesced. The resulting chunk (either the original to-be-freed chunk or a larger, coalesced one) is then inserted into the appropriately sized bin (in size order, in the case of bins for sizes >= 512 bytes).
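
In code, the coalescing step looks roughly like this, reusing struct chunk from the earlier sketch. The prev_free/next_free flags are my stand-ins for the in-use bits the real allocator packs into the size field.

    /* Unlink a chunk from its (circular, doubly-linked) bin. */
    static void unlink_chunk(struct chunk *c)
    {
        c->fd->bk = c->bk;
        c->bk->fd = c->fd;
    }

    /* Merge a to-be-freed chunk with any free neighbors; the result is
     * then inserted into the bin for its new size (a sorted insert for
     * bins >= 512 bytes). */
    static struct chunk *coalesce(struct chunk *c, int prev_free, int next_free)
    {
        if (next_free) {
            struct chunk *next = (struct chunk *)((char *)c + c->size);
            unlink_chunk(next);
            c->size += next->size;
        }
        if (prev_free) {
            struct chunk *prev = (struct chunk *)((char *)c - c->prev_size);
            unlink_chunk(prev);
            prev->size += c->size;
            c = prev;
        }
        return c;
    }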

nano-malloc Analysis

nano-malloc Routine

The free_list is searched from the beginning for the first block that is large enough. If the block found is large enough to be split, with the remainder still large enough to hold a valid, minimum-sized block, it is split. Otherwise, the entire block is removed from the free_list.
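
A first-fit walk in the spirit of nano-malloc might look like this (names and details are mine; the real code also handles alignment and falls back to sbrk() when the list is exhausted):

    #include <stddef.h>

    /* A free block: just a size and a pointer to the next free block. */
    typedef struct chunk {
        size_t size;          /* total size of this block */
        struct chunk *next;   /* next free block, in address order */
    } chunk_t;

    #define MIN_CHUNK sizeof(chunk_t)

    static chunk_t *free_list;

    static chunk_t *alloc_first_fit(size_t size)
    {
        chunk_t **prev = &free_list;
        for (chunk_t *c = free_list; c != NULL; prev = &c->next, c = c->next) {
            if (c->size < size)
                continue;                    /* too small, keep walking */
            if (c->size - size >= MIN_CHUNK) {
                c->size -= size;             /* split: keep the head free */
                chunk_t *tail = (chunk_t *)((char *)c + c->size);
                tail->size = size;           /* hand out the tail */
                return tail;
            }
            *prev = c->next;                 /* take the whole block */
            return c;
        }
        return NULL;                         /* caller would sbrk() here */
    }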

One can see how this might lead to additional fragmentation (over a best-fit algorithm) due to the potentially higher likelihood of splitting larger blocks into smaller ones.

nano-free Routine

The free_list is searched from the beginning to find the insertion point for the to-be-freed block (free_list is sorted by address). This leads to the question, “Since free_list is sorted anyway, why not sort it by size instead of address and implement best-fit rather than first-fit?” I imagine one of the biggest reasons is malloc time, but I personally feel it would be worth it to reduce the potential for fragmentation (I see fragmentation as the single most influential reason to avoid or eliminate heap use in embedded systems). The worst case malloc would still be an iteration through the entire free_list, but with best-fit the size of the request also becomes a determining factor (unless using size-binned pools, as in the standard implementation).
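
For completeness, the address-ordered insert with coalescing might look like this, continuing the chunk_t/free_list sketch above:

    /* Return a block to the address-ordered free_list, merging it with
     * adjacent neighbors where possible. */
    static void free_block(chunk_t *blk)
    {
        chunk_t *p = free_list;

        if (p == NULL || blk < p) {
            blk->next = p;                  /* new head of the list */
            free_list = blk;
        } else {
            while (p->next != NULL && p->next < blk)
                p = p->next;                /* walk to the insertion point */
            blk->next = p->next;
            p->next = blk;

            if ((char *)p + p->size == (char *)blk) {
                p->size += blk->size;       /* merge with lower neighbor */
                p->next = blk->next;
                blk = p;
            }
        }

        if (blk->next != NULL &&
            (char *)blk + blk->size == (char *)blk->next) {
            blk->size += blk->next->size;   /* merge with upper neighbor */
            blk->next = blk->next->next;
        }
    }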


Saturday, November 24, 2018

Perfect Portability

Any bright-eyed, naive developer exploring the boundaries of their knowledge might have a vision of a utopia of perfect portability. A world where your application code is 100% hardware agnostic; a world where whether programming a server, a desktop or mobile app, or an embedded device, the models, data structures, algorithms, and practices are exactly the same. When considering dynamic memory allocation on embedded systems, one might think, “Surely we should be able to write portable code that can utilize nearly all of the standard C/C++ library routines and practices. Maybe we have to forgo filesystems, but these people advising everyone to avoid heap allocations on embedded systems are just old, jaded embedded developers who aren’t up with the times.” (not that I have ever had these thoughts 😉)

They’ve got heaps, we’ve got heaps, together we use heaps.

If we have a heap on our embedded system (which we generally can), why not use it?

The core issue, as I see it, is the user interface. When a desktop system (or mobile device) runs out of memory, it can notify the user who can then, typically, take some action to free up some memory. Alternatively, in the case of mobile devices (at least Android), the device can dynamically kill old processes that are still loaded in the background. The user of an embedded system, on the other hand, just knows that their device isn’t working properly. Whether it keeps resetting or starts blinking some red light, not only do they not know what the problem is, but they also have no options to solve the problem.

This quickly becomes an issue of determinism. I come from a background of bare-metal systems in safety-critical applications. In my mind, any device whose user cannot take steps to “optimize” it must be deterministic (or at least be perceived as such). By this, I mean that the device should know exactly how much memory it will ever need and how long it will take to complete any possible task. For example, any design in which the fragmentation of the heap doesn’t quickly reach 0 (actually 0, not asymptotically) is unacceptable. In practice, deterministic memory use is the rule anyway. Desktop and mobile applications have to deal with many dynamic conditions, such as the hardware they are running on, the other applications or system processes running concurrently, complex user inputs, etc. These conditions do not exist in the embedded space.

The use case still remains that an object may have a lifetime unconstrained by the scope in which it was instantiated. The easy way to solve this is to make those objects global, but that is certainly not best practice.

Just call me jaded…


Wednesday, November 14, 2018

Personal Project

Project Introduction

I have started a personal project to implement a simple Bluetooth Mesh client on the Nordic nRF52832 microcontroller using C++.

Why the nRF52832?

The Nordic nRF52832 is a very popular, multi-protocol wireless MCU. I want to get some experience with this device so I can be confident in my use of it on work projects.

Why Bluetooth Mesh?

Bluetooth Mesh is a relatively new wireless mesh protocol. Others, like Zigbee, Thread, and Z-Wave have been in the market for much longer. I have completed a number of designs with Zigbee and overall, I have been unsatisfied with the implementations I’ve worked with. I think that the Zigbee 3.0 specification is a step in the right direction, however. The research I have done on Bluetooth Mesh has excited me that, perhaps, someone has created a protocol that learned from the mistakes of the others. In addition, the ability to start using a Bluetooth Mesh device straight out of the box with a smartphone or other “Classic Bluetooth” device is very helpful.

Why C++?

In the beginning of my career, I wrote assembly. It was quite satisfying at first. I was able to define the exact behavior of the processor. Eventually, as the problems got more complex, I did move to C. While using C, I often thought, “Why did I ever program in assembly?”. As problems are wont to do, they continue to get more and more complex. In response, I began investigating C++. I now use C++ exclusively if at all possible. Again, I look back at my years with C and often think, “Why did I ever program in C?”.

There are a few key reasons I can point to for my current attitude:

  1. low-cost, zero-cost, and negative-cost abstractions - Abstractions make code easier to read, maintain, and test. Without abstractions, you have spaghetti code in which dependencies are tangled and diffused throughout your application code. C++ allows development of “business logic” at a high level (even using a DSL) while still allowing “opening the hood”, as they say, to optimize the details.
  2. strong typing - Strong types make your code safer and more expressive. Consider the example void setDuration(int duration) vs void setDuration(Seconds duration); a minimal sketch follows this list.
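
Here is a minimal sketch of what such a Seconds type could look like. Seconds is hypothetical; production code might simply use std::chrono::seconds instead.

    #include <cstdint>

    // A tiny strong type: the explicit constructor prevents silently
    // passing a bare int where a duration in seconds is expected.
    class Seconds {
    public:
        explicit constexpr Seconds(std::uint32_t value) : value_(value) {}
        constexpr std::uint32_t count() const { return value_; }
    private:
        std::uint32_t value_;
    };

    void setDuration(Seconds duration);   // declaration only, for illustration

    // setDuration(30);          // error: no implicit int -> Seconds conversion
    // setDuration(Seconds{30}); // OK: the unit is explicit at the call site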

Project Goals

  1. Gain familiarity with the Nordic nRF52832 platform and software utilities.
  2. Exercise design best practices including appropriate documentation, defining system architecture(s), and VCS usage.
  3. Exercise code best practices including appropriate TDD, abstractions, and modern language features.


Introductions

Who am I?

Broadly speaking, I am an electronics engineer. I have been designing electronics in one form or another for well over a decade (since about 2004) and now work as a freelance designer. My emphases are good communication and high quality.

Why a Blog?

I intend this blog to cover technical and professional topics. It is primarily for my own benefit as a place to exercise good communication as well as a place to explore my understanding of topics. I am a firm believer that to learn something well, one should teach it.

Notes

This first post is dry and stuffy because I’m rather nervous about starting a blog. I will likely get more comfortable as I add more blog posts and my personality and sense of humor will begin to show through. For this, I can only apologize.
