v1.1.0
 All Classes Namespaces Functions Variables Typedefs Enumerations Enumerator Modules Pages
Performance

Tips and best practices for performance optimization

Introduction

The Execution Environment used by Sifteo Cubes is fairly resource constrained, meaning that most games will at some point have to face performance optimization challenges. If you're used to optimizing C++ code on a small portable platform, many of those skills will serve you well in optimizing Sifteo applications as well. But in many ways this is a very unique platform. This guide covers many of the specific optimization techniques for Sifteo Cubes.

There are four types of common bottlenecks:

Let's cover each one individually.

Flash Bandwidth

This is a common bottleneck on the Sifteo cubes, so it deserves some detailed attention. Since the the Base has such a tiny execution environment, it must be "fed" code and static data in small chunks from external Flash memory. And of course, the bandwidth used to make these transfers is limited, so it's easy to choke if you are paging lots of code.

The primary tool to reduce the load is a small 16K cache. By keeping your code small enough to fit in this cache most of the time, you can reduce the load on flash bandwidth and make the cubes sing!

The best way to optimize here is adopt a style of code that lends itself to fitting within your cache. These rules all boil down to keeping code linear as much as possible. If we spend the time to page in a block of code, as much of that code as possible should be relevant to the current gameplay. Time spent paging in dead code is wasted time.

So how does one code to maximize the cache?

This last point encourages a practice called Data-Oriented Design. In summary, learn to process your updates based on the type of data it uses instead of its logical hierarchy. For example, don't do this:

1 for (Entity e : entites)
2  if (e.IsRoom())
3  e.ProcessRoom();
4  else if (e.IsView())
5  e.ProcessView();

Instead, do something like this:

1 for (Entity e : ListRooms(entities))
2  e.ProcessRoom();
3 for (Entity e : ListViews(entities))
4  e.ProcessView();

The trick is to build BitArrays that can quickly tell you which entities need which types of updates. The ARM processor in the base has a handy-dandy instruction called CLZ (Count Leading Zeroes) that makes this kind of iteration super fast.

There are two main source of metrics to determine how well you are utilizing the flash cache. Running in siftulator with the –svm-flash-stats option will log cache hits and misses. There are also Lua hooks for catching flash misses programmatically.

So what are some techniques to better utilize the cache?

CPU

Most Sifteo games are not CPU bound. Flash bandwidth and graphics performance are much more common sources of slowness. But still, watch out for types of operations that place a large load on the CPU:

Decompression bottlenecks

There are several places where the system may spend CPU time to decompress data from flash:

Normally these are not bottlenecks, but if they do become too much of a CPU burden, you may be able to selectively disable compression on the problematic data (such as by using FlatAssetImage or uncompressed PCM audio), or you can pre-decompress it (i.e. by storing images in a TileBuffer).

Floating point math bottlenecks

The Base's CPU has no hardware floating point coprocessor. It is super fast at integer math, even at integer multiplication. But floating point will always be a bit pokey.

For human-timescale operations, such as animation that happens once or a few times per frame, this shouldn't be a problem. But if you have any inner loops or very math-heavy algorithms, consider avoiding floating point in those cases, and using integer math or fixed-point instead.

In particular, math library functions like sin(), cos(), sqrt(), and log() are especially slow. When possible, avoid these operations or pre-compute the values you need.

If you're using trig heavily but you don't need amazing precision, consider the table-driven alternatives: tsin() and tcos(), or their fixed-point counterparts tsini() and tcosi().

Remember, integer math is very fast! Integer math runs at full native speed in SVM, without either the software floating point overhead itself, or the virtualization overhead inherent in calling into the floating point library.

Pro Tip! Use Sifteo::TimeDelta for frame time delta instead of floats. Floats are convenient, but beyond that it's rarely a good idea to use them for time.

Memory load/store bottlenecks

Due to details of the SVM architecture, some types of memory are vastly speedier than others:

The compiler will naturally try to keep things in fast memory whenever possible, but due to the semantics of memory access in C++ it isn't always possible for the compiler to optimize out loads and stores. So it can help the optimizer to keep data in local variables instead of repeatedly accessing class members.

Graphics

Along with flash bandwidth, graphics performance is the other common bottleneck in Sifteo games. The graphic engine is distributed to maximize performance, but you have to keep in mind what it's doing in order to not get in it's way.

So what are some techniques to optimize graphics?

Cubes do not render synchronously, but certain function calls (namely System::finish) can force them to wait and sync up. Some types of rendering require this behavior, but it is also easy to call finish more often than you need to. Also watch out for functions that implicitly call finish like VideoBuffer::initMode and BG1Drawable::setMask. Only call finish when you need it (for example, before some dependent BG1 calls).

Overdraw

In this case, overdraw refers to any situation where you're drawing data to a Sifteo::VideoBuffer which is different from the final content you intend the VideoBuffer to contain at the end of the frame.

For example, this operation causes overdraw:

1 // Overdraw! AVOID THIS!
2 vid.bg0.image(vec(0,0), Background);
3 vid.bg0.image(vec(4,2), Door);

If you were do to this every frame (which itself is probably a bad idea if this screen is not changing) you would be repeatedly marking the Door's tiles as changed, causing them to go out over the radio every frame, and the cube to continuously re-draw the screen.

You can tell if this is happening by looking at the frame rate counter under each cube in Siftulator. If it shows a nonzero FPS count even when the scene is idle, you're probably causing overdraw. This will waste radio bandwidth and waste battery power!

Additionally, overdraw can cause flickering. If we happen to send the contents of the VideoBuffer over the radio between these two draws, the cube will momentarily show the background but not the door, causing the door to flicker. These types of bugs are often highly timing-dependent, and it's common for them to be visible in hardware but not in simulation, or vice versa.

There are multiple ways to solve overdraw:

If you have the memory to spare, double-buffering is usually the easiest way around overdraw. For example:

1 // Using a TileBuffer to avoid overdraw
2 TileBuffer<16,16> buffer;
3 buffer.image(vec(0,0), Background);
4 buffer.image(vec(4,2), Door);
5 vid.bg0.image(vec(0,0), buffer);

Radio

Radio communication between the base and cubes can also be a bottleneck. However, this is rare since the system is designed to minimize radio transmissions. But watch out if you are continuously transmitting data (for example, with non-stop changing of frame buffer mode pixels).

You can take a peek at radio traffic using the siftulator option –radio-trace. The format is not documented, but you can see how many packets are going to what cubes and how full those packets are.

So what can you do to optimize radio traffic?