Executable Optimization
When coding in C, C++, or any other compiled language, I always aim to produce the smallest executable possible. The reason is probably an old-school mindset formed in the floppy disk era, when 1.44 MB was enough. Nowadays, it is easy to end up with a large executable even when the software doesn’t do much.
The goal of this “tutorial” is to show the steps that I came up with to produce the smallest executable. Everything shown here was done using Windows 11, MSYS2, VS Code and the C language, but the steps can also be applied to many other setups.
The test code uses the SDL2 library to show a window that stays open for only a few seconds. That detail doesn’t matter for this tutorial, but I’ll use the code as a basis for other projects. This is the boilerplate code I used:
#include <SDL2/SDL.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SCREEN_WIDTH 64
#define SCREEN_HEIGHT 32
#define SCREEN_SCALE 10

// One 32-bit value per pixel, matching SDL_PIXELFORMAT_ARGB8888 used below.
uint32_t display[SCREEN_WIDTH * SCREEN_HEIGHT];
SDL_Window* window = NULL;
SDL_Renderer* renderer = NULL;
SDL_Texture* texture = NULL;

// Creates the window, renderer and texture; cleans up on any failure.
bool initializeSDL(void) {
    if (SDL_Init(SDL_INIT_VIDEO) != 0) {
        fprintf(stderr, "SDL_Init error: %s\n", SDL_GetError());
        return false;
    }
    window = SDL_CreateWindow("SDL Test",
                              SDL_WINDOWPOS_UNDEFINED,
                              SDL_WINDOWPOS_UNDEFINED,
                              SCREEN_WIDTH * SCREEN_SCALE,
                              SCREEN_HEIGHT * SCREEN_SCALE,
                              SDL_WINDOW_SHOWN);
    if (window == NULL) {
        fprintf(stderr, "SDL_CreateWindow error: %s\n", SDL_GetError());
        SDL_Quit();
        return false;
    }
    renderer = SDL_CreateRenderer(window, -1, SDL_RENDERER_ACCELERATED);
    if (renderer == NULL) {
        fprintf(stderr, "SDL_CreateRenderer error: %s\n", SDL_GetError());
        SDL_DestroyWindow(window);
        SDL_Quit();
        return false;
    }
    texture = SDL_CreateTexture(renderer,
                                SDL_PIXELFORMAT_ARGB8888,
                                SDL_TEXTUREACCESS_STREAMING,
                                SCREEN_WIDTH, SCREEN_HEIGHT);
    if (texture == NULL) {
        fprintf(stderr, "SDL_CreateTexture error: %s\n", SDL_GetError());
        SDL_DestroyRenderer(renderer);
        SDL_DestroyWindow(window);
        SDL_Quit();
        return false;
    }
    return true;
}

// Uploads the display buffer to the texture and presents it.
void renderDisplay(void) {
    // The pitch is the size of one row in bytes: 4 bytes per ARGB8888 pixel.
    SDL_UpdateTexture(texture, NULL, display, SCREEN_WIDTH * sizeof(uint32_t));
    SDL_RenderClear(renderer);
    SDL_RenderCopy(renderer, texture, NULL, NULL);
    SDL_RenderPresent(renderer);
}

void cleanupSDL(void) {
    SDL_DestroyTexture(texture);
    SDL_DestroyRenderer(renderer);
    SDL_DestroyWindow(window);
    SDL_Quit();
}

int main(int argc, char* argv[]) {
    if (!initializeSDL()) {
        return EXIT_FAILURE;
    }
    memset(display, 0, sizeof(display));
    // Light a single opaque white pixel at x = 10, y = 20.
    display[10 + 20 * SCREEN_WIDTH] = 0xFFFFFFFF;
    renderDisplay();
    SDL_Delay(5000);
    cleanupSDL();
    return EXIT_SUCCESS;
}
When compiled with GCC 14.1.0 (Rev3, Built by MSYS2 project), the generated executable was 173,495 bytes. For this tutorial, the following command was used to compile the code:
> gcc -fdiagnostics-color=always -g code.c -o code.exe
-IC:/msys64/ucrt64/bin/../include
-IC:/msys64/ucrt64/bin/../include/SDL2
-Dmain=SDL_main
-LC:/msys64/ucrt64/bin/../lib
-lmingw32
-mwindows
-lSDL2main
-lSDL2
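The sizes quoted in this tutorial can be read from the file manager or a directory listing. If you prefer to check them from code, the following is a minimal sketch, assuming a platform where seeking to the end of a binary stream is supported (which holds on Windows and POSIX systems). The function name file_size_bytes is hypothetical and not part of the original project.

```c
#include <stdio.h>

/* Hypothetical helper: returns the size of the file at `path` in bytes,
 * or -1 if the file cannot be opened or measured. */
long file_size_bytes(const char *path) {
    FILE *f = fopen(path, "rb");   /* binary mode, so no newline translation */
    if (f == NULL) {
        return -1;
    }
    long size = -1;
    if (fseek(f, 0, SEEK_END) == 0) {
        size = ftell(f);           /* position at end of file == size in bytes */
    }
    fclose(f);
    return size;
}
```

Calling file_size_bytes("code.exe") after each step gives the numbers to compare. Since long is 32 bits wide on Windows, this sketch is only suitable for files below 2 GB, which is more than enough here.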
GCC
The first optimization can be done at compile time.
GCC has many optimization flags.
Two of those are the most interesting to me, -Os and -flto. According to GCC’s documentation:
-Os: optimizes the executable for size.
-flto: runs the standard link-time optimizer. Usually, this is used together with the previous one.
With these flags added, the new executable has a size of 158,068 bytes. The compilation command becomes:
# the command was shortened for readability
> gcc ... -Os -flto code.c -o code.exe ...
Stripping debug symbols
The next optimization does not depend on the compiler used.
Executables generated by a compiler usually carry a symbol table and, when built with -g as above, debug sections that are only useful for debugging. (The .reloc section is sometimes mentioned in this context, but it holds the base relocations used by the Windows loader and is normally kept by strip.)
If you don’t plan to debug the executable or analyze it with tools such as Ghidra or IDA, you can get rid of this data with the strip.exe included in the binutils of MSYS2.
Its usage is as simple as:
> strip code.exe
The symbols and debug sections are removed from code.exe in place, so no new file is created.
The code.exe size is drastically reduced to 23,040 bytes!
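To see which sections an executable contains before and after strip, one option is to read the section table from its PE header. The sketch below is not a full PE parser: it assumes the whole file has already been loaded into memory, and pe_section_names, rd16 and rd32 are hypothetical helper names. Section names longer than eight characters (such as GCC's debug sections) are stored as a slash followed by a string-table offset, for example /4, and this sketch returns them as-is.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read little-endian 16-bit and 32-bit values from a byte buffer. */
static uint16_t rd16(const unsigned char *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}
static uint32_t rd32(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: copies up to `max` section names from the PE image in
 * `buf` into `names` and returns how many were copied, or -1 if the buffer
 * does not look like a PE file. */
int pe_section_names(const unsigned char *buf, size_t len,
                     char names[][9], int max) {
    if (len < 0x40 || buf[0] != 'M' || buf[1] != 'Z') {
        return -1;                              /* no DOS header */
    }
    size_t pe = rd32(buf + 0x3C);               /* e_lfanew: offset of PE header */
    if (pe > len || len - pe < 24) {
        return -1;                              /* signature + COFF header missing */
    }
    if (memcmp(buf + pe, "PE\0\0", 4) != 0) {
        return -1;
    }
    unsigned nsec = rd16(buf + pe + 6);         /* NumberOfSections */
    unsigned optsize = rd16(buf + pe + 20);     /* SizeOfOptionalHeader */
    size_t table = pe + 24 + optsize;           /* section table follows it */
    int count = 0;
    for (unsigned i = 0; i < nsec && count < max; i++) {
        size_t off = table + (size_t)i * 40;    /* each section header is 40 bytes */
        if (off > len || len - off < 40) {
            break;                              /* truncated file */
        }
        memcpy(names[count], buf + off, 8);     /* Name field is 8 bytes */
        names[count][8] = '\0';
        count++;
    }
    return count;
}
```

Running it on code.exe before and after strip should show which entries disappeared. The objdump tool from binutils reports the same information with objdump -h code.exe.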
UPX
Another file size optimization is compressing the executable.
This step is applied to the executable produced by the previous steps.
Tools such as the open-source UPX or the proprietary ASPack can be used for this.
There are other similar tools, but here I used only UPX.
When an executable is compressed with UPX, no new file is created: the original file is replaced.
The decompression mechanism is embedded in the same executable, which decompresses itself at runtime.
UPX can also be used to compress other files such as .dlls.
The basic usage is upx executable.exe; I also passed the --best flag to request the best compression level.
So the command line is:
> upx --best code.exe
UPX produced a file of only 11,776 bytes, about 51% of the stripped size (a reduction of roughly 49%).
Summary
The table below summarizes the file size after each step.
Optimization | Size (bytes) |
---|---|
None | 173,495 |
-Os & -flto | 158,068 |
strip | 23,040 |
UPX | 11,776 |
It is worth remembering that these are incremental optimizations focused on file size.
Thus, the UPX step used the output of strip, which in turn used the result of GCC’s compilation with -Os and -flto.
UPX adds some extra overhead because of the decompression mechanism, but I didn’t consider it here.
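To put the steps in perspective, the reduction of each one relative to the previous size can be computed with a few lines of C. This is only a sketch of the arithmetic; reduction_percent and print_step_reductions are hypothetical names, and the sizes are the ones reported in the table.

```c
#include <stdio.h>

/* Percentage by which a file shrank when going from `before` to `after` bytes. */
double reduction_percent(long before, long after) {
    return 100.0 * (double)(before - after) / (double)before;
}

/* Prints how much smaller each step's output is compared to the previous one.
 * labels[i] names the step that produced sizes[i]; index 0 is the baseline. */
void print_step_reductions(const char *const labels[], const long sizes[],
                           int count) {
    for (int i = 1; i < count; i++) {
        printf("%-12s %7ld -> %7ld bytes (%.1f%% smaller)\n",
               labels[i], sizes[i - 1], sizes[i],
               reduction_percent(sizes[i - 1], sizes[i]));
    }
}
```

With the four sizes from the table (173495, 158068, 23040 and 11776), the steps come out at roughly 8.9% for -Os and -flto, 85.4% for strip and 48.9% for UPX, each relative to the previous file. Overall, the final executable is about 93.2% smaller than the first one.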
H.