When coding in C, C++, or any other compiled language, I always aim to produce the smallest executable possible. The reason is probably an old-school mindset formed in the floppy disk era, when 1.44 MB was enough. Nowadays, it is easy to end up with a large executable even though the software doesn’t do much.

The goal of this “tutorial” is to show the steps that I came up with to produce the smallest executable. Everything shown here was done on Windows 11 with MSYS2, VS Code and the C language, but the same steps can be applied to many other setups.

The test code uses the SDL2 library to show a window that stays open for only a few seconds. Not that it matters for this tutorial, but I’ll use it as a basis for other projects. Here is the boilerplate code that I used:

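#include <SDL2/SDL.h>
#include <stdbool.h>
#include <stdint.h>   /* uint8_t */
#include <stdio.h>    /* fprintf */
#include <stdlib.h>   /* EXIT_SUCCESS, EXIT_FAILURE */
#include <string.h>   /* memset */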

#define SCREEN_WIDTH  64
#define SCREEN_HEIGHT 32
#define SCREEN_SCALE  10

uint8_t display[SCREEN_WIDTH * SCREEN_HEIGHT];

SDL_Window* window = NULL;
SDL_Renderer* renderer = NULL;
SDL_Texture* texture = NULL;

bool initializeSDL() {
    if (SDL_Init(SDL_INIT_VIDEO) != 0) {
        fprintf(stderr, "SDL_Init error: %s\n", SDL_GetError());
        return false;
    }

    window = SDL_CreateWindow("SDL Test",
                              SDL_WINDOWPOS_UNDEFINED,
                              SDL_WINDOWPOS_UNDEFINED,
                              SCREEN_WIDTH * SCREEN_SCALE,
                              SCREEN_HEIGHT * SCREEN_SCALE,
                              SDL_WINDOW_SHOWN);
    if (window == NULL) {
        fprintf(stderr, "SDL_CreateWindow error: %s\n", SDL_GetError());
        SDL_Quit();
        return false;
    }

    renderer = SDL_CreateRenderer(window, -1, SDL_RENDERER_ACCELERATED);
    if (renderer == NULL) {
        fprintf(stderr, "SDL_CreateRenderer error: %s\n", SDL_GetError());
        SDL_DestroyWindow(window);
        SDL_Quit();
        return false;
    }

    texture = SDL_CreateTexture(renderer,
                                SDL_PIXELFORMAT_ARGB8888,
                                SDL_TEXTUREACCESS_STREAMING,
                                SCREEN_WIDTH, SCREEN_HEIGHT);
    if (texture == NULL) {
        fprintf(stderr, "SDL_CreateTexture error: %s\n", SDL_GetError());
        SDL_DestroyRenderer(renderer);
        SDL_DestroyWindow(window);
        SDL_Quit();
        return false;
    }

    return true;
}

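void renderDisplay() {
    /* The texture is ARGB8888 (4 bytes per pixel), so expand the
       1-byte-per-pixel display buffer before uploading it. */
    static uint32_t pixels[SCREEN_WIDTH * SCREEN_HEIGHT];
    for (int i = 0; i < SCREEN_WIDTH * SCREEN_HEIGHT; i++) {
        pixels[i] = display[i] ? 0xFFFFFFFFu : 0xFF000000u;
    }
    SDL_UpdateTexture(texture, NULL, pixels, SCREEN_WIDTH * sizeof(uint32_t));

    SDL_RenderClear(renderer);
    SDL_RenderCopy(renderer, texture, NULL, NULL);
    SDL_RenderPresent(renderer);
}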

void cleanupSDL() {
    SDL_DestroyTexture(texture);
    SDL_DestroyRenderer(renderer);
    SDL_DestroyWindow(window);
    SDL_Quit();
}

int main(int argc, char* argv[]) {
    if (!initializeSDL()) {
        return EXIT_FAILURE;
    }

    memset(display, 0, sizeof(display));

    display[10 + 20 * SCREEN_WIDTH] = 1;
    renderDisplay();

    SDL_Delay(5000);
    cleanupSDL();

    return EXIT_SUCCESS;
}

Compiled with GCC (Rev3, Built by MSYS2 project) version 14.1.0, the code produced an executable of 173,495 bytes. For this tutorial, the following command was used to compile it:

> gcc -fdiagnostics-color=always -g code.c -o code.exe 
	  -IC:/msys64/ucrt64/bin/../include 
	  -IC:/msys64/ucrt64/bin/../include/SDL2 
	  -Dmain=SDL_main 
	  -LC:/msys64/ucrt64/bin/../lib 
	  -lmingw32 
	  -mwindows 
	  -lSDL2main 
	  -lSDL2

GCC

The first optimization can be done at compile time. GCC has many optimization flags; the two most interesting to me are -Os and -flto. According to GCC’s documentation:

  • -Os: optimizes the executable for size.
  • -flto: runs the standard link-time optimizer. Usually, this is used together with the previous one.

With these flags added, the executable shrinks to 158,068 bytes, and the compilation command becomes:

# the command is abbreviated for readability
> gcc ... -Os -flto code.c -o code.exe ...

Stripping symbols and debug information

The next optimization is not related to the compiler used. Executables generated by a compiler usually carry a symbol table and, if built with -g as in the command above, debug sections. Both exist for debugging purposes. If you don’t plan to debug the executable or inspect it with Ghidra or IDA, you can get rid of them with the strip.exe included in the binutils of MSYS2. Its usage is as simple as:

> strip code.exe

The symbols and debug sections are removed from the very same code.exe, so no new file is created. The code.exe size is drastically reduced to 23,040 bytes!
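To see what strip actually removed, one option is to list the section table of the executable before and after running it. The following is a minimal sketch, not a full PE parser: the names list_pe_sections, read_at, le16 and le32 are made up for illustration, and it assumes a well-formed PE file with small offsets. GCC-built executables store long section names (such as .debug_info) as "/<offset>" references into the COFF string table, so the sketch resolves those when a symbol table is present.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Little-endian field readers; PE headers are little-endian on disk. */
static uint32_t le16(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8);
}
static uint32_t le32(const unsigned char *p) {
    return le16(p) | (le16(p + 2) << 16);
}

/* Reads n bytes at absolute offset off; returns 1 on success, 0 otherwise. */
static int read_at(FILE *f, long off, void *buf, size_t n) {
    return fseek(f, off, SEEK_SET) == 0 && fread(buf, 1, n, f) == n;
}

/* Illustrative helper: prints "name  raw size" for every section of the PE
   image open in f (binary mode) and returns the section count, or -1 if
   the file does not look like a PE image. */
int list_pe_sections(FILE *f, FILE *out) {
    unsigned char dos[64], hdr[24], sec[40];

    /* DOS header: "MZ" magic; the PE header offset is stored at 0x3C. */
    if (!read_at(f, 0, dos, sizeof dos) || dos[0] != 'M' || dos[1] != 'Z')
        return -1;
    long pe_off = (long)le32(dos + 0x3C);

    /* "PE\0\0" signature (4 bytes) followed by the 20-byte COFF header. */
    if (!read_at(f, pe_off, hdr, sizeof hdr) || memcmp(hdr, "PE\0\0", 4) != 0)
        return -1;
    int  count  = (int)le16(hdr + 6);                   /* NumberOfSections */
    long symtab = (long)le32(hdr + 12);                 /* PointerToSymbolTable */
    long strtab = symtab + 18L * (long)le32(hdr + 16);  /* follows the symbols */
    long table  = pe_off + 24 + (long)le16(hdr + 20);   /* after optional header */

    for (int i = 0; i < count; i++) {
        char name[64] = {0};
        if (!read_at(f, table + 40L * i, sec, sizeof sec))
            return -1;
        memcpy(name, sec, 8);  /* Name field: 8 bytes, possibly unterminated */

        /* Long names are stored as "/<offset>" into the COFF string table. */
        if (name[0] == '/' && symtab != 0) {
            char longname[64] = {0};
            long off = strtol(name + 1, NULL, 10);
            if (fseek(f, strtab + off, SEEK_SET) == 0 &&
                fread(longname, 1, sizeof longname - 1, f) > 0)
                memcpy(name, longname, sizeof name);
        }
        fprintf(out, "%-24s %10lu bytes\n", name,
                (unsigned long)le32(sec + 16));          /* SizeOfRawData */
    }
    return count;
}
```

To use it, open the executable with fopen("code.exe", "rb") and call list_pe_sections(file, stdout) once before and once after strip. Comparing the two listings shows which sections disappeared.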

UPX

Another file size optimization is compressing the executable, applied to the file produced by the previous steps. Tools such as the open-source UPX or the proprietary ASPack can be used for this. There are other similar tools, but here I used only UPX. By default UPX compresses the executable in place, so no new file is created. The decompression mechanism is embedded into the same executable, which decompresses itself at run time. UPX can also compress other files such as .dlls. The basic usage is upx executable.exe; I also used the --best flag to force the best compression level. So the command line is:

> upx --best code.exe

UPX produced a file of only 11,776 bytes, about 51% of the stripped size (a reduction of roughly 49%).
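Since the packed file replaces the original, it can be handy to check whether a given executable has already been packed. The sketch below is only a rough heuristic: it assumes the packed file still contains UPX's usual "UPX!" marker bytes, which other tools can alter or remove, and an unpacked file could contain the same four bytes by coincidence. The function name contains_upx_magic is made up for illustration.

```c
#include <stdio.h>

/* Illustrative heuristic: returns 1 if the byte sequence "UPX!" occurs
   anywhere in the file open in f (binary mode), 0 otherwise. */
int contains_upx_magic(FILE *f) {
    static const char magic[4] = {'U', 'P', 'X', '!'};
    int matched = 0;  /* number of leading magic bytes matched so far */
    int c;

    rewind(f);
    while ((c = fgetc(f)) != EOF) {
        if (c == magic[matched]) {
            if (++matched == 4)
                return 1;
        } else {
            /* 'U' only appears at the start of the pattern, so a mismatch
               can at most begin a new match at the current byte. */
            matched = (c == magic[0]) ? 1 : 0;
        }
    }
    return 0;
}
```

Open the executable with fopen("code.exe", "rb") and pass the stream to contains_upx_magic. A result of 1 suggests, but does not prove, that the file was packed with UPX.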

Summary

The table below summarizes the file size after each step.

Optimization    Size (bytes)
None                 173,495
-Os & -flto          158,068
strip                 23,040
UPX                   11,776

It is worth remembering that these are incremental optimizations focused on file size: the UPX step used the output of strip, which in turn used the result of GCC’s compilation with -Os and -flto. UPX also adds some overhead of its own because of the embedded decompression mechanism, but I didn’t consider it here.
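The relative savings of each step can be worked out from the sizes reported above. The following is a small sketch of that arithmetic; the sizes come from this tutorial, while the names reduction_percent, print_size_report, STEP_LABELS and STEP_SIZES are made up for illustration.

```c
#include <stdio.h>

/* Sizes reported in this tutorial, in the order the steps were applied. */
static const char *const STEP_LABELS[] = {"None", "-Os & -flto", "strip", "UPX"};
static const long STEP_SIZES[] = {173495, 158068, 23040, 11776};
enum { STEP_COUNT = 4 };

/* Percentage by which a file shrank going from `before` to `after` bytes. */
double reduction_percent(long before, long after) {
    return 100.0 * (double)(before - after) / (double)before;
}

/* Prints, for each step, the reduction relative to the previous step and
   the cumulative reduction relative to the first entry. */
void print_size_report(const char *const labels[], const long sizes[], int count) {
    printf("%-12s %10s %8s %8s\n", "Step", "Bytes", "Step %", "Total %");
    for (int i = 0; i < count; i++) {
        double step  = (i == 0) ? 0.0 : reduction_percent(sizes[i - 1], sizes[i]);
        double total = reduction_percent(sizes[0], sizes[i]);
        printf("%-12s %10ld %7.1f%% %7.1f%%\n", labels[i], sizes[i], step, total);
    }
}
```

Calling print_size_report(STEP_LABELS, STEP_SIZES, STEP_COUNT) prints one row per step. For instance, the UPX step alone removes 11,264 of 23,040 bytes, which is roughly 48.9%.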

H.