Agner's CPU blog

A design without a TLB

Author: Agner

Date: 2016-03-28 05:13

Ideas for preventing stack overflow:

In most cases, it is possible to calculate exactly how much stack space an application needs. The compiler knows how much stack space it has allocated in each function. We only have to make the compiler save this information. This can be accomplished in the following way. If a function A calls a function B, then we want the compiler to save the difference between the value of the stack pointer when A is called and the value of the stack pointer when B is called. These values can then be summed up for the whole chain of nested function calls. If function A can call both function B and function C, then each branch of the call tree is analyzed and the value for the branch that uses the most stack space is used. If function A is compiled separately into its own object file, then the information must be stored in the object file.
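The summation over the call tree can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: each function is given a single frame size rather than a stack pointer difference per call site, and the function names, sizes, and the two tables are invented for the example.

```python
# Hypothetical per-function data a compiler could emit: for each function,
# its own frame size in bytes and the functions it may call.
FRAME_BYTES = {"main": 64, "A": 48, "B": 32, "C": 128, "D": 16}
CALLS = {"main": ["A"], "A": ["B", "C"], "B": ["D"], "C": [], "D": []}


def max_stack_bytes(func, frame_bytes, calls, _active=None):
    """Return the worst-case stack use of `func` and everything it can call."""
    active = _active if _active is not None else set()
    if func in active:
        # A cycle in the call graph means recursion, which has no static bound.
        raise ValueError(f"recursive call chain through {func!r}")
    active.add(func)
    # Analyze every branch and keep the one that uses the most stack space.
    deepest_callee = max(
        (max_stack_bytes(c, frame_bytes, calls, active) for c in calls[func]),
        default=0,
    )
    active.remove(func)
    return frame_bytes[func] + deepest_callee


print(max_stack_bytes("main", FRAME_BYTES, CALLS))  # 240, via main -> A -> C
```

In this sketch a recursive call chain is rejected rather than bounded, because there is nothing in the tables that says how deep the recursion may go.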

The amount of stack space that a function uses will depend on the maximum vector length if full vectors are saved on the stack. All values for required stack space are linear functions of the vector length: Stack_frame_size = Constant + Factor * Max_vector_length. Thus, there are two values to save for each function and branch: Constant and Factor. We need separate calculations for each thread and possibly also information about the number of threads.
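A small Python sketch may make the two-value bookkeeping more concrete. The class name StackNeed, the helper worst_case, and all numbers are assumptions made for the illustration. The sketch keeps one (Constant, Factor) pair per branch and lets the loader compare branches once the vector length is known, because the branch that needs the most stack can change with the vector length.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StackNeed:
    """Stack requirement in bytes: constant + factor * max_vector_length."""
    constant: int
    factor: int

    def __add__(self, other):
        # Nested calls add up term by term, so the sum is still linear.
        return StackNeed(self.constant + other.constant, self.factor + other.factor)

    def at(self, max_vector_length):
        return self.constant + self.factor * max_vector_length


def worst_case(branches, max_vector_length):
    """Loader-side step: the largest requirement among branches on this CPU."""
    return max(b.at(max_vector_length) for b in branches)


# Hypothetical numbers: A saves two full vectors, B none, C four.
a = StackNeed(constant=48, factor=2)
via_b = a + StackNeed(constant=512, factor=0)   # chain A -> B
via_c = a + StackNeed(constant=32, factor=4)    # chain A -> C

for vlen in (16, 64, 256):
    print(vlen, worst_case([via_b, via_c], vlen))
```

With these made-up numbers the chain through B dominates for short vectors and the chain through C dominates at a vector length of 256 bytes, which is why the comparison is deferred to load time in this sketch.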

The linker will add up all this information and store it in the header of the executable file. The maximum vector length is known when the program is loaded, so the loader can finish the calculations and allocate a stack of the calculated size before the program starts running. This will prevent stack overflow and fragmentation of the stack memory. We may also store information about how many threads the program will create. Some programs will use as many threads as there are CPU cores, for optimal performance. It is not essential, though, to know how many threads will be created because each stack can be placed anywhere in memory, but it will make the memory map simpler if all thread stacks can be kept together.

In theory, it is possible to avoid the need for virtual address translation if the following four conditions are met:

The required stack size can be predicted and sufficient stack space is allocated when a program is loaded and when additional threads are created.
Static variables are addressed relative to the data section pointer. Multiple running instances of the same program have different values in the data section pointer.
The heap manager can handle fragmented physical memory in case of heap overflow.
There is sufficient memory so that no application needs to be swapped to a hard disk.

Before we rely on this mechanism, we should discuss what can possibly go wrong. Things that can cause problems are:

Recursive functions can use unlimited stack space. We may require that the programmer specifies a maximum recursion level in a pragma.
Allocation of variable-size arrays on the stack using the alloca function in C. We may require that the programmer specifies a maximum size.
Run-time dynamic linking. Dynamic link libraries (DLLs) are usually linked at load time and the loader will be able to include these in the calculation of stack requirements. But a program can need to load and call a DLL at run-time if the choice of DLL depends on user input or if the DLL is called from a script. We may need to guess the required stack size, perhaps based on statistics.
Lazy loading. A large program may have certain code units that are rarely used and loaded only when needed. Lazy loading can be useful to save memory, but it may require virtual memory translation and it may cause memory fragmentation. A straightforward solution is to implement such code units as separate executable programs, but this can complicate the exchange of data between the parent program and its subunits.
Script interpreters. Some programming languages are implemented as scripts which are interpreted at run-time rather than compiled. We cannot calculate the required stack size in advance for interpreted scripts. Obviously, it will be more efficient to compile the script if a compiler is available. Self-modifying scripts cannot be compiled.
User-defined macros. Macros are similar to small scripts. Depending on the implementation, macros may use heap space or stack space or both, but usually the memory requirement is limited.
Many programs running. The memory can become fragmented when many programs of different sizes are loaded and unloaded randomly.
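The last problem in the list, fragmentation from loading and unloading programs of different sizes, can be shown with a toy model of physical memory without address translation. The first-fit policy, the function names, and the sizes below are assumptions chosen for the illustration, not a proposal for how the system should allocate memory.

```python
def first_fit(free_blocks, size):
    """Allocate `size` units from the first free block that is large enough.

    `free_blocks` is a list of (start, length) tuples sorted by start.
    Returns the start address, or None if no single block is large enough.
    """
    for i, (start, length) in enumerate(free_blocks):
        if length >= size:
            if length == size:
                del free_blocks[i]
            else:
                free_blocks[i] = (start + size, length - size)
            return start
    return None


def release(free_blocks, start, size):
    """Return a block to the free list and merge adjacent free blocks."""
    free_blocks.append((start, size))
    free_blocks.sort()
    merged = [free_blocks[0]]
    for s, n in free_blocks[1:]:
        last_s, last_n = merged[-1]
        if last_s + last_n == s:
            merged[-1] = (last_s, last_n + n)
        else:
            merged.append((s, n))
    free_blocks[:] = merged


free = [(0, 100)]            # 100 units of physical memory
p1 = first_fit(free, 30)     # program 1 is placed at 0
p2 = first_fit(free, 30)     # program 2 is placed at 30
p3 = first_fit(free, 30)     # program 3 is placed at 60
release(free, p1, 30)        # program 1 exits
release(free, p3, 30)        # program 3 exits

total_free = sum(n for _, n in free)
print(free, total_free)      # [(0, 30), (60, 40)] 70
print(first_fit(free, 50))   # None: 70 units are free, but not contiguously
```

A program that needs 50 contiguous units cannot be loaded here even though 70 units are free, because program 2 still sits in the middle.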

A possible alternative to calculating the stack space is to measure the actual stack use the first time a program is run, and then rely on statistics to predict the stack use in subsequent runs. The same method can be used for heap space. This method is simpler, but less reliable. The calculation of stack requirements based on the compiler is sure to cover all branches of a program, while a statistical method will only include branches that have actually been used.

We may implement a hardware register that measures the stack use. This stack_measurement register is updated every time the stack grows. We can reset this stack_measurement register when a program starts and read it when the program finishes. We don't need a hardware register to measure heap size. This information can be retrieved from the heap manager.
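The behavior of such a register can be modeled in software as a high-water mark. The class below is a hypothetical model only: it assumes a stack that grows downward, and the names StackMeter, push, and pop are invented for the sketch.

```python
class StackMeter:
    """Software model of a hypothetical stack_measurement register."""

    def __init__(self, stack_base):
        self.stack_base = stack_base   # stack pointer value when the stack is empty
        self.sp = stack_base           # current stack pointer
        self.stack_measurement = 0     # deepest stack use seen since the last reset

    def reset(self):
        """Clear the measurement, as would be done when a program starts."""
        self.stack_measurement = 0

    def push(self, nbytes):
        # Assumed downward-growing stack; the register is updated on every growth.
        self.sp -= nbytes
        in_use = self.stack_base - self.sp
        self.stack_measurement = max(self.stack_measurement, in_use)

    def pop(self, nbytes):
        # Shrinking the stack never lowers the recorded maximum.
        self.sp += nbytes


meter = StackMeter(stack_base=0x10000)
meter.reset()
meter.push(64)     # outer function
meter.push(128)    # nested call, 192 bytes in use
meter.pop(128)
meter.push(32)     # another nested call, only 96 bytes in use
meter.pop(32)
meter.pop(64)
print(meter.stack_measurement)  # 192, read when the program finishes
```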

These proposals can eliminate or reduce memory fragmentation in many cases so that we only need a relatively small memory map, which can be stored on the CPU chip (each process will have its own memory map). However, we cannot completely eliminate memory fragmentation and the need for virtual memory translation because of the complications discussed above.
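To give an idea of what a small per-process memory map could look like, here is a sketch in Python. The entry layout (virtual start, length, physical start), the addresses, and the function name translate are all assumptions for the illustration. The point is only that a handful of variable-size entries can replace a large page table when memory is not heavily fragmented.

```python
from bisect import bisect_right

# Hypothetical per-process map: (virtual_start, length, physical_start),
# sorted by virtual_start. A handful of entries instead of a page table.
MEMORY_MAP = [
    (0x0000, 0x4000, 0x20000),   # code
    (0x4000, 0x2000, 0x31000),   # static data
    (0x8000, 0x1000, 0x48000),   # stack
]
STARTS = [entry[0] for entry in MEMORY_MAP]


def translate(address):
    """Map a virtual address to a physical one, or raise on an unmapped access."""
    # Find the last entry whose virtual_start is not above the address.
    i = bisect_right(STARTS, address) - 1
    if i >= 0:
        start, length, physical = MEMORY_MAP[i]
        if address < start + length:
            return physical + (address - start)
    raise LookupError(f"address {address:#x} is not mapped")


print(hex(translate(0x4010)))   # 0x31010, inside the static data entry
```

An access to 0x7000 falls in the gap between the data and stack entries and raises LookupError in this model, which corresponds to an access violation.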
