Agner's CPU blog

Rethinking DLLs and shared objects
Author: Agner Date: 2016-05-20 05:07
Windows systems use dynamic link libraries (DLLs) and Unix-like systems (Linux, BSD, Mac OS) use shared objects (SOs). Both types have a number of disadvantages. I have an idea for replacing DLLs and SOs with something more efficient in the new CRISC architecture.

Let me first explain how the current systems work. A normal program goes through the steps of compiling, linking, and loading. The linker joins multiple program modules and library functions together and adjusts all addresses in the linked program. Any absolute addresses in the code are adjusted according to the placement of each module, and all relative addresses from one module to another are calculated. This process is called relocation. The loader may do the relocation once again in case the program is placed at a memory address different from the address that was assumed in the link process. A DLL or SO is linked and loaded in basically the same way as an executable program.
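The relocation step described above can be sketched in a few lines of C. This is only an illustration under simplifying assumptions: the `reloc` record, the two relocation kinds, and the function names are hypothetical and do not correspond to any real object file format.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical relocation kinds, for illustration only. */
enum reloc_kind { RELOC_ABS32, RELOC_REL32 };

struct reloc {
    enum reloc_kind kind;
    uint32_t offset;  /* position of the 32-bit field inside the image */
    uint32_t target;  /* position of the referenced symbol inside the image */
};

/* Patch one 32-bit field. `base` is the address where the image is placed. */
static void apply_reloc(uint8_t *image, uint32_t base, const struct reloc *r)
{
    uint32_t value;
    assert(image != NULL && r != NULL);
    if (r->kind == RELOC_ABS32) {
        /* Absolute address: changes whenever the image is moved. */
        value = base + r->target;
    } else {
        /* Relative address: distance from the end of the field to the
           target. It does not depend on `base`. */
        value = r->target - (r->offset + 4);
    }
    memcpy(image + r->offset, &value, sizeof value);
}

/* Read back a patched 32-bit field. */
static uint32_t read_field(const uint8_t *image, uint32_t offset)
{
    uint32_t value;
    memcpy(&value, image + offset, sizeof value);
    return value;
}
```

If the loader places the image at a different base address, only the absolute fields need to be patched again. The relative fields keep their values, which is why relative addressing makes loading cheaper.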

The main difference between a DLL and an SO is that shared objects allow "symbol interposition". It is possible to override a symbol in an SO by defining another symbol with the same name in the calling program. This feature is intended to mimic the behavior of a static link library, but symbol interposition is hardly ever used and it comes at a high cost. Every access to a function or a global variable needs to go via a procedure linkage table (PLT) or a global offset table (GOT). This applies even to internal references inside the SO if the symbol is globally visible. It would be easy to bypass this time-consuming mechanism if the linker and loader allowed it, but for unknown reasons, they don't.
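To make the cost of interposition concrete, here is a rough C model of the indirection. It is not how a real dynamic linker is implemented: the pointer `slot_scale` merely stands in for a PLT/GOT entry, and all function names are made up for the example.

```c
#include <assert.h>

typedef int (*func_ptr)(int);

/* The library's own definition of a globally visible function. */
static int lib_scale(int x) { return 2 * x; }

/* Stand-in for a PLT/GOT slot. The dynamic linker fills it in at load time. */
static func_ptr slot_scale = lib_scale;

/* Another function in the same library. Because scale is globally visible,
   even this internal call has to go through the slot. */
static int lib_scale_twice(int x) { return slot_scale(slot_scale(x)); }

/* A definition with the same name in the calling program. */
static int app_scale(int x) { return 3 * x; }

/* Models what the dynamic linker does when the program interposes the symbol. */
static void interpose_scale(func_ptr replacement) { slot_scale = replacement; }
```

After `interpose_scale(app_scale)`, the library's internal calls reach the program's version. Every call pays for the extra pointer load, whether or not anything is ever interposed.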

The advantage of using a DLL or SO is that the loaded code can be shared between multiple running programs. This rarely saves any memory, however, because the library may contain hundreds of functions while you are using only a few of them. It is not uncommon to load a DLL or SO of one megabyte and use only one kilobyte of it.

Another problem that makes DLLs and SOs less efficient than static libraries is that each one is placed in its own memory block, so they are scattered around in memory. Each DLL/SO will use at least two memory pages, one for code and one for data. The scattered memory access makes caching less efficient.

Now, my proposal for the CRISC architecture is to get rid of DLLs and SOs completely. Instead, we will have only one type of function library that can be used in three different ways:

  1. Static linking. This will work the same way as today.
  2. Load-time linking. The library is linked with the executable program by the loader when the program is loaded into memory.
  3. Run-time linking. The library is loaded by commands in a running program.

In all three cases, we are loading only those functions from the library that are actually needed.
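Loading only the needed functions amounts to walking the dependency graph of the library from the functions that the program references. The following C sketch shows one possible way to do this. The `member` structure and the fixed-size dependency list are assumptions made for the example, not part of the proposal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEPS 4

/* Hypothetical library member: one function plus the members it calls. */
struct member {
    const char *name;
    int deps[MAX_DEPS];  /* indices of other members, terminated by -1 */
};

/* Mark member `index` and everything it depends on as needed (depth first). */
static void mark_needed(const struct member *lib, int index, bool *needed)
{
    if (needed[index])
        return;  /* already visited, which also handles cyclic dependencies */
    needed[index] = true;
    for (int i = 0; i < MAX_DEPS && lib[index].deps[i] >= 0; i++)
        mark_needed(lib, lib[index].deps[i], needed);
}

/* Count how many members would actually be linked in. */
static size_t count_needed(const bool *needed, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (needed[i])
            count++;
    return count;
}
```

A program that references only one function pulls in that function and its helpers, and nothing else from the library.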

Load-time linking will be easier with the CRISC system than with existing systems because the CODE and DATA sections are independent of each other in the CRISC system. The CODE (and CONST) sections are addressed relative to the instruction pointer, while the DATA section is addressed relative to a special pointer called the data section pointer (DATAP). CODE and DATA can be placed anywhere in memory independently of each other. If extra library functions need to be linked in at load time, then the CODE and DATA sections of the library functions are simply appended to the CODE and DATA sections of the main program, and any new cross references are resolved. The result will be very efficient because the code and data of the library functions are contiguous with the code and data of the main program, so that caching is improved. There are no intermediate import tables or PLTs to slow down the execution.
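The appending of sections can be illustrated with a little arithmetic in C. This is a sketch under assumptions: the structures, the function names, and the alignment parameter are invented for the example, and the real layout rules may well differ.

```c
#include <assert.h>
#include <stdint.h>

/* Sizes of the two independently placed sections of a module (hypothetical). */
struct module { uint32_t code_size; uint32_t data_size; };

/* Where an appended library ends up inside the combined sections. */
struct placement { uint32_t code_offset; uint32_t data_offset; };

/* Round x up to a multiple of a, where a is a power of 2. */
static uint32_t align_up(uint32_t x, uint32_t a)
{
    return (x + a - 1) & ~(a - 1);
}

/* Append the library's CODE after the program's CODE and its DATA after
   the program's DATA. The program's section sizes grow accordingly. */
static struct placement append_module(struct module *program,
                                      const struct module *lib, uint32_t align)
{
    struct placement p;
    assert(align != 0 && (align & (align - 1)) == 0);
    p.code_offset = align_up(program->code_size, align);
    p.data_offset = align_up(program->data_size, align);
    program->code_size = p.code_offset + lib->code_size;
    program->data_size = p.data_offset + lib->data_size;
    return p;
}

/* Displacement for a call: target position relative to the instruction
   pointer, taken here as the end of the call instruction. */
static int32_t ip_relative(uint32_t call_site_end, uint32_t target_code_offset)
{
    return (int32_t)(target_code_offset - call_site_end);
}

/* Displacement for a data access: offset from DATAP. It does not depend
   on where the CODE section is placed. */
static uint32_t datap_relative(const struct placement *p, uint32_t var_offset)
{
    return p->data_offset + var_offset;
}
```

Both displacements are fixed once the sections have been appended. Nothing needs to be patched if CODE or DATA is later moved as a whole, because the first is relative to the instruction pointer and the second is relative to DATAP.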

Run-time linking is used less often. It is needed only when the choice of library depends on user input to the running program, or when a library is loaded explicitly from an interpreted script language. The loader can use several different methods when run-time linking is requested:

  1. The main program may have reserved extra memory space for the library functions. This information is stored in the header of the executable program file. The library function is accessed through a function pointer which is returned by the load_library function. Any DATA section in the library can be addressed through DATAP, using the normal relocation procedure.
  2. If there is no reserved space, or the reserved space is too small, then the loader must place the library function somewhere else in memory. If there is a vacant memory space within a distance of +/- 2 GB from the address in DATAP then the same method as above is used.
  3. If there is no vacant space within 2 GB of DATAP then the loader can insert a stub that changes DATAP to point to the DATA section of the library function. The function is called through this stub, which changes DATAP when called, and restores the old value of DATAP on return. If the function can throw an exception then the exception handler needs to restore DATAP as well.
  4. The library function can be compiled with a compiler option that tells it not to use DATAP. The function will load the absolute address of its DATA section into a general purpose register and access its data with this register as pointer.
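The four methods above could be selected roughly as in the following C sketch. The order of the checks, the `request` structure, and all names are assumptions for illustration. The global pointer `datap` only models the DATAP register, and `call_through_stub` models the stub of method 3 without exception handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum link_method {
    USE_RESERVED_SPACE,    /* method 1 */
    USE_NEARBY_SPACE,      /* method 2 */
    USE_DATAP_STUB,        /* method 3 */
    USE_ABSOLUTE_POINTER   /* method 4 */
};

/* Hypothetical summary of what the loader knows about a request. */
struct request {
    uint64_t lib_size;                /* bytes needed by the library function */
    uint64_t reserved_free;           /* bytes left in the reserved space */
    int64_t  nearest_vacant_distance; /* from DATAP to a large enough free block */
    bool     compiled_without_datap;  /* library built not to use DATAP */
};

#define DATAP_RANGE ((int64_t)2 * 1024 * 1024 * 1024)  /* +/- 2 GB */

static enum link_method choose_method(const struct request *r)
{
    if (r->compiled_without_datap)
        return USE_ABSOLUTE_POINTER;
    if (r->lib_size <= r->reserved_free)
        return USE_RESERVED_SPACE;
    if (r->nearest_vacant_distance >= -DATAP_RANGE &&
        r->nearest_vacant_distance < DATAP_RANGE)
        return USE_NEARBY_SPACE;
    return USE_DATAP_STUB;
}

/* Model of the DATAP register. */
static int *datap;

typedef int (*lib_func)(int);

/* Method 3: switch DATAP to the library's DATA section for the duration
   of the call and restore the old value on return. */
static int call_through_stub(lib_func f, int *lib_data, int arg)
{
    int *saved = datap;
    datap = lib_data;
    int result = f(arg);
    datap = saved;
    return result;
}

/* Example library function that reads its data through DATAP. */
static int lib_get(int index) { return datap[index]; }
```

The stub costs a few extra instructions per call, so it makes sense to try the first two methods before falling back on it.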

If lazy loading of a program module is desired then use the same method as for run-time linking, or put the lazy module into a separate executable file.

Newer versions of Linux have a feature called GNU indirect function (ifunc), which makes it possible to choose between different versions of a function at load time depending on, for example, the microprocessor version. This feature will not be copied in the CRISC system because it relies on a PLT. Instead, we can make a dispatcher system to be used with load-time linking. The library can contain a dispatch function which tells which version of a library function to load. The loader will first load the dispatch function (possibly using run-time linking into itself) and call it. The dispatch function returns the name of the chosen version of the desired function. The loader then unloads the dispatch function and links the chosen function into the main program. The dispatch function must have access to information about the hardware configuration, command line parameters, environment variables, and anything else that it might need to choose which version of the function to use.
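A dispatch function of this kind might look like the C sketch below. Everything in it is hypothetical: the `hw_info` structure, the function `memcopy` with its three versions, and the symbol table layout are invented to show the idea.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical hardware description passed to the dispatch function. */
struct hw_info { unsigned max_vector_bytes; };

/* Dispatch function for an imaginary library function "memcopy".
   It returns the name of the version that the loader should link in. */
static const char *dispatch_memcopy(const struct hw_info *hw)
{
    if (hw->max_vector_bytes >= 64) return "memcopy_v512";
    if (hw->max_vector_bytes >= 32) return "memcopy_v256";
    return "memcopy_generic";
}

/* Hypothetical entry in the library's symbol table. */
struct symbol { const char *name; unsigned code_offset; };

/* Loader side: find the chosen version by name so that it can be linked
   into the main program. Returns NULL if the name is not in the library. */
static const struct symbol *find_symbol(const struct symbol *table,
                                        size_t count, const char *name)
{
    for (size_t i = 0; i < count; i++)
        if (strcmp(table[i].name, name) == 0)
            return &table[i];
    return NULL;
}
```

The choice is made once at load time. After that, calls go directly to the chosen version with no table in between.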

System functions and device drivers are called by using an ID number rather than a function pointer. This ID number can be resolved at link time, load time or run time just like library functions.

The advantages of my proposal are:

  • There is only one type of function library. The same library can be used with any of the three methods: static linking, load-time linking, and run-time linking.
  • Only the part of the library that is actually needed is loaded.
  • The code and data of the library are contiguous with the code and data of the calling program in most cases. This makes memory management simpler, avoids memory fragmentation, and improves caching.
  • There are no intermediate import tables, procedure linkage tables or global offset tables to reduce the performance.

Any comments?
 