Knowledge Base/Tools/GCC

From Thalesians

Nomenclature

  • GNU is a computer operating system composed entirely of free software. Its name is a recursive acronym for GNU's Not Unix. The idea is that GNU is Unix-like but, unlike Unix, it is free and contains no Unix code. The development of GNU was initiated by Richard Stallman and was the original focus of the Free Software Foundation (FSF). Because GNU's official kernel, GNU Hurd, is still incomplete, not all GNU components run on it. Instead, they run on the third-party Linux kernel, and some have been ported to other operating systems, such as Microsoft Windows, BSD variants, Solaris, and Mac OS. GNU has therefore come to be known as a family of free software products, often ported to different operating systems, rather than an operating system per se.
  • GCC stands for the GNU Compiler Collection, which includes front ends (see below) for C, C++, Objective-C, Fortran, Java, and Ada, as well as libraries for these languages (libstdc++, libgcj, etc.). However, GCC originally stood for GNU C Compiler, and it (along with the GNU C++ compiler) remains the most frequently used GNU compiler. Many people still mean GNU C Compiler when they say GCC.
  • Front end is the part of a compiler that is specific to a particular language, such as C or C++.
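  • Back end is the part of the compiler that generates machine code for a particular processor. It does not depend on the source language, so it is shared across several languages.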
  • The GNU Compiler Collection's language-independent component (which is also, confusingly, sometimes referred to as GCC!) is shared among the compilers for all supported languages. The language-independent component of GCC includes the majority of the optimizers, as well as the back ends that generate machine code for various processors.
  • gcc, which is often /usr/bin/gcc, is the command that executes the GNU C Compiler.
  • g++, which is often /usr/bin/g++, is the command that executes the GNU C++ Compiler. We shall be using the g++ command a lot in the sequel.
    • In practice, g++ is simply a thin driver that passes a certain set of command line arguments to gcc (for example, it treats the source files as C++ and links in the C++ standard library), so g++ uses gcc internally. It used to be a bash script in older versions of GCC. Now it's a binary executable, but it still does the same thing (as explained by andres9606t).
  • cc is the Sun C Compiler, not part of the GNU Compiler Collection. It is part of Sun Studio. (On many Linux systems, by contrast, cc is simply a link to gcc.)
  • CC is the Sun C++ Compiler, not part of the GNU Compiler Collection. It is part of Sun Studio.

Compiling "Hello World!"

Create the following text file, hello.cpp:

#include <iostream>

using namespace std;

int main(int argc, char * argv[])
{
    cout << "Hello World!" << endl;
}

Then compile it to hello with the following command:

g++ hello.cpp -o hello

A binary executable, hello, should appear in the current directory.

Compiling; linking libraries; file extensions; static linking versus dynamic linking

In order to understand static libraries, dynamic libraries and linking properly, we write a simple C++ program that uses a third-party library, log4cxx. The source file, proto-log4cxx.cpp, looks as follows:

#include <log4cxx/logger.h>
#include <log4cxx/basicconfigurator.h>

using namespace log4cxx;

LoggerPtr logger(Logger::getRootLogger());

int main(int argc, char * argv[])
{
    BasicConfigurator::configure();

    LOG4CXX_INFO(logger, "Hello World!");
}

Compiling

Let's try to compile this to an executable, proto-log4cxx (without an extension; the output name is specified using -o below):

$ g++ proto-log4cxx.cpp -o proto-log4cxx

We get error messages like the ones below:

proto-log4cxx.cpp:1:28: log4cxx/logger.h: No such file or directory
proto-log4cxx.cpp:2:39: log4cxx/basicconfigurator.h: No such file or directory
proto-log4cxx.cpp:4: namespace `log4cxx' undeclared
proto-log4cxx.cpp:6: `Logger' was not declared in this scope
proto-log4cxx.cpp:6: syntax error before `::' token
proto-log4cxx.cpp: In function `int main(int, char**)':
proto-log4cxx.cpp:10: `BasicConfigurator' undeclared (first use this function)
proto-log4cxx.cpp:10: (Each undeclared identifier is reported only once for
   each function it appears in.)
proto-log4cxx.cpp:10: syntax error before `::' token
proto-log4cxx.cpp:12: `logger' undeclared (first use this function)
proto-log4cxx.cpp:12: `LOG4CXX_INFO' undeclared (first use this function)

Why? In addition to the library files per se we need access to the associated header files, which are included with

#include <log4cxx/logger.h>
#include <log4cxx/basicconfigurator.h>

We shall have to tell the g++ compiler where they are located. They happen to be located in the directory ~/dev/apache-log4cxx-0.10.0/src/main/include, so we try the following:

$ g++ proto-log4cxx.cpp -o proto-log4cxx -I ~/dev/apache-log4cxx-0.10.0/src/main/include

This has resolved the errors listed above but introduced some new errors:

/tmp/cc3QoMuy.o(.text+0x12): In function `main':
: undefined reference to `log4cxx::BasicConfigurator::configure()'
/tmp/cc3QoMuy.o(.text+0x28): In function `main':
: undefined reference to `log4cxx::Logger::isInfoEnabled() const'
/tmp/cc3QoMuy.o(.text+0x3f): In function `main':
: undefined reference to `log4cxx::helpers::MessageBuffer::MessageBuffer[in-charge]()'
/tmp/cc3QoMuy.o(.text+0x57): In function `main':
: undefined reference to `log4cxx::spi::LocationInfo::LocationInfo[in-charge](char const*, char const*, int)'
/tmp/cc3QoMuy.o(.text+0x6f): In function `main':
: undefined reference to `log4cxx::helpers::MessageBuffer::operator<<(char const*)'
/tmp/cc3QoMuy.o(.text+0x7c): In function `main':
: undefined reference to `log4cxx::helpers::MessageBuffer::str(log4cxx::helpers::CharMessageBuffer&)'
/tmp/cc3QoMuy.o(.text+0x8c): In function `main':
: undefined reference to `log4cxx::Level::getInfo()'
/tmp/cc3QoMuy.o(.text+0xa6): In function `main':
: undefined reference to `log4cxx::Logger::forcedLog(log4cxx::helpers::ObjectPtrT<log4cxx::Level> const&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, log4cxx::spi::LocationInfo const&) const'
/tmp/cc3QoMuy.o(.text+0xe8): In function `main':
: undefined reference to `log4cxx::helpers::MessageBuffer::~MessageBuffer [in-charge]()'
/tmp/cc3QoMuy.o(.text+0x105): In function `main':
: undefined reference to `log4cxx::helpers::MessageBuffer::~MessageBuffer [in-charge]()'
/tmp/cc3QoMuy.o(.text+0x135): In function `__static_initialization_and_destruction_0(int, int)':
: undefined reference to `log4cxx::Logger::getRootLogger()'
/tmp/cc3QoMuy.o(.gnu.linkonce.t._ZN7log4cxx7helpers10ObjectPtrTINS_5LevelEED1Ev+0x55): In function `log4cxx::helpers::ObjectPtrT<log4cxx::Level>::~ObjectPtrT [in-charge]()':
: undefined reference to `log4cxx::helpers::ObjectPtrBase::~ObjectPtrBase [not-in-charge]()'
/tmp/cc3QoMuy.o(.gnu.linkonce.t._ZN7log4cxx7helpers10ObjectPtrTINS_5LevelEED0Ev+0x55): In function `log4cxx::helpers::ObjectPtrT<log4cxx::Level>::~ObjectPtrT [in-charge deleting]()':
: undefined reference to `log4cxx::helpers::ObjectPtrBase::~ObjectPtrBase [not-in-charge]()'
/tmp/cc3QoMuy.o(.gnu.linkonce.d._ZTIN7log4cxx7helpers10ObjectPtrTINS_5LevelEEE+0x8): undefined reference to `typeinfo for log4cxx::helpers::ObjectPtrBase'
/tmp/cc3QoMuy.o(.gnu.linkonce.t._ZN7log4cxx7helpers10ObjectPtrTINS_6LoggerEED1Ev+0x55): In function `log4cxx::helpers::ObjectPtrT<log4cxx::Logger>::~ObjectPtrT [in-charge]()':
: undefined reference to `log4cxx::helpers::ObjectPtrBase::~ObjectPtrBase [not-in-charge]()'
/tmp/cc3QoMuy.o(.gnu.linkonce.t._ZN7log4cxx7helpers10ObjectPtrTINS_6LoggerEED0Ev+0x55): In function `log4cxx::helpers::ObjectPtrT<log4cxx::Logger>::~ObjectPtrT [in-charge deleting]()':
: undefined reference to `log4cxx::helpers::ObjectPtrBase::~ObjectPtrBase [not-in-charge]()'
/tmp/cc3QoMuy.o(.gnu.linkonce.d._ZTIN7log4cxx7helpers10ObjectPtrTINS_6LoggerEEE+0x8): undefined reference to `typeinfo for log4cxx::helpers::ObjectPtrBase'
collect2: ld returned 1 exit status

This is because by default GCC will attempt to link the program as well as compile it. In order to link it, it needs access to the library files used by the program so they can be linked in.

We could tell it to skip the linking. In this case it will produce an object file, proto-log4cxx.o, which can then be linked separately. It won't produce the executable. The option -c (compile) tells it to do just that:

$ g++ proto-log4cxx.cpp -c -o proto-log4cxx.o -I ~/dev/apache-log4cxx-0.10.0/src/main/include

But we do want to link this program and produce an executable. So in addition to telling the compiler about the header files, we need to make it link in the libraries that the code is using.

Linking

Now we need to decide, shall we use static linking or dynamic linking? Let's compare the two!

The required code from the libraries that are statically linked into our project will be physically included in the executable at link time. This will create a larger executable (because of all the library code included in it), but this executable will be standalone.

The code from the libraries that are linked in dynamically will not be included into the executable, which means that the size of the executable will be smaller. However, whoever runs this executable will have to ensure that the so-called shared objects, *.so, are available at runtime (this is similar to DLLs on Windows systems).

In general, the link step takes much longer with static linking than with dynamic linking, because the library code has to be copied into the executable. This may be a factor in big projects which take a long time to build.

It is possible to use a mixture of static and dynamic linking.

Static linking

In order to use static linking, we need to tell the linker where the static libraries are located and which of them to link in:

$ g++ proto-log4cxx.cpp -o proto-log4cxx -I ~/dev/apache-log4cxx-0.10.0/src/main/include -L ~/dev/apache-log4cxx-0.10.0/src/main/cpp/.libs -L ~/dev/apr-util-1.3.9/.libs -L ~/dev/apr-util-1.3.9/xml/expat/lib/.libs -L ~/dev/apr-1.3.8/.libs -Wl,-Bstatic -llog4cxx -laprutil-1 -lexpat -lapr-1 -Wl,-Bdynamic -pthread

Let's go through this command line:

  • g++ — call g++.
  • proto-log4cxx.cpp — specify the source file. If our program were more complex, we could list several source files here separated by spaces.
  • -o proto-log4cxx — place the output in this file. If the compilation and linking succeed, an executable named proto-log4cxx will be created (no file extension).
  • -I ~/dev/apache-log4cxx-0.10.0/src/main/include — tell the preprocessor to look for header files here. These header files are ~/dev/apache-log4cxx-0.10.0/src/main/include/log4cxx/logger.h and ~/dev/apache-log4cxx-0.10.0/src/main/include/log4cxx/basicconfigurator.h. We can refer to them simply as log4cxx/logger.h and log4cxx/basicconfigurator.h because we specified this here.
  • -L ~/dev/apache-log4cxx-0.10.0/src/main/cpp/.libs — tell the linker to look for *.a libraries in ~/dev/apache-log4cxx-0.10.0/src/main/cpp/.libs. In fact liblog4cxx.a is located in this directory. We shall later tell the linker to link this library in using the -l option.
  • -L ~/dev/apr-util-1.3.9/.libs — tell the linker to look for *.a libraries in ~/dev/apr-util-1.3.9/.libs. In fact libaprutil-1.a is located in this directory. We shall later tell the linker to link this library in using the -l option.
  • -L ~/dev/apr-util-1.3.9/xml/expat/lib/.libs — tell the linker to look for *.a libraries in ~/dev/apr-util-1.3.9/xml/expat/lib/.libs. In fact libexpat.a is located in this directory. We shall later tell the linker to link this library in using the -l option.
  • -L ~/dev/apr-1.3.8/.libs — tell the linker to look for *.a libraries in ~/dev/apr-1.3.8/.libs. In fact libapr-1.a is located in this directory. We shall later tell the linker to link this library in using the -l option.
  • -Wl,-Bstatic: -Wl, passes an option to the linker. -Bstatic tells the linker to link in the libraries that follow this option statically rather than dynamically.
  • -llog4cxx — tell the linker to link in the liblog4cxx.a library. Note that lib is assumed, so we specify -llog4cxx, not -lliblog4cxx (which wouldn't work!). The file extension is also assumed.
  • -laprutil-1 — tell the linker to link in the libaprutil-1.a library.
  • -lexpat — tell the linker to link in the libexpat.a library.
  • -lapr-1 — tell the linker to link in the libapr-1.a library.
  • -Wl,-Bdynamic: -Wl, passes an option to the linker. -Bdynamic tells the linker to link in the libraries that follow this option dynamically rather than statically. (In fact we won't list any, but it seems right to revert to this default.)
  • -pthread — this option is similar to -lpthread but stronger. It does link in a library, the POSIX threads library for multithreading. However, it sets some flags for both the preprocessor and linker in addition to telling the linker to link this library in. The POSIX threads library is required by some of the code in the other libraries that we are linking in.

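You may ask, does the order in which you specify the -L options matter? Not in our case, because each of these simply tells the linker "also consider this directory when looking for lib*.a files (if using static linking; or *.so files if using dynamic linking)". The directories are searched in the order given, so the order would only matter if a library with the same name existed in more than one of them.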

What about the order of the -l options? Here the answer is yes. Try rearranging

-llog4cxx -laprutil-1 -lexpat -lapr-1

to

-laprutil-1 -lexpat -lapr-1 -llog4cxx

in the following command line:

$ g++ proto-log4cxx.cpp -o proto-log4cxx -I ~/dev/apache-log4cxx-0.10.0/src/main/include -L ~/dev/apache-log4cxx-0.10.0/src/main/cpp/.libs -L ~/dev/apr-util-1.3.9/.libs -L ~/dev/apr-util-1.3.9/xml/expat/lib/.libs -L ~/dev/apr-1.3.8/.libs -Wl,-Bstatic -laprutil-1 -lexpat -lapr-1 -llog4cxx -Wl,-Bdynamic -pthread

You will get many error messages, like these ones:

/home/paul/dev/apache-log4cxx-0.10.0/src/main/cpp/socket.cpp:95: undefined reference to `apr_signal'
/home/paul/dev/apache-log4cxx-0.10.0/src/main/cpp/.libs/liblog4cxx.a(socket.o)(.text+0x10c1): In function `log4cxx::helpers::Socket::write(log4cxx::helpers::ByteBuffer&)':
../../../src/main/include/log4cxx/helpers/bytebuffer.h:49: undefined reference to `apr_socket_send'
/home/paul/dev/apache-log4cxx-0.10.0/src/main/cpp/.libs/liblog4cxx.a(socket.o)(.text+0x10cd): In function `log4cxx::helpers::Socket::write(log4cxx::helpers::ByteBuffer&)':
/home/paul/dev/apache-log4cxx-0.10.0/src/main/cpp/socket.cpp:97: undefined reference to `apr_signal'
/home/paul/dev/apache-log4cxx-0.10.0/src/main/cpp/.libs/liblog4cxx.a(socket.o)(.text+0x11d9): In function `log4cxx::helpers::Socket::close()':
/home/paul/dev/apache-log4cxx-0.10.0/src/main/cpp/socket.cpp:114: undefined reference to `apr_socket_close'
collect2: ld returned 1 exit status

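This is because the linker processes the libraries from left to right and takes from each static library only the code needed to resolve the references that are still undefined at that point. liblog4cxx.a depends on libaprutil-1.a and libapr-1.a, so it must appear before them. libaprutil-1.a depends on libexpat.a, so it must appear before it.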

We have already mentioned that when we use static linking the library code is incorporated into the executable. The resulting executable, proto-log4cxx, is... 7,685,473 bytes!

We can check that our executable works:

$ proto-log4cxx
0 [0x550022a0] INFO root null - Hello World!

Dynamic linking

It is time to try dynamic linking. Instead of the statically linked libraries

  • ~/dev/apache-log4cxx-0.10.0/src/main/cpp/.libs/liblog4cxx.a (size: 15,190,064 bytes)
  • ~/dev/apr-util-1.3.9/.libs/libaprutil-1.a (size: 432,394 bytes)
  • ~/dev/apr-util-1.3.9/xml/expat/lib/.libs/libexpat.a (size: 247,628 bytes)
  • ~/dev/apr-1.3.8/.libs/libapr-1.a (size: 656,902 bytes)

we shall tell the linker to look for the so-called shared objects

  • ~/dev/apache-log4cxx-0.10.0/src/main/cpp/.libs/liblog4cxx.so (size: 8,308,351 bytes)
  • ~/dev/apr-util-1.3.9/.libs/libaprutil-1.so (size: 302,420 bytes)
  • ~/dev/apr-util-1.3.9/xml/expat/lib/.libs/libexpat.so (size: 222,041 bytes)
  • ~/dev/apr-1.3.8/.libs/libapr-1.so (size: 432,338 bytes)

The code from these shared objects will not be incorporated into the executable but they will have to be available at runtime — when we run our executable or when our end user runs it. Therefore these shared objects have to be distributed to the end user.

Here is the command that we used:

$ g++ proto-log4cxx.cpp -o proto-log4cxx -I ~/dev/apache-log4cxx-0.10.0/src/main/include -L ~/dev/apache-log4cxx-0.10.0/src/main/cpp/.libs -Wl,-Bdynamic -llog4cxx

This command line seems to be much shorter than the one we used for static linking. Let's go through it:

  • g++ — call g++.
  • proto-log4cxx.cpp — specify the source file. If our program were more complex, we could list several source files here separated by spaces.
  • -o proto-log4cxx — place the output in this file. If the compilation and linking succeed, an executable named proto-log4cxx will be created (no file extension).
  • -I ~/dev/apache-log4cxx-0.10.0/src/main/include — tell the preprocessor to look for header files here. These header files are ~/dev/apache-log4cxx-0.10.0/src/main/include/log4cxx/logger.h and ~/dev/apache-log4cxx-0.10.0/src/main/include/log4cxx/basicconfigurator.h. We can refer to them simply as log4cxx/logger.h and log4cxx/basicconfigurator.h because we specified this here.
  • -L ~/dev/apache-log4cxx-0.10.0/src/main/cpp/.libs: tell the linker to look for *.so libraries (shared objects) in ~/dev/apache-log4cxx-0.10.0/src/main/cpp/.libs. In fact liblog4cxx.so is located in this directory. We shall later tell the linker to link this library in using the -l option.
  • -Wl,-Bdynamic: -Wl, passes an option to the linker. -Bdynamic tells the linker to link in the libraries that follow this option dynamically rather than statically.
  • -llog4cxx — tell the linker to link to the liblog4cxx.so library. Note that lib is assumed, so we specify -llog4cxx, not -lliblog4cxx (which wouldn't work!). The file extension is also assumed.

The command line is so much shorter because we don't need to bother about libaprutil-1, libexpat, and libapr-1 any longer. Why? Because our executable depends on liblog4cxx directly; it depends on libaprutil-1, libexpat, and libapr-1 indirectly (because liblog4cxx depends on them). When we were using static linking we had to include the code from the three indirect dependencies into our executable, so we had to tell the linker about them. Now we are using dynamic linking and all we need to know about is liblog4cxx, so we know how to call its functions. Its code will not be incorporated into the executable. The dependencies of liblog4cxx are in turn referenced from liblog4cxx.so but not from our executable. This will become clearer shortly.

This time the linking is almost instantaneous. The size of the resulting executable is only 11,527 bytes.

Locating *.so files at runtime

Let's try and run the executable that we have built.

$ proto-log4cxx

We get an error message:

proto-log4cxx: error while loading shared libraries: liblog4cxx.so.10: cannot open shared object file: No such file or directory

We have already mentioned that the dynamically linked shared object files must be available at runtime. Looks like the system (or, more specifically, the so-called runtime linker, ld.so, which is responsible for linking in the *.so shared objects) does not know where to find liblog4cxx.so.10.

Note that it is looking for liblog4cxx.so.10 rather than liblog4cxx.so. This is because liblog4cxx.so is really a symbolic link to liblog4cxx.so.10 (a technicality that we previously omitted to mention), and it is the name liblog4cxx.so.10 that the linker has recorded in our executable. This is common for shared object files.

We can examine the dynamically linked dependencies of proto-log4cxx using ldd, a utility that prints shared library dependencies:

$ ldd proto-log4cxx
 liblog4cxx.so.10 => not found
 libstdc++.so.5 => /usr/lib/libstdc++.so.5 (0x00dee000)
 libm.so.6 => /lib/tls/libm.so.6 (0x0021a000)
 libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x0045a000)
 libc.so.6 => /lib/tls/libc.so.6 (0x0061c000)
 /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x001b9000)
undefined symbol: _ZTIN7log4cxx7helpers13ObjectPtrBaseE (./proto-log4cxx)

It depends on a number of standard libraries, like libstdc++.so.5, which has been located as /usr/lib/libstdc++.so.5, and liblog4cxx.so.10 which could not be found.

How can we fix this?

It turns out that there is more than one way. We could use an environment variable, LD_LIBRARY_PATH, which is very similar to PATH, only it specifies the paths to shared object files rather than executables. Let's add /home/paul/dev/apache-log4cxx-0.10.0/src/main/cpp/.libs to LD_LIBRARY_PATH:

$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/paul/dev/apache-log4cxx-0.10.0/src/main/cpp/.libs

Run the program again:

$ proto-log4cxx

It works:

0 [0x550002a0] INFO root null - Hello World!

What about the other library dependencies (those of liblog4cxx.so.10)? How does the runtime linker know where to find them? Let's run ldd again:

$ ldd proto-log4cxx
 liblog4cxx.so.10 => /home/paul/dev/apache-log4cxx-0.10.0/src/main/cpp/.libs/liblog4cxx.so.10 (0x00111000)
 libstdc++.so.5 => /usr/lib/libstdc++.so.5 (0x00ec6000)
 libm.so.6 => /lib/tls/libm.so.6 (0x00545000)
 libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x009bb000)
 libc.so.6 => /lib/tls/libc.so.6 (0x00d6a000)
 libaprutil-1.so.0 => /home/paul/dev/apr-util-1.3.9/.libs/libaprutil-1.so.0 (0x00724000)
 libexpat.so.0 => /home/paul/dev/apr-util-1.3.9/xml/expat/lib/.libs/libexpat.so.0 (0x00fdd000)
 libapr-1.so.0 => /home/paul/dev/apr-1.3.8/.libs/libapr-1.so.0 (0x002bc000)
 libpthread.so.0 => /lib/tls/libpthread.so.0 (0x009fc000)
 librt.so.1 => /lib/tls/librt.so.1 (0x00cf8000)
 libcrypt.so.1 => /lib/libcrypt.so.1 (0x00a18000)
 libdl.so.2 => /lib/libdl.so.2 (0x002dd000)
 /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x00884000)

Notice that, now that liblog4cxx.so.10 can be found, the list also includes its own dependencies. The following entries have appeared:

 libaprutil-1.so.0 => /home/paul/dev/apr-util-1.3.9/.libs/libaprutil-1.so.0 (0x00724000)
 libexpat.so.0 => /home/paul/dev/apr-util-1.3.9/xml/expat/lib/.libs/libexpat.so.0 (0x00fdd000)
 libapr-1.so.0 => /home/paul/dev/apr-1.3.8/.libs/libapr-1.so.0 (0x002bc000)
 libpthread.so.0 => /lib/tls/libpthread.so.0 (0x009fc000)
 librt.so.1 => /lib/tls/librt.so.1 (0x00cf8000)
 libcrypt.so.1 => /lib/libcrypt.so.1 (0x00a18000)
 libdl.so.2 => /lib/libdl.so.2 (0x002dd000)

We can verify that these are indeed the dependencies of liblog4cxx.so.10 (and as such the indirect dependencies of our executable proto-log4cxx):

$ ldd /home/paul/dev/apache-log4cxx-0.10.0/src/main/cpp/.libs/liblog4cxx.so.10
 libaprutil-1.so.0 => /home/paul/dev/apr-util-1.3.9/.libs/libaprutil-1.so.0 (0x00963000)
 libexpat.so.0 => /home/paul/dev/apr-util-1.3.9/xml/expat/lib/.libs/libexpat.so.0 (0x009d1000)
 libapr-1.so.0 => /home/paul/dev/apr-1.3.8/.libs/libapr-1.so.0 (0x008da000)
 libpthread.so.0 => /lib/tls/libpthread.so.0 (0x00a5b000)
 librt.so.1 => /lib/tls/librt.so.1 (0x00d69000)
 libcrypt.so.1 => /lib/libcrypt.so.1 (0x006a5000)
 libdl.so.2 => /lib/libdl.so.2 (0x00ed9000)
 libc.so.6 => /lib/tls/libc.so.6 (0x00111000)
 /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x00464000)

But the question remains, how does it "know" where to look for libaprutil-1.so.0 and others? /home/paul/dev/apr-util-1.3.9/.libs does not appear on our LD_LIBRARY_PATH...

We have seen ldd. There is another useful utility, readelf. Let's apply it to liblog4cxx.so.10:

$ readelf -d /home/paul/dev/apache-log4cxx-0.10.0/src/main/cpp/.libs/liblog4cxx.so.10

-d tells readelf to examine the dynamic segment, the one we are interested in. The result:


Dynamic segment at offset 0x1a75b8 contains 29 entries:
  Tag        Type                         Name/Value
 0x00000001 (NEEDED)                     Shared library: [libaprutil-1.so.0]
 0x00000001 (NEEDED)                     Shared library: [libexpat.so.0]
 0x00000001 (NEEDED)                     Shared library: [libapr-1.so.0]
 0x00000001 (NEEDED)                     Shared library: [libpthread.so.0]
 0x00000001 (NEEDED)                     Shared library: [librt.so.1]
 0x00000001 (NEEDED)                     Shared library: [libcrypt.so.1]
 0x00000001 (NEEDED)                     Shared library: [libdl.so.2]
 0x00000001 (NEEDED)                     Shared library: [libc.so.6]
 0x0000000e (SONAME)                     Library soname: [liblog4cxx.so.10]
 0x0000000f (RPATH)                      Library rpath: [/home/paul/dev/apr-util-1.3.9/.libs:/home/paul/dev/apr-util-1.3.9/xml/expat/lib/.libs:/home/paul/dev/apr-1.3.8/.libs:/usr/local/apr/lib]
 0x0000000c (INIT)                       0x94b64
 0x0000000d (FINI)                       0x14e408
 0x00000004 (HASH)                       0xd4
 0x00000005 (STRTAB)                     0x22674
 0x00000006 (SYMTAB)                     0xa204
 0x0000000a (STRSZ)                      352872 (bytes)
 0x0000000b (SYMENT)                     16 (bytes)
 0x00000003 (PLTGOT)                     0x1a7864
 0x00000002 (PLTRELSZ)                   11440 (bytes)
 0x00000014 (PLTREL)                     REL
 0x00000017 (JMPREL)                     0x91eb4
 0x00000011 (REL)                        0x7b9dc
 0x00000012 (RELSZ)                      91352 (bytes)
 0x00000013 (RELENT)                     8 (bytes)
 0x6ffffffe (VERNEED)                    0x7b96c
 0x6fffffff (VERNEEDNUM)                 2
 0x6ffffff0 (VERSYM)                     0x788dc
 0x6ffffffa (RELCOUNT)                   109
 0x00000000 (NULL)                       0x0

Notice the (RPATH) line:

 0x0000000f (RPATH)                      Library rpath: [/home/paul/dev/apr-util-1.3.9/.libs:/home/paul/dev/apr-util-1.3.9/xml/expat/lib/.libs:/home/paul/dev/apr-1.3.8/.libs:/usr/local/apr/lib]

The answer to our question is now clear. The path to the shared object files in question has been incorporated into the shared object file liblog4cxx.so.10 by the linker; it appears on the RPATH in the so-called dynamic segment of the file.

RPATH

Let us perform an experiment. Let us restore LD_LIBRARY_PATH (remove /home/paul/dev/apache-log4cxx-0.10.0/src/main/cpp/.libs that we have added), or even make it blank:

$ export LD_LIBRARY_PATH=

Our executable won't run now:

$ proto-log4cxx
proto-log4cxx: error while loading shared libraries: liblog4cxx.so.10: cannot open shared object file: No such file or directory

Let's copy liblog4cxx.so.10 to the directory that contains the executable (which happens to be our current directory):

$ cp /home/paul/dev/apache-log4cxx-0.10.0/src/main/cpp/.libs/liblog4cxx.so.10 .

Since we don't have "." on our LD_LIBRARY_PATH the executable still won't run:

$ proto-log4cxx
proto-log4cxx: error while loading shared libraries: liblog4cxx.so.10: cannot open shared object file: No such file or directory

And now let's rebuild the executable and incorporate an appropriate RPATH:

$ g++ proto-log4cxx.cpp -o proto-log4cxx -I ~/dev/apache-log4cxx-0.10.0/src/main/include -L ~/dev/apache-log4cxx-0.10.0/src/main/cpp/.libs -Wl,-Bdynamic -llog4cxx -Xlinker -z -Xlinker origin -Xlinker -rpath -Xlinker '$ORIGIN:$ORIGIN/lib/third-party'

Here is what we have added to the command line:

  • -Xlinker -z -Xlinker origin: -Xlinker is similar to -Wl in that it passes an option to the linker. It needs to be given twice, once for the option and once for its argument, as in this case. So we are really passing -z origin to the linker. This tells the linker to mark the executable as requiring immediate $ORIGIN processing at runtime. Thus we have enabled the use of $ORIGIN, which we use in the next option.
  • -Xlinker -rpath -Xlinker '$ORIGIN:$ORIGIN/lib/third-party' — passes the option -rpath '$ORIGIN:$ORIGIN/lib/third-party' to the linker. This sets the RPATH value to $ORIGIN:$ORIGIN/lib/third-party. At runtime, $ORIGIN is replaced with the path to the directory that contains the executable.

Thus we have set the RPATH to search for shared object files in the executable's directory as well as in lib/third-party under the executable's directory. (We are not going to use lib/third-party; we have added it for the purposes of illustration.)

Let's check that the resulting executable does indeed have its RPATH properly set:

$ readelf -d proto-log4cxx

Dynamic segment at offset 0x1688 contains 26 entries:
  Tag        Type                         Name/Value
 0x00000001 (NEEDED)                     Shared library: [liblog4cxx.so.10]
 0x00000001 (NEEDED)                     Shared library: [libstdc++.so.5]
 0x00000001 (NEEDED)                     Shared library: [libm.so.6]
 0x00000001 (NEEDED)                     Shared library: [libgcc_s.so.1]
 0x00000001 (NEEDED)                     Shared library: [libc.so.6]
 0x0000000f (RPATH)                      Library rpath: [$ORIGIN:$ORIGIN/lib/third-party]
 0x0000000c (INIT)                       0x8048cb0
 0x0000000d (FINI)                       0x804935c
 0x00000004 (HASH)                       0x8048168
 0x00000005 (STRTAB)                     0x8048538
 0x00000006 (SYMTAB)                     0x80482a8
 0x0000000a (STRSZ)                      1534 (bytes)
 0x0000000b (SYMENT)                     16 (bytes)
 0x00000015 (DEBUG)                      0x0
 0x00000003 (PLTGOT)                     0x804a798
 0x00000002 (PLTRELSZ)                   128 (bytes)
 0x00000014 (PLTREL)                     REL
 0x00000017 (JMPREL)                     0x8048c30
 0x00000011 (REL)                        0x8048c08
 0x00000012 (RELSZ)                      40 (bytes)
 0x00000013 (RELENT)                     8 (bytes)
 0x6ffffffb (FLAGS_1)                    Flags: ORIGIN
 0x6ffffffe (VERNEED)                    0x8048b88
 0x6fffffff (VERNEEDNUM)                 3
 0x6ffffff0 (VERSYM)                     0x8048b36
 0x00000000 (NULL)                       0x0

Indeed it is:

 0x0000000f (RPATH)                      Library rpath: [$ORIGIN:$ORIGIN/lib/third-party]

And now we can successfully run our executable:

$ proto-log4cxx
0 [0x55009100] INFO root null - Hello World!

ldconfig

Finally, there is yet another way to make your shared objects found at runtime. The runtime linker, ld.so, which is invoked when you launch the executable, consults the links and cache created by ldconfig. ldconfig creates these for the most recent shared libraries found in the directories specified in the file /etc/ld.so.conf and in the trusted directories /usr/lib and /lib.

Thus you could move the shared object to /usr/lib and run ldconfig to make sure that it gets picked up, then run the executable.

Alternatively, you could add the directory that contains the shared object to /etc/ld.so.conf and then run ldconfig; the shared object should then be picked up as well.

It is instructive to run

$ ldconfig -v

which will print the current version number, the name of each directory as the bindings are scanned and any links that are created.

The command

echo "/home/paul/dev/apache-log4cxx-0.10.0/src/main/cpp/.libs" >> /etc/ld.so.conf

quickly appends the directory path to /etc/ld.so.conf. Note that writing to this file normally requires root privileges.

Which method you choose to specify the location of your shared objects at runtime depends on the configuration of your application, various administrative and infrastructure considerations, etc.

We (subjectively) find the RPATH approach the most intuitive.

GCC files

So far we have encountered several important file extensions: *.a and *.so. Let us review these and other file extensions on Linux one should be aware of:

  • *.o: an object file. According to Wikipedia, "an object file is an organised collection of named objects, and typically these objects are sequences of computer instructions in a machine code format, which may be directly executed by a computer's CPU. Object files are typically produced by a compiler as a result of processing a source code file. Object files contain compact code, and are often called binaries. A linker is typically used to generate an executable or library by amalgamating parts of object files together".
  • *.a: static library. In practice, these are merely archive files (created using the ar command) of object files (*.o). Thus linking them in statically is similar to linking in the object files contained in them individually.
  • *.la: GNU libtool library file. GNU libtool is a generic library support script. It aims to hide the complexity of using shared libraries behind a consistent interface. The *.la files contain the information that libtool needs to ease the linking process: library names, locations, and dependent libraries. We do not find these files particularly useful and won't discuss them further.
  • *.so: the so-called shared objects, or shared object files. These are meant to be linked in dynamically at runtime using the dynamic linker/loader, ld.so. Unlike the regular *.o files these contain the dynamic segment with some information used by the dynamic linker/loader.

The executable files (usually without extensions; sometimes the extension *.elf is used, although it is rarely seen nowadays), as well as *.o and *.so files, share the same file format: the Executable and Linking Format (ELF), formerly called Extensible Linking Format. These files can be examined using the readelf utility:

$ readelf -d proto-log4cxx

shows something like the following for the executable proto-log4cxx

Dynamic segment at offset 0x1668 contains 24 entries:
  Tag        Type                         Name/Value
 0x00000001 (NEEDED)                     Shared library: [liblog4cxx.so.10]
 0x00000001 (NEEDED)                     Shared library: [libstdc++.so.5]
 0x00000001 (NEEDED)                     Shared library: [libm.so.6]
 0x00000001 (NEEDED)                     Shared library: [libgcc_s.so.1]
 0x00000001 (NEEDED)                     Shared library: [libc.so.6]
 0x0000000c (INIT)                       0x8048c90
 0x0000000d (FINI)                       0x804933c
 0x00000004 (HASH)                       0x8048168
 0x00000005 (STRTAB)                     0x8048538
 0x00000006 (SYMTAB)                     0x80482a8
 0x0000000a (STRSZ)                      1502 (bytes)
 0x0000000b (SYMENT)                     16 (bytes)
 0x00000015 (DEBUG)                      0x0
 0x00000003 (PLTGOT)                     0x804a768
 0x00000002 (PLTRELSZ)                   128 (bytes)
 0x00000014 (PLTREL)                     REL
 0x00000017 (JMPREL)                     0x8048c10
 0x00000011 (REL)                        0x8048be8
 0x00000012 (RELSZ)                      40 (bytes)
 0x00000013 (RELENT)                     8 (bytes)
 0x6ffffffe (VERNEED)                    0x8048b68
 0x6fffffff (VERNEEDNUM)                 3
 0x6ffffff0 (VERSYM)                     0x8048b16
 0x00000000 (NULL)                       0x0

-d tells readelf to display the dynamic section. Other sections can also be examined with readelf.
