Thursday, December 03, 2009

Threads and performance in scientific C++

I like the new C++ standard, with its great new features and tools for efficient programming. I cannot wait to try the new features as they are implemented in GCC.

My current programs make heavy use of threads; as many of you might know, threads typically work with shared variables and thread-local variables. The latter are only seen by the thread they are defined in, and in GCC they are implemented as a GNU extension to the ISO C++03 standard (the __thread keyword).
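To make the difference concrete, here is a minimal sketch of a shared variable next to a thread-local one. It assumes a compiler that already supports the `thread_local` keyword and `std::thread` from the new standard; with the GNU extension the declaration would be spelled `__thread int localCounter;` instead. The names `sharedCounter`, `localCounter` and `localCounterAfterWorker` are made up for this illustration.

```cpp
#include <cassert>
#include <thread>

int sharedCounter = 0;              // one copy, seen by every thread
thread_local int localCounter = 0;  // one copy per thread

// Hypothetical helper: starts a worker that writes to both variables,
// then returns the value of localCounter that the calling thread sees.
int localCounterAfterWorker() {
    localCounter = 1;               // the calling thread's own copy
    std::thread worker([] {
        assert(localCounter == 0);  // the worker starts with a fresh copy
        localCounter = 42;          // changes only the worker's copy
        sharedCounter = 7;          // changes the single shared copy
    });
    worker.join();                  // after join, the worker's writes are visible
    return localCounter;            // still the caller's value
}
```

After the call, the calling thread still sees its own value in `localCounter`, while the worker's write to `sharedCounter` is visible to everyone.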

Thread-local variables are a bit of a burden for the compiler, the linker and the ELF file format. They have to work with any number of threads, which means that, unlike ordinary static variables, their addresses cannot be fixed at link time and have to be computed per thread at run time.

I know that my simulations use two threads, because the processors they run on have only two cores. So, for the sake of performance, I learned to use templates to implement static thread-local variables, with the thread index as a template parameter. Here is an example:

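// One distinct class, and so one distinct static member, per thread index.
template<int iThread>
class ThreadLocal {
    protected:
    static int staticA;
};

// Definition of the static member for every instantiation.
template<int iThread>
int ThreadLocal<iThread>::staticA;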

template<int iThread>
class ClsA : public ThreadLocal<iThread> {
    private:
        typedef ThreadLocal<iThread> tLocal;
        int a;
};

template<int iThread>
class ClsB : public ThreadLocal<iThread> {
    private:
        typedef ThreadLocal<iThread> tLocal;
        int b;
        ClsA<iThread> embeddedA;
};

ClsA<0> A0;
ClsA<1> A1;

ClsB<0> B0;
ClsB<1> B1;

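Each template instantiation has its own copy of the staticA variable. As long as thread 0 only touches the <0> objects and thread 1 only touches the <1> objects, staticA behaves like a thread-local variable while remaining an ordinary static one. This has given me good performance under GCC, but comments on tests with different compilers are welcome.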
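Here is a rough sketch of how two threads might use the technique, assuming `std::thread` from the new standard is available. The `Worker` class, its `run` and `result` members and the `runBoth` function are hypothetical names invented for this example; they are not part of my simulation code. Note that nothing in the language stops thread 1 from touching a `<0>` object, so keeping each thread on its own instantiation is a convention the programmer has to respect.

```cpp
#include <cassert>
#include <thread>

template<int iThread>
class ThreadLocal {
protected:
    static int staticA;  // one instance per value of iThread
};

template<int iThread>
int ThreadLocal<iThread>::staticA = 0;

// Hypothetical worker: accumulates into the static that belongs
// to its own thread index.
template<int iThread>
class Worker : public ThreadLocal<iThread> {
    typedef ThreadLocal<iThread> tLocal;
public:
    void run(int steps) {
        for (int i = 0; i < steps; ++i)
            tLocal::staticA += iThread + 1;
    }
    int result() const { return tLocal::staticA; }
};

Worker<0> W0;
Worker<1> W1;

// Each thread works only on the instantiation with its own index,
// so the two threads never write to the same variable.
void runBoth(int steps) {
    std::thread t0([steps] { W0.run(steps); });
    std::thread t1([steps] { W1.run(steps); });
    t0.join();
    t1.join();
}
```

Calling `runBoth(1000)` and then reading `W0.result()` and `W1.result()` shows that the two statics were updated independently.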
