Expression templates and meta-programming in boost::Spirit

March 25, 2007 at 5:31 PMAmer Gerzic

Introduction

According to one of expression templates inventors, Todd Veldhuizen, expression templates are “new C++ technique for passing expressions as functions arguments” [1]. The purpose of expression template is to inline expressions during compilation, which produces faster and arguably more readable code. Most of expression template articles focus on numeric array classes and operations on those using expression template techniques. For instance, expression templates could be utilized for matrix multiplication without using temporaries. For such article and code examples, please refer to articles referenced under [1] and [2].

As probably suspected, this article will not focus on expression templates as applied on numeric array classes. In this article, the focus will be on parsing and language recognition using expression template techniques. Furthermore the article will focus on Spirit library [3], a parser framework implemented completely using expression templates and template meta-programming. At this point the reader is encouraged to refer to introductory part of Spirit documentation as found here http://www.boost.org/libs/spirit/doc/introduction.html.

In addition it is important to note that this article will not focus on all parts of Spirit framework. This article is rather attempt to explain fundamentals behind expression templates as applied in Spirit framework.

Introduction to Spirit

According to Spirit documentation [3], Spirit is an “object-oriented recursive-decent parser generator framework implemented completely using template meta-programming” [3]. Spirit supports full EBNF (Extended Backus Normal Form) embedded completely into C++ program. In other words, to recognize strings like “ab”, the parser could be constructed by simply using ‘a’ >> ‘b’, which means that character a follows character b. At this point, the reader is encouraged to refer Spirit documentation [3] for syntax and technique used in following example:

Following example is taken from Spirit documentation. The example presents parser that recognizes the language of arithmetic expressions and implements calculator.

/*=============================================================================
    Copyright (c) 2002-2003 Joel de Guzman
    http://spirit.sourceforge.net/ 

    Use, modification and distribution is subject to the Boost Software
    License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at
    http://www.boost.org/LICENSE_1_0.txt)
=============================================================================*/
////////////////////////////////////////////////////////////////////////////
//
//  Plain calculator example demostrating the grammar and semantic actions.
//  This is discussed in the "Grammar" and "Semantic Actions" chapters in
//  the Spirit User's Guide.
//
//  [ JDG 5/10/2002 ]
//
////////////////////////////////////////////////////////////////////////////
#include <boost/spirit/core.hpp>
#include <iostream>
#include <string> 

////////////////////////////////////////////////////////////////////////////
using namespace std;
using namespace boost::spirit; 

////////////////////////////////////////////////////////////////////////////
//
//  Semantic actions
//
////////////////////////////////////////////////////////////////////////////
namespace
{
    void    do_int(char const* str, char const* end)
    {
        string  s(str, end);
        cout << "PUSH(" << s << ')' << endl;
    } 

    void    do_add(char const*, char const*)    { cout << "ADD\n"; }
    void    do_subt(char const*, char const*)   { cout << "SUBTRACT\n"; }
    void    do_mult(char const*, char const*)   { cout << "MULTIPLY\n"; }
    void    do_div(char const*, char const*)    { cout << "DIVIDE\n"; }
    void    do_neg(char const*, char const*)    { cout << "NEGATE\n"; }
} 

////////////////////////////////////////////////////////////////////////////
//
//  Our calculator grammar
//
////////////////////////////////////////////////////////////////////////////
struct calculator : public grammar<calculator>
{
    template <typename ScannerT>
    struct definition
    {
        definition(calculator const& /*self*/)
        {
            expression
                =   term
                    >> *(   ('+' >> term)[&do_add]
                        |   ('-' >> term)[&do_subt]
                        )
                ; 

            term
                =   factor
                    >> *(   ('*' >> factor)[&do_mult]
                        |   ('/' >> factor)[&do_div]
                        )
                ; 

            factor
                =   lexeme_d[(+digit_p)[&do_int]]
                |   '(' >> expression >> ')'
                |   ('-' >> factor)[&do_neg]
                |   ('+' >> factor)
                ;
        } 

        rule<ScannerT> expression, term, factor; 

        rule<ScannerT> const&
        start() const { return expression; }
    };
}; 

////////////////////////////////////////////////////////////////////////////
//
//  Main program
//
////////////////////////////////////////////////////////////////////////////
int
main()
{
    cout << "/////////////////////////////////////////////////////////\n\n";
    cout << "\t\tExpression parser...\n\n";
    cout << "/////////////////////////////////////////////////////////\n\n";
    cout << "Type an expression...or [q or Q] to quit\n\n"; 

    calculator calc;    //  Our parser 

    string str;
    while (getline(cin, str))
    {
        if (str.empty() || str[0] == 'q' || str[0] == 'Q')
            break; 

        parse_info<> info = parse(str.c_str(), calc, space_p); 

        if (info.full)
        {
            cout << "-------------------------\n";
            cout << "Parsing succeeded\n";
            cout << "-------------------------\n";
        }
        else
        {
            cout << "-------------------------\n";
            cout << "Parsing failed\n";
            cout << "stopped at: \": " << info.stop << "\"\n";
            cout << "-------------------------\n";
        }
    } 

    cout << "Bye... :-) \n\n";
    return 0;
} 

Particularly interesting is calculator class, which represents calculator grammar written in EBNF. Focusing on grammar definition we see that the language defined is following (using more accurate EBNF notation):

    expression = term (('+' term) | ('-' term))*       
    term = factor (('*' factor) | ('/' factor))*       
    factor (digit_p+) | '(' expression ')' | ('-' factor) | ('+' factor)

If we compare this definition with Spirit’s definition in grammar class, we can see strong similarities. Differences are that * and + operators must precede the operand, and >> operator is used instead of sequence operator in EBNF, which in EBNF is denoted as single space character. As Spirit documentation states, the reason is that C++ operator overloading puts limits on how the operator is overloaded and what could be used as operator in the first place. However, the question that this article is trying to answer is not reasons behind Spirit design decisions, but rather how such design could be accomplished in C++ language.

Let's write our own Spirit

Well, let’s not! To do such thing, it would be very challenging (at least for me). However, to see how Spirit works, it is not necessary to write complete library, but rather focus on fundamental principles. At this point it is important to state that what follows is extreme case of problem simplification and does not reveal all issues surrounding Spirit design. To get the taste of complexity of Spirit framework, the reader is encouraged to explore Spirit source code. However, this article is intended to shed some light on such exploration.

But, let us get practical. What follows is an extremely small library that could be used to define grammars that recognize sequence of characters. In our example, we will implement only two operators, alternative operator denoted by | character and sequence operator denoted by >> characters. Also, we will demonstrate how Spirit combines static and dynamic polymorphism to implement complexities of type derivation during template instantiation.

In order to recognize sequence of characters, first we need to implement an object that is able to recognize single character. In order to preserve similarities with Spirit framework, the object is defined as a template class:

template<typename T = char>
struct chlit
{
    T m_t; 

    /* Default constructor */
    chlit() : m_t() {}; 

    /* Copy constructor */
    chlit(chlit<T>& t) { m_t = t.m_t; }; 

    chlit(T t) : m_t() { m_t = t; };
    
    bool match(string::iterator& iter, string& s)
    {    
        if(iter != s.end())
        {
            if(*iter == m_t)
            {
                ++iter;
                return true;
            }
        } 

        return false;
    };
}

As we can see, chlit<> template parameter defaults to a char, which is fundamental type to recognize. In addition the class contains default constructor, copy constructor, and parameterized constructor, which takes template parameter type object as parameter. If we consider default template parameter, parameterized constructor’s parameter is nothing else but a char. Using such constructor, we can construct chlit<> object from single character. In addition, there is a match function that takes reference to a string object and string iterator. Match function simply checks if the character contained in chlit<> object matches the character pointed by the iterator. In other words, the function checks if string end is reached. If string end is not reached then the function checks if character pointed by iterator is equal to the stored character. If match is detected, iterator is advanced and character is consumed. There is really nothing special about this class other than it is implemented using templates (really not necessary, but trying to keep it as similar to Spirit as possible).

Now that we defined an object capable to recognize single character, we need an object that will recognize sequence of characters. We can actually go further and design generic class that will recognize sequence of objects of any type, which again is accomplished by using template class:

template<typename Left = chlit<>, typename Right = chlit<> >
struct sequence
{
    Left m_l;
    Right m_r; 

    /* Default constructor */
    sequence() : m_l(), m_r() {}; 

    /* Copy constructor */
    sequence(sequence<Left,Right>& s) { m_l = s.m_l; m_r = s.m_r; }; 

    sequence(Left l, Right r) : m_l(l), m_r(r) {}; 

    bool match(string::iterator& iter, string& s)
    {
        if(m_l.match(iter, s))
        {
            if(m_r.match(iter, s))
            {
                return true;
            }
            --iter;
            return false;
        }
        return false;
    };
}

Close observation of sequence<> class reveals similarities with chlit<> class. There are default and copy constructors and parameterized constructor that takes two parameters of the types specified by template parameters. Default template parameters are chlit<> but other types will be utilized throughout the example. The interesting part of sequence<> class is its match function. Match function signature is identical to chlit<> match function. Looking at match function implementation in sequence<> class, it is obvious that the function simply checks if Left and Right members match in sequence (duh!). Successful match is achieved by calling match function of the Left parameter and upon successful match, calling the match function of the Right parameter. If both tests return true, the function itself returns true!

However, closer look at sequence match function reveals interesting implementation technique. Looking at the function body, it is obvious that sequence class relies that both member objects define a match function. Furthermore, the sequence class relies that those match functions have identical signature. It is obvious that during program execution appropriate match function calls are performed. However, similar effect could be achieved using virtual functions. Using polymorphism, both objects could be derived from base class that defines virtual match function. Subsequently, any derived class could override match function to implement its own version. During runtime, appropriate match function would be called. Both approaches essentially have same effect. However, there is a fundamental difference. Using virtual functions, call to appropriate function is determined during runtime. Call forwarding is achieved using, so called, vtable. Such technique is called as dynamic polymorphism. Using templates, virtual function overhead is eliminated and call forwarding is performed during compile time (hence the name static polymorphism). During compilation the compiler instantiates template class (not to be confused with class instantiation during runtime) and “inlines” function calls. Positive side effect is increase in execution performance.

In addition to sequence operation, we need a construct that recognizes alternative match between two objects. In other words, we need a way to check that current iterator position matches either of two characters. Once again we’ll use templates to make the construct as generic as possible. Following is alternative template class:

template<typename Left = chlit<>, typename Right = chlit<> >
struct alternative
{
    Left m_l;
    Right m_r; 

    /* Default constructor */
    alternative() : m_l(), m_r() {}; 

    /* Copy constructor */
    alternative(alternative<Left,Right>& s) { m_l = s.m_l; m_r = s.m_r; }; 

    alternative(Left l, Right r) : m_l(l), m_r(r) {}; 

    bool match(string::iterator& iter, string& s)
    {
        if(m_l.match(iter, s))
        {
            return true;
        } 

        if(m_r.match(iter, s))
        {
            return true;
        } 

        return false;
    };
}

Looking at the template class definition of alternative construct we can see that it is identical to sequence class. The only difference is the match function, which checks if either of member objects matches. As soon as any of member objects return successful match, successful match is returned.

So far we have all necessary ingredients to match a character, a sequence of characters, and alternative between two characters. At this point the missing link is the an interface between these three classes. In other words, how do we combine these three classes to achieve language definition?

Considering our requirements we identify the need for operators >> or |. In other words, construct like A >> B (A and B being chlit<>) should produce a sequence<> object that will recognize A followed by B. To achieve such feature we will utilize operator overloading:

template<typename Left, typename Right>
sequence<Left,Right> operator>>(Left l, Right r)
{ 
    return sequence<Left, Right>(l, r); 
}

In the case above, we are utilizing templates to implement operator overloading. Necessity to use templates comes from the fact that sequence<> is a template class and requires operand values and types to be passed. In addition, operator >> is completely generic with respect to type, which is desired feature.

Let us investigate further to see what happens behind scenes when compiler compiles such constructs. Let say we have following code:

chlit<> A('a');
chlit<> B('b');

A >> B;

What happens behind the scenes is the following:

[1] Compiler encounters construct chlit<> A(‘a’) and performs template class instantiation of chlit<> template to something similar to this:

struct chlit
{
    char m_t; 

    /* Default constructor */
    chlit() : m_t() {}; 

    /* Copy constructor */
    chlit(chlit& t) { m_t = t.m_t; }; 

    chlit(char t) : m_t() { m_t = t; };
    
    bool match(string::iterator& iter, string& s)
    {    
        if(iter != s.end())
        {
            if(*iter == m_t)
            {
                ++iter;
                return true;
            }
        } 

        return false;
    };
}

In this case, the compiler simply replaces template parameter within the template class, creating special type for char template parameter. Similarly, the construct is used for chlit<> B(‘b’). As a matter of fact the same class definition is reused.

[2] Compiler encounters A >> B and instantiates template function for >> operator as following:

sequence<chlit, chlit> operator>>(chlit l, chlit r)
{ 
    return sequence<chlit, chlit>(l, r); 
}

Chlit refers to instantiated template class as shown under (1). However, at this point, the template class sequence is being utilized and compiler is forced to perform template class instantiation of sequence class with chlit as template parameter, which produces following code:

struct sequence
{
    chlit m_l;
    chlit m_r; 

    /* Default constructor */
    sequence() : m_l(), m_r() {}; 

    /* Copy constructor */
    sequence(sequence & s) { m_l = s.m_l ; m_r = s.m_r; }; 

    sequence(chlit l, chlit r) : m_l(l), m_r(r) {}; 

    bool match(string::iterator& iter, string& s)
    {
        if(m_l.match(iter, s))
        {
            if(m_r.match(iter, s))
            {
                return true;
            }
            --iter;
            return false;
        }
        return false;
    };
}

The compiler creates new type specialized for types that are given as template parameters, which in turn produces brand new type. So for construct like A >> B, the new type is sequence<chlit<char>, chlit<char>> .

Considering observation from previous paragraph, we can generalize our conclusion by finding pattern code generation during compilation. Let’s look at following examples:

    Construct: A >> B
    Resulting Type: sequence<chlit<char>, chlit<char>>

    Construct: A >> B >> A
    Resulting Type: sequence<chlit<char>, sequence<chlit<char>, chlit<char>>>

    Construct: A >> B | A
    Resulting Type: alternative<chlit<char>, sequence<chlit<char>, chlit<char>>>

The pattern is obvious. Every construct is simply expanded by the compiler and new type is created. Using static polymorphism function calls are correctly matched, producing the desired result of matching sequence of two objects. In addition, such technique is called expression templates.

Type Issue

One of the strengths of expression templates is that templates are used to generate code during compile time to perform specific task. As already mentioned, the advantage is speed optimization because virtual function overhead is eliminated. However, the strength is also a weakness, because there is no simple technique to discover the type of the generated class during compilation. As we saw in previous section, each construct produced different type. If were to assign such construct to any other object, we have to specialize the assignment to produced type. Furthermore, if we want to pass the construct to a function, what do we specify as the function parameter?

In Spirit, we are capable of writing something like this: rule<> r = A >> B. In this way, we can define functions that take rule<> as a parameter and process it in any way desired. But, how is such feature accomplished in Spirit? According to Spirit documentation [3] “rules are actually implemented using a pointer to a runtime polymorphic abstract class that holds the dynamic type of the parser assigned to it.” Following paragraph is an attempt to mimic Spirit design.

In order to accomplish desired feature it is imperative to bring all possible types “under one roof”. Such goal could be achieve by creating a template class that will hold parser object and its types:

template<typename T>
struct parser
{
    T m_t;
    
    /* Default constructor */
    parser() : m_t() {}; 

    /* Copy constructor */
    parser(parser<T>& p) { m_t = p.m_t; }; 

    parser(T t) : m_t(t) {}; 

    bool match(string::iterator& iter, string& s)
    { return m_t.match(iter, s); };
}

Parser template class is very simple and it is used only to hold the parser type and the parser object itself. The match function simply calls the match function of the member object. Following is desired outcome:

    sequence<chlit<char>, chlit<char>> becomes              
    parser<sequence<chlit<char>, chlit<char>>> and
    parser<> object contains an object of type sequence<chlit<char>, chlit<char>>, 
    so that when m_t.match(…) is called, function call is forwarded to
    sequence<chlit<char>, chlit<char>>::match(…) 

But how is parser<> template going to help? If we want the rule to hold a pointer to a parser<> object, we need to know the parser<> type, which in turn depends on our construct. In other words, now, instead of having infinite number of combinations between sequence<>, alternative<>, and chlit<> templates, we have all these combinations with parser<> combined. The solution is dynamic polymorphism. Using dynamic polymorphism we will derive our parser<> template from a common type that is independent of any other type. The parser template class becomes following:

struct abstract_parser
{
    abstract_parser() {};
    virtual ~abstract_parser() {};
    virtual bool match(string::iterator& iter, string& s) = 0;
}; 

template<typename T>
struct parser : abstract_parser
{
    T m_t;
    
    /* Default constructor */
    parser() : m_t() {}; 

    /* Copy constructor */
    parser(parser<T>& p) { m_t = p.m_t; }; 

    parser(T t) : m_t(t) {}; 

    bool match(string::iterator& iter, string& s)
    { return m_t.match(iter, s); };
}

As we can see there is pure virtual class abstract_parser, from which we derive our parser “holder”. In this way we can use pointer to abstract_parser to point to any parser<…> object. And using dynamic polymorphism when we call abstract_parser::match(…) function, vtable redirects the function call to appropriate parser function call (parser<…>::match(…)).

Considering our observations we need to perform following modifications to operator overloading for trivial reasons:

template<typename Left, typename Right>
parser<sequence<Left,Right> > operator>>(Left l, Right r)
{ return parser<sequence<Left, Right> >(sequence<Left, Right>(l, r)); }; 

template<typename Left, typename Right>
parser<alternative<Left,Right> > operator|(Left l, Right r)
{ return parser<alternative<Left, Right> >(alternative<Left, Right>(l, r)); }; 

Close observation reveals one difference (from previous operator overloading – see above): We are embedding our result into parser<> template, so that every result is of type abstract_parser.

Finally, let’s consider the rule template class:

template<typename T = abstract_parser>
struct rule
{
    boost::shared_ptr<T> ptr; 

    /* Default constructor */
    rule() : ptr() {}; 

    /* Copy constructor */
    rule(rule<T>& r) { ptr = r.ptr; }; 

    /* Destructor */
    ~rule() {}; 

    /* Parameterized constructor */
    template<typename ParserT>
    rule(ParserT& p)
    { ptr.reset(new ParserT(p)); }; 

    /* Match function */
    bool match(string::iterator& iter, string& s)
    { return(ptr->match(iter, s)); };
}

Rule template class contains smart pointer to an object of type specified by the template parameter. In our case, the pointer points to an abstract_parser object. Another key observation is template-parameterized constructor. Such constructor is capable of taking any type, constructing an object on the heap, and setting the pointer to the address of the newly created object. In this case the constructor creates new parser object on the heap (and destroys any previous parser automatically by using smart pointer). The template parameter for this constructor is always parser<…>, which in turn is an abstract_parser, so that rule<> always contains the pointer to correct parser. Rule<>’s match function is simply calling match function of the parser that it points to (vtable and virtual functions make sure this happens)!

Simple enough? Well, it was not for me. Reading through Spirit code and documentation, the most difficult part was to understand the mixture of static and dynamic polymorphism and interaction of these two. Especially tricky and probably performance limiting is necessity to utilize dynamic polymorphism. However, this might change! According to Spirit [3] documentation, C++ community is considering suggestion of David Abrahams to introduce auto keyword that could be utilized as following:

auto r = A >> B | C

 

At this point the compiler would derive rhs type and declare r to be of the same type. Such addition to C++ language could make Spirit “more” static.

Conclusion

Expression templates and template meta-programming is powerful tool for designing highly generic libraries. One such library is Spirit, whose complexity is far more reaching than explained here. It is important to stress that this article is trying to simply describe expression templates in Spirit context. The reader is still encouraged to refer to Spirit documentation as well as the documentation listed under references section.

References

[1] Expression Templates, Todd Veldhuizen
[2] C++ Templates, The Complete Guide, David Vandevoorde, Nicolai M. Josuttis
[3] Spirit library homepage: http://spirit.sourceforge.net/

Download

Download Source Code - MiniSpirit.zip (12.43 kb)

Posted in: CPP

Tags: , , , , , , ,

Add comment

biuquote
Loading