Evaluating a Simple Abstract Syntax Tree

Programming languages and their implementation is a topic I’ve been interested in for a long time and I thought it would be worth trying to get a bit more hands on and play with some of the ideas in this space. Choosing a topic somewhat at random I’ve chosen to take a look at implementing an Abstract Syntax Tree (AST).

What is an Abstract Syntax Tree?

An Abstract Syntax Tree is a way to represent a program’s source code within an interpreter/compiler. Consider an expression like 1 + 2, it could be represented by the following tree.

graph TD
   + --- 1
   + --- 2

One of the nice things about ASTs is that they remove some of the ambiguity that can exist in plain text. Consider the expression 1 + 2 * 3, there are two ways to interpret it and depending on which you choose you will get a different result - either (1 + 2) * 3 = 9 or 1 + (2 * 3) = 7 (rules like BODMAS dictate that we should choose the second version but there’s still a choice). This choice gets encoded into the structure of tree itself

graph TD
   subgraph "1 + (2 * 3)"
   p2[+] --- u1[1]
   p2[+] --- m2[x]
   m2[*] --- u2[2]
   m2[*] --- u3[3]
   end
   subgraph "(1 + 2) * 3"
   m1[*] --- p1[+]
   m1[*] --- v3[3]
   p1[+] --- v1[1]
   p1[+] --- v2[2]
   end

Encoding a program in this way allows the interpreter/compiler to focus on the semantic meaning of the program without having to worry about the finer details of how that program happens to be written down.

Representing an AST

For no particular reason other I fancied trying it, I decided to use C to represent my AST where each node is represented by the following struct.

typedef struct _ast {
    /* The type of node this represents e.g.
       a literal, an operator etc. */
    AstNodeType type;

    /* In the case of a literal value, this field
       holds the actual value */
    float value;

    /* In the case of an operator, this pointer
       refers to the left child node */
    struct _ast *left;

    /* In the case of an operator, this pointer
       refers to the right child node */
    struct _ast *right;
} AstNode;

Although the typedef will let us refer to this struct using the name AstNode from here on out, this name does not exist within the body of the struct definition itself. In order to have the struct hold pointers to other instances of the same type it’s necessary to also name the struct definition _ast

In order to express the example trees above we only need to specify a few node types

typedef enum {
    AST_LITERAL,
    AST_PLUS,
    AST_MULTIPLY,
} AstNodeType;

With our AST node representation defined, it’s easy enough to construct a tree for the two examples we outlined earlier

AstNode one = { .type = AST_LITERAL, .value = 1.0f };
AstNode two = { .type = AST_LITERAL, .value = 2.0f };
AstNode three = { .type = AST_LITERAL, .value = 3.0f };

AstNode plus1 = { .type = AST_PLUS, .left = &one, .right = &two};
AstNode example1 = { .type = AST_MULTIPLY, .left = &plus1, .right = &three};

AstNode mul1 = { .type = AST_MULTIPLY, .left = &two, .right = &three};
AstNode example2 = { .type = AST_PLUS, .left = &one, .right = &mul1};

Evaluating the AST

To evaluate an instance of the AST we have defined we can take a simple approach

  • If the node we are evaluating is an AST_LITERAL then all we have to do is return the value stored in that node

    float
    ast_evaluate(AstNode *ast)
    {
        switch(ast->type) {
        case AST_LITERAL:
            return ast->value;
    
    
  • In the case of AST_PLUS, we recursively call ast_evaluate on both the left and right branches of the tree and then add the resulting values together

        case AST_PLUS: {
            float a = ast_evaluate(ast->left);
            float b = ast_evaluate(ast->right);
    
            return a + b;
        }
    
  • Simiarly for AST_MULTIPLY, but returing a * b in this case.

        case AST_MULTIPLY: {
            float a = ast_evaluate(ast->left);
            float b = ast_evaluate(ast->right);
    
            return a * b;
        }
        }
    }
    
    

Calling this function on each of our example trees and printing the result we see that we indeed compute the values we would expect in each case

Example 1: 9.00
Example 1: 7.00

And there you have it, a calculator that only knows how to add and multiply floats! 😃 If you want to see the entire source file you can find it here.