Chapter 6
Hygiene System

Sections 2.6 and 2.7 of the tutorial give two simple ways to bend hygiene. Doing anything more advanced requires knowledge of how ZL’s hygiene system works. This chapter explains ZL’s hygiene system (Section 6.1) and how to bend it (Section 6.2).

In some ways ZL’s hygiene system is similar to the syntax-case system [6]. However, the data structures are different. A mark holds a lexical environment, and marks are applied during replace rather than to the input and result of a macro transformer. Special lookup rules search mark environments in lieu of maintaining a list of substitutions.

6.1 Implementation

During parsing, ZL maintains an environment that maps from one type of symbol to another. Symbols in the environment’s domain correspond to symbols in syntax objects, while each symbol in the environment’s range is generated to represent a particular binding. Symbols in syntax objects (and hence the environment domain) have a set of marks associated with them. The set of marks is considered part of the symbol’s identity. A mark is created with the new_mark primitive and applied to symbols during the replacement process (via replace). During this process, each symbol is either replaced, if it is a macro parameter, or marked. A mark also has an environment associated with it, which is the global environment at the site of the new_mark call.

When looking up a binding, the current environment is first checked. If a symbol with the same set of marks is not found in the current environment, then the outermost mark is stripped and the symbol is looked up in the environment associated with the stripped mark. This process continues until no more marks are left.
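The lookup rule can be modelled in a few lines of Python. This is only an illustrative sketch, not ZL's implementation: the Mark class, the lookup function, the bound-symbol names, and the representation of an environment as a dictionary keyed by (name, marks) are all assumptions made for the example.

```python
class Mark:
    """Hypothetical stand-in for a ZL mark: it remembers an environment."""
    def __init__(self, label, env):
        self.label = label
        self.env = env          # environment captured when the mark was created

def lookup(name, marks, env):
    """Resolve (name, marks) in env, stripping the outermost mark on each miss."""
    marks = tuple(marks)        # innermost mark first, outermost mark last
    while True:
        if (name, marks) in env:
            return env[(name, marks)]
        if not marks:
            return None         # no marks left: the symbol is unbound
        outer = marks[-1]
        marks = marks[:-1]
        env = outer.env         # continue in the environment of the stripped mark

# Hypothetical bindings: a global v, and a local scope that shadows it.
global_env = {("v", ()): "$v0"}
m = Mark("'0", global_env)
local_env = {**global_env, ("v", ()): "$v1", ("t", (m,)): "$t0"}

print(lookup("t", (m,), local_env))   # found directly, with its mark
print(lookup("v", (m,), local_env))   # mark stripped, resolved in the mark's environment
print(lookup("v", (), local_env))     # unmarked use sees the local binding
```

In this model the marked v bypasses the local binding entirely, which is the behaviour that keeps macro-introduced references pointing at the macro's definition site.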

6.1.1 An Illustrative Example

To better understand this process, consider the code in Figure 6.1.


  float r = 1.61803399;
  
  Syntax * make_golden(Syntax * syn, Environ * env) {
    Mark * mark = new_mark();
    Match * m = match_f(0, syntax (A,B,ADJ,FIX), syn);
    UnmarkedSyntax * r = syntax {
      for (;;) { float a = A, b = B;
                 float ADJ = (a - r*b)/(1 + r);
                 if (fabs(ADJ/(a+b)) > 0.01) FIX;
                 else break; }
    };
    return replace(r, m, mark);
  }
  make_macro make_golden;
  
  int main() {
    float q = 3, r = 2;
    make_golden(q, r, a, {q -= a; r += a;});
  }

Figure 6.1: Example code to illustrate how hygiene is maintained. The make_golden macro will test if A and B are within 1% of the golden ratio. If not, it will execute the code in FIX to try to fix the ratio (where the required adjustment will be stored in ADJ) and then try again until the golden ratio condition is satisfied.


When the first binding form “float r = ...” is parsed, r is bound to the unique symbol $r0, and the mapping r => $r0 is added to the current environment. When the function make_golden is parsed, it is added to the environment. When the new_mark() primitive is parsed inside the body of the function, the current global environment is remembered. The new_mark() primitive does not capture local variables, since it makes little sense to use them in the result of the macro. Next, “make_macro make_golden” is parsed, which makes the function make_golden into a macro.

Now the body of main is parsed. A new local environment is created. When “float q = 3, r = 2” is parsed, two unique symbols $q0 and $r1 are created and corresponding mappings are added to the local environment. At this point, we have:

  float $r0 = 1.61803399;
  [make_golden => ..., r => $r0]
  int main () {
    float $q0 = 3, $r1 = 2;
    [r => $r1, q => $q0, make_golden => ..., r => $r0]
    make_golden(q, r, a, {q -= a; r += a;});
  }

The expanded output is represented in this section as pseudo-syntax that is like the input language of ZL with some additional annotations. Variables starting with $ represent bound symbols. The [...] list represents the current environment in which new binding forms are added to the front of the list.

Now, make_golden is expanded and, in the body of main, we have:

    ...
    [r => $r1, q => $q0, make_golden => ..., r => $r0]
    for (;;) { float a’0 = q, b’0 = r;
               float a = (a’0 - r’0*b’0)/(1 + r’0);
               if (fabs(a/(a’0+b’0)) > 0.01)
                 {q -= a; r += a;}
               else break; }
    ’0 => [r => $r0]

where ’0 represents a mark and ’0 => [...] is the environment for the mark. Notice how marks keep the duplicate a and r’s in the expanded output distinct.
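The marking step that produces this expansion can be sketched in Python. The sketch is an assumption-laden model rather than ZL's replace: it treats the template as a flat list of identifiers (keywords and operators are left out), represents a symbol as a (name, marks) pair, and uses the string "'0" as a stand-in for the mark.

```python
def replace(template, params, mark):
    """Substitute macro parameters; mark every other symbol."""
    result = []
    for name in template:
        if name in params:
            result.append(params[name])      # argument syntax keeps its own marks
        else:
            result.append((name, (mark,)))   # template symbol receives the mark
    return result

# Identifiers from the first two declarations of the template in Figure 6.1.
template = ["a", "A", "b", "B", "ADJ", "a", "r", "b", "r"]
# Arguments of make_golden(q, r, a, ...), all unmarked at the call site.
params = {"A": ("q", ()), "B": ("r", ()), "ADJ": ("a", ())}

expanded = replace(template, params, "'0")
for name, marks in expanded:
    print(name + "".join(marks))
```

Because the marks are part of each symbol's identity, the a introduced by the template and the a supplied by the caller end up as different keys, even though they share a name.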

Now, the statement “float a’0 = q, b’0 = r” is compiled. Compiling the first part creates a unique symbol $a0 and the mapping a’0 => $a0 is added to the new environment inside the for loop. The variable q on the right-hand-side resolves to the $q0 symbol in the local environment. A similar process is performed for the second part. We now have:

     ...
     for (;;) { float $a0 = $q0, $b0 = $r1;
                [b’0 => $b0, a’0 => $a0, r => $r1,
                 q => $q0, ...]
                float a = (a’0 - r’0*b’0)/(1 + r’0);
                ...}
     ’0 => [r => $r0]

Next, the statement “float a = ...” is compiled. A unique symbol $a1 is created for a and the associated mapping is added to the local environment. Then the right-hand-side expression must be compiled. The variables a’0 and b’0 resolve to $a0 and $b0, respectively, since they are found in the local environment. However, r’0 is not found, so the mark ’0 is stripped, and r is looked up in the environment for the ’0 mark and resolves to $r0. We now have:

     ...
     for (;;) { ...
                float $a1 = ($a0 - $r0*$b0)/(1 + $r0);
                [a => $a1, b’0 => $b0, a’0 => $a0,
                 r => $r1, q => $q0, ...]
                if (fabs(a/(a’0+b’0)) > 0.01)
                  {q -= a; r += a;}
                else break; }
     ’0 => [r => $r0]

Next, the if is compiled. The marks keep the two a variables in the expression a/(a’0+b’0) distinct, and everything correctly resolves. Thus, we finally have:

  float $r0 = 1.61803399;
  int main() {
    float $q0 = 3, $r1 = 2;
    for (;;) { float $a0 = $q0, $b0 = $r1;
               float $a1 = ($a0 - $r0*$b0)/(1 + $r0);
               if (fabs($a1/($a0+$b0)) > 0.01)
                 {$q0 -= $a1; $r1 += $a1;}
               else break; }
  }

Hence, all symbols are correctly bound and hygiene is maintained.

6.1.2 Multiple Marks

The symbols in the expansion of make_golden only had a single mark applied to them. However, in some cases, such as when macros expand to other macros, multiple marks are needed. For example, multiple marks are needed in the expansion of plus_10 in Figure 6.2.


  macro mk_plus_n (NAME, N) {
    macro NAME (X) { ({int x = X; x + N;}); }
  }
  
  static const int x = 10;
  mk_plus_n(plus_10, x);
  
  int main() {
    int x = 20;
    return plus_10(x);
  }

Figure 6.2: Example code to show how hygiene is maintained when a macro expands to another macro.


In this figure, mk_plus_n expands to

  macro plus_10 (X’0) { ({int x’0 = X’0; x’0 + x;}); }

where the first mark ’0 is applied. A second mark is then applied in the expansion of plus_10(x) in main:

  { ({int x’0’1 = x; x’0’1 + x’1;}) }

In particular, a second mark is added to x’0, making it x’0’1. This symbol then resolves to the x local to the macro plus_10. In addition, x’1 resolves to the global x constant and the unmarked x resolves to the x local to main. Thus, hygiene is maintained in spite of three different x’s in the expansion.
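The resolution of the three x's can be replayed with a small Python model of the lookup rule. The model is a sketch under assumptions: Mark, lookup, the bound-symbol names, and the dictionary-based environments are all hypothetical, and the environment attached to the first mark is left empty because it is never consulted in this example.

```python
class Mark:
    """Hypothetical stand-in for a ZL mark with its captured environment."""
    def __init__(self, label, env):
        self.label = label
        self.env = env

def lookup(name, marks, env):
    """Strip the outermost mark and switch environments until a binding is found."""
    marks = tuple(marks)
    while (name, marks) not in env:
        if not marks:
            return None
        env = marks[-1].env
        marks = marks[:-1]
    return env[(name, marks)]

global_env = {("x", ()): "$x_global"}    # static const int x = 10;
m0 = Mark("'0", {})                      # mark from expanding mk_plus_n (not consulted here)
m1 = Mark("'1", global_env)              # mark from expanding plus_10

# Environment inside the expansion of plus_10(x) in main.
env = {**global_env,
       ("x", ()): "$x_main",             # int x = 20;
       ("x", (m0, m1)): "$x_macro"}      # int x'0'1 = x;

print(lookup("x", (m0, m1), env))        # the x local to plus_10
print(lookup("x", (m1,), env))           # the global constant
print(lookup("x", (), env))              # the x local to main
```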

6.1.3 Structure Fields

Normal hygiene rules will not have the desired effect when accessing fields of a structure or class. Instead of trying to look up a symbol in the current environment, we are asking to look up a symbol within a specialized subenvironment.

For example, the following code will not work with normal hygiene rules:

  macro sum(q) {q.x + q.y;}
  struct S {int x; int y;};
  int f() {
    struct S p;
    ...
    return sum(p);
  }

The problem is that sum(p) will not be able to access the fields of p, since it will expand to “p.x’0 + p.y’0” with marks on x and y. The solution is to use a special lookup rule for structure fields. The rule is that if the current symbol with its set of marks is not found in the structure, the outermost mark is stripped and the lookup is tried again, repeating until no more marks are left. This process is similar to the normal lookup rule, except that the subenvironment associated with the mark is ignored since it is irrelevant. In the above example, p.x’0 in the expansion of sum(p) will resolve to the structure field x in struct S.

A similar strategy is used when accessing members inside a module or user type (including classes). For example, looking for X::f’0() will also find X::f().
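The field lookup rule differs from the normal rule only in that it never leaves the structure's own table. A minimal Python sketch, assuming a hypothetical lookup_field function and a field table keyed by (name, marks), with the string "'0" standing in for a mark:

```python
def lookup_field(fields, name, marks):
    """Look up a field, stripping outermost marks but never switching tables."""
    marks = tuple(marks)
    while True:
        if (name, marks) in fields:
            return fields[(name, marks)]
        if not marks:
            return None              # no such field
        marks = marks[:-1]           # the mark's environment is ignored

# Fields of struct S, declared without marks.
struct_s = {("x", ()): "S.x", ("y", ()): "S.y"}

print(lookup_field(struct_s, "x", ("'0",)))   # p.x'0 from the expansion of sum(p)
print(lookup_field(struct_s, "z", ("'0",)))   # a field that does not exist
```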

6.1.4 Importing Symbols from a Module

The handling of marks when importing the symbols of a module is tricky. Importing a module whose name is provided as a parameter should not expose additional symbols that are not normally visible inside the macro. On the other hand, if a macro imports a module whose name is not a parameter, the module’s symbols should only be visible to the macro and not outside of it. The following example illustrates which imported symbols should be visible from where:

  module X {int x = 1;}
  
  module Y {int y = 2;}
  
  macro foo(VAR,MOD) {
    module Z {int VAR = 3;
              int z = 4;}
    import X; import MOD; import Z;
    printf("%d %d %d\n", x, VAR, z);
    // Y::y is not visible
  }
  
  int main() {
    foo(v,Y);
    printf("%d %d\n", y, v);
    // X::x and Z::z not visible
  }

The context of the imported symbols is defined by the marks that are stripped in order to find the module. More specifically, the stripped marks are applied to every imported symbol. In the above example, when foo imports the module X, the macro’s mark is stripped to find it; this mark is then applied to X::x, making the imported symbol visible to the macro but not to the caller. In contrast, when the module Y (from the pattern variable MOD) is imported in the expansion of foo(v,Y), no marks are applied to y, so it is visible to the caller but not to the macro. Finally, when the module Z is imported in the macro, no additional marks are applied: the module was defined by the macro, so the module name already has a mark on it, and no marks are stripped to resolve the module’s symbol. If additional marks were applied when importing the Z module, the symbol v (from the pattern variable VAR) would not be visible outside the macro.
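The import rule can be sketched in Python as a variant of the lookup rule that remembers which marks it stripped. Everything here is an assumption made for illustration: the Mark class, the import_module function, the bound-symbol names, and the modelling of a module as a dictionary of (name, marks) keys.

```python
class Mark:
    """Hypothetical stand-in for a ZL mark with its captured environment."""
    def __init__(self, label, env):
        self.label = label
        self.env = env

def import_module(mod_name, marks, env):
    """Return the bindings an import adds, keyed by (name, marks)."""
    marks = tuple(marks)
    stripped = ()
    while (mod_name, marks) not in env:
        if not marks:
            raise LookupError(mod_name)
        outer = marks[-1]
        marks = marks[:-1]
        stripped = (outer,) + stripped     # remember marks stripped to find the module
        env = outer.env
    module = env[(mod_name, marks)]
    # Apply the stripped marks to every symbol the module provides.
    return {(sym, sym_marks + stripped): bound
            for (sym, sym_marks), bound in module.items()}

# Global modules X and Y, visible where the macro foo is defined.
global_env = {("X", ()): {("x", ()): "$x0"},
              ("Y", ()): {("y", ()): "$y0"}}
m0 = Mark("'0", global_env)

# Inside the expansion of foo(v, Y): Z was defined by the macro, so its name is
# marked; VAR was replaced by the caller's unmarked v, while z carries the mark.
expansion_env = {**global_env,
                 ("Z", (m0,)): {("v", ()): "$v0", ("z", (m0,)): "$z0"}}

from_x = import_module("X", (m0,), expansion_env)   # import X'0
from_y = import_module("Y", (), expansion_env)      # import Y (from MOD)
from_z = import_module("Z", (m0,), expansion_env)   # import Z'0
```

In this model from_x contains only a marked x, from_y contains only an unmarked y, and from_z keeps v unmarked while z stays marked, matching the visibility described in the comments of the example.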

6.2 Bending Hygiene

6.2.1 Exporting Marked Symbols

One common reason to bend hygiene is to make macro-introduced symbols visible outside the macro. To achieve this effect, ZL provides a way to export symbols introduced by the macro.

As already mentioned in Section 2.6, ZL allows top-level symbols to be exported in pattern-based macros (and simple procedural macros) by using:

  macro foo() :(*) {/* body */}

Additionally, individual symbols can be exported. For example, to export the symbols x and y:

  macro bar() :(x,y) {/* body */}

Exporting is handled using two primitives that can be used directly by lower level procedural macros.

Exporting Specific Symbols. Exporting specific symbols is handled using the macro_export primitive. For example, the macro bar could be written as:

  Syntax * bar(Syntax * call, Syntax * parms) {
     Mark * mark = ...;
     Match * repl = match_f(NULL, syntax [NAME], call);
     return replace(syntax {
       __raw(macro_export NAME (symbol x) (symbol y));
       /* body */
     }, repl, mark);
  }
  make_macro bar;

(As macro_export is not a commonly used form, ZL does not provide support for it in the higher-level syntax; hence the __raw form is used, which switches the syntax to the lower-level s-expression form until the end of the current statement.)

The macro_export form takes a pattern variable, NAME, which supplies the context in which to export the symbols. After expansion the macro_export line would become:

  __raw(macro_export’ bar (symbol’ x’) (symbol’ y’))

where marks are applied to x and y, and bar gets the context of the call site, which in most cases is the empty context (i.e., no marks).

Exporting Top-Level Symbols. Exporting all top-level symbols with a specific mark is handled by calling the new_mark_f function directly and using a special form of the macro_export primitive.

The full version of the new_mark_f function is:

  Mark * new_mark_f(EnvironSnapshot *, bool export_tl,
                    Context * export_to, Context * also_allow);

If export_tl is true, then all top-level-like symbols that contain the newly created mark are exported to export_to; this includes structure fields and module symbols.

If a macro is expanded inside a function, then any symbols created in the outermost scope of the expansion are not exported, as the environment inside a function is not a top-level one. Since exporting these symbols is often desirable, ZL provides a special form of the macro_export primitive, which is used as follows:

  __raw(macro_export NAME tl_this_mark)

This form will export all symbols in the same scope where the directive was used. Unlike with the symbol directive, symbols in an inner scope are not exported. For example, given

  __raw(macro_export NAME tl_this_mark)
  int x;
  {
    int y = 2;
    x = y;
  }

the symbol x will be exported, but y will not.

For simplicity of implementation, only marks created with export_tl set to true can be used with the tl_this_mark directive. In addition, the context of NAME is ignored and the context of export_to (given when creating the mark) is used instead.

Putting it all together, the macro foo() could be written as:

  Syntax * foo(Syntax * call, Syntax * parms) {
    Match * repl = match_f(NULL, syntax [NAME], call);
    Mark * mark = new_mark_f(environ_snapshot(), true,
                             get_context(repl->var(syntax NAME)), NULL);
    return replace(syntax {
      __raw(macro_export NAME tl_this_mark);
      /* body */
    }, repl, mark);
  }

The also_allow parameter is most useful in the context of macros expanding to other macros and is thus not used here.

Implementation. The exporting of symbols is handled by creating an alias to the original symbol in the desired context when a marked symbol is defined. This way uses of the symbol inside the macro will unconditionally bind to the symbol the macro defined, while also making the symbol visible outside the macro.
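The aliasing idea can be pictured with a short Python sketch. It is a model under assumptions, not ZL's implementation: the define function, its export_to argument, the bound-symbol names, and the dictionary environment keyed by (name, marks) are hypothetical, and the string "'0" stands in for the macro's mark.

```python
def define(env, name, marks, bound, export_to=None):
    """Bind (name, marks); optionally alias it in the context export_to."""
    env[(name, marks)] = bound
    if export_to is not None:
        env[(name, export_to)] = bound     # alias visible in the exported context

env = {}
mark = "'0"                    # label standing in for the macro's mark
call_site = ()                 # empty context: no marks

define(env, "x", (mark,), "$x0", export_to=call_site)   # exported symbol
define(env, "tmp", (mark,), "$tmp0")                    # not exported

print(env[("x", (mark,))])     # the macro's own view of x
print(env[("x", ())])          # the caller's view: the same binding, via the alias
print(("tmp", ()) in env)      # tmp has no alias outside the macro
```

Both keys for x map to the same bound symbol, so uses inside the macro still bind to the macro's definition while the caller sees it as well.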

When a new symbol is introduced by importing a module, the additional marks are first applied as outlined in Section 6.1.4. Then, after the marks are applied, the new symbol is considered for exporting using the same criteria as if the symbol were defined normally, except that the tl_this_mark directive is never used. This directive is not used because it is unclear how it should be handled, as symbols imported from a module are normally not considered to be in the same scope.

6.2.2 Fluid Binding

The other common case for wanting to bend hygiene is to make symbols visible to the macro parameters. For this ZL provides fluid_binding, which allows a variable to take its meaning from the use site of a macro rather than the macro’s definition site, in a similar fashion to define-syntax-parameter in Racket [84].

Section 2.7 of the tutorial gave one example of the need for fluid_binding; another prime example is the special variable this in classes. Variables in ZL are lexically scoped. For example, the code:

  int g(X *);
  int f() {return g(this);}
  int main() {X * this = ...; return f();}

will not compile because the this defined in main is not visible in f, even though f is called inside main. However, if the this variable was instead dynamically scoped, the this in main would be visible to f.

Normal hygiene rules preserve lexical scope in a similar fashion, such that:

  int g(X *);
  macro m() {g(this);}
  int main() {X * this = ...; return m();}

will also not compile. Attempts to make this work with get_context and replace_context will not compose well [4]. What is really needed is something akin to dynamic scoping in the hygiene system. That is, this should be scoped based on where it is used when expanded, rather than where it is written in the macro definition. This can be done by marking the this symbol as fluid using fluid_binding at the top level and then using fluid when defining the symbol in local scope. For example:

  fluid_binding this;
  int g(X *);
  macro m() {g(this);}
  int main() {X * fluid this = ...; return m();}

will work as expected. That is, the this in m will bind to the this in main.
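The text does not say how fluid_binding is implemented. One way to picture the observable behaviour, assumed here purely for illustration, is a two-step lookup: the name first resolves (with normal mark stripping) to a fluid placeholder, and the placeholder is then looked up in the environment at the use site. The Mark and Fluid classes, the lookup and resolve functions, and the bound-symbol names are all hypothetical.

```python
class Mark:
    """Hypothetical stand-in for a ZL mark with its captured environment."""
    def __init__(self, label, env):
        self.label = label
        self.env = env

class Fluid:
    """Hypothetical placeholder for a symbol declared with fluid_binding."""
    def __init__(self, name):
        self.name = name

def lookup(name, marks, env):
    """Normal lookup: strip outermost marks, switching to each mark's environment."""
    marks = tuple(marks)
    while (name, marks) not in env:
        if not marks:
            return None
        env = marks[-1].env
        marks = marks[:-1]
    return env[(name, marks)]

def resolve(name, marks, env):
    """Two-step lookup: a fluid placeholder is resolved again at the use site."""
    bound = lookup(name, marks, env)
    if isinstance(bound, Fluid):
        return env.get(bound)     # second step uses the use-site environment
    return bound

this_fluid = Fluid("this")
global_env = {("this", ()): this_fluid}            # fluid_binding this;
m0 = Mark("'0", global_env)                        # mark for the macro m
main_env = {**global_env, this_fluid: "$this1"}    # X * fluid this = ...;

print(resolve("this", (m0,), main_env))    # this'0 from m() binds to main's this
print(resolve("this", (m0,), global_env))  # no fluid rebinding in scope
```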

6.2.3 Replacing Context

When all other methods of bending hygiene fail, ZL provides the analog to datum->syntax-object in the syntax-case expander [5] in the form of two functions, get_context and replace_context, shown in Figure 6.3, which get and replace the context (i.e., the set of marks) associated with a symbol.


Type Context with related functions:

Context * get_context(Syntax *)

Syntax * replace_context(UnmarkedSyntax *, Context *)


Figure 6.3: Visibility API.


For example, a macro defining a class needs to create a vtable that is accessible outside of the macro creating the class. The get_context function gets the context from some symbol, generally some part of the syntax object passed in, while replace_context replaces the context of the symbol with the one provided. For example, code to create a symbol _vtable that can be used later might look something like:

  ...
  Match * m = match_f(0, raw_syntax (name ...), p);
  Syntax * name = m->var(m, syntax name);
  Context * context = get_context(name);
  Syntax * _vtable = replace_context(syntax _vtable, context);
  ...

Here name is the name of the class, captured in the match m. The name symbol is extracted into a syntax object so that it can be used for get_context. The replace_context function is then used to put the symbol _vtable in the same context as name. Now _vtable will have the same visibility as the name symbol, and thus be visible outside the macro.

6.2.4 Gensym

ZL does not provide an equivalent to Lisp’s gensym, as it is simply not needed. In the rare case that a truly unique symbol is needed, a fresh mark with an empty context can be used:

  Syntax * name = replace(syntax anon, NULL, new_empty_mark());
