ruby / haml renderer

Templates in my web framework use a restricted subset of Haml. A custom renderer replaced the haml gem. It parses and renders only the subset grammar, rejecting everything else at parse time.

The renderer is two files, ~1,200 lines total.

Why replace the gem

The haml gem evaluates arbitrary Ruby at render time. Templates can call methods, access constants, assign variables. The template subset uses none of this.

A custom renderer enforces the subset by construction: if the parser has no node type for a construct, it can't appear in templates. This removes Ruby eval from the rendering path and drops a dependency.

The prerequisite was restricting all ~360 templates to the dumb subset first: moving method calls, hash access, and formatting into handlers with Data.define structs. Once every template conformed, a CI linter prevented regressions, and the renderer could be built against a frozen grammar.

Parser

Haml::Subset.new(source, path:) parses source into a tree at construction time. Lines are classified into node types:

:doctype     # !!!
:comment     # -# ...
:filter      # :javascript, :css
:if          # - if expr
:elsif       # - elsif expr
:else        # - else
:each        # - collection.each do |item|
:render      # = render "name", key: value
:output      # = expr (HTML-escaped)
:raw_output  # != expr (raw)
:tag         # %tag.class#id{ attrs }
:text        # static text

There is no :eval or :ruby node. A method call, constant reference, or variable assignment has no node type to parse into, so the parser raises.
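A minimal sketch of how line classification could work, assuming each stripped line is matched against an ordered list of patterns. The classify method and its regexes are hypothetical, not the renderer's actual code. The point is the fall-through: a line starting with - that is not a known control construct has nowhere to go.

```ruby
# Hypothetical classifier: maps one stripped template line to a node type.
# Order matters: "-#" must be tested before other "-" lines, "!=" before "=".
def classify(stripped)
  case stripped
  when /\A!!!/                             then :doctype
  when /\A-#/                              then :comment
  when /\A:(javascript|css)\z/             then :filter
  when /\A-\s*if\s+\S/                     then :if
  when /\A-\s*elsif\s+\S/                  then :elsif
  when /\A-\s*else\z/                      then :else
  when /\A-\s*.+\.each\s+do\s+\|\w+\|\z/   then :each
  when /\A-/
    # Any other Ruby line (assignment, method call, ...) has no node type.
    raise ArgumentError, "not in subset: #{stripped}"
  when /\A=\s*render\b/                    then :render
  when /\A!=/                              then :raw_output
  when /\A=/                               then :output
  when /\A%\w/, /\A\.-?[a-zA-Z_]/, /\A#[a-zA-Z_]/ then :tag
  else :text
  end
end

classify("- items.each do |item|")  # :each
classify("= data.name")             # :output
# classify("- total = a + b") raises ArgumentError
```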

Indentation determines nesting. The parser walks lines at each indent level and recursively parses children:

private def parse(lines, base_indent, from, to)
  nodes = []
  i = from

  while i < to
    line = lines[i]
    stripped = line.lstrip
    indent = line.length - stripped.length

    # Blank lines carry no structure; skip them.
    if stripped.empty?
      i += 1
      next
    end

    # Every sibling at this level must sit at exactly base_indent.
    if indent != base_indent
      raise "#{@path}:#{i + 1}: expected indent #{base_indent}, got #{indent}"
    end

    # Children are the following lines that are blank or indented deeper.
    # The first non-blank line at or below this indent ends the block.
    child_end = i + 1
    while child_end < to
      next_line = lines[child_end]
      next_stripped = next_line.lstrip
      if !next_stripped.empty? && (next_line.length - next_stripped.length) <= indent
        break
      end
      child_end += 1
    end

    node = parse_line(stripped, indent, lines, i + 1, child_end)
    nodes << node
    i = child_end
  end

  nodes
end
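To see the walk in isolation, here is a simplified standalone version. The nest method and its { text:, children: } node shape are hypothetical stand-ins for parse and parse_line; the two-space child indent is taken from the indent + 2 used in the tag parser.

```ruby
# Hypothetical, simplified version of the indentation walk.
# Each node is { text:, children: }; children sit exactly two spaces deeper.
def nest(lines, base_indent = 0, from = 0, to = lines.length)
  nodes = []
  i = from
  while i < to
    stripped = lines[i].lstrip
    if stripped.empty?
      i += 1
      next
    end

    indent = lines[i].length - stripped.length
    if indent != base_indent
      raise ArgumentError, "line #{i + 1}: expected indent #{base_indent}, got #{indent}"
    end

    # Extend the child range until a non-blank line at or below this indent.
    child_end = i + 1
    while child_end < to
      nxt = lines[child_end]
      nxt_stripped = nxt.lstrip
      break if !nxt_stripped.empty? && (nxt.length - nxt_stripped.length) <= indent
      child_end += 1
    end

    nodes << { text: stripped, children: nest(lines, indent + 2, i + 1, child_end) }
    i = child_end
  end
  nodes
end

tree = nest(["%ul", "  %li", "    One", "  %li", "    Two", "%p"])
tree.map { |n| n[:text] }                # ["%ul", "%p"]
tree[0][:children].map { |n| n[:text] }  # ["%li", "%li"]
```

A child indented by three spaces instead of two fails the indent check one level down, which is where the "expected indent" error comes from.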

The tag parser extracts the tag name, classes, ID, and attributes. The excerpt below shows the name, class, and ID handling; attribute parsing is not shown:

private def parse_tag(stripped, indent, lines, child_from, child_to)
  rest = stripped.dup
  tag_name = "div"
  classes = []
  id = nil

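  if rest.start_with?("%")
    m = rest.match(/\A%(\w[\w-]*)/)
    # A bare "%" would otherwise fail with NoMethodError on nil.
    raise "#{@path}: invalid tag name: #{stripped}" if m.nil?
    tag_name = m[1]
    rest = rest[m[0].length..]
  end

  # Consume .class and #id shorthand in any order.
  while rest.match?(/\A[.#]/)
    if rest.start_with?(".")
      m = rest.match(/\A\.(-?[a-zA-Z_][\w-]*)/)
      raise "#{@path}: invalid class name: #{stripped}" if m.nil?
      classes << m[1]
    else
      m = rest.match(/\A#([a-zA-Z_][\w-]*)/)
      raise "#{@path}: invalid id: #{stripped}" if m.nil?
      id = m[1]
    end
    rest = rest[m[0].length..]
  end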

  # Reject inline content: inner content must be on a new line
  rest = rest.strip
  if rest != ""
    raise "#{@path}: inline content on tags is not allowed: #{stripped}"
  end

  children = parse(lines, indent + 2, child_from, child_to)
  { type: :tag, tag: tag_name, classes: classes, id: id,
    children: children }
end

Inline content on tags is banned. %h1 Title must be written as:

%h1
  Title

This simplifies parsing (every tag's content is children) and makes the structure explicit.

Expression evaluator

Expressions in = field, - if expr, and #{} interpolation go through Haml::Expr, a recursive-descent parser with a constrained grammar:

expr         ::= or_expr
or_expr      ::= and_expr ('||' and_expr)*
and_expr     ::= not_expr ('&&' not_expr)*
not_expr     ::= '!' not_expr | cmp_expr
cmp_expr     ::= primary (('==' | '!=') primary)?
primary      ::= STRING | NUMBER | BOOL | NIL | field_access
field_access ::= IDENT ('.' IDENT)*
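The grammar maps onto recursive descent one method per rule. The following is a sketch, not the actual Haml::Expr parser: the ExprParser class name and the [kind, value] token shape are assumptions. The node hashes use the keys (:type, :left, :right, :op, :operand, :parts, :value) that the evaluator shown later reads.

```ruby
# Hypothetical recursive-descent parser: one method per grammar rule.
# Tokens are assumed to be [kind, value] pairs, e.g. [:ident, "data"].
class ExprParser
  def initialize(tokens)
    @tokens = tokens
    @pos = 0
  end

  def parse_expr
    node = parse_or
    raise ArgumentError, "unexpected token #{peek.inspect}" if peek
    node
  end

  private

  def peek
    @tokens[@pos]
  end

  def op?(value)
    peek == [:op, value]
  end

  def advance
    tok = @tokens[@pos]
    @pos += 1
    tok
  end

  def parse_or
    left = parse_and
    while op?("||")
      advance
      left = { type: :or, left: left, right: parse_and }
    end
    left
  end

  def parse_and
    left = parse_not
    while op?("&&")
      advance
      left = { type: :and, left: left, right: parse_not }
    end
    left
  end

  def parse_not
    if op?("!")
      advance
      return { type: :not, operand: parse_not }
    end
    parse_cmp
  end

  def parse_cmp
    left = parse_primary
    if op?("==") || op?("!=")
      op = advance[1]
      return { type: :cmp, op: op, left: left, right: parse_primary }
    end
    left
  end

  def parse_primary
    kind, value = advance
    case kind
    when :string, :number, :bool then { type: kind, value: value }
    when :nil then { type: :nil }
    when :ident
      parts = [value]
      while op?(".")
        advance
        k, v = advance
        raise ArgumentError, "expected identifier after '.'" unless k == :ident
        parts << v
      end
      { type: :field, parts: parts }
    else
      raise ArgumentError, "unexpected token #{[kind, value].inspect}"
    end
  end
end
```

Because primary has no rule for a method call with arguments or an arithmetic operator, such input stops at parse_expr's trailing-token check instead of being evaluated.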

Evaluation has three stages: tokenize, parse to an AST, evaluate.

def self.eval_string(src, ctx)
  tokens = tokenize(src.strip)
  parser = Parser.new(tokens)
  node = parser.parse_expr
  evaluate(node, ctx)
end
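The tokenizer is not shown here, so the following is only a sketch of what the first stage might look like using StringScanner from the standard library. The [kind, value] token shape, integer-only numbers, and double-quoted strings without escapes are all assumptions.

```ruby
require "strscan"

# Hypothetical tokenizer for the grammar above.
# Assumes integer numbers and double-quoted strings with no escape sequences.
def tokenize(src)
  ss = StringScanner.new(src)
  tokens = []
  until ss.eos?
    if ss.skip(/\s+/)
      next
    elsif ss.scan(/"([^"]*)"/)
      tokens << [:string, ss[1]]
    elsif ss.scan(/\d+/)
      tokens << [:number, ss.matched.to_i]
    elsif ss.scan(/(?:true|false)\b/)
      tokens << [:bool, ss.matched == "true"]
    elsif ss.scan(/nil\b/)
      tokens << [:nil, nil]
    elsif ss.scan(/[a-zA-Z_]\w*/)
      tokens << [:ident, ss.matched]
    elsif ss.scan(/\|\||&&|==|!=|!|\./)
      # "!=" is listed before "!" so the longer operator wins.
      tokens << [:op, ss.matched]
    else
      raise ArgumentError, "unexpected character #{ss.peek(1).inspect} at #{ss.pos}"
    end
  end
  tokens
end

tokenize('data.name == "x"')
# [[:ident, "data"], [:op, "."], [:ident, "name"], [:op, "=="], [:string, "x"]]
```

Anything outside the token set, such as + or a parenthesis, is rejected here before the parser ever runs.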

The evaluator walks the AST and resolves values against a context object:

def self.evaluate(node, ctx)
  case node[:type]
  when :string  then node[:value]
  when :number  then node[:value]
  when :bool    then node[:value]
  when :nil     then nil
  when :field   then eval_field(node[:parts], ctx)
  when :cmp
    left = evaluate(node[:left], ctx)
    right = evaluate(node[:right], ctx)
    case node[:op]
    when "==" then left == right
    when "!=" then left != right
    end
  when :and
    evaluate(node[:left], ctx) && evaluate(node[:right], ctx)
  when :or
    evaluate(node[:left], ctx) || evaluate(node[:right], ctx)
  when :not
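    !evaluate(node[:operand], ctx)
  else
    # Unknown node types fail loudly instead of silently evaluating to nil.
    raise ArgumentError, "unknown expression node type: #{node[:type].inspect}"
  end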
end

Field access resolves through send:

def self.eval_field(parts, ctx)
  val = ctx.send(parts[0].to_sym)
  i = 1
  while i < parts.length
    val = val.send(parts[i].to_sym)
    i += 1
  end
  val
end

= data.name becomes ctx.send(:data).send(:name), which works with Data.define structs and singleton methods on the context.
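A standalone illustration of the same chain walk. The resolve method, the account struct, and the context setup are hypothetical. A Struct stands in for the Data.define value so the sketch also runs on Ruby older than 3.2; both expose the same reader methods.

```ruby
# Same chain walk as eval_field, written standalone for illustration.
def resolve(parts, ctx)
  parts.reduce(ctx) { |val, name| val.send(name.to_sym) }
end

# Hypothetical handler data; the real code would use Data.define.
account_class = Struct.new(:name, :admin)
account = account_class.new("Ada", true)

# Locals are exposed as singleton methods on a plain object.
ctx = Object.new
ctx.define_singleton_method(:data) { account }

resolve(["data", "name"], ctx)   # "Ada"
resolve(["data", "admin"], ctx)  # true
```

A field that does not exist raises NoMethodError, so a typo in a template surfaces as an error instead of rendering as an empty string.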

Beyond the grammar shown above, the evaluator also handles hash literals with ** splat (for tag attributes), array literals, interpolated strings, and function calls (for ViewHelper methods on the context).

Rendering

The renderer walks the AST, appending HTML to a buffer:

private def render_nodes(nodes, buf, ctx, partial_renderer)
  i = 0
  while i < nodes.length
    node = nodes[i]
    case node[:type]
    when :doctype
      buf << "<!DOCTYPE html>\n"
    when :comment
      nil
    when :text
      buf << Expr.interpolate(node[:text], ctx) << "\n"
    when :output
      buf << escape_val(Expr.eval_string(node[:expr], ctx)) << "\n"
    when :raw_output
      buf << Expr.eval_string(node[:expr], ctx).to_s << "\n"
    when :render
      buf << render_partial_call(node[:expr], ctx, partial_renderer)
    when :tag
      render_tag(node, buf, ctx, partial_renderer)
    when :filter
      render_filter(node, buf, ctx)
    when :if
      chain = [node]
      while i + 1 < nodes.length &&
          (nodes[i + 1][:type] == :elsif || nodes[i + 1][:type] == :else)
        i += 1
        chain << nodes[i]
      end
      render_conditional(chain, buf, ctx, partial_renderer)
    when :each
      render_each(node, buf, ctx, partial_renderer)
    end
    i += 1
  end
end

= expr always HTML-escapes. != expr outputs raw. Handlers pre-escape anything that needs !=.
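escape_val is not shown above. A plausible minimal version, assuming it only stringifies the value and delegates to CGI.escapeHTML from the standard library:

```ruby
require "cgi/escape"

# Hypothetical escape_val: stringify, then escape &, <, >, " and '.
def escape_val(val)
  CGI.escapeHTML(val.to_s)
end

name = %(<script>alert("x")</script>)

escaped = escape_val(name)  # what `= name` would write to the buffer
raw     = name.to_s         # what `!= name` would write to the buffer

escaped  # "&lt;script&gt;alert(&quot;x&quot;)&lt;/script&gt;"
```

Calling to_s first means nil renders as an empty string and numbers render as their digits.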

Tags emit opening and closing HTML with escaped attributes:

private def render_tag(node, buf, ctx, partial_renderer)
  tag = node[:tag]
  attrs = build_attrs(node, ctx)
  attr_str = attrs.map { |k, v|
    if v == true
      " #{k}"
    else
      " #{k}=\"#{CGI.escapeHTML(v.to_s)}\""
    end
  }.join

  if VOID_ELEMENTS.include?(tag)
    buf << "<#{tag}#{attr_str}>\n"
    return
  end

  if node[:children] != []
    buf << "<#{tag}#{attr_str}>\n"
    render_nodes(node[:children], buf, ctx, partial_renderer)
    buf << "</#{tag}>\n"
  else
    buf << "<#{tag}#{attr_str}></#{tag}>\n"
  end
end

Context

Templates receive data through a context object. Locals become singleton methods:

private def make_context(locals, context: nil)
  env = context || Object.new
  locals.each do |k, v|
    env.define_singleton_method(k) { v }
  end
  env
end

Each loop iteration clones the context before binding the loop variable, so the parent's bindings are never mutated:

private def clone_context(ctx, name, value)
  child = ctx.clone
  child.define_singleton_method(name.to_sym) { value }
  child
end

Without cloning, - items.each do |data| would overwrite the parent data local for the rest of the template.
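This relies on Object#clone copying singleton methods, which Object#dup does not. A small demonstration using the clone_context method from above on a hypothetical parent context:

```ruby
def clone_context(ctx, name, value)
  child = ctx.clone
  child.define_singleton_method(name.to_sym) { value }
  child
end

# Hypothetical parent context with two locals.
parent = Object.new
parent.define_singleton_method(:data)  { "page" }
parent.define_singleton_method(:title) { "Home" }

child = clone_context(parent, "data", "row")

child.data    # "row"   (shadowed in the clone only)
child.title   # "Home"  (inherited: clone copies singleton methods)
parent.data   # "page"  (parent is untouched)

parent.dup.respond_to?(:title)  # false: dup would lose every local
```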

The renderer is the linter

Before the custom renderer, a regex-based linter scanned templates for banned constructs. It was incomplete. Regexes couldn't parse nested expressions, and every new violation pattern needed a new rule.

The custom renderer replaced the linter. Templates are parsed at boot by cache_all. A template with a construct outside the subset crashes the process before it serves a request.
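cache_all itself is not shown. As a rough sketch of the fail-fast idea, assuming it takes templates as a path-to-source hash and a parser callable: both the signature and the strict stand-in parser below are hypothetical, with the lambda playing the role of Haml::Subset.new(source, path:).

```ruby
# Hypothetical boot-time cache: parse every template once.
# The first parse error propagates and aborts boot.
def cache_all(sources, parser:)
  sources.each_with_object({}) do |(path, source), cache|
    cache[path] = parser.call(source, path)
  end
end

# Stand-in parser that rejects variable assignment lines such as "- x = 1".
strict = lambda do |source, path|
  source.each_line.with_index(1) do |line, n|
    if line.match?(/\A\s*-\s*\w+\s*=[^=]/)
      raise ArgumentError, "#{path}:#{n}: not in subset: #{line.strip}"
    end
  end
  source.lines
end

cache = cache_all({ "ok.haml" => "%p\n  Hi\n" }, parser: strict)
cache.keys  # ["ok.haml"]

# cache_all({ "bad.haml" => "%p\n- x = 1\n" }, parser: strict)
# raises ArgumentError: bad.haml:2: not in subset: - x = 1
```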

If it parses, it's in the subset. If it's not in the subset, it doesn't parse.
