Term rewriting equations and patterns
For a programmer, functions are the fundamental form of expression. Programmers write functions that transform inputs into outputs.
The inputs and outputs are complex data that represent something in the real world. I think about the inputs and outputs as abstract data shapes. I might start with a nested map that needs to be transformed into a sequence of sub-selections. Data comes in one shape and goes out another. Squares in, circles out.
For a mathematician, equations are the fundamental form of expression. Equations define valid rules for transforming one expression into another. Mathematicians solve problems through symbolic manipulation.
Functions and equations are both fine ways to specify transformations. So why only use functions for programming? What does it look like to program data transformation like equations instead of functions? Today we will look at some examples of an equation base approach to specifying data transformations. Before diving into the examples, let’s first establish some definitions
Definitions
Term rewriting |
Replacing terms with other terms |
Term |
A variable or function application: \(a\), \((a + b)\), \((a + b)^2\) |
Variable |
\(a,b\) can be assigned a value |
Function |
|
Rule |
A pair of matching terms and substitution terms: \((a + b)^2 \Rightarrow a^2 + 2ab + b^2\) |
Matching |
\((1 + 3)^2\) can be made equal to \((a + b)^2\) when \(a=1,b=3\) |
Substitution |
Creation of new terms from old ones \((1 + 3)^2 \Rightarrow 1^2 + 2\cdot1\cdot3 + 2^2\) |
Math equations are rules that strictly preserve equivalence. Term rewriting is a more general concept that permits any transformation. |
Introducing Meander
Meander is a term rewriting system packaged as a Clojure library for data transformation.
[meander/epsilon "0.0.650"]
(ns meander.examples
(:require [meander.epsilon :as m]))
I prefer to call it a structural pattern syntax for data transformation.
Rewrite
The m/rewrite
macro is the swiss army knife of Meander.
Let’s do our first transformation from a vector to a map:
; Meander!
(m/rewrite [:a 1 :c 2] ; input
[?a ?b ?c ?d] ; a match pattern
{?a ?d}) ; a substitution pattern
;=> {:a 2 :c 1} ; output
-
The match pattern is a data literal with the shape of the input
-
The substitution pattern is a data literals with the shape of the output
-
?a
is called a logic variable -
_
is for ignoring unused terms
|
Logic
(m/rewrite [1 2] ; input
[?a ?a] ; a match pattern
{?a ?a}) ; a substitute pattern
;=> nil
No match.
?a
cannot match both 1
and 2
.
(m/rewrite [1 1] ; input
[?a ?a] ; a match
{?a ?a}) ; a substitution pattern
;=> {1 1}
Matched ?a
with 1
Unification means resolving logical matches. |
Repeating terms
(m/rewrite [:a 1 :c 2 :d 3]
[!a ...]
[!a ...])
;=> [:a 1 :c 2 :d 3]
!a
is a memory variable: an array that can collect many values
…
indicates 0 or more repeated occurrences
The match pattern is symmetrical with the substitution pattern. We got out exactly what we put in. |
Repeat scope
(m/rewrite [1 :a 2 :b 3 :c]
[!x !y ...] ; match pairs
[!y !x ...]) ; substitute flipped pairs
;=> [:a 1 :b 2 :c 3]
All terms preceeding …
repeat.
Separator
Limits the scope of repeats.
(m/rewrite [:hello 1 2 3]
[:hello . !v ...]
[:world . !v ...])
;=> [:world 1 2 3]
Only the terms between .
and …
repeat.
Structurally explicit
Most Clojure functions will coerce their inputs to seqs. Meander does not do this.
[!x …]
|
will only match vectors |
(!x …)
|
will only match lists and seqs |
{:k ?v}
|
will only match maps |
#{1}
|
will only match sets |
(m/seqable !x …)
|
will match maps, vectors, strings, etc… |
I like that it will only match the same types unless I choose to be more permissive with m/seqable
.
Nesting
As you might expect, match and substitute patterns may be nested.
(m/rewrite {:xs [1 2 3 4 5]
:ys [6 7 8 9 10]}
{:xs [!xs ...]
:ys [!ys ...]}
[!xs ... . !ys ...])
;=> [1 2 3 4 5 6 7 8 9 10]
And Meander does the rearranging for you.
Pop Quiz
(m/rewrite [:hello 1 2 3]
[?k . !v ...]
[?k !v ...])
;=> [:hello 1 :hello 2 :hello 3]
Why?
When matching, we expect a single value, then repeated values. The substituting pattern has no separator, so we are substituting pairs. |
Congratulations! Enjoy these macarons as your reward.
Meander has a rich feature set. I have only shown you the core syntax. We are now going to shift gears and look at some examples in the wild. I’ll introduce a few bits of unexplained but fairly obvious syntax as we go.
For a comprehensive treatment of Meander features, see Meander on GitHub which links to documentation, a cookbook, blog posts and a Strange Loop talk. |
Examples
Rearranging logic expressions
This example is taken from Justice (an alternative syntax to Datalog queries).
(def rearrange-logic
(bottom-up
;; nested commutative logic is raised
((m/pred #{'and 'or} ?op) . !before ... (?op . !clauses ...) . !after ...)
(?op . !before ... !clauses ... !after ...)
;; moves `or` to the outside,
;; and `and` to the inside to match Datalog rule convention
(and . !before ... (or . !clauses ...) . !after ...)
(or . (and . ~@!before . !clauses . ~@!after) ...)
;; identity logic expressions are flattened
((m/pred #{'and 'or} ?op) ?body)
?body
;; double negatives are removed
(not (not ?body))
?body
;; moves `not` inside to match Datalog rule convention
(not (or . !clauses ...))
(and . (not !clauses) ...)
(not (and . !clauses ...))
(or . (not !clauses) ...)))
Web scraping
This example comes from the Chartit (an ETL/analytics utility for tracking work)
(def extract-employees
(s/search
(m/$
[:div {:class "directory-tables"} &
(m/scan
;; heading/table pairs
[:h3 {} ?department & _]
_
[:table &
(m/scan
[:tbody &
(m/scan
[:tr &
(m/separated
[:td & (m/scan [:a {} ?name & _])]
[:td {} ?title & _]
[:td & (m/scan [:a {:href ?mailto} & _])])])])])])
;;=>
{:department (str/trim ?department)
:name ?name
:title ?title
:email (subs ?mailto 7)}))
Parsing defn like forms
Here is another example from Justice.
where it was desirable to define a macro that behaves like defn
.
This example can be directly compared with the Clojure.spec approach.
(defn wrap-defn
"Returns a function that will parse a form according to `defn` semantics.
Takes a function which will convert fn-spec forms."
[rewrite-fn-spec]
(m/rewrite
(m/and ((pred simple-symbol? ?name) .
(pred string? !?docstring) ...
(pred map? !?attr-map) ...
!tail ...)
(m/guard (<= (count !?docstring) 1))
(m/guard (<= (count !?attr-map) 1))
(let
(or (([(m/pred simple-symbol? !params) ... :as !param-list]
. !forms ... :as !fn-specs) ..1)
([(m/pred simple-symbol? !params) ... :as !param-list]
. !forms ... :as !fn-specs))
(list* !tail))
(m/guard (apply distinct? (map count !param-list))))
;;>
(defn ?name . !?docstring ... !?attr-map ...
~@(map rewrite-fn-spec !fn-specs))))
Schema to code
This example shows the core functionality of HappyGAPI (a library that exposes Google APIs by generating code from schemas).
(defn summarize-schema [schema request depth]
"Given a json-schema of type definitions,
and a request that is a $ref to one of those types,
resolves $ref(s) to a depth of 3,
discards the distracting information,
and returns a pattern for constructing the required input."
(m/rewrite request
{:type "object"
:id ?id
:additionalProperties ?ap
:properties (m/seqable [!property !item] ...)}
;;>
{& ([!property (m/app #(summarize-schema schema % depth) !item)] ...)}
{:type "array"
:items ?item}
;;>
[~(summarize-schema schema ?item depth)]
{:type (m/pred string? ?type)}
;;>
(m/app symbol ?type)
{:$ref (m/pred string? ?ref)}
;;>
~(if (> depth 2)
(symbol ?ref)
(summarize-schema schema (get schema (keyword ?ref)) (inc depth)))))
AST manipulation
This example comes from Kalai, a transpiler from Clojure to Java and Rust.
(def propagate-types-from-bindings-to-locals
"We propagate type information which is stored in metadata
from the the place where they are declared on a symbol
to all future usages of that symbol in scope."
(s/rewrite
{:op :local
:form ?symbol
:env {:locals {?symbol {:form ?symbol-with-meta
:init ?init}}
:as ?env}
& ?more
:as ?ast}
;;>
{:op :local
:form ~(propagate-ast-type ?init ?symbol-with-meta ?ast)
:env ?env
& ?more}
;; otherwise leave the ast as is
?else
?else))
Reading patterns vs functions
Those were some pretty complicated transformations.
Could you follow them?
I bet it was easier than most code.
Is that normal?
I claim that term rewriting expressions are much easier to read than functional transformations.
Next let’s compare what is involved in writing functions vs patterns.
Writing functions vs patterns
Functions are like recipes
Functions are all about what to do with inputs.
Clearly the big difference between functions and rewrite rules is what to do with the inputs. To write a function is to specify the tasks in order to convert the ingredients into the desired meal. A function is a verbal description of how to transform inputs to outputs in written form.
Recipe for a function
1. Prepare meat and potatoes |
(example inputs) |
2. Mix ingredients in a pot |
(function definition) |
3. Boil |
(iterate, make it work) |
4. Garnish with documentation |
(docstring or types) |
5. Simmer |
(tests) |
Easy to forget why it tastes good |
Patterns are more like before and after pictures
Rewrite rules are only the inputs and outputs; the what to do part is completely missing.
Depicting a pattern
1. Prepare example inputs |
|
2. Parameterize |
|
3. Recombine as outputs |
|
4. Create a test |
|
5. Verify the output |
|
Examples are the best kind of documentation. Writing Meander feels like writing examples. It gives me confidence I solved the right problem, and it’s easy to see what it does coming back to it later. |
I like pictures. I especially like diagrams. So much so that I built an app for making diagrams: Hummi.app and I’ve written an article about why diagrams are important. The thing I like most about Meander is that transformation patterns depict the shape of the data that is coming in and going out.
When should we use term rewriting, when a function?
Term rewriting is suitable for largish data reshaping where you would otherwise do destructuring, updates, restructuring, and pipelines. We have seen some compelling examples, but not every problem is such a clear-cut case of re-shaping.
Why isn’t this the default way of writing code everywhere?
-
Term rewriting systems are hard to implement, and rarely integrated with programming languages.
-
Functions are more familiar in the programming community.
-
Not a silver bullet.
Problems that require aggregation, reduction, and variables tend to be well expressed with a loop or recursive function. Problems that are extract and reconstitute tend to be well expressed as patterns.
Limitations of term rewriting
Syntax
The syntax available in data literals is limited by the host language.
Meander syntax is partially extensible through defsyntax
to define custom operators.
Execution extension is available by function application (see m/app
and ~
).
IDEs and linting tools have trouble error checking pattern expressions because they are an entirely new set of expressions with a new grammar.
Less open to extension
Multi-methods and Protocols allow for the extension of behavior by including a new function and associating it with the output of the dispatch function. Multi-method and Protocol dispatch is somewhat similar to a case statement in that we expect distinct values to be matched. Multi-methods address the possibility of multiple matches by the introduction of hierarchies and preferences: “prefer-method creates an ordering between methods when they would otherwise be ambiguous” (see Clojure - Multi-methods and Hierarchies).
Pattern matching by contrast often defines many overlapping matching patterns, and is thus sensitive to ordering. The order that patterns are defined matters because the first value match will be used. This makes extension more challenging because extensions need to be coordinated.
One possible approach for extending patterns would be to specify them relative to existing clauses. You could add a pattern at the end, add a pattern before an existing pattern, or add a pattern after an existing pattern. This could get complicated if several libraries participated in an extensible dispatch. The order of extension might change the outcome.
Failure to match is opaque
Similar to regular expressions, it can be difficult to figure out why a match isn’t made for a given input.
Simplifying and decomposing can help.
Function application can be used (m/app #(doto % prn) ?x)
to spy on terms.
Reduction
Meander has recursion, so you can reduce. But recursion has no syntactic advantage. The next branch of Meander (zeta) contains a neater way to express reduction.
Memory variable correspondence
Nested memory variables in a single expression can get confusing, and often don’t behave as you’d expect. There is potential for simplification issue#129. The next branch of Meander (zeta) has flexibility in how variables behave, which may solve this.
Performance
On par with hand rolled functions. Meander is a high quality library: fast and reliable.
Decomposition
The current options for creating sub-patterns are m/with
and m/app
,
Which are equivalent to functional decomposition.
Other approaches to data transformation
Here is a short list of other data transformation approaches in Clojure so that we can examine where they differ from Meander:
Clojure.core |
Clojure is concise and powerful.
Many functions defined on few primary data structures.
We operate on data but do not specify what the data is.
So it is common to be looking at a function that operates on |
Destructuring |
Does one job well. No logic expressions or substitution. |
Core.match, Core.logic |
Independently work great. There is no convenient syntax for substitution. |
Clojure.spec |
Defines the shape of |
Specter |
Takes a navigator/action approach. Improves data manipulation. Very compact, linguistic. |
Each of these approaches have benefits and drawbacks. The purpose of this article was to show where Meander sits relative to these methods, and hopefully inspire you to explore this approach.
Conclusion
The mathematical style term rewriting approach to data transformation has several merits:
-
When reading expressions, inputs and outputs are instantly recognizable
-
When writing expressions, examples can be converted to patterns
-
The declarative style is symmetrically pleasing
-
Visual expression is more effective than verbal expression
Meander provides term rewriting as a convenient library for Clojure. I hope you will give it a try and find it as useful as I have.