Unification

Lecture #11
Complete the associated in-class exercises.

1 How Does Prolog’s Proof Procedure Handle Variables?
2 Substitutions
3 Unification
- 3.1 Unification Algorithm
- 3.2 The Occurs Check
4 Full Proof Examples

1 How Does Prolog’s Proof Procedure Handle Variables?

Earlier, we explored how Prolog automatically derives a proof for a query in propositional logic, but we never worked out how full Prolog manages the same thing. What’s going on?

For this part, we’ll again consider a fact (like p.) to be a rule with an empty body (like p :- .).

1.1 What Happens Intuitively

Intuitively, when Prolog runs across a goal like append([1,3|Tail], [2,4], Result), it looks for clauses that are about append and, for each clause, it tries to match up the arguments in the head of that clause with the three terms inside the goal: [1,3|Tail], [2,4], and Result. It tries the first matching clause it finds. If that fails, it returns to that choice and tries the next matching clause instead, until it runs out of matching clauses.

What does that look like more formally? Let’s go back to propositional logic and then extend it to full Prolog.

1.2 Propositional Logic Proof Procedure

For propositional logic, Prolog performs a backtracking search for a proof. To solve a query like ?- q1, ..., qk:

Let the answer clause A be yes :- q1, ..., qk.
While the body of A has goals inside:
1. Let the leftmost goal in A be a1.
2. Choose a clause C in the KB with a1 as its head.
3. Replace a1 in the body of A with the body of C.

That choose step is non-deterministic: Prolog tries a choice and if the choice fails, it backtracks to that choice and tries the next option. Specifically:

If Prolog hits the choose step and there is no matching clause (or no matching clause left), it fails.
When Prolog hits the choose step and there is at least one matching clause, it tries the first matching clause in the KB and proceeds from there. But, it remembers which one it chose. On failure, it goes back to its most recent choice and tries the next matching clause instead, or fails again if there are none.
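
The procedure above can be sketched in a few lines of Python, used here only as a convenient metalanguage. This is a minimal sketch under stated assumptions, not how Prolog is actually implemented: the KB is assumed to be a list of (head, body) pairs, the function name prove is made up for this example, and Python’s own call stack plays the role of the list of remembered choices.

```python
def prove(kb, goals):
    """Backtracking search: True if every goal in `goals` follows from `kb`.

    kb    -- list of (head, body) pairs; a fact has an empty body list
    goals -- the body of the answer clause, as a list of atom names
    """
    if not goals:                      # yes :- .   (nothing left to prove)
        return True
    a1, rest = goals[0], goals[1:]     # step 1: leftmost goal
    for head, body in kb:              # step 2: choose, trying clauses in KB order
        # step 3: replace a1 with the body of the chosen clause
        if head == a1 and prove(kb, body + rest):
            return True                # this choice led to a proof
    return False                       # no (more) matching clauses: fail


# Hypothetical KB:  a :- b, c.   b :- d.   b.   c.
kb = [("a", ["b", "c"]), ("b", ["d"]), ("b", []), ("c", [])]

print(prove(kb, ["a"]))   # True: b :- d fails (no clause for d), then the fact b works
print(prove(kb, ["d"]))   # False: no clause has d as its head
```

Returning False from a recursive call is what sends the search back to the most recent choice, where the for loop moves on to the next matching clause. Like Prolog, this sketch can loop forever on a KB such as a :- a.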

1.3 Proof Procedure with Variables

In propositional logic, a goal a1 matches the head of a clause in the KB if they are literally the same simple atoms.

In full Prolog, a1 and the head of the clause may have arguments, including variables. The two atoms match if we can “make them look the same”: they unify. (We get incredibly lucky because any two items that unify do so in one most-general way; we don’t have to return to the unification over and over again and try different ways to unify!)

Our procedure winds up being another backtracking search for a proof. To solve a query like ?- q1, ..., qk, where goals q1, ..., qk may have terms inside them including variables¹, and all the variables anywhere inside these goals are V1, ..., Vj:

Let the generalized answer clause A be yes(V1, ..., Vj) :- q1, ..., qk.
While the body of A has goals inside:
1. Let the leftmost goal in A be a1, and let a1’s predicate symbol be p1.²
2. Choose a clause C in the KB whose head has predicate symbol p1.
  1. Rename all the variables in C (so they don’t accidentally overlap with the ones in A). Let h be the head of the renamed C.
  2. It must be possible to unify h and a1. (If not, this choice fails.)
  3. Let θ be the most general unifier of h and a1.
3. Replace a1 in the body of A with the body of the renamed C, and apply the substitution θ to all of A (its head, the goals from C’s body, and A’s remaining goals).

Again, if the choose step finds no options (either because no clauses with the appropriate predicate symbol are left or because none of the remaining heads unifies with the goal), Prolog fails and rewinds to its most recent choose step where it had more options.

To really make sense of that, however, we need to know what a substitution is in more detail and how the algorithm to unify two atoms/terms works.

2 Substitutions

We need a few definitions:

A substitution is a (finite) list of mappings of variables to terms. We write it like: {V₁/t₁, …, V_n/t_n}, where each V_i is a different variable, and each t_i is the term we want to use to replace the variable.
The application of a substitution to an atom or clause is what we get when we replace every variable in the atom/clause that appears in the substitution with its corresponding term. (So, for any variable V_i that is mapped to t_i in the substitution, we find all occurrences of V_i in the atom/clause and replace each one with t_i.)³

An instance of an atom/clause is the result of applying some substitution to the atom/clause.

If σ is a substitution and c is an atom or clause, then we write cσ to mean the instance we get from applying σ to c.

For example, consider these substitutions:

σ₁ = {X/A, Y/b, Z/C, D/e}
σ₂ = {A/X, Y/b, C/Z, D/e}
σ₃ = {A/V, X/V, Y/b, C/W, Z/W, D/e}

What is the result of each of the following substitution applications? (The first is complete as an example.)

p(A,b,C,D)σ₁ = p(A,b,C,e). (Of the variables in p(A,b,C,D), σ₁ only has a replacement for D, so D is the only one that changes.)
p(X,Y,Z,e)σ₁
p(A,b,C,D)σ₂
p(X,Y,Z,e)σ₂
p(A,b,C,D)σ₃
p(X,Y,Z,e)σ₃

(Two Exercises. We’ll do the first together in class.)
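
To make “applying a substitution” concrete, here is a minimal Python sketch. The representation is an assumption made for illustration only: variables are strings that start with an uppercase letter (following Prolog’s convention), constants are other strings or numbers, a compound term is a tuple (name, arg1, ..., argN), and a substitution is a dict. The names is_var and apply_subst are made up for this example.

```python
def is_var(t):
    """Prolog convention: variables are names starting with an uppercase letter."""
    return isinstance(t, str) and t[:1].isupper()


def apply_subst(term, sigma):
    """Apply sigma to term in one simultaneous pass (no repeated rewriting)."""
    if is_var(term):
        return sigma.get(term, term)   # unmapped variables stay as they are
    if isinstance(term, tuple):        # compound term: (name, arg1, ..., argN)
        return (term[0],) + tuple(apply_subst(arg, sigma) for arg in term[1:])
    return term                        # constant


sigma1 = {"X": "A", "Y": "b", "Z": "C", "D": "e"}

# The completed example from the list above: p(A,b,C,D) with sigma1 applied.
print(apply_subst(("p", "A", "b", "C", "D"), sigma1))   # ('p', 'A', 'b', 'C', 'e')

# Because this is a single simultaneous pass, {X/Y, Y/X} swaps the two names.
print(apply_subst(("f", "X", "Y"), {"X": "Y", "Y": "X"}))   # ('f', 'Y', 'X')
```

Each variable is looked up exactly once, so a replacement term is never itself rewritten. That is the single-pass behaviour described in the footnote on applying substitutions, as opposed to substituting repeatedly until a fixpoint is reached.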

3 Unification

Unifying two atoms or terms means making them look the same. Specifically:

A substitution σ is a unifier of atoms/terms e₁ and e₂ if e₁σ = e₂σ. That is, the instance we get from applying σ to e₁ is the same one we get from applying σ to e₂. They match!
A substitution σ is a most general unifier or mgu of e₁ and e₂ if:
- σ is a unifier of e₁ and e₂, and
- if substitution σ′ is also a unifier of e₁ and e₂, then eσ′ is an instance of eσ for all atoms/terms e.
In other words: σ unifies e₁ and e₂, and if something else unifies them as well, then it’s just a special-case (or renaming) of σ.
If two Prolog atoms/terms have a unifier, then they have an mgu.⁴
If there are multiple mgu’s, then they differ only in the names of the variables chosen.

Let’s try an example. Consider:

e₁ = append(cons(1,cons(3,Tail)), cons(2,cons(4,empty)), Result),
e₂ = append(empty, X, X), and
e₃ = append(cons(X,Xs), Ys, cons(X,Zs)).⁵

For these:

e₂ has no unifier with either e₁ or e₃. That’s because there’s no substitution that can make the terms cons(1,cons(3,Tail)) and empty look alike. (No matter what we do to the one variable Tail, the rest of the terms won’t match!) Similarly, no substitution makes cons(X,Xs) and empty look alike.
e₁ unifies with e₃ with various unifiers. For example:
- Consider the substitution: σ₁ = {X/1, Xs/cons(3,empty), Tail/empty, Ys/cons(2,cons(4,empty)), Result/cons(1,Zs), Unnecessary/Irrelevant}.
  
  Let’s apply that:
  - append(cons(1,cons(3,Tail)), cons(2,cons(4,empty)), Result) σ₁= append(cons(1,cons(3,empty)), cons(2,cons(4,empty)), cons(1, Zs))
  - append(cons(X,Xs), Ys, cons(X,Zs)) σ₁= append(cons(1,cons(3,empty)), cons(2,cons(4,empty)), cons(1, Zs))
  And, those are the same term! So, σ₁ is indeed a unifier for them.
  
  But, σ₁ includes a totally unnecessary mapping at the end (for Unnecessary) and is more specific than it needs to be (mapping Tail to empty).
- Consider the substitution: σ₂ = {X/1, Xs/cons(3,Tail), Ys/cons(2,cons(4,empty)), Result/cons(1,Zs)}.
  
  Let’s apply that:
  - append(cons(1,cons(3,Tail)), cons(2,cons(4,empty)), Result) σ₂= append(cons(1,cons(3,Tail)), cons(2,cons(4,empty)), cons(1,Zs))
  - append(cons(X,Xs), Ys, cons(X,Zs)) σ₂= append(cons(1,cons(3,Tail)), cons(2,cons(4,empty)), cons(1,Zs))
  Again, σ₂ unifies these. σ₂ is more general than σ₁, however. (If we apply σ₁ to something, we can get the same effect by applying σ₂ and then using two more mappings: {Tail/empty, Unnecessary/Irrelevant}.)

In this case, σ₂ is an mgu of e₁ and e₃, and that mgu is unique. In general, there may be many mgu’s, but they differ only in how they rename variables.
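
The two claims in this example (both substitutions are unifiers, and the second is more general) can be checked mechanically. The following Python sketch does so under an assumed representation that is not part of Prolog: variables are capitalized strings, compound terms are tuples (name, arg1, ..., argN), and substitutions are dicts. The helper names cons, apply_subst and is_unifier are hypothetical.

```python
def is_var(t):
    """Prolog convention: variables are names starting with an uppercase letter."""
    return isinstance(t, str) and t[:1].isupper()


def apply_subst(term, sigma):
    """Apply sigma to term in one simultaneous pass."""
    if is_var(term):
        return sigma.get(term, term)
    if isinstance(term, tuple):
        return (term[0],) + tuple(apply_subst(arg, sigma) for arg in term[1:])
    return term


def cons(head, tail):
    """Build the compound term cons(head, tail)."""
    return ("cons", head, tail)


def is_unifier(sigma, a, b):
    """sigma unifies a and b exactly when both instances are the same term."""
    return apply_subst(a, sigma) == apply_subst(b, sigma)


e1 = ("append", cons(1, cons(3, "Tail")), cons(2, cons(4, "empty")), "Result")
e3 = ("append", cons("X", "Xs"), "Ys", cons("X", "Zs"))

sigma1 = {"X": 1, "Xs": cons(3, "empty"), "Tail": "empty",
          "Ys": cons(2, cons(4, "empty")), "Result": cons(1, "Zs"),
          "Unnecessary": "Irrelevant"}
sigma2 = {"X": 1, "Xs": cons(3, "Tail"),
          "Ys": cons(2, cons(4, "empty")), "Result": cons(1, "Zs")}

print(is_unifier(sigma1, e1, e3))   # True
print(is_unifier(sigma2, e1, e3))   # True

# sigma1's effect on e1 is sigma2's effect followed by two more mappings.
extra = {"Tail": "empty", "Unnecessary": "Irrelevant"}
print(apply_subst(apply_subst(e1, sigma2), extra) == apply_subst(e1, sigma1))   # True
```

Note that is_unifier can only confirm or reject one candidate substitution. Showing that no unifier exists at all, as with e₂, takes an argument about every possible substitution.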

(Exercise.)

3.1 Unification Algorithm

Intuitively, we can unify two atoms/terms if:

They’re already identical, or else
One is a variable, in which case we map it to the other atom/term⁶, or else
They are both compound terms with the same name and same number of arguments, and we can unify each of the pairs of arguments, in turn.

What does this look like as an algorithm?

Algorithm unify(t₁,t₂) either fails (if t₁ and t₂ cannot be unified) or returns a substitution σ:

Let T = {t₁ = t₂}. (This is our “todo list” of pairs of atoms/terms we need to unify.)
Let σ = {}. (This is our substitution, which we build up bit by bit as the algorithm proceeds.)
While T ≠ {}:
1. Select and remove x = y from T.⁷
2. If x is identical to y, there’s no update needed.⁸
3. Otherwise, if x is a variable:
  1. Replace x with y wherever it appears in T and σ.
  2. Add x/y to σ. (The new σ value is σ ∪ {x/y}.)
4. Otherwise, if y is a variable:
  1. Replace y with x wherever it appears in T and σ.
  2. Add y/x to σ. (The new σ value is σ ∪ {y/x}.)
5. Otherwise, if x is a compound term p(x₁,…,x_n) and y is a compound term p(y₁,…,y_n) (where the name p must match and the number of arguments n must match):
  1. Add x₁ = y₁, …, x_n = y_n to the todo list T. (The new T value is T ∪ {x₁ = y₁, …, x_n = y_n}.)
6. Otherwise, fail.
Return σ

Notice that the algorithm maintains a single substitution throughout. The result is that Prolog gets pattern-matching even more powerful than Haskell’s: in Prolog, the same variable can appear in many different places in a pattern, and every occurrence must match the same term.
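
Here is one way the algorithm’s steps might look in Python. It is a sketch under assumptions rather than a definitive implementation: variables are capitalized strings, compound terms are tuples (name, arg1, ..., argN), σ is a dict, the todo list T is a Python list handled left to right, and failure is signalled by returning None. The names is_var, substitute and unify are made up for this example. As in the algorithm as stated so far, there is no occurs check.

```python
def is_var(t):
    """Prolog convention: variables are names starting with an uppercase letter."""
    return isinstance(t, str) and t[:1].isupper()


def substitute(term, var, value):
    """Replace every occurrence of the variable var inside term with value."""
    if term == var:
        return value
    if isinstance(term, tuple):        # compound term: (name, arg1, ..., argN)
        return (term[0],) + tuple(substitute(a, var, value) for a in term[1:])
    return term


def unify(t1, t2):
    """Return a most general unifier of t1 and t2 as a dict, or None on failure."""
    todo = [(t1, t2)]
    sigma = {}
    while todo:
        x, y = todo.pop(0)                               # step 1
        if x == y:                                       # step 2: nothing to do
            continue
        if is_var(y) and not is_var(x):                  # step 4 mirrors step 3
            x, y = y, x
        if is_var(x):                                    # steps 3 and 4
            todo = [(substitute(a, x, y), substitute(b, x, y)) for a, b in todo]
            sigma = {v: substitute(t, x, y) for v, t in sigma.items()}
            sigma[x] = y
        elif (isinstance(x, tuple) and isinstance(y, tuple)
              and x[0] == y[0] and len(x) == len(y)):    # step 5
            todo = list(zip(x[1:], y[1:])) + todo
        else:                                            # step 6
            return None
    return sigma


def cons(head, tail):
    return ("cons", head, tail)


e1 = ("append", cons(1, cons(3, "Tail")), cons(2, cons(4, "empty")), "Result")
e2 = ("append", "empty", "X", "X")
e3 = ("append", cons("X", "Xs"), "Ys", cons("X", "Zs"))

expected = {"X": 1, "Xs": cons(3, "Tail"),
            "Ys": cons(2, cons(4, "empty")), "Result": cons(1, "Zs")}
print(unify(e1, e3) == expected)   # True: the mgu found by hand in Section 3
print(unify(e1, e2))               # None: cons(...) and empty cannot match
```

Substituting the new binding through both the todo list and σ before recording it is what keeps σ a single, consistent substitution: a variable that has been bound never shows up again later in the run.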

Let’s try some examples:

unify p(A, b, C, D) and p(X, Y, Z, e)
unify p(A, b, A, D) and p(X, X, Z, Z) (left as an exercise!)
unify p(A, b, A, d) and p(X, X, Z, Z)
unify n([sam, likes, prolog], L2, I, C1, C2) and n([P|R], R, P, [person(P)|C], C)

(Exercise.)

3.2 The Occurs Check

There is one last issue we have not addressed.

Consider a knowledge base consisting of one fact: nest(X, inner(X)).

What should happen with the following query: ?- nest(Y, Y).

What does happen, in Prolog?

Now try adding this rule: unnest(inner(Z)) :- unnest(Z). Then run the query ?- nest(Y, Y), unnest(Y).

(You can find these rules in the file occurs_check.pl.)

The problem is that we allow a substitution to be cyclical: a variable can be inside the replacement for itself.

The solution is the occurs check: before accepting a new mapping into the substitution, ensure that it is not recursive itself and that it won’t introduce recursion into any of the other mappings.⁹

Prolog does not perform the occurs check by default, for efficiency. (When it matters, standard Prolog provides unify_with_occurs_check/2, which does perform it.)
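
To see why a cyclical mapping is a problem, and what the check itself looks like, consider the following Python sketch. It assumes a made-up representation (capitalized strings for variables, tuples (name, arg1, ..., argN) for compound terms, dicts for substitutions), and the names occurs and apply_subst are hypothetical. Unifying nest(Y, Y) with a renamed copy of nest(X, inner(X)) first maps Y to X and then tries to map X to inner(X), which is exactly the kind of mapping the check rejects.

```python
def is_var(t):
    """Prolog convention: variables are names starting with an uppercase letter."""
    return isinstance(t, str) and t[:1].isupper()


def apply_subst(term, sigma):
    """Apply sigma to term in one simultaneous pass."""
    if is_var(term):
        return sigma.get(term, term)
    if isinstance(term, tuple):
        return (term[0],) + tuple(apply_subst(arg, sigma) for arg in term[1:])
    return term


def occurs(var, term):
    """True if the variable var appears anywhere inside term."""
    if term == var:
        return True
    if isinstance(term, tuple):
        return any(occurs(var, arg) for arg in term[1:])
    return False


# Without the check, unification would accept the cyclical mapping X/inner(X).
cyclic = {"X": ("inner", "X")}

# Every further application adds another layer, so no finite term is ever reached.
term = "X"
for _ in range(3):
    term = apply_subst(term, cyclic)
print(term)   # ('inner', ('inner', ('inner', 'X')))

# The occurs check: refuse to add x/y when x appears inside y.
print(occurs("X", ("inner", "X")))   # True, so the mapping is rejected
print(occurs("X", ("inner", "Y")))   # False, so the mapping is fine
```

In the unification algorithm, the check would sit in the two variable steps: before adding x/y (or y/x), fail if the variable occurs in the other term. The identical case x = x has already been handled by the earlier step, so it never reaches the check. The cost is a traversal of the term on every binding, which is the efficiency concern mentioned above.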

4 Full Proof Examples

Let’s do some full examples of proofs.

Given the KB:

live(Y) :- connected_to(Y, Z), live(Z).
live(outside).
connected_to(w6, w5).
connected_to(w5, outside).

Here is a proof for the query live(A):

?- live(A).
yes(A) :- live(A).                         % A is an argument
yes(A) :- connected_to(A, Z1), live(Z1).   % we rename Y and Z.
yes(w6) :- live(w5).                       % A = w6, Z1 = w5.
yes(w6) :- connected_to(w5, Z2), live(Z2). % we rename Y and Z again.
yes(w6) :- live(outside).                  % Z2 = outside.
yes(w6) :- .                               % live(outside) is a fact.

So, the answer is A = w6.
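
The whole proof procedure from Section 1.3 fits in a short Python sketch, which can be used to check proofs like the one above. Everything about the representation is an assumption made for illustration: variables are capitalized strings, compound terms and atoms with arguments are tuples (name, arg1, ..., argN), the KB is a list of (head, body) pairs, and renaming appends a numeric suffix (so the query itself should not use names like Y_1). The function names are made up, and real Prolog systems are implemented quite differently.

```python
from itertools import count


def is_var(t):
    """Prolog convention: variables are names starting with an uppercase letter."""
    return isinstance(t, str) and t[:1].isupper()


def apply_subst(term, theta):
    """Apply theta to a term in one simultaneous pass."""
    if is_var(term):
        return theta.get(term, term)
    if isinstance(term, tuple):                 # compound: (name, arg1, ..., argN)
        return (term[0],) + tuple(apply_subst(a, theta) for a in term[1:])
    return term


def unify(t1, t2):
    """Return an mgu of t1 and t2 as a dict, or None (no occurs check)."""
    todo, theta = [(t1, t2)], {}
    while todo:
        x, y = todo.pop(0)
        if x == y:
            continue
        if is_var(y) and not is_var(x):
            x, y = y, x
        if is_var(x):
            step = {x: y}
            todo = [(apply_subst(a, step), apply_subst(b, step)) for a, b in todo]
            theta = {v: apply_subst(t, step) for v, t in theta.items()}
            theta[x] = y
        elif (isinstance(x, tuple) and isinstance(y, tuple)
              and x[0] == y[0] and len(x) == len(y)):
            todo = list(zip(x[1:], y[1:])) + todo
        else:
            return None
    return theta


def rename(term, n):
    """Give every variable in term the suffix _n so it cannot clash with A's."""
    if is_var(term):
        return f"{term}_{n}"
    if isinstance(term, tuple):
        return (term[0],) + tuple(rename(a, n) for a in term[1:])
    return term


def solve(kb, goals, query_vars):
    """Yield each instance of yes(V1, ..., Vj) for which the goals are provable."""
    fresh = count(1)

    def search(head, body):
        if not body:                            # yes(...) :- .
            yield head
            return
        a1, rest = body[0], body[1:]            # leftmost goal
        for c_head, c_body in kb:               # choose, in KB order
            n = next(fresh)
            theta = unify(rename(c_head, n), a1)
            if theta is None:
                continue                        # this choice fails
            new_body = [rename(g, n) for g in c_body] + rest
            yield from search(apply_subst(head, theta),
                              [apply_subst(g, theta) for g in new_body])

    yield from search(("yes",) + tuple(query_vars), list(goals))


kb = [
    (("live", "Y"), [("connected_to", "Y", "Z"), ("live", "Z")]),
    (("live", "outside"), []),
    (("connected_to", "w6", "w5"), []),
    (("connected_to", "w5", "outside"), []),
]

answers = solve(kb, [("live", "A")], ["A"])
print(next(answers))   # ('yes', 'w6')
print(list(answers))   # [('yes', 'w5'), ('yes', 'outside')]
```

The generator makes backtracking explicit: asking for the next answer resumes the search at the most recent choice that still has untried clauses, much like typing ; at the Prolog prompt. The search also explores a dead end that the proof above leaves out (trying the live rule on live(outside) before falling back to the fact), because a proof records only the choices that worked.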

Try these.

Given the KB:

append([], L, L).
append([H | T], A, [H | R]) :- append(T, A, R).

Give a full proof for the query:

?- append([a, b, c], [1, 2, 3], L).

Given the KB:

elem(E, set(E,_,_)).
elem(V, set(E,LT,_)) :- V < E, elem(V,LT).
elem(V, set(E,_,RT)) :- E < V, elem(V,RT).

Give a full proof for the query:

?- elem(3, S), elem(8, S).

(Five Exercises. We’ll do the first and fourth together.)

¹ So, for example, q1 may actually be something like complex_atom(term1, compound_term2(X, Y), Z).
² So, a1 actually looks like p1 or p1(...) with various terms inside.
³ In our assignment, we defined walking a substitution over an expression to be essentially repeatedly substituting until we stopped changing the expression (“reached a fixpoint”, as that is sometimes called). Here, we are instead doing just a single (simultaneous) pass of the substitution. That means, for example, that a substitution like {X/Y, Y/X} can swap the names of two variables. However, we’re going to carefully construct our substitutions so that never happens. Instead, with the exception of when we violate the “occurs check” (which we’ll define a little later): no variable that is on the left of any mapping will ever appear on the right of any mapping in a substitution produced by our unification algorithm.
⁴ We’re asserting this and the next bullet point, not proving them true. However, you can imagine an inductive proof that follows the structure of the algorithm we give below. It disassembles atoms/compound terms into their parts and shows that at each stage, we stay as general as possible.
⁵ We’re avoiding the special syntax for lists because it just confuses the issue by hiding the real compound terms being used. However, the process still works the same for our custom lists and Prolog’s built-in lists.
⁶ We’re skipping something here. It will cause us trouble, and we’ll come back to it!
⁷ This is “don’t-care non-determinism”. Handling the todo items in any order works. However, we’ll generally handle them in left-to-right order of their appearance in the original expressions.
⁸ In an implementation, we usually do something more like what we did in assignment 3. We check if these are simple terms (like constants, numbers, or strings) that are identical to each other. If they’re compound terms that are identical, then the later compound term step will discover that already.
⁹ Specifically, in our algorithm, we maintain an invariant that no variable that appears on the left of a mapping in σ may also appear on the right of any mapping in σ. When we introduce a new mapping, we already know: the newly mapped variable does not appear on the left of any mapping in σ, and none of the variables on the left of mappings so far in σ can appear on the right of the new mapping. (Both of those are because we substitute out any newly added variable from T and σ prior to adding it to σ, alongside our next constraint.) We further insist that the new variable also cannot appear on the right of its mapping. If it does, we simply fail.