Comparing circular data structures

| No Comments

This is a subject which has been on my mind for a while. It turns out to be much simpler than one might expect.

Firstly, let’s be clear about what the problem actually is. We want to be able to find out whether two objects are structurally equivalent. To have a clear idea of what this means, we’ll divide objects into atomic values and containers. An atomic value is just a thing with no interesting (for our purposes) internal structure. We can suppose that we’re given a predicate for determining whether two atomic objects are equal, and we’ll say that atomic objects are structurally equivalent if and only if they’re equal. A container, on the other hand, contains pointers to other objects. We’ll need to understand the structure of containers. In abstract, we can think of a container as being a partial function from atomic values to objects. Two containers are then structurally equivalent if and only if their domains are equal (so they have the same ‘shape’) and they map equal keys to structurally equivalent objects.

This is a recursive definition. If you translate it directly into code then you end up with a function which doesn’t always work. In particular, it won’t work on circular structures (e.g., $A$ points to itself).

Over the weekend I wrote a surprisingly simple function which solves this problem. I’ll present it in bits. It’s in Common Lisp, but hopefully that won’t be too scary.

The first, and most simple, bit does the type-specific comparing. It’s a generic function with a bunch of methods. Each method accepts two things $x$ and $y$ of the same time, and decides whether they’re equal; the magic of Common Lisp’s generic function dispatch arranges for the right method to be called for the arguments provided. The function takes an extra argument, here called cmp, which is the function to call for recursive comparisons. It will contain magic for handling circular structures.

(defgeneric cequal-compare (x y cmp)
  (:method (x y cmp) nil)
  (:method ((x cons) (y cons) cmp)
    (and (funcall cmp (car x) (car y))
         (funcall cmp (cdr x) (cdr y))))
  (:method ((x string) (y string) cmp)
    (string= x y))
  (:method ((x vector) (y vector) cmp)
    (and (= (length x) (length y))
         (dotimes (i (length x) t)
           (unless (funcall cmp (aref x i) (aref y i))
             (return nil)))))
  (:method ((x array) (y array) cmp)
    (and (equal (array-dimensions x)
                (array-dimensions y))
         (dotimes (i (array-total-size x) t)
           (unless (funcall cmp
                            (row-major-aref x i)
                            (row-major-aref y i))
             (return nil))))))

Because Common Lisp wins, you can add methods for traversing your own types later. But that’s not really the point I’m trying to make.

Here’s the main external interface function. It doesn’t do much.

(defun cequal (x y)
  (cequal* x y #'cequal-compare))

The magic in in cequal*, which takes a comparison predicate as an argument just to show how winning it is.

So, how does cequal* actually do the job? The obvious and wrong answer would be

(defun wrong-cequal* (x y compare)
  (funcall compare x y compare))

(which would also beg the question of why we didn’t just call compare directly).

Here’s the actual code.

(defun cequal* (x y compare)
  (let* ((map (make-hash-table :test #'eql)))
    (labels ((follow (x)
               (loop (multiple-value-bind (next foundp) (gethash x map)
                       (unless foundp
                         (return x))
                       (setf x next))))
             (cmp (x y)
               (let ((x (follow x)) (y (follow y)))
                 (or (eql x y)
                     (progn (setf (gethash y map) x)
                            (funcall compare x y #'cmp))))))
      (cmp x y))))

That’s very shiny. What does it do?

It maintains a hash table which uses raw object identity eql to compare keys. (Well, almost raw; eq does the wrong thing on integers and characters in Lisp. The important thing is that there’s no way in the language, other than using eq, of distinguishing two objects which are eql to each other, even if you try tricks like changing one and seeing whether the other one changes in step.)

The hash table maps objects to other objects we’ve decided they must be structurally equivalent to. The guts of the thing are cmp, which works like this. It first chases its arguments $x$ and $y$ through the hash table until it finds objects $x’$ and $y’$ that it doesn’t yet have equivalents for. If they’re both the same object, then they must be equal, so that’s a win. Otherwise, it puts a note in the table that $y’$ must be equivalent to $x’$, and calls the comparison function. That’s it.

This is actually a (simpler) variant of the famous unification algorithm. It’s simpler because there aren’t any variables to worry about.

I don’t, alas, have an immediate use for this function, but it’s cool anyway.

Leave a comment

About this Entry

This page contains a single entry by Mark Wooding published on October 12, 2009 10:34 PM.

Argument passing was the previous entry in this blog.

Three-star programming is the next entry in this blog.

Find recent content on the main index or look in the archives to find all content.

Pages

OpenID accepted here Learn more about OpenID
Powered by Movable Type 5.2.13