Diagram

Query

verb | \ˈkwir-ē\

to ask questions of especially with a desire for authoritative information

Yesterday Logan and I pair programmed on his Datomic-inspired logos database, implementing query resolution with a backtracking query engine.

At least that’s the short way to put it. The long way, the detailed way, the chronicle of prolonged mental suffering, well that’s what this post is.

Introduction

The part of our work I will focus on is the query engine, as that was the lion’s share of what we did today.
First, some terms:

Querya question being asked
Databasea source of facts

A query engine is a function that processes a query given a database and yields the answer to the asked question.
In our case, a query takes the form:

find <variables> where <clauses>

Our example query will be the following:

find ?b where (?a name "Bob")
              (?b parent ?a)

And a clause is of the form (entity attribute value), which is identical to our representation of facts in the database, except that clauses are patterns for facts, and can contain variables, such as in the example query.

What the example query is asking is to find any value(s) (which we are giving the name “b”) that are entities defined to have an attribute parent that has any value (which we are calling “a”) that in turn has the name attribute of "Bob". All that is described in only two clauses! Whew!

The Engine of Knowledge

Alright, so given a database of facts such as the following, how is our query answered?

(0 name "Bob")
(1 name "John")
(1 parent 0)

(Note: here entities are designated by numbers)

Our first attempt went something like the following:

results = []

foreach fact in database:
    discovered = {}
    
    foreach clause in query:
        if clause.entity is a literal and does not equal fact.entity:
            continue to next fact
        if clause.entity is a variable:
            add (clause.entity.variable is equal to fact.entity) to discovered

        if clause.attribute is a literal and does not equal fact.attribute:
            continue to next fact
        if clause.attribute is a variable:
            add (clause.attribute.variable is equal to fact.attribute) to discovered

        if clause.value is a literal and does not equal fact.value:
            continue to next fact
        if clause.value is a variable:
            add (clause.value.variable is equal to fact.value) to discovered

    add discovered to results

return results

In this way, we look through everything and bind all of the variables to what they match in the database, and throw out any candidate results that contain anything that doesn’t match.

And this works for simple queries like find ?a where (?a name "Bob") (get the entities with the name Bob) or even find ?a ?b where (?a name ?b) (get the entities that have a “name” attribute and the values of that attribute), like we had in our initial tests!

However, this method can’t even begin to correctly satisfy our more complex example query!
Currently the algorithm can only extract information from a single fact per solution, but our query requires multiple facts to be considered and used to contrain the results simultaneously!
How can this be done? I’ll give you a minute.

Brain hurting yet? Here’s an answer: backtracking.

The Engine of Insight

According to Wikipedia:

Backtracking is a general algorithm for finding all (or some) solutions to some computational problems, notably constraint satisfaction problems, that incrementally builds candidates to the solutions, and abandons each partial candidate c (“backtracks”) as soon as it determines that c cannot possibly be completed to a valid solution.

Note, that unlike what our algorithm above, where we totally abandon the current candidate when a clause doesn’t match, the backtracking algorithm only abandons the current partial candidate, i.e. it keeps trying different options, only restarting from scratch when all possible solutions based on current knowledge have been explored.
Or in other words: any solution it finds will be considering and utilizing however many facts it needs, simultaneously! Ta-da!

The code we came up goes like this:

initial_state = { discovered = {}, clause_idx = 0, fact_idx = 0 }
results = []
stack = [initial_state]

while stack is not empty:
    state = stack.pop()
    current_clause = null
    current_fact = null     

    if state.clause_idx >= query.clauses.len():
        # all clauses have succeeded, add current knowledge to results
        add state.discovered to results
        continue
    else:
        current_clause = query.clauses[clause_idx]
    
    if state.fact_idx >= database.facts.len():
        continue
    else:
        current_fact = database.facts[state.fact_idx]

    # unify() is essentially the same as the body of the 
    # inner loop in the old code above
    match unify(state.discovered, current_clause, current_fact):
        Success(newly_discovered):
            # Move on to next fact
            increment state.fact_idx
            push state onto stack

            # Investigate the next clause with new info
            new_state = {
                          discovered = state.discovered + newly_discovered, 
                          fact_idx = 0, 
                          clause_idx = state.clause_idx + 1 
                        }
            push new_state onto stack
        Failure:
            # Nothing new was learned, so just move on to next fact
            increment state.fact_idx
            push state onto stack

return results

The real code with details is here.