Thursday, February 02, 2017

GPS vs ATP

There is a view to which I'm quite attracted: that an automated theorem prover is a general-purpose AI engine, which can be used to power arbitrary intelligent behaviour.



This kind of thinking goes back to the earliest days of AI. In "Paradigms of Artificial Intelligence Programming: Case Studies in Common Lisp" by Peter Norvig, we read this (Chapter 4):
"The General Problem Solver, developed in 1957 by Alan Newell and Herbert Simon, embodied a grandiose vision: a single computer program that could solve any problem, given a suitable description of the problem.

GPS caused quite a stir when it was introduced, and some people in AI felt it would sweep in a grand new era of intelligent machines. Simon went so far as to make this statement about his creation:

'It is not my aim to surprise or shock you. . . . But the simplest way I can summarize is to say that there are now in the world machines that think, that learn and create. Moreover, their ability to do these things is going to increase rapidly until - in a visible future - the range of problems they can handle will be coextensive with the range to which the human mind has been applied.'"

GPS was not a theorem prover; it was - as the name implies - a planner. It used a logic-like syntax to express goals, initial conditions, and the preconditions and effects of actions. Carrying out an applicable action would retract the facts on its delete-list from the current database and assert those on its add-list.

GPS started with the goal and contemplated applying the actions in reverse, to create a tree of subgoals looking for a way back to the initial conditions.

Here is how GPS stacks blocks.
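
Sketched in code - and this is a minimal reconstruction in the spirit of the first, simplified GPS in PAIP chapter 4, not Newell and Simon's original program, with a toy blocks domain I've invented for illustration - the machinery looks roughly like this:

    ;; A GPS-style solver in miniature.  Facts are symbols; each operator
    ;; carries preconditions, an add-list and a delete-list.
    (defstruct op action preconds add-list del-list)

    (defvar *state* nil "Facts currently true.")
    (defvar *ops*   nil "Available operators.")

    (defun gps (initial-state goals ops)
      "Try to achieve every goal in GOALS starting from INITIAL-STATE."
      (let ((*state* initial-state)
            (*ops* ops))
        (if (every #'achieve goals) 'solved)))

    (defun achieve (goal)
      "GOAL holds already, or some operator that adds it can be applied."
      (or (member goal *state*)
          (some #'apply-op
                (remove-if-not (lambda (op) (member goal (op-add-list op)))
                               *ops*))))

    (defun apply-op (op)
      "If OP's preconditions can be achieved, execute it: retract its
    delete-list from the state and assert its add-list."
      (when (every #'achieve (op-preconds op))
        (format t "~&Executing ~a~%" (op-action op))
        (setf *state* (set-difference *state* (op-del-list op)))
        (setf *state* (union *state* (op-add-list op)))
        t))

    ;; A toy blocks operator (names invented for this sketch).
    (defparameter *block-ops*
      (list (make-op :action 'put-a-on-b
                     :preconds '(clear-a clear-b)
                     :add-list '(a-on-b)
                     :del-list '(clear-b a-on-table))))

    ;; (gps '(a-on-table b-on-table clear-a clear-b) '(a-on-b) *block-ops*)
    ;; prints "Executing PUT-A-ON-B" and returns SOLVED.

All of the backward chaining lives in achieve: a goal that doesn't already hold sends the solver to an operator that could add it, and that operator's preconditions become subgoals in turn.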


I'll come back to why GPS did not solve the problem of artificial general intelligence in a moment. Let's think about GPS and theorem proving.

Logic concerns itself with abstract relationships. It deals in a deeply uninterpreted universe of objects (constants, variables and functions), sets of those objects, and the connections between them.

It's timeless and spaceless, and if you want time or space you have to introduce them explicitly, either with extra axioms or by beefing up the inference system, so that you end up less like a vanilla theorem prover and more like a real-world planner. This is not a small task.
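
To make that concrete, here is one crude, hand-rolled way of bolting time onto an otherwise timeless set of facts - my own illustrative sketch, not situation calculus or the event calculus proper: stamp every state change with a time, and let facts persist until some action deletes them.

    ;; Does FACT hold at TIME?  INITIAL-FACTS are true at time 0;
    ;; ACTIONS is a list of (time add-list del-list) entries.
    (defun holds-at (fact time initial-facts actions)
      (let ((state initial-facts))
        (dolist (act (sort (copy-list actions) #'< :key #'first))
          (when (>= (first act) time) (return))  ; only earlier actions matter
          (setf state (union (set-difference state (third act) :test #'equal)
                             (second act)
                             :test #'equal)))
        (and (member fact state :test #'equal) t)))

    ;; (holds-at '(on a table) 0 '((on a table) (clear a) (clear b))
    ;;           '((0 ((on a b)) ((on a table) (clear b)))))   ; => T
    ;; (holds-at '(on a b) 1 '((on a table) (clear a) (clear b))
    ;;           '((0 ((on a b)) ((on a table) (clear b)))))   ; => T

Here the frame reasoning - facts persist until something removes them - lives in the evaluation loop rather than in axioms; do it axiomatically instead and every predicate needs a time argument threaded through it, plus frame axioms to say what doesn't change. Either way, it's baggage a vanilla theorem prover doesn't give you for free.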

Prolog exemplifies the dilemma. Prolog is a resolution theorem prover for a highly-restricted syntax of first-order logic, namely Horn clauses. But Prolog also has many built-in functions and relationships - arithmetic, for example - which allow time to be encoded.

It can also extra-logically alter its own axioms mid-computation, via assert and retract.

As David H. D. Warren showed with the famous WARPLAN back in 1974, you can get the functionality of an efficient planner out of an enhanced theorem prover - Prolog - but it's completely non-trivial.

---

So why didn't GPS take over the world?

It has a major Achilles heel. It's only as good as its operators and the expressive power of its description language. These are magically introduced by the programmer reflecting a pre-existing paradigm of problem-conceptualisation.

GPS has nothing to say about how such a paradigm might be synthesised in the first place.

This is a feature (or bug) which GPS-like systems share with all rule-based systems. It's why Expert Systems proved so fragile, and so hard to build and maintain.

In real life, our engagement with 'reality' is always changing: the rules never exactly apply, and they obsolesce even as they cohere. Learning, intervention and further understanding have to occur at a more granular level than GPS - indeed, the GPS level of description is best understood as emergent.

Although artificial neural nets, with their fluid, distributed weighting-representations, are the new central dogma of AI, the GPS approach was both easier and more obvious. It even works in some stable, well-defined domains.

It had to be tried; it was hardly wasted effort.

---

Another joy from Norvig's book (available for download here) is in section 1.7: the origins of lambda notation.
"It is also possible to introduce a function without giving it a name, using the special syntax lambda. The name lambda comes from the mathematician Alonzo Church's notation for functions (Church 1941). Lisp usually prefers expressive names over terse Greek letters, but lambda is an exception. A better name would be make-function.

Lambda derives from the notation in Russell and Whitehead's Principia Mathematica, which used a caret over bound variables. Church wanted a one-dimensional string, so he moved the caret in front: ^x(x + x).

The caret looked funny with nothing below it, so Church switched to the closest thing, an uppercase lambda, Λx(x + x).

The Λ was easily confused with other symbols, so eventually the lowercase lambda was substituted: λx(x + x).

John McCarthy was a student of Church's at Princeton, so when McCarthy invented Lisp in 1958, he adopted the lambda notation. There were no Greek letters on the key punches of that era, so McCarthy used (lambda (x) (+ x x)) .. and it has survived to this day."
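
For anyone who hasn't met the notation: the quoted form is just an anonymous doubling function. In Common Lisp it can be applied directly or handed to a higher-order function (trivial examples of mine, not from the book):

    (funcall (lambda (x) (+ x x)) 21)        ; => 42
    (mapcar  (lambda (x) (+ x x)) '(1 2 3))  ; => (2 4 6)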
