## Friday, March 31, 2017

### Naive generate-and-test won't hack it

When I was young I toyed with the following idea.

Pretty much any concept can be adequately expressed in a mini-essay of a thousand words.

Simply generate all possible articles of a thousand words and somewhere you will find the answer to all problems.

Want the design of a stardrive engine? Immortality? The theory of perfect governance?

It's all in there somewhere.

---

How many essays though? Apparently the average educated speaker of English knows about 40,000 words. So for our first estimate, we could simply raise 40,000 to the power of 1,000 .. but most of those 10^4,602 essays would be wildly ungrammatical. We can do better.
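A quick sanity check on that exponent, as a Python sketch (the count itself is far too large to hold as an integer of any practical use, so we work with its base-10 logarithm):

```python
import math

# Number of distinct 1,000-word sequences drawn from a 40,000-word vocabulary:
# 40,000^1,000. We compute log10 of it rather than the number itself.
log10_essays = 1000 * math.log10(40_000)
print(round(log10_essays))  # 4602, i.e. roughly 10^4,602 essays
```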

I reviewed a sample text: the introductory quote in Peter Seibel's "Practical Common Lisp".

The first five sentences comprised 100 words in total, which broke down into:
• nouns: 20%
• verbs: 15%
• adjectives: 10%
• others: 55%
A certain amount of hand-wavy rounding there, of course. If we adopt the very restrictive constraint of exactly one syntactic structure for the entire set of essays, the total number reduces to a product of terms of the form:
(number-of-English-words-in-category)^(number-of-words-of-this-category-in-essay)
or,
8,000^200 * 6,000^150 * 4,000^100 * 22,000^550 = 10^4,092
That's still a big number*. Suppose only one 'essay' in a billion was semantically sensible and we could read one essay per second. That's 10^4,083 seconds .. or 3 * 10^4,066 billion years.
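The same back-of-envelope product can be sketched in Python (the per-category vocabulary sizes and the fixed syntactic structure are the hand-wavy assumptions above; working in logs gives a slightly different exponent than the rounded figure quoted, but the same ballpark):

```python
import math

# Assumed vocabulary size and number of essay slots for each word category.
categories = {            # (vocab size, slots in the 1,000-word essay)
    "nouns":      (8_000, 200),
    "verbs":      (6_000, 150),
    "adjectives": (4_000, 100),
    "others":     (22_000, 550),
}

# log10 of the product: sum of slots * log10(vocab size).
log10_essays = sum(n * math.log10(v) for v, n in categories.values())
print(round(log10_essays))  # 4096 -- same order as the 10^4,092 quoted above

# Keep one essay in a billion (divide by 10^9), read one per second,
# and convert to billions of years (~3.15e16 seconds per billion years).
log10_billion_years = log10_essays - 9 - math.log10(3.15e16)
print(round(log10_billion_years))  # 4070 -- i.e. ~10^4,070 billion years
```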

The merits of a compact notation.

---

Exhaustive search through the space of all possible candidates isn't a very good way of proceeding. And this has important implications for DARPA's third wave - contextual AI - which I wrote about previously.

In his excellent exposition (YouTube), John Launchbury highlighted the very large number of training instances needed to force convergence for today's artificial neural networks. By comparison, children learn new concepts from very few examples.

John Launchbury's proposed solution was - correctly - to identify additional constraints which might dramatically collapse the search space. His chosen example showed the benefit of adding the dynamics of handwriting - the stroke trajectories - to the static character bitmaps normally used for training. It turns out that if you consider how the image might have been created, recognition becomes a lot easier.

It's not hard to identify the extra constraints about the world which children use. They interact with new objects, touch them, throw them, bite them and try to break them. Thus are acquired notions of 3D structure, composition and texture to augment what their visual systems are telling them.

I really do think that a high priority should be given to embodied robotics in the next wave of AI research.

---

Another example John Launchbury discussed was Microsoft's Internet chatbot "Tay".

Apparently the example he showed was the least-offensive Tay tweet he could find. But what would an AI have to know about contemporary mores to self-reject statements like that?

For extra credit, discuss the 'situated cognition' thesis that only through active and corporeal participation in the social world can one truly understand social concepts.

Particularly emotionally-charged ones.

---

* Since
(i)  I don't consider all the syntactically-permissible permutations of the ways in which nouns, adjectives, verbs and others could be mixed up in the thousand words, while

(ii)  the size of the 'others' vocabulary is likely to be way smaller than 22,000 (if, for example, it were 2,200, the overall essay-set size would shrink by a factor of 10^550 - a distinction, however, without a practical difference),
this calculation counts as pretty bogus. I only wanted to demonstrate that, no matter how you cut it, the numbers involved are simply ginormous.