Sunday, November 11, 2018

A second review of "The Master Algorithm" - Pedro Domingos

Amazon link

This is a guest review from Dr Roy Simpson.

---

Review of The Master Algorithm (by Pedro Domingos) 

By Dr. R. Simpson

This book provides both a history of and a visionary project in machine learning, from a leading professor in the field. The author writes fluently and well, providing an informative book which is worth re-reading if one is interested in the details of machine learning techniques, as well as in (re-)evaluating his ideas about the future of the subject.

I was brought to this book after reading two other recent books on the subject of Algorithms: Weapons of Math Destruction by Cathy O'Neil and the recent Hello World by Hannah Fry. Both of these books discuss the ongoing issues associated with the application of Algorithms, Big Data and Machine Learning to contemporary society – the latter book being newer and the more relevant for UK audiences. Both recommended The Master Algorithm (2015) as a book for a deeper understanding of machine learning.

Much of this book is eminently quotable, and its text provides good introductions. For example, here is a passage from the first page of the Prologue:

You may not know it, but machine learning is all around you. When you type a query into a search engine, it's how the engine figures out which results to show you (and which ads as well). When you read your email, you don't see most of the spam, because machine learning filtered it out. Go to Amazon.com to buy a book or Netflix to watch a video, and a machine-learning system helpfully recommends some you might like...

Traditionally, the only way to get a computer to do something – from adding two numbers to flying an airplane – was to write down an algorithm explaining how, in painstaking detail. But machine-learning algorithms, also known as learners, are different: they figure it out on their own, by making inferences from data. And the more data they have, the better they get. Now we don't have to program computers; they program themselves.

It's not just in cyberspace, either: your whole day, from the moment you wake up to the moment you fall asleep, is suffused with machine learning.

The introductory chapters continue with more aspects of machine learning, making some interesting points. For example, the above passage indicates a “culture shift” within the programming world. In a later section it is noted that Microsoft has some difficulties with the new world, because its programmers are just that and its main products are produced in the traditional way, whereas Google is more of a machine learning organisation, with its main products produced in the machine learning way.

He introduces the metaphor that whereas traditional programming is more like a manual (at best industrial) process, machine learning is more like farming – prepare the ground, then sit back and watch the systems (i.e. commercial products) grow themselves.

The author is aware that this field is not "General AI" – a topic often discussed on this blog, to which I shall return at the end of this review. However, we read that within this subfield of AI the traditional opponents are "Knowledge Engineers". Knowledge Engineers hold (or at least once held) the view that systems which contain "knowledge" need to have that knowledge typed into them, and more generally need to be programmed in the traditional way.

The classic example was Cyc – a massive common-sense knowledge-based system into which programmers have been typing "common sense rules" for several decades now.

The main counter to this view presented in this book is a matter of scale: knowledge engineers could work with thousands of rules, whereas machine learning will generate many millions of rules. (The argument against the knowledge engineering viewpoint held by the Agent Theory/AGI community has been different, and concerns its narrowness of scope. Agent Theorists might have used a "narrowness of scope" argument against early machine learning too, but the scalability of these techniques has won the commercial argument – at least for now – though not necessarily the conceptual argument, as discussed later in this review.)

The primary content of the book is a review and overall analysis, from a modern machine learning perspective, of the five main strands of machine learning during the history of AI: Symbolism, Connectionism, Evolutionary Programming, Bayesian Networks, and Analogical Reasoning.

His purpose in all of this is to identify the principles involved, describe the essence of each technique, and identify the best algorithm that the given technique has provided to the machine learning community. From there he moves to the main objective of the book: the development of a “Master (Learning) Algorithm” which incorporates all the best of the previous techniques and which is the subject of his own research group. This Master Algorithm would then be able to learn anything, optimally...

A brief summary of each (with quotes from the book):

Symbolism

The symbolists' core belief is that all intelligence can be reduced to manipulating symbols. Whereas deductive reasoning is about going from axioms to conclusions, the learning aspect requires inductive reasoning: going from conclusions to axioms. Thus Hume's problem of induction is discussed, and the master algorithm here becomes “inverse deduction”.
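
As a toy illustration of the idea (a hedged sketch of my own, not the book's algorithm, with the Socrates example as an invented input), inverse deduction takes a specific fact and a specific conclusion about the same individual and proposes the general rule that would have allowed the conclusion to be deduced:

    # A toy illustration of "inverse deduction": from a specific fact and a
    # specific conclusion, propose a general rule by replacing the shared
    # constant with a variable. A sketch of the idea only, not the book's algorithm.

    def generalise(fact, conclusion):
        """fact and conclusion are (predicate, constant) pairs, e.g.
        ("human", "socrates") and ("mortal", "socrates")."""
        fact_pred, fact_const = fact
        concl_pred, concl_const = conclusion
        if fact_const == concl_const:
            # Replace the shared constant with a variable X to induce a rule.
            return f"{concl_pred}(X) :- {fact_pred}(X)"
        return None

    # From human(socrates) and mortal(socrates) we induce the Prolog-style rule
    # mortal(X) :- human(X), i.e. "all humans are mortal".
    print(generalise(("human", "socrates"), ("mortal", "socrates")))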

Inverse deduction is like a super-scientist systematically looking at the evidence, considering possible inductions, collating the strongest, and using those along with other evidence to construct yet further hypotheses – all at the speed of computers. Yet this inverse deduction has limitations and issues, so we move to Connectionism.

Connectionism

How does your brain learn? Hebb's rule (from 1949) has become the cornerstone of connectionism, and is about neuron firing: “neurons that fire together wire together”. This history in AI begins with the Perceptron (from the 1950s), an electromechanical learning machine, based on a model of neurons, which had some basic classification skills (e.g. is this a picture of a door or not?).

In the late 1960s it was proven that its classification skills were too limited to be of much generality – a single-layer perceptron can only learn linearly separable classifications, so it cannot learn XOR, for example – and this approach to AI learning suffered a near-total setback (to the delight of the knowledge engineers of the time, who were competing for the same funding).
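
To see the perceptron idea concretely, here is a hedged sketch of my own (not the original electromechanical machine): the classic learning rule nudges the weights whenever a prediction is wrong, and it converges on linearly separable problems such as AND, whereas no setting of the weights can capture XOR.

    # A minimal perceptron learning rule, for illustration only. It learns
    # linearly separable functions such as AND, but no single-layer perceptron
    # can learn XOR.

    def train_perceptron(samples, epochs=20, lr=0.1):
        w, b = [0.0, 0.0], 0.0
        for _ in range(epochs):
            for (x1, x2), target in samples:
                pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
                error = target - pred
                w[0] += lr * error * x1
                w[1] += lr * error * x2
                b += lr * error
        return w, b

    def predict(w, b, x):
        return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

    AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
    w, b = train_perceptron(AND)
    print([predict(w, b, x) for x, _ in AND])   # [0, 0, 0, 1] -- AND has been learned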

It was not until the 1980s that physics-inspired alternatives were introduced, such as the Boltzmann machine, which introduced probabilities into the field and brought in techniques from statistical physics (i.e. thermodynamics) – so for a period the notion of “temperature” was important for a neural network learning system!

Although the introduction of probabilities has lasted, it is not clear what has happened to "temperature" in machine learning. It also later transpired that none of this physics theory was necessary to overcome the original Perceptron limitations. Nevertheless, more mathematical techniques emerged from this era, such as calculations in hyperspace, with associated weighted functions, convergence metrics, and so on.

The main algorithm eventually identified from Connectionism is backpropagation, whose refinements are at the core of today's learning systems. However, the author raises an intriguing question: Is everything we “know” actually learned by our neurons? Has not evolution played a part too?
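
Before moving on, here is a hedged miniature of backpropagation itself (a sketch of my own in Python/NumPy, not the refined modern form): a tiny two-layer network, trained by propagating the error signal backwards and descending the gradient, can learn the XOR function that defeated the single-layer perceptron.

    # A miniature backpropagation example: a two-layer network learning XOR.
    # Illustrative only; modern systems use far more refined versions of the
    # same gradient idea.
    import numpy as np

    rng = np.random.default_rng(0)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)

    # Weights and biases for a 2 -> 4 -> 1 network
    W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
    W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    lr = 0.5

    for _ in range(20000):
        # Forward pass
        h = sigmoid(X @ W1 + b1)
        out = sigmoid(h @ W2 + b2)
        # Backward pass: the error is propagated back layer by layer
        d_out = (out - y) * out * (1 - out)
        d_h = (d_out @ W2.T) * h * (1 - h)
        # Gradient-descent updates
        W2 -= lr * (h.T @ d_out)
        b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * (X.T @ d_h)
        b1 -= lr * d_h.sum(axis=0)

    print(out.round(2))   # with these settings it typically converges to ~[[0], [1], [1], [0]]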

Evolutionary Learning

As an introduction to this chapter the author tells the following fantasy story:

Robotic Park is a massive robot factory surrounded by ten thousand square miles of jungle, urban and otherwise. Ringing that jungle is the tallest, thickest wall ever built, bristling with sentry posts, searchlights, and gun turrets. The wall has two purposes: to keep trespassers out and the park's inhabitants – millions of robots battling for survival and control of the factory – within.

The winning robots get to spawn, their reproduction accomplished by programming the banks of 3D printers inside. Step-by-step, the robots become smarter, faster – and deadlier. Robotic Park is run by the US Army, and its purpose is to evolve the ultimate soldier.

Needless to say this story brings out a large number of issues and concerns. The author seems to be sanguine about the dangers here, although modern “AI Ethics” movements (which included the late Stephen Hawking and Elon Musk) are concerned about this type of development. Towards the end of the book, the author actually suggests that working on “AI Ethics” could itself be a growth industry for displaced humans in the new era – robots and AI will have a lot of (human) ethics to learn!

The author uses this story to dramatically introduce the AI learning techniques inspired by Darwinian evolution: genetic algorithms and genetic programming. These techniques are overviewed and some issues identified. However, this reviewer is intrigued by the wider point being made here, and has an alternative way of expressing a related idea. The overall point the author is making can be summarised by this formula:

   Learning == Agent Learning + Environment Learning + (Environment → Agent transfer)

In other words, when we ask “how does that small brain learn all this stuff?”, the answer is that the small brain has not had to do all the learning implicit in its actions. This viewpoint is also implied by the Chomskyan view of language acquisition (Chomsky has been another critic of the traditional approach to machine learning; the author hopes that his wider “Master Algorithm” approach meets Chomsky's concerns).

There are again several limitations to any master algorithm provided by what we could also call the Darwinian algorithm. Chief amongst them is the fact that the Darwinian algorithm (as currently understood) tends to find “suboptimal” solutions, not optimal solutions – so we move on to Bayesian theory.
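
Before doing so, it may help to see what the Darwinian algorithm looks like in miniature. The following is a hedged toy sketch of my own (not an example from the book): a population of bit-string "genomes" is scored by a fitness function, the fitter half are selected as parents, and offspring are produced by crossover and mutation.

    # A minimal genetic algorithm, for illustration only: evolve bit strings
    # towards the (trivial) fitness goal of "all ones".
    import random

    random.seed(1)
    GENOME_LEN, POP_SIZE, GENERATIONS, MUTATION_RATE = 20, 30, 60, 0.02

    def fitness(genome):                 # count of 1-bits; the fittest genome is all ones
        return sum(genome)

    def crossover(a, b):                 # single-point crossover of two parents
        point = random.randrange(1, GENOME_LEN)
        return a[:point] + b[point:]

    def mutate(genome):                  # occasionally flip a bit
        return [bit ^ 1 if random.random() < MUTATION_RATE else bit for bit in genome]

    population = [[random.randint(0, 1) for _ in range(GENOME_LEN)] for _ in range(POP_SIZE)]
    for _ in range(GENERATIONS):
        # Selection: keep the fitter half as parents
        population.sort(key=fitness, reverse=True)
        parents = population[:POP_SIZE // 2]
        # Reproduction: crossover plus mutation
        children = [mutate(crossover(*random.sample(parents, 2)))
                    for _ in range(POP_SIZE - len(parents))]
        population = parents + children

    print(max(fitness(g) for g in population))   # usually at or near the optimum of 20

Note that nothing guarantees the optimum is reached: the population can settle on a good-enough genome, which is exactly the "suboptimal solutions" issue mentioned above.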

Bayesian Networks

The path to optimal learning begins with a formula that many people have heard of: Bayes Theorem. But here we'll see it in a whole new light and realize that it's vastly more powerful than you'd guess from its everyday uses.

At heart, Bayes' theorem is just a simple rule for updating your degree of belief in a hypothesis when you receive new evidence: if the evidence is consistent with the hypothesis, the probability of the hypothesis goes up; if not, it goes down.

Bayes' theorem uses conditional probabilities (P(A|B) is the probability that A happens given that B has happened or is assumed), and there is an associated inference system. Similar to deductive logic (e.g. the Prolog-based resolution systems sometimes discussed on this blog), the inference system allows the deduction of probabilistic conclusions and the management of probabilistic assumptions. This is all wrapped into a network structure amongst the assumptions, and apparently it has been discovered that there is an isomorphism between the probabilities and weights often used in Artificial Neural Net models and the probabilities used in Bayesian Networks.
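
To make the update rule concrete, here is a hedged sketch of my own (the spam-filter numbers are invented purely for illustration): Bayes' theorem combines a prior belief with the likelihood of the evidence to give a posterior belief.

    # Bayes' theorem as a belief-update rule:  P(H|E) = P(E|H) * P(H) / P(E).
    # The numbers below are invented for illustration (a toy spam filter).

    def bayes_update(prior_h, p_e_given_h, p_e_given_not_h):
        """Return P(H | E) given P(H), P(E | H) and P(E | not H)."""
        p_e = p_e_given_h * prior_h + p_e_given_not_h * (1 - prior_h)
        return p_e_given_h * prior_h / p_e

    # Hypothesis H: "this email is spam".  Evidence E: "it contains the word 'prize'".
    prior_spam = 0.2           # assumed prior belief that any given email is spam
    p_word_given_spam = 0.6    # assumed P('prize' | spam)
    p_word_given_ham = 0.05    # assumed P('prize' | not spam)

    posterior = bayes_update(prior_spam, p_word_given_spam, p_word_given_ham)
    print(round(posterior, 3))   # 0.75: the evidence raises the degree of belief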

The success of this Bayesian approach arises directly from the quantity of data now available, which makes the probabilities and conditional probabilities very accurately determinable from millions upon millions of data points (pre-Internet, such probabilities would have been entered by researchers by hand as guesses, resulting in unconvincing performance from such probabilistic systems).

This success has motivated much of the idea behind a complete Master Algorithm for learning, but it leaves open two remaining unification tasks: the hardest has been the unification of logic and probability; the other is what to do when there is essentially no data, only analogies to existing data.

Analogical Reasoning

Methane and methanol have very similar chemical structures, but they are not identical and have some big differences: one is a gas at room temperature, the other a liquid. So if your system knew much about one, what could it deduce about the other? Much of science and business services work operates by analogy: no two customers are exactly the same. We manage to cope, so what are the principles involved?

Various classification ideas have been developed in this strand of AI, and one technique, the Support Vector Machine, was apparently the most powerful AI technique around the turn of the century. Only in recent years has it been superseded by Artificial Neural Nets for the top learning slot (although Artificial Neural Nets don't do analogical reasoning as such).
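
The simplest analogical learner is nearest-neighbour classification; here is a hedged toy sketch of my own (not the Support Vector Machine, and the customer data is invented): a new case is simply given the label of the most similar case already seen.

    # Nearest-neighbour classification: a simple form of reasoning by analogy.
    # A new case is assigned the label of its most similar known case.
    # The customer data is invented purely for illustration.
    import math

    def nearest_neighbour(known_cases, new_case):
        """known_cases: list of (feature_vector, label) pairs."""
        def distance(a, b):
            return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
        closest_features, closest_label = min(known_cases,
                                              key=lambda case: distance(case[0], new_case))
        return closest_label

    # Toy data: (age, yearly spend) -> did the customer renew?
    customers = [((25, 200), "renewed"), ((60, 150), "lapsed"),
                 ((30, 800), "renewed"), ((55, 90), "lapsed")]
    print(nearest_neighbour(customers, (28, 700)))   # "renewed", by analogy with the closest customer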

The Master Algorithm - Alchemy

So putting this all together has been the research project of Prof. Domingos, and he has developed a mathematical framework called Markov Logic Networks and a corresponding (open source) software system called Alchemy. The final chapters discuss the general steps required to produce a Master Algorithm from all the above ingredients. He does not describe the current form of Alchemy (or MLN) as the Master Algorithm, but uses it to suggest that this goal is feasible.
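
As a very rough, hedged sketch of the idea behind a Markov Logic Network (my own toy, propositional version, not the Alchemy system): each logical formula carries a weight, and a possible world is more probable the more weighted formulas it satisfies, the unnormalised score being the exponential of the total satisfied weight.

    # A toy, propositional illustration of the Markov Logic Network idea:
    # logical formulas carry weights, and a possible world's probability is
    # proportional to exp(sum of weights of the formulas it satisfies).
    # A sketch of the principle only, not the Alchemy system; weights are invented.
    import itertools, math

    atoms = ["smokes", "cancer"]
    weighted_formulas = [
        (1.5, lambda world: (not world["smokes"]) or world["cancer"]),   # smokes => cancer
        (0.8, lambda world: not world["cancer"]),                        # cancer is a priori unlikely
    ]

    def score(world):
        return math.exp(sum(w for w, formula in weighted_formulas if formula(world)))

    worlds = [dict(zip(atoms, values))
              for values in itertools.product([False, True], repeat=len(atoms))]
    Z = sum(score(world) for world in worlds)      # normalising constant

    for world in worlds:
        print(world, round(score(world) / Z, 3))   # probability of each possible world

In a real Markov Logic Network the formulas are first-order, each weight is multiplied by the number of that formula's true groundings, and the weights are learned from data; the toy above only shows the weighted-satisfaction principle.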

One feature he seems to claim Alchemy is lacking is the ability to explain itself fully; there may also still be optimisation issues. There are also some discussions about the social aspects of all this, suggesting that users should form "trusted data unions" to hold the value of their data (a bit like banks holding their money), rather than the current practice of just handing everything over to the big corporations, who then benefit from any commercial consequences (and, as UK residents are aware, don't pay appropriate tax either).

Analysis: Agent Theory and Computational Mathematics

I shall end this review with a few remarks about this idea of a "Master Learning Algorithm" from the perspective of Agent theory (which is often discussed on this blog), and some ideas about Computational Mathematics (my own interest).

Agent theory takes a more holistic, biologically realistic and physically realist view of AI than one finds in stand-alone techniques like machine learning. Although these machine learning techniques are now at the centre of a business and social revolution (and will remain so for at least another decade), in aiming for a goal as high as a Master Algorithm which can learn everything – and do so optimally – one has to ask whether boundaries will eventually be reached due to insufficient attention to agent-theoretic issues.

For example, how, and in what way, is a biological cell a learning system? Likewise, the mysteries of neurological biochemistry are not resolved in an agreed way. Even aspects of the Darwinian algorithm are not fully understood. So it is quite possible that nature has more to teach us about learning.

On the subject of Computational Mathematics and AI much can also be said. A famous critic of the entire AI project (at least as seen as a branch of Computer Science) has been Professor Penrose, with his books The Emperor's New Mind (1989) and Shadows of the Mind (1994). Since that time more results have appeared in the foundations of logic and mathematics which can be viewed as clarifying Penrose's position.

For example, using the results of an ongoing project known as Reverse Mathematics, this reviewer suspects that much of Applied Engineering Mathematics is not actually computable in the Turing sense. However, it nearly is, so many effects are “subtle” – although the inability to predict weather systems accurately beyond (say) five days, turbulence theory, and mathematical puzzles related to physics models all suggest that these “weakly non-computable” effects may yet be important.

The author himself recognises that an outstanding and unsolved mathematical question surrounding NP-complete problems (the P versus NP problem) is significant to the subject of AI. I shall leave the last words to the author:

The purpose of AI systems is to solve NP-complete problems, which may take exponential time, but the solutions can always be checked efficiently. We should therefore welcome with open arms computers that are vastly more powerful than our brains, safe in the knowledge that our job is exponentially easier than theirs.
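
(As a small closing illustration of the asymmetry in that remark, a toy example of my own and not from the book: finding a solution to an NP-complete problem such as subset sum may mean searching exponentially many candidates, but checking a proposed solution takes only a moment.)

    # The asymmetry behind NP-completeness, in miniature (subset sum):
    # finding a subset that hits the target may require exponential search,
    # but verifying a proposed subset is trivial.
    from itertools import combinations

    numbers, target = [3, 34, 4, 12, 5, 2], 9

    def verify(subset, target):
        """Checking a candidate solution is fast: just add it up."""
        return sum(subset) == target

    def solve(numbers, target):
        """Finding a solution may mean trying all 2^n subsets."""
        for r in range(len(numbers) + 1):
            for subset in combinations(numbers, r):
                if verify(subset, target):
                    return subset
        return None

    solution = solve(numbers, target)
    print(solution, verify(solution, target))   # (4, 5) True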

---

My own review was posted back in June 2016 and can be found here.
