This is a guest review from Dr Roy Simpson.
---
Review of The Master Algorithm (by Pedro Domingos)
By Dr. R. Simpson
This book provides both a history of machine learning and a visionary project for its future, written by a leading professor in the field. The author writes fluently and well, providing an informative book which is worth re-reading if one is interested in the details of machine learning techniques, as well as in (re-)evaluating his ideas about the future of the subject.
I came to this book after reading two other recent books on the subject of algorithms: Weapons of Math Destruction by Cathy O'Neil and the more recent Hello World by Hannah Fry. Both of these books discuss the ongoing issues associated with the application of algorithms, Big Data and machine learning to contemporary society – the latter book being newer and most relevant for UK audiences. Both recommended The Master Algorithm (2015) as a book for a deeper understanding of machine learning.
Much of this book is eminently quotable, and its text provides good introductions; for example, here is a passage from the first page of the Prologue:
“You may not know it, but machine learning is all around you. When you type a query into a search engine, it's how the engine figures out which results to show you (and which ads as well). When you read your email, you don't see most of the spam, because machine learning filtered it out. Go to Amazon.com to buy a book or Netflix to watch a video, and a machine-learning system helpfully recommends some you might like...

Traditionally, the only way to get a computer to do something – from adding two numbers to flying an airplane – was to write down an algorithm explaining how, in painstaking detail. But machine-learning algorithms, also known as learners, are different: they figure it out on their own, by making inferences from data. And the more data they have, the better they get. Now we don't have to program computers; they program themselves.

It's not just in cyberspace, either: your whole day, from the moment you wake up to the moment you fall asleep, is suffused with machine learning.”
The introductory chapters continue with more aspects of machine learning, making some interesting points. For example, the above passage indicates a “culture shift” within the programming world. In a later section it is noted that Microsoft has some difficulties with the new world, because its programmers are just that and its main products are produced in the traditional way, whereas Google is more of a machine learning organisation, with its main products produced in the machine learning way.
He introduces the metaphor that whereas traditional programming is more like a manual (at best industrial) process, machine learning is more like farming – prepare the ground, then sit back and watch the systems (i.e. commercial products) grow themselves.
The author is aware that this field is not “General AI” – a topic often discussed in this blog, to which I shall return at the end of this review. However, we read that within this subfield of AI the traditional opponents are “Knowledge Engineers”. Knowledge Engineers hold (or at least once held) the view that systems which contain “knowledge” need to have that knowledge typed into them, and more generally need to be programmed in the traditional way. The classic example is Cyc – a massive common-sense knowledge-based system into which programmers have been typing “common sense rules” for several decades now.
The main counter to this view presented in the book is the matter of scale: knowledge engineers could work with thousands of rules, whereas machine learning will generate many millions of rules. (The argument against the knowledge engineering viewpoint held by the Agent Theory/AGI community has been different, and concerns its narrowness of scope. Agent Theorists might have used a “narrowness of scope” argument against early machine learning too, but the scalability of these techniques has won the commercial argument – at least for now – though not necessarily the conceptual argument, as discussed later in this review.)
The primary content of the book is a review and overall analysis, from a modern machine learning perspective, of the five main strands of machine learning during the history of AI: Symbolism, Connectionism, Evolutionary Programming, Bayesian Networks, and Analogical Reasoning. His purpose in all of this is to identify the principles involved, describe the essence of each technique, and identify the best algorithm that the given technique has provided to the machine learning community. From there he moves to the main objective of the book: the development of a “Master (Learning) Algorithm” which incorporates the best of all the previous techniques and which is the subject of his own research group. This Master Algorithm would then be able to optimally learn anything...
A brief summary of each (with quotes from the book):
Symbolism
The symbolist's core belief is that all intelligence can be reduced to manipulating symbols.
Whereas deductive reasoning is about going from axioms to conclusions, the learning aspect requires inductive reasoning: going from conclusions back to axioms. Thus Hume's problem of induction is discussed, and the master algorithm here becomes “inverse deduction”.
Inverse deduction is like a super-scientist systematically looking at the evidence, considering possible inductions, collating the strongest, and using those along with other evidence to construct yet further hypotheses – all at the speed of computers.
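To make the idea concrete, here is a toy sketch of my own (not the author's algorithm): given some facts and some observed conclusions, propose the general rule that would have let us deduce those conclusions, and keep it only if it contradicts nothing else we have seen.

# Toy "inverse deduction": propose the missing general rule and check its consistency.
facts = {("human", "Socrates"), ("human", "Plato")}
observations = {("mortal", "Socrates"), ("mortal", "Plato")}

def propose_rules(facts, observations):
    """Propose rules of the form premise(x) => conclusion(x) that would explain each observation."""
    candidates = set()
    for (conclusion, entity) in observations:
        for (premise, other_entity) in facts:
            if entity == other_entity and premise != conclusion:
                candidates.add((premise, conclusion))   # e.g. human(x) => mortal(x)
    return candidates

def consistent(rule, facts, observations):
    """A rule survives only if every entity satisfying the premise also satisfies the conclusion."""
    premise, conclusion = rule
    entities = {e for (p, e) in facts if p == premise}
    return all((conclusion, e) in observations for e in entities)

rules = [r for r in propose_rules(facts, observations) if consistent(r, facts, observations)]
print(rules)   # [('human', 'mortal')] -- i.e. "humans are mortal" has been induced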
Yet this inverse deduction has limitations and issues, so we move to Connectionism.
Connectionism
How does your brain learn? Hebb's rule (from 1949) has become the cornerstone of
connectionism, and is about neuron firing: “neurons that fire together wire together”.
This history in AI begins with the Perceptron (from the 1950s). This was an electromechanical learning machine, based on a model of neurons, which had some basic classification skills (e.g. is this a picture of a door or not?). In the late 1960s it was proven that its classification skills were too limited to be of much generality, and this approach to AI learning suffered a near-total setback (to the delight of the knowledge engineers of the time, who were competing for the same funding).
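Despite those limits, the perceptron's learning rule is simple enough to sketch in a few lines (my own illustration, not code from the book): a weighted sum of the inputs is thresholded, and the weights are nudged whenever a prediction is wrong.

def train_perceptron(samples, epochs=20, lr=0.1):
    # samples is a list of (inputs, target) pairs with targets 0 or 1
    n = len(samples[0][0])
    w = [0.0] * n
    b = 0.0
    for _ in range(epochs):
        for x, target in samples:
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            error = target - pred                      # nudge weights only on mistakes
            w = [wi + lr * error * xi for wi, xi in zip(w, x)]
            b += lr * error
    return w, b

# Learn the (linearly separable) AND function:
samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(samples)
print(w, b)   # weights and bias that separate (1,1) from the other inputs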
It was not until the 1980s that physics-inspired alternatives were introduced, such as the Boltzmann machine, which brought probabilities into the field along with techniques from statistical physics (i.e. thermodynamics) – so for a period the notion of “temperature” was important for a neural network learning system! Although the introduction of probabilities has lasted, it is not clear what has happened to “temperature” in machine learning. It later transpired that none of this physics theory was necessary to overcome the original Perceptron limitations. Nevertheless, more mathematical techniques emerged from this era, such as calculations in hyperspace, with associated weighted functions, convergence metrics, etc.
The main algorithm eventually identified from Connectionism is backpropagation, whose
refinements are at the core of today's learning systems.
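As a rough illustration of what backpropagation does (my own sketch, with arbitrary layer sizes and learning rate, not code from the book), here is a tiny network learning XOR – the very function a lone perceptron cannot handle – by pushing the output error backwards through one hidden layer:

import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)        # XOR targets

W1 = rng.normal(size=(2, 8)); b1 = np.zeros(8)          # input layer -> hidden layer
W2 = rng.normal(size=(8, 1)); b2 = np.zeros(1)          # hidden layer -> output
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for step in range(10000):
    h = sigmoid(X @ W1 + b1)                            # forward pass
    out = sigmoid(h @ W2 + b2)
    d_out = (out - y) * out * (1 - out)                 # backward pass: error at the output
    d_h = (d_out @ W2.T) * h * (1 - h)                  # error propagated to the hidden layer
    W2 -= 0.5 * h.T @ d_out;  b2 -= 0.5 * d_out.sum(axis=0)   # gradient-descent updates
    W1 -= 0.5 * X.T @ d_h;    b1 -= 0.5 * d_h.sum(axis=0)

print(out.round(2))   # should be close to [[0], [1], [1], [0]] once training has converged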
However, the author raises an intriguing question: Is everything we “know” actually learned
by our neurons? Has not evolution played a part too?
Evolutionary Learning
As an introduction to this chapter the author tells the following fantasy story:
Robotic Park is a massive robot factory surrounded by ten thousand square miles of jungle, urban and otherwise. Ringing that jungle is the tallest, thickest wall ever built, bristling with sentry posts, searchlights, and gun turrets. The wall has two purposes: to keep trespassers out and the park's inhabitants – millions of robots battling for survival and control of the factory – within. The winning robots get to spawn, their reproduction accomplished by programming the banks of 3D printers inside. Step-by-step, the robots become smarter, faster – and deadlier. Robotic Park is run by the US Army, and its purpose is to evolve the ultimate soldier.
Needless to say, this story raises a large number of issues and concerns. The author seems to be sanguine about the dangers here, although the modern “AI Ethics” movement (whose supporters have included the late Stephen Hawking and Elon Musk) is concerned about exactly this type of development. Towards the end of the book, the author actually suggests that working on “AI Ethics” could itself be a growth industry for displaced humans in the new era – robots and AI will have a lot of (human) ethics to learn!
The author uses this story to dramatically introduce the AI learning techniques inspired by Darwinian evolution: genetic algorithms and genetic programming. These techniques are overviewed and some issues identified. However, this reviewer is intrigued by the wider point being made here, and has an alternative way of expressing a related idea. The overall point the author is making can be summarised by this formula:
Learning == Agent Learning + Environment Learning + (Environment → Agent transfer)
In other words, when we ask “how does that small brain learn all this stuff?”, the answer is that the small brain has not had to do all the learning implicit in its actions. This viewpoint is also implied by the Chomsky view of language acquisition (Chomsky has been another critic of the traditional approach to machine learning; the author hopes that his wider “Master Algorithm” approach meets Chomsky's concerns).
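Returning to the genetic-algorithm idea itself, here is a toy sketch of my own (nothing like the Robotic Park fantasy, and not code from the book): a population of bitstrings evolves towards a fitness target through selection, crossover and mutation.

import random

random.seed(1)
GENOME_LEN, POP_SIZE, GENERATIONS, MUTATION_RATE = 20, 30, 60, 0.02

def fitness(genome):                      # the "environment" simply rewards 1-bits
    return sum(genome)

def select(pop):                          # tournament selection: the fitter of two random individuals breeds
    return max(random.sample(pop, 2), key=fitness)

def crossover(a, b):                      # single-point crossover of two parents
    point = random.randrange(1, GENOME_LEN)
    return a[:point] + b[point:]

def mutate(genome):                       # flip each bit with small probability
    return [bit ^ 1 if random.random() < MUTATION_RATE else bit for bit in genome]

population = [[random.randint(0, 1) for _ in range(GENOME_LEN)] for _ in range(POP_SIZE)]
for generation in range(GENERATIONS):
    population = [mutate(crossover(select(population), select(population))) for _ in range(POP_SIZE)]

best = max(population, key=fitness)
print(fitness(best), "out of", GENOME_LEN)   # usually at or near the maximum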
There are again several limitations to any master algorithm provided by what we could
also call the Darwinian algorithm. Chief amongst them is the fact that the Darwinian
algorithm (as currently understood) tends to find “suboptimal” solutions, not optimal
solutions – so we move on to Bayesian theory.
Bayesian Networks
The path to optimal learning begins with a formula that many people have heard of: Bayes' theorem. But here we'll see it in a whole new light and realize that it's vastly more powerful than you'd guess from its everyday uses.

At heart, Bayes' theorem is just a simple rule for updating your degree of belief in a hypothesis when you receive new evidence: if the evidence is consistent with the hypothesis, the probability of the hypothesis goes up; if not, it goes down.
Bayes' theorem uses conditional probabilities (P(A|B) is the probability that A happens given that B has happened or is assumed), and there is an associated inference system. Similar to deductive logic (e.g. the Prolog-based resolution systems sometimes discussed in this blog), the inference system allows the deduction of probabilistic conclusions and the management of probabilistic assumptions. This is all wrapped into a network structure amongst the assumptions, and apparently it has been discovered that there is an isomorphism between the probabilities and weights often used in Artificial Neural Net models and the probabilities used in Bayesian Networks. The success of this Bayesian approach arises directly from the quantity of data now available, making the probabilities and conditional probabilities very accurately determinable from millions upon millions of data points (pre-Internet, such probabilities would have been input by researchers by hand as guesses, resulting in unconvincing performance from such probabilistic systems).
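As a reminder of how the basic update works, here is a worked example with invented numbers of my own, in the spirit of the book's spam-filter discussion: how likely is an email to be spam, given that it contains the word "free"?

p_spam = 0.4                     # prior: fraction of all mail that is spam (invented)
p_free_given_spam = 0.5          # likelihood: "free" appears in half of spam (invented)
p_free_given_ham = 0.05          # "free" appears in 5% of legitimate mail (invented)

# Total probability of seeing the evidence, then Bayes' theorem for the posterior.
p_free = p_free_given_spam * p_spam + p_free_given_ham * (1 - p_spam)
p_spam_given_free = p_free_given_spam * p_spam / p_free

print(round(p_spam_given_free, 2))   # 0.87 -- the evidence pushes the belief up from the 0.4 prior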
This success has motivated much of the idea behind a complete Master Algorithm for learning, but it leaves open two remaining unification tasks. The harder has been the unification of logic and probability; the other has been to incorporate what to do when there is essentially no data, only analogies to existing data.
Analogical Reasoning
Methane and methanol have very similar chemical structures, but they are not identical and have some big differences: one is a gas at room temperature, the other a liquid. So if your system knows a lot about one, what can it deduce about the other? Much of science, and of business-services work, operates by analogy: no two customers are exactly the same, yet we manage to cope. So what are the principles involved?
Various classification ideas have been developed in this strand of AI, and a technique called the Support Vector Machine was apparently the most powerful machine learning technique around the turn of the century. Only in recent years has it been superseded by Artificial Neural Nets for the top learning slot (although Artificial Neural Nets don't do analogical reasoning as such).
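The simplest analogy-based learner is the nearest-neighbour method, which classifies a new case by finding the most similar case already seen. A toy sketch of my own follows; the invented data loosely echoes the methane/methanol example above.

import math

def nearest_neighbour(query, examples):
    """examples is a list of (feature_vector, label) pairs; return the label of the closest example."""
    def distance(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    _, label = min(examples, key=lambda example: distance(example[0], query))
    return label

# Toy data: (molecular weight, boiling point in C) -> phase at room temperature
examples = [((16, -161), "gas"),      # methane
            ((32, 65), "liquid"),     # methanol
            ((44, -42), "gas"),       # propane
            ((46, 78), "liquid")]     # ethanol
print(nearest_neighbour((30, 56), examples))   # "liquid" -- the query sits closest to methanol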
The Master Algorithm - Alchemy
Putting all this together has been the research project of Prof. Domingos, and he has developed a mathematical framework called Markov Logic Networks and a corresponding (open source) software system called Alchemy. The final chapters discuss the general steps required to produce a Master Algorithm from all the above ingredients. He does not describe the current form of Alchemy (or MLN) as the Master Algorithm, but uses it to suggest that this goal is feasible. One feature Alchemy seems to lack is the ability to explain itself fully, and there may still be optimisation issues.
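The core idea of a Markov Logic Network can be illustrated in a few lines (my own toy sketch with invented weights, not the actual Alchemy implementation): each logical rule carries a weight, every possible world is scored by the exponential of the weighted count of rule groundings it satisfies, and so a world that violates a rule becomes less probable rather than impossible.

import itertools, math

people = ["Anna", "Bob"]
atoms = [("Smokes", p) for p in people] + [("Cancer", p) for p in people]

# One weighted rule: Smokes(x) => Cancer(x), with an invented weight of 1.5.
def satisfied_groundings(world):
    return sum(1 for p in people if (not world[("Smokes", p)]) or world[("Cancer", p)])

# Enumerate every possible world (truth assignment to the ground atoms) and score it.
worlds = [dict(zip(atoms, values)) for values in itertools.product([False, True], repeat=len(atoms))]
score = lambda w: math.exp(1.5 * satisfied_groundings(w))

# P(Cancer(Anna) | Smokes(Anna)); the normalising constant cancels in the ratio.
numerator = sum(score(w) for w in worlds if w[("Smokes", "Anna")] and w[("Cancer", "Anna")])
denominator = sum(score(w) for w in worlds if w[("Smokes", "Anna")])
print(round(numerator / denominator, 2))   # about 0.82: the soft rule raises the belief well above 0.5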
There is also some discussion of the social aspects of all this, suggesting that users should form “trusted data unions” to hold the value of their data (a bit like banks holding their money), rather than the current practice of just handing everything over to the big corporations, who then benefit from any commercial consequences (and, as UK residents are aware, don't pay appropriate tax either).
Analysis: Agent Theory and Computational Mathematics
I shall end this review with a few remarks about this idea of a “Master Learning Algorithm” from the perspective of Agent theory (which is often discussed in this blog), and with some thoughts on Computational Mathematics (my own interest).
Agent theory takes a more holistic, biologically realistic and physically realist view of AI than one finds in stand-alone techniques like machine learning. Although machine learning techniques have become, and will remain for at least another decade, central to a business and social revolution, in aiming for a goal as high as a Master Algorithm which can learn everything – and do so optimally – one has to ask whether boundaries will eventually be reached due to insufficient attention to agent-theoretic issues. For example, how, and in what way, is a biological cell a learning system? Likewise, the mysteries of neurological biochemistry are not resolved in any agreed way, and even aspects of the Darwinian algorithm are not fully understood. So it is quite possible that nature has more to teach us about learning.
On the subject of Computational Mathematics and AI much can also be said. A famous critic of the entire AI project (at least as seen as a branch of Computer Science) has been Professor Penrose, with his books from 1989 and 1994. Since that time more results have appeared in the foundations of logic and mathematics which can be viewed as clarifying Penrose's position. For example, drawing on the results of an ongoing project known as Reverse Mathematics, this reviewer suspects that much of applied engineering mathematics is not actually computable in the Turing sense. However, it nearly is, so many of the effects are “subtle” – although the inability to predict weather systems accurately beyond (say) five days, the difficulties of turbulence theory, and various mathematical puzzles related to physics models suggest these “weakly non-computable” effects may yet be important.
The author himself recognises that an outstanding unsolved mathematical problem – the P versus NP question, concerning NP-complete problems – is significant to the subject of AI. I shall leave the last words to the author:
The purpose of AI systems is to solve NP-complete problems, which may take exponential time, but the solutions can always be checked efficiently. We should therefore welcome with open arms computers that are vastly more powerful than our brains, safe in the knowledge that our job is exponentially easier than theirs.
---
My own review was posted back in June 2016 and can be found here.