Sunday, March 10, 2019

GPT-2: what does it mean?



This sample, generated by OpenAI's new GPT-2 language model, got a lot of attention.
"System Prompt (human-written)

In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.

Model Completion (machine-written, 10 tries)

The scientist named the population, after their distinctive horn, Ovid’s Unicorn. These four-horned, silver-white unicorns were previously unknown to science.

Now, after almost two centuries, the mystery of what sparked this odd phenomenon is finally solved.

Dr. Jorge Pérez, an evolutionary biologist from the University of La Paz, and several companions, were exploring the Andes Mountains when they found a small valley, with no other animals or humans. Pérez noticed that the valley had what appeared to be a natural fountain, surrounded by two peaks of rock and silver snow.

Pérez and the others then ventured further into the valley. “By the time we reached the top of one peak, the water looked blue, with some crystals on top,” said Pérez.

Pérez and his friends were astonished to see the unicorn herd. These creatures could be seen from the air without having to move too much to see them – they were so close they could touch their horns.

While examining these bizarre creatures the scientists discovered that the creatures also spoke some fairly regular English. Pérez stated, “We can see, for example, that they have a common ‘language,’ something like a dialect or dialectic.”

Dr. Pérez believes that the unicorns may have originated in Argentina, where the animals were believed to be descendants of a lost race of people who lived there before the arrival of humans in those parts of South America.

While their origins are still unclear, some believe that perhaps the creatures were created when a human and a unicorn met each other in a time before human civilization. According to Pérez, “In South America, such incidents seem to be quite common.”

However, Pérez also pointed out that it is likely that the only way of knowing for sure if unicorns are indeed the descendants of a lost alien race is through DNA. “But they seem to be able to communicate in English quite well, which I believe is a sign of evolution, or at least a change in social organization,” said the scientist."
---

Here's what OpenAI, the developers of GPT-2, had to say.
"Our model, called GPT-2 (a successor to GPT), was trained simply to predict the next word in 40GB of Internet text. Due to our concerns about malicious applications of the technology, we are not releasing the trained model. As an experiment in responsible disclosure, we are instead releasing a much smaller model for researchers to experiment with, as well as a technical paper.

GPT-2 is a large transformer-based language model with 1.5 billion parameters, trained on a dataset of 8 million web pages. GPT-2 is trained with a simple objective: predict the next word, given all of the previous words within some text. The diversity of the dataset causes this simple goal to contain naturally occurring demonstrations of many tasks across diverse domains. GPT-2 is a direct scale-up of GPT, with more than 10X the parameters and trained on more than 10X the amount of data.

GPT-2 displays a broad set of capabilities, including the ability to generate conditional synthetic text samples of unprecedented quality, where we prime the model with an input and have it generate a lengthy continuation. In addition, GPT-2 outperforms other language models trained on specific domains (like Wikipedia, news, or books) without needing to use these domain-specific training datasets. On language tasks like question answering, reading comprehension, summarization, and translation, GPT-2 begins to learn these tasks from the raw text, using no task-specific training data. While scores on these downstream tasks are far from state-of-the-art, they suggest that the tasks can benefit from unsupervised techniques, given sufficient (unlabeled) data and compute.

Samples

GPT-2 generates synthetic text samples in response to the model being primed with an arbitrary input. The model is chameleon-like — it adapts to the style and content of the conditioning text. This allows the user to generate realistic and coherent continuations about a topic of their choosing, as seen by the following select samples.

[Then there follows the 'Unicorn' text you already saw above]."
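
To make "priming the model with an input and having it generate a lengthy continuation" a bit more concrete, here's a minimal sketch of the same exercise using the small released model. The tooling (the Hugging Face transformers library and PyTorch) is my own choice for illustration, not OpenAI's setup, and the details are assumptions rather than a recipe; the unicorn prompt above stands in for the conditioning text.

```python
# A minimal sketch of priming a language model and sampling a continuation.
# Tooling assumed for illustration: Hugging Face `transformers` + PyTorch,
# with the publicly released small GPT-2 checkpoint ("gpt2").
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = ("In a shocking finding, scientist discovered a herd of unicorns "
          "living in a remote, previously unexplored valley, in the Andes Mountains.")
input_ids = tokenizer.encode(prompt, return_tensors="pt")

# The model's one learned skill is next-word prediction: given the prompt,
# it assigns a probability to every possible next token.
with torch.no_grad():
    logits = model(input_ids).logits            # shape: (1, seq_len, vocab_size)
next_token_probs = torch.softmax(logits[0, -1], dim=-1)

# Sampling from that distribution token by token yields a "continuation".
output = model.generate(
    input_ids,
    max_length=100,
    do_sample=True,                 # sample rather than always take the argmax
    top_k=40,                       # top-k truncation, as reportedly used for the published samples
    temperature=1.0,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```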
---

Scott Alexander got pretty excited about GPT-2's capabilities and wrote a series of posts arguing that it was a significant step towards AGI (artificial general intelligence). This was based on his thesis that all of intelligence is predictive modelling, and that AGI is therefore, in some sense, a linear extrapolation of what GPT-2 is doing.

---

I'm not that excited about the fake news aspects. Deep-learning is tearing up the ground in the field of stochastic prediction, and we're still only in the foothills - to mix the metaphors. It's all quite unstoppable.

As long as we live in a human-dominated society, what you read from GPT-2 and its brethren will be what some human wants you to read. So the semantic content of the message will be parasitic on whatever the human wanted to communicate - lies or truth or bias or opinion or whatever.

So the AI is a prosthesis. Get over it.

---

I'm much more interested in the architectural questions.

The most perceptive assessments of deep-learning architectures address the critique that engineered systems adopt a tabula rasa methodology: that the systems start with zero prior knowledge and merely induce, as parsimoniously as possible, from the data sets offered to them.

To which there are two good responses.

Firstly, there are many different artificial neural net topologies. Convolutional neural nets, for example, have a structure similar to that of the biological visual cortex and are used (amongst other things) for image processing, such as scene and facial recognition. The pattern of local connectivity in the early processing stages of these nets implements the convolution operations which are known to be relevant to feature extraction.

Evolution didn't know that in advance. The earliest biological visual networks to be selected for ended up with this near-neighbour property genetically coded before they had registered even a single image. The same is true for artificial systems.
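
The same point can be put in code: when a convolutional layer is constructed, the near-neighbour wiring and weight sharing are already fixed, and training only adjusts the values inside that structure. A toy sketch, with PyTorch assumed purely for illustration:

```python
# The "near-neighbour" prior of a convolutional layer is fixed at construction
# time, before the network has seen a single image.
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3, padding=1)

# Each output unit is connected only to a 3x3 patch of its input neighbourhood,
# and the same 3x3 weights are reused at every spatial position (weight sharing).
# That locality-plus-sharing structure is the built-in prior; only the weight
# *values* are random until training.
print(conv.weight.shape)            # torch.Size([8, 1, 3, 3]) - 3x3 local filters

image = torch.randn(1, 1, 28, 28)   # one random, never-before-seen 28x28 "image"
features = conv(image)              # feature maps, shape (1, 8, 28, 28)
print(features.shape)
```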

Secondly, brain anatomy does not present as a uniform pudding bowl of grey porridge. The brain has discrete modules with complicated names. Why? I guess because they do different kinds of processing and are therefore topologically optimised for different kinds of operation. We don't know the details yet.

In AI we have the luxury of flexibility. With a new kind of problem-domain we can experiment with all kinds of different topologies, both before training and by observing weight assignment after training. Deep-learning is going to evolve towards a brain-like situation in which the data-processing invariants for all kinds of distinct tasks (such as effector-control, taste-analysis, 'emotion'-processing and consciousness-like functions) are each engineered with their own optimised neural net architecture - once we discover what that is.
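
Here is a sketch of what "experimenting with topology" can look like in practice: two candidate architectures for the same toy task, trained identically, with the resulting weight assignments inspected afterwards. Everything in it (the fake data, the layer sizes, the candidates themselves) is a made-up illustration, not a recommended design.

```python
# Compare two topologies on the same toy task, then inspect the trained weights.
import torch
import torch.nn as nn

candidates = {
    "mlp":  nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 64), nn.ReLU(),
                          nn.Linear(64, 10)),
    "conv": nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                          nn.AdaptiveAvgPool2d(4), nn.Flatten(),
                          nn.Linear(8 * 16, 10)),
}

x = torch.randn(32, 1, 28, 28)       # a fake batch of "images"
y = torch.randint(0, 10, (32,))      # fake class labels

for name, net in candidates.items():
    opt = torch.optim.SGD(net.parameters(), lr=0.1)
    for _ in range(20):              # a token amount of training
        opt.zero_grad()
        loss = nn.functional.cross_entropy(net(x), y)
        loss.backward()
        opt.step()
    # Observe the weight assignment after training, e.g. spread of the first layer.
    first_layer = next(p for p in net.parameters() if p.dim() > 1)
    print(name, "final loss:", round(loss.item(), 3),
          "first-layer weight std:", round(first_layer.std().item(), 3))
```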

---

To produce text which works as an intervention in human affairs, you have to be a social actor and have interests.

GPT-2 is not, in any important sense, an architectural precursor of such a scarily political AI.
