Tuesday, May 02, 2017

Real doctoring: more about perception than knowledge

Expert systems were big in the 1970s and 80s, the first golden age of AI. The paradigmatic system was MYCIN, which diagnosed bacteria causing severe infections such as bacteremia and meningitis, and recommended antibiotics. MYCIN "proposed an acceptable therapy in about 69% of cases, which was better than the performance of infectious disease experts who were judged using the same criteria."

MYCIN was never used.

"...  every blue box is a perceptual task"

In hindsight this held some lessons for us about the nature of real-world competence. MYCIN had world-class book-learning but no ability to interface conversationally with actual patients or examine test results. It couldn't effectively apply its assimilated knowledge, a skill that takes experts in every domain years of experience to acquire (as well as eyes, ears, touch and conversational ability).


A little gem I get every week in my mailbox: Denny Britz's "The Wild Week in AI".

This week's choice snippet is "The End of Human Doctors – Understanding Medicine", in which Luke Oakden-Rayner writes:
"Story time! Decades ago everyone thought being a medical expert was about knowledge, and computer scientists tried to automate the knowledge base of medicine. The earliest computer system that outperformed doctors was MYCIN, an AI system that was developed in Stanford in the mid 1970s.

"It was knowledge based, codifying the process of identifying the likely cause of infection and choosing a treatment for it in a set of 600 rules. ...

"With the benefit of 40 years of post-MYCIN experience, we can say a couple of things about it:
  • MYCIN worked really well. The system identified an acceptable therapy more often than the infectious diseases experts that they compared it with.

  • MYCIN was never used in clinical practice.
"The standard explanation for this failure to make a clinical impact was that MYCIN required time-consuming manual data input and that the IT infrastructure didn’t exist in the 70s to make use of it.

"But this argument isn’t very satisfying, because those challenges have long been solved. We can automatically mine electronic health records for phrases or keywords to populate our systems, and we certainly aren’t bottlenecked by computation or electronic infrastructure anymore. But we still don’t use MYCIN, or almost any of its successors.

"To understand my take on the clinical failure of MYCIN, imagine going to the doctor for a cough. Your doctor asks you some questions, listens to your chest, takes your temperature and your pulse rate. They send off a sputum sample or blood test. They might even get a chest x-ray. And after all that, how long does it take them to decide on an antibiotic?

"Seconds, probably. Most doctors would barely register the decision as a conscious choice.

"The fact is, the knowledge and decision making part of medicine becomes automatic for doctors. There is a long history of cognitive science research in medicine ... and it shows that senior doctors don’t really think about many of their decisions. They engage in an experience based form of pattern matching. In fact, senior doctors often perform worse if you slow them down and force them to think. ...

"The key point here is that the part of the medical process that MYCIN automated had a negligible cost (measured in time spent). Saving time on a process that takes between seconds and minutes just isn’t worth the effort of overhauling our current systems.

"But all of this does raise an interesting question: what are diagnostic doctors spending all their time on, if not “thinking”?

"Perception. ...

"The revolution that is happening in artificial intelligence can be understood in one idea; deep learning is really good at human-like perception. In fact, deep learning systems now perform around human level in a wide range of perceptual tasks, like visual object recognition and voice transcription.

"And because perception has such a big role in expertise we are achieving superhuman performance in “expert tasks”, like driving cars or playing complex games like Go. It turns out that these tasks were all bottle-necked by perception, not decision making.

"Machine learning guru Andrew Ng has a rule of thumb – “If a typical human can do it in 1 second, so can deep learning”. My experience in medical applications suggests a slight restatement of this rule:
“If an expert can perform a task in less than a few seconds, it can be automated.”
"... Your local family doctor still spends a lot of time looking, listening and feeling. A surgical specialist does too, and relies a lot on radiology and pathology tests. A psychiatrist learns a lot from sight and sound.

So we might be looking at a technology that can do parts of the work of most types of doctors, even if it can’t replace all of the jobs."
I've excerpted quite a bit here - do read the whole article.
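The "600 rules" approach Oakden-Rayner describes can be sketched as a set of if-then production rules combined with certainty factors. Note the rules and certainty values below are invented for illustration; real MYCIN had roughly 600 rules and a considerably richer certainty-factor calculus, but the combination formula shown is the one it used:

```python
# Minimal sketch of a MYCIN-style rule-based diagnostic system.
# Rule contents and certainty factors (cf) here are invented examples,
# not MYCIN's actual knowledge base.

RULES = [
    # (required findings, concluded organism, certainty factor of the rule)
    ({"gram_negative", "rod_shaped", "anaerobic"}, "bacteroides", 0.6),
    ({"gram_positive", "coccus", "clusters"}, "staphylococcus", 0.7),
    ({"gram_positive", "coccus", "chains"}, "streptococcus", 0.7),
]

def diagnose(findings):
    """Fire every rule whose conditions are all present in the findings,
    combining certainty factors for the same conclusion the way MYCIN did:
    cf_combined = cf1 + cf2 * (1 - cf1)."""
    conclusions = {}
    for conditions, organism, cf in RULES:
        if conditions <= findings:  # all required findings are present
            prior = conclusions.get(organism, 0.0)
            conclusions[organism] = prior + cf * (1 - prior)
    return conclusions

print(diagnose({"gram_positive", "coccus", "chains", "anaerobic"}))
# → {'streptococcus': 0.7}
```

The point of the sketch is how little "thinking" the rule-firing step involves: the expensive part is producing the findings set in the first place, which is exactly the perceptual work the article argues doctors actually spend their time on.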

Visiting the family doctor must be one of the least productive things anyone ever does - and I mean this in the economist's sense of productivity. The labour-intensive, manual nature of the consultation means that in the UK patients are allocated a mere ten minutes per session and told they must not mention more than one condition.
"I've got this really painful sore throat doctor, and perhaps I could schedule another appointment sometime about this odd lump?"
I imagine the near-future surgery as a quiet library of cubicles where patients interact in privacy with smart doctor-apps. A nurse helps with recommended tests (interpreted by deep-learning systems in the cloud) and a doctor or two handles referrals.

The result? Perhaps a twenty-fold increase in productivity.

With apps like Babylon* offering pre-screening, healthcare productivity would be all the greater.

So what's holding things back? The interaction model (chatbot-style) is still tedious and leaden. Without access to the patient's medical history and personal data, the dialogue becomes lengthy and repetitive.

Any root-and-branch transformation of healthcare would be capital intensive (money!!) and couldn't be attempted until pilots showed that the new process model was effective, user-friendly, safe and cost-effective. Perhaps doctors, despite their many frequently-stated grievances, would not be totally supportive. Nevertheless I see glimmerings of change.

Ten years minimum - this is healthcare we're talking about.


* I tried Babylon as well as Your.MD, one of the leading medical apps, and was not impressed: typing a query into Google still outperforms the specialized apps.


  1. One interesting phrase from the quoted article is:
    "deep learning is automating the art rather than the science"

    This captures the distinction between MYCIN and modern AI (deep learning), and also the possibility that there may not be any "science" behind the kind of diagnosis deep learning produces.

    1. It's another reminder that implicit, tacit knowledge lies in the performance domain. There's science to be done here, but the deep learning system isn't doing it.

      Might as well ask a bird how it flies ... .


Comments are moderated. Keep it polite and no gratuitous links to your business website - we're not a billboard here.