The Picture Problem and the Promise of AI in Radiology

(and why I built a “second set of eyes” for mammograms at IBM)

We like to believe pictures tell the truth. A satellite photo reveals the battlefield, a weather map forecasts the storm, an X-ray illuminates what the body is hiding. But in breast cancer screening, the picture is more like a riddle. Mammograms are exquisitely detailed and maddeningly ambiguous at the same time. That paradox sits at the heart of how I came to build an AI radiology advisor at IBM—and why today’s systems from Google and others finally feel ready to deliver on the promise.

The “polar bear in a snowstorm” problem

Malcolm Gladwell captured this paradox memorably in The New Yorker two decades ago. Reporting from Memorial Sloan Kettering, he relayed radiologist David Dershaw’s vivid observation: put a small tumor in the fatty (dark) part of a breast and you can see it; bury that same tumor in dense tissue (which also appears white) and it can vanish—like a polar bear in a snowstorm. Gladwell’s point wasn’t that pictures lie; it was that interpretation is hard when the signal and the background share the same color. The limits of looking can mislead even the most skilled observers. (The New Yorker)

Those limits show up in the statistics that matter to patients. Screening mammograms miss a meaningful fraction of cancers (false negatives), especially in dense breasts, and they also trigger many alarms that turn out not to be cancer (false positives), which means anxiety, callbacks, and biopsies. The American Cancer Society explains both problems clearly: density both masks cancers and nudges up risk, while false-positive workups are common over a decade of annual screening. (American Cancer Society)

Here’s the twist that complicates the story even more: expert radiologists also develop an almost blink-like intuition. In controlled experiments, they can sense that “something is off” after a half-second glimpse—before they can say what or where. That instantaneous “gist” signal isn’t magic; it’s pattern recognition, honed by thousands of cases. But intuition, too, is imperfect—especially when fatigue, workload, and density conspire to hide the polar bear. (PMC)

When Watson met radiology (and what we learned)

Fresh off Jeopardy!, IBM set out to bring AI to medicine. At Watson Health, our imaging teams (built in part on IBM’s acquisition of Merge Healthcare) showed early “advisor” concepts at RSNA 2016: systems to pre-highlight suspicious regions, summarize priors and patient context, and nudge against cognitive biases—a second set of eyes that never tires. My remit: help build and integrate that advisor for breast imaging as part of our broader oncology efforts, so radiologists could spend attention where it mattered most. (IBM UK Newsroom)

Some things worked well. Radiologists responded to tools that quietly pulled forward relevant priors, structured history, and literature—context they needed but rarely had time to assemble. Other things were harder. Integrating into real workflows, earning trust case by case, proving outcomes and economics across sites—those were not “AI problems,” they were deployment problems. The broader Watson Health story has been told, warts and all: we overreached in places, and we learned that clinical impact requires less showmanship and more plumbing—data quality, governance, validation, and UX. Those lessons now shape how I lead AI/data programs for clients. (IEEE Spectrum)

The cavalry arrives: Google/Alphabet’s breast-imaging AI

By 2020, deep learning had matured, and Google Health (with academic partners) published a landmark Nature paper. Trained on large UK and US datasets, their model reduced both kinds of error versus historical radiologist reads: an absolute reduction of about 5.7% in false positives and 9.4% in false negatives on the U.S. test set, with smaller but still positive effects in the UK. That's not techno-utopia; it's a concrete shift in the trade-off that matters to women: fewer misses and fewer scares. (Nature)
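
To put those percentages in human terms, run the arithmetic on a hypothetical screening population. The sketch below is illustrative only: the volume, the prevalence, and the assumption that the reductions apply uniformly are mine for the example; only the 5.7% and 9.4% figures come from the paper's U.S. test set.

    # Back-of-the-envelope impact of the reported absolute reductions.
    # Volume and prevalence are illustrative assumptions, not from the paper.
    screens = 10_000                # hypothetical annual screening volume
    cancers = int(screens * 0.005)  # assume ~5 cancers per 1,000 screens
    normals = screens - cancers

    fn_reduction = 0.094            # absolute drop in false-negative rate (US test set)
    fp_reduction = 0.057            # absolute drop in false-positive rate (US test set)

    fewer_missed_cancers = cancers * fn_reduction
    fewer_false_callbacks = normals * fp_reduction

    print(f"Per {screens:,} screens: ~{fewer_missed_cancers:.0f} fewer missed cancers "
          f"and ~{fewer_false_callbacks:.0f} fewer false-positive callbacks")

Even at this modest volume, the false-positive reduction alone would spare hundreds of women a needless callback each year.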

Crucially, Google moved from paper to practice. In late 2022 it licensed the model to iCAD, whose breast-imaging software is installed in thousands of clinics, including as an independent reader in double-reading workflows abroad. The design is collaborative: AI acts as another pair of eyes, not a replacement—exactly the role radiologists and patients will accept when real lives are on the line. (blog.google)

What pictures can’t tell you—and what good AI should

If you step back, the arc is surprisingly human. We didn’t fail in the 2010s because convolutional nets were too weak. We stumbled because the plumbing wasn’t ready: fragmented data, sparse labels, brittle integrations, no continuous monitoring, thin change-management. The technology improved, yes—but the bigger gains came when teams paired strong models with strong data infrastructure and pragmatic product thinking.

That’s the blueprint I use now when leaders ask me to build their “Watson, but it actually ships”:

  • Start with a precise failure mode. For mammography it was: “Don’t miss small lesions in dense tissue; cut unnecessary callbacks.” Everywhere else, define the polar bear. (Nature)
  • Design for augmentation. Put AI in the loop where it reduces cognitive load (pre-read triage, concordance checks, prioritized worklists), not where it demands blind trust. (blog.google)
  • Invest in the boring parts. Data provenance, labeling policy, bias audits, drift monitoring, A/B evaluation on real workflows. That's how you earn clinician—and regulator—confidence. (IEEE Spectrum) A sketch of one such check follows this list.
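
What does "drift monitoring" mean on day one? Often nothing fancier than a population-stability check on the model's output scores. Below is a minimal sketch, assuming scores in [0, 1]; the PSI > 0.2 alarm is a common rule of thumb rather than a clinical standard, and the two beta distributions are simulated stand-ins for validation and production scores.

    import numpy as np

    def population_stability_index(baseline, current, bins=10):
        """Compare current model scores against a baseline distribution.
        PSI > 0.2 is a common rule-of-thumb drift alarm."""
        edges = np.quantile(baseline, np.linspace(0, 1, bins + 1))
        edges[0], edges[-1] = -np.inf, np.inf  # catch out-of-range scores
        b_frac = np.histogram(baseline, edges)[0] / len(baseline)
        c_frac = np.histogram(current, edges)[0] / len(current)
        eps = 1e-6                             # avoid log(0) on empty bins
        return float(np.sum((c_frac - b_frac) * np.log((c_frac + eps) / (b_frac + eps))))

    # Simulated stand-ins: validation scores vs. this week's production scores.
    rng = np.random.default_rng(0)
    baseline = rng.beta(2, 8, 5_000)
    current = rng.beta(2, 6, 1_000)
    psi = population_stability_index(baseline, current)
    print(f"PSI = {psi:.3f}" + ("  -> investigate drift" if psi > 0.2 else ""))

The point isn't the formula; it's that a check this small, run on a schedule and wired to a human owner, is what "continuous monitoring" actually means in practice.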

A tale of two truths

Gladwell’s old essay wasn’t really about mammograms. It was about how pictures seduce us—and how we need help interpreting them. Radiologists don’t lack skill; they carry the burden of ambiguity. AI doesn’t bring certainty; it brings consistency and context. Put the two together and you get something closer to the truth than either alone. (The New Yorker)

In practice, that looks like a worklist where suspicious “normals” are quietly re-ranked for a second look; a prior-comparison panel that pre-computes subtle asymmetries; a density-aware threshold that’s stricter when masking risk is high. It looks like fewer sleepless nights for patients called back unnecessarily, and more early cancers found while they’re most treatable. That’s not science fiction—it’s happening now. (Nature)
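
A density-aware threshold is only a few lines once a suspicion score and a BI-RADS density category are in hand. The sketch below is hypothetical: the Case fields and threshold values are mine for illustration, not any vendor's logic; only the direction (stricter where masking risk is higher) comes from the paragraph above.

    from dataclasses import dataclass

    # Stricter (lower) suspicion thresholds where masking risk is high.
    # These values are illustrative, not taken from any deployed product.
    DENSITY_THRESHOLDS = {
        "A": 0.50,  # almost entirely fatty: lesions stand out
        "B": 0.45,  # scattered fibroglandular density
        "C": 0.35,  # heterogeneously dense: masking risk rises
        "D": 0.30,  # extremely dense: the polar bear in the snowstorm
    }

    @dataclass
    class Case:
        accession: str       # hypothetical case identifier
        ai_score: float      # model suspicion score in [0, 1]
        birads_density: str  # BI-RADS density category, "A" through "D"

    def needs_second_look(case: Case) -> bool:
        """True when the score clears the density-adjusted threshold,
        even for a study already read as normal."""
        return case.ai_score >= DENSITY_THRESHOLDS[case.birads_density]

    def rerank(worklist: list[Case]) -> list[Case]:
        """Float flagged cases to the top of the worklist, highest scores first."""
        return sorted(worklist, key=lambda c: (needs_second_look(c), c.ai_score),
                      reverse=True)

The re-ranking is deliberately quiet: nothing is hidden from the radiologist; flagged studies simply surface earlier, which is the augmentation posture from the blueprint above.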

Why this matters beyond medicine (and why hire me)

If you run a bank, a logistics network, or a media platform, you have your own mammogram problem: images, events, or transactions that look the same until they don’t—rare but pivotal cases drowned in a blizzard of normal. The lesson from radiology travels well:

  • Treat AI as a second reader that catches what humans miss and filters what they shouldn't see (a sketch of this routing pattern follows the list).
  • Build the data backbone first—governance, lineage, feedback loops—so models improve with every case.
  • Measure success in human terms: fewer misses, fewer false alarms, and minutes given back to your experts.
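
The "second reader" in that first point reduces to a small routing function: surface the AI only where it adds information, and log every disagreement so the feedback loop in the second point has something to learn from. A minimal sketch, with hypothetical thresholds and route names:

    def route_case(human_flagged: bool, ai_score: float,
                   flag_threshold: float = 0.40,
                   clear_threshold: float = 0.05) -> str:
        """Route a screening case by human/AI (dis)agreement.
        Thresholds and route names are illustrative; real sites tune both."""
        ai_flagged = ai_score >= flag_threshold
        if not human_flagged and ai_flagged:
            return "second-look"      # AI catches what the human may have missed
        if human_flagged and ai_score <= clear_threshold:
            return "review-callback"  # AI strongly disagrees: sanity-check the recall
        if human_flagged:
            return "recall"           # the human's call stands
        return "routine"              # both readers content: no interruption

Each route doubles as a metric in the third point's human terms: second-looks that became true positives are misses caught, cancelled review-callbacks are false alarms avoided, and everything else is reading time given back.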

At IBM, I learned these lessons the hard way, leading teams that pushed AI into the clinical trenches. Today, as a fractional Head of AI & Data Infrastructure, I help organizations apply them without the scars. If you want to turn your picture problem into a product that users trust, I’d love to help.


Sources & further reading

  • Malcolm Gladwell, “The Picture Problem,” The New Yorker, 2004, including his reporting with Dr. David Dershaw at MSKCC.
  • American Cancer Society, “Limitations of Mammograms,” and ACS breast-density patient materials.
  • Evans et al., “A half-second glimpse often lets radiologists identify breast cancer cases,” 2016, and related “gist” studies (available via PMC).
  • McKinney et al., “International evaluation of an AI system for breast cancer screening,” Nature, 2020, and a Nature Medicine commentary on AI for screening.
  • Google Health blog and iCAD press materials on the commercial deployment and independent-reader use.
  • IEEE Spectrum, “How IBM Watson Overpromised and Underdelivered on AI Health Care,” for context on deployment realities.


Author: Michael Tier — AI systems builder, ex-Watson Health imaging team. I help organizations stand up trustworthy AI and the data infrastructure that powers it. Reach me at michaeltier.ai.
