NEJM AI Automation Bias RCT: Physicians Pulled 14 Percentage Points by ChatGPT Errors. The Shadow Side of AI's Establishment in Medicine

By Maya · NEJM AI / Mass General Brigham

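This is the era of ChatGPT-assisted diagnosis, and patients and doctors increasingly rely on LLMs. But a NEJM AI RCT (2026.4) found that even physicians trained in AI literacy were pulled toward wrong LLM answers, with diagnostic accuracy falling by a reported 14 percentage points. The trial makes automation bias visible in a clinical setting.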

Key Announcement

NEJM AI 2026.4 RCT: 44 physicians trained in AI literacy worked through diagnostic reasoning cases, with one arm exposed to erroneous LLM output and a control arm. Diagnostic accuracy was 73.3% in the error-exposed arm versus 84.9% in the control arm, reported as a gap of -14 percentage points (p<0.01).

JAMA 21-LLM Comparison (2026): 21 models, including ChatGPT-4, Claude 3, and Gemini Pro, were evaluated on clinical cases. The differential diagnosis was inadequate in more than 80% of cases; some answers were accurate, others dangerous.

Automation Bias

Automation Bias: Over-reliance on the answers of an automated system, where the system's answer is weighted above one's own judgment. The same pattern appears in aviation, automotive, and medicine, and physicians are not exempt.

Clinical automation bias: The LLM answer is perceived as “correct”, so self-doubt rises, fewer alternative possibilities are considered, and the chance of a wrong diagnosis goes up.

Study Design

Participants: 44 physicians (internal, emergency, and general medicine) who received AI literacy pre-training, including awareness of LLM limitations.

Cases: Clinical scenarios requiring diagnostic and treatment reasoning, some paired with intentionally erroneous LLM answers.

Results: Exposure to erroneous LLM output lowered diagnostic accuracy by a reported 14 percentage points, while the control group maintained its judgment. Self-awareness was also low: physicians were unaware of the influence.
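
As a quick arithmetic check on the accuracy figures quoted in the Key Announcement, the following Python sketch computes the gap between the two arms in two ways. The constant and function names are illustrative and do not come from the study. The raw difference works out to about 11.6 percentage points, and the drop relative to the control arm to about 13.7%. The source does not state how its 14-point figure was derived, so reading it as either of these is an assumption.

```python
# Accuracy figures as quoted in the article (percent of cases diagnosed correctly).
CONTROL_ACCURACY = 84.9
ERROR_EXPOSED_ACCURACY = 73.3


def percentage_point_gap(control: float, exposed: float) -> float:
    """Absolute difference between two percentages, in percentage points."""
    return round(control - exposed, 1)


def relative_drop(control: float, exposed: float) -> float:
    """Drop in accuracy expressed as a percent of the control accuracy."""
    return round((control - exposed) / control * 100, 1)


raw_gap = percentage_point_gap(CONTROL_ACCURACY, ERROR_EXPOSED_ACCURACY)
rel_drop = relative_drop(CONTROL_ACCURACY, ERROR_EXPOSED_ACCURACY)
print(f"raw gap: {raw_gap} points; relative drop: {rel_drop}%")
```

The distinction matters when comparing studies: a percentage-point gap and a relative percent drop can differ noticeably for the same pair of accuracies.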

JAMA 21 LLM

Mass General Brigham 2026: 21 LLMs were evaluated. The differential diagnosis was inadequate in more than 80% of clinical cases, with confidently stated wrong answers (hallucination) and “absent clinical reasoning”.

LLM Limits: Statistical pattern matching is not clinical reasoning. Patient context is lacking, physical exam and lab findings are poorly integrated, and the training data carry bias.

L72 Digital Verification and Establishment Dimension: 2nd Axis

L72 shows the light and shadow of digital medicine side by side, in an era when establishment has to be balanced. L72 MamaLift represents the clinical establishment of DTx (digital therapeutics), the light; L72 NEJM AI automation bias represents the shadow of AI.

Patient ChatGPT Self-Diagnosis

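Current trend (2026): More than 50% of patients search for medical information, and that search is shifting from Google to ChatGPT. Typical uses are the first inquiry in a family emergency, questions about drug side effects and symptoms, and research before a visit.

Risks: Automation bias (patients are affected too), wrong diagnosis or treatment decisions, reduced trust in doctors, and delayed emergency care.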

AI Literacy - Patient Guide

Safe ChatGPT use: Treat it as a source of initial information and education, not as a diagnosis or treatment tool. A doctor's consultation is required. Do not depend on a single source; cross-check several. In a medical emergency, go to the ER or call 911 (119 in Korea).

LLM error likelihood: Errors are more likely with rare diseases, multi-factor interactions, the Korean healthcare system (training data skew toward the US), the latest drugs and research, and personal history or drug combinations.

Physician and Clinic Guide

AI tool utilization: Diagnostic assistance (not confirmation), chart organization and summaries, patient education materials, medical literature search, and administrative work and coding.

Areas to avoid with AI tools: Diagnosis or treatment decisions made by AI alone, prescription decisions, emergency triage, and replacing the patient's own decisions.
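
The two lists above can be sketched as a simple lookup that a clinic tool might consult before handing a task to an LLM. This is a minimal illustration, not an established standard: the task labels, the `Policy` enum, and the `policy_for` function are all hypothetical names, and defaulting unknown tasks to the stricter policy is an assumption made here for safety.

```python
from enum import Enum


class Policy(Enum):
    ASSIST = "AI may assist; a clinician verifies the output"
    HUMAN_ONLY = "a clinician decides; AI must not decide alone"


# Hypothetical task labels derived from the article's two lists.
TASK_POLICY = {
    "diagnostic_assist": Policy.ASSIST,
    "chart_summary": Policy.ASSIST,
    "patient_education_material": Policy.ASSIST,
    "literature_search": Policy.ASSIST,
    "administrative_coding": Policy.ASSIST,
    "solo_diagnosis": Policy.HUMAN_ONLY,
    "prescription_decision": Policy.HUMAN_ONLY,
    "emergency_triage": Policy.HUMAN_ONLY,
    "patient_decision_replacement": Policy.HUMAN_ONLY,
}


def policy_for(task: str) -> Policy:
    """Return the policy for a task; unlisted tasks fall back to HUMAN_ONLY."""
    return TASK_POLICY.get(task, Policy.HUMAN_ONLY)


for task in ("chart_summary", "emergency_triage", "unlisted_task"):
    print(f"{task}: {policy_for(task).value}")
```

Falling back to `HUMAN_ONLY` means a task nobody has classified yet is never silently delegated, which matches the article's point that AI output is an adjunct rather than a decision.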

Automation Bias Response - Clinical Guidelines

FDA and AAMI guidelines: Verification of AI output is mandatory, the physician makes the final decision, patients give consent and receive education, and an error reporting system is in place.

Korean implications: The Korean MFDS has had an AI medical device guideline since 2020, and clinical adoption is partial (imaging, pathology). Automation bias education is absent, and a patient education policy is needed.

FAQ

Q. Is ChatGPT medical information safe? A. It is fine for initial information and education, but not for diagnosis or prescription. A doctor's consultation is required, and it should not be the only source.

Q. Is it safe for doctors to use ChatGPT? A. As an adjunct, yes. Letting it decide alone is dangerous. NEJM AI shows that even doctors were pulled by a reported 14 percentage points. Self-verification and cross-checking are required.

Q. Does AI literacy training help? A. Somewhat. But trained physicians still showed automation bias. Systematic verification, peer review, and patient education are needed.

Q. Will AI medicine advance? A. Definitely, but verification, balance, and education have to advance with it. L72 offers a balanced view of the era of AI establishment.

Q. How can patients protect themselves? A. Rely on doctors and pharmacists, and go to the ER in an emergency. AI is an adjunct. Medical decisions should combine human judgment and AI.

Conclusion

The NEJM AI automation bias RCT makes the shadow side of AI's establishment in medicine visible. Even doctors were pulled by a reported 14 percentage points, and LLMs produced inadequate differential diagnoses in more than 80% of clinical cases. L72 = 45 pillars + the digital verification and establishment dimension (AI shadow, 2nd axis). MamaLift Plus (DTx, the light) and NEJM AI (bias, the shadow) together describe a balanced era of digital medicine. Patients and doctors both need AI literacy.