There's a category of clinical AI product that markets itself as an ambient AI scribe but is, technically speaking, voice-to-text with formatting templates on top. The distinction matters because the failure modes are completely different.

What voice-to-text does

Speech is transcribed to text. The text is split into sections (subjective, objective, assessment, plan) using either keyword matching or simple ML classification. Templates auto-fill where keywords trigger. The output looks like a SOAP note.
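
As a rough illustration of the keyword-matching approach described above, the sketch below buckets utterances into SOAP sections. The keyword lists and the `bucket_utterances` name are assumptions made up for this example, not any vendor's actual rules.

```python
# Hypothetical keyword lists, chosen only for illustration.
SECTION_KEYWORDS = {
    "subjective": ["feel", "pain", "since", "complain"],
    "objective": ["blood pressure", "temperature", "on examination"],
    "assessment": ["likely", "consistent with", "diagnosis"],
    "plan": ["prescribe", "follow up", "refer"],
}


def bucket_utterances(utterances):
    """Assign each utterance to the first section whose keyword it contains."""
    note = {section: [] for section in SECTION_KEYWORDS}
    note["unclassified"] = []
    for utterance in utterances:
        lowered = utterance.lower()
        for section, keywords in SECTION_KEYWORDS.items():
            if any(keyword in lowered for keyword in keywords):
                note[section].append(utterance)
                break
        else:
            # No keyword fired, so the utterance has no obvious home.
            note["unclassified"].append(utterance)
    return note


note = bucket_utterances([
    "I've had chest pain since Monday.",
    "Blood pressure is 130 over 85.",
    "Actually, it started the week before.",
])
```

In this run the patient's correction ('Actually, it started the week before.') triggers no keyword, so it lands in `unclassified` instead of amending the onset already filed under subjective. Nothing in the pipeline links the two statements.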

This works well when the consult is short, the patient is articulate, and the clinician follows a standard interview structure. It breaks the moment the conversation goes sideways, which as a rough estimate happens in about 30% of consults on a typical day.

What an actual AI scribe does

An LLM reads the full transcript and interprets its clinical meaning. It separates the patient's stated symptom from the clinician's restatement of it. It recognises that the 'headache for 3 days' mentioned in passing at minute 12 is relevant to the chief complaint introduced at minute 2. It can surface a differential that the clinician hasn't explicitly stated but that the symptoms support.

The LLM also handles ambiguity. When the patient says 'a couple of weeks', the model captures that as approximately 14 days rather than copying the phrase verbatim. When the patient says they're 'fine' but earlier described stress at home, the model flags the disconnect rather than recording 'patient is fine'.
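
To make the target output concrete, here is a minimal sketch of the structured value a scribe might record for a vague duration. It uses a fixed pattern purely to show the shape of the result; an LLM would infer this from context rather than from a lookup. The `Duration` type, the `normalise_duration` function, and the mapping of 'a couple of' to 2 are all assumptions for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional

UNIT_DAYS = {"day": 1, "week": 7}
VAGUE_QUANTITIES = {"a couple of": 2}  # illustrative assumption


@dataclass
class Duration:
    verbatim: str      # what the patient actually said
    approx_days: int   # normalised value for the note
    approximate: bool  # True when the spoken quantity was vague


def normalise_duration(phrase: str) -> Optional[Duration]:
    """Map a spoken duration to days, keeping the original wording."""
    text = phrase.lower().strip()
    match = re.fullmatch(r"(a couple of|\d+)\s+(day|week)s?", text)
    if match is None:
        return None  # leave unrecognised phrases for a human to resolve
    quantity, unit = match.groups()
    vague = quantity in VAGUE_QUANTITIES
    count = VAGUE_QUANTITIES[quantity] if vague else int(quantity)
    return Duration(phrase, count * UNIT_DAYS[unit], vague)


couple = normalise_duration("a couple of weeks")
exact = normalise_duration("3 days")
unknown = normalise_duration("a while")
```

Keeping the verbatim phrase next to the normalised value, with an explicit `approximate` flag, lets the clinician see that 14 days is an interpretation and not something the patient stated precisely.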

How to tell which one you're evaluating

Run a complex consult through it: multiple complaints, fragmented chronology, a patient backtracking on what they said earlier. A voice-to-text tool will produce a note that reads as a flat list of utterances grouped by section. An actual AI scribe will produce a structured clinical narrative that organises the consult by clinical relevance, not chronology.
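
One crude way to put a number on 'flat lists of utterances' is to measure how much of the note is copied from the transcript. The sketch below is a hypothetical heuristic, not an established evaluation method: `verbatim_ratio` and its normalisation rules are assumptions, and a low score does not by itself show that a note is clinically sound.

```python
import re


def _normalise(sentence):
    """Lowercase and strip punctuation so trivial differences don't matter."""
    return re.sub(r"[^a-z0-9 ]", "", sentence.lower()).strip()


def verbatim_ratio(transcript_utterances, note_sentences):
    """Fraction of note sentences that are copies of transcript utterances."""
    if not note_sentences:
        return 0.0
    spoken = {_normalise(u) for u in transcript_utterances}
    copied = sum(1 for s in note_sentences if _normalise(s) in spoken)
    return copied / len(note_sentences)


transcript = [
    "My head hurts.",
    "It started three days ago.",
    "Actually, maybe a week.",
]
flat_note = ["My head hurts.", "It started three days ago."]
narrative_note = [
    "Headache for approximately one week; patient revised an initial estimate of three days.",
]

flat_score = verbatim_ratio(transcript, flat_note)
narrative_score = verbatim_ratio(transcript, narrative_note)
```

Here the flat note scores 1.0 and the narrative note scores 0.0. Real notes will fall in between, so treat the ratio as a prompt to read the note closely, not as a verdict.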

Why this matters for the clinic

Voice-to-text plateaus quickly. It saves you typing time but it doesn't save you thinking time, because you still have to organise the resulting text mentally. An actual AI scribe saves both — and the gap compounds across a day.

An LLM-native scribe, not transcription with formatting.