The CMI Intoxilyzer 4000.

Breathalyzer source code cases provide a precedent for LLM-generated evidence

If you’re on trial and evidence against you has been assembled or generated by AI, how can the provenance of that evidence ever be proven?

One of the big, latent AI legal triggers I’ve been waiting for is a defendant requesting details on how AI-generated evidence (facial recognition systems, shot-detector systems, LLM-generated data, etc.) was created. Because if the court supports the request, there’s no way to deliver the goods.

There’s precedent here: the breathalyzer source code cases of the late 2000s. In several states, defendants requested access to the source code of CMI Intoxilyzers, devices that were rumored to produce inconsistent results. The cases dragged on for years. Charles Short, writing in the Florida Law Review, summed up the absurdity of the situation nicely:

The manufacturer has assured the State of Florida that the Intoxilyzers work, and law enforcement has determined, to its satisfaction, that the machines produce accurate results. However, defense counsel is unable to independently verify any of these propositions. Thus, the outcome is truly circular: the machine is reliable because it produces results; the results are right because the machine is reliable.

But pressure kept building, in multiple states, until judges started tossing DUI convictions due to CMI’s refusal to produce source code. CMI eventually agreed to some audits (I found this one conducted by Georgia Tech). Still, breathalyzer admissibility remains a fraught issue, with state police departments taking different approaches to support convictions and defendants frequently challenging the reliability of the devices.

So what happens if someone makes a similar request to an AI system today?

LLM explainability remains an unsolved problem. We cannot detail precisely how an LLM arrived at an output.

Despite this, it’s only a matter of time before LLM-generated evidence is used in court. Here’s an article from The Associated Press about police departments using LLMs to write reports:

Normally, the Oklahoma City police sergeant would grab his laptop and spend another 30 to 45 minutes writing up a report about the search. But this time he had artificial intelligence write the first draft.

Pulling from all the sounds and radio chatter picked up by the microphone attached to Gilbert’s body camera, the AI tool churned out a report in eight seconds.

“It was a better report than I could have ever written, and it was 100% accurate. It flowed better,” Gilbert said. It even documented a fact he didn’t remember hearing — another officer’s mention of the color of the car the suspects ran from.

Oklahoma City’s police department is one of a handful to experiment with AI chatbots to produce the first drafts of incident reports. Police officers who’ve tried it are enthused about the time-saving technology, while some prosecutors, police watchdogs and legal scholars have concerns about how it could alter a fundamental document in the criminal justice system that plays a role in who gets prosecuted or imprisoned.

The system is an OpenAI-powered tool sold by Axon, the company behind the Taser and body camera systems. Axon’s team is saying the right things in the article, stressing that this tool should be a first draft, not a final report. Rick Smith, Axon CEO, quipped, “They never want to get an officer on the stand who says, well, ‘The AI wrote that, I didn’t.’”

In my opinion, it’s practically guaranteed that many officers will generate the first draft and hit submit. After carefully reviewing the tool’s output a few times, they’ll deem it ‘good enough’ and grow overly comfortable with the results. The over-reliance we’ve already witnessed from users of AI search summaries will likely recur here, with much higher stakes.

If Axon implements this tool correctly, the supporting evidence used to generate each report will be archived as fallback evidence.
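To make that archival step concrete: alongside each AI-drafted report, you’d want to store the raw inputs and generation metadata so the draft can later be traced back to its sources. The sketch below is purely illustrative (it is not Axon’s actual system), with hypothetical field names, showing one way to bundle the body-cam audio hash, transcript, prompt, and model version with the draft.

```python
import datetime
import hashlib
import json


def archive_draft(audio_bytes: bytes, transcript: str, prompt: str,
                  model_id: str, draft_text: str, path: str) -> dict:
    """Store an AI-drafted report alongside the inputs that produced it.

    Hypothetical sketch: hashes the body-cam audio, records the prompt and
    model version, and writes everything to a JSON archive so the draft's
    provenance can be examined later.
    """
    record = {
        "created_utc": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "audio_sha256": hashlib.sha256(audio_bytes).hexdigest(),
        "transcript": transcript,
        "prompt": prompt,
        "model_id": model_id,      # e.g. model name plus version string
        "draft_text": draft_text,
    }
    with open(path, "w") as f:
        json.dump(record, f, indent=2)
    return record
```

Even a minimal record like this would give defense counsel something to examine beyond the finished report, though it still wouldn’t explain how the model turned those inputs into that output.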

But we’re getting closer to a case where a defendant demands details on the processes that lead to AI-created evidence – something that can’t be produced.


Have thoughts? Send me a note