ChatGPT 3.5’s release was my “AlphaGo moment”—recognizing the tech as a fundamental shift in what machines could accomplish. I immediately saw how this technology could evolve and transform legal practice, picturing how it could drastically reduce the costs of the class action securities litigation I used to handle in BigLaw. Even that early version, despite its obvious limitations, could achieve useful outputs when pointed in the right direction. I foresaw human involvement gradually fading as these systems improved through feedback loops.
Yet the legal profession’s reaction to generative AI has followed a pattern that Amara’s Law describes perfectly: “We tend to overestimate the effect of a technology in the short run and underestimate the effect in the long run.”
This isn’t surprising. Legal work involves deeply complex reasoning, nuanced understanding of authority, and intricate language analysis. It makes perfect sense that lawyers, especially those without tech backgrounds, would be skeptical about machines performing such sophisticated intellectual work. We’ve seen this skepticism pattern before—from chess grandmasters before Deep Blue defeated Kasparov, to Go masters before AlphaGo’s triumph over Lee Sedol. Each time, experts in these domains initially scoffed at AI’s potential.
The Initial Reaction: Skepticism and Caution
When lawyers first encountered ChatGPT 3.5, most were distinctly unimpressed. They saw it as an interesting chatbot but definitely not something that could handle sophisticated legal analysis. Then came the infamous case of the “ChatGPT lawyer” who cited completely fabricated cases in a legal brief after prompting the AI to generate authorities—a cautionary tale that reinforced the profession’s natural skepticism.
This response was amplified by the legal profession’s inherent conservatism. Lawyers are trained to be risk-averse, particularly with client matters. The potential fallout from AI-generated errors in legal work can be significant, and the profession’s ethical rules around competence and diligence create additional barriers to rapid adoption of unproven technology.
The Current State of Legal Tech Tools
The LegalTech landscape has evolved rapidly since those early days. Numerous companies are building tools on top of frontier models, with varying approaches and degrees of success. Through demos, conversations with users, and analysis of their technical approaches, I’ve noticed clear patterns in how they’re developing.
These companies deserve credit for their innovation. They’re building the ecosystem that will eventually transform legal practice, experimenting with different approaches and solving real problems. Some are well-positioned for the future, focusing on areas where specialized legal knowledge adds substantial value beyond the underlying models.
That said, the reality is that most of the technical capability in these tools still derives primarily from the frontier models they’re built upon. Many companies are employing traditional software engineering approaches—creating workflows, developing prompt libraries, and building basic agency frameworks that add incremental value. These approaches certainly improve upon the prior state of the art, but they’re still fundamentally limited by what the underlying models can do.
As frontier models advance—particularly with reasoning capabilities—many traditional engineering approaches may become unnecessary. LegalTech companies will increasingly leverage simple feedback loops where minimal human guidance can teach these systems to improve autonomously, fundamentally changing how legal solutions are developed.
Real-World Applications and Limitations
I’ve found tons of practical applications for these tools in my own work while continuously exploring the boundaries of what’s possible. The key insight? These models are tremendously valuable for sophisticated legal work when used to augment rather than replace human expertise.
Even when the AI produces imperfect outputs, there’s significant value. The process of evaluating and correcting AI-generated content forces you to refine your own thinking—similar to how teaching a subject deepens your understanding. The interaction becomes a kind of productive intellectual dialogue.
I suspect many lawyers are already doing this, even if they don’t openly talk about it. Studies suggest that many professionals are secret AI users who won’t admit it due to stigma, a phenomenon Ethan Mollick calls the “Secret Cyborg.” A growing cohort of legal professionals has AI assistants open in separate windows throughout the workday, using them as thought partners and amplifiers of their own capabilities. The distinction here matters: we’re currently at the “assistant” stage rather than the “agent” stage—despite all the talk about agents lately. These tools excel at augmenting human work but aren’t yet reliable enough to operate autonomously on complex legal tasks.
That said, certain limitations persist. For legal research, the leading AI tools still can’t fully replace the old-fashioned methods—that is, using Westlaw and Lexis, searching, reading, and analyzing. When using AI for open-ended research, you face real challenges: the model might not retrieve the most relevant documents, might get confused about hierarchies of authority, or might not allocate enough compute resources to thoroughly analyze complex legal questions.
The experience improves somewhat with closed-universe research, where you identify specific cases or materials and ask the AI to analyze just those documents. Yet even then, performance can vary. Cite-checking remains essential, and I’ve often found important details the AI missed, requiring me to review documents manually in many instances. This creates a potential “poisoning the well” problem—once you’ve seen the AI’s hallucinated analysis, it can bias your own thinking. Still, you live and you learn. Learning how best to use these tools is a constant work in progress, defined by the jagged edge of their capabilities.
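To illustrate what I mean by closed-universe research, here is a minimal sketch using the OpenAI Python SDK. The file names, prompt wording, and model choice are hypothetical assumptions for illustration, not my actual workflow, and the output still needs the cite-checking described above.

```python
# Minimal sketch of "closed-universe" research: paste the exact materials into
# the prompt and instruct the model to rely only on them. File names, prompt
# wording, and model choice are illustrative assumptions, not a tested recipe.
from pathlib import Path
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# The specific materials you have already identified as relevant.
documents = {
    name: Path(name).read_text()
    for name in ["smith_v_jones_opinion.txt", "item_105_risk_factors.txt"]
}
materials = "\n\n".join(f"=== {name} ===\n{text}" for name, text in documents.items())

prompt = (
    "Using ONLY the materials below, analyze whether the proposed risk-factor "
    "disclosure is adequate. Cite the document name for every point, and if the "
    "materials do not answer a question, say so rather than guessing.\n\n"
    f"{materials}"
)

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model choice
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```

Even with instructions like these, the model can still miss or misread a passage, which is why manual review stays in the workflow.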
A colleague shared an anecdote that illustrates these limitations: a legal department tried to automate NDA review, but found the AI couldn’t reliably distinguish the appropriate positions to take in different situations, such as a confidentiality agreement with a public company versus one with a vendor, contexts that involve very different considerations. Relatedly, at an L-Suite dinner I hosted last week in San Diego, CLOs expressed significant dissatisfaction with current AI-powered CLM tools. While many see the potential for using AI to accelerate redlining of less complex documents, they reported that similar challenges around context and nuance persist, creating a gap between promise and performance.
Where the Real Progress Is Happening
The most significant advances in legal AI consistently come from improvements in the frontier models themselves. We’ve seen an evolution from Gen 1 (ChatGPT 3.5 and earlier), to Gen 2 (GPT-4, Claude 3.5 Sonnet, Google’s Gemini models), and now to Gen 3 reasoning models (Claude 3.7 Sonnet, OpenAI’s o1/o3, xAI’s Grok 3).
The arrival of GPT-4 prompted some to suggest halting further AI development (hah)—a perspective that now seems remarkably shortsighted given what’s happened since. GPT-4 clearly marked a “generation 2” milestone, but the truly transformative developments were still coming. (Notably, as I write this, OpenAI just announced GPT-4.5, further demonstrating how quickly development continues to accelerate.)
The reasoning models represent the most consequential leap forward. They have been around for a few months, but many casual users first learned about them with the recent release of DeepSeek. In my opinion, the strong reaction to DeepSeek was driven largely by the fact that many users had simply never seen a reasoning model—one that explicitly allocates significant compute resources to “think” through problems step by step before responding. It was like Muggles seeing magic for the first time.
What makes reasoning models different is test-time compute—essentially giving the model time to work through multiple iterations of analysis before producing a final answer. This is fundamentally different from the one-shot approach of earlier models, which would generate responses immediately with minimal deliberation. If you’ve only been one-shotting work with non-reasoning models, you’ve been missing out on a lot of use cases.
I recognized early that one-shot prompts weren't going to cut it for complex legal work. Getting meaningful analysis required developing sophisticated workflows with chains of prompts, breaking tasks into logical steps rather than expecting immediate comprehensive outputs. This often meant making dozens or even hundreds of calls to the AI to achieve better results, dramatically increasing both computational costs and complexity. Reasoning models promise to reduce the need for this manual engineering by internalizing many of these intermediate steps, though this potential advantage remains to be fully validated in practice.
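To make the contrast concrete, here is a minimal sketch of the kind of prompt chain I’m describing, again using the OpenAI Python SDK. The model name, step breakdown, and prompt wording are illustrative assumptions rather than my actual workflow.

```python
# Minimal sketch of a chained workflow: break the task into steps and feed each
# intermediate answer into the next prompt. Model name, steps, and wording are
# illustrative assumptions; a real workflow would involve many more calls.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ask(prompt: str) -> str:
    """One call to the model; every step in the chain reuses this helper."""
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

clause = "..."  # the contract language under review
question = "Does the indemnification obligation survive termination?"

# Step 1: identify the operative language before asking for any conclusions.
issues = ask(f"List the operative terms and ambiguities in this clause:\n\n{clause}")

# Step 2: analyze each identified issue against the question.
analysis = ask(f"Question: {question}\n\nIssues:\n{issues}\n\nAnalyze each issue in turn.")

# Step 3: only now ask for a synthesized, hedged conclusion.
print(ask(f"Based on this analysis:\n\n{analysis}\n\nGive a concise answer to: {question}"))
```

A reasoning model collapses much of this orchestration into a single call, which is exactly the shift described above.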
My Recent Test: Gen 3 Reasoning Models in Action
To assess current capabilities, I recently conducted a test comparing two Gen 3 reasoning models: Anthropic’s Claude 3.7 Sonnet with extended thinking enabled, and xAI’s Grok 3 model (trained with massive compute resources, courtesy of Elon Musk throwing many bags of cash at Nvidia). I asked each model to compare and contrast their own and each other’s privacy policies.
Why privacy policies? They make an ideal test case for several reasons. First, they’re abundantly represented in these models’ training data due to web scraping. Second, they embody a fundamental tension: companies want to appear maximally privacy-protective while retaining broad rights to use data (the “new oil”). This dynamic produces intentionally complex documents with potential loopholes—precisely the type of legal text that requires sophisticated analysis.
The results were impressive. These models demonstrated analytical capabilities far beyond what was possible just weeks ago. I’ll detail the specific findings in a future post, but the key takeaway was clear: the most significant advances are emerging from improvements in the frontier models themselves, not from specialized legal adaptations.
Regarding Common Lawyer Concerns
When discussing AI with legal colleagues, several concerns consistently arise that are worth addressing:
Confidentiality concerns are often overstated, reminiscent of early skepticism about cloud computing. While lawyers should absolutely be careful about what information they input and use tools that don’t train on or retain their data, the level of concern frequently exceeds the actual risk, particularly when weighed against the benefits.
Accuracy and reliability remain legitimate issues. Even the most sophisticated models continue to hallucinate, and some research suggests this might be an inherent characteristic of large language models given their fundamental architecture. This creates particular challenges in legal contexts, where changing a single word can fundamentally alter an analysis. If hallucinations are indeed intrinsic to the current model designs, developing robust validation mechanisms becomes essential.
Regarding claims that “AI can’t understand nuance”—the issue isn’t that these models lack comprehension of nuance entirely, but rather that they frequently miss cross-references between documents or how different provisions interact. With appropriate prompting and adequate reasoning time, performance improves substantially.
Patience is required, particularly around compute allocation. Humans spend considerable time reading complex documents, identifying cross-references, and processing information. Expecting a computer to accomplish this in seconds is unrealistic. Most AI providers optimize for cost-efficiency, limiting the compute resources allocated to each query, which creates a fundamental problem: users can’t specify when a question deserves more extensive analysis.
Anthropic’s recent addition of a configurable “extended thinking” budget in the Claude API represents an important step forward. This approach lets users decide when to invest more compute resources in exchange for better results—analogous to telling an associate to spend 20 minutes versus 20 hours on a research task. The benchmarks consistently show that increased reasoning time yields better results, and I expect this capability to become standard across providers.
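For readers curious what that looks like in code, here is a minimal sketch using Anthropic’s Python SDK. The model identifier and token budgets are assumptions chosen for illustration, and the exact parameter shape may change as the feature evolves.

```python
# Minimal sketch: asking Claude to spend a larger "thinking" budget on a harder
# question. Model identifier and token numbers are illustrative assumptions.
import anthropic

client = anthropic.Anthropic()  # assumes ANTHROPIC_API_KEY is set

response = client.messages.create(
    model="claude-3-7-sonnet-latest",  # assumed model identifier
    max_tokens=20000,
    # The budget is the "20 minutes versus 20 hours" dial: more tokens spent
    # thinking before the final answer, at higher cost and latency.
    thinking={"type": "enabled", "budget_tokens": 16000},
    messages=[{
        "role": "user",
        "content": "Compare the indemnification provisions in the two agreements below ...",
    }],
)

# The response interleaves thinking blocks with the answer; keep only the text.
print("".join(block.text for block in response.content if block.type == "text"))
```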
Economic concerns largely reflect our collective uncertainty about AI’s economic and social impacts. While nobody can predict precisely how this technology will transform legal practice, it’s clear that we’re not closing Pandora’s Box any time soon. Recently, discussions around Jevons Paradox have gained traction—particularly as companies like DeepSeek demonstrate that highly capable models can be trained for a fraction of the cost of leading frontier models.
Jevons Paradox, named after economist William Stanley Jevons, observes that when technological progress increases the efficiency of resource use, the rate of consumption often rises rather than falls. Applied to AI, this suggests that as intelligence becomes cheaper and more accessible, we may see increased rather than decreased demand for both AI tools and the legal expertise needed to wield them effectively. It’s not that lawyers will be replaced, but that lawyers who effectively leverage AI may replace those who don’t.
Fear of obsolescence is understandable but can be counterproductive. In his book Deep Thinking, Garry Kasparov describes how younger chess players rapidly adopted computer analysis while veteran grandmasters clung to traditional methods like chess journals and paper notebooks. The lesson is clear: professionals need to evolve and adapt by understanding precisely where human expertise adds value beyond what technology can provide. Identifying the areas where your judgment and experience exceed AI capabilities is essential for strategic adaptation as those capabilities advance.
Practical Advice for Getting Started
For lawyers looking to engage with these tools, I generally endorse the approach that Ethan Mollick and others recommend: work with the leading frontier models directly rather than immediately investing in specialized (and much more costly) LegalTech. Start by using these tools as thought partners for brainstorming and reviewing your work.
Try completing assignments independently, then inputting them into AI systems to compare approaches. This comparative process helps develop intuition about where these tools excel and where they fall short. The key is learning through experimentation rather than theoretical study—you’ll naturally discover optimal workflows through practical engagement.
While experimenting, it should go without saying: keep yourself in the loop. These tools are currently powerful assistants but unreliable agents—they excel at augmenting human judgment but aren’t ready to operate autonomously on complex legal matters. Use them to enhance your capabilities rather than replace your expertise.
While embracing these tools, maintain healthy skepticism about their limitations. Hallucination and accuracy issues remain significant challenges. You can generally trust AI more for self-validating outputs—where the value lies in the expression itself rather than factual claims (like drafting emails or organizing notes). However, for any analysis that extends beyond your existing knowledge base, careful verification remains essential. When asking the AI to analyze case law or regulatory frameworks, treat its output as a first draft that requires your professional scrutiny. Don’t be the ChatGPT lawyer, please.
As these tools continue to evolve at a breathtaking pace, the lawyers who develop thoughtful, systematic approaches to incorporating them into their practice will gain significant advantages over those who either resist or adopt without strategy.
Conclusion
We're watching Amara’s Law unfold in real time within legal technology. After initial skepticism about AI’s immediate impact, the profession now risks substantially underestimating its long-term transformative potential.
The most consequential advances are occurring at the frontier model level, with reasoning capabilities representing the biggest leap so far. While specialized legal tools provide value in specific contexts, they remain fundamentally constrained by the capabilities of their underlying models.
The path forward isn’t resistance but thoughtful adaptation. Lawyers need to understand these tools, experiment with them, and identify where human expertise provides value beyond what AI can currently deliver.
Whatever developments emerge, one certainty remains: the landscape of legal practice is evolving more rapidly than most lawyers anticipate. Those who comprehend and adapt to these changes will thrive in the transformed professional environment that lies ahead.