As large language models (LLMs) continue to evolve, they have become increasingly sophisticated. OpenAI alone now offers more than ten models to address a wide range of needs, from generating natural-language responses and embeddings to specialized tasks like coding, reading, summarization, and semantic analysis.
In July 2024, OpenAI launched its mini model, GPT-4o mini. This model, widely understood to be a distilled model, emerged in response to growing demand for AI models that are fast, efficient, and affordable.
Distilled LLMs are making waves in many industries, including legal tech. Their ability to process large contexts quickly, with accuracy approaching that of their larger counterparts, has unlocked new possibilities across various fields. For law firms and legal tech companies, tools that were once cost-prohibitive may now be within reach, enabling even smaller law firms to leverage AI in their day-to-day workflows. At Darrow, we see this trend as an opportunity to democratize access to legal intelligence, making justice more attainable for all.
Using distilled LLMs to conduct legal research at scale
These smaller models are derived through a process called knowledge distillation, where the full-sized model, often referred to as the teacher, transfers its knowledge to a smaller, more efficient student model.
This process not only reduces computational requirements but also allows companies like Darrow to scale operations without sacrificing the quality of results.
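To make the teacher-student idea concrete, here is a minimal sketch of the soft-target objective commonly used in knowledge distillation: the student is trained to match the teacher's temperature-softened output distribution. The function names and the temperature value are illustrative, not taken from any particular model's training recipe.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Convert raw logits into a probability distribution.
    A higher temperature produces a softer distribution."""
    z = np.asarray(logits, dtype=float) / temperature
    z -= z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between the teacher's and student's softened
    distributions. The T^2 factor keeps gradient scale comparable
    across temperatures (as in Hinton et al.'s formulation)."""
    p = softmax(teacher_logits, temperature)  # teacher's soft targets
    q = softmax(student_logits, temperature)
    return float(np.sum(p * (np.log(p) - np.log(q)))) * temperature**2
```

When the student's logits match the teacher's exactly, the loss is zero; training pushes the student toward that point while using far fewer parameters.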
Why? Because companies can use them to achieve cost efficiency without sacrificing much in terms of performance. We can send API requests (programmatic queries to the model) in bulk at a fraction of the cost.
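Sending requests in bulk typically means fanning out many independent queries concurrently. The sketch below shows the pattern with Python's standard-library thread pool; `classify_case` is a hypothetical stand-in for a real call to an LLM provider's API.

```python
from concurrent.futures import ThreadPoolExecutor

def classify_case(text: str) -> str:
    """Hypothetical stub -- in practice this would wrap a call to an
    LLM provider's chat-completions endpoint."""
    return "relevant" if "damage" in text else "irrelevant"

def run_in_bulk(texts, max_workers=8):
    """Fan out many independent queries concurrently.
    Results come back in the same order as the inputs."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(classify_case, texts))
```

Because each query is independent, throughput scales with the worker count (up to the provider's rate limits), which is what makes large batches of cheap mini-model calls practical.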
While there has been a lot of research into knowledge distillation over the past few years, OpenAI helped popularize the trend of enterprise-ready mini LLMs among the tech giants. Once other firms saw a model offering comparable capabilities at a fraction of the cost, they swiftly followed suit, introducing their own mini, cost-efficient alternatives.

Now, let’s talk about parameters. Parameters refer to the internal numerical values that a model learns during its training process. These parameters act as the model's knowledge representation, enabling it to understand patterns, relationships, and context within data.
I mention this because while mini LLMs have fewer parameters than full-sized models, their results retain most of the larger model's accuracy. This is ideal for businesses like ours that process hundreds of thousands of queries each month.
Another bonus is speed: mini LLMs are as fast as, or faster than, earlier full-sized models. For example, GPT-4o mini is reported to be about 2.5 times faster than GPT-3.5 Turbo.
Mini LLMs also allow for faster iteration and experimentation. Smaller models are quicker to fine-tune and adapt for specific tasks. Teams can experiment with different use cases or adjust the models for unique legal challenges without long training times or excessive costs.
Here’s a personal example to explain how distilled LLMs cut costs:
I recently worked on a task that required me to evaluate over 5,000 potential cases. Each case was unique and involved over 10,000 rows of long-form textual data gathered from the web.
My task was to create a table that clearly showed a specific and complex correlation between some of this information and the damage experienced by victims. This involved processing 5,000 queries with a combined input of over 500,000 words and generating an output of 50,000 words.
Remarkably, the total cost to run all these queries was less than $3, an incredibly low cost for such a complex operation. Just imagine the time and expense required to perform this manually with human labor!
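A rough back-of-the-envelope calculation shows why the bill stays so low. The sketch below assumes GPT-4o mini's launch pricing (about $0.15 per million input tokens and $0.60 per million output tokens) and roughly 1.33 tokens per English word; both figures are approximations, and current pricing may differ.

```python
def estimate_cost(input_words, output_words,
                  in_price_per_m=0.15,   # USD per 1M input tokens (assumed)
                  out_price_per_m=0.60,  # USD per 1M output tokens (assumed)
                  tokens_per_word=1.33): # rough English-text ratio
    """Back-of-the-envelope cost estimate for a batch of LLM queries."""
    in_tokens = input_words * tokens_per_word
    out_tokens = output_words * tokens_per_word
    return (in_tokens / 1e6) * in_price_per_m + (out_tokens / 1e6) * out_price_per_m

# The workload described above: ~500,000 input words, ~50,000 output words.
cost = estimate_cost(500_000, 50_000)
```

Under these assumptions the whole batch comes out to well under a dollar, comfortably consistent with the sub-$3 figure above even after accounting for prompt overhead and retries.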
Cost efficiency meets quality
The integration of mini LLMs does not mean compromising on accuracy.
While distilled models may have slightly reduced precision, our R&D team ensures accuracy through rigorous validation methods. These include cross-checking results against another AI and manually labeling text samples to evaluate the model's performance and determine whether it is ready for production.
Additionally, once our intelligence platform flags a potential legal violation, our Quality Legal Intelligence team reviews the findings to confirm their validity. This hybrid approach of AI-backed analysis combined with human oversight is how we deliver reliable, evidence-backed legal insights. Our reliance on human validation doesn’t just ensure accuracy; it also builds trust. Attorneys need to know they’re working with reliable data.
While this article demonstrates how we maximize impact through scale, it's merely the tip of the iceberg. Our team combines the speed and capabilities of AI with the diligence of our legal experts to create workflows that are both cost-effective and thorough enough to produce strong cases that hold up in court.