Despite the tireless efforts and remarkable work of attorneys, countless legal violations occur every day, and many go unaddressed. Darrow was founded to tackle this pervasive problem, driven by a mission of frictionless justice. Our goal is to help law firms grow and excel, and in doing so foster a more just society. By using technology, including machine learning, to detect legal violations at scale, Darrow aims to free attorneys to focus on what they do best: upholding the law and advocating for those in need. Together, we strive to transform the legal landscape so that justice is not only served but also accessible to all.
The challenges of detecting legal violations in online data
One of the most challenging aspects of this work is defining what constitutes a legal violation. In the physical world, violations are already context-dependent and nuanced, and translating these intricacies to online data amplifies the complexity. Clear definitions are crucial: without them, machine learning models lack the guidance they need to detect violations accurately.
Even with clear definitions, deciding whether a specific piece of content is a violation remains hard. Consider content moderation on social media: platforms like Facebook are obligated to enforce legal standards, yet their AI systems are regularly criticized both for over-censoring and for under-censoring posts, which shows how difficult it is to define harmful content universally. The challenge goes beyond identification; it requires understanding the legal, social, and cultural factors that shape what should be moderated.
But that's not all. Another significant hurdle is building a large, representative dataset for identifying legal violations. Violations are rare compared with the vast amount of benign online content, and this class imbalance complicates model training, often biasing models toward predicting that content is non-violative. The dataset must also be representative, capturing a wide range of legal violations across different domains. Finally, annotation is difficult because the legal domain is complex and labeling requires expert knowledge.
To summarize, there are two fundamental challenges (among many others):
- Defining the task of legal violation detection.
- Creating datasets to support this task.
Approaching these challenges
At Darrow, we approached these challenges in a unique and innovative way.
We defined the task as a two-step process. First, we focused on finding legal entities or violation markers within text, including "weak" indications such as something that simply doesn't sound right. Second, we matched these markers with relevant laws or past cases. Defining the task this way, by identifying potential indications and then linking them to the appropriate legal context, has produced significant results in detecting legal violations. It mirrors how people approach complex problems: by breaking them down into manageable steps.
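The two-step framing can be illustrated with a toy Python sketch. This is not LegalLens or Darrow's implementation: the marker phrases, the law descriptions, and the names `find_markers`, `match_legal_context`, and `detect` are all hypothetical. Simple phrase matching (step 1) and word-overlap (Jaccard) ranking (step 2) stand in for the learned models a real system would use.

```python
from dataclasses import dataclass

# Step 1 stand-in: hypothetical phrases a learned detector might flag as weak signals.
MARKER_PHRASES = {
    "charged without consent": "unauthorized charge",
    "never disclosed": "undisclosed term",
    "called me repeatedly": "unwanted calls",
}

# Step 2 stand-in: hypothetical legal-context index (law -> toy description keywords).
LEGAL_INDEX = {
    "Telephone Consumer Protection Act": "unwanted calls robocalls telemarketing consent",
    "Electronic Fund Transfer Act": "unauthorized charge electronic transfer consent",
    "Truth in Lending Act": "undisclosed term credit disclosure fees",
}


@dataclass
class Marker:
    phrase: str  # text that triggered the marker
    label: str   # normalized description of the possible violation


def find_markers(text: str) -> list[Marker]:
    """Step 1: flag every (possibly weak) indication that something is wrong."""
    lowered = text.lower()
    return [Marker(p, lbl) for p, lbl in MARKER_PHRASES.items() if p in lowered]


def _tokens(s: str) -> set[str]:
    return set(s.lower().split())


def match_legal_context(marker: Marker, top_k: int = 1) -> list[tuple[str, float]]:
    """Step 2: rank laws by Jaccard word overlap with the marker's label."""
    m = _tokens(marker.label)
    scored = []
    for law, description in LEGAL_INDEX.items():
        d = _tokens(description)
        scored.append((law, len(m & d) / len(m | d)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def detect(text: str) -> list[tuple[str, str]]:
    """Run both steps: pair each marker with its best-matching law."""
    return [(m.label, match_legal_context(m)[0][0]) for m in find_markers(text)]


if __name__ == "__main__":
    review = "They charged without consent and then called me repeatedly."
    print(detect(review))
    # [('unauthorized charge', 'Electronic Fund Transfer Act'),
    #  ('unwanted calls', 'Telephone Consumer Protection Act')]
```

Keeping the steps separate means each can be improved on its own: a better marker detector surfaces more weak signals, while a better retriever (for example, semantic search over statutes and past cases) improves the legal matching.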
Next, we needed data to build a mechanism for detecting legal violations. As mentioned, datasets are critical, and creating them is cumbersome. We therefore developed a new approach: we combined human experts with GPT-4 so that each reinforces the other, producing a solid dataset that mimics real-life scenarios and is as challenging as real-world cases.
With both a clear task definition and a robust dataset, we built our "holy grail": LegalLens, a system that finds legal violations. We presented this approach, together with the data and code, in an academic paper at a top NLP conference. The paper received significant attention, highlighting the potential of our methods.
Collaboration is key
The advancement of legal technology depends heavily on collaboration among research and development professionals. When people with diverse perspectives and expertise work together, they create a fertile environment for breakthroughs. Collaboration accelerates development and yields a more comprehensive understanding of complex legal challenges, which in turn leads to better solutions. It also lets professionals draw on a wider range of technologies and methods, pushing the boundaries of what is possible in legal tech. This synergy is crucial for innovation that aims to democratize access to justice, streamline legal processes, and ultimately contribute to a more just and equitable world. Many minds working in concert can bring about significant, transformative change in the legal landscape, which is why collaboration matters so much for achieving goals this ambitious.
Darrow's shared task
Our work on detecting legal violations and creating datasets to support it showed us that there is much more knowledge to be discovered. That is why we decided to launch a shared task, inviting the broader community to join us in detecting legal violations in online data.
Participants will be provided with the LegalLens dataset, curated and validated by GPT-4 and domain experts. It includes texts from various sources, such as class-action complaints, legal news articles, and reviews, with each document annotated for entities like laws, violations, violators, and victims. The goal of the task is to develop models that accurately identify and interpret these legal entities and their relationships, bringing us closer to a world of frictionless justice where legal violations are detected swiftly.
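Because each document is annotated with entity spans, the task can be framed as token-level named entity recognition. The sketch below assumes spans are given as token offsets and that the tag names are LAW, VIOLATION, VIOLATOR, and VICTIM, taken from the entity types listed above; the actual dataset's field names, tag set, and official metric may differ. It converts spans to BIO tags for training, decodes predicted tags back into spans, and scores predictions with exact-match span F1, a common NER metric. The example sentence and the company name are invented.

```python
# Hypothetical tag set, taken from the entity types described for the dataset.
LABELS = ("LAW", "VIOLATION", "VIOLATOR", "VICTIM")

Span = tuple[int, int, str]  # (start token, end token exclusive, label)


def spans_to_bio(n_tokens: int, spans: list[Span]) -> list[str]:
    """Encode non-overlapping entity spans as BIO tags (one tag per token)."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        if label not in LABELS:
            raise ValueError(f"unknown label: {label}")
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def bio_to_spans(tags: list[str]) -> list[Span]:
    """Decode BIO tags into spans; a stray I-X is treated as starting a span."""
    spans: list[Span] = []
    start, label = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel "O" closes a trailing span
        if tag.startswith("I-") and tag[2:] == label:
            continue  # current span continues
        if label is not None:
            spans.append((start, i, label))
            start, label = None, None
        if tag != "O":  # B-X (or stray I-X) opens a new span
            start, label = i, tag[2:]
    return spans


def span_f1(gold: list[Span], pred: list[Span]) -> float:
    """Micro F1 over exact (start, end, label) matches."""
    g, p = set(gold), set(pred)
    tp = len(g & p)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(p), tp / len(g)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    tokens = "Acme Corp charged customers hidden fees , violating the Truth in Lending Act".split()
    gold = [(0, 2, "VIOLATOR"), (3, 4, "VICTIM"), (4, 6, "VIOLATION"), (9, 13, "LAW")]
    tags = spans_to_bio(len(tokens), gold)
    assert bio_to_spans(tags) == gold  # encoding round-trips
    pred = [(0, 2, "VIOLATOR"), (4, 6, "VIOLATION"), (10, 13, "LAW")]
    print(round(span_f1(gold, pred), 3))  # 0.571
```

In the example, the prediction misses the victim and gets the law's boundary wrong, so only two of the four gold spans match exactly: precision is 2/3, recall is 1/2, and F1 is 4/7. Exact matching is strict; partial-overlap variants are sometimes reported alongside it.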
We welcome participants from all fields.