Blinqx: Building a GenAI paralegal
Blinqx provides Enterprise Resource Planning (ERP) software for financial, governmental and business services companies via a Software as a Service (SaaS) platform. Founded in the Netherlands in 2019, the company now serves over 100,000 users worldwide across 1,500 clients.
Adding value for legal customers
For customers in the legal profession, Blinqx provides services including digital contract signing, document storage, and end-to-end workflow management. The company recognised the potential of Generative AI (GenAI) to add significant value to its offering to legal customers by accelerating document analysis, effectively taking on some of the tasks of a paralegal.
Blinqx ran a hackathon to explore this potential, creating a very promising prototype that demonstrated how lengthy legal documents could be summarised rapidly by pulling out key events and organising them into a timeline.
However, it’s often difficult to take a prototype into production, particularly with cutting-edge technology. On the strength of our track record, Blinqx commissioned Scott Logic to develop a GenAI microservice built on production-ready code and architecture, capable of providing demonstrably accurate outputs. This would be a considerable challenge – reining in untamed, non-deterministic technology so that it could be trusted to provide accurate and reliable results for customers.
Nonetheless, in just one month, we built the microservice and created a testing suite to ensure that the new AI features would justify customer confidence. As a result, Blinqx could prepare to go to market with a game-changing enhancement to its platform that would allow its legal customers to work faster and more efficiently.
Our startup mentality at Blinqx drives us to embrace innovative technologies to change the world of B2B SaaS solutions for our customers. With a small team and in a short timeframe, Scott Logic has accelerated our AI journey while providing assurance of the technology to justify the confidence level that we and our customers require.
Martijn Weltevreden, Product Owner AI, Blinqx
Validating the LLM’s accuracy
Blinqx had selected Open AI’s GPT-4 Turbo as the Large Language Model (LLM) to investigate and evaluate. It required minimal configuration and offered amazing power and speed. However, demonstrating its accuracy presented some interesting challenges, and not just because of the non-deterministic nature of artificial intelligence.
Our task was to create a single LLM ‘pre-prompt’ that would generate accurate and relevant results across a broad range of legal scenarios, matching what a paralegal would produce. Blinqx’s customers would then be able to configure the language and level of detail they wanted at the next stage before sending the prompt to the LLM.
Our first step was to engage a lawyer to provide summaries of a selection of different types of legal documents. These human summaries gave us an initial control variable against which we could assess the accuracy of the LLM’s outputs using a testing suite.
Our assessment criteria examined: whether the LLM extracted the same information, whether it returned any irrelevant information, whether it contradicted the human lawyer in any way, and whether the output was concise or verbose.
The next step was to automate the process of measuring the confidence level in the LLM’s outputs. To tackle this, we added an evaluation model into the testing suite to compare the human and GenAI outputs and assess them based on our four criteria, with a true/false mark against each one. These marks were expressed as a percentage confidence level, and we added a gold standard when all four criteria were met and a silver standard when only the first criterion was met.
This evaluation model played a pivotal role in enabling us to fine-tune the LLM pre-prompt so that it provided suitable results regardless of the document type.
Our next step was to build an API that allowed multiple documents to be uploaded and served to the LLM via the microservice. The LLM reviewed each document in turn and the results were compiled into a unified timeline of key events, citing the source document for each event.
Blinqx was seeking to present its legal customers with a simple feature that would ‘hide the wiring’, with no need for them to understand LLM prompts – and that’s what our novel solution delivered. Accessing their document repository on the Blinqx platform, customers would be able to follow a simple two-step process, and a few minutes later receive an ordered timeline of events extracted from potentially hundreds of pages of legal documentation.
From prototype to production
Drawing on our track record of productionising AI prototypes, we built the microservice from scratch in production-ready code. Given the confidential nature of legal documentation, the LLM was hosted and run within Blinqx’s Azure cloud environment, rather than sending data to Open AI. We containerised the microservice on Azure using Podman, making it readily accessible to the main ERP platform as its intermediary with the LLM.
We ensured alignment with Blinqx’s requirements through daily stand-up meetings with the client’s Product Owner and a developer who had worked on the hackathon. We also collaborated with the development team responsible for the Blinqx platform’s legal functionality to ensure that everything would integrate correctly. At the end of the engagement, we talked the client team through our well-documented codebase and explained the flow of the legal documents.
The handover equipped Blinqx’s product team to integrate the de-risked GenAI microservice into its ERP services within a sandbox environment, ready to gauge customer interest in the new functionality and shape its commercial offering. Blinqx would then be in prime position to take the new feature to market rapidly, ahead of the competition.