DWP: Readying ML models for operation

As the UK’s biggest public service department, the Department for Work and Pensions (DWP) administers the State Pension and a range of benefits to around 20 million claimants. DWP has an opportunity to harness Artificial Intelligence (AI) to improve its services for citizens.

Working towards real-time Machine Learning insights

DWP Digital has an advanced data science capability, with expert data scientists developing and applying models to identify ways to provide better support to claimants. Until now, the data science team has relied on batch processing, which introduces a delay before data can be analysed to derive actionable insights. With DWP Digital’s growing capability in the latest AI and Machine Learning (ML) technologies, the data scientists saw the opportunity to harness their transformative potential to respond to claimants’ needs in real time.

The team had a wealth of innovative ideas for how these technologies might be applied to service delivery, including a prototype ML model that evaluates claimants’ messages to flag priority cases for safeguarding interventions. The ML model could be iterated and improved by comparing its outputs with the results of human appraisals of claimants’ messages. What the data scientists needed was engineering expertise to build the capability to operationalise these prototypes once they were approved for use in production.

Scott Logic has had the privilege of supporting DWP Digital to deliver its objectives for over five years, giving us a deep understanding of the department’s data infrastructure. We also have proven expertise in cloud data engineering and Machine Learning Operations (MLOps) – a set of practices that automate the safe deployment and maintenance of ML models. In combination, this led DWP Digital to engage us to develop the cloud data architecture and MLOps infrastructure through which its prototype ML models could be operationalised.

In a short timeframe, Scott Logic has helped lay the foundations on which we can build our machine learning capabilities, and demonstrated the feasibility of our plans to use the technology to provide urgent support to citizens in need.

Andy Tyack, Deputy Director for Delivery, Department for Work and Pensions

Establishing production-grade version control

A key challenge many organisations face in preparing AI and ML models for production is managing and tracking multiple model iterations. This is particularly important in government, where there must be transparent audit trails around the models used to process data. The solution is to create model repositories that centralise version control, naming, storage, and the retrieval of models for use in existing pipelines.

To support the data science team in developing this capability, we built two proof-of-concept model repositories in DWP Digital’s development environment to assess the advantages and disadvantages of each. We started by decoupling the ML model and its outputs from the codebase. As things stood, message data was being piped out of the Kafka-based Universal Credit messaging system into a Python service in which the ML model was hard-coded, with the output simply logged by the service.

As the model was part of the Python codebase, it would be a time-consuming, manual task to compare the outputs of different model versions. In addition, the model’s outputs would need to be extracted manually from the service logs.

Our architecture introduced the proof-of-concept model repositories, separate from the Python service. A given version of a model was retrieved from a repository and copied into the Python service before processing the messages.
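The idea of a repository that sits apart from the service can be illustrated with a minimal sketch. It is not the MLflow or Amazon S3 prototype described in this case study: the class name, method names, and the model name used in the example are hypothetical, and the store is held in memory purely for illustration.

```python
import hashlib
from datetime import datetime, timezone


class ModelRepository:
    """Hypothetical in-memory stand-in for a versioned model repository."""

    def __init__(self):
        self._store = {}     # (model name, version) -> serialised model bytes
        self.audit_log = []  # one entry per register or fetch operation

    def _audit(self, action, name, version, artefact):
        # Record who-did-what details needed for an audit trail.
        self.audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "action": action,
            "model": name,
            "version": version,
            "sha256": hashlib.sha256(artefact).hexdigest(),
        })

    def register(self, name, artefact):
        """Store an artefact under the next version number for this name."""
        existing = [v for (n, v) in self._store if n == name]
        version = max(existing, default=0) + 1
        self._store[(name, version)] = artefact
        self._audit("register", name, version, artefact)
        return version

    def fetch(self, name, version):
        """Return a specific version; raises KeyError if it does not exist."""
        artefact = self._store[(name, version)]
        self._audit("fetch", name, version, artefact)
        return artefact


repo = ModelRepository()
first = repo.register("urgency-classifier", b"serialised-model-a")
second = repo.register("urgency-classifier", b"serialised-model-b")
model_bytes = repo.fetch("urgency-classifier", first)
```

Because the service asks for a model by name and version rather than importing it from its own codebase, switching versions becomes a lookup rather than a code change, and each lookup leaves a log entry.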

Rather than being logged, the outputs were piped back into the Kafka messaging service in real time. Once in Kafka, another service could pick up the outputs for comparison with the outputs of other versions of the ML model.
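A rough sketch of this score-and-publish step follows. It makes several assumptions: the message fields, the record layout, and the keyword-based scorer are all invented for illustration, and a plain Python list stands in for the Kafka output topic. In a real deployment, `messages` would come from a Kafka consumer and `publish` would wrap a producer.

```python
import json
from typing import Callable, Iterable


def toy_urgency_model(text: str) -> float:
    """Placeholder scorer; the real model is not described in this article."""
    keywords = ("urgent", "emergency", "unsafe")
    return 1.0 if any(k in text.lower() for k in keywords) else 0.0


def score_messages(
    messages: Iterable[dict],
    model: Callable[[str], float],
    model_version: str,
    publish: Callable[[str], None],
) -> None:
    """Score each message and publish the result instead of logging it."""
    for msg in messages:
        record = {
            "message_id": msg["id"],
            # Tagging each output with the model version is what lets a
            # downstream service compare versions against each other.
            "model_version": model_version,
            "urgency_score": model(msg["text"]),
        }
        publish(json.dumps(record))


output_topic = []  # stand-in for a Kafka topic
incoming = [
    {"id": "m1", "text": "This is URGENT, please help"},
    {"id": "m2", "text": "I have changed my address"},
]
score_messages(incoming, toy_urgency_model, "v1", output_topic.append)
```

Publishing structured records rather than log lines means downstream consumers do not need to parse service logs to recover the outputs.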

In this way, our architecture introduced the capability to harness ML models to flag the urgency of claimants’ messages in real time, and to compare the performance of the models with human appraisals. Importantly, every stage of the automated process was logged by the model repositories for audit purposes.

For the proof-of-concept model repositories, we compared an off-the-shelf product, MLflow, with a bespoke build in Amazon S3. Setting up two working prototypes, we put both repositories through their paces using the data science team’s existing ML models.

This allowed us to flush out any technical issues and assess the suitability of each repository within the specific context of DWP Digital’s systems. We documented all of this for the data science team so that they would be equipped to make an informed decision when they were ready to operationalise their models.

Diagram of the solution architecture

Facilitating monitoring and evaluation

Even with production-ready version control in place, organisations face further challenges in operationalising AI and ML models. How can they be continuously monitored for issues with performance and accuracy? How can new models be tested and evaluated safely? This is where MLOps processes and tooling come into play.

DWP already had Python pipeline monitoring in place for its ML models, checking that the services were operating within expected parameters, and alerting to any issues. What we added to the MLOps mix was the capability to monitor and report on the data, and we were able to do this with existing DWP tooling. Each prototype model repository we set up introduced a centralised dataset that could be analysed. DWP Digital already used an open source observability platform called Grafana, and we used this tool to demonstrate how it was possible to create simple dashboards displaying real-time data on the ML models’ outputs.

After the models were put into production, there would be the potential to monitor their accuracy by comparing their predictions with human judgements of each message’s urgency. With Grafana’s user-friendly interface, the data science team would be self-sufficient in creating dashboards to meet its future requirements.
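As an illustration of the kind of figure such a dashboard might display, the sketch below compares model flags with human appraisals for the same messages. The function name, the boolean "urgent" flag, and the choice of metrics are assumptions made for this example and are not taken from DWP Digital’s implementation.

```python
def agreement_metrics(predicted: dict, human: dict) -> dict:
    """Compare model flags with human appraisals, both keyed by message id.

    Values are booleans: True means the message was judged urgent.
    """
    shared = predicted.keys() & human.keys()
    if not shared:
        raise ValueError("no message has both a prediction and an appraisal")

    agree = sum(predicted[m] == human[m] for m in shared)

    # For safeguarding, missed urgent cases matter most, so also report
    # the share of human-flagged urgent messages that the model caught.
    urgent = [m for m in shared if human[m]]
    caught = sum(predicted[m] for m in urgent)

    return {
        "n": len(shared),
        "accuracy": agree / len(shared),
        "urgent_recall": caught / len(urgent) if urgent else None,
    }


model_flags = {"a": True, "b": False, "c": True, "d": False}
human_flags = {"a": True, "b": True, "c": False, "d": False}
metrics = agreement_metrics(model_flags, human_flags)
```

Here the model agrees with the human appraisal on two of four messages and catches one of the two urgent ones, so both accuracy and urgent recall come out at 0.5.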

Once in operation, the ML models would need retraining or replacing with new models, so the data science team required the capability to evaluate what it called its ‘incubation models’. The testing and evaluation of new versions of ML models is more complex than testing improvements to standard software code. It requires the capability to deploy the test model and expose it to the same data inputs as the version of the model that’s in production, allowing the outputs to be benchmarked against each other.

We built a prototype that would allow multiple live and ‘incubation’ ML pipelines to run in parallel. This applied DevOps-like processes of Continuous Integration and Continuous Deployment, but with the key difference that the incubation models were isolated from the live pipelines. Whereas the live model would pipe its outputs back into DWP Digital, the incubation models would safely pipe their data to the data science team for analysis and evaluation. In creating the prototype, we also surfaced and documented technical challenges that would need to be addressed once the team was ready to take the prototype into production.
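The routing principle can be sketched as follows. This is an illustration under stated assumptions, not the prototype itself: the function name, record fields, and version labels are hypothetical, the sinks are plain lists rather than real pipelines, and catching exceptions is only one possible way to keep a failing incubation model from affecting the live path.

```python
from typing import Callable


def fan_out(
    message: dict,
    live_version: str,
    live_model: Callable[[str], float],
    incubation_models: dict,
    live_sink: Callable[[dict], None],
    evaluation_sink: Callable[[dict], None],
) -> None:
    """Give every model the same input; route outputs to separate sinks."""
    # The live output is produced and sent first, so nothing an
    # incubation model does can delay or alter it.
    live_sink({
        "message_id": message["id"],
        "model_version": live_version,
        "score": live_model(message["text"]),
    })

    for version, model in incubation_models.items():
        record = {"message_id": message["id"], "model_version": version}
        try:
            record["score"] = model(message["text"])
        except Exception as exc:
            # A failing incubation model is reported to the evaluation
            # sink only; it never reaches the live sink.
            record["error"] = repr(exc)
        evaluation_sink(record)


def broken_model(text: str) -> float:
    raise RuntimeError("incubation model failed")


live_outputs, evaluation_outputs = [], []
fan_out(
    {"id": "m1", "text": "please help"},
    "v1",
    lambda text: 0.9,
    {"v2-candidate": lambda text: 0.4, "v3-candidate": broken_model},
    live_outputs.append,
    evaluation_outputs.append,
)
```

Because every record carries a message id and a model version, the evaluation outputs can be lined up against the live output for the same message when benchmarking.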

Handing over production-ready pipelines

Throughout the engagement, we worked in close collaboration with the data science team and DWP Digital’s Product Owner for MLOps. Working to a weekly cadence, the team tasked us with areas to investigate, and we ran demos at the end of each week to receive feedback and a steer for the week to come. At the end of the engagement, we delivered working prototypes, comprehensive documentation, and our recommendations for areas that could be investigated next.

Following our handover, the data science team was in a position to implement production-grade MLOps pipelines and processes allowing them to manage, maintain and rapidly iterate the ML models in a safe environment. In operation, the models have the potential to significantly improve the speed and effectiveness with which DWP Digital can respond to urgent messages. By replacing manual work with more efficient automation, there’s also real potential to free up DWP civil servants’ time to focus on other valuable tasks in support of claimants.
