We’ve started 2026 working busily behind the scenes, and we’ll be sharing a lot more with you soon about what we’ve been doing and how you can get involved.
For now, you can read about a new report on our experiments in AI benchmarking and the session that kicked off our new webinar series, which will run through to the end of the year.
On Thursday 5 February, we published our work on developing the CitizenQuery-UK benchmark, a piece of research in which we tested how AI responds to “citizen queries.”
We asked different AI models from the Gemini, Claude, and ChatGPT families about different aspects of government information, including taxes, benefits, employment, and more. We then used a bespoke dataset we built from gov.uk and our own structured evaluation methods to check the accuracy of their answers.
Our research shows just how closely these AI models compete with one another, but also how far they have to go: many models overload their answers with information and lose accuracy as a result. We also found that AI is rarely brave enough to say “I don’t know,” and that reluctance undermines how much we can trust its answers.
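As a loose illustration of the kind of checks involved (not the actual CitizenQuery-UK evaluation code), a minimal sketch measuring exact-match accuracy and the rate of honest “I don’t know” answers might look like this. The questions, reference answers, and `model_answer` stub are all hypothetical:

```python
def model_answer(question: str) -> str:
    """Stand-in for a call to a real model API (e.g. Gemini, Claude, GPT)."""
    canned = {
        "What is the UK personal allowance?": "£12,570",
    }
    return canned.get(question, "I don't know")


def score(answers: dict, reference: dict) -> tuple:
    """Exact-match accuracy, plus the rate of honest abstentions."""
    n = len(answers)
    correct = sum(answers[q] == reference[q] for q in answers)
    abstained = sum(a == "I don't know" for a in answers.values())
    return correct / n, abstained / n


reference = {
    "What is the UK personal allowance?": "£12,570",
    "What is the State Pension age?": "66",
}
answers = {q: model_answer(q) for q in reference}
accuracy, abstain_rate = score(answers, reference)
print(f"accuracy={accuracy:.0%}, abstained={abstain_rate:.0%}")
# prints: accuracy=50%, abstained=50%
```

In the real study, exact matching would be far too crude for free-text answers; the structured evaluation methods mentioned above would do the scoring.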
You can read the research and the academic paper here.
Croissant update
We’ve been supporting the 1.1 update to the Croissant metadata standard, helping to improve the way hundreds of thousands of datasets are documented online and used for machine learning and AI. The 1.1 update allows richer annotation of data, meaning a publisher can tell the users of their datasets what they need to know about how the data was collected or how it should be interpreted. With mappings to other metadata standards too, Croissant 1.1 is a step towards everyone speaking the same language when using data for AI. Watch this space as we’ll announce more on this shortly.
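To give a flavour of what such annotation looks like, here is an illustrative, simplified Croissant-style record built in Python. The key names, context URLs, and dataset details below are assumptions for illustration, not a verbatim Croissant 1.1 document; see the MLCommons Croissant specification for the real vocabulary:

```python
import json

# Illustrative sketch of a Croissant-style dataset description.
# Key names and context URLs are simplified assumptions, not the
# official Croissant 1.1 vocabulary.
record = {
    "@context": {
        "@vocab": "https://schema.org/",
        "cr": "http://mlcommons.org/croissant/",
    },
    "@type": "Dataset",
    "name": "example-citizen-queries",  # hypothetical dataset name
    "description": (
        "Questions about UK government services, collected in 2025 "
        "from public web pages."  # documents how the data was collected
    ),
    "cr:recordSet": [
        {
            "@type": "cr:RecordSet",
            "name": "queries",
            "cr:field": [
                # Richer 1.1-style annotation: each field can carry
                # notes on how its values should be interpreted.
                {"name": "question", "description": "Verbatim user query."},
                {"name": "answer", "description": "Reference answer text."},
            ],
        }
    ],
}

print(json.dumps(record, indent=2))
```

Because a record like this is plain JSON-LD, the same metadata can be read by search engines, data catalogues, and ML tooling alike, which is where the “same language” benefit comes from.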
We have two fantastic sessions lined up for you this month that you won’t want to miss.
First, join us on Wednesday, 18 February (4–5 PM GMT) for Solid World Feb 2026, where we’ll be charting the course for the next era of the decentralised web with key voices from the Solid ecosystem. Book your free ticket here.
Following that, on Monday, 23 February (4–5 PM GMT), we are hosting Data Ethics Professional #11, featuring Global Fishing Watch for an insightful discussion on embedding ethical AI considerations into real-world operations. Book your free ticket here.
Thomas Carey-Wilson was in Chile in January as part of our work with the UK’s Foreign, Commonwealth and Development Office, which looked to support the design and implementation of Chile’s digital and AI regulatory frameworks.
Thomas has also been in Washington D.C. for the Digital Competition Conference, where he discussed his work last year with the Data Transfer Initiative on “Defining ‘Real-Time’: A toolkit for assessing data portability under the DMA and digital competition laws”.
Also on Thursday 5 February, we held a webinar on Public Services and AI. This built on the CitizenQuery work, mentioned above, with a presentation of key findings followed by a panel discussion between me, Richard Pope, and Andy Dudfield, chaired by Emer Coleman.
It was a great discussion with lots of insightful questions from the audience. The recording will be available shortly, and we’re looking forward to bringing you more research on the interactions between government and AI.