AI: The Future of Data

Statistically Speaking

20-05-2024 • 33 minutes

With the public release of large language models like ChatGPT putting Artificial Intelligence (AI) firmly on our radar, this episode explores what benefits this technology might hold for statistics and analysis, as well as policymaking and public services.

Joining host, Miles Fletcher, to discuss the groundbreaking work being done in this area by the Office for National Statistics (ONS) and across the wider UK Government scene are: Osama Rahman, Director of the ONS Data Science Campus; Richard Campbell, Head of Reproducible Data Science and Analysis; and Sam Rose, Deputy Director of Advanced Analytics and Head of Data Science and AI at the Department for Transport.

Transcript

MILES FLETCHER

Welcome again to Statistically Speaking, the official podcast of the UK’s Office for National Statistics. I'm Miles Fletcher and, if you've been a regular listener to these podcasts, you'll have heard plenty of the natural intelligence displayed by my ONS colleagues. This time though, we're looking into the artificial stuff. We'll discuss the work being done by the ONS to take advantage of this great technological leap forward; what's going on with AI across the wider UK Government scene; and also talk about the importance of making sure every use of AI is carried out safely and responsibly. Guiding us through that are my ONS colleagues - with some of the most impressive job titles we've had to date - Osama Rahman is Director of the Data Science Campus. Richard Campbell is Head of Reproducible Data Science and Analysis. And completing our lineup, Sam Rose, Deputy Director of Advanced Analytics and head of data science and AI at the Department for Transport. Welcome to you all. Osama let's kick off then with some clarity on this AI thing. It's become the big phrase of our time now of course but when it comes to artificial intelligence and public data, what precisely are we talking about?

OSAMA RAHMAN
So artificial intelligence quite simply is the simulation of human intelligence processes by computing systems, and the simulation is the important bit, I think. Actually, people talk about data science, and they talk about machine learning - there's no clear-cut boundaries between these things, and there's a lot of overlap. So, you think about data science. It's the study of data to extract meaningful insights. It's multidisciplinary – maths, stats, computer programming, domain expertise, and you analyse large amounts of data to ask and answer questions. And then you think about machine learning. So that focuses on the development of computer algorithms that improve automatically through experience and by the use of data. So, in other words, machine learning enables computers to learn from data and make decisions or predictions without explicitly being programmed to do so. So, if you think about some of the stuff we do at the ONS, it's very important to be able to take a job and match it to an industrial classification - so that was a manually intensive process and now we use a lot of machine learning to guide that. So, machine learning is essentially a form of AI.

MILES FLETCHER
So is it fair to say then that the reason, or one of the main reasons, people are talking so much about AI now is because of the public release of these large language models? The chat bots if you like, to simpletons like me, the ChatGPTs and so forth. You know, they seem like glorified search engines or oracles - you ask them a question and they tell you everything you need to know.

OSAMA RAHMAN
So that's a form of AI and the one everyone's interested in. But it's not the only form – like I said, machine learning, and some other applications in data science that we use in government, you know, in trying to detect fraud and error. So, it's all interlinked.

MILES FLETCHER
When the ONS asked people recently for one of its own surveys, about how aware the public are about artificial intelligence, 42% of people said they used it in their home recently. What sort of things would people be using it for in the home? What are these everyday applications of AI and I mean, is this artificial intelligence strictly speaking?

OSAMA RAHMAN
If you use Spotify, or Amazon music or YouTube music, they get data on what music you listen to, and they match that with people who've been listening to similar music, and they make recommendations for you. And that's one of the ways people find out about new music or new movies if you use Netflix, so that's one pretty basic application, that I think a lot of people are using in the home.

MILES FLETCHER
And when asked about what areas of AI they'd like to know more about, more than four in 10 adults reported that they'd like to know better how to judge the accuracy of information. I guess this is where the ONS might come in. Rich then, if I could just ask you to explain what we've been up to, what the Data Science Campus has been up to, to actually bring the power of artificial intelligence to our statistics.

RICHARD CAMPBELL
Thanks Miles. Yeah, a few things that ONS has been doing in this very broad sphere of artificial intelligence, and it's really in that overlap area that Osama mentioned with data science, so I'd pick out a few sorts of general areas there. So, one is automation. You know, we're always keen to look at how we can automate processes and make them more efficient. It frees up the time of our analysts to conduct more work. It means that we are more cost effective. It means that our statistics have better quality. It's something we've done for years but AI offers some new opportunities to do that. The other area which Osama touched on is the use of large language models, you know, we can get into the complexities of data. We can get much more out of data; we can complete tasks that would have been too complex or too time consuming for real data scientists. And this is good news, actually, because it frees up the data scientists to add real valuable human insights. Some of the places we've been using this. So, my team for example, which is called reproducible data science and analysis, and we use data science and engineering skills to develop computer systems to produce statistics where the data is a bit big, or what I tend to call a bit messy or a bit complex for our traditional computer systems. We use AI here through automation, as I mentioned, you know, really making sure that we're making systems as efficient and high quality as possible. Another thing we're interested in doing here is quite often we’re doing something called re-platforming systems. So, this is where we take a system that's been used to produce our statistics for years and years and look to move it on to new technology. Now we're exploring with Osama's team the potential for AI to do a lot of the grunt work for us there to sort of go in and say, right, what is going on in this system? How is it working, how can we improve it?
One other thing I'll mention, if Osama doesn't mind me treading on the territory of his team, is the Stats Chat function that we've used on the ONS website. So, this is using AI to enable a far more intelligent interrogation of the vast range of statistics that we've got, so it no longer requires people to be really knowledgeable about our statistics. It enables them to ask quite open questions and to be guided to the most relevant data.

MILES FLETCHER
Because at the moment, if you want to really explore a topic by getting into the depths of the data, into the granular data, you’ve really got to know what you're looking for haven’t you? This again is an oracle that will come up with the answers for you and just present them all ready for your digestion.

RICHARD CAMPBELL
That's right. And I tend to think of these things as a starting point, rather than the whole answer. So, what it’s enabling you to do is to get to the meat of the issue a lot quicker. And then you can focus your energy as a user of our statistics in doing the analysis that you want rather than thinking “how do I find the right information in the first place?”

MILES FLETCHER
Osama, that sounds like an intriguing tool. Tell us precisely how it works then, what data does it capture, what's in scope?

OSAMA RAHMAN
So the scope is publicly available documents on the ONS website. And there's a specific reason for that. So, these AI tools, you can have it look at the whole internet, you can have it look at subsets of data, you can point it to specific bits of data, right? And what's important for us is actually the work of the ONS, that the statistics we produce are quality assured and relevant. And by providing these guardrails where you know, Stats Chat only looks at ONS published data, we have a degree of assurance that the data coming back to the user is likely to be of good quality and not based on who knows what information.

MILES FLETCHER
Because when you use, to name one example, ChatGPT for example, the little warning comes back saying “ChatGPT can make mistakes, consider checking important information.” And I guess that's fundamental to all this isn't it. These tools, as intelligent as they might be, they're only as good - like any system - as the information that's going in the front end.

OSAMA RAHMAN
That's absolutely correct, which is why we have these guardrails where, you know, the functionality on Stats Chat is focused on published ONS information.

MILES FLETCHER
That does mean that something that's offered by an organisation like the ONS does have that sort of inbuilt potential to be trustworthy and widely used. But of course, you might say, to have a really good tool it's got to be drawing on masses of information from right across the world. And it's interesting how, and you mentioned that it's open-source data, of course, that's most available for these tools at the moment, but you're seeing proprietary data coming in as well. And this week, as we're recording this, the Financial Times, for example, has announced that it's done a deal with one of the big AI firms to put all of its content into their database. Do you think there's scope for organisations like the ONS around the world to collaborate on this and to provide you know, really powerful tools for the world to exchange knowledge and data this way?

OSAMA RAHMAN
So there is collaboration going on. There's collaboration, both within government - we're not the only department looking at these sorts of tools; there's also collaboration internationally. I think the difference you know... our information on our website is already publicly available. That's why it's on the net, it is a publication. But there's a difference in situation with the FT where, you know, a lot of the FT information is behind a paywall.

MILES FLETCHER
Yeah, it has a sort of democratising tendency that this publicly available information is being fed into these kinds of sources and these kinds of tools. That's big picture stuff. It's all very exciting work that's going on. But I'll come back to you Rich just for a second. What examples practically, because I think that the Stats Chat project is still a little way off actually being available publicly, isn't it?

RICHARD CAMPBELL
Yeah, I think it is still a little way off. So, I think the key thing that we're doing at the moment, and something we've done for years but that AI is helping with, is the use of automation principles. Just making things quicker. Now in a data science context, this might be going through very, very large data sets, looking for patterns that it would take an analyst a huge amount of time, and probably far more patience than they would have, to find.

MILES FLETCHER
So for example, in future then we might find that - and this is one issue that recurs in these podcasts - obviously one of the limitations of official statistics is they tend to lag. This is another way of making sure that data gets processed faster. And therefore, the statistics are more timely, and therefore the insights they provide are really much more actionable than perhaps they might be at the moment.

RICHARD CAMPBELL
Yeah, that's spot on. There's potential in there for pace of getting the statistics from the point that the data exists to getting it into published statistics. There's potential there for us to be able to combine and bring more sources together. There's also some behind the scenes stuff that helps as well. So, for example, quite often we are coding up the systems to produce new or improved versions of official statistics. And we're looking at the possibility of AI speeding up and supporting that process, perhaps for example, by giving us an initial draft of the code. Now, why does that matter for people in the public, you know, does anybody actually care? Well, what it means is that we can do things quicker and more to the point we can focus the time of our expert data scientists and other analysts in really helping people understand the data and the analysis that we're producing.

MILES FLETCHER
Okay, so lots of interesting stuff in the pipeline there. But I’d like to bring in Sam now to talk about how AI is actually being used in government right now. Because in your work Sam at the Department for Transport, you've actually been working on some practical projects that have been gaining results in the real world.

SAM ROSE
We have - we've been doing loads actually, and my poor team probably haven't had any time to sit still for the last 18 months or so. And I think like most ministerial departments, we're doing lots and lots of work to automate existing processes, so much like Rich has alluded to in your space, we're looking at the things that take up most of the time for our policy colleagues and looking at how we can automate those. So, for example, drafting correspondence, or automating policy consultation processes, or all of that kind of corporate memory type stuff. Can we mine big banks of data be it text or otherwise and summarise that information or generate new insights that we wouldn't have been able to do previously? But I think slightly more relevant maybe for you guys, is the stuff we're doing on creating new datasets or improving datasets. So, a few things. We're training a machine learning model to identify heavy goods vehicles from Earth observation data. And that's because we don't have a single nationally representative data set that tells us where these heavy goods vehicles park or stop outside of existing kind of service stations, and what we want to understand is where are those big areas of tarmac or concrete where they're all parking up as part of their routine journeys, so that we can look at when we're rolling out the green infrastructure for heavy goods vehicles, we're looking at where the important places that we need to put that infrastructure are. And that data doesn't exist at the moment. So we're using machine learning to generate a new dataset that we wouldn't otherwise have.

MILES FLETCHER
And how widespread are these kinds of projects across government in the UK now?

SAM ROSE
So I think that there are loads of different things and I wouldn't be able to speak on behalf of everybody but I know lots of different areas of government are looking at similar kinds of automation and productivity projects, like our kind of drafting and all of the knowledge management area. I think there's things like Osama alluded to where DEFRA for example, I think they're using Earth observation data to assess biodiversity. So, there's lots of stuff that's common between lots of government departments, and then there's lots of stuff that's very specific to individual departments. But all along the way there's lots of collaboration and working together to make sure we're all learning continuously and, where we can collaborate on a single solution, that we do.

MILES FLETCHER
I guess one of the central public concerns about the spread of AI is once again that it will cost jobs, that it will do people out of the means of making a living that they've become used to. And I guess from government's point of view, it's all about doing much, much more with the resources that we have and making government much more effective.

SAM ROSE
Yes, absolutely. And it's not necessarily - and I think Rich mentioned this earlier - it's not necessarily about doing our jobs for us. It's about improving how we can do our jobs and being able to do more with less, I think, so freeing up the human to do the bit that the human really needs to do and enabling the technology to do the very repeatable, very automatable parts of the job. And indeed, in some instances, this technology can actually do the work better than humans. So be it identifying really complex patterns in datasets, for example. Or a good example from us in transport is we've trained a machine learning model to be able to look at images of electric vehicle charge point installations and be able to identify where a similar or the same image has been submitted more than once. Now that's estimated to have saved over 130 man years of time, you know, that's not a task that we would have been able to do with just humans.

MILES FLETCHER
And you would have to be pretty alert as a human and have a very high boredom threshold to process all that material yourself and spot the fraudsters.

SAM ROSE
Yeah, well, quite. And that's, I think, a really nice example of where again, it's not taking our jobs, but it's enabling us to do something that we wouldn't have been able to do previously and improve the service that we're providing.

MILES FLETCHER
Now, our ability collectively, whatever sort of organisation we're involved in, our ability to make the most of AI depends on of course having the right skills, and Osama I guess this is where the Data Science Campus comes in as the government's Centre of Excellence for data science, principally, but I guess also in this context, artificial intelligence as well. What work have you been involved in to make sure that the supply of those skills and knowledge is on tap for government?

OSAMA RAHMAN
So firstly, I would say we are a (one) centre of excellence within government. I think you know, what's been brilliant to see since the campus was set up has been that actually more and more government departments have excellent data science, AI teams. Sam leads one at DfT. There is, of course, 10DS (or 10 Data Science) at number 10 [Downing Street]. There's a Cabinet Office team. So, there's lots of teams that now work in this area. Some of the stuff we've been doing is we have various training programmes that we have run. We have senior data masterclasses so that actually, senior leaders within government can understand better the power of data. 10DS, Sam's area, have all been running hackathons, which actually improve skills as well. So, it's no longer just us who are building capability. I think it's great to see that across government and across departments there are teams improving skills within their departments, bringing in others from outside to work with them. So, there's a lot going on there.

SAM ROSE
Just really quickly, it's important to think that skills are not just skills of data scientists, but skills of everybody's ability to use this kind of technology. There's a lot of work going on at the moment looking at what we need to do both internally to government, but also out there in all of our sectors to make sure that our workforce has the skills it needs to be able to more rapidly kind of adopt and be able to take advantage of all the benefits that this technology brings to us. I mean from a very personal point of view, and I don't really know all of the answers to this, but you know, I'm thinking about what actually, if large language models can help us to generate efficient code, then actually, what skills do I need in my data scientists? If it's not writing code, is it actually the analytical thinking and being able to understand how to apply these kinds of technologies? So, I think it changes what we need in the workforce that we have.

MILES FLETCHER
Inevitably, though, if we're talking about this kind of technology being rolled out across government and thereby increasing the power of government to know more about more people, then concerns obviously, about the ethical use of data come in...
