A mountain of insights has already been gathered, and often published, on both the challenges of, and the opportunities for, building thriving communities in Gloucestershire.

From national datasets, to local community group annual reports, we could spend a whole year of Common Good Gloucestershire time on desk research before we even came to conversations with communities. But, in practice, we don’t have the capacity - and we want to be out engaging in conversations as well as reading reports.

So, how can we make sure we’re still informed by all the data and analysis work that goes before us? We are (cautiously) experimenting with how generative AI tools based on large language models can help.

What are we doing?

For each of the reports submitted to our library, we have an automated process that:

We then review this against a skim read of the report, before adding the item to the public library.

We are also adding the reports to an instance of NotebookLM, which is a ‘Retrieval Augmented Generation’ (RAG) tool that can answer questions drawing on all the resources uploaded to it, or that can generate summaries of content in different formats (mind maps, presentations etc.).

We will experiment with making this available to our enquiry team to see how it can help them to dig into themes they are exploring, or whether it can prompt new understanding and insight when used alongside roundtable conversations.

What did we consider?

Bias and inaccuracy: We are aware that large language models (LLMs) can have systematic biases, and that they are probabilistic systems, fine-tuned to present confident responses, which are prone to producing inaccurate summaries at times. We have identified a specific model/platform that should ground answers in the provided documents to mitigate this, but we recognise it will not remove all error or bias.

By using automatic summarisation alongside our enquiry roundtables, we hope to have a social process that can further mitigate against the impacts of LLM bias affecting the analysis that Common Good Gloucestershire will produce.

Cognitive offloading and task replacement: We are aware that use of LLMs can lead to ‘cognitive offloading’, where we rely on the machine rather than our own processes of thinking and learning, and that it can lead to roles that would previously have been undertaken by an employee or volunteer being replaced or substantially changed. In this case, without the use of AI tools, we anticipate we would not be able to engage with past publications in any depth at all. We therefore expect the summaries produced to increase the range of insights our enquiry can engage with, without replacing any roles that would otherwise have existed.

We will be carefully designing the presentation of summaries to encourage engagement with the underlying reports and documents.

How will we evaluate this experiment?

| Question | Assessment |
| --- | --- |
| Does AI provide clear and accurate summaries? We are looking for summaries at least comparable to the quality that could be produced by a junior team member reading and summarising documents. | Tracking cases where we identify errors during review. Inviting feedback on summaries. |
| Do we observe any systematic bias in the insight summarisation? | Periodic reflection reading across multiple summaries. Reflection comparing automated summaries with manually gathered insights. |
| Do AI summaries increase, or decrease, enquiry team member engagement with past published reports and data? | Discussion with enquiry team members. |

<aside> ⚙

The prompt

We are currently using the following prompt:

Provide a summary in basic HTML format with the following H3 headings:

The first section should contain a maximum of 75 words. Other sections should not exceed 200 words each, and may use bullet points (unordered lists) as appropriate - using bold for the opening words of each unordered list item.

</aside>
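
The word limits in the prompt can also be checked automatically before the manual review. The following is a minimal sketch of how that might look, not a description of our actual process: it assumes the model returns basic HTML with one H3 heading per section, and the names `SectionCollector` and `check_word_limits` are hypothetical.

```python
from html.parser import HTMLParser

# Limits taken from the prompt: 75 words for the first section, 200 for the rest.
FIRST_SECTION_LIMIT = 75
OTHER_SECTION_LIMIT = 200


class SectionCollector(HTMLParser):
    """Collect the text found under each H3 heading of a basic HTML summary."""

    def __init__(self):
        super().__init__()
        self.sections = []  # list of [heading_text, body_text] pairs
        self._in_heading = False

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            self.sections.append(["", ""])
            self._in_heading = True

    def handle_endtag(self, tag):
        if tag == "h3":
            self._in_heading = False

    def handle_data(self, data):
        if not self.sections:
            return  # ignore any text before the first heading
        index = 0 if self._in_heading else 1
        # A trailing space keeps words in adjacent tags (e.g. list items) apart.
        self.sections[-1][index] += data + " "


def check_word_limits(summary_html):
    """Return (heading, word_count, limit) for each section over its limit."""
    collector = SectionCollector()
    collector.feed(summary_html)
    collector.close()
    problems = []
    for position, (heading, body) in enumerate(collector.sections):
        limit = FIRST_SECTION_LIMIT if position == 0 else OTHER_SECTION_LIMIT
        count = len(body.split())
        if count > limit:
            problems.append((heading.strip(), count, limit))
    return problems
```

An empty list from `check_word_limits` means every section is within its limit. A check like this only covers length and structure; accuracy and bias still need the human review described above.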