Self-harm hidden in clinical notes: can AI find it without the data leaving the NHS?
Self-harm – intentional self-injury or self-poisoning, regardless of motivation – is one of the strongest risk factors for suicide. Clinicians working in secondary mental health services record vital information about it every day, in the notes they write after seeing patients. But that information is trapped in free text: narrative accounts, clinical observations, fragments of conversation. It cannot be easily searched, counted, or analysed at scale.
Modern AI language models can read and interpret text like this. But the most powerful commercial tools – systems like ChatGPT – work by sending data to external servers. For sensitive mental health records, that creates serious privacy and security concerns. The question this project set out to answer was whether a different kind of AI, one that runs entirely within NHS infrastructure, could do the job just as well – with the data never leaving the building.
Our approach and partners
Working with Oxford Health NHS Foundation Trust, the research team used the Oxford Clinical Records Interactive System (CRIS) to test an AI language model against real clinical records. A sample of 1,352 clinical notes from adults with a confirmed psychiatric diagnosis was selected. Experts in self-harm reviewed each note, recording whether self-harm was present and whether it had occurred recently (within the past 90 days) or historically.
The AI model – a freely available, open-source system installed on the Trust's own secure computers – was then tested against these expert judgements. Crucially, no patient data was transmitted to any external server at any point.
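To make this concrete, the sketch below shows one way a freely available language model can be run and queried entirely on local hardware, in the spirit of the setup described above. The model path, prompt wording, and label scheme are illustrative assumptions, not the project's actual configuration.

```python
# Minimal sketch of a locally run, open-source model classifying a note.
# The model path and prompt are hypothetical; nothing here is sent to an
# external server - the weights and the inference both live on-site.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="/models/local-open-source-llm",  # hypothetical local weights
)

PROMPT = (
    "You are reviewing a clinical note.\n"
    "Answer with one label only: SELF_HARM_PRESENT or SELF_HARM_ABSENT.\n\n"
    "Note:\n{note}\n\nLabel:"
)

def classify_note(note: str) -> str:
    """Return the model's label for a single de-identified note."""
    output = generator(PROMPT.format(note=note), max_new_tokens=5)
    # The pipeline returns the prompt plus the generated text; keep only
    # what follows the final "Label:" marker.
    return output[0]["generated_text"].split("Label:")[-1].strip()
```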
What we found – and why it matters
- High accuracy for detecting self-harm. The model correctly identified 97% of notes that did not contain self-harm (its specificity), and performed strongly on notes that did – matching expert judgements with over 90% agreement across key measures (the sketch after this list shows how such measures are calculated).
- Reliable detection of recent episodes. Identifying when self-harm occurred is clinically critical but technically harder: a single note may reference both a recent crisis and events from years earlier. The model correctly detected recent self-harm in around four out of five cases – a strong result for a task that even trained reviewers find challenging.
- Privacy preserved without sacrificing performance. Because the model ran entirely within NHS infrastructure, no patient data left the Trust's servers – yet accuracy was comparable to what would be expected from larger commercial systems. NHS trusts do not need to hand over sensitive records to benefit from AI.
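As an illustration of how agreement measures of this kind are derived – not the project's actual evaluation code – the following sketch computes sensitivity and specificity from expert labels and model predictions. The labels are invented for the example.

```python
# Illustrative calculation of the agreement measures quoted above,
# using made-up labels. 1 = self-harm present, 0 = absent.
from sklearn.metrics import confusion_matrix

expert = [1, 0, 0, 1, 0, 1, 0, 0]  # gold-standard expert judgements
model  = [1, 0, 0, 1, 0, 0, 0, 0]  # model predictions for the same notes

tn, fp, fn, tp = confusion_matrix(expert, model).ravel()

sensitivity = tp / (tp + fn)  # share of true self-harm notes the model catches
specificity = tn / (tn + fp)  # share of no-self-harm notes correctly cleared
print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
```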
These findings were shared through a preprint publication and presentations to clinical and research audiences.
What this means
This work provides evidence that AI running within secure NHS systems can reliably read clinical notes and extract consistent information about self-harm. That matters for two reasons. First, it could give mental health services a far more complete and timely picture of self-harm among the people they care for – supporting clinical monitoring and earlier intervention. Second, it opens up large-scale datasets for research into the causes, patterns, and prevention of self-harm, without requiring patient data to be sent to commercial cloud platforms.
If scaled, this approach could also support the evaluation of interventions to reduce self-harm, help reveal inequalities in how self-harm is recorded and responded to, and reduce the manual burden on clinical and research teams.
What needs to happen next
The model has been tested in one NHS trust. To move from proof of concept to routine use, it needs testing in other mental health services, with different record-keeping practices and patient populations. Performance for less common categories – particularly historical self-harm – needs further improvement. And the approach needs to be piloted within real clinical workflows, so that clinicians, patients, and information governance teams can judge whether it works in practice.
Lessons for future research
Distinguishing when self-harm occurred – not just whether it was mentioned – proved the hardest technical challenge. Clinical notes often weave together past and present, and teaching a model to make that distinction required detailed annotation work and careful, repeated refinement of the instructions given to the AI. Future projects working with clinical notes should anticipate this complexity and invest in expert annotation from the outset.
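By way of illustration – the project's actual prompts are not reproduced here – an instruction that forces the model to separate recent from historical episodes might take a form like the hypothetical template below. The wording, labels, and placeholders are all assumptions.

```python
# Hypothetical prompt template for the temporal distinction described above.
# {note_date} and {note_text} would be filled in per note before inference.
TEMPORAL_PROMPT = """You are reviewing a clinical note written on {note_date}.

Classify any self-harm mentioned in the note:
- RECENT: an episode within the 90 days before {note_date}.
- HISTORICAL: an episode more than 90 days before {note_date}.
- NONE: no self-harm is mentioned.

A note may describe both a recent episode and older history; if so,
answer RECENT. Do not count hypothetical or denied self-harm.

Note:
{note_text}

Label:"""
```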
Lead researchers:
Galit Geulayov, Senior Postdoctoral Researcher, Department of Psychiatry, University of Oxford
Andrey Kormilitzin, Senior Scientist, Department of Psychiatry, University of Oxford
Contact: Galit.Geulayov@psych.ox.ac.uk
ARC OxTV theme: Mental Health
Alignment with the 10 Year Health Plan for England:
This work supports the shift from analogue to digital by using AI to structure routinely collected clinical information on self-harm, enabling more efficient use of existing NHS data. It also supports the shift from sickness to prevention by improving the identification of individuals at risk, potentially enabling earlier intervention.
NIHR narrative themes:
- Innovation – Developed a privacy-preserving AI approach that keeps sensitive patient data within NHS infrastructure while delivering high-accuracy text analysis.
- Impact – Demonstrated that AI can reliably identify self-harm in clinical notes, with potential to improve clinical monitoring and research across mental health services.
- Investment – If scaled, automated identification of self-harm in clinical notes could reduce the manual burden on clinical and research teams, contributing to NHS efficiency.
Partners:
Oxford Health NHS Foundation Trust
Key resources:
Preprint publication describing the model and its validation against expert judgements (referenced above).
What continues beyond ARC funding:
The partnerships with NHS teams, the skills and tools developed during this project, and a clear set of research and implementation questions will remain in place to support further validation and wider adoption of this approach.