ai-safety.txt: Making AI vulnerability reporting easy
The UK and US governments have both secured voluntary commitments from many major AI companies on AI safety, including reporting mechanisms for model vulnerabilities.
Back in February 2024, I tested how companies were doing on these commitments. While doing this, I found that it was often difficult to find the reporting process for model vulnerabilities. This document puts forward a proposal for organisations to publish ai-safety.txt files, much like the existing security.txt standard.
A model vulnerability is a safety or security issue in an AI model that isn't directly a cybersecurity issue. This could include susceptibility to jailbreaks, prompt injection attacks, privacy attacks, data poisoning attacks, unaddressed potential for misuse, controllability issues, bias and discrimination, and general performance issues.
This is based on the definition in the UK government's paper.
Why do we need this?
AI systems are rapidly becoming more capable and accessible, and are being put in charge of more decisions. Making better tools widely available has the potential to supercharge productivity and make the world a radically better place, and automating decision making can reduce bias and increase access to services.
However, these same factors bring risks: more capable and accessible systems can enable serious misuse, and flaws in systems put in charge of more decisions can have disastrous consequences.
There's broad agreement that organisations using AI systems should be receptive to reports of things going wrong. But in my recent research, I discovered that just finding a way to contact organisations about model vulnerabilities was difficult! And this was for organisations that had explicitly committed to having good processes for managing vulnerability reporting - let alone more general organisations that might employ AI systems in key processes like healthcare triage, policing or recruitment.
A step towards better model vulnerability reporting would be to have a standardised way for people to find contact details. Without this, people are less likely to disclose vulnerabilities, or do so securely.
The proposal
In short, publish a plain text file at:
example.com/.well-known/ai-safety.txt
that details how people should report model vulnerabilities, for example:
Contact: mailto:ai-vulns@example.com
Last-Updated: 2024-12-12T00:00:00.000Z
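As an illustration, a researcher or automated scanner could look the file up with a few lines of Python. This is a minimal sketch: the domain, timeout and error handling are my own assumptions, not part of the proposal.

import urllib.error
import urllib.request

def fetch_ai_safety_txt(domain):
    # Look for the file at the .well-known path proposed above.
    url = f"https://{domain}/.well-known/ai-safety.txt"
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.read().decode("utf-8")
    except (urllib.error.URLError, UnicodeDecodeError):
        # No file published (or it isn't valid UTF-8 text).
        return None

print(fetch_ai_safety_txt("example.com"))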
Required fields:
- Contact. A URI representing an email address, phone number or web page that indicates how to get in touch to report AI vulnerabilities. Multiple Contact fields can be specified. This may be a link to an organisation's security.txt, indicating that model vulnerabilities should be submitted through the organisation's cybersecurity processes.
- Last-Updated. An ISO 8601 timestamp indicating when the file was most recently updated. At most one is allowed. It's recommended you keep this up to date to give researchers confidence that the file is still relevant. This must never be set to a date in the future (the parsing sketch below shows one way to check this).
Optional fields:
- AI-Model. A link that clearly identifies an upstream AI vendor or AI model, so researchers can understand which AI models are being used for different purposes. Multiple AI-Model fields can be specified.
- Acknowledgments, Canonical, Encryption, Expires, Hiring, Policy, Preferred-Languages. All are defined in the same way as in RFC 9116, except that they relate to model vulnerabilities rather than cybersecurity.
Comments can be placed in the file by prefixing lines with the # symbol. This can be useful for briefly explaining how you are using AI systems.
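To make the format concrete, here is a minimal parsing and validation sketch in Python. The field names follow the proposal above, but the parsing rules (one "Name: value" pair per line, # for comments) are my reading of the format rather than a formal grammar.

from datetime import datetime, timezone

def parse_ai_safety_txt(text):
    # Collect "Name: value" lines into a dict of lists, skipping comments and blanks.
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition(":")
        fields.setdefault(name.strip(), []).append(value.strip())
    return fields

def validate(fields):
    # Check the two required fields described above.
    errors = []
    if not fields.get("Contact"):
        errors.append("at least one Contact field is required")
    last_updated = fields.get("Last-Updated", [])
    if len(last_updated) != 1:
        errors.append("exactly one Last-Updated field is required")
    else:
        # Assumes the timestamp carries a timezone, as in the examples.
        when = datetime.fromisoformat(last_updated[0].replace("Z", "+00:00"))
        if when > datetime.now(timezone.utc):
            errors.append("Last-Updated must not be set to a date in the future")
    return errors

Running validate(parse_ai_safety_txt(...)) over the example above should return an empty list of errors.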
This is heavily inspired by security.txt, and to reduce confusion it uses the same terminology with the same meaning. The only major difference is the use of Last-Updated instead of Expires.
An example with all the possible fields:
Contact: mailto:ai-vulns@example.com
Preferred-Languages: en
Last-Updated: 2024-12-12T00:00:00.000Z
Expires: 2025-01-12T00:00:00.000Z
# Recruitment: extracts factual information from candidates' resumes
AI-Model: https://openai.com/gpt-4
# Customer service: triage tickets
AI-Model: https://llama.meta.com/llama2
Acknowledgments: https://example.com/ai-safety-stars
Canonical: https://example.com/.well-known/ai-safety.txt
Encryption: https://example.com/pgp-key.txt
Hiring: https://example.com/careers#safety
Policy: https://example.com/disclosure-policy.html
What can you do to help?
Developers of AI systems should publish an ai-safety.txt file. They should also encourage their customers to publish an ai-safety.txt file, possibly making it mandatory for customers deploying AI systems in higher-risk domains such as healthcare.
Organisations using AI systems should publish an ai-safety.txt file. They should encourage other organisations in their industry to do so, and consider the presence of an ai-safety.txt file as part of due diligence processes when procuring AI systems.
AI safety researchers should look for these files when reporting issues. If you encounter difficulty finding the right way to report issues to organisations, call them out and encourage them to adopt this standard!
Standard-setting organisations can incorporate publishing an ai-safety.txt file into their standards. If you're interested in turning this into a real RFC, please get in touch.
Policymakers and regulators should encourage organisations developing and using AI systems to publish ai-safety.txt files, or at least set some minimum quality bar for having a working vulnerability reporting process.