How are AI companies doing with their voluntary commitments on vulnerability reporting?
The UK and US governments have both secured voluntary commitments from many major AI companies on AI safety.
These include having appropriate reporting mechanisms for both cybersecurity vulnerabilities and model vulnerabilities.
I took a look at how well organisations are living up to these commitments as of February 2024. This included reviewing what the processes actually are, and submitting test reports to see if they work.
A model vulnerability is a safety or security issue relating to an AI model that isn't directly related to its cybersecurity. This could include vulnerability to jailbreaks, prompt injection attacks, privacy attacks, unaddressed potential for misuse, controllability issues, data poisoning attacks, bias and discrimination, and general performance issues.
This is based on the definition in the UK government's paper.
In the US, all companies agreed to one set of commitments which included:
US on model vulnerabilities: Companies making this commitment recognize that AI systems may continue to have weaknesses and vulnerabilities even after robust red-teaming. They commit to establishing for systems within scope bounty systems, contests, or prizes to incent the responsible disclosure of weaknesses, such as unsafe behaviors, or to include AI systems in their existing bug bounty programs.
In the UK each company submitted their own commitment wordings. The government described the relevant areas as follows:
UK on cybersecurity: Maintain open lines of communication for feedback regarding product security, both internally and externally to your organisation, including mechanisms for security researchers to report vulnerabilities and receive legal safe harbour for doing so, and for escalating issues to the wider community. Helping to share knowledge and threat information will strengthen the overall community's ability to respond to AI security threats.
UK on model vulnerabilities: Establish clear, user-friendly, and publicly described processes for receiving model vulnerability reports drawing on established software vulnerability reporting processes. These processes can be built into, or take inspiration from, processes that organisations have built to receive reports of traditional software vulnerabilities. It is crucial that these policies are made publicly accessible and function effectively.
Summary table
Flags indicate whether the company made the US 🇺🇸 and/or UK 🇬🇧 commitments.

Company | Score |
---|---|
Adobe 🇺🇸 | 6/20 |
Amazon 🇺🇸 🇬🇧 | 6/20 |
Anthropic 🇺🇸 🇬🇧 | 13/20 |
Cohere 🇺🇸 | 12/20 |
Google 🇺🇸 | 20/20 |
Google DeepMind 🇬🇧 | 18/20 |
IBM 🇺🇸 | 5/20 |
Inflection 🇺🇸 🇬🇧 | 16/20 |
Meta 🇺🇸 🇬🇧 | 14/20 |
Microsoft 🇺🇸 🇬🇧 | 18/20 |
NVIDIA 🇺🇸 | 20/20 |
OpenAI 🇺🇸 🇬🇧 | 12/20 |
Palantir 🇺🇸 | 5/20 |
Salesforce 🇺🇸 | 9/20 |
Scale AI 🇺🇸 | 4/20 |
Stability AI 🇺🇸 | 1/20 |
Some high-level takeaways
Performance was quite low across the board. Simply listing a contact email for vulnerability reports and responding to queries would score 17 points, which would place a company in the top five.
However, a couple of companies have great processes that can act as best-practice examples. Both Google and NVIDIA got perfect scores. In addition, Google offers bug bounty incentives for model vulnerabilities, and NVIDIA has an exceptionally clear and easy-to-use model vulnerability contact point.
Companies did much better on cybersecurity than model vulnerabilities. Additionally, companies that combined their cybersecurity and model vulnerability procedures scored better. This might be because existing cybersecurity processes are more battle-tested, or taken more seriously, than model vulnerability processes.
Companies do know how to have transparent contact processes. Every single company's press contact could be found within minutes, and was a simple email address. This suggests companies are able to sort this out when there are greater commercial incentives to do so.
Rubrics
Cybersecurity ease of reporting score:
- 1 point: No easy way to contact the company about a vulnerability (e.g. having to use a sales contact channel).
- 2 points: Has a general contact email.
- 3 points: Has some contact point for reporting security vulnerabilities.
- 4 points: An expired or otherwise invalid security.txt file.
- 5 points: A non-expired security.txt file.
Model vulnerability ease of reporting score:
- 1 point: No easy way to contact the company about a vulnerability (e.g. having to use a sales contact channel).
- 2 points: Lists a general contact email.
- 3 points: Some contact point for reporting model vulnerabilities (e.g. a general abuse email).
- 4 points: A clear contact point for reporting model vulnerabilities.
Overall score is the sum of (a worked sketch follows):
- cybersecurity ease of reporting (1-5)
- model vulnerability ease of reporting (1-4)
- 6 for a cybersecurity response (0 for no response)
- 6 for a model vulnerability response (0 for no response)
- minus 1, so that scores range between 1-20 (rather than 2-21)
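To make the arithmetic concrete, here is a minimal sketch of the scoring calculation, using the tier point values above. The function name and signature are my own illustration, not code used for this review.

```python
def overall_score(cyber_ease: int, model_ease: int,
                  cyber_responded: bool, model_responded: bool) -> int:
    """Combine the rubric components into a 1-20 overall score."""
    assert 1 <= cyber_ease <= 5, "cybersecurity ease is a 1-5 tier"
    assert 1 <= model_ease <= 4, "model vulnerability ease is a 1-4 tier"
    raw = (cyber_ease + model_ease
           + (6 if cyber_responded else 0)
           + (6 if model_responded else 0))
    return raw - 1  # shift the raw 2-21 range down to 1-20

# A perfect result: valid security.txt, clear model vulnerability
# contact point, and responses to both test reports.
assert overall_score(5, 4, True, True) == 20
```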
Method
Reporting processes
To evaluate the quality of reporting processes, I took the following steps for each company. I chose these because I thought they'd roughly match the process security researchers are likely to take when trying to report vulnerabilities, but I appreciate there may also be other signposting methods I have missed.
- Cybersecurity only: Attempt to find a security.txt file on their website (a sketch of this check appears after this list).
- Google '<company name> [cybersecurity|AI model] vulnerability report'
- Google '<company name> [cybersecurity|AI safety] team'
- Google '<company name> abuse contact'
- Go to the company website and generally try to browse to find the most reasonable contact details. Often this meant gravitating to security, safety, privacy or general contact pages.
Where multiple contact methods were available, I prioritised email, given this was easiest to schedule for the same time.
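For the security.txt step, the check amounts to fetching a well-known URL and reading its Contact and Expires fields. Below is a minimal sketch assuming only Python's standard library; it is my own illustration (including the check_security_txt name), not the exact process used for this review.

```python
# Fetch a site's RFC 9116 security.txt and report its contact details and
# whether its Expires timestamp has passed.
from datetime import datetime, timezone
from urllib.request import urlopen

def check_security_txt(domain: str) -> None:
    url = f"https://{domain}/.well-known/security.txt"
    try:
        body = urlopen(url, timeout=10).read().decode("utf-8", "replace")
    except OSError as err:  # covers HTTP errors, DNS failures and timeouts
        print(f"{domain}: no security.txt ({err})")
        return
    fields = {}
    for line in body.splitlines():
        if ":" in line and not line.startswith("#"):
            key, _, value = line.partition(":")
            fields.setdefault(key.strip().lower(), value.strip())
    print(f"{domain}: contact = {fields.get('contact', 'MISSING')}")
    if "expires" in fields:
        # Assumes the UTC 'Z' form seen in practice, e.g. 2023-07-27T01:00:00.000Z
        when = datetime.fromisoformat(fields["expires"].rstrip("Zz"))
        expired = when.replace(tzinfo=timezone.utc) < datetime.now(timezone.utc)
        print(f"{domain}: expires {fields['expires']} "
              f"({'EXPIRED' if expired else 'valid'})")

check_security_txt("example.com")
```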
Each Google search was undertaken in a new guest window, and all searches were conducted in one afternoon to minimise the effect of changing Google search results. Only the first page of results was considered. Google was picked because it has the largest search market share; although Google is one of the AI companies being evaluated, I think it is incredibly unlikely that it does anything special with these particular results for itself. I ignored any sponsored results, unless these were from the company itself.
For model vulnerabilities, where it was ambiguous where to go next (often where there was a choice between reporting abuse vs. inherent flaws), I followed instructions as if I wanted to report an issue where an AI model had dangerous capabilities that could be used by malicious actors (e.g. an LLM that could help a criminal group to launch cyberattacks). It was still often unclear which process to follow. If I was still uncertain, and the choice was between the same route as for cybersecurity vulnerabilities and something else, I followed the other procedure, given the cybersecurity route was already being tested.
Response times
To get response times, I followed the reporting processes at each company to submit a report that just asked them to contact me when they received it. The exact text of the message was:
tldr: Please respond saying 'We got your [cybersecurity|AI model] vulnerability test report', and then you can close this report / ticket / thread.
Hey,
I'm a researcher investigating the vulnerability reporting practices of AI companies. I'm contacting you because you publicly committed to having effective third-party vulnerability reporting processes related to AI systems [1], so I wanted to check they are working correctly.
I've tried to follow your processes for reporting a vulnerability, so hopefully I've got through to the right team. Please could you simply confirm that you did get this report responding saying: 'We got your [cybersecurity|AI model] vulnerability test report'. You can then close the report: there is no actual vulnerability here.
I plan to use this information to compile a review of the public vulnerability reporting processes. If you have any other questions, please do feel free to ask!
Best,
Adam Jones
[1]: One or more of:
- https://www.whitehouse.gov/briefing-room/statements-releases/2023/07/21/fact-sheet-biden-harris-administration-secures-voluntary-commitments-from-leading-artificial-intelligence-companies-to-manage-the-risks-posed-by-ai/
- https://www.whitehouse.gov/briefing-room/statements-releases/2023/09/12/fact-sheet-biden-harris-administration-secures-voluntary-commitments-from-eight-additional-artificial-intelligence-companies-to-manage-the-risks-posed-by-ai/
- https://www.aisafetysummit.gov.uk/policy-updates/#company-policies
I sent all requests on 2024-01-08 at 12:00 UTC. To optimise for fairness, I decided to:
- Only publish the number of 24-hour periods that had passed (illustrated below), given this reduces bias between companies that are based in different timezones. I also think this is fairer on smaller companies, where around-the-clock coverage might be infeasible or place unreasonable working demands on individuals.
- Check there was not a bank holiday in any country where these labs have a significant public presence. (It was very hard to find a day with no bank holidays anywhere.)
- Send the reports on a Monday, which is a working day in almost all countries.
It should be noted that problems in the real world might not be so polite as to optimise for fairness, so this might be seen as a 'best-case' situation.
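For concreteness, publishing '24-hour periods passed' amounts to the following calculation. This is a minimal sketch: the send time is from this post, but the function itself is my own illustration, not code actually used for the review.

```python
from datetime import datetime, timezone

# All test reports were sent at the same moment: 2024-01-08 12:00 UTC.
SENT = datetime(2024, 1, 8, 12, 0, tzinfo=timezone.utc)

def periods_elapsed(responded_at: datetime) -> int:
    """Whole 24-hour periods between sending a report and the response."""
    return int((responded_at - SENT).total_seconds() // (24 * 60 * 60))

# A response the next morning counts as zero whole periods, i.e. it is
# reported here as 'within 24 hours'.
assert periods_elapsed(datetime(2024, 1, 9, 10, 0, tzinfo=timezone.utc)) == 0
```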
This is a slightly weird request. It is possible some companies may have not responded or reacted more slowly to this than a real report, given:
- they might have had to seek internal clearance to respond to this kind of request
- they might have had to think about whether this released inappropriate information about their security response procedures
- automated systems might have triaged this as low priority
I gave all companies' press teams the opportunity to comment on this blog post, and to explain any delays, before publishing it.
Revealing information about security response processes at organisations has the potential to weaken those organisations' ability to respond to future incidents.
This can be true even if the information is already public, but has been aggregated together in one place.
However, I believe publicising this information is in the public interest given:
- The likely benefit of improving AI companies' vulnerability reporting processes by providing feedback and scrutiny. This is especially true due to the relative ease of fixing these issues.
- The limited potential for harm, as the same information could be gathered by a malicious actor with similar effort. Harms are further mitigated by only reporting 24-hour periods, which reduces information about when security teams are likely to view reports. Many companies already publish statistics about time to response, e.g. Google and OpenAI.
Sending these emails does use up some resources from security teams at the AI companies. In addition, these reports might be seen as crying wolf and discourage more careful monitoring of future reports.
However, the effort required by companies is near-minimal to process these test reports. In addition, the risk of this causing companies to take reports less seriously has to be balanced against the overall motivation this article gives to encourage companies to improve reporting procedures. On balance, I think this is a net positive for security.
Lastly, some companies get a huge volume of these reports. For example, Google received over 2900 valid reports in 2022, and presumably many more invalid ones. A single extra report is unlikely to change their behaviour much. I think this is acceptable given very few others (if any) have done these kinds of tests, so I'm not otherwise contributing to a large number of tests overall.
Individual company results
Adobe
Cybersecurity report:
- Process
- Had a security.txt, which listed an email. However, the security.txt had expired (Expires: 2023-07-27T01:00:00.000Z). I also found a page Notifying Adobe of Security Issues on their website which gave the same email, so this was enough for me to decide it was still appropriate.
- Response timeline
- No response.
Model vulnerability report:
- Process
- Google search returned the Adobe Product Security Incident Response Team as the first result. It had a clear link to Notifying Adobe of Security Issues, which gave me an abuse email.
- Response timeline
- No response.
Amazon
For testing purposes I considered this as Amazon Web Services, given this seems to be where their responsible AI policies are listed and press releases link to.
Cybersecurity report:
- Process
- Had a security.txt, which listed an email. However, the security.txt had expired (Expires: 2023-12-31T18:37:07z). It also linked to a page Vulnerability Reporting on their website which gave the same email, so this was enough for me to decide it was still appropriate.
- Response timeline
- No response.
Model vulnerability report:
- Process
- Google search returned the Vulnerability Reporting page, which listed an abuse email. Google search also pulled up a press release on the commitments, which linked back to the same page.
- Response timeline
- No response.
Anthropic
Cybersecurity report:
- Process
- Had a security.txt, which linked to their Responsible Disclosure Policy, which gave an email. However, the security.txt had expired (Expires: 2023-12-31T23:59:00.000Z). The Responsible Disclosure Policy was still linked in their footer, and I failed to find any other information, so I concluded it was still appropriate.
- Response timeline
- Got a response after 24 hours saying they had received the ticket. This appeared to be from a person at Bugcrowd, the service they use to triage reports.
Model vulnerability report:
- Process
- The Google search found Anthropic's UK commitments. Far down (the 23rd result!) it picked up the Contact Us page, which listed a contact email and also linked back to the Responsible Disclosure Policy. Google search did not seem to pick up the responsible disclosure policy (the page is indexed in Google Search, so I'm not sure why it ranks so low down).
- However, the footer of the Anthropic website links to the Responsible Disclosure Policy so it is relatively easy to find that way. The policy was also incredibly clear that I had found the right place to send the report to (e.g. compared to others which had much more general and weaker language about abuse reports):
We welcome reports concerning safety issues, "jailbreaks," and similar concerns so that we can enhance the safety and harmlessness of our models. Please report such issues to [email address] with enough detail for us to replicate the issue.
- Response timeline
- No response.
Cohere
Cybersecurity report:
- Process
- No security.txt.
- Google search came up with the Security page as the third result. This page suggested two ways to contact the security team: a contact email which seemed to be linked incorrectly (as the mailto included an extra space), and a link that appeared to be for contacting their sales team ( https://cohere.com/contact-sales ).
- Just in case this was something clever, I contacted them both at the address listed, and at the address I thought they meant.
- Response timeline
- Got a response within 24 hours saying they had received the report.
Model vulnerability report:
- Process
- Google search came up with a blog post and the Responsibility page. The Responsibility page makes it clear such reports are accepted, and directs users to https://cohere.com/contact-sales.
- However, it was not possible to submit this form with a @gmail.com address.
- Response timeline
- I received a response from their sales team that seemed to be a fairly generic sales message. After explaining what I was looking for and asking for it to be routed to the appropriate person, I received no response.
Press response:
Cohere thanked me for reaching out, and stated that they'd get the website team to fix the security mailto link as soon as possible. Within 24 hours this had been fixed.
Google
Cybersecurity report:
- Process
- Had a security.txt, which listed an email.
- Response timeline
- Received an auto response directing me to the Bug Hunters form, saying that this email was not monitored.
- See the model vulnerability report process from here, as this is the same route.
Model vulnerability report:
- Process
- Google search returned Acting on our commitment to safe and secure AI, Google's reward criteria for reporting bugs in AI products, Google and Alphabet Vulnerability Reward Program (VRP) Rules. The third seemed most relevant, which linked to the Bug Hunters reporting form. (Additionally, the first links to the second, which links to the third: so reporters are likely to find this).
- Response timeline
- Got a response within 24 hours saying they had received the report.
Google DeepMind
Cybersecurity report:
- Process
- No security.txt
- Google search returned the UK commitment blog, which linked to the Bug Hunters site. Google search also returned the Google and Alphabet Vulnerability Reward Program (VRP) Rules page.
- Response timeline
- See Google model vulnerability process, as this is the same route.
Model vulnerability report:
- Process
- Google search again returned the UK commitment blog, and the Google and Alphabet Vulnerability Reward Program (VRP) Rules page.
- Response timeline
- See Google model vulnerability process, as this is the same route.
IBM
Cybersecurity report:
- Process
- Had a security.txt, which listed an email.
- Response timeline
- No response.
Model vulnerability report:
- Process
- First Google search returned many IBM pages (including an ad from IBM), none of which seemed directly relevant.
- Second Google search returned Trustworthy AI, but this just seemed like a blog with no obvious way to contact this team to report a problem.
- On the IBM website, the most obvious links looked to be:
- Homepage main cover
- The homepage was coincidentally a full page cover on AI: 'Now available, watsonx.governance accelerates responsible, transparent and explainable AI workflows'. Clicking 'more details' took me to a whole page that talked about AI governance at a very high level, but was not useful for finding contact details.
- Header: Support > Generative AI
- This took me to a page that listed various open source AI projects, but looked not to be the right place to raise a concern.
- Header: Support > Open a case
- This required an IBMid account, so I skipped this.
- Header: More > Security, privacy & trust
- This had a long list of different things. Many documents (IBM Business Conduct guidelines, IBM Principles for Trust and Transparency) seemed just to be generic policies.
- The one that seemed maybe relevant was the 'IBM Security Vulnerability Management (PSIRT)' page. However, this seemed to be much more focused on cybersecurity.
- Footer: Contact IBM
- This had a list of several different contacts, but did not have anything on AI safety (or even cybersecurity). It did have a general contact form which is what I used.
- Response timeline
- No response.
Inflection
Cybersecurity report:
- Process
- No security.txt.
- First Google search returned no relevant results.
- Second Google search returned only Our policy on frontier safety. This describes a 'closed pilot', with no signposting for security researchers to report anything.
- On the website, the safety page has a collapsed-by-default section on 'Review and Improvement'. This lists an email contact.
- Response timeline
- Initially got a response from a customer service desk that seemed a little confused. However, after further clarification they seemed to understand.
- Got a response within 24 hours saying they had received the report.
Model vulnerability report:
- Process
- Google search returned Our policy on frontier safety and We welcome the G7 Hiroshima Code of Conduct for developing advanced AI systems. Neither of these had clear contact details.
- The email on the safety page seems to be the most appropriate place to report vulnerabilities.
- Response timeline
- See Inflection cybersecurity process, as this is the same route.
Meta
Cybersecurity report:
- Process
- Had a security.txt, which directed people to the Facebook reporting page. This required creating or logging into a Facebook account to access. Once I logged into Facebook, this resulted in an error screen stating 'This content isn't available at the moment'. I later realised this was because my active Facebook profile had defaulted to a page I manage, rather than my personal account. On switching to my personal account, I was able to submit the form.
- Response timeline
- Got a response within 24 hours saying they had received the report.
Model vulnerability report:
- Process
- Google search returned the UK commitment (which required accepting personalisation cookies to read), Responsible Use Guide - Meta AI, and Meta and Microsoft Introduce the Next Generation of Llama. These all provided details regarding Llama 2 reporting mechanisms, including an email contact to report violations of their acceptable use policy.
- Response timeline
- No response.
Microsoft
Cybersecurity report:
- Process
- No security.txt.
- Google search returned FAQs - Report an issue and submission guidelines as the first result. This directed me to the Microsoft Security Response Center (MSRC) Researcher Portal, where I could sign up for an account (it also supported login with Microsoft or Google accounts). I signed in with my Microsoft account, created a researcher profile and verified my email. I then created a report via the report form in the portal.
- Response timeline
- Got a response within 24 hours saying they had received the report.
Model vulnerability report:
- Process
- Google search returned Microsoft Vulnerability Severity Classification for Artificial Intelligence and Machine Learning Systems, Microsoft's AI Safety Policies, Updating our Vulnerability Severity Classification for AI Systems, Microsoft AI Bounty.
- The current severity classification suggests a dangerous capability being available for misuse would be out of scope for reports to Microsoft.
- The policy ultimately indicated reports should be submitted through the MSRC Researcher Portal.
- Response timeline
- See Microsoft cybersecurity process, as this is the same route.
NVIDIA
Cybersecurity report:
- Process
- No security.txt.
- Google search returned Report Security Vulnerability, which listed a PSIRT contact email.
- Response timeline
- Got a response within 24 hours saying they had received the report.
Model vulnerability report:
- Process
- Google search returned Report Security Vulnerability, which listed the same PSIRT contact email.
- The wording made it clear this was appropriate for AI concerns:
How do I report a security or trust concern with an NVIDIA AI model or dataset (such as an NVIDIA AI vulnerability or hack, AI trust or ethics concern, malicious prompt engineering, data poisoning possibility, etc)?
- Response timeline
- See NVIDIA cybersecurity process, as this is the same route.
OpenAI
Cybersecurity report:
- Process
- No security.txt at .well-known/security.txt. Surprisingly, trying to access it resulted in a 500 server error rather than a 404.
- Google search returned 1: Security and 2: How to Report Security Vulnerabilities to OpenAI. 1 links to OpenAI Security Portal and OpenAI - Bugcrowd. 2 links to Coordinated vulnerability disclosure policy, which then links to OpenAI - Bugcrowd and https://openai.com/security.txt.
- The /security.txt file is at the wrong location for the security.txt standard, but does seem to be a validly formatted file. This lists an email, which I used.
- The Bugcrowd page also looks like a legitimate alternative way to submit reports.
- Response timeline
- Received an auto response directing me to their Bugcrowd programme, saying that the disclosure email was no longer monitored. I submitted a Bugcrowd report shortly after.
- OpenAI responded on Bugcrowd quickly, within a few minutes. I'm not certain whether this was automated or a human, but I think a human.
- I then also got a separate confirmation from a human that they had received the report.
Model vulnerability report:
- Process
- Google search returned How to Report Security Vulnerabilities to OpenAI. This again points either to the email or to Bugcrowd. On Bugcrowd, it was suggested that model vulnerabilities be reported via the model behaviour feedback form, which is what I used.
- Response timeline
- No response.
Palantir
Cybersecurity report:
- Process
- Had a security.txt, which listed an email.
- Response timeline
- No response.
Model vulnerability report:
- Process
- First Google search returned Palantir AI Ethics, Reporting security concerns, Response to the Office of Science and Technology Policy "Request for Information: National Priorities for Artificial Intelligence".
- Second Google search returned Palantir's Response to OSTP's National Priorities for AI, Palantir Foundry for AI Governance, Enabling Responsible AI in Palantir Foundry, Palantir AI Policy Contributions, AI on RAILs, Palantir Information Security.
- Third Google search returned Contact Us, National Center for Missing & Exploited Children, Protecting identity.
- None of these had an obvious contact point for raising model vulnerabilities, besides maybe the cybersecurity process.
- The general contact us page had no obvious option for submitting AI safety reports; the closest was 'Product Training & Support', which is what I used.
- Response timeline
- No response.
Salesforce
Cybersecurity report:
- Process
- No security.txt.
- Google search returned Vulnerability/Penetration Report Summary, Responsible Disclosure Policy, Vulnerability Reporting Policy.
- The second and third links listed the same email contact, which is what I used. In addition, the responsible disclosure policy is linked in the main website footer.
- Response timeline
- Got a response within 24 hours saying they had received the report.
Model vulnerability report:
- Process
- First Google search returned Vulnerability Reporting Policy, Vulnerability/Penetration Report Summary, As AI Advances, Trusted Data and Security Concerns Grow, Conversational AI Programming with CodeGen: Let AI Write Code For You.
- Second Google search returned Salesforce Ethical AI Architect on Safety Culture in the Era of Artificial Intelligence, The AI Opportunity: Collaboration Is Key, How Salesforce Develops Ethical Generative AI from the Start, Meet Salesforce's Trusted AI Principles, Salesforce AI Research, Salesforce Outlines 7 Opportunities to Deepen Trust in AI in Response to White House Executive Order, How Salesforce Infuses Ethics into its AI.
- Third Google search returned Salesforce Email Abuse Policy, Contact Salesforce Security, Abuse Reports, Anti-Spam Policy, Phishing and Malware.
- None of the above seemed relevant to submitting model vulnerabilities.
- Website also has a legal section which lists various policies, including a compliance page, although this didn't seem to go anywhere obviously relevant.
- In the end, I used the general contact form listed in their website footer.
- Response timeline
- No response.
Scale AI
Cybersecurity report:
- Process
- No security.txt.
- Google search returned Security at Scale and Setting test and evaluation standards.
- The former listed an email contact.
- Response timeline
- No response.
Model vulnerability report:
- Process
- First Google search returned Test & Evaluation, Setting test and evaluation standards | Scale AI, Test & Evaluation: The Right Approach for Safe, Secure, and Trustworthy AI, AI Readiness Report 2023 (I did not download the report given it required completing a form and subscribing to marketing emails, and did not seem likely to be relevant), and Scale AI Policy Framework: Key Elements to Ensure American Leadership in AI.
- Second Google search returned SEAL: Scale's Safety, Evaluations and Analysis Lab, Our Commitment to Ensuring Safe, Secure, and Trustworthy AI, Security at Scale, Donovan: AI Digital Staff Officer for national security. | Scale AI, Scale AI Policy Framework: Key Elements to Ensure American Leadership in AI.
- Third Google search returned no results.
- None of the above seemed relevant to submitting model vulnerabilities.
- In the end, I used the general contact email listed in their website footer.
- Response timeline
- No response.
Stability AI
Cybersecurity report:
- Process
- No security.txt.
- First Google search returned no results.
- Second Google search returned Expanding Our Leadership Team: Meet Some Of Our New Team Members, About.
- Third Google search returned Privacy Policy, Discord Terms, Contact Us, Acceptable Use Policy, Stable Chat Terms of Service.
- In the end, I used the general contact form to submit the query. This required me to sign up for marketing from them and to provide company details, so it seems really to be a sales channel.
- Response timeline
- No response.
Model vulnerability report:
- Process
- First Google search returned Stable Chat, research preview and participation in DEF CON AI Village, StableLM: Stability AI Language Models, Discord Terms.
- Second Google search returned Expanding Our Leadership Team: Meet Some Of Our New Team Members, Careers: Join Our Team of Innovators, Company, Stable Diffusion Public Release.
- Third Google search returned Privacy Policy, Discord Terms, Contact Us, Acceptable Use Policy, Stable Chat Terms of Service.
- In the end, I used the general contact form to submit the query.
- Response timeline
- See Stability AI cybersecurity process, as this is the same route.