Adam Jones

How are AI companies doing with their voluntary commitments on vulnerability reporting?

The UK and US governments have both secured voluntary commitments from many major AI companies on AI safety.

These include having appropriate reporting mechanisms for both cybersecurity vulnerabilities and model vulnerabilities.

I took a look at how well organisations are living up to these commitments as of February 2024. This included reviewing what the processes actually are, and submitting test reports to see if they work.

Summary table

Some high-level takeaways

Performance was quite low across the board. Simply listing a contact email and responding to queries would score 17 points, which would place a company in the top five.

However, a couple of companies have great processes that can act as best-practice examples. Both Google and NVIDIA got perfect scores. In addition, Google offers bug bounty incentives for model vulnerabilities, and NVIDIA had an exceptionally clear and easy-to-use model vulnerability contact point.

Companies did much better on cybersecurity than on model vulnerabilities. Additionally, companies that combined their cybersecurity and model vulnerability procedures scored better. This might be because existing cybersecurity processes are more battle-tested, or are taken more seriously than model vulnerabilities.

Companies do know how to have transparent contact processes. Every single company's press contact could be found within minutes, and was a simple email address. This suggests companies are able to sort this out when there are greater commercial incentives to do so.

Rubrics

Cybersecurity ease of reporting score:

  1. No easy way to contact the company about a vulnerability (e.g. have to use a sales contact channel).
  2. Has a general contact email.
  3. Has some contact point for reporting security vulnerabilities.
  4. An expired or otherwise invalid security.txt file.
  5. A non-expired security.txt file.
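To illustrate the difference between levels 4 and 5, here is a minimal sketch of checking the Expires field of a security.txt file. It is not the tooling used for this review: the function names are hypothetical, and it assumes the field holds a timestamp like the ones quoted later in this post (e.g. 2023-07-27T01:00:00.000Z). A file with a missing or unparseable Expires field is treated as not valid.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_expires(security_txt: str) -> Optional[datetime]:
    """Return the first parseable Expires timestamp in the file, or None."""
    for line in security_txt.splitlines():
        name, sep, value = line.partition(":")
        if not sep or name.strip().lower() != "expires":
            continue
        text = value.strip()
        # Normalise a trailing 'Z' or 'z' (UTC) so older Python versions
        # can parse it with datetime.fromisoformat.
        if text[-1:] in ("Z", "z"):
            text = text[:-1] + "+00:00"
        try:
            expires = datetime.fromisoformat(text)
        except ValueError:
            return None
        # Assumption: a timestamp without an offset is treated as UTC.
        if expires.tzinfo is None:
            expires = expires.replace(tzinfo=timezone.utc)
        return expires
    return None


def has_valid_expiry(security_txt: str, now: datetime) -> bool:
    """True only if an Expires field is present and still in the future."""
    expires = parse_expires(security_txt)
    return expires is not None and expires > now


# Example with a made-up contact address and an expiry date in the past.
sample = "Contact: mailto:security@example.com\nExpires: 2023-07-27T01:00:00.000Z\n"
checked_at = datetime(2024, 1, 8, 12, 0, tzinfo=timezone.utc)
print(has_valid_expiry(sample, checked_at))
```

Passing the check time in explicitly, rather than reading the clock inside the function, keeps the result reproducible for a given review date.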

Model vulnerability ease of reporting score:

  1. No easy way to contact the company about a vulnerability (e.g. have to use a sales contact channel).
  2. Lists a general contact email.
  3. Some contact point for reporting model vulnerabilities (e.g. a general abuse email).
  4. A clear contact point for reporting model vulnerabilities.

Overall score is the sum of:

  • cybersecurity ease of reporting
  • model vulnerability ease of reporting
  • 6 for a cybersecurity response (0 for no response)
  • 6 for a model vulnerability response (0 for no response)
  • -1 to have scores range between 1-20 (rather than 2-21)
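As a rough illustration, the sum above could be written as a small function. This is a sketch rather than the code used to produce the summary table, and the function and parameter names are hypothetical.

```python
def overall_score(
    cyber_ease: int,
    model_ease: int,
    cyber_response: bool,
    model_response: bool,
) -> int:
    """Combine the two ease-of-reporting scores and the two response results."""
    if not 1 <= cyber_ease <= 5:
        raise ValueError("cybersecurity ease of reporting must be between 1 and 5")
    if not 1 <= model_ease <= 4:
        raise ValueError("model vulnerability ease of reporting must be between 1 and 4")
    return (
        cyber_ease
        + model_ease
        + (6 if cyber_response else 0)  # 6 points for a cybersecurity response
        + (6 if model_response else 0)  # 6 points for a model vulnerability response
        - 1  # shift the range from 2-21 down to 1-20
    )


# Best and worst possible results under the rubric.
print(overall_score(5, 4, True, True))    # 20
print(overall_score(1, 1, False, False))  # 1
```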

Method

Reporting processes

To evaluate the quality of reporting processes, I took the following steps for each company. I chose these because I thought they'd roughly be the process security researchers are likely to take when trying to report vulnerabilities, but I appreciate there may also be other signposting methods I have missed.

  • Cybersecurity only: Attempt to find a security.txt file on their website.

  • Where multiple methods were available, I prioritised email given this was easiest to schedule for the same time.

  • Google '<company name> [cybersecurity|AI model] vulnerability report'

  • Google '<company name> [cybersecurity|AI safety] team'

  • Google '<company name> abuse contact'

  • Go to the company website and generally try to browse to find the most reasonable contact details. Often this meant gravitating to security, safety, privacy or general contact pages.

Each Google search was undertaken in a new guest window, and all searches were conducted in one afternoon to minimise the effect of changing Google search results. Only the first page of results was considered. I picked Google because it has the largest search market share. Although Google is one of the AI labs being evaluated, I think it is incredibly unlikely that it does anything special with its own results here. I ignored any sponsored results, unless these were from the company itself.

For model vulnerabilities, where it was ambiguous where to go next (often where there was a choice between reporting abuse vs. inherent flaws) I followed instructions as if I wanted to report an issue where an AI model had dangerous capabilities that could be used by malicious actors (e.g. an LLM that could help a criminal group to launch cyberattacks). It was still often unclear which process to follow. If I was still uncertain, and the choice was between the route I had already used for cybersecurity vulnerabilities and some other route, I followed the other route, given that the cybersecurity route was already being tested.

Response times

To get response times, I followed the reporting processes at each company to submit a report that just asked them to contact me when they received it. The exact text of the message was:

tldr: Please respond saying 'We got your [cybersecurity|AI model] vulnerability test report', and then you can close this report / ticket / thread.

Hey,

I'm a researcher investigating the vulnerability reporting practices of AI companies. I'm contacting you because you publicly committed to having effective third-party vulnerability reporting processes related to AI systems [1], so I wanted to check they are working correctly.

I've tried to follow your processes for reporting a vulnerability, so hopefully I've got through to the right team. Please could you simply confirm that you did get this report responding saying: 'We got your [cybersecurity|AI model] vulnerability test report'. You can then close the report: there is no actual vulnerability here.

I plan to use this information to compile a review of the public vulnerability reporting processes. If you have any other questions, please do feel free to ask!

Best,

Adam Jones

[1]: One or more of:

I sent all requests on 2024-01-08 at 12:00 UTC. To optimise for fairness, I decided to:

  • Only publish the number of 24-hour periods that had passed, given this reduces bias between companies that are based in different timezones. I also think this is fairer on smaller companies, where around-the-clock coverage might be infeasible or place unreasonable working demands on individuals.
  • Check there was not a bank holiday in any country where these labs have a significant public presence. (It was very hard to find a day with no bank holidays anywhere.)
  • Send it on a Monday, which is a working day in almost all countries.
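The 24-hour bucketing described above can be sketched as follows. This is an illustration under assumptions, not the script used for this review: the function name is hypothetical, and the response times in the example are made up.

```python
from datetime import datetime, timedelta, timezone

# Send time stated in this post: 2024-01-08 at 12:00 UTC (a Monday).
SENT_AT = datetime(2024, 1, 8, 12, 0, tzinfo=timezone.utc)


def full_periods_elapsed(responded_at: datetime, sent_at: datetime = SENT_AT) -> int:
    """Number of complete 24-hour periods between sending and the response."""
    if responded_at < sent_at:
        raise ValueError("response cannot come before the report was sent")
    # Dividing one timedelta by another with // gives a whole number of periods.
    return (responded_at - sent_at) // timedelta(hours=24)


# Hypothetical response times, for illustration only.
same_day = datetime(2024, 1, 8, 20, 30, tzinfo=timezone.utc)
next_day = datetime(2024, 1, 9, 15, 0, tzinfo=timezone.utc)
print(full_periods_elapsed(same_day))  # 0, i.e. within 24 hours
print(full_periods_elapsed(next_day))  # 1, i.e. after 24 hours
```

Using timezone-aware timestamps throughout means a response logged in any local timezone is compared correctly against the UTC send time.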

It should be noted that problems in the real world might not be so polite as to optimise for fairness, so this might be seen as a 'best-case' situation.

Individual company results

Adobe

Cybersecurity report:

  • Process
    • Had a security.txt, which listed an email. The security.txt said it had expired though (Expires: 2023-07-27T01:00:00.000Z). I also found a page Notifying Adobe of Security Issues on their website which gave the same email, so this was enough for me to decide it was still appropriate.
  • Response timeline
    • No response.

Model vulnerability report:

Amazon

For testing purposes I considered this as Amazon Web Services, given this seems to be where their responsible AI policies are listed and press releases link to.

Cybersecurity report:

  • Process
    • Had a security.txt, which listed an email. The security.txt said it had expired though (Expires: 2023-12-31T18:37:07z). It also linked to a page Vulnerability Reporting on their website which gave the same email, so this was enough for me to decide it was still appropriate.
  • Response timeline
    • No response.

Model vulnerability report:

Anthropic

Cybersecurity report:

  • Process
    • Had a security.txt, which linked to their Responsible Disclosure Policy, which gave an email. The security.txt said it had expired though (Expires: 2023-12-31T23:59:00.000Z). However, the responsible disclosure policy was still linked in their footer and I failed to find any other information so concluded it was still appropriate.
  • Response timeline
    • Got a response after 24 hours saying they had received the ticket. This appeared to be from a person at Bugcrowd, the service they use to triage reports.

Model vulnerability report:

  • Process
    • The Google search found Anthropic's UK commitments. Far down (the 23rd result!) it picked up the Contact Us page, which listed a contact email and also linked back to the Responsible Disclosure Policy. Google search did not seem to pick up the responsible disclosure policy (the page is indexed in Google Search, so I'm not sure why it ranks so low down).
    • However, the footer of the Anthropic website links to the Responsible Disclosure Policy so it is relatively easy to find that way. The policy was also incredibly clear that I had found the right place to send the report to (e.g. compared to others which had much more general and weaker language about abuse reports):

      We welcome reports concerning safety issues, “jailbreaks,” and similar concerns so that we can enhance the safety and harmlessness of our models. Please report such issues to [email address] with enough detail for us to replicate the issue.

  • Response timeline
    • No response.

Cohere

Cybersecurity report:

  • Process
    • No security.txt.
    • Google search came up with the Security page as the third result. This page suggested two ways to contact the security team: a contact email which seemed to be linked incorrectly (as the mailto included an extra space), and a link that appeared to be for contacting their sales team ( https://cohere.com/contact-sales ).
    • Just in case this was something clever, I contacted them both at the address listed, and at the address I thought they meant.
  • Response timeline
    • Got a response within 24 hours saying they had received the report.
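A broken mailto link like the one described in the process above is easy to catch automatically. The following is a minimal sketch with a hypothetical function name and made-up addresses. It does not reproduce Cohere's actual link, and where exactly the extra space sat in that link is not stated here, so the check simply flags whitespace anywhere in the address part.

```python
from urllib.parse import unquote


def mailto_has_stray_whitespace(href: str) -> bool:
    """Return True if the address part of a mailto link contains whitespace."""
    scheme, sep, rest = href.partition(":")
    if not sep or scheme.strip().lower() != "mailto":
        raise ValueError("not a mailto link")
    # Drop any ?subject=... style query, then decode percent-escapes such as %20.
    address = unquote(rest.split("?", 1)[0])
    return any(ch.isspace() for ch in address)


print(mailto_has_stray_whitespace("mailto:security@example.com"))   # False
print(mailto_has_stray_whitespace("mailto: security@example.com"))  # True
```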

Model vulnerability report:

  • Process
    • Google search came up with a blog and the Responsibility page. The Responsibility page makes it clear such reports are accepted, and directs users to https://cohere.com/contact-sales.
    • However, it was not possible to submit this form with a @gmail.com address.
  • Response timeline
    • I received a response from their sales team that seemed to be a fairly generic sales message. After explaining what I was looking for and asking for it to be routed to the appropriate person, I received no response.

Press response:

Cohere thanked me for reaching out, and stated that they'd get the website team to fix the security mailto link as soon as possible. Within 24 hours this had been fixed.

Google

Cybersecurity report:

  • Process
  • Response timeline
    • Received an auto-response saying that this email was not monitored and directing me to the bughunters form.
    • See model vulnerability report process from here, as this is the same route.

Model vulnerability report:

Google DeepMind

Cybersecurity report:

Model vulnerability report:

IBM

Cybersecurity report:

  • Process
  • Response timeline
    • No response.

Model vulnerability report:

  • Process
    • First Google search returned many IBM pages (including an ad from IBM), none of which seemed directly relevant.
    • Second Google search returned Trustworthy AI, but this just seemed like a blog with no obvious way to contact this team to report a problem.
    • On the IBM website, the most obvious links looked to be:
      • Homepage main cover
        • The homepage was coincidentally a full page cover on AI: 'Now available, watsonx.governance accelerates responsible, transparent and explainable AI workflows'. Clicking 'more details' took me to a whole page that talked about AI governance at a very high level, but was not useful for finding contact details.
      • Header: Support > Generative AI
        • This took me to a page that listed various open source AI projects, but it did not look like the right place to raise a concern.
      • Header: Support > Open a case
        • This required an IBMid account, so I skipped this.
      • Header: More > Security, privacy & trust
        • This had a long list of different things. Many documents (IBM Business Conduct guidelines, IBM Principles for Trust and Transparency) seemed just to be generic policies.
        • The one that seemed maybe relevant was the 'IBM Security Vulnerability Management (PSIRT)' page. However, this seemed to be much more focused on cybersecurity.
      • Footer: Contact IBM
        • This had a list of several different contacts, but did not have anything on AI safety (or even cybersecurity). It did have a general contact form which is what I used.
  • Response timeline
    • No response.

Inflection

Cybersecurity report:

  • Process
    • No security.txt.
    • First Google search returned no relevant results.
    • Second Google search returned only Our policy on frontier safety. This describes a 'closed pilot', with no signposting for security researchers to report something.
    • On the website, the safety page has a collapsed-by-default section on 'Review and Improvement'. This lists an email contact.
  • Response timeline
    • Initially got a response from a customer service desk that seemed a little confused. However, after further clarification they seemed to understand.
    • Got a response within 24 hours saying they had received the report.

Model vulnerability report:

Meta

Cybersecurity report:

  • Process
    • Had a security.txt, which directed people to the Facebook reporting page. This required creating or logging into a Facebook account to access. Once I logged into Facebook, this resulted in an error screen stating 'This content isn't available at the moment'. I later realised this was because the profile enabled in Facebook had defaulted to a Facebook page I manage, rather than my personal account. On switching to my personal account, I was then able to submit the form.
  • Response timeline
    • Got a response within 24 hours saying they had received the report.

Model vulnerability report:

Microsoft

Cybersecurity report:

  • Process
  • Response timeline
    • Got a response within 24 hours saying they had received the report.

Model vulnerability report:

NVIDIA

Cybersecurity report:

  • Process
  • Response timeline
    • Got a response within 24 hours saying they had received the report.

Model vulnerability report:

  • Process
    • Google search returned Report Security Vulnerability, which listed the same PSIRT contact email.
    • The wording made it clear this was appropriate for AI concerns:

      How do I report a security or trust concern with an NVIDIA AI model or dataset (such as an NVIDIA AI vulnerability or hack, AI trust or ethics concern, malicious prompt engineering, data poisoning possibility, etc)

  • Response timeline
    • See NVIDIA cybersecurity process, as this is the same route.

OpenAI

Cybersecurity report:

  • Process
  • Response timeline
    • Received an auto-response saying that the disclosure email was no longer monitored and directing me to their Bugcrowd page. Submitted a Bugcrowd report shortly after.
    • OpenAI responded on Bugcrowd quickly, within a few minutes. I'm not certain whether this was automated or a human, but I think it was a human.
    • However, I then also got a separate confirmation from a human that they had received the report.

Model vulnerability report:

Palantir

Cybersecurity report:

  • Process
  • Response timeline
    • No response.

Model vulnerability report:

Salesforce

Cybersecurity report:

Model vulnerability report:

Scale AI

Cybersecurity report:

Model vulnerability report:

Stability AI

Cybersecurity report:

Model vulnerability report: