OHGOOD: A coordination body for compute governance
Core to many compute governance proposals is having some kind of register that records who owns AI chips.
This article explores how this register could be implemented in practice, outlining an organisation that maintains such a register and its necessary processes. It's named OHGOOD, the Organisation Housing the GPU & Others' Owners Database.
Motivation
Training highly-capable and broad AI systems requires lots of compute and data, along with efficient algorithms.
Compute is the easiest to track of these three since it currently relies on specialised expensive AI chips that can only be produced by a few actors in the world. Both data and algorithms are comparatively much harder to track: public datasets such as Common Crawl (a set of 250 billion web pages) can be downloaded by anyone, and key algorithmic breakthroughs that have enabled recent AI advances are published in scientific journals.
Training an AI model requires a computer to do lots of mathematical operations, like adding, subtracting, multiplying, or dividing numbers. When we say ‘compute’, we mean doing all these calculations.
The compute involved in training a model is typically measured in the number of FLOP, or floating-point operations. A floating point operation is a single calculation involving numbers with decimals, like 3.14 or 6.023 (as opposed to integers like 42 or 365).
Epoch, an AI forecasting organisation, estimated that training Google DeepMind’s Gemini Ultra took about 10²⁶ FLOP. That’s 100,000,000,000,000,000,000,000,000 operations.
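As a back-of-the-envelope illustration of this scale, the sketch below estimates how long such a run would take on a large cluster. The per-chip throughput and cluster size are assumed figures for illustration, not numbers from this article.

```python
# Rough, illustrative arithmetic: how long 10^26 FLOP takes on a cluster.
# Per-chip throughput and cluster size below are assumptions, not sourced figures.
TOTAL_FLOP = 10**26            # Epoch's estimate for Gemini Ultra
CHIP_FLOP_PER_SEC = 1e15       # assumed ~1 PFLOP/s effective throughput per high-end chip
NUM_CHIPS = 10_000             # assumed cluster size

seconds = TOTAL_FLOP / (CHIP_FLOP_PER_SEC * NUM_CHIPS)
days = seconds / 86_400
print(f"{days:.0f} days")      # roughly 116 days under these assumptions
```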
An AI chip is any kind of computing chip that could feasibly be used as a significant part of training a high-risk AI model.
This includes AI-specific chips, such as tensor processing units (TPUs) and other AI-oriented application-specific integrated circuits (ASICs).
In addition, it covers other chips useful for AI training, such as graphics processing units (GPUs). In the future other types of chips might also become more useful for AI training, such as field-programmable gate arrays (FPGAs) or particularly optimised central processing units (CPUs).
As of early 2024, NVIDIA seems to be a leader in designing high-end AI chips. There are also a few other large competitors at the high end, such as AMD, Intel and Qualcomm. Additionally, some large cloud providers such as Google and Amazon make their own custom chips (TechTarget, Top 8 AI hardware companies, 2023).
Over the last couple of years, the US has placed export controls on high-end AI chips, restricting their export to China. This hasn’t been perfect, with companies such as NVIDIA developing chips to work around the ban. The ban has spurred Chinese tech companies such as Huawei, Hygon Information Technology and Moore Threads Technology to develop competing AI chips - although manufacturing difficulties have slowed the production of these chips (Nikkei Asia, China rushes to homegrown AI chips as Nvidia cutoff expands, 2024).
In practice we probably care most about newer high-end chips, as these are much more powerful than older ones. At the time of writing, the NVIDIA H100 is considered one of the best and retails at $40,000 apiece. Training state-of-the-art models uses thousands of these chips: for example, Meta and Microsoft are each believed to have bought 150,000. NVIDIA is expected to make 1.5 to 2 million H100 chips in 2024.
By tracking AI chips, the hope is that we are able to identify people with the capability to train highly-capable and broad AI models, and thus many of the most risky models. We could then ideally verify that these actors are using their AI chips in a safe way.
Previous work in compute governance has briefly touched on the need for this tracking body:
- Shavit (What does it take to catch a Chinchilla? Verifying Rules on Large-Scale Neural Network Training via Compute Monitoring, 2023) put forward a framework for enforcing rules about large-scale ML training, by recording snapshots of work done by AI chips (via on-chip firmware) and then requiring developers to show each snapshot is part of a compliant training run. Section 6.1 explains that a ‘chip-owner’ directory (corresponding to this proposal) is needed to be confident a developer is reporting all their training activity.
- Baker (Nuclear Arms Control Verification and Lessons for AI Treaties, 2023) analysed the use of verification in nuclear arms control with a view to how it could be applied to a future AI safety treaty. Annex G describes how AI chip accounts might be verified, using methods analogous to nuclear arms control verification. The bodies responsible for these accounts correspond to this proposal.
This article explores the various functions such a body would need to carry out in more detail, as well as some potential incentive schemes.
Compute governance methods, including the ones detailed in this paper, often rely on several assumptions. While there’s a good chance that these assumptions do hold, given that AI is a rapidly evolving field it is fairly uncertain whether compute governance will be robust to future changes.
Assumption: Large amounts of compute are needed to train the most risky AI base models. However:
- New algorithms may significantly reduce the amount of compute required. If they make training much more efficient, so much hardware might then need to be tracked that doing so becomes infeasible. Additionally, even if this additional hardware could be tracked, verification of its use might become infeasible.
- Narrow models, trained on far less compute, can still pose significant risks. In 2022, a narrow AI system suggested 40,000 new possible chemical weapons in just six hours, after a safe drug development model was optimised in the opposite direction (Urbina, Lentzos, Invernizzi & Ekins, Dual use of artificial-intelligence-powered drug discovery, 2022).
Assumption: This compute comes from high-end specialised chips. However:
- Future algorithmic advances might allow for training AI models on commodity hardware. Some methods that exploit the sparse nature of neural networks have demonstrated impressive training performance on commodity CPUs. Despite this, it still seems unlikely that this will unseat specialised hardware for training (Kirchner JH, Compute Governance: The Role of Commodity Hardware, 2022).
Assumption: AI risk can be significantly reduced by controlling those training the base models. However:
- Model risks come from where they are deployed. The same base model might be fine to suggest birthday party ideas, but unsafe to give advice in a medical or military context.
- Model risks come from the affordances available to models. Scaffolding or tooling might make a model significantly more powerful: an expert cybersecurity model might be quite useful to a responsible researcher, but deployed as an internet-connected subagent of a project like ChaosGPT things might not go so well.
- If base models are released open-source, fine-tuning can make them more dangerous without using much compute. In 2023, researchers were able to undo safety training from Llama 2 for under $200. This allowed it to author a plan to torture someone to death, write a mass shooting threat letter, and plan someone’s fictional suicide (Lermen & Ladish, LoRA Fine-tuning Efficiently Undoes Safety Training from Llama 2-Chat 70B, 2023).
The idea
We propose an international non-profit body that keeps a register of AI chips and their owners. This register should be:
- accurate and up-to-date
- trusted or at least mostly verifiable
- accessible, in the sense that relevant stakeholders (e.g. nation states wanting to ensure compliance with a future treaty) can query or view the register
- international, given that AI chips are likely to move between countries as part of complex supply chains and that there is interest in global compute governance
The initial mission of this non-profit could therefore be:
Ensure the responsible use of AI chips, by making accurate, up-to-date and trusted information about global AI chip control easily accessible to relevant stakeholders.
In practice this is likely to involve:
- Recording when new AI chips are created
- Handling transferring ownership of AI chips
- Handling renting AI chips
- Handling destroying AI chips
- Determining who the ‘relevant stakeholders’ are, and making the information available to them
- Encouraging compliance with all the relevant procedures
- Evolving to address advances in compute governance
We explore these processes in further detail below.
Organisations that share some parallels with this have been set up before, which suggests this structure is feasible. Additionally, studying these organisations more carefully would likely help with learning how to set up such an organisation successfully (but we don’t do that in detail here).
For example, the Internet Assigned Numbers Authority (IANA) is the organisation that oversees a register of who owns which IP address ranges used to connect devices to the wider internet. It’s part of the Internet Corporation for Assigned Names and Numbers (ICANN), an independent non-profit which has a multistakeholder governance model with many international interests represented. This is not a perfect analogy, because in this case the register is necessary for the IP addresses to function. However, it does show an example of a functioning international organisation that keeps track of who owns technology.
Another set of potentially similar organisations are those managing State Systems of Accounting for and Control of Nuclear Material (SSACs). State parties of the Treaty on the Non-Proliferation of Nuclear Weapons are obliged to have such a SSAC, which tracks where nuclear materials are within a particular state. Again this is not a perfect analogy, as while the register itself is more similar, the organisational structure is different: rather than a single independent international organisation, each state has a state-backed organisation to carry this out.
When should this be set up? Now.
Unlike more advanced compute governance solutions, it is simple, requires no hardware changes and is a relatively cheap ask of people involved. Additionally, as it’s a necessary component of stronger compute governance solutions, it’d be good to get this groundwork done early.
When it is founded, it could potentially try to trace existing owners of high-end chips, for example by reviewing sales records of key suppliers and following where those chips have ended up. This could be done using OSINT data, or by collaborating with friendly major AI chip owners. It is likely to be easier the sooner the organisation is founded.
Another alternative is that, for countries on board with the idea, legislation could require organisations that already hold chips to disclose this. Currently we have some external estimates, but it’s unclear how accurate these are - especially for organisations trying to fly under the radar with how much compute they have (rather than those shouting about it to attract VCs or customers).
Getting this information could help bootstrap the organisation’s database, and then once it is up and running, governments could mandate AI chip registration at various points in the supply chain (see below).
Who should set this up? You, maybe?
As far as I know, nobody has tried to start this body yet. If you have the skills to found an organisation and are interested in this idea, you can reach out to me to be connected with some initial advisors and possible funding sources.
Technical implementation of the OHGOOD system as proposed would be fairly feasible. Most off-the-shelf database systems could make this work.
Assuming AI chip transfers or rentals happen on average once every hour per chip (almost certainly an overestimate, given the majority of chips are either directly owned or rented out for longer periods), and that we’d want to track 20 million chips (for comparison, NVIDIA sold 0.5 million A100 and H100 AI chips in Q3 2023), this is about 5,600 register updates per second. For comparison, a laptop from over 10 years ago can handle up to 32,000 updates per second, and services like WhatsApp process 1,150,000+ messages per second.
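The arithmetic behind this estimate can be checked directly:

```python
# Reproducing the register-load estimate from the text: 20 million tracked
# chips, each transferred or re-rented on average once an hour.
num_chips = 20_000_000
seconds_per_hour = 3600

updates_per_second = num_chips / seconds_per_hour
print(f"{updates_per_second:.0f} updates per second")  # about 5,556, i.e. the ~5,600 in the text
```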
If you wanted to be fancy, this does feel like the kind of thing that could lend itself to being implemented on a blockchain with smart contracts. This could eliminate the need for the central non-profit to be involved in things like recording the transfer of AI chips, and would by default mean more people had a copy of the register, making it harder to tamper with. However, in practice a central body is likely necessary anyway, and publishing logs transparently achieves similar tamper-resistance benefits with less complexity.
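As a minimal sketch of what ‘publishing logs transparently’ could look like, here is a hash-chained append-only log: each entry commits to the hash of the previous entry, so altering any past record invalidates every later hash. The record fields are illustrative.

```python
import hashlib
import json

def append_entry(log, record):
    """Append a register update, chaining it to the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else "0" * 64
    body = json.dumps(record, sort_keys=True)
    entry_hash = hashlib.sha256((prev_hash + body).encode()).hexdigest()
    log.append({"record": record, "prev_hash": prev_hash, "hash": entry_hash})
    return log

def verify_log(log):
    """Recompute the whole chain; returns False if any entry was altered."""
    prev_hash = "0" * 64
    for entry in log:
        body = json.dumps(entry["record"], sort_keys=True)
        expected = hashlib.sha256((prev_hash + body).encode()).hexdigest()
        if entry["prev_hash"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True

log = []
append_entry(log, {"chip_id": "CHIP-001", "event": "created", "owner": "ExampleFab"})
append_entry(log, {"chip_id": "CHIP-001", "event": "transfer", "owner": "ExampleCloud"})
assert verify_log(log)

log[0]["record"]["owner"] = "SomeoneElse"  # tampering with history is detected
assert not verify_log(log)
```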
1. Registering new AI chips
When AI chips are created, they should have some kind of unique identifier. This identifier should be sent to the non-profit body with details about the chip.
Determining chips in scope
Chips in scope should be those that could feasibly be used as a significant part of training a high-risk AI model.
In most cases it’ll likely be obvious whether a chip falls under this definition, however there will be some edge cases where it is unclear. Where things are unclear, there are general arguments both for and against including them.
Including ambiguous chips maximises coverage and therefore reduces the chance important chips go untracked. It’s easy to later remove chips from the register if it becomes clear they do not meet the definition, but much harder to trace them down and add them.
Excluding these chips reduces the scope of the organisation and could make it easier to get buy-in from other actors, given less would be required of them.
This definition is still fairly broad. Further work could help develop a more precise definition. Doing so will be difficult, as it may need to be resistant to organisations working around such a definition - for example in 2023, NVIDIA developed the H800 and H20 chips to work around US export controls of AI chips to China.
AI chip identifiers
Most manufacturers of AI chips already issue serial numbers to devices, and so are used to generating unique identifiers for their chips. However, beyond plain serial numbers, there are a few properties that would make identifiers more useful for compute governance.
The identifiers should be hard to remove. Ideally, removing one would make the chip inoperable; at the least, any tampering should be obvious on inspection.
In addition, it should be hard to forge identifiers. This is to prevent bad actors pretending to be holding chips in certain locations or using them for certain purposes (and being able to pass inspections), while using the real chips somewhere else or for other purposes.
One way to achieve forge resistance could be to use cryptography. For example, a tamper-resistant secure element could be added to the chip, similar to Apple’s Secure Enclave root cryptographic keys. This could hold a key unique to the chip and use it to sign data, so each chip has a unique and mathematically difficult-to-forge signature. This would significantly increase the complexity of forging chips without significantly increasing costs: external chips implementing secure elements can be bought for under a dollar - a trivial addition to the $40,000+ retail price of AI chips. (For security purposes, though, the secure element would need to be on the same silicon die as the AI chip itself, rather than being an external chip that could more easily be swapped out.)
Lastly, it should be possible to query the identifiers via software. This is likely to complement the identifiers being difficult to remove and forge, and makes it easier to remotely gain some assurance that the chips are genuine. While remote inspections won’t give perfect proof, they could serve as a low-cost type of inspection that can be done more frequently and at larger scale than a manual inspection, and gives some additional confidence in the control of the chips (especially if cryptographic measures are implemented).
While perfect anti-forgery measures are hard to attain given people have unlimited physical access to the chips, adding these safeguards would make it much harder for even well resourced bad actors to hide chip ownership. The increased complexity would serve as a strong deterrent against forgery itself, make it more probable that the forger makes mistakes that could reveal their actions, and require additional people to be involved, therefore making it more likely that one of them exposes the scheme.
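A rough sketch of the challenge-response flow such a secure element could support follows. To stay within Python’s standard library this uses HMAC, a symmetric stand-in; a real scheme would use asymmetric signatures, so verifiers never hold the chip’s secret key.

```python
import hashlib
import hmac
import secrets

# The secret key would be burned into the chip's secure element at manufacture.
# HMAC is a simplification: real chips would use asymmetric signatures so the
# register only stores a public key, never the secret itself.
chip_key = secrets.token_bytes(32)

def chip_respond(key, challenge):
    """What the on-chip secure element computes for a verifier's challenge."""
    return hmac.new(key, challenge, hashlib.sha256).hexdigest()

def verify_chip(expected_key, challenge, response):
    """Register-side check: only the genuine chip produces a matching response."""
    expected = hmac.new(expected_key, challenge, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, response)

challenge = secrets.token_bytes(16)      # a fresh nonce prevents replaying old responses
response = chip_respond(chip_key, challenge)
assert verify_chip(chip_key, challenge, response)

forged_key = secrets.token_bytes(32)     # a forger without the key fails verification
assert not verify_chip(chip_key, challenge, chip_respond(forged_key, challenge))
```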
While the above properties would make for good chip identifiers, current chips not implementing this should not be seen as a reason to block or delay founding a chip tracking body. Starting by tracking lower-quality identifiers would still be valuable in the interim.
How well do current chips implement these properties? In short: it’s mixed. Higher-end cards seem to implement this better.
Top-end NVIDIA chips (such as the A100, H100, A800 and H800) have unique serial numbers, accessible via software. It’s unclear whether it’s difficult to remove or forge these, but being software accessible likely makes it a little harder. There doesn’t appear to be any cryptographic verification.
Other NVIDIA chips also have unique serial numbers, but these are often just recorded as a numbered sticker on the card and box packaging (based on the online forum posts I could find: a Super User post, an NVIDIA developers forum post and an NVIDIA GeForce forum post). They are not always queryable by software, and could be easily removed or forged.
Documentation seems lacking for AMD chips, especially for their accelerator series (like the MI300 and MI250). For consumer-grade graphics cards, they appear to only use stickers attached to the graphics card.
There isn't any information publicly available as to whether Google’s TPUs or Amazon's Trainium chips have identifiers, and I didn’t explore other manufacturers.
Sending information to the non-profit body
At time of creation, it should be relatively simple for the manufacturer to send information about the chip to the non-profit, e.g. over an API.
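For illustration, a registration payload might look something like the following. The endpoint and field names are invented for this sketch; nothing here is a real OHGOOD API.

```python
import json

# A hypothetical payload a manufacturer might send to the register at time of
# manufacture, e.g. via an HTTP POST. All field names are illustrative.
new_chip = {
    "chip_id": "NV-H100-0001234",    # the chip's unique identifier
    "model": "H100",
    "manufacturer": "ExampleFab",
    "manufactured_at": "2024-03-01",
    "has_secure_element": True,      # whether cryptographic identity is supported
}
request_body = json.dumps(new_chip)
# e.g. POST https://register.example.org/v1/chips (invented endpoint)
print(request_body)
```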
Incentives for manufacturers to do this are discussed in the ‘encouraging compliance’ section below.
2. Transferring ownership of AI chips
When chips are bought and sold, the register needs to be updated with the new owner of the chips.
At a minimum, the buyer must confirm the transfer: otherwise a seller could falsely claim to have passed chips on to a buyer when no transfer had actually taken place. Additionally, the buyer is the party who could prove ownership of the chips if the identifiers had the cryptographic measures detailed above (whereas it’s hard for the seller to prove non-ownership in the same way - no signature can demonstrate that you no longer hold a chip).
However in practice, it is likely useful for both parties to ensure the details are correct before the register is updated. This reduces the chance of mistakes and could make the seller liable for incorrect updates. Additionally, requiring the buyer to install each chip and extract a cryptographic signature from them to prove ownership is likely unfeasible for intermediaries such as resellers.
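The two-step flow described above - the seller initiates, the buyer confirms, and only then does the register commit the new owner - could be sketched as follows (all names illustrative):

```python
# A toy register: the ownership change only takes effect once the buyer confirms.
register = {"CHIP-001": {"owner": "SellerCo", "pending_transfer": None}}

def initiate_transfer(chip_id, seller, buyer):
    entry = register[chip_id]
    assert entry["owner"] == seller, "only the recorded owner can initiate"
    entry["pending_transfer"] = {"to": buyer}

def confirm_transfer(chip_id, buyer):
    entry = register[chip_id]
    pending = entry["pending_transfer"]
    assert pending is not None and pending["to"] == buyer, "no matching pending transfer"
    entry["owner"] = buyer
    entry["pending_transfer"] = None

initiate_transfer("CHIP-001", "SellerCo", "BuyerCo")
assert register["CHIP-001"]["owner"] == "SellerCo"   # not yet transferred
confirm_transfer("CHIP-001", "BuyerCo")
assert register["CHIP-001"]["owner"] == "BuyerCo"    # committed after confirmation
```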
Where this transfer of ownership takes place, each party’s identity needs to be appropriately verified, to ensure chips are being genuinely transferred to the expected organisation. This may need KYC (Know Your Customer) processes to be implemented at some stage: likely when an account is set up with the non-profit to avoid needing to do KYC for every transaction. To avoid organisations purchasing AI chips through shell companies or similar, they should be required to declare the true key owners - similar to people with significant control or ultimate beneficial owners.
Information about the transfer should be sent to the register in a timely fashion, for example within 7 days of it occurring.
Incentives for stakeholders to accurately record transfers are discussed in the ‘encouraging compliance’ section below.
Low-risk transfers
Chips capable of training AI models often overlap with chips for other uses. For example, high-end GPUs that could train AI models can also be used for playing video games, rendering video content, or mining cryptocurrency.
At the moment, high-end AI chips are so optimised for AI workflows that using them for other purposes is relatively rare. In addition, the increased funding in the AI space has increased demand for these chips, making them much more expensive and thus further discouraging other usages.
This said, back in 2018 chips were commonly marketed for a wider variety of workloads. Marketing materials for the flagship NVIDIA Quadro GP100 highlighted both its 3D rendering and deep learning competencies. This contrasts with current cards like the H100, where the marketing materials focus solely on AI. It’s possible that in future we see demand for AI-specific products drop, or demand for other use cases increase, such that cards become less AI-specific again.
For cryptocurrency specifically, ASICs have become more popular for mining popular cryptocurrencies as they’re more cost-effective: ASICs are both cheaper to purchase and run (in terms of energy costs) for the same rate of crypto mining.
For gaming, the comparison is primarily with consumer GPUs. It’s somewhat unclear how good consumer GPUs are for training large AI models. In general, the consumer grade chips are often similar to professional ones although have less memory, less memory bandwidth and certain features gated off. This makes them significantly worse at training large models where being able to process lots of data through the GPUs is important. In addition, professional GPUs have features that allow developers to speed up training by parallelising more of the process e.g. NVIDIA NVLink allows connecting up to 256 H100 cards, but is not available on new consumer cards.
I’d be interested in further exploring how consumer GPUs compare to AI chips for large model training, given I think this would have significant ramifications for the feasibility of compute governance as a whole. In particular, benchmarks or other data that could help answer a question like ‘How many RTX 4090s would replicate the large model training performance of a thousand H100s?’ would be helpful. Found some? Get in touch and I’ll link them here.
Tracking small scale purchases of these chips, where it seems highly unlikely that the chip will be used for high-risk AI training, may create unnecessary overheads and privacy risks, particularly for individual consumers.
Thresholds should be put in place to determine when chips are transferred to low-risk owners and the chip can stop being tracked. This is likely to be based on a combination of:
- Chip type: e.g. $40,000+ chips, or chips designed almost solely for AI use cases, should be in scope
- Purchase quantity: e.g. buying thousands of consumer-level GPUs might be in scope
- Buyer information: e.g. whether they’re an individual or business, use of cryptic or vague identities, use of unusual payment methods, recent purchases of large amounts of RAM, network cards, or server motherboards. Lessons can likely be learnt from anti-money laundering processes.
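As an illustration, these signals could combine into a simple screening rule. The thresholds below are invented for this sketch; in reality they would be set by policy.

```python
# Illustrative low-risk screening rule. Thresholds are invented, not proposed values.
def requires_tracking(chip_price_usd, quantity, buyer_red_flags):
    """Should this transfer stay on the register, or may tracking stop?"""
    if chip_price_usd >= 40_000:     # chip type: high-end AI chips always in scope
        return True
    if quantity >= 1_000:            # purchase quantity: bulk consumer GPU orders
        return True
    if buyer_red_flags >= 2:         # buyer information: multiple AML-style red flags
        return True
    return False

assert requires_tracking(45_000, 1, 0)       # a single H100-class chip
assert requires_tracking(1_500, 2_000, 0)    # thousands of consumer GPUs
assert not requires_tracking(1_500, 1, 0)    # one gaming GPU for an individual
```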
Where a transfer happens to a low-risk owner, this should still be recorded so that it is clear that this transfer has occurred. This record should contain metadata, such as a retailer’s order id, so that details about this purchase could be investigated should the AI chip be later found in possession of an organisation training high-risk AI models.
An extension might be to require very high-end chips to have certain features unique to AI training permanently disabled before they are untracked. Such a policy would have to carefully balance the risk of the chip being used for dangerous AI training against the intentional destruction of chip capabilities that could be effectively applied elsewhere. This is likely only to be relevant in scenarios where AI chips are practical to use for other purposes e.g. for 3D rendering, which is not true of current AI chips.
Where large quantities of chips are becoming untracked, for example at large electronics retailers selling GPUs, audits should take place to ensure the low-risk transfers are genuine.
Finally, where a chip has moved to a low-risk owner, but a high-risk owner wants to buy the chip from them, this should be recorded on the register. Here the high-risk owner should be responsible for recording it in the register correctly, given they are likely the party with more resources and a better understanding of how the register works.
3. Renting AI chips
Many AI chips are owned by cloud providers, and are rented out to users including top AI companies. Key players in this space include traditional cloud providers such as Amazon Web Services, Microsoft Azure and Google Cloud Platform, as well as AI-focused cloud providers such as CoreWeave and Lambda Labs. For example, OpenAI rent the compute to power their research, products and API services from Microsoft Azure and Anthropic similarly rent their compute from Google Cloud and Amazon.
Understanding who is renting AI chips is therefore crucial to understanding who is ultimately controlling the AI chips, and potentially using them to train risky AI models.
Therefore the system needs to handle temporary transfers, as well as permanent transfers. Given the short-term nature of many such transfers (e.g. for on-demand hourly billing), the implementation of this process needs to be simple. One possible implementation could be delegating access to the cloud provider through an OAuth-like process to record rentals on the renter’s account.
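A minimal sketch of that delegation flow, with invented names and scopes: the renter grants the cloud provider a token scoped to recording rentals against the renter’s register account, so short-term rentals can be logged without manual steps.

```python
import secrets

# token -> (register account, scope), as the register would store them
tokens = {}

def grant_token(renter_account, scope):
    """The renter authorises a provider, OAuth-style, for one narrow action."""
    token = secrets.token_hex(16)
    tokens[token] = (renter_account, scope)
    return token

def record_rental(token, chip_ids, start, end=None):
    """The provider logs a rental on the renter's account using the scoped token."""
    account, scope = tokens[token]
    assert scope == "record_rentals", "token not scoped for this action"
    return {"renter": account, "chips": chip_ids, "start": start, "end": end}

provider_token = grant_token("renter-co", "record_rentals")
rental = record_rental(provider_token, ["CHIP-001"], "2024-03-01T00:00Z")
assert rental["renter"] == "renter-co"
assert rental["end"] is None   # end date unknown at time of reporting; updated later
```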
Similar to low-risk transfers, an exception might be made for lower-risk rentals. Again, it should be recorded that the rental occurred, with some metadata that allows for further investigation - but full details about the renter might not be included.
Information about the rental should be sent to the register in a timely fashion, for example within 7 days of it starting. If at the time of reporting the end date of the rental is unknown, this should be noted and it should be updated when the end date is known. Curtailments or extensions of rental agreements should also be sent to the register.
4. Destroying AI chips
When AI chips are destroyed, damaged beyond use, lost or stolen the register should be updated to note this. For large numbers of chips, this might warrant an in-depth investigation to ensure the chips have not been diverted for potentially dangerous uses.
We expect relatively few of these reports involving large numbers of chips. Given that chips are powerful enough to be worth tracking, most of them will be highly valuable and therefore their owners will be incentivised to handle them securely.
Where an organisation with a large number of these chips is upgrading to a newer version or similar, the old chips are likely to be sold to someone else rather than destroyed. The few exceptions are likely to be very security-conscious customers, such as intelligence agencies. (The UK standards for sensitive information require that any form of digital memory is destroyed by guillotining, disintegration, hammer-milling, shredding, incineration or smelting (Annex A). Most AI chips will have some form of memory built in, requiring their destruction - graphics cards are given as an example in Annex B. For example, NVIDIA’s blog post on their H100 chip explains it has on-chip L1 and L2 caches (SRAM), and on-die HBM3 memory (DRAM).)
5. Determining relevant stakeholders and making information available
There are a wide range of stakeholders that AI chip ownership information might be relevant to. This presents a number of options for register information disclosure including:
State parties, only for information in their state: The strictest form might be that information is only available to state bodies, for AI chips in those states. This feels like a minimum, given a country could create legislation to force organisations to share this information anyway. This might help unilateral compute governance measures, e.g. to understand what competition looks like within a state. It would also still allow states to independently decide whether to publish the statistics publicly.
All state parties: All information on the register is shared with state bodies that have signed up to some kind of treaty. This is different from above in that each state can see all other states’ AI chip ownership. In practice, states designate some kind of body to share this information with, e.g. a national AI safety institute.
Trusted non-state parties: Information on the register is selectively shared with a group of trusted organisations, based on some review process. For example, to access the information you need to apply with a use case which would then be reviewed by a governance team. This is similar to Research Data Centres for US census data, or access to US or UK healthcare data via PCORnet or OpenSAFELY.
Full transparency: All information on the register is made public. This makes accessing the information for different purposes easy and avoids the need to guard the register from information disclosure given it’s already public. Other analogous organisations work like this, even with sensitive data: IANA’s IP ranges are public (highlighting addresses where military equipment is connected to the internet), and the IAEA makes the location of member state nuclear reactors public via their PRIS platform.
Full transparency of the AI chip register should be the default starting point. Making the information public has several benefits - it reduces opportunities to hide dangerous chip usage, enables broader research and understanding of the AI compute landscape, and builds public trust through transparency.
6. Encouraging compliance
There are a few ways compliance with the processes above could be achieved. We explore using an international treaty, a sanctions-like framework, domestic enforcement, and a deposit scheme.
International treaty
An international treaty signed by key countries could create obligations for member states to have organisations within their jurisdiction comply with AI chip ownership rules. It would also obligate states to enforce this law and properly resource any national body responsible for overseeing the system.
Peer pressure from other member countries via treaty meetings as well as dispute resolution mechanisms for non-compliance create incentives to effectively implement required legislation. This could be especially powerful when combined with a sanctions-like framework detailed below.
Overall, this treaty would be similar to the Treaty on the Non-Proliferation of Nuclear Weapons, which obligates member states to track fissionable materials. It designated the IAEA as the international body to audit compliance with the treaty (although the registers themselves are maintained by member states individually and data is shared with the IAEA, rather than the IAEA managing this information directly). Parallels between a potential AI treaty and existing nuclear treaties are explored more deeply by others (Baker M., Nuclear Arms Control Verification and Lessons for AI Treaties, 2023).
Sanctions-like framework
A properly maintained global AI chip register creates opportunities for enforceable sanctions on chip transfers.
Organisations with poor compliance records or countries with lax registration laws could wholesale be deemed 'high risk' - forcing more scrutiny of chip transfers to those jurisdictions. Entities found repeatedly flouting registration rules or broader responsible AI commitments could effectively have their access to advanced chips cut off worldwide.
A register therefore turns non-compliance with AI commitments into enforceable reputational costs and transaction friction, backed up by a credible threat of cutting off access to leading AI compute. Over time this could shape markets towards responsible and trackable AI development.
In addition, countries with lax AI regulations might come to be seen as difficult to work in, given the extra due diligence required before AI chips can be shipped there. This could create additional positive incentives to introduce effective AI regulation.
Domestic enforcement
Domestic regulators set up under the treaty should have the primary goal of ensuring the register is kept accurate and up to date. They should collaborate with the international non-profit to facilitate international inspections, investigate potential incidents, and explore ways to further encourage global compliance.
Regulators should be empowered to fully investigate missing or otherwise inaccurate registrations, and prosecute related offences. This will require properly resourcing them so they are able to effectively supervise powerful technology companies.
These register-compliance regulators could form part of wider AI regulators set up to enforce other related AI regulations.
Organisations that do not comply with rules around AI chip registration could receive fines or other penalties. Graduated penalties could distinguish accidental non-disclosure from deliberate evasion or obstruction of oversight. Penalty size could also reflect company resources, from simple warning letters for smaller entities up to major fines or criminal charges for large multinationals wilfully flouting obligations.
Deposit scheme
A deposit scheme would financially incentivise organisations to comply with AI chip registration. When producing a new chip, developers would pay a refundable deposit of, say, 5% of the chip cost, which is returned to the then-owner in instalments, for example 5 equal annual payments over the next 5 years. The exact amounts and repayment schedules would have to be set high enough to encourage compliance during the period when the chip is still relevant to AI training, while balancing the increased cost of purchasing AI chips.
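As an illustration, the repayment schedule described above can be sketched in a few lines. The 5% rate and 5-year schedule are the example figures from this proposal, not fixed parameters, and the $30,000 chip price is an invented example:

```python
# Sketch of the example deposit schedule: a refundable deposit of 5% of the
# chip cost, repaid to the registered owner in 5 equal annual instalments.
# The rate and schedule are illustrative figures only.

def deposit_schedule(chip_cost: float, rate: float = 0.05, years: int = 5) -> list[float]:
    """Return the annual repayments for a refundable chip deposit."""
    deposit = chip_cost * rate
    return [deposit / years] * years

# A $30,000 AI chip would carry a $1,500 deposit, repaid as $300/year for 5 years.
print(deposit_schedule(30_000))  # [300.0, 300.0, 300.0, 300.0, 300.0]
```

Any real scheme would also need rules for transfers mid-schedule (the remaining instalments follow the chip to its new registered owner), but the core arithmetic is this simple.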
Random sampling could be used to ensure registrations were accurate and up-to-date. Where this unearths batches of AI chips with inaccurate registration data, some part of the deposit could be forfeited as a penalty. This incentivises actors owning AI chips to keep records accurate.
Additionally, unclaimed deposits can signal chips that have not been properly registered, automatically alerting authorities to investigate the last known controlling organisation's activities.
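The unclaimed-deposit check above is mechanical enough to sketch. The data model and field names below are illustrative assumptions, not a proposed schema:

```python
# Hypothetical sketch: flag chips whose deposit instalments have gone
# unclaimed, pointing investigators at the last known owner.

from dataclasses import dataclass

@dataclass
class RegisterEntry:
    chip_id: str
    last_known_owner: str
    instalments_due: int      # instalments that have fallen due so far
    instalments_claimed: int  # instalments actually claimed by the owner

def flag_unclaimed(register: list[RegisterEntry]) -> dict[str, list[str]]:
    """Group chip IDs with unclaimed deposits by last known owner."""
    flagged: dict[str, list[str]] = {}
    for entry in register:
        if entry.instalments_claimed < entry.instalments_due:
            flagged.setdefault(entry.last_known_owner, []).append(entry.chip_id)
    return flagged

register = [
    RegisterEntry("chip-001", "Acme AI", instalments_due=3, instalments_claimed=3),
    RegisterEntry("chip-002", "Acme AI", instalments_due=3, instalments_claimed=1),
]
print(flag_unclaimed(register))  # {'Acme AI': ['chip-002']}
```

The appeal of this signal is that it requires no active inspection: an owner who has gone quiet reveals themselves by failing to collect money they are owed.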
Compared to international treaties and domestic enforcement regimes, a deposit scheme is potentially easier to set up. This is because it only needs buy-in at one stage of the supply chain, which is a much narrower bottleneck than getting all countries involved in the transfer or use of AI chips to agree to a treaty.
The AI chip supply chain is quite narrow, with just a handful of key players dominating the market.
It's useful to understand three key types of players:
- people who design chips: known as fabless semiconductor companies. Examples: NVIDIA, AMD, Qualcomm.
- people who manufacture chips: known as pure play foundries. Examples: TSMC, GlobalFoundries, UMC, SMIC.
- people who do both: known as integrated device manufacturers (IDMs). Examples: Intel, Samsung, Texas Instruments.
To capture the market, you therefore need only the IDMs plus one of the other two categories. There are relatively few companies in this space, especially because both designing and manufacturing chips are extremely capital-intensive:
- being competitive in designing top AI chips requires top talent, who command high salaries
- being competitive in manufacturing chips requires a fabrication plant ('fab'), which can cost over $20 billion, is difficult to source talent for, and often incurs construction delays.
In addition, factors like software compatibility add further barriers to entry in the AI chip market. This has resulted in some companies dominating the market, especially NVIDIA and TSMC. For example, 95% of AI chips are designed by NVIDIA, and Taiwan manufactures over 90% of advanced chips (primarily via TSMC).
Addressing future compute governance advances
The organisation set up to track AI chips should be forward-looking to ensure it is appropriately encouraging AI chips to only be used safely. This section outlines future processes the organisation might consider.
Declassifying low risk chips
As AI chips age and become obsolete for cutting-edge AI work, the need to tightly track them diminishes. A declassification process could transition older chip generations to reduced or eliminated registration requirements.
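A minimal sketch of what such a declassification rule might look like. The age and performance thresholds here are invented purely for illustration; a real rule would be set by the organisation based on the state of AI training at the time:

```python
# Hypothetical declassification rule: drop registration requirements for
# chip generations that are both old and well below the current frontier.
# All thresholds are invented for illustration.

from datetime import date

def is_declassified(release_year: int, peak_flops: float,
                    frontier_flops: float, today: date) -> bool:
    """True if a chip generation could be dropped from the register."""
    age = today.year - release_year
    # e.g. over 8 years old AND under 1% of current frontier chip performance
    return age > 8 and peak_flops < 0.01 * frontier_flops

# A 2012-era chip at 0.2% of an assumed 2e15 FLOP/s frontier would qualify:
print(is_declassified(2012, 4e12, 2e15, today=date(2025, 1, 1)))  # True
```

Requiring both conditions avoids declassifying an old-but-capable design, or a new-but-weak one that might still be stockpiled at scale.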
Expanding hardware governance
While AI chips are the current focus, expanding hardware governance to other inputs to the AI training process could become necessary. This could include:
- raw silicon wafers
- lithography equipment
- high bandwidth memory
- high-speed or advanced storage devices
- very high-throughput network cards
The suggestions above are highly influenced by common methods of training and running today's AI models, particularly large language models. Currently, as well as top-end AI chips to do the computations, training requires large amounts of high-bandwidth memory and networking hardware to handle the training data and model weight updates.
The exact types of hardware that should be considered for tracking need to be chosen with regard to the future AI chip supply chain, AI training methods, and understanding of how else these hardware components are used.
Advanced compute governance measures
After laying the groundwork for basic chip tracking, other more advanced governance approaches may be brought in, such as:
More granular location tracking: More precise locations of chips (e.g. which data centre) could help enable more in-depth verification measures and support investigations of lost chips.
Utilisation auditing: Telemetry or similar reporting could provide insight into the intensity and kinds of workloads being run on chips. For example, a retailer keeping chips in a storage warehouse to sell to retail customers is very different to them being at 100% usage in a data centre.
Training run compliance: Snapshots of model weights during training could be taken, and later inspected to ensure the training run complied with future rules on safe AI training. Others have explored this in much more detail.[3]
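As a sketch of the utilisation auditing idea above, a telemetry report might summarise raw usage samples into a few headline figures. The report format, field names, and sampling scheme are all illustrative assumptions:

```python
# Hypothetical utilisation telemetry report a chip owner might submit.
# Field names and the reporting format are invented for illustration.

import json
from statistics import mean

def utilisation_report(chip_id: str, hourly_utilisation: list[float]) -> str:
    """Summarise raw hourly utilisation samples (0.0-1.0) into a JSON report."""
    return json.dumps({
        "chip_id": chip_id,
        "mean_utilisation": round(mean(hourly_utilisation), 3),
        "hours_above_90pct": sum(1 for u in hourly_utilisation if u > 0.9),
        "samples": len(hourly_utilisation),
    })

# A chip sitting in a warehouse vs one flat-out in a data centre look very different:
print(utilisation_report("chip-001", [0.0] * 24))
print(utilisation_report("chip-002", [0.97] * 24))
```

Even this coarse summary distinguishes the warehouse-vs-data-centre cases mentioned above without revealing what workloads are actually being run.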
These additional compute governance approaches would increase confidence that AI chips were not being used to create risky AI models. However, they would also place additional burdens on owners of AI chips.
A risk-based approach could be taken to introduce different governance measures. For example, large deployments of very high-end chips might be subject to the most strict and intensive measures, while smaller deployments of older or weaker chips might be subject to only simple ownership tracking.
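The risk-based tiering above can be sketched as a simple classification rule. The thresholds, tier names, and the choice of inputs are invented for illustration; a real scheme would need far more nuance (chip interconnect, co-location, ownership history, and so on):

```python
# Illustrative sketch of a risk-based tiering rule: which governance
# measures apply to a deployment, based on per-chip performance and
# deployment size. All thresholds and tier names are invented.

def governance_tier(chip_flops: float, num_chips: int) -> str:
    """Map a deployment to an illustrative governance tier."""
    HIGH_END = 1e15   # assumed per-chip FLOP/s threshold for 'very high-end'
    LARGE = 1_000     # assumed chip count for a 'large deployment'
    if chip_flops >= HIGH_END and num_chips >= LARGE:
        return "full measures: ownership, location and utilisation tracking"
    if chip_flops >= HIGH_END or num_chips >= LARGE:
        return "intermediate: ownership and location tracking"
    return "basic: ownership tracking only"

print(governance_tier(2e15, 10_000))  # large high-end cluster -> full measures
print(governance_tier(1e12, 4))       # a few old chips -> basic tracking
```

The point of tiering is that compliance burden scales with risk, so hobbyists and small labs are barely touched while frontier-scale clusters face the full regime.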
Unknown unknowns
Finally, emerging technologies may necessitate tracking new metrics that are not initially obvious. The non-profit administering the register should continually survey the AI landscape for oversight gaps, and update governance controls as necessary.
This should include the possibility that compute governance is no longer a viable option to govern the development of high-risk AI models. While unlikely, this could happen if algorithmic breakthroughs or significant general hardware advances mean a much wider range of actors could train dangerous AI models, such that tracking compute did not add much value.
Risks
The above proposal comes with a few potential risks that should be considered before implementing it. Despite these risks, we believe that, as proposed, it's still likely significantly net positive (though it would be worth doing a back-of-the-envelope calculation before starting!).
Privacy risks
Detailed AI chip tracking risks unnecessarily infringing people's right to privacy, as well as creating a wasteful regulatory burden. This can be mitigated by:
- excluding AI chip tracking in low-risk circumstances, such as small sales to individuals for purposes unrelated to AI
- declassifying low risk chips over time, to avoid excess tracking
- only requiring more intrusive compute governance measures for larger scale deployments or high-end AI chips
Focusing only on high-end data centre AI chips excludes 99.99974% of semiconductor chips, limiting privacy risks.
Promoting arms race dynamics
Transparent AI chip registers could theoretically reduce arms race incentives by providing mutual visibility into rival capabilities. However, this may backfire if one state has significantly more AI chips than another, provoking fear or political pressure to 'catch up'.
Managing these situations is likely to be difficult. Careful framing of this information before release, encouraging collaboration or negotiation between states, would likely be necessary to minimise fallout.
Other similar agreements have been thought to generally reduce tensions between states. For example:
- The Treaty on Open Skies, where member states grant others permission to fly observation aircraft over their territory to gather information on their military forces, with the idea that greater transparency can reassure countries that potential adversaries are not about to go to war.
- The IAEA carries out inspections of civilian nuclear sites, which sometimes unearth non-compliance with nuclear weapons agreements. So far it has generally seemed able to flag issues effectively and encourage compliance, without escalating them into arms races.
Barriers to entry
In general, introducing regulations creates some additional burden on organisations operating within the area. Additionally, this often affects new entrants the most - as they don’t have the existing resources to absorb the compliance cost.
One proposed method for encouraging compliance was a deposit scheme. While this aligns incentives, it increases the capital needed to purchase AI chips and thus could also discourage new startups in the area. This could exacerbate the risk of concentrating power in the hands of the few organisations that currently have the capital to build state-of-the-art AI models. If used, the deposit scheme's contribution amounts would need to balance reducing dangerous AI model training against this risk.
Security risks
A register with details about high-end AI chips raises security concerns.
Even without location data, it is likely possible to know when chips are being transported by aligning register data with other OSINT data like ship, plane or train tracking databases. This might help adversaries steal valuable chips that are potentially dangerous in the wrong hands. Further investigation is necessary to validate whether this truly is a credible threat (as this might already be possible, or it might be that the register doesn’t help). If it is a risk, this might be mitigated by delaying public release of the data, or redacting data about particularly vulnerable points in the supply chain.
Extended versions of the register with more location data will pose greater risks. Governments are likely to be hesitant to publish locations of secure facilities with AI chips as this could make them more vulnerable to attacks or sabotage. This more sensitive information might be aggregated, or only selectively disclosed to trusted partners.
Governance at scale
Running any large international organisation poses significant challenges due to the number of stakeholders involved. Each member country brings varying geopolitical interests, creating a complex landscape to navigate.
Additionally, running the organisation's operations is likely to be challenging. In-person inspections might necessitate operating in many different countries, and the technical nature of AI research will likely make finding qualified technical staff difficult.
Feedback
This is one of my first public blog posts on AI governance. I’d be keen to receive feedback via the contact details on my homepage.
Acknowledgements
Thanks to Rudolf Laine for reviewing and providing feedback on an early draft of this document. All errors are still very much my own!
Footnotes
1. TechTarget. Top 8 AI hardware companies (2023).
2. Nikkei Asia. China rushes to homegrown AI chips as Nvidia cutoff expands (2024).
3. Shavit Y. What does it take to catch a Chinchilla? Verifying Rules on Large-Scale Neural Network Training via Compute Monitoring (2023).
4. Baker M. Nuclear Arms Control Verification and Lessons for AI Treaties (2023).
5. Urbina F., Lentzos F., Invernizzi C., & Ekins S. Dual use of artificial-intelligence-powered drug discovery (2022).
6. Kirchner JH. Compute Governance: The Role of Commodity Hardware (2022).
7. Lermen S., & Ladish J. LoRA Fine-tuning Efficiently Undoes Safety Training from Llama 2-Chat 70B (2023).
8. Based on the online forum posts I could find: Super User post, NVIDIA developers forum post and NVIDIA GeForce forum post.
9. The UK standards for sensitive information require that any form of digital memory is destroyed by guillotining, disintegration, hammer-milling, shredding, incineration or smelting (Annex A). Most AI chips will have some form of memory built-in, requiring their destruction (e.g. graphics cards are given as an example in Annex B). For example, NVIDIA's blog post on their H100 chip explains it has on-chip L1 and L2 caches (SRAM), and on-die HBM3 memory (DRAM).