Measuring Carbon is Not Enough — Unintended Consequences
There are plenty of problems in the way of measuring the carbon footprint of an IT workload, and I’ve been leading a Green Software Foundation project to make it easier to get better measurements for cloud provider based workloads. Along the way I’m learning a lot about the subtleties of measuring and reducing carbon, and there are some complex and counter-intuitive nuances that I’m going to try to explore in this post. I’ve linked to several papers and blog posts along the way.
Let’s start by assuming that you have figured out a way to estimate the energy consumption and carbon footprint of a workload, and that the estimate is sensitive enough to register changes in the way you run the workload. That’s quite a big assumption, but I want to focus on what happens next. You want to reduce the carbon footprint of the workload, and need to decide what to do.
The simplest thing to do is nothing. The electricity you use is gradually being cleaned up as solar and wind replace coal and gas, and the embodied carbon used to make and deliver the machines you use is also getting better over time, as carbon optimizations take effect in supply chains. You trust your suppliers to solve the problem for you, but that could take a long time. This is basically the position of the major cloud providers: they will all take care of it for you by 2030 or so.
But you want to *do something* to help sooner. The first thing to understand is that doing things consumes time and energy and has a carbon footprint of its own. For a sense of proportion, a single economy flight between the USA and Europe is on the order of a ton of carbon, and the carbon footprint of a large rack mounted server is on the order of a ton of carbon a year. Globally, buildings, food and transport have a much bigger carbon footprint than IT.
You could try to estimate the average annual carbon footprint of employees at your company to come up with a carbon per hour per employee number, then total up the time spent discussing what to do via meetings and emails, and the time spent working on a change. Like any optimization project, you need to figure out the return on investment (ROI) for different project options and pick the one that looks like it will have a worthwhile carbon reduction ROI. For many companies, the highest ROI project may have nothing to do with IT, but for now let’s stay focused on sustainable development and operations practices, which tend to dominate for “digital” product companies.
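As a rough sketch of that kind of calculation, here is a back-of-the-envelope carbon ROI estimate in Python. Every number in it (per-employee footprint, working hours, project effort and savings) is invented for illustration, not a measurement.

```python
# Back-of-the-envelope carbon ROI sketch with illustrative, made-up numbers.
EMPLOYEE_TONNES_PER_YEAR = 15.0   # assumed annual footprint attributed to one employee
WORK_HOURS_PER_YEAR = 2000.0
KG_PER_EMPLOYEE_HOUR = EMPLOYEE_TONNES_PER_YEAR * 1000 / WORK_HOURS_PER_YEAR  # ~7.5 kg/hour

def carbon_roi(people_hours, annual_saving_kg, years=3):
    """Carbon saved over the project lifetime divided by carbon spent doing the project."""
    cost_kg = people_hours * KG_PER_EMPLOYEE_HOUR
    return (annual_saving_kg * years) / cost_kg

# Example: 200 hours of meetings and engineering to save 500 kg per year for 3 years.
print(f"Carbon ROI: {carbon_roi(200, 500):.1f}x")
```

With these invented numbers the project only just breaks even on carbon, which is exactly the kind of result that makes it worth doing the estimate before starting.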
Carbon Optimization
The simplest carbon optimization projects focus on doing less work and using less of the resources that contribute to the carbon footprint. These also usually save money. However, while the measured carbon attributed to the workload goes down, the resources you stopped using are still there.
In the simplest case, you have a growing workload, and you optimize it to run more efficiently so that you don’t need to buy or rent additional hardware. Your carbon footprint stays the same, but the carbon per transaction or operation goes down.
However, if you have a containerized workload that you are measuring with Kepler, or a cloud instance based workload that is running in virtual machines, you measure your carbon footprint based on how many and how big the containers and instances are, and how busy their CPUs are. CPUs consume power at idle, and because of monitoring agents, operating system daemons, application background tasks and other overhead, they are never really idle. A rough guide, if you don’t have any better data, is that a system with no traffic sits at around 10% utilization and uses about 30% of peak power, at 25% utilization it uses about 50% of peak power, and at 50% utilization it uses about 75% of peak power. So you get a small energy benefit from reducing average CPU utilization, but a bigger benefit from increasing utilization by downsizing the container or instance or using fewer of them, which also reduces the share of embodied carbon attributed to the workload. There are some interesting projects based on Kepler measurements that try to optimize pod scheduling and vertical scaling to reduce power consumption.
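Here is a minimal sketch of that rough guide in Python, interpolating between the quoted utilization and power points and comparing the same work spread over four half-empty instances versus two busier ones. The linear interpolation, the peak wattage and the instance counts are all assumptions chosen to illustrate the shape of the curve, not measurements.

```python
import numpy as np

# Rough utilization-to-power guide from the text: ~10% util -> ~30% of peak power,
# 25% -> 50%, 50% -> 75%. Points in between are linearly interpolated (an assumption).
UTIL_POINTS = [0.10, 0.25, 0.50, 1.00]
POWER_POINTS = [0.30, 0.50, 0.75, 1.00]

def power_fraction(utilization):
    """Fraction of peak power drawn at a given CPU utilization."""
    return np.interp(utilization, UTIL_POINTS, POWER_POINTS)

PEAK_WATTS = 300.0  # hypothetical peak power per instance

# The same work handled by 4 instances at 20% utilization vs 2 instances at 40%.
before = 4 * PEAK_WATTS * power_fraction(0.20)
after = 2 * PEAK_WATTS * power_fraction(0.40)
print(f"4 instances at 20%: {before:.0f} W, 2 instances at 40%: {after:.0f} W")
```

With these assumed numbers, consolidating onto half as many instances cuts power by roughly a quarter while doing the same work, and it also halves the embodied carbon share attributed to the workload.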
The best way to increase utilization is to spread out or prevent load peaks, so that the system runs without problems at higher average loads. Load peaks can be caused by inefficient initialization code at startup, cron jobs, traffic spikes, or retry storms. I’ve written before about how to tune out retry storms. For larger horizontally scaled workloads, autoscaling by adding and removing instances from the service is the best way to increase average utilization. This can be improved by reducing startup time, avoiding the need to drain traffic from instances before they are removed, and using predictive autoscaling that adds and removes capacity faster than reactive autoscaling.
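Here is a minimal sketch of that effect, assuming a made-up daily load curve and an assumed autoscaling target utilization. It compares a fixed fleet sized for the peak hour against a fleet that scales with the load, in terms of instance-hours and average utilization.

```python
import math

# Hypothetical daily load curve in arbitrary work units per hour (made up).
hourly_load = [5, 4, 3, 3, 4, 8, 20, 40, 60, 70, 75, 80,
               80, 78, 75, 70, 60, 50, 40, 30, 20, 15, 10, 8]
CAPACITY_PER_INSTANCE = 10.0  # work units one instance handles at 100% utilization (assumed)
TARGET_UTILIZATION = 0.6      # assumed autoscaling target

# A fixed fleet has to be sized for the peak hour.
fixed_fleet = math.ceil(max(hourly_load) / CAPACITY_PER_INSTANCE)
fixed_util = sum(h / (fixed_fleet * CAPACITY_PER_INSTANCE) for h in hourly_load) / len(hourly_load)

# An autoscaled fleet tracks the load to hold the target utilization.
autoscaled = [math.ceil(h / (CAPACITY_PER_INSTANCE * TARGET_UTILIZATION)) for h in hourly_load]
auto_util = sum(h / (n * CAPACITY_PER_INSTANCE) for h, n in zip(hourly_load, autoscaled)) / len(hourly_load)

print(f"Fixed fleet: {fixed_fleet * len(hourly_load)} instance-hours, {fixed_util:.0%} average utilization")
print(f"Autoscaled:  {sum(autoscaled)} instance-hours, {auto_util:.0%} average utilization")
```

Faster or more predictive scaling lets the fleet track the load more closely during ramps, which is where the remaining idle instance-hours are spent.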
Consequences
Even though the carbon attributed to your workload has gone down, that may not be the outcome you really wanted. If you want to “save the planet” then the outcome you want is that the consequences of your optimization project reduce the total carbon emitted for the planet, not just for your own workload. This brings in an important and poorly understood topic: the difference between attributional and consequential carbon accounting. Both are well documented and standardized, but most of the discussion occurs around attributional accounting.
For an example of an unintended consequence, let’s say the result of your optimization project is spare capacity at a cloud provider. That capacity is then offered on the spot market, which lowers the spot price, and someone uses those instances for an opportunistic CPU intensive workload like crypto mining or AI training, which they only run when the cost is very low. The end result is that the capacity could end up using more power than when you were leaving it idle, and the net consequence is an increase in carbon emissions for that cloud region. Alternatively, you could argue that eventually the cloud provider notices an increase in spare capacity and delays purchasing a capacity increase for the region, which decreases carbon emissions. It’s clear that one problem with consequential models is that the choices of what to include and the timescales to take into account are somewhat arbitrary, include a lot of external information that you may not have access to, and are likely to be challenged if you publish your results. The main requirement in the Greenhouse Gas Protocol is that the boundary of the consequential model is clearly defined and documented as part of carbon reduction project accounting.
The other area where this gets complicated is related to the sources of renewable energy, how they are attributed, and what the consequences of changes could be. The normal attribution model is to use an average carbon intensity for a grid region that is calculated over an hour, a month or a year. The carbon intensity is a mixture of carbon free sources like solar, wind, battery, geothermal, hydro and nuclear, versus carbon intensive gas, oil and coal. Carbon intensity tends to fall as solar and wind are added, but it also falls as coal and oil are replaced by gas. The lowest cost (usually solar, wind and gas) and cleanest sources are preferred, but when peak loads arrive, it’s currently common to use gas to supply the peak. In some grid regions there is too much solar and wind at times, so those sources are curtailed and some clean energy isn’t generated. This explains why you might see a wind farm that isn’t operating on a windy day.
Another complication is that there are two ways to calculate carbon intensity for a grid region, the production intensity and the consumption intensity, as described in another ElectricityMaps blog post. The production intensity is based on the energy generated within that region and is easier to figure out; the consumption intensity takes into account that significant amounts of energy can be imported and exported across regions, and it’s a better measure of the carbon intensity of the energy you are using. Google’s carbon data for each region is based on an accumulation of hourly, location-based, consumption-based carbon intensity measurements (sourced from ElectricityMaps) over a year. That number is different from the plain average carbon intensity for the year (also reported by ElectricityMaps), because if Google uses less energy when the carbon intensity is worse, or more when it is better, that improves their hourly weighted average for the year. Of course, Azure and AWS use different methodologies to calculate their carbon footprint, which are currently based on monthly averages of the energy they purchase (using the market method), rather than the energy they actually use (the location method). There is also some amount of selling of renewable energy credits across borders, in case this wasn’t complicated enough.
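A minimal sketch of why the hourly weighted number differs from the plain average: each hour’s carbon intensity is weighted by the energy actually used in that hour, so shifting consumption into cleaner hours improves the result. The intensity and energy figures below are invented for illustration; real values would come from a provider like ElectricityMaps.

```python
# (hourly consumption-based intensity in gCO2/kWh, energy used in kWh) - made-up values
hours = [
    (450, 80),   # evening peak: dirty grid, moderate use
    (300, 100),
    (120, 150),  # midday solar: clean grid, batch work shifted here
    (100, 160),
    (350, 90),
]

plain_average = sum(intensity for intensity, _ in hours) / len(hours)
energy_weighted = sum(intensity * kwh for intensity, kwh in hours) / sum(kwh for _, kwh in hours)

print(f"Plain average intensity:   {plain_average:.0f} gCO2/kWh")
print(f"Energy-weighted intensity: {energy_weighted:.0f} gCO2/kWh")
```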
Given the above, what is the consequence of using more or less energy? This is called the marginal emissions. In many grid regions an incremental kilowatt will always be supplied by a gas powered “peaker plant”, even if there’s substantial carbon free energy in the average mix. In some grid regions batteries are beginning to be used, and if solar or wind curtailment is happening, then it’s possible that the consequence is carbon free. It also depends on what time of day or night you use the energy, and how close the grid region is to its peak load when you increase your load. Real-time marginal emissions can be obtained via a commercial service from WattTime, but they are not freely available to a cloud end user trying to decide what to run where, and when, to minimize carbon. There is also an argument, as described in this ElectricityMaps blog, that it can be better to use the average carbon intensity, as that drives better long term decisions. Part of the goal of the GSF Real Time Cloud project is to make more of this information available in standard ways, and to provide an interface that could eventually support better decisions on an hour by hour basis.
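A short sketch of why this matters for a consequential analysis: the same incremental load looks very different when valued at the average grid mix, at a gas peaker’s marginal intensity, or against curtailed renewables. All of the numbers below are assumptions for illustration, not real WattTime or ElectricityMaps data.

```python
# The same 50 kWh of extra load, valued three different ways (all values assumed).
EXTRA_KWH = 50.0
AVERAGE_MIX_G_PER_KWH = 250.0       # annual average intensity for the region
MARGINAL_PEAKER_G_PER_KWH = 450.0   # the marginal unit is a gas peaker
MARGINAL_CURTAILED_G_PER_KWH = 0.0  # the marginal unit is curtailed solar or wind

print(f"Attributional (average mix):     {EXTRA_KWH * AVERAGE_MIX_G_PER_KWH / 1000:.1f} kg CO2")
print(f"Consequential (gas peaker):      {EXTRA_KWH * MARGINAL_PEAKER_G_PER_KWH / 1000:.1f} kg CO2")
print(f"Consequential (curtailed hours): {EXTRA_KWH * MARGINAL_CURTAILED_G_PER_KWH / 1000:.1f} kg CO2")
```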
That’s all pretty complicated, but what are you going to try to do to make a difference? Moving workloads around the world to reduce their carbon footprint sounds like a good idea on the face of it, but it can backfire if you keep moving them, as I described in a previous “Don’t follow the sun” blog post. Picking a lower carbon region over a higher carbon region for a workload and staying there is a better option. However, the cleanest regions may cost more or be further away from your customers. In addition, if we increase the energy use in a clean grid region, there’s less clean energy there to export to a nearby dirty region, so the global consequence may be less clean than you hoped. That’s a bit of an edge case though, so in general it is good to site workloads in ways that get cloud providers to grow their clean regions faster than their dirty ones.
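As an illustration of the “pick a clean region and stay there” approach, here is a toy placement decision that filters candidate regions by a latency constraint and then picks the lowest carbon intensity. The region names are real cloud regions, but the intensity and latency numbers are invented, and a real decision would also need to weigh cost, data residency, and the export effects described above.

```python
# (region, assumed gCO2/kWh, assumed ms latency to users) - numbers are illustrative only
candidate_regions = [
    ("northamerica-northeast1", 30, 70),   # Montreal, mostly hydro
    ("us-central1", 450, 40),
    ("europe-west1", 150, 120),
]
MAX_LATENCY_MS = 100  # assumed product requirement

eligible = [r for r in candidate_regions if r[2] <= MAX_LATENCY_MS]
region, intensity, latency = min(eligible, key=lambda r: r[1])
print(f"Chosen region: {region} at {intensity} gCO2/kWh, {latency} ms")
```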
As an example, many of us now have electric cars and solar panels. When should you charge your car? It might be cheaper at night when the load is low and time-of-use electricity rates are reduced, but there’s no solar, so the marginal grid mix is probably higher carbon. I charge mine during the day, when the sun is out and my own solar panels are moving energy from an inverter just a few feet away to the car battery. This is extremely efficient, and I can be sure I’m putting solar power in my car, and doing it at a time when the grid might be curtailed. A counter argument might be that the solar energy I didn’t put into the grid was needed to run air conditioners during the day, so a gas peaker plant had to supply it, while the low overnight load could have been supplied by wind farms rather than gas. I happen to live in California, close to one of the largest grid battery installations, but I have no idea what’s actually happening from hour to hour.
I think the good news is that over time it will get less complicated. For example, if you run in the cloud provider regions near Montreal, Google reports that the energy is 100% carbon free hydro power, all the time. There is, however, a small residual carbon footprint related to the construction and maintenance of renewable energy sources. Over the next few years grid regions around the world will continue to decarbonize, along with the supply chains for manufacturing, delivery and maintenance.
Bottom Line
The common approach is to use the previous year’s annual average consumption based grid mix carbon intensity in carbon footprint calculations, as the data is freely available, and this gives you an approximate attributional result that may be good enough to give you confidence that your optimizations are directionally helpful. To do a consequential analysis you need to use the marginal carbon intensity, which can vary a lot more in real time. The GSF SCI publishes an average marginal data set, and there are real-time data sources provided by WattTime and ElectricityMaps on a commercial basis, with free access to previous years’ data from ElectricityMaps, and free API access to real-time marginal data for Northern California from WattTime. The GSF Software Carbon Intensity specification has published a case study that shows how to do this for application workloads, where SCI produces a marginal carbon per transaction figure for a specific workload.
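For reference, the published SCI formula is SCI = ((E × I) + M) per R, where E is the energy consumed by the workload, I is the marginal carbon intensity, M is the amortized embodied carbon, and R is the functional unit, such as a transaction. A minimal sketch with made-up numbers:

```python
# SCI = ((E * I) + M) per R, with illustrative values for one hour of a hypothetical workload.
E = 12.0      # energy consumed over the hour, kWh (assumed)
I = 420.0     # marginal carbon intensity for the region, gCO2e/kWh (assumed)
M = 800.0     # embodied carbon amortized to the workload for the hour, gCO2e (assumed)
R = 100_000   # functional unit: transactions served in the hour (assumed)

sci_per_transaction = ((E * I) + M) / R
print(f"SCI: {sci_per_transaction:.3f} gCO2e per transaction")
```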
I think the bottom line is that what we want is the good consequences of global carbon reduction, but in practice it may be too complicated to figure out all the externalities and data sources, and it’s especially hard to get everyone to agree on which ones to include and which methodology to adopt. So we will mostly end up continuing to focus on reducing our energy use, and on calculating and reducing our attribution model carbon footprint. However, we should also think about the bigger picture, the short and long term effects, and understand where there may be large unintended consequences.
When making claims about carbon reduction projects, to be accurate, attribution model reductions should be stated along the lines of “we reduced our own carbon footprint”. If you want to claim that you are actually “saving the world”, you should spend some time figuring out a consequential model for your project and using marginal carbon intensity data; reading this white paper by WattTime would be a good start.