Don’t follow the sun: Scheduling compute workloads to chase green energy can be counter-productive
We want to reduce carbon emissions of our compute and storage workloads, and one way of doing this is to choose a time and place where the “grid mix” of energy consumed is less carbon intensive. In particular, there is usually an excess of solar energy during the day that makes the grid mix better than at night. This leads to people suggesting that workloads should move around the world to follow the sun and run on green energy. This seems reasonable at first, but there are several reasons why it’s not really going to help, and may actually increase carbon emissions.
The computers you stopped using aren’t following the sun. They stay in the same datacenter and are still turned on and using power, whether you use them or not. If you are running on a cloud provider, someone else is likely to be using them, perhaps at a bigger spot market discount than before. Just because the carbon emissions aren’t charged to your account doesn’t mean they go away. Meanwhile your workload is generating demand in a different cloud region, and all regions do demand-based capacity planning, so the cloud provider buys more computers, which increases carbon emissions both from manufacturing and shipping (scope 3) and from the energy they use (scope 2). In addition, the data your workload operates on either needs to be accessed over the network, which increases capacity demand for switches and undersea fibers, or an additional local copy of the data has to be maintained; either way the carbon footprint grows.
Cloud providers track how often customers ask for capacity of a specific instance type in a region and don’t get it. AWS calls this an Insufficient Capacity Exception (ICE) and works very hard to avoid it. Problems were reported on Azure during the COVID lockdown: lots of people were suddenly working from home, and the global change in workload mix meant that capacity ran out in some regions. At the same time the AWS Spot market shrank to the point where some customers had to move workloads to regular instances. You can think of the spot market as the buffer capacity for a cloud provider: if it gets too small, they provision more racks of systems, and when someone asks for a regular instance and there are no idle ones available, one is taken back from a spot market user.
If you can choose where to put a workload, it’s worth picking a region that has a good grid mix. Most of the EU and US based regions from the major cloud providers are already running on low carbon energy, but Asian regions are currently high carbon, though this should improve over the next few years. If you have global backup/archive workloads where you store copies in different regions, it would be best to avoid storing them in Asia.
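As a sketch of what this choice looks like in practice, the snippet below picks the lowest-carbon region from a lookup table. The region names are real cloud region identifiers, but the intensity numbers are illustrative placeholders I’ve made up for the example, not published figures; a real system would pull live grid data from an API.

```python
# Sketch: choose a deployment region by grid carbon intensity.
# The gCO2/kWh values below are illustrative assumptions, not real data.

REGION_CARBON_G_PER_KWH = {
    "us-west-2": 80,        # hydro-heavy Pacific Northwest
    "eu-north-1": 30,       # mostly renewables
    "ap-southeast-1": 470,  # currently high carbon
}

def greenest_region(regions):
    """Return the region whose grid has the lowest carbon intensity."""
    return min(regions, key=regions.get)

print(greenest_region(REGION_CARBON_G_PER_KWH))  # eu-north-1
```

The same table-driven approach extends naturally to the backup/archive case: filter out the high-carbon regions first, then pick replica locations from what remains.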
If we consider how best to run a daily batch job, the default is often to start it at midnight or on an hourly boundary. This is particularly bad for the cloud providers: many customers running cron jobs at the same time cause load spikes in network and storage services, which forces the providers to deploy extra capacity and increases the overall carbon footprint, even if it doesn’t show up on your bill. So it’s good practice to add jitter to cron start times. Would it be better to run the job after sunrise, when the grid mix is better? Again, not if that would increase capacity demand for the cloud provider. The best option is to run the batch job when the spot market has the lowest price. Spot market prices could even be adjusted to take carbon emissions into account, using that mechanism to smooth out peaks and incentivize workloads to move to lower carbon regions and times. If the batch job is scalable (like a Hadoop or Spark cluster doing ETL processing) and the deadline is flexible, it would be better overall to run a smaller cluster for longer, as that smooths out the total capacity demand at the cloud provider level.
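The jitter idea can be sketched in a few lines. This is a minimal illustration, not a prescription: the function name and parameters are mine, and in practice you might bake the offset into the crontab entry instead of sleeping at the start of the job.

```python
import random

def jittered_delay(max_jitter_seconds=3600, seed=None):
    """Pick a random delay so a 'midnight' job doesn't start exactly at 00:00.

    Passing a stable seed (e.g. a hash of the hostname) keeps each machine's
    offset consistent from day to day while still spreading the fleet out.
    """
    rng = random.Random(seed)
    return rng.randint(0, max_jitter_seconds)

# At the top of the nightly job: time.sleep(jittered_delay(3600))
delay = jittered_delay(3600)
print(delay)  # somewhere between 0 and 3600 seconds
```

Spreading a fleet of nightly jobs uniformly over an hour turns one sharp spike into a low, flat bump of demand, which is exactly what lets the provider run at higher utilization.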
Another way of looking at grid mix and where to run a workload is to consider the mix for incremental additional usage. An idle computer uses less power (about half) than a busy one, so when you make it busy the extra energy demand is met by whatever power source is currently operating as a peaker plant. Even if most of the total energy is renewable, the peaker plant is most likely a fossil gas power plant, although in the coming years we’ll see more large-scale batteries filling that role. This is a complicated issue and there’s been some discussion about it at the Green Software Foundation recently.
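A rough calculation makes the marginal-vs-average distinction concrete. All of the numbers below are illustrative assumptions for the example (a 300 W server that idles at half power, a clean average grid, a gas peaker on the margin), not measurements.

```python
# Marginal vs. average carbon intensity for making an idle server busy.
# All figures are made-up assumptions for illustration.

IDLE_WATTS = 150.0   # an idle server draws roughly half the power of a busy one
BUSY_WATTS = 300.0

AVG_GRID_G_PER_KWH = 100.0   # average grid mix, mostly renewables
MARGINAL_G_PER_KWH = 450.0   # marginal supply from a fossil gas peaker plant

def marginal_emissions_g_per_hour():
    """Emissions attributable to making an already-on server busy for an hour."""
    extra_kw = (BUSY_WATTS - IDLE_WATTS) / 1000.0
    return extra_kw * MARGINAL_G_PER_KWH

def naive_average_estimate_g_per_hour():
    """What you'd estimate if you (wrongly) applied the average grid mix."""
    extra_kw = (BUSY_WATTS - IDLE_WATTS) / 1000.0
    return extra_kw * AVG_GRID_G_PER_KWH

print(marginal_emissions_g_per_hour())      # 67.5 g CO2 per hour
print(naive_average_estimate_g_per_hour())  # 15.0 g CO2 per hour
```

With these assumed numbers the marginal estimate is 4.5x the naive average-mix estimate, which is why "the region is 90% renewable" can be a misleading way to account for incremental load.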
I suggest that the best policy is to optimize your workloads so that they can run on fewer, more highly utilized instances, to minimize your total footprint in Asia where possible, and to use the spot market price as a guide for when to run workloads.
Some people seem to have the attitude that they should just optimize for their own workloads, and it’s the cloud provider’s problem to solve for capacity and carbon footprint issues, but this is shortsighted. If the way we all architect and deploy workloads makes it easier for cloud providers to maximize their utilization and deploy less total capacity, then we are helping to reduce the carbon footprint of the computing industry as a whole.
The problem here is a “tragedy of the commons” kind of issue. If a lot of people move workloads around to reduce their own carbon footprint, the combined effect is to increase the capacity and reduce the utilization of the cloud providers, and so to increase the overall carbon footprint of the computing industry.