The Hidden Power Bill of Artificial Intelligence with MIT's Vijay Gadepally
A single node running an AI model can draw as much electricity as 10-15 households combined. And that's just one copy of the model.
Vijay Gadepally joins Ed and Sara to break down the real energy footprint of AI—and why most people (and companies) are getting it wrong.
They discuss:
- How "agentic" AI systems use an order of magnitude more energy than ChatGPT.
- Whether efficiency gains can keep pace with exploding usage (spoiler: not yet).
- The simple change that cut one AI training workload's energy use by 80%.
Vijay is Senior Scientist at the MIT Lincoln Laboratory Supercomputing Center and Co-Founder of Bay Compute and Radium Cloud. He studies what's actually happening under the hood of AI systems—and has the data to back it up. If you've been wondering whether AI is derailing the clean energy transition, or whether smarter software design could keep energy use in check, this is the conversation you need to hear.
References & Notes
(02:00) Radium and Bay Compute.
(10:30) Traditional server racks typically operate at 7-10 kW, whereas AI computing racks can have power densities of 30-100+ kW per rack.
Data Center Rack Power Costs: A Condensed Analysis (June 2025)
(16:25) “Inference will account for 65% of AI compute by 2029 and 80-90% of lifetime AI costs.”
AI Inference vs Training Infrastructure: Why the Economics Diverge (March 2026)
(25:20) AI has become more energy efficient, improving by 40% each year.
Stanford Institute for Human-Centered Artificial Intelligence, The 2025 AI Index Report
(26:25) Forbes, Why 'Tokens Per Watt' Is Crucial For Measuring AI Efficiency (October 2025)
(31:15) AESO Announces Interim Approach to Large Load Connections (June 2025)
(35:40) A graph of AI-driven GDP scenarios ranging from a “tech singularity” to human extinction:
Financial Times, Who’s right about AI: economists or technologists? (November 2025)
(36:43) Forbes, Should You Say Please And Thank You To ChatGPT? (May 2025)
(41:25) CBC, Alberta’s AI data centre boom unleashes ‘gold rush’ for electricity allotments (February 2026)
(47:00) 20.7 gigawatts of data centres are proposed in Alberta, almost equal to the current installed capacity.
Reynar IT Data Centre Innovation, Alberta's Data Center Boom Has a 1,200 MW Problem (February 2026)
Episode Transcript
[00:00:00] Vijay Gadepally: A single computer, a node on which one of these AI models runs, uh, can often be somewhere between 15 to 30 kilowatts. That's, you know, roughly equivalent to about you plus 10 to 15 of your neighbors all running your average daily use at once.
[00:00:17] Ed Whittingham: Hi, I'm Ed Whittingham and you're listening to Energy vs Climate, the show where my co-hosts, David Keith and Sara Hastings-Simon, and I debate today's climate and energy challenges.
On February 26th, Sara and I recorded an interview with Vijay Gadepally, Senior Scientist at the MIT Lincoln Laboratory Supercomputing Center and co-founder of Bay Compute and Radium Cloud. Vijay studies artificial intelligence, workload efficiency, and what's actually happening under the hood of these AI systems. We had a wide-ranging conversation about the energy footprint of AI, what is driving the surge in data center electricity demand, and whether the industry's efficiency gains can keep up with the explosion in use.
So if you've been wondering whether AI's about to derail the clean energy transition, or whether innovation in computing might help keep the energy footprint manageable, this conversation is a good place to start. Now here's the show.
Vijay, welcome to Energy vs Climate.
[00:01:15] Vijay Gadepally: Ed. Thank you so much for having me.
[00:01:17] Ed Whittingham: Great. We're delighted to have you on the show. Maybe off the top: you wear an academic hat, being a Senior Scientist at the MIT Lincoln Laboratory Supercomputing Center, but you also wear a commercial hat as a co-founder of Bay Compute and Radium Cloud.
So maybe just as background, you can talk a little bit about your work at both.
[00:01:39] Vijay Gadepally: Yeah. So, uh, it's very exciting to be able to be between the two worlds of both academia as well as on the commercial side. Academia is amazing for coming across great talent, coming up with some really innovative ideas that may not be timed for the market yet.
Uh, and then through the commercial side, it's great to be able to make large-scale impact. Uh, so Radium Cloud is a cloud computing company. We make lots of AI infrastructure available to people, and people can use the equipment to do things like generative AI, interacting with a chatbot. So we provide through the company some of the hardware that people use in order to actually build these.
Uh, Bay Compute's a more recent company that was founded looking at the inefficiencies of power flow within the development and deployment of AI systems. And so Bay Compute is a company that's focused on making data centers, where a lot of this AI happens, making those data centers, those big buildings, operate far more efficiently. And that enables AI users, AI creators, to be able to do more with a lot less, whether it's power, air, water, really any of these. That's what Bay Compute's focused on: really matching how the infrastructure is actually supporting the AI workloads.
[00:02:59] Ed Whittingham: Yeah, I noticed when you go to the Bay Compute website, at the risk of giving you free promo, the first thing is the agentic operating system for data centers.
And we're gonna talk about, uh, agentic AI and data centers a lot in this conversation, I think, and maybe we can start there. So, agentic AI and reasoning models: they seem to be fundamentally more energy intensive than, say, earlier digital tasks. AI is not just like more search right now. It's a real step change in computing, and it's expanding rapidly.
It's getting more efficient, it's getting cheaper, and therefore its use is exploding. And I can just speak from personal experience of how my workflow is very different today than it was just a couple years ago. From your vantage point, how different, generally, are workloads from a couple years ago? And then, what are the biggest misconceptions people have about AI and energy use, and where energy is actually coming from?
Sorry, I know that's two questions loaded into one. Feel free to tease them apart if you want.
[00:04:10] Vijay Gadepally: So maybe I'll begin with just, uh, a very brief history of AI and what's been going on. Now, AI, or artificial intelligence, is a very broad term for software systems that are learning patterns from data and making decisions, or in some cases predictions, without being explicitly programmed for every scenario.
Now, AI has been around for a while. As much as it's more common in our parlance these days, it's actually been around for almost 70 years, right? The term AI was actually first coined in the 1950s, but there have been eras or revolutions in AI, and so not all artificial intelligence is the exact same. There's the sort of traditional machine learning, which we use every day: these are things that predict credit risk or equipment failure. And then we have what's more common today, what we call generative models, and large language models are a good example of that, which can actually produce text, code, images, and can even reason through problems.
So a lot of the traditional machine learning, we often referred to these as expert systems: a human being was heavily involved with helping the machine understand what the rules are. In fact, some of, uh, my PhD work was done on autonomous vehicles, and the earliest autonomous vehicles, and this is in the 1970s and the 1980s, we've had autonomous vehicles. But people would actually write rules saying that, okay, in this circumstance you're gonna drive 30 miles per hour. In this circumstance, you drive 50 miles per hour. Here is when you will change from 50 miles per hour to 30 miles per hour, right? When you see a stop sign, for example.
So those were expert systems, where a human being was heavily involved with guiding the AI. These systems are still used quite extensively today. Uh, things like fraud detection, demand forecasting, predictive maintenance: a lot of these actually use expert systems, where they're saying, if I see the performance of a system degrading to this level, I'm gonna put up a flag, and then I'm gonna have somebody come and look at it.
Now over time we got a lot more computing available, and we were able to essentially dial back the amount of human knowledge that was required to be embedded into these AI systems. And that led to this era of what we often refer to as deep learning, which is a subset of machine learning in which we're able to replace a lot of that human knowledge that we were embedding in the original AI systems with statistical models. And the best-known example of that is neural networks. And so deep learning, as much as it existed for some time, was really about learning more from the data, using the statistics of that data, learning patterns from it. You are still inputting some data, though. You're still telling it, hey, here's the type of neural network that I need.
Here is the type of information I'm gonna be feeding it. Some of my work was on hidden Markov models and Gaussian mixture models, in which case we had to embed some underlying distributions of how the data looked. But that was what we'd call the deep learning era. Now, for about the last eight, nine years, uh, we've been in this new era, which we often refer to as the generative AI or large language model era. This is where we've been able to even further dial back the amount of human information we've needed to embed in artificial intelligence systems and basically learn a lot of these patterns from scratch. And so the major hallmarks of this era are the ability for us to have collected lots and lots of data, having the right computing available to process it, and then new algorithms or architectures that allowed us to essentially not have to label or really train this data, but
it was able to learn by itself. It was able to look at information and self-label what it thought the information was. And so that is the step change that we're seeing today. And you know, you kind of mentioned this earlier about agentic systems: this is what is probably the next era, or what is emerging right now. This is where, instead of just providing patterns or being able to generate new information, these agentic systems are able to actually plan, reason, figure out what tools they need to call, loop, iterate, and kind of keep doing things over and over. So these have all been, from the macro level, sort of the changes that have occurred over time. There's been some great innovation, of course, uh, but over time we've been able to essentially throw more compute at it. We had more powerful computers. We had the ability to store and collect lots more information. And these are just sort of a natural progression of the way, uh, that we've been able to build these systems.
[00:08:43] Ed Whittingham: For our listeners, Vij, it might be useful, just, you know, before we go any further, to differentiate between what we're calling agentic AI, a reasoning model, and, uh, a standard large language model.
[00:08:54] Vijay Gadepally: A large language model is, you know, I'd say the biggest aspect of that is the ability to generate content that mimics a human.
Uh, so it's learned from lots of data and it's able to generate new content. AI agents are sort of the next step of that, which is not only are they able to generate human content, but they're actually able to learn about decision making. So an agentic system, the biggest difference between that and a large language model, and remember they are actually connected.
An agentic system probably calls a large language model at some point. Uh, but the major differentiation there is the agentic AI is actually actioning, performing some action beyond just generating content. So what you'll often see in these systems is that they're either planning, they're using reasoning, they're calling different tools on their own; they're actually making actions or doing things on their own, which is what makes them a little bit different than, say, your chatbot, which would respond to a query.
This is actually going to not only respond to the query, but take that and go do something with it. And so they're now connecting these language models into physical systems to go and execute actions.
[00:10:00] Ed Whittingham: Got you. And there's, you know, significant discussion, and I think a lot of misinformation, around how much energy a reasoning model or an LLM or an agentic AI command uses per task, and whether that's on a token basis or a kilowatt-hour basis. Can you give us an example, so it's concrete? I've heard you, on another show, use the analogy of, uh, a dishwasher.
[00:10:26] Vijay Gadepally: What's often misunderstood in this is the tremendous amount of energy that's used by these systems. A single computer, a node on which one of these AI models runs, uh, can often be somewhere between 15 to 30 kilowatts. That's, you know, roughly equivalent to about you plus 10 to 15 of your neighbors all running your average daily use at once. That's just running one copy of the model, uh, at a given time. And you can imagine how many of these copies of models there are over time. Each era has been a step change in terms of the amount of energy that's been used, whether it's the expert system to statistical machine learning, from statistical machine learning to generative AI, and generative AI to agentic.
So each one has used more and more energy, or real-time power, in order to execute the task. There have been a lot of analyses that have come out, and I will just note one thing: a lot of these analyses are based on our best open-source tools. What we do not have is transparency about the models that we actually use every day, whether it's a Gemini or a GPT-5 or an Anthropic Claude Opus. We don't actually have that data. So that's one of the, I'd say, deficiencies of the field right now, or one of the challenges of the field. And so a lot of this analysis that we've done has actually been based on equivalent open-source models.
And there's a few things that we've learned. Uh, first of all, that, you know, large language models themselves, even the versions that we had two years ago, use a lot of energy, and the order of the energy, depending on how many times you interact with those things, can easily be, uh, equivalent to charging your car. Uh, you know, doing thousands of text summarizations, for example, can lead to that. One image generation can be as much as charging your phone for a day, uh, you know, charging your phone overnight. And that's just one image generation at a time. And so these things start to add up very quickly. Now we move to these newer models, which are called decision models or reasoning models.
These are actually the backbones for agentic systems. So now, when you're using any of these tools, you probably see it thinking; it shows what it's thinking. Those systems actually use an order of magnitude more energy, and the reason for that is they're often, behind the scenes, kind of going back and forth amongst themselves, and it's becoming a lot more difficult for us to actually measure the energy use of these systems. In the, you know, what we call the standard LLMs, a single query would pass through the model and it would use a roughly defined amount of energy for that query, so we could make estimates per token: this is how much energy is being used. With these newer reasoning models, it's actually a lot tougher for us to figure out, because even a slight change to your query can lead to a very different energy usage. So if you make your query ever so slightly more complex, that reasoning model may go through a lot more back and forth in order to get you an answer. So we are actually starting to fall off the place where it's easy for us to say, this query cost this much, uh, this much energy utilization. And the agentic systems are essentially those reasoning models on steroids, because not only are they calling the reasoning model, but they're doing lots and lots of back-and-forths on it.
Uh, we did a recent experiment where we had, uh, an agent do something very simple, like, hey, help me plan a vacation to, you know, X, Y, Z country; find me flight tickets that are inexpensive. We let that thing run and then we counted how many tokens were used. It used nearly half a million tokens just for that single agentic workflow, which was going to different websites, pulling up information, maybe going to Google Flights, looking for flight information.
But all of that combined came to about half a million tokens, which is, which is quite a lot.
[00:14:13] Ed Whittingham: Yeah. And sorry, as someone who doesn't speak tokens, can you put that into sort of a rough kilowatt hour basis?
[00:14:19] Vijay Gadepally: So it's a little bit difficult to do that, as I said, 'cause it is heavily dependent on the type of model that we're using and the type of query.
So a couple of years ago, I could have given you a far more straightforward answer of, it takes, you know, this many joules per token. A lot of those measurements have become very difficult for us to make, because if I ask for a flight to Barcelona, it might look, from a token count, very different than a flight to New Delhi, uh, just because the complexity of one of those might be more than the complexity of the other.
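To put rough numbers on the scale involved, here is a minimal back-of-the-envelope sketch in Python. The household figures and per-token joule costs are illustrative assumptions, not measurements from the episode; as Vijay notes above, real per-query energy varies widely by model and prompt.

```python
# Back-of-the-envelope conversions using ASSUMED round numbers:
# - average US household: ~10,500 kWh/year, i.e. a continuous draw of ~1.2 kW
# - per-token inference energy: bracketed between hypothetical low/high values

JOULES_PER_KWH = 3.6e6
AVG_HOUSEHOLD_KW = 10_500 / 8_760  # kWh/year divided by hours/year ~= 1.2 kW

# Sanity check on the "you plus your neighbors" analogy for a 15-30 kW node:
for node_kw in (15, 30):
    households = node_kw / AVG_HOUSEHOLD_KW
    print(f"a {node_kw} kW node ~= {households:.0f} average households")

def tokens_to_kwh(tokens: int, joules_per_token: float) -> float:
    """Energy for a token count under an assumed joules-per-token cost."""
    return tokens * joules_per_token / JOULES_PER_KWH

# The ~500,000-token agentic travel-planning workflow from the episode,
# bracketed by made-up per-token costs (real values are model-dependent):
for j_per_tok in (0.5, 2.0, 10.0):
    print(f"{j_per_tok:>4} J/token -> {tokens_to_kwh(500_000, j_per_tok):.2f} kWh")
```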
[00:14:51] Ed Whittingham: Yeah, the plot thickens. But you've answered my question, because I was wondering about the unpredictability in energy consumption from prompt to prompt. So what you're suggesting is a fairly high level of unpredictability. It also lends credence to what Sam Altman has cautioned in telling people not to be polite with ChatGPT and Claude and other models like that: cut out the politeness, which just creates unnecessary, uh, energy consumption in terms of actually distilling and analyzing that politeness. One more foundational question, then I wanna open up the floor to Sara. So with reasoning models and LLMs, you've got training and inference. My layperson's understanding is that training is the however many hundreds of millions of, uh, I don't know, I'm grasping for the word, but the cycles that it would need to do in order to acquire its data set. And then the inference comes with us as users then having it infer based on the training and that data set. And my understanding is that it used to be that most of the lifecycle energy costs would come from the training, but I think that's now flipped, and most of it is coming from the inference.
Is that right?
[00:16:02] Vijay Gadepally: Ed, you're correct. And this kind of leads back to one of the questions that you had earlier, of what is one of the misconceptions that people have about AI and its energy use. And it is that: that training is the main energy problem. It turns out that was true maybe five, ten years ago, but that is not true anymore. And the reality is that training is episodic. It happens every now and then. Large companies do it maybe once every couple of years or so. But inference, which is what we are now using, is continuous and now dominates. And in fact, by a number of estimates, we're seeing that what used to be 20% for inference, 80% for training, has now flipped to a point where 80% of the lifecycle energy use of a model is going towards inference, and maybe only about 20% of it is going towards training. In fact, for some of these very large models, the cost parity from an energy perspective is somewhere around maybe 500 million to a billion inferences equals one training run.
And so if you think about every chat, autocomplete, summarization, and generation task that is going on across this world, it's very easy for us to understand that getting to 500 million or a billion inferences can happen relatively quickly. You're not training the model, you know, every few days, but you're probably hitting those numbers of inferences, uh, you know, weekly or maybe even faster, right?
We don't have actual exact stats, so these are all just estimates based on, you know, how many times an average person uses some of these tools. Coming back to AI energy use: we have a lot of conversation about the hardware, the GPUs, which are certainly a major part of the energy story, but there are a lot of ancillary processes. So things like cooling, uh, power conversion losses, uh, redundancy margins. This infrastructure layer is also adding to the energy problem in a fairly significant way.
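For a rough feel for why the lifecycle split flips toward inference, here is a small sketch using the episode's ballpark of 500 million to a billion inferences per training run; the daily-traffic number is a hypothetical assumption, not a reported statistic.

```python
# If one training run "costs" roughly the same energy as 500M-1B inferences
# (the ballpark from the conversation), a busy service crosses that
# break-even point quickly. The daily-traffic figure is a made-up assumption.

def days_to_energy_parity(inferences_per_training_run: float,
                          inferences_per_day: float) -> float:
    """Days of inference traffic whose energy matches one training run."""
    return inferences_per_training_run / inferences_per_day

DAILY_INFERENCES = 100e6  # hypothetical: a service handling 100M queries/day

for parity in (500e6, 1e9):
    days = days_to_energy_parity(parity, DAILY_INFERENCES)
    print(f"{parity:.0e} inferences per run -> break-even in {days:.0f} days")
```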
[00:18:01] Sara Hastings-Simon: This is really interesting. Um, you know, as Ed mentioned, I'm definitely a bit of a, I dunno what the right word would be, but I'm not someone who uses, uh, generative AI a lot.
Maybe just taking a step back for a minute and thinking about the way that this industry is moving forward. And, you know, it strikes me, I don't know if it's totally a fair analogy, but it strikes me a little bit that government, whether that be, you know, the government of the US or, I think equally, here in Canada, has been relatively hands-off when it comes to regulation around this tool, which, you know, can be, I think, quite powerful in all kinds of different ways. Um, and so the analogy that came to mind when I was talking with somebody about this recently is actually the pharmaceutical industry, where, you know, we have the ability to make very powerful drugs, right? That can do all kinds of different things to our bodies, good and bad.
Um, and that's a huge tool, of course, and, you know, I certainly wouldn't want to live in a world where that doesn't exist. But at the same time, I think we've understood that, you know, this is a powerful tool that can be used in different ways, and therefore should be highly regulated, right? And so you can't just go to a store and, you know, buy whatever drug you want. You need a qualified person, a doctor, or, you know, someone with the appropriate qualifications, a pharmacist in some cases, to prescribe the medication and decide if that's appropriate for you. And so, you know, maybe you can comment on whether there's a role for regulation when it comes to actually being able to constrain somewhat, or set, um, AI development on a path that is more or less energy intensive as well.
Um, but it sort of feels to me now that, you know, we're a little bit out over our skis in the Wild West, if I mix my metaphors, um, of sort of introducing this tool without really having regulations about it. And, you know, as a university professor, my colleagues and I are seeing how that is happening in ways that I think are quite detrimental to students, unfortunately. But also, of course, beyond that; I won't go through the whole line of stories. So, um, a little bit of a soapbox, although I'm interested to hear your thoughts on that as someone who's working in the industry, and maybe particularly around the element of energy that comes in there.
[00:20:24] Vijay Gadepally: I'd say one of the biggest challenges is the field is moving too quickly. No one wants to be caught not being ahead of the curve. Uh, and it is a national security thing as well, right? We have, you know, our peers, our friendly nations, but then we also have adversaries that are building capable technologies. And so there is a very real, I will say legitimate, fear that, uh, this tool is moving quite fast and we don't want to be in a position where we somehow regulated ourselves out of being competitive. And so that's, I think, a very real, uh, concern. And I do think there's opportunities for efficiency that can help us achieve those goals without giving up our innovation or IP.
And so there have been numerous instances now where, you know, certain countries that have tried to regulate fairly heavily have kind of fallen behind. And that has an economic, uh, impact, right? Where maybe some of the top researchers, some of the top, uh, developers are moving to places where it's a little bit easier to get access, whether it's to the compute, to the power, however it is. Uh, but at the same time, this is causing a very real, uh, impact on our society. And so it's still that balancing line. And I'm not sure I have a very good answer of what those rules should be. I do think, though, we've tried to regulate so many different things; if we were to do something, to kind of point a finger at it, power or energy would be the direction that we would look.
'Cause that is the fundamental physical limit to a lot of these things. So if you look at the EU AI Act, they have some regulation with respect to, like, how many FLOPS, uh, which is floating point operations per second; whether you hit this constraint of having review or not review is based on sort of the size, the amount of data that you're using. I would argue that, if we can kind of leave that aside and use power, maybe it's more of a, you just have to prove that this is valuable to us. I don't think we want to yet be in the business of saying no, because we don't know what innovation is coming down the pipeline.
But at the same time, as we've seen quite recently, we have labs coming from other parts of the world that are creating models performing equal, or similar, I should say, to some of our proprietary models, but doing so with far less energy. And so clearly there are some efficiencies that people are overlooking, because there isn't a strong economic incentive yet to do it. So I think that if we could maybe use the economics of the problem to help people, there are orders of magnitude that we could likely drop in terms of energy utilization or power consumption if we just thought a little bit more about it. Right now, we haven't incentivized that. So I don't know how to write regulation, I'm certainly not a policy person, but that would be the area I would target first: places where people are being wasteful. It's like, you know, you're ordering three dishes for dinner when you can only eat one, right? So we're not telling you not to eat dinner. We're just saying, hey, how about you just pause for a second and maybe take as much food as you think you'll need.
And if you need more, we can figure it out after that. But just be less wasteful is, I would say, maybe the initial place of regulation. I think we're already starting to see companies moving there, because it is becoming tough to find power, and so people are starting to think, how can I do more with less?
Imagine you had a lemonade stand and you had a lemon tree. You had that whole lemon tree to yourself, you could probably be wasteful with how you squeeze that lemon to make the lemonade. Right. Maybe just a, a light squeeze. And then, you know, as soon as it got a little bit difficult, you could toss the lemon and get another one.
And the lemon tree was generating more lemons at that rate. But now imagine five people have put their stands under the same tree. Now everyone's starting to squeeze that lemon a lot harder. And I think we just need to move to that space with, you know, power and energy, where we now have the same power grid, which is the lemon tree here.
And now you have, you know, instead of one person, you now have 10 companies that are under that same power grid trying to use it. So everyone should just be a little bit more efficient with how they're doing it. And I think we can incentivize that.
[00:24:44] Ed Whittingham: We could do a whole other show just on AI and the lack of strong regulations, but also where regulations are popping up, whether it's surveillance tech, or, uh, the AI Act, or facial recognition bans, or New York's rules around, uh, algorithmic hiring; governments are taking some steps. Uh, but they are definitely baby steps, and, especially in California, in the Bay Area, the last thing they wanna do is regulate out innovation. But let's do a bit more on energy efficiency, and then let's talk about grid implications. So I know we've seen about a 10x improvement in operations per watt over the last five years, and that sounds like good news.
But, just like a show we did recently, Vij, on aviation, where we're managing to drive down the intensity on a per-passenger-mile basis through hardware and software innovation, it's all being overwhelmed by absolute increases. My hypothesis is that we're seeing the same here: any efficiency gains we're seeing to date in AI are just being outstripped by more users using AI, and those AI producers and tech companies have been able, to date, to find the electricity that they need. But let's just define efficient AI. What does that mean in practice? And it's efficient compared to what? What would be the baseline that we're looking at?
[00:26:22] Vijay Gadepally: Yeah, so when we talk about efficient AI, we're trying to compare efficiency with things like: how many tokens per joule? And for those who aren't familiar, a token is just a unit of work, uh, that a large language model does. So we want to know how many tokens you can process per joule, which is, uh, an energy measurement, or per kilowatt-hour or per watt-hour, however you want to define that. Um, another could be something like inference cost per task. So each time I interact with one of these, uh, tools, or I give an agentic system a task to perform, what is the inference cost of that? Another could be: what's the accuracy per watt? Of course, I could use a lower-quality model, which would obviously be able to process more tokens per joule or more tokens per watt-hour, but my accuracy might degrade. So that's another efficiency metric that we're looking at. And then another could be time to answer per kilowatt-hour. And something that we've been using a lot these days is what we call tokens per watt. Uh, so these are all different measurements that incorporate a metric that's used in the large language model or generative AI domain, but merge that, whether it's in the numerator or the denominator, with something that has to do with power and energy.
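To make those joint metrics concrete, here is a minimal sketch of how they compose; the example numbers are invented for illustration and are not benchmark results.

```python
# Minimal sketch of the joint efficiency metrics described above. The
# numbers in the example are invented for illustration, not benchmarks.

from dataclasses import dataclass

@dataclass
class InferenceRun:
    tokens: int       # tokens processed
    joules: float     # energy consumed over the run
    seconds: float    # wall-clock duration
    accuracy: float   # task accuracy on some benchmark, 0..1

    @property
    def avg_watts(self) -> float:
        return self.joules / self.seconds  # average power = energy / time

    @property
    def tokens_per_joule(self) -> float:
        return self.tokens / self.joules

    @property
    def accuracy_per_watt(self) -> float:
        return self.accuracy / self.avg_watts

# Hypothetical comparison: a big frontier model vs. a smaller, cheaper one.
big = InferenceRun(tokens=1_000, joules=5_000, seconds=10, accuracy=0.92)
small = InferenceRun(tokens=1_000, joules=800, seconds=6, accuracy=0.88)
for name, run in (("big", big), ("small", small)):
    print(name, f"{run.tokens_per_joule:.3f} tok/J",
          f"{run.accuracy_per_watt:.5f} acc/W")
```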
And so the real question really becomes: how do we shift the definition of efficiency? Uh, right now, efficiency is purely defined by what's efficient for the operator, the company that is making this available to the public. And I think what we're trying to get to now, given the impact that AI has at such a wide scale, is what's more efficient for society. And so while a lot of the, I'd say, incumbents, the big players in this field, are looking at optimizing the cost per token, which would largely be the dollars and cents that they're spending to process one of these tokens, what we as a society probably care about is the energy, power, and emissions per useful outcome. So I think that's the change that we need to make as a society. Perhaps there's a role for policy and regulation to help move those upstream and make them also commercially, uh, valuable goals. 'Cause right now these two are not necessarily aligned with each other. If you're looking purely at cost per token, uh, you're looking to maybe generate your own, uh, energy, natural gas. You don't wanna waste time connecting to the grid, where you might have more renewables. You don't really care where it happens. You might find places where there's a stressed watershed, uh, but you're looking for areas with, like, lax emission standards to actually go and incorporate this. Whereas for society, if we care about, you know, the quality of the power or the energy or the emissions that you're using, that might change the calculus in a fairly significant way. So as of now, these two are not aligned with each other. And so when we come back to what we're comparing against, it's, you know: are we making progress in these joint metrics, which combine the quality of the answers with the energy, power, carbon emissions, water usage, right, any of these other pieces.
[00:29:37] Ed Whittingham: And the hyperscalers right now, do you think they have enough incentive or pressure to actually optimize for energy? Or are they still optimizing for performance, for, like, model performance, and for market share?
[00:29:53] Vijay Gadepally: Yeah, if you had asked me this question probably two years ago, I'd have said, yeah, of course they have a lot of incentive to do that. But things have changed, I'd say, since some of these newer reasoning models and agentic workflows started to come in. I know many of the people who work there. I think they're amazing people who have, you know, the right goal in mind, but there's also just commercial competition in play. So, yes, they do have the incentives, but they may not be as strong right now as the competition over who can get the bigger and better model out there, who can get this new market share that's coming in. As we're seeing between the Anthropics and the OpenAIs and the Googles right now, their incentive is to build the highest-quality frontier model, because that's how they're able to gain market share. The efficiency will come, uh, because I know that's obviously core to, you know, part of their ethos. But right now it has taken a backseat, I would argue, in a lot of these companies, compared to just gaining performance.
[00:30:51] Sara Hastings-Simon: Yeah, and I think that's another great example, frankly, of the kind of risks of letting things move forward without that regulation. 'Cause there have been a lot of steps back, as you say; even the websites of these companies, which used to proclaim sustainability as a core element, have moved away from that. So, you know, you mentioned the grids and sort of electricity usage, and of course that is, I think, probably one of the easiest ways to, um, regulate, in terms of, you know, those powers already existing directly. So, you know, we have been hearing projections of massive data center growth, talking about tens or hundreds of, uh, terawatt-hours. Um, we have some interesting examples right here in Alberta, where there was actually a limit placed on the amount of new capacity, or new demand, that could come on the grid from data centers. Um, and we can talk a little bit about some of the other projects that are moving ahead in a second. Um, but I wanted to just start by getting your reaction: how plausible are these projections of, you know, 300 terawatt-hours of additional US data center demand? Um, and what do the utilization rates look like? At what level are data centers operating? Are they able to throttle back? What is the potential to actually use these to respond to supply availability?
[00:32:13] Vijay Gadepally: Yeah, so I do think that at a high level these numbers are quite plausible. For some context, uh, 300 terawatt-hours would be roughly 7 to 8% of total US electricity usage. And I think there's a lot of variability, of course, in these. In fact, the best estimates I have are, like, a factor of two in terms of their, uh, confidence window, right? So they're quite large. The estimates for 2030 are anywhere between 6 to 14%, or 7 to 14%. Uh, so it's a fairly wide window they're giving themselves. Um, and I think the case in which we're probably trending to some of the higher energy or power use scenarios is if AI inference becomes ubiquitous.
We are starting to see the early stages of that. Another question is, does it plateau or does it continue? The second is these agentic systems. They are incredibly power hungry. They use so much more energy than the prior generations because they're constantly doing this, like, back-and-forth conversation. They're calling different tools, they're taking actions. Sometimes that action spurs another round. And so if these AI agents, or agentic systems, start to scale, that's another direction in which I think we can start to hit the high end. And then, if the data center builds that we're currently investing in continue at the current pace, yes, we may very well be at the higher levels.
Now, it is important to note that, you know, a lot of these estimates kind of come from two different directions. One is, like, bottom-up, where they're taking: okay, here's how much hardware these companies are going to ship. Here's how much power it takes to power the hardware that they're going to ship. Let's assume that they sell out. Um, and then here's how much data center, you know, inefficiency adds on top of that. And that's one way of making the estimate. The other, top-down approach is: you know, here's how many AI users we might have; here's what the power utilization, or the energy consumption, for those users might be. But some of these forecasts are kind of difficult, right? They assume maybe 24/7 utilization at high levels, and it is a lot more difficult than that. I mean, we're talking, you know, it's been two years, maybe three years since this technology has become quite pervasive. Uh, you know, I always joke with people, my laptop battery lasts longer than it ever did, because it is offloading everything into the cloud, right? It barely does anything on the device; it has become a really powerful web connection, and so that's how you're getting these, you know, 12-, 14-hour battery lives. Yes, the technology has also improved a little bit, but it's also that we've been able to figure out how to offload most of it to some other thing that's plugged in.
And so that's where the real-world utilization part becomes a lot messier, really trying to understand the rate at which this is gonna scale. Like, these agentic things are cool, but are we gonna use them? Are we gonna replace travel agents with this? And is everyone gonna now talk to these things 20 times a day? Is it solving complex problems, or do we, like, say, well, if it's something that an AI can figure out, then maybe it's not important? All these performative things that, you know, people use it for, like helping write performance appraisals and all that, maybe we do away with, right? If all it did was prompt an AI system, then maybe I don't need to ever generate that content, and I can generate it on the fly whenever I need it. And that could be different, um, right. So I think it's gonna come down to how we utilize these, but I would say that they're very plausible.
[00:35:39] Sara Hastings-Simon: Okay. Yeah, and I mean, it's a little bit, I think, unfair to ask for a projection of the future, as you say, with this uncertainty. I don't know if you saw that recent, um, graph, I think it was in the Financial Times. We'll link it, but it was projecting GDP growth forward on the basis of AI. And it literally had, like, you know, either a tech singularity, where we completely end scarcity and GDP skyrockets, or we, like, have human extinction and GDP goes to zero. So, uh, I thought it was, uh, pleasantly honest for a, um, forecaster, for once, to kind of say that.
Um, but Ed, I think you wanted to jump in as well.
[00:36:20] Ed Whittingham: So I've heard you, Vij, talk about knobs that we can use to actually drive efficiency, or drive down unnecessary consumption: less compute, more efficient compute, make software environmentally aware. And I'd love to know, of those knobs, which of the three do you think has the greatest potential? And also just get some thoughts: there seems to be a lot of low-hanging fruit out there, and you just touched upon it, in terms of getting users to understand unnecessary computation, or maybe it's actually using AI to reduce unnecessary computation: redundant prompts, excessive output length, um, overprovisioned workloads, whatever it is. More than just Sam Altman going public and saying, please don't use 'please' and 'thank you' in your prompts.
[00:37:15] Vijay Gadepally: It comes back down to just not using compute that's not adding value, right? When we talked about what society cares about, it's per useful output, or per useful task that's executed. There's a lot of compute that's just going into the ether: we do the compute, we never look at the results, and we move on. And so there's huge waste, because we're doing things like, even in the world of large language models, a lot of redundant prompts. Probably a lot of people are asking the same questions every day. Hey, what was the summary of the State of the Union? Right? Those types of questions, which are, uh, often very redundant.
I mean, I've found myself asking the same questions. I've already asked this question, uh, but I'm asking it again 'cause I'm too lazy to go look for where I asked it the last time. So, um, another one is, like, the overly long outputs. We have the wrong economic incentive: when you use any of these APIs, right, or when you connect to these things programmatically, you're paying for the number of tokens that you're using. It's not tokens per watt, not tokens per joule; it's number of tokens. So these systems are almost incentivized to give you very long answers. Uh, and we see that, right? I mean, how many times is the first sentence of your answer probably all you needed, and then it just gives you, like, would you like me to, you know, talk about this and this? And, like, I didn't need any of that beyond the first line. But you're often paying for it, uh, so they'd rather just give it to you, right? It's like getting the extended warranty when you buy a car: you didn't need it, but it's a high-margin thing, so they're gonna give it to you, right? They already got you. Uh, and then, because everyone wants to give you the best answer, they know people are shopping around, there's a lot of just maximum-reasoning modes, giving you the best answer every time, even though your query might be simple. It might just be, who's the president of the United States? That's not a complicated query, but they'll still use this big fancy model to give you that. I sometimes, uh, you know, liken it to someone trying to use a chainsaw to cut a hamburger, or something like that. You might just be able to use a small knife and get your job done, but there's just not optimal matching of different tools.
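A toy illustration of that per-token billing incentive; the rate below is a placeholder, not any provider's actual price.

```python
# Toy illustration of the per-token billing incentive: a padded answer costs
# the buyer more even when the first sentence was all they needed. The rate
# below is a placeholder, not any provider's actual price.

PRICE_PER_1K_OUTPUT_TOKENS = 0.01  # hypothetical dollars per 1,000 tokens

def answer_cost(output_tokens: int) -> float:
    return output_tokens / 1_000 * PRICE_PER_1K_OUTPUT_TOKENS

concise, padded = 40, 400  # one sentence vs. sentence plus follow-up offers
print(f"concise: ${answer_cost(concise):.4f}  padded: ${answer_cost(padded):.4f}")
print(f"padding multiplies tokens (hence revenue and energy) by {padded // concise}x")
```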
And then the other piece is, you know, kind of stepping back, and this is some of the research that we've done at MIT, even on training these models: we spend a lot of energy doing things. We actually did a really interesting use case where we were using a technique called Monte Carlo simulation. So this is a series of simulations in which we run different parameter sets on a model to see which one provides the best output. Now, in a typical setting, you might run a thousand or tens of thousands of these different configurations in order to find the best model. And as you can imagine, once you find the best one, all the computing that you did on the other 9,999 models you essentially toss. You keep the one, maybe two models that are doing really well. And so we used AI to help improve that process. What we did was we actually started to track how the models were performing, and we were able to very quickly eliminate nearly 80% of the runs that we were doing. So let's say you're doing 10,000 runs, we could remove 8,000 of those runs very quickly,
often within about an hour or two of running, on, say, a multi-day, uh, training process. And the way we were able to do that was by actually tracking how the loss was progressing, and we could tell very early, like, all right, that's probably never going to work. So we were able to run the system, and, you know, we did not lose the best-performing model. We always, uh, achieved the highest-performing model; we never pruned or removed any of the models that were performing well. But we did get rid of nearly 80% of the energy used for training that model, and we still ended up with the same answer. So it's that type of approach, right? We just eliminated a lot of compute that wasn't necessary. It was not adding anything. It did not discover that new drug; it did not discover that new material (we were looking at a materials discovery, uh, application). It didn't change any of the outcomes, but it did reduce a lot of the compute that was needed. So that's a concept that we think can, um, make a fairly quick impact, right? 80% in one shot is a fairly significant amount of, uh, energy to reduce.
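The MIT implementation isn't reproduced here, but the shape of the idea, checkpoint every run early, rank by loss, stop the clear laggards, can be sketched in a few lines; the loss model and thresholds below are invented for illustration.

```python
# Minimal sketch of the early-pruning idea: checkpoint every configuration
# early, rank by loss, and stop the clear laggards. The loss model and the
# thresholds are invented for illustration; this is not MIT's implementation.

import random

def train_step(config_id: int, step: int) -> float:
    """Stand-in for one training step; returns the current loss."""
    floor = (config_id % 10) / 10          # each config converges to a floor
    return floor + 1.0 / (step + 1) + random.uniform(0, 0.05)

def prune_runs(n_configs: int, check_at: int, keep_frac: float) -> list[int]:
    """Run everything for `check_at` cheap steps, then keep the best slice."""
    losses = {}
    for cid in range(n_configs):
        loss = 0.0
        for step in range(check_at):
            loss = train_step(cid, step)   # only the latest loss matters here
        losses[cid] = loss
    ranked = sorted(losses, key=losses.get)
    return ranked[: int(n_configs * keep_frac)]  # survivors train to completion

random.seed(0)
survivors = prune_runs(n_configs=100, check_at=20, keep_frac=0.2)
print(f"stopped {100 - len(survivors)} of 100 runs early; {len(survivors)} continue")
```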
[00:41:26] Sara Hastings-Simon: So bringing this like energy question now back to the grid. I mean, I guess I'll share more of a story about what, what's going on here in Alberta. Um, so I mentioned that there is a limit that's been placed on the amount of new generation that can be asked of the grid for these data centers.
Actually, interestingly, there was a, uh, a story recently, I'm completely forgetting where I saw it, probably CBC, um, about how a data center proponent had sold their capacity allowance to a new data center proponent. Um, and I think the article was saying that, at least in Canada, no one had heard of that happening before. So also interesting; we always find a way to, um, create a market for anything, no matter what. We also have some examples of large data centers that are planning to come, um, off-grid as well. So, um, I imagine this is similar to elsewhere, but the biggest one that we have, and maybe you can tell me how that compares to what you see in the States or in other areas, is not too far from me here in Calgary; it's in the town, uh, of Olds.
Uh, and it's, depending on what you read, about a 1 to 1.4 gigawatt, um, data center. Um, so, for Alberta listeners, that's roughly the size of the power demand of Edmonton. So pretty significant if you think about it on the scale of, you know, our grid overall. Um, they are planning, as far as I understand, to, um, have the data center built with natural gas generation on site.
Um, interestingly, I'll note, going back to the renewables, uh, moratorium, which is again a very Alberta thing, that the land, um, it's being placed on is farmland. So it seems like it's okay to convert farmland to a data center and natural gas; you can't put solar on it, but you can do this. And of course there are concerns in terms of, you know, what this is going to mean for the emissions from the plant and so forth. I know that, uh, you know, I don't know how accurate this is now, but I've seen, um, commentary that it's up to, like, seven years now, based on the supply chain, to get equipment, say, to build a new combined-cycle natural gas plant. So I'm not sure, actually, if this, uh, facility is planning on just building simple-cycle plants, or if they, you know, have a longer runway. Um, but I tell this story to say that this is not just theoretical. We definitely see this happening, um, close to home. You know, there doesn't seem to be any indication that this plant is, um, going to be doing any onsite renewable generation. And I would actually argue that that's somewhat more of a problem for a year-round type, um, off-grid project like a data center here in Alberta, where we suffer from a relatively larger seasonality of solar than, say, if you were going to try to build this thing down in Phoenix, where, you know, it might be more imaginable to build it off-grid with, um, solar and batteries together.
So, um, that's the close-to-home story. But maybe you can just comment on, you know, what you see happening sort of across the States, and how this compares.
[00:44:29] Vijay Gadepally: Sometimes it's really valuable to kind of follow the money, and there are huge economic incentives for data center builds. There's significant capital available. Uh, as much as there are some moratoriums, people have figured out ways to work past that, you know, off-grid, uh, or behind-the-meter power being one of the ways that they've done it. And so I think this is a trend that's gonna continue, at least in the near term. You know, the challenge, right, why this is happening, is if you look at what companies are incentivized to do: energy, as of now, has not really become that KPI, that key performance indicator, that they're looking at. So I'd say there's still just an incentive misalignment, or mismatch, between these different communities right now. Right? Your cloud providers: a vast amount of their capital is going towards ensuring that they have the right hardware and enough power. Your data centers are just trying to stay up, operational, and be able to expand any way that they can,
'cause they're sitting at record-low vacancy rates. And then your energy grid on the other side is saying, what is going on here? I don't understand what's happening downstream. I think this does not have to be the future. Um, if these groups could actually start to work together, I think the opportunities for data centers and cloud providers to actually be a net asset, or a net benefit, to the grid are actually quite high. They have the ingredients to do it; everyone's just kind of operating in their silo. The incentive of growth for the person in the middle is kind of at the cost of the people on either side; the, you know, KPI for this one is at the cost of that one. So I think if we can almost align it, like, hey, you all get promoted if the following thing occurs, a lot of these problems can be solved very quickly, and we might be able to move to a place where it's like, oh wow, I'm so glad a data center came into my neighborhood, 'cause last time we had a storm, they were able to use the big batteries that they had on site to help power the community through that storm. Uh, rather than, right now, it's like, I just got a, uh, brownout and we think it's the data center, right? Which seems to be, uh, sort of the default stance today.
[00:46:50] Ed Whittingham: Yeah, and I mean, that's the role of the electricity system operator and policymakers. So it's entirely within the realm of possibility. But I think I share your skepticism, or concern, that we don't have the different actors talking to each other. To Sara's local example: if you look here at the Alberta electricity grid, from data centers alone they're projecting an additional 12 gigawatts of load that will come, which is basically equal to the capacity of the current grid right now. So we're gonna have to double the grid, and that's just based on the data centers that are on the books. And I don't think anyone, at least I haven't heard the discussion, is talking about data centers being used for demand response, which they could at least partly be, especially when you get this behind-the-fence generation of the kind that Sara talked about. Nor do I hear any real discussion of AI systems becoming aware of real-time electricity prices, or, what we'd really want, aware of carbon intensity, and being able to time their generation or their consumption based on both of those factors.
[00:47:54] Vijay Gadepally: Yeah, and I think that is the opportunity, right? We, the royal we here, consumers, developers, everyone, have treated AI as a static, SLA-bound activity. And due to that, we have essentially given up many, many areas of development. And so I always like to tell people that the biggest thing we could do is start to be flexible, and that flexibility can show up in many different ways. The flexibility could be, you know, on this highly carbon-intense or this extremely grid-strained day, I'm willing to take the shorter answer. I'm willing to have you just gimme, you know, a hundred tokens, and I'll ask for more if I need it, but just gimme a little bit, right? I want to keep using the tool. Or the flexibility could be, I'm willing to wait a few seconds or a couple of minutes for this. You know, I'm doing this travel planning for next summer to go somewhere. If you don't gimme the answer in the next five minutes, I'll live, right? You can give it to me in the middle of the night, maybe, when things are much better. So, being able to shift things: that's the flexibility from the consumer perspective. And this could also be, I'm willing to deal with a slightly worse answer, because at scale that starts to add up. And we've done experiments on all of these and shown that it has tremendous impact. From the cloud side,
the flexibility could be, I'm willing to throttle my hardware for a little bit. We've also shown in our research that being able to, say, reduce the amount of power can have a fairly large impact on, uh, energy consumption without changing the answers at all, and without even really changing the performance. Which is why a lot of the energy going to inference is actually a really good thing: we have a lot more angles or levers that we can adjust than with training, where we didn't have as many of these. So I think some of the signals that we can bring in are things like electricity prices, which hopefully are being driven by grid strain, and carbon intensity, grid congestion. All of these are signals, and our AI can start to use these to become flexible. We can then integrate them into our software and actually incentivize, using things like demand response and more, uh, pricing models that reflect the true energy costs. And I think Texas is one of the first states in the United States to have this: if you're trying to build a new data center and you can demonstrate your ability to, say, curtail your load for some time, there are some fairly large economic incentives for people to think about that. So I think it's a good start. It's not the end, but it's definitely getting us at least talking about the right things.
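As a sketch of what this kind of flexible, grid-aware software could look like, here is a minimal deferral loop; the grid-signal function is a stub and the threshold is an assumption, not a real utility API.

```python
# Sketch of a "flexible, grid-aware" job wrapper: run a deferrable AI task
# now if the grid signal is clean enough, otherwise wait, but never past the
# caller's deadline. The signal function is a stub; a real system would poll
# a utility or grid-operator API, and the threshold is an assumption.

import time
from typing import Callable, TypeVar

T = TypeVar("T")

def grid_carbon_intensity() -> float:
    """Stub for a real grid signal in gCO2/kWh (no real API is called)."""
    return 450.0  # pretend reading

def run_when_clean(job: Callable[[], T],
                   threshold_gco2_kwh: float,
                   deadline_s: float,
                   poll_s: float = 60.0) -> T:
    start = time.monotonic()
    while time.monotonic() - start < deadline_s:
        if grid_carbon_intensity() <= threshold_gco2_kwh:
            return job()          # grid is clean enough: run now
        time.sleep(poll_s)        # otherwise wait and re-check
    return job()                  # deadline hit: run anyway ("I'll live")

# With deadline_s=0 the job runs immediately via the fallback path:
print(run_when_clean(lambda: "travel plan", threshold_gco2_kwh=200, deadline_s=0))
```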
[00:50:38] Ed Whittingham: Yeah, and I think, just like with electricity consumption generally, for the individual user there isn't a lot of meaningful agency right now. It seems like it's mostly a systems-level design problem, and the operators themselves are incentivized for the wrong thing. With AI growth, it doesn't seem that energy and emissions growth necessarily have to follow. I mean, you've talked about how there are real and viable ways to grow and decrease emissions and energy intensity at the same time, through smarter software design, through grid modifications, through that demand response. Uh, we had Amy Myers Jaffe on the show, uh, a couple seasons ago, and she talked about everything that we could be doing to stabilize our grids with AI. But in the absence of any kind of incentives or standards or market pressures, we could just stumble into the Jevons paradox, where, you know, we're going to overwhelm any efficiency gains with absolute consumption.
Listen, Vij, I'm aware of the time, and I know you've got to run. Um, you're at MIT, uh, you've taken precious time out of your day, and we're really grateful to you for helping us put some shape around a topic that is still unfamiliar to us at EVC, and I think I can say that safely about most of our listeners, but is becoming more and more front and center, and more of a pressing concern. Next time we have you back, we can talk a bit more about regulation and even step into existential risk around AI, which is always a fun conversation. But listen, thank you very much. We're really grateful for your time, uh, today, and we've all benefited from your expertise.
[00:52:20] Vijay Gadepally: Ed and Sara, it was a pleasure to be a part of this, uh, and I enjoyed the conversation as well.
Thank you.
[00:52:25] Sara Hastings-Simon: Thanks.
[00:52:26] Ed Whittingham: Thanks for listening to Energy vs Climate. The show is created by David Keith, Sara Hastings-Simon, and me, Ed Whittingham, and produced by Amit Tandon with help from Michael Edmonds. Our title and show music is The Windup by Brian Lips. This season of Energy vs Climate is produced with the support of the North Family Foundation, the Consecon Foundation, the Trottier Family Foundation, and you, our generous listeners. Sign up for updates and exclusive webinar access at energyvsclimate.com, and review and rate us on your favorite podcast platform. We'll be back later this month with a special climate book reviews interview with Roger Thompson, as always, and author John Vaillant, he of the bestselling book Fire Weather.
See you then.
About Our Guest:
Vijay Gadepally is a Senior Scientist at MIT Lincoln Laboratory and a Co-Founder of Radium Cloud and Bay Compute. His work focuses on AI infrastructure, data center optimization, and high-performance computing systems. He has led numerous large-scale research and commercialization efforts at the intersection of AI, energy, and scalable systems, and is particularly interested in how vertically integrated compute and infrastructure can unlock the next generation of efficient AI.
About Your Co-Hosts:
David Keith is Professor and Founding Faculty Director, Climate Systems Engineering Initiative at the University of Chicago. He is the founder of Carbon Engineering and was formerly a professor at Harvard University and the University of Calgary. He splits his time between Canmore and Chicago.
Sara Hastings-Simon studies energy transitions at the intersection of policy, business, and technology. She’s a policy wonk, a physicist turned management consultant, and a professor at the University of Calgary where she teaches in the Energy Science program, and co-leads the Net Zero Electricity Research Initiative. She has a particular interest in the mid-transition.
Ed Whittingham isn’t a physicist but is a passionate environmental professional. He is the founder of Advance Carbon Removal, a coalition advancing demand side solutions for carbon removal in Canada. He is also the former CEO of the Pembina Institute, Canada’s widely respected energy/environment NGO. His op-eds have been published in newspapers and magazines across Canada and internationally.
Produced by Amit Tandon & Bespoke Podcasts
Energy vs Climate: How climate is changing our energy systems
www.energyvsclimate.com