Andy Thompson
ATI Research Director, Advanced Technology Marketing

CPU: What does ATI do in its runtime compiler or drivers to compensate for code that has not been optimized or developed on ATI hardware? Thompson: The function of the driver compiler is to use detailed knowledge of the R3xx hardware shader architecture to optimize the application's shaders. For example, the driver compiler will try to maximize parallelism in a given shader and concurrency across hardware threads. This uses algebraic rules to reorder computations to fully utilize ALU resources, finds and eliminates dead code, etc. In all cases this maintains the application's requested precision and does the exact computation requested, just in a more efficient way. In general, what we try to encourage developers to do is just write good general-purpose code. Ideally it's structured reasonably well and there are enough clues in the code that would allow us or NVIDIA or anyone else to take that code and have our runtime compilers do a good job optimizing it. So in general we spend most of our time just trying to help them write clean code from a hardware and an API perspective. The APIs are obviously very complex. They rev [go through revision] very quickly, every year. Now that there are multiple shader models out there, new developers are coming in all the time, and a lot of them are just completely overwhelmed. So a lot of them are doing things that are very inefficient without even realizing it. While the stuff that gets the most PR, I guess, is ‘Oh, there's some shader that favors ATI or favors NVIDIA,' what both companies probably spend the bulk of our time doing is just helping [developers] with their general optimizations and performance of their code and good techniques for writing code that takes advantage of graphics hardware and things like that. But from an overall strategy, I have a couple of different teams that focus on different areas of this. 
Initially some years ago, we just sort of focused on ‘Hey, let's just have some engineers available to talk to developers.' We've evolved to become quite mature in the last couple of years, to the point now where we do a full SDK [software developer kit] that we developed. We have gobs and gobs of samples that are full source included that are part of that [SDK]. We have full tool sets that we provide. One of our big tools, called RenderMonkey, is an environment that allows people to prototype [stand alone] shaders so they don't need to write their own shell or have their own game engine or anything. They can just code up a shader, play with it in real-time, and see the changes. What we try to do is provide tools that allow people to develop shaders and related graphics technologies faster than they could do otherwise. Hopefully we're removing stumbling blocks in the process there. Then on top of the background work, we have people that are working directly with the developers. That's a combination of top engineers that are on a plane every week, working on-site with developers on things, to guys who work back in the office here full time and predominantly are working through email and things like that. That interaction starts all the way from working with Microsoft and providing them feedback on the compiler that they have because the better the job that compiler does, the less work that we have to do at a driver runtime compiler level. And we are working with developers to make sure they understand the way that compiler is going to work and to make sure that they understand the way the hardware works. Because if they are not native graphics guys, they might do something [where] they keep a bunch of temporary data around or something like that and they don't realize it is eating up on-chip storage and we suddenly have to go fetch all this data out of system memory because they were needlessly tying up the on-chip register storage. 
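[To make the register-storage point concrete, here is a minimal sketch in Python. It is an illustration only, not ATI's tooling: the instruction format, the function name, and the two sample programs are hypothetical. It counts how many temporaries are alive at once in a straight-line instruction list. Computing every intermediate value up front and combining them later raises the peak, which is the kind of pattern that ties up on-chip registers.]

```python
def peak_live_temporaries(instructions):
    """Return the largest number of temporaries that are live at once.

    `instructions` is a list of (dest, sources) pairs in program order.
    Each dest is assumed to be written exactly once. A temporary counts
    as live after the instruction that writes it and until the last
    instruction that reads it. Names that are never written (shader
    inputs) and results that are never read are not counted.
    """
    # Record the index of the last instruction that reads each name.
    last_use = {}
    for index, (_dest, sources) in enumerate(instructions):
        for name in sources:
            last_use[name] = index

    live = set()
    peak = 0
    for index, (dest, _sources) in enumerate(instructions):
        if last_use.get(dest, -1) > index:
            live.add(dest)
        # Drop every temporary whose final read has now happened.
        live = {name for name in live if last_use[name] > index}
        peak = max(peak, len(live))
    return peak


# Hypothetical shader: fetch four values first, combine them afterwards.
HOARDING = [
    ("a", ["in0"]), ("b", ["in1"]), ("c", ["in2"]), ("d", ["in3"]),
    ("s1", ["a", "b"]), ("s2", ["s1", "c"]), ("out", ["s2", "d"]),
]

# Same arithmetic, but each value is consumed soon after it is produced.
INTERLEAVED = [
    ("a", ["in0"]), ("b", ["in1"]), ("s1", ["a", "b"]),
    ("c", ["in2"]), ("s2", ["s1", "c"]),
    ("d", ["in3"]), ("out", ["s2", "d"]),
]
```

[Under these assumptions, `peak_live_temporaries(HOARDING)` reports four live temporaries and `peak_live_temporaries(INTERLEAVED)` reports two, even though both programs compute the same result.]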
There are a million different things like that where we just try to do a lot of best practices stuff. We have tons of white papers and technical articles that we have written, and all of that stuff ends up on our Web site that anybody can freely download. We host developer events. We just finished seven events in September. We did four in Europe and three in North America, and those are very hands-on. There is no marketing or PR. It's just engineers presenting to rooms full of game developers on technical topics so it really covers the gamut. Then specific to what we do with each developer, it really just depends on what they need. There's a wide range of developers out there and some of them really, really know what they are doing and they need very little [support]. When they are 98% done, they say, ‘Could you see if we've done anything here that could be optimized any further?' things like that. Then some developers say, ‘Hey we're stuck using this old technology. We'd love to use the new stuff but we need some help.' We do whatever we can to help them, whether it is provide them [with] tools or point them to resources that are on the Web or have somebody go on-site for a few days and work with them. CPU: If someone had developed on NVIDIA hardware and written their code with instruction ordering that caters to NVIDIA, are there things that you need to do to reorder the code in your drivers to make it efficient on ATI hardware? Thompson: In a general sense: no. In the evolution of how good an ATI or an NVIDIA compiler are compared to something that the CPU guys do, for instance, we're nowhere near that good yet. We're at the earlier stages there. If we had been working with somebody after [NVIDIA], maybe there are a couple of instructions we would ask them to do some hand tuning on or something like that, but it tends not to make that big a difference. 
One of our design goals for the R3xx hardware was to have a good general purpose shader architecture that could take any DirectX 9 shader and run it efficiently with a minimum amount of optimizing by the driver. So in general, we feel that our architecture is quite good at most all standard-type cases. There is the ‘How do you get 5% more?' but we have never found where we are in a situation where, if somebody does something that is optimal for NVIDIA or anybody else, we feel we're at a 20% performance detriment or something like that. That situation just has not happened. Part of it is because almost all the new code that's being developed is in HLSL and so those people are compiling down through Microsoft's HLSL compiler and Microsoft obviously has good compiler background and they spend a significant amount of time with us and I'm sure they spent a significant amount of time with NVIDIA and everybody else trying to make sure that their compiler is as good as it can be, and that's a moving target. They continue to evolve it and have updated releases. The biggest thing that we tell developers is ‘Try and write decent code and use Microsoft's compiler because it does a good job.' We find that is a great starting point for us. So ideally, maybe we have to work on some co-issue that the compiler isn't doing for us, but in general, it hands us something that is quite optimal already. Really you might only come into cases like you're describing if people were hand-coding assembly code. In DirectX 9 there are not a whole lot of people doing that now. Still, obviously in DX8 it is a big deal but the DX8 shaders tend to be just a couple of instructions, so it's really no big deal even if somebody felt they did have to have two versions of a shader in DX8 because their shader might only be eight instructions. CPU: Obviously you can't speak for NVIDIA, but you may be familiar with their architecture and the differences it has from your own. 
Is that [ease of portability] just because you guys followed the DirectX 9 specs closer than they did? They seem to have some pretty serious performance issues if they don't, for example, reorder instructions and do things like two texture accesses at a time as opposed to alternating between other types of operations. Thompson: Ultimately, it may come down to the design choices that were made. Our design decisions were [based on the goal that] if somebody had an app running DX8 code and they added new DX9 features to it, we essentially wanted that to run at the same performance as it did if it was DX8 or even DX7. The test results would show that NVIDIA is a good bit faster when they are running DX7 or DX8 applications compared to when they are running the new DX9 applications. I don't know whether that's an accurate assessment of their architecture, but that's sort of if you view it from an external benchmark results standpoint. I suspect that maybe they put the emphasis on apps that were shipping for most of 2003 and figured, ‘It's OK if we have some performance tradeoffs using the new features.' We didn't make those trade-offs, so apps run very quickly on our hardware whether they use old features or new features. From an architecture standpoint, it's not, ‘Did you follow the spec or not?'—they are perfectly compliant with the spec as are we—it's really just, ‘Where do you spend your gates?' and our decision was that you sort of have eight pipelines that run everything at the same speed and it doesn't matter whether it's a basic DX7 instruction or DX8 or whether it's a more complex DX9 instruction. You have enough parallel resources there to keep the speed up no matter what's thrown at it. CPU: Do you think that part of this difficulty is that HLSL has made writing DX9 shaders a faster proposition? That people can actually jump in and use some of these features sooner than in previous versions of DirectX? 
Thompson: It lets them generate more complex cases quicker. Keep in mind that a DX8 shader might only have been 5, 6, 8 instructions, so from that perspective I guess they weren't overly complex. But there was a mental hurdle to get over in moving from fixed-function texture accesses to this whole programmable thing. And there's a bit of that transition [still] because you can think of DX8 as sort of not really programmable. If you want, you could look at it as sort of more flexible fixed function. DX9 really is programmable, and so some developers are still getting over that mental leap. But yeah, HLSL makes it a lot easier to think about things in the constructs that they are accustomed to and not have to think about how to write efficient assembly code. Because writing efficient assembly code means that you have to have some understanding of the underlying hardware and we are revving the hardware so often that it's a nightmare for developers. Only the top developers could really try and tune for the hardware on their own, so HLSL takes away many of those issues by saying, ‘Write code at a level you are accustomed to. Write it in simple human-readable form and let the compiler and the driver and the runtime compiler deal with everything.' So that's made it easier for them, but I don't know that it has anything to do with the performance differences on one architecture versus the other. CPU: Obviously this has gotten into a dicey issue with all of the Valve stuff happening, which is not what this article is about but unfortunately we're getting into a lot of those issues because those are the questions that are coming up. If the shoe were on the other foot and there was a developer who had worked closely with NVIDIA on their code and ATI came out as being very slow, would you work directly with the developer to find out what the reasons were? Thompson: Absolutely. 
We fully expect that every good developer is going to be working with at least ATI and NVIDIA both, if not with other people as well. Some of it comes down to: Do we as hardware companies have enough resources to work with them? But I'm just going to presume that NVIDIA has plenty of resources to do that and so I'd be surprised if any good developer was not talking to both of us on a very regular basis. Ultimately, the decisions are up to [the developer]. All we can do is say, ‘Here's how to implement some new effect that you're looking to implement in an efficient way,' or, ‘Here's how to make this run faster for our hardware,' and if somebody else is delivering them a different message for whatever reason, then they're going to have to decide what they want to do. It's their product. It's up to them. We can just do as much as possible to make sure that they're educated. But as I said, most of the stuff is generic and it's in our best interest to ensure that the games get out and that they work well. If somebody's got a problem on NVIDIA hardware and our guys are there working with them, we will help them debug that, and I suspect that NVIDIA's guys would do the same thing. We're not trying to introduce bugs so that something doesn't run on their platform or something like that, because in the end, we benefit from new technology being shown off in cool ways in the marketplace. If there are cool games out there that are taking advantage of new technology, that helps us sell more product. So we don't want to do anything that could sabotage that and sort of hold the market back. It's in our best interest, even if it helps a competitor, to make sure that stuff works. CPU: Are there new features of version 3.0 pixel and vertex shaders that you guys support and NVIDIA doesn't, or are you strictly 2.0 at this point? Thompson: There is nobody that has a product shipping today that has any 3.0 features exposed, NVIDIA or anybody else. 
CPU: ‘Exposed' meaning they could be in the hardware but not exposed in the drivers. Thompson: I wouldn't know, but there is no product out there that is, from an API standpoint, setting the checkmark that says, ‘I support 3.0 shaders,' either vertex or pixel, and you can use them. I suppose there is a chance somebody's got something in hardware that's not being exposed, but I kind of suspect that's not the case. CPU: Do you think that means we won't go through the version 2.1, 2.2, 2.3, back and forth? Thompson: I would be shocked if we [did]. Ultimately, it's up to Microsoft but I think one of the intents of having the 2.0 and 3.0 shader model both be in DirectX 9 at the same time was to provide some stability to developers so that they knew they could count on that being there for a while and they could write their apps to it and not have to worry about exactly the situation you just described happening. There's no guarantee, and obviously in the end, it's Microsoft's call, but I believe that everyone's intent was to provide some longer stability to the platform. CPU: We are seeing different documentation coming through that notes ‘ps2_a' or ‘ps2_x.' Thompson: They added [that] to the HLSL compiler. I don't know if flag is the right word for it, but there is a mode that takes advantage of the partial precision support in NVIDIA's hardware. So a developer can set that and say, ‘Go ahead and use partial precision, it's OK.' That's what that other mode is. That came out in the recent update to the SDK. CPU: So that is an add-on, but it's not additional features, it's an accommodation for partial precision? Thompson: Right. Partial precision is something that is supported in the core API. I don't recall exactly how or if it got exposed in the first release of the compiler, but you can now officially specify to use partial precision in the HLSL compiler if you want. Obviously, to us it doesn't make any difference. 
We would just ignore that and do everything at full [24-bit] precision. CPU: The whole 32-bit vs. 24-bit situation: I'm assuming that DX10 will be IEEE32. Where will that leave this generation of hardware for ATI? Thompson: I can't comment specifically [on] what DX10 is going to have, but certainly you would expect that as graphics evolve, things will get to be 32-bit float everywhere. Our feeling is that 24-bit is more than enough precision for the complexity of features that are available in DirectX 9, the current API. You can't come across situations now where you can write shader programs where you say, ‘Gee, 24-bits just isn't enough for this. I'm getting precision errors,' or things like that. It's vastly more precision than you're getting in 16-bit, for instance. But as you move forward, you will get more complex features. You will get people running shader programs [that] instead of being 20 instructions, will be thousands of instructions a few years down the road. So [with] the accumulation of error, sure, you'll want more precision. But at the same time, whatever the API looks like a couple of years from now, the products that we are shipping today aren't going to support that future API. They're still going to support 2.0 shaders and DirectX 9. We believe that the precision that's in the API is very well balanced to the features that are in the API right now and would expect that next time around that will continue. CPU: Any comments on at what point ATI decided to go for 24-bit? Was there some confusion about when the pixel precision spec for DX9 was finalized? Thompson: There was a clarification on the partial precision stuff that came out after the spec was released, but that had nothing to do with the 24/32-bit thing. There was a little bit of stuff that wasn't quite clear in the spec, but I can't say that that actually reflected a lack of understanding behind the scenes. 
CPU: So you feel that it was very clear from Microsoft that it was always going to be 24-bit? Thompson: Yeah, certainly for some reasonably long amount of time, I don't know exactly when that decision came, but yeah, it was very clear that (quote) "full precision" was anything 24-bits or above. CPU: OK. What happens when developers are developing on your hardware so they are shooting for 24-bit and then in order for it to run on NVIDIA's hardware, they have to choose between 32 and 16? But if it's originally 24, 32 is kind of a waste and 16 means you're losing something. Doesn't it seem that either of those choices is the worst of both worlds? Thompson: Obviously, knowing what the API was, we felt that the best choice was to match what the API said. So if the API said 24+, then we implemented 24 because to do more than that was essentially a waste of silicon that could be better used for higher performance, for instance, in our case. Ultimately, NVIDIA's got to say why they made the choice they did in that way, but you won't provide developers unexpected results. If they're coding for (quote) "24" (and it's not like they're coding for 24, they're just using full precision), it happens that the internal hardware calculations are at 24 or 32, they're not going to see any difference. That is not something they have to worry about. Now if they decide to go down to partial precision, that is something they have to worry about. They are actually going to have to test those cases to make sure that there is no banding and that there is enough precision for texture addressing or whatever it might be. They just have to test those cases and decide, ‘Hey do we get more performance, and if so, are there any visual anomalies that we have to worry about?' But 24 to 32, they are getting all the precision that the API is guaranteeing them and we've literally had zero people say, ‘You know what, I'm just out of precision.' We've never had one person [make] that comment to us. 
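[A rough way to see why partial precision needs testing for texture addressing is to compare how finely each format can resolve a coordinate. The sketch below is an illustration, not vendor code. It assumes the commonly cited mantissa widths of 10 bits for 16-bit floats, 16 bits for 24-bit floats, and 23 bits for 32-bit floats. The interview itself does not give these layouts, and the function names are hypothetical. Exponent range and denormals are ignored.]

```python
import math

# Assumed explicit mantissa widths (not stated in the interview).
MANTISSA_BITS = {"fp16": 10, "fp24": 16, "fp32": 23}


def quantize(value, mantissa_bits):
    """Round `value` to a float that keeps only `mantissa_bits` explicit
    mantissa bits plus the implicit leading bit."""
    fraction, exponent = math.frexp(value)  # fraction is in [0.5, 1)
    scale = mantissa_bits + 1               # include the implicit bit
    rounded = round(fraction * 2 ** scale)
    return math.ldexp(rounded, exponent - scale)


def steps_per_texel(texture_size, mantissa_bits):
    """Number of representable coordinate values per texel for texture
    coordinates in [0.5, 1), where adjacent values are spaced
    2 ** -(mantissa_bits + 1) apart."""
    spacing = 2.0 ** -(mantissa_bits + 1)
    texel_width = 1.0 / texture_size
    return texel_width / spacing


if __name__ == "__main__":
    for name, bits in MANTISSA_BITS.items():
        print(name, steps_per_texel(2048, bits), quantize(0.7, bits))
```

[Under these assumptions, a 2,048-texel-wide texture gets one representable coordinate per texel near the far edge at 16-bit precision, 64 per texel at 24-bit, and 8,192 per texel at 32-bit. That is consistent with the point above: moving between 24 and 32 leaves plenty of headroom, while dropping to partial precision is the case a developer has to check.]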
Now people certainly have said things that [for] things they want to do in the future, they would want more precision. But the API will evolve in the future, as well and will provide them more capabilities at the same time. CPU: It seems that ATI has timed this very well. You were early with the hardware and so you've had time to tweak and perfect things and add features. How much of that is making good decisions early on that panned out and how much is it that ATI has changed the way it approaches hardware development? Thompson: A legitimate decision that we made was that we didn't want to have a quirky piece of hardware. We wanted something that was a very straightforward design, that would not have wildly unexpected performance differences. One of the biggest gripes if you talk to some of the top developers is, ‘I'm doing stuff and my game runs at 60 frames a second and I made this one little change and now it runs at 30 frames a second. What the hell did these guys do? There's this one operation in their hardware that runs 10 times slower than everything else.' And developers hate those kinds of surprises. To be honest, in some prior generations of hardware, we had some of that kind of stuff. In the end what happens then is that it impacts on your drivers because then you try to cover up the quirkiness in your hardware with your drivers and so that makes your drivers harder in development and harder to QA and to keep bugs out of and all that stuff. So we fundamentally tried to have a very clean hardware architecture so that it made our driver development easier and so that there were no surprises to developers. I think if you went out and asked developers about any of our R300-based products, how they perform, one of the biggest things they would say other than, ‘Yeah, it's faster, it's great,' is, ‘Hey, it performs the way I expect it to perform. It's very consistent.' So that has worked out phenomenally well for us. 
From a competitive standpoint, it's great that it appears people are having more challenges on other platforms, but realistically, obviously we had no idea what anyone else would do. Our goal was just to have an easy platform for developers to work with. CPU: How much have you had to ramp up your developer relations teams to cope with the fact that you guys are leading now? Thompson: Pretty significantly. Obviously it takes time to hire and get people up [to] speed, so I wouldn't say that we said, ‘Hey. Wow. We're successful. Let's hire.' They were sort of happening at the same time. We realized a few years ago, and I think really 3dfx was the trendsetter in this area, that having those developer relationships was really important, even if you look at the expenses. Everyone talks in business about how much customer support calls and things like that cost. If we have a good group that at least ensures that every game released to the market got tested with our hardware before it got released, think about the amount of calls we can save. So it's very easy to justify that and once you have a basic support like that in place, then you say, ‘Now that we have established connections and relationships, what could we do if we got early hardware into people's hands and they actually took advantage of the capabilities of our products to make a better game or whatever the application may be?' and we've really built up to that. So in the last two years, we've gone from what you arguably could call zero ISV support: one or two people that sort of considered themselves doing that work but not something that you'd be proud of. I have maybe 15 people now roughly, doing direct developer support, and then if you start getting into tools and SDKs and other things like that, it's maybe 25, and that's worldwide. So it's a reasonably sizable group and we keep up with an enormous number of developers on a regular basis. 
CPU: Is there anything you'd like to say about the very aggressive competitiveness that has manifested itself between hardware companies and all the accusations flying back and forth? After the Half-Life 2 announcement, Valve was hacked and their source code released onto the 'Net amid rumors like, ‘Oh, ATI tweaked the game code to not support NVIDIA hardware' because an ATI programmer's name was found in the comments. Is there anything you'd like to say? Thompson: There are obviously lots of conspiracy theories on that stuff, especially with the big developers, like a Valve or an id or somebody like that. Gabe has made a lot of public comments and I think that all of the facts point to the fact that he is being very straight up on it. That is that any amount of work that we do with them, either engineering support or money for co-marketing deals or anything, would pale in comparison to them having a successful game. Keep in mind that with any of these guys, it's not only their game but then they're making game engines that they license to other people. So their success is not in partnering with a hardware company, their success is in having a phenomenally great game. So it would be incredibly foolish of any of these guys from a business perspective to sabotage their product on one platform or the other. In Valve's case, they gather statistics from their users' platforms on what CPU, what graphics, etc. are being used in their system, and because of the way the market looked a couple of years ago, the bulk of the people that were playing Half-Life (one) and the related games, had NVIDIA hardware. That's the crux of his comments where he said, ‘It wouldn't make sense for me to ignore or not optimize for the NVIDIA platform,' because that's what the bulk of their current users have. So it just doesn't add up to say that they would do something to sabotage one platform or the other. 
CPU: Is it your opinion that any developer who's in that high of a profile position would have to get acceptable performance on both major platforms before they could release the game? Thompson: I would suspect that it would be to the point that if they felt they were not getting good enough performance on one platform, they would actually go back and change the game. CPU: I actually asked Gabe that at shader day and was kind of surprised at the response. I asked him if they had considered that and he said no. Thompson: Different companies are motivated by different things. Valve has always been very motivated by the technology and what that can do to the gaming experience so they're on the forefront of people who are using DirectX 9 features. They're pretty much using everything and they are doing it in cool ways. So from Gabe's perspective, for them to just say, ‘We're not going to use DX9 because we don't think it's fast enough on every DX9 piece of hardware out there,' is not something they would consider. But what they are doing is scaling back on some of those features for some pieces of hardware so that it still looks good although maybe not quite as good as it could with some of those extra features, but it runs at good interactive frame rates. Obviously, Gabe would have to tell you specifically about the tradeoffs they made and why, but they are a company that is very motivated by doing cool things with the technology. Obviously, if it's a mass market title—the Sims or something like that—they're going to be looking for a lower bar for their mainstream user and they aren't going to have an expectation that everybody running the Sims is going to have a $400 or $500 graphics card. It's really just about what type of title they have and who their target audience is and things like that. 
Of course that said, I believe that Valve is actually scaling back to DirectX 6, so they will not [be] alienating people that have older hardware, but obviously you'll get a much richer visual experience if you've got newer hardware. CPU: In that regard, Valve obviously started Half-Life 2 development before DX9 was released because they have been working on this for quite a while, so at some point all they had was a DX8 target. Do you feel that the DX9 artwork and shaders that they have added are just the tip of the iceberg of what people will eventually be doing with DX9 and programmability? Thompson: Absolutely. They certainly had an eye to what was coming. I suspect that they had probably talked to Microsoft for quite a while. We had certainly been giving them insights into what was coming in future hardware, as I'm sure NVIDIA and others would have been. So from that perspective, it's not like they designed a DirectX 7 game and then went, ‘Oh my gosh, we need to redesign this.' I think they went in knowing that. They've talked about the flexibility in their engine so that they could easily adopt future shader models and things like that as they move forward. I think that for the next little while, they will have set a new bar for image quality and things like that and what they do. But that said, I think they have a good platform to keep building on for 3.0 shaders and beyond, as well. CPU: Gabe's really strong message to journalists was ‘You guys need to change the way you benchmark because benchmarks are being targeted and we are getting bad data.' Do you think that if hardware reviewers use custom benchmarks and that sort of stuff, that the practice of targeting benchmarks for optimization in drivers will stop? Thompson: No. The level of optimization that people will go to, that's a moving target, as well. 
If somebody is optimizing something because they know it's on a canned path and is always going to be the same, and if you change the path, maybe you had 10 optimizations before and now only eight of them are still valid. There's still open debate as to whether you should optimize like crazy for a known application or whether you should have your drivers be sort of more general-purpose and identify general optimizations that can happen in the runtime compiler. I think in general we [ATI] favor you should only be doing general-purpose stuff. Certainly some of the stuff that Gabe highlighted is really bad, but in general it just gets really sticky when you start identifying specific sequences in apps and optimizing them. There are very rare cases where an app does something that just breaks on hardware or something that's just incredibly inefficient. That happened a lot a few years back. It's pretty rare that that happens now. So very infrequently would we do something where we just really have to optimize something in an app because it became a very popular app out of nowhere and there's something bad in it. But traditionally, I think it's up to the people that are running the tests to keep changing things up and to have a critical eye on what they're looking at. One of the challenges is that [for] a lot of the people that are doing this, it's become so technical that it's hard for them to even keep up and understand what's going on, which is one of the reasons that we did the shader day event—to hopefully provide some more background info for people, at the very least, to get them asking more questions and hopefully to have educated them a bit further on things. But I don't think it's going to slow down. It continues to be a competitive marketplace. I suppose if the way customers made their buying decisions changed, in the long run maybe it would change some of this. 
But what tools do they have to make better buying decisions than they have at their disposal right now? I'm not sure. It's a tough one. There is certainly no obvious answer. CPU: Do you think that it will just stay an "under the covers" very competitive thing? If there is an app that comes out of nowhere and is very popular and you know there is something very backwards for your hardware, will it ever be OK for ATI to say, ‘Look here's the optimization we did for that and here's why we did it.'? It seems like the biggest thing that is poisoning things right now is the loss of credibility. That kind of straightforward presentation might fix that. I think it also might educate consumers that coders aren't gods, they don't do everything right the first time, and hardware has to run. Thompson: So it should be OK to do that. When all the stuff around 3DMark03 came out and people started to disable all the optimizations and we had a couple percent in performance [decrease] and we said, ‘Look, we optimized this shader by hand so that it would do co-issue.' And we still stand behind the fact that that's a good thing to do. That's a good optimization. That does not change anything about the mathematics, the intent, or anything about the shaders. But we backed it out of the driver because it was a touchy one, because it was a synthetic test and we said, ‘It's not worth arguing about. We'll just back it out.' But in the long run you want to be in a position to do stuff like that because it's a good thing. When you start doing things like saying features that we've had around for five years like trilinear mipmapping, when you're not using those features anymore because you can save bandwidth and get slightly higher performance, that seems to be a touchy line there because you're really just saying, ‘I'd rather have 200 frames per second than enable basic features that improve image quality.' So how do you police that? How do you justify it? How do you make those decisions? 
I don't know. In the end, it's just people making the choice to buy product A or product B instead of somebody who's really enforcing things. I think that is part of what Gabe touched on where he said, ‘It's up to developers to keep an eye on their own games.' Put good benchmarks in the game because in the end, people are trying to use those benchmarks to know what hardware to buy to play games or to run a workstation app or whatever it may be. So if there's good benchmarks built into the apps, that's a help. And if developers keep an eye on what's going on, ultimately that's the best thing because it sure helps to have source code at your disposal. A developer can play with their own source code and look at things and say, ‘Gee something looks weird, I think something funky's happening,' or what have you. If there is a third-party guy, a Web site guy, trying to look at it by comparing screenshots, it's much, much harder. So I think that's a good trend if we get especially the top developers to take a stronger interest in the benchmarking at the app level and to put good benchmark capabilities with lots of flexibilities in the apps so you can compare things 50 different ways. I think that's a good trend. CPU: Any other problems that you see emerging at least for this Christmas season? Have we had our big loud noisy controversy? Thompson: Well it's always hard to tell what's going to become a big loud controversy. I didn't think it would have been quite as controversial as it was already. No, I would hope that now we'll just ship product and everybody's happy, but I guess we'll have to see. CPU: Were you aware of what Gabe was going to say at shader day? Thompson: No, not really. We knew about his feelings on some of these topics. Obviously, we had laid out a general flow for the day. We wanted to cover shaders and their importance and their use from a hardware perspective, an ATI perspective, and a software perspective. 
But we really left it up to each of the presenters to, within reason, do what they wanted with that. So we didn't know exactly what Gabe would say about that. CPU: I did get the feeling that there were a lot of nervous people in the room at the point when he started talking. Thompson: [laughs] But I think, in general, it's great if the personalities in the industry do talk publicly about things because a lot of times there is all this sort of industry gossip going on behind the scenes and there is a lot of confusion and depressing stuff happening out in the open. So I think it's good, even if it fuels public debate. At least it sort of gets some stuff out there in the forefront and sort of fuels the debate and I think that in the long run, that's probably a good thing. David Kirk NVIDIA Chief Scientist First Session, Sept. 25, 2003 CPU: I'd like to talk a bit about NVIDIA's shader-optimizing compiler technology. What kinds of optimizations does NVIDIA do to compensate for code that has not been developed on or optimized for NVIDIA hardware? Kirk: The observation to make about programmable GPUs is [that] where we are is where CPUs were in the early 1980s. When there were a variety of different CPU architectures, there was a lot of compiler technology being developed, and one of the things about the early CPU technology was that the performance of code that was generated for the CPU can be vastly different depending on instruction ordering, on register usage, [and] on a variety of other factors. And so we're at a stage now with the GeForce FX processor technology, it's really our first generation of fully programmable graphics pipelines, and the pipeline is quite sensitive to what are, from the developer's point of view, irrelevant nuances of ordering of computations. 
One example of that, a very simple one, is that the hardware is much, much more efficient at doing a sequence of two texture accesses and then two math operations than doing a texture access, a math operation, a texture access, and then a math operation. In other words, interleaved. So, the first thing we need to do with compiler technology is essentially smooth off the rough bumpy edges and, frankly, the cliffs in performance between details of how the code is written. So without even changing any intent of the developer, we have a lot of work to do in terms of choosing better instruction ordering, doing things such as combining a multiply and an add into a mul-add hybrid instruction that we have and scheduling instructions so that they can be issued in parallel and rearranging the register usage so that we don't use up as much internal memory space on the card. The next level of optimization that we need to do is (as you pointed out, developers aren't perfect) sometimes people write code that is in sequence but doesn't get executed. Sometimes people write code that calculates values that don't get used. Sometimes people set, in an example from before programmability, attributes to the same value multiple times before using it. So we do optimization much like a CPU compiler does. We build a code tree; we parse the DX9 pixel shader code into a directed acyclic graph that describes the execution of the program. From that graph we can do various transformations which are synonymous; they produce exactly the same result to within a half of a least significant bit. (Because of math ordering, if you change it around, you may change the least significant bit.) And all those transformations happen without changing the result of the computation. The next thing we can do is we can look at the result that's been created.
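[Illustrative sketch, not part of the interview: the dead-code removal and multiply-add combining that Kirk describes can be pictured as two small passes over a list of shader instructions. The tuple layout, the register names, and the `mad` opcode spelling are assumptions made for this illustration; nothing here reflects NVIDIA's actual driver compiler.]

```python
from collections import Counter

# Hypothetical instruction form: (dest, op, sources).
# Each dest is assumed to be written exactly once (SSA-like).

def eliminate_dead_code(program, outputs):
    """Keep only instructions whose results can reach a shader output."""
    live = set(outputs)
    kept = []
    for dest, op, srcs in reversed(program):
        if dest in live:
            kept.append((dest, op, srcs))
            live.update(srcs)          # its inputs are now needed too
    kept.reverse()
    return kept

def fuse_multiply_add(program, outputs):
    """Fold a single-use multiply into the add that consumes it."""
    uses = Counter(s for _, _, srcs in program for s in srcs)
    uses.update(outputs)               # an output counts as a use
    defs = {dest: (op, srcs) for dest, op, srcs in program}
    fused, absorbed = [], set()
    for dest, op, srcs in program:
        if op == "add":
            for i, s in enumerate(srcs):
                if s in defs and defs[s][0] == "mul" and uses[s] == 1:
                    a, b = defs[s][1]
                    # Fusing can change rounding in the last bit.
                    fused.append((dest, "mad", (a, b, srcs[1 - i])))
                    absorbed.add(s)
                    break
            else:
                fused.append((dest, op, srcs))
        else:
            fused.append((dest, op, srcs))
    # Drop the multiplies that were folded into a mad.
    return [ins for ins in fused if ins[0] not in absorbed]

program = [
    ("t0", "tex", ("uv",)),
    ("t1", "mul", ("t0", "light")),
    ("t2", "mul", ("t0", "t0")),       # result never used: dead code
    ("t3", "add", ("t1", "ambient")),
]
outputs = ["t3"]

live_program = eliminate_dead_code(program, outputs)
optimized = fuse_multiply_add(live_program, outputs)
for instruction in optimized:
    print(instruction)
```

[In this toy program the unused multiply disappears and the remaining multiply and add collapse into one instruction, so four instructions become two while the value written to `t3` is computed from the same inputs.]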
If the shader, for example, is taking a color value and multiplying it by a lighting value and writing it to an 8-bit frame buffer component (32-bit, 4x8-bits) then that calculation does not need to be done in FP32. In fact, it does not result in a different answer if it is done in FP32. It can actually be done in 8-bit integer, and you get the same exact answer. So those are other kinds of optimizations that are a little bit more aggressive that we can do that change the computation but again still produce the same result. I'm not aware of optimizations being done by the optimizing compiler that change the shader so that it skips steps or removes work, and I am philosophically against having NVIDIA do that. If there are optimizations like that to be done, I would rather have them be done by having our developer engineers work with game developers to produce alternate code paths for different pieces of hardware. CPU: Along those lines, how familiar are you specifically with the compiler? How closely are you familiar with it? Kirk: I am not involved in the development, but I am a member of the team that does code reviews and does what we call concept reviews. Some of the things that I've just described to you are guidelines for honest development and honest optimization, and we've adopted a set of guidelines to help the individual programmers and the team members to decide what are reasonable things for them to be doing. One of the difficulties I think you are alluding to by asking how am I involved in it is we have over a thousand software developers at NVIDIA working on various drivers and tools and other pieces of technology, and each of them may make choices which I or someone else might not agree with philosophically. So we've adopted a set of optimization guidelines along the lines of what I've described that we can use for the review process to help people to decide and make sure that what they're doing is a valid optimization. 
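[Illustrative sketch, not part of the interview: Kirk's claim that a color-times-lighting product written to an 8-bit frame buffer component comes out the same in FP32 and in 8-bit integer can be checked by brute force under some assumptions. The assumptions here are that both inputs start as 8-bit values, that they are normalized by 255, and that the write to the frame buffer rounds to nearest. The function names are hypothetical.]

```python
import math
import struct

def to_fp32(x):
    """Round a Python float (a double) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]

def shade_fp32(color8, light8):
    """Multiply two 8-bit values as normalized FP32 numbers, write 8 bits."""
    c = to_fp32(color8 / 255.0)
    l = to_fp32(light8 / 255.0)
    product = to_fp32(c * l)
    return math.floor(to_fp32(product * 255.0) + 0.5)

def shade_int(color8, light8):
    """The same multiply done in integer arithmetic, rounded to nearest."""
    return (color8 * light8 + 127) // 255

# Compare every possible pair of 8-bit inputs.
mismatches = sum(
    1
    for c in range(256)
    for l in range(256)
    if shade_fp32(c, l) != shade_int(c, l)
)
print("pairs that differ:", mismatches)
```

[Under these assumptions no pair of inputs differs, because the exact product never lands close enough to a rounding boundary for FP32 error to matter. Inputs that are not 8-bit to begin with, or longer chains of math, would need their own analysis.]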
CPU: Is this something that NVIDIA is interested in making public? Do you have the intention of sharing this compiler with developers so that they might optimize themselves and see how their code gets changed so that they can then write code that is more optimized for your hardware? Or at this point is this strictly a driver thing: this compiles at runtime and is a driver issue and it will stay that way? Kirk: There are multiple levels where this comes into play. One of the things is that the machine language—the actual bits that the GeForce FX hardware consumes—are not the same exact instruction formats as the DX9 pixel shader, so there [are] multiple stages of reinterpretation and compilation that happen. For example, if someone has written HLSL code, the HLSL compiler from Microsoft has to generate pixel shader 2.0 instructions, which are then passed to the driver, and they can generate instruction sequences that are optimized either for NVIDIA or for ATI hardware or they can generate a generic instruction stream if identification hasn't been made of what hardware it's running on. Then [for] that instruction stream, even if we weren't doing any optimization at all, we have to consume the pixel shader instructions and emit hardware instructions. And that part of that process is always going to be core driver technology that we're probably not going to publicize. But I think the kinds of things we can talk about, and I'm working on putting together some material to present to people such as yourself, and you can work with Brian to schedule a time over the next few weeks to talk about it. I would like to communicate our optimization strategy more widely and openly and get comment and feedback from people such as yourself and some developers so that you understand what we're doing and so that you're comfortable with it, and the parts that people aren't comfortable with, we're willing to consider changing, but I think that what we're doing is very, very conservative. 
We've talked to a few developers and we've talked to Microsoft, and so far everyone we've talked to is happy with it. Really most of the techniques that we're doing are the same things that the compiler does for the Pentium 4, just regular compiler techniques. CPU: Do you regret the decision of going with 32-bit with the option of dumbing down to 16 when ATI took the straight 24 and in some ways that gives them a short term advantage? Do you still feel that it was a good choice to make and for longevity that's going to be a benefit to you in the next go round? Kirk: One of the ways to answer that question is you should ask ATI what precision they are going to use for the next generation that they're building. In the short term I have to say that I regret that we and Microsoft and ATI were not able to settle on a standard precision as a consistent answer. FP24 is a really funny choice to me intellectually. It's way overkill for a purely color calculation, and it is underkill for texture math or geometry or other kinds of processing, so it's kind of an orphan precision. In the process of development of DX9, we had an issue with discussion of precision, and the precision proposal changed over time. At one point in the spec, FP16 was the required precision for DX9 and FP32, which we added, was a bonus. At another point in the spec, FP24 and FP32 were the standards. In the final version when DX9 was first released, it was left unspecified, and the thing that happened then was precision guidelines were clarified after the release of DX9, and so we are in the situation now where with 20/20 hindsight, I wish we had made a choice that was faster. The ATI choice of 24, I think, is unfortunate and it will be with us for a while, but it does benchmark faster than 32.
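[Illustrative sketch, not part of the interview: the three precisions under discussion differ mainly in how many mantissa bits they carry. The bit layouts below (10, 16, and 23 explicit mantissa bits for FP16, FP24, and FP32) are the commonly cited ones and are an assumption here, as are the 2048-texel texture and the sample coordinate. The helper ignores exponent range and only models mantissa rounding.]

```python
import math

# Assumed layouts: (sign bits, exponent bits, explicit mantissa bits).
FORMATS = {"FP16": (1, 5, 10), "FP24": (1, 7, 16), "FP32": (1, 8, 23)}

def round_to_mantissa(x, mantissa_bits):
    """Round x to the nearest value with the given explicit mantissa bits."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)                 # x = m * 2**e with 0.5 <= |m| < 1
    scale = 2 ** (mantissa_bits + 1)     # implicit leading bit + explicit bits
    return math.ldexp(round(m * scale) / scale, e)

def relative_step(mantissa_bits):
    """Spacing between adjacent representable values just above 1.0."""
    return 2.0 ** -mantissa_bits

# Hypothetical texture-math case: a coordinate into a 2048-texel texture.
TEXTURE_SIZE = 2048
u = 1234.3 / TEXTURE_SIZE

texel_error = {
    name: abs(round_to_mantissa(u, layout[2]) - u) * TEXTURE_SIZE
    for name, layout in FORMATS.items()
}

for name, layout in FORMATS.items():
    print(name, "relative step:", relative_step(layout[2]),
          "texel error:", texel_error[name])
```

[For a coordinate in the upper half of a 2048-texel texture, mantissa rounding alone can move an FP16 coordinate by up to half a texel, an FP24 coordinate by up to 1/128 of a texel, and an FP32 coordinate by far less. An 8-bit color channel, by contrast, only needs steps of about 1/255.]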
End First Session Begin Second Session October 3, 2003 Kirk: People think what's happening now in no way has ever happened before and sometimes that's true, but there are a lot of things in technology that we see repeatedly. It's very exciting what's happening with programmable GPUs, but in terms of some of the difficulties we're having with the press and the consumers understanding and measuring what we're doing, these are not new problems. We've seen exactly this kind of turmoil before, and I lived through it. CPU: Is there a similarity to what the optimizing compiler does that's comparable to out-of-order execution for a CPU? Kirk: Absolutely. In the case of CPUs, out-of-order execution has come about to make the hardware able to more efficiently process the instructions and in the case where (with the CPU program) there's a sequence of instructions that either can't be executed at full speed in that sequence or due to some external constraints, like memory. Like you're trying to do a multiply between two values that you're getting from memory and one of them is not in cache so you take a cache miss and you stall. It gives the CPU the ability to go ahead and do some useful work, kind of get ahead, while it waits for that resource to become available. On the GPU side, we do not yet do very much out-of-order execution. We do some. But some of the instruction sequences that are either not pipelineable back to back due to some resource conflict, or if we get stalled because, for example, a piece of texture information is on the other side of AGP and not in the cache, or in the graphics memory and not in cache, in both of those cases we would stall and we would waste computing time. So what we're doing in the compiler is better ordering to get better throughput for the cases where we can't do the instruction reordering and out-of-order execution in the hardware. That's what compilers have done on the CPU side, as well.
They still do that, but there's just not as much support for that in the hardware yet on GPUs. I expect there will be in future generations. CPU: There is much more discussion this year about a GPU being an all-purpose processor and people using it for all kinds of things that don't have anything to do with Doom III. That's starting to be a very interesting proposition for scientists and researchers. Has NVIDIA been approached, and have you thought about doing presentations along those lines in terms of how complex GPUs are getting and how general purpose they are getting? Kirk: In this generation we're still in the realm where it's really only for very sophisticated programmers to try to take advantage of that. We are aware of some universities and some research labs that are trying to use GPUs to do general purpose kinds of number crunching and scientific computing. It is a fact that in terms of gigaflops per dollar, the GFFX is the best buy on earth. It's the most floating point per dollar. At this point, though, there is some amount of contortions that the programmers have to go through to access that. The programming model that GPUs have through DirectX 9 and the associated APIs and OpenGL are not really as general as people are used to for scientific computing, but we're working on that. The next generation of hardware will be a little more flexible and a lot more powerful, so we should see more of that. I'm also looking forward to the day because we're the most cost-effective floating point. I think that very soon you'll start to see there are these competitions where people try to get the fastest performance on standard scientific computing benchmarks and the national labs publish a list every month or every quarter. I'm waiting to see when one of those top 10, hopefully the top performer, is an NVIDIA GPU doing the math. Right now, we're the only GPU that can really be used for this because we have 32-bit floating point. 
The 24-bit floating point that ATI has just isn't useful for any kind of real computing. CPU: DX8 vs. DX9, are there starting to be difficulties in running out of space on the die when you design a new generation and you have to keep the backwards compatibility, or are you migrating operations over to the new architecture but just doing it in a simplified fashion? Kirk: The first thing is the way that our compatibility works on the hardware. It's really not a big hardware burden to keep the previous generation's functionality. The new things, like [when] you compare DX9 vs. DX8, the hardware [DX9] requires is so much bigger and more complex than the hardware that's required to do DX8 that keeping compatibility for DX8 is relatively small for those functions. The way that the evolution of DirectX works, DirectX 9 is a superset of DirectX 8. You can't support DX9 and not support DX8, so it's really not a question. In each generation we fit as much performance as we are able to fit in the square millimeters we have. CPU: Some of the buzz has been a little bit about ‘Why is new hardware slower in DX8 than old hardware was; is there some penalty for having all this extra DX9 hardware onboard?' Kirk: I'm not aware of any cases where the new hardware is slower for the same price/performance class. I'm not aware of any situations where the older API is slower. Another thing, just in terms of how products are introduced and how they're perceived: The GeForce FX 5600 when we announced it, we talked about its better performance than the previous high-end Ti 4600, and [that's] probably true in many cases for DX8, but it's probably not true in every case. The other thing [is] people might say ‘Oh, yeah, it's getting slower in DX8,' but it's also getting half as expensive, so it's not a step back. It's a big step forward in price performance, but maybe in some cases there are small steps back in absolute performance.
CPU: On the hardware side there is obviously a big difference between DX8 and DX9 in terms of what it had to be able to do. Can you pinpoint some of the potential differences between DX8 and DX9 that users will readily perceive? For a modern game, there are definitely times when I don't see a lot of difference between a gorgeous DX8 and a gorgeous DX9 rendering of the same DX9 game. Kirk: That's a very good observation, and I think it's indicative of how the developers are building DX9 games. One of the difficulties of making a modern game is there's a tremendous investment that's required in the artwork. The creation of the textures, models, objects, characters; artists are expensive, and programmers—I won't say they're not as expensive—but it doesn't take as many programmers to improve the shading from DX8 to DX9 as it does to have artists completely change all the art for the entire game. So I think what you are seeing now is just the dawn of DX9 games, and so primarily what you're seeing is basically DX8 content with spiffed-up programming. With that approach it's not possible to really show off the full potential of DX9. CPU: So you think that there will be very large visual differences once the artists can leave a legacy title behind and start from scratch working in DX9 and start thinking in terms of what DX9 can do? Kirk: Right. What you are seeing now are games that were begun with DX8 and then upgraded to ship with DX9 support at the end, so they weren't conceived with DX9 in mind because DX9 wasn't available when they were started. The typical product cycle for a high-end game is a couple of years. So what we're starting to see now behind the scenes as developers is some really aggressive adoption of DX9 shaders with some really new looks. I'm pretty sure that almost none of these [games] are going to make Christmas, but I think there are some pretty exciting things that are going to come out in the springtime. 
CPU: With DX9, HLSL, and Cg, how has the ease of use for the developer impacted NVIDIA's design and product cycles? Apps are going to arrive sooner that stress the hardware, and also maybe less capable coders are using shader languages to write code that they don't really understand. Kirk: [laughs] Well yeah, that's the process that happened and it happened with CPUs, also. [With] GPUs (and before GPUs, just graphics accelerators), the programmers on the game teams were very, very close to the hardware. They really understood every cycle, every operation, and they really twiddled all the bits by hand because there weren't that many bits to twiddle and they could really control—like a musician in a symphony orchestra—every little nuance of every little thing that was happening in the pipeline. The evolution of programmable GPUs has made the spectrum of possibilities of what they can do with the hardware so much more broad. Longer shading programs make the specification of what the shader processing is doing so much more complicated. It's no longer possible to do it by hand. The higher-level shading language lets them, to a higher level of abstraction, express their creativity without getting so much down into the details. This has advantages and disadvantages. One of the advantages is that their expression of what they want to have happen in the shader is nearer to their conception of it; how they think about it. The higher language is more conceptual and it fits in more with how they think about creating a special effect or rendering a material. The downside is, by moving that abstraction up and moving the representation closer to the way they think, it's moving the representation further away from the way the hardware thinks, and from the actual bit-for-bit operations in the hardware, and that causes inefficiency in execution and it causes the developers to have less ability to really target specific hardware well. It's a more complicated problem. 
The development cycle for them shouldn't be made longer by that because it should be easier for them to write their code. The performance tuning part of it at the end is a little more difficult because they really have to focus on making more things fast because there's more complexity. CPU: Is performance tuning something that a coder who is capable of writing shaders can do or does it require somebody who really understands the hardware? Kirk: Or the third case is an optimizing compiler that has a model of what's fast and what's slow [so it] can go juggle the instructions around to get better throughput. CPU: So is an optimizing compiler going to be available to developers to help them optimize? Kirk: Yes. The first real instance that we're offering is with our release 50 drivers. There is already some optimization in our previous drivers, but not nearly as sophisticated. There is also optimization that Microsoft is doing with their HLSL compiler. They have the ability to recognize what hardware you're running with and they actually do generate different code paths for NVIDIA and for ATI hardware. All of this technology is relatively young and immature, and it's really just starting to bear fruit in the release 50 timeframe. CPU: So does the Microsoft HLSL compiler reorder instructions and operations to reflect NVIDIA hardware vs. ATI's hardware? Kirk: Yes it does. In fact the HLSL compiler will [also] take advantage of other kinds of things that we can do or that they can do in terms of parallel execution of instructions or compound instructions like we can execute a multiply-add as one instruction rather than as a multiply and then an add. So it has very specific knowledge of the hardware capabilities and it takes advantage of those things. CPU: So if a title has been written and optimized on NVIDIA hardware, do you think that ATI is going to need a compiler in their drivers that would do something similar? Kirk: I'm sure they do have that, in fact. 
Because the fact is that the pixel shader instruction set from Microsoft is not a hardware instruction set. So [for] ATI and NVIDIA both, the hardware runs a different set of instructions than are expressed in the programmer-visible pixel shader instruction set, and they have to interpret one set of instructions into the other. As long as they're doing that, it makes a lot of sense for them to do things like reordering instructions or in fact just eliminating instructions that don't produce any results. That's one of the biggest things that happens. I'll tell you this because I'm a programmer too: Programmers are careless, and they write code and parts of the code just don't have anything to do with the answer, and it's possible for the compiler to analyze and remove that code so that you just don't waste time calculating things that don't even show up in the final picture. CPU: Do you have any kind of a specific reference that I can give for when DX9 artwork really comes onto the scene with DX9 programming; something that people would see that they don't see now? Kirk: All of this comes in terms of in the old days. Way, way back in DX8, people really programmed the graphics hardware as a configurable set of functional units. They had a very limited palette of choices for what they could build in terms of the shader or the look. The early part of game development is what they call "look" development. They brainstorm and they figure out ‘I want this game to be all shiny and sparkly and metallic' or ‘I want this game to be really soft and pastel and painterly.' That's a part of creating the look and feel of the game they are going to create. The palette of choices that they had with DX8 was relatively limited. One of the things that's difficult for me to predict is what the artists are going to think of with DX9.
The way you can paint a picture for your readers is with DX9 and the programmable hardware of the GeForce FX, we've brought the set of choices in the palette to a much wider, a much higher degree of flexibility. It's much closer to what you've seen in movies with movie special effects and with computer-generated films like Pixar films. So you can now have the creativity of the game artists doing the same kinds of things. Really the sky's the limit. If I could name what it was all going to be, it wouldn't be creative. CPU: Has the final spec for pixel shader 3.0 been set? Kirk: I believe that the 3.0 spec was part of the DX9 release. CPU: But 3.0 was not required to be DX9-compliant? Kirk: That's right. Well, DX9-compliant is kind of an interesting expression. If I'm not mistaken, you could have a DX8 piece of hardware with a driver that just doesn't signal capability for any of the new features in DX9 and that would be [a] DX9-compliant hardware and driver combination. It just wouldn't have any of the new capabilities. So in DirectX, new capabilities are signified by what are called caps, or capabilities bits. For hardware that has the new capability, the driver will set those capability bits so if an application queries the driver, it will find out what are the new capabilities that it has. So it's possible, I think, to be DX9-compliant without actually supporting any new DX9 functions. Because of that, there's also all the different things that can be signaled as capabilities. You can support the different pixel shader models and the different vertex shader models independently by signaling that you have that capability or you don't. CPU: Do you think that incremental DX9 pixel shader versions, sort of like what we saw with incremental DX8 versions of hardware, is with us as a marketing thing from now on? Kirk: Where we are now is we have a big market presence of pixel shader 2.0 hardware.
The next big thing coming is going to be hardware that can do pixel shader 3.0, and that will happen well before there is a DX10, for example. CPU: Do you believe it will be coming out in DX9 versions: DX9.1 or DX9.6, and we'll have pixel shader 2.1 or 2.2, that sort of thing? Kirk: I believe it has already been made public in the DX9 specification, what the definition is. It's just a matter of somebody actually producing it. There just isn't any pixel shader 3.0 hardware yet. But the spec has been solidified for some time. CPU: It seems like one of the big things that is being hurled at NVIDIA currently is ‘You're not really DX9.' Do you want to say anything at all about what constitutes a DX9 part other than what you've already said? Kirk: The only thing that I'm aware of that anyone has said about us not really being DX9 is one developer who had not really had much experience with our hardware or our drivers made a comment about the performance and said ‘The performance is so slow, that you can't really call this DX9,' and due to the stature of this developer and how popular their games are, that statement has echoed with a lot of noise around the community in spite of the fact that it has no basis in fact. CPU: So they were just taking a completely subjective approach to it? Like saying ‘That's not really a Ferrari because it doesn't go 200 miles an hour'? Kirk: Yeah exactly. In fact, they were saying that simply because they had not ridden in it and ridden at 200 miles an hour and they had no knowledge of whether it would go that speed. CPU: So is this getting rather frustrating? To have some legitimate technical difficulties that you're facing at least in terms of adoption and then to have stuff come out that is really off the deep end, must be really frustrating. Kirk: Yes it is frustrating, but we're not really having any difficulties with adoption.
We've seen no decline in market share in this product generation, and in fact, in terms of DX9 products, if you want to classify the GeForce FX family as DX9 products, we're seeing a gain in market share over the previous generation, so we're having no difficulty at all with adoption. Consumers love the products, and in fact, many developers are really enthusiastic about the products. I do find it frustrating when people who have the public's attention use what they call the bully pulpit to just say stuff when they're irresponsible. You have to treat those kinds of statements as irresponsible if they're not backed up. My favorite quotation on this topic is something that Mark Twain said: ‘It's not what you know that's the problem, it's what you know that ain't so.' That's the problem. People talking about this when they don't really know what they're talking about. CPU: In one of your presentations, there has definitely been the point made that you haven't dropped market share especially in DX9 parts. The hurdles I'm thinking of are really having to do with things like the Half-Life 2 situation where they have developed for a certain code path and under those conditions, yes it doesn't run as fast as it does on ATI hardware and that there are things that need to be changed for that to be remedied. Kirk: Well let me just address your comment because I don't think it's actually correct. You're repeating what Gabe said, and I don't think it's true. It is true that with old drivers, with unoptimized code, things ran more slowly. I don't know, is it true with new drivers? None of us know because we don't have their code and they don't apparently want to talk about ours. So you don't know. CPU: So is that still the case, they still have not dealt with you guys about it? Kirk: I'm not aware of any communication that has happened to resolve this. The situation is such that there just isn't any information to be talking about yet. Also it's a beta game.
It's still under development. CPU: Do you know if they were ever sent a version of the drivers that did not turn off fog? Because at the shader day presentation, that was their main beef with the new drivers. Kirk: Let's talk about that. The drivers they had were also beta drivers, and we've not had a chance to test their application with our drivers and see fog or not, so we are not able to tell what their problem was because we are not able to reproduce it here. The problem that we have is when we haven't released a driver yet, it's still in QA and that doesn't necessarily get all the right answers. And actually fog wasn't turned off, it was just rendered incorrectly, and fixing the bug did not make the code run more slowly on Half-Life or on any other hardware or game combination. ["We actually picked up a tick" interjected by Brian Burke from NVIDIA PR.] The problem here is again irresponsibility: talking about something that they didn't know to great negative effect. It's been an issue for us. It's cost [us], though I get to talk to people in the press a lot more and that's not all bad. I think that in the end, the biggest loser is Valve's reputation because they've really spent a lot of integrity in terms of these things that they've said and now can't back up really. CPU: A lot of this will come out in the wash. As soon as people can test this, the waters will become clear; it's just a matter of waiting until that happens. Are you concerned about how much damage will happen in the meantime vs. how vindicated you may be when you can actually see what your hardware does with their code? Kirk: I don't really see a lot of further damage happening. I've talked to a collection of people. I think in general Web site editors, magazine editors have become pretty skeptical of the claims from Valve and they're waiting to see.
We've heard that we're all going to have to wait three months to see what the real answer is because the game isn't going to be released until after Christmas. I think it's a bunch of noise. The future will take care of itself. I'm confident that we will be providing great performance on Half-Life with no image-quality issues by the time it ships. Until then, it's just noise. CPU: Is there anything else that you want to say about that situation? Kirk: The No. 1 thing I have to say is I don't think anybody knows any facts yet. We'll all have to wait until there's a real version of Half-Life and real release drivers and we can test everything and talk about it and there will be facts. CPU: Are there any regrets for how NVIDIA has handled "unofficially" releasing high-performing drivers in the past? Kirk: I don't know what you mean. CPU: There would be a miraculous, anonymous release of an NVIDIA driver that suddenly made performance really amazing that would coincidentally show up on the same day that your competition would be releasing something, but then those drivers would never end up being supported or public or official. Kirk: I don't think that we can always control driver leaks. When we have release candidates for our drivers, they go to hundreds of partners. We do tag each individual driver that goes out to Web sites or reviewers or customers so when there's a leak, we know who did it, and they just don't get the driver the next time. But there keep being new violators each time. People aren't very smart. They don't realize that they have a serial number on the version they get, so leaks keep happening. If we had our choice, we would not give anything but release drivers to anybody, but with the modern Web site-based reporting, and again just back to the situation with Valve, the amount of noise there is in the world, some of which is true and some of which is not. 
We just have to be responsive to all that and if people say ‘Hey you're a lot slower' we say ‘Well, here's what we have under development. Take it with a grain of salt. I'm not sure where it's going to end up. Just hang on to your hat there.' And sometimes that stuff gets out. I'm not really sure what you can do about that. CPU: So it seems you're saying that this is not a marketing technique that NVIDIA has used before. Kirk: No, I'm not saying that. I'm not aware of it if it is, but I'm not saying it's not. But I don't think it is a marketing technique that is really effective if in fact it's happening. I really have to say that what I have been more involved in is tracking down leaks. And that's a far bigger problem. Most of the drivers that get out are not ones that we intended to get out. We do not intend to have consumers use prerelease drivers. That's why we have releases: So that they get drivers that are QA'd and stable and good quality. CPU: Let's shift gears here . . . 0.13-micron: the benefits and pitfalls of shrinking early. Anything you'd like to say about that? Kirk: At this point in time, you could observe that we were aggressively early with 0.13-micron and paid some high prices in terms of ability to yield production and in terms of the cost that we had for development. In the end, 0.13-[micron] is a great technology, when it's done, but not before then. CPU: Any comments about Cg? Cg vs. HLSL? What motivated the expending of resources to create Cg? Does it have advantages for your hardware over HLSL? Kirk: Yeah. I think there is a very popular misconception about Cg: that we created it to compete with HLSL. And the fact is that we created Cg before there was an HLSL and we created Cg before there was an OpenGL shading language. Because we recognized in the course of development of GeForce FX that a high-level language was going to be required to program these devices effectively. 
And we began development approximately a year before any HLSL work started. Just from a point of view that this technology was necessary and if no one else was going to build it, we had to start off and build it. At this point the commitment that Microsoft has made to HLSL is such that we're enthusiastic about sharing the work with Microsoft and supporting HLSL fully on the DX platform. We really don't have any burning need to be in the language-development business. However, we had to kick-start it when there wasn't any. There's absolutely no reason why we would have to compete with HLSL on the DX platform, and in fact, we do not intend to. CPU: That's such a hard path to take: to try and kick-start something that big in an industry. Obviously, if your DX9 hardware had been early, that would have been a huge advantage because people could have programmed on it and developed on it and it could have set the standard for a lot of things to come. Whether that's a primary factor of NVIDIA creating Cg or just a great byproduct of NVIDIA deciding that the industry had to go that way doesn't really matter. You still would have been in a really good position. Looking back on how things eventually came about, would you still make that decision to expend the resources to push Cg early just to make sure that somebody did? Kirk: Sure. I don't think the industry would be where it is in terms of high-level languages if we hadn't pushed. CPU: Is there an ongoing relationship between Microsoft and NVIDIA to take advantage of that development or have you pretty much transferred that knowledge base over? Kirk: We continue to work very closely with Microsoft on HLSL technology. There's still work that we're doing that we make available to Microsoft for the HLSL compiler and we make sure that they can make the best compiler for our hardware that is possible to make. 
I'm sure that I can't clear it up, but I just want to say categorically that there's no competition going on between our efforts and Microsoft's efforts on HLSL. We are absolutely completely behind it for the DX platform. CPU: I think that's actually one that can be cleared up. Kirk: [laughs] Good, I hope so. You wouldn't believe how many times I keep getting asked that question. CPU: Well, my question was ‘Does Cg have any advantages for your hardware over HLSL?' You designed it and they are designing for more generic stuff. I was looking at it more along the lines of if somebody's going to write for your hardware, is there an advantage to them using Cg? Kirk: Even separate from our hardware, one of the differences is the Cg language is more strongly typed and understands data types and precisions better. I think HLSL will get there, but it hasn't started from that point of view. I don't think that's really specific to our hardware. I'll just be a little disparaging about the ‘approximately 24- to 32-bit precision' decision that was made. When you are programming on a CPU, you don't just say ‘Oh yeah, give me whatever precision you want. 24 or 32, whatever's OK.' You write a very specific program with a specific data type, and I think being sloppy about that has been painful for the whole industry. CPU: Along those lines, do you find that Microsoft tends to be arbitrary? Do they favor one vendor over another or is it a matter of relationships at the point in time when Microsoft makes the decision? How much are these technical decisions and how much are they influenced by lots of other factors? Kirk: I can't speak for Microsoft, but from my point of view, I find that our dealings with the technical folks at Microsoft are very straight up and we're very open. We're very happy with how that goes. I'll leave it up to them to say how they feel about the interactions [with us], but I think we have a very good partnership. I don't find any partisan behavior. 
I think they are in a position where they really have to be very even-handed, and I think they do a pretty good job of it. CPU: Is there anyone at NVIDIA I can talk to about the compiler in more depth? The biggest question is still completely open. Some people definitely think that the compiler is going to be optimizing shaders to the extent of simplifying them and some people definitely think it is not going to be doing that. Is there a definitive answer? Kirk: There is no question in my mind that us changing shaders so that they draw a functionally different picture than the artist and the programmer intended is wrong. I don't think we are going to do that. That's a definitive statement. We're not doing that as far as I know, and if I find out about it, I'll make it stop. CPU: Aren't there some circumstances where that would be a desirable thing? I'm not trying to be tricky. I'm trying to see if there is a legitimate case to be made where that is sometimes the right path to take. Kirk: Would it be desirable to make the wrong picture intentionally? I don't see how. CPU: If the sole intent is accurate rendering, no. If you're taking eight hours to render out a frame for a movie, no. But if it's going by at a sixtieth or a hundredth of a second and the goal is to play a game in real-time and to not get killed by that monster that's coming around the corner, I don't know. Kirk: Well, I know actually. We're not in the game authoring business and it's not up to us to reinterpret the artist's intent. If they've written a game that doesn't run fast enough, all we can do is try to make the right picture as fast as we can. I don't think it's something we should be doing to make the wrong picture at some arbitrary speed target. That's nonsense. There's no value in that for the consumer. CPU: Can I hypothesize where it is of value to the consumer? 
Somebody already has your hardware, and a title shows up that is very playable on your competition's hardware and it does not run at playable speeds on your hardware. Kirk: So if that is an issue and there is enough market share that that's significant, that could be resolved through a patch from the game developer that we would mutually agree on and they would change the game to take advantage of something different. It's not up to us to patch the game. CPU: So part of the problem is developers not wanting to take multiple code paths? Kirk: I don't think there is a problem. If the situation that you described happened, the game developer would absolutely do a patch no question in my mind. That's why I think all the noise about Half-Life is just nonsense, because in the end, Valve is not going to abandon 20 million customers. They are not going to make a bunch of their best customers unhappy. It just doesn't make any sense. So I don't think it makes sense for any game that has a really significant market share. Because of the market share that we have, they're very likely to have a big overlap with us. It doesn't make any sense for them to shoot themselves in the foot. They're going to do a patch or, even before the game ships, they're going to work with us to get the performance on the platforms already out there. CPU: Along those lines, you'd be very happy to work with Valve or someone in that situation? Kirk: Absolutely. CPU: One of the things they did say was that they had worked with you guys to optimize and they had spent quite a bit of time doing a code path where they used mixed mode and some other things. Is that just not true? They did not work with you? Have you guys never had their code in your labs to test and work with? Kirk: We have worked with them, although I have to question the logic of the statement. I wasn't at the press day where Gabe spoke, but I understand he said they made an optimized code path and it was slower. 
Well I'd have to comment ‘Fire your programmers and get new ones if you optimized and things got slower.' CPU: That is not what he said. I don't have my notes, but when they optimized and dropped down to partial precision, that was faster, that was not slower than their initial code. Kirk: So I have confidence that when we get to the end, when we've had good communication, when we've had a chance to look at what they're doing and help them out with it and we've had a chance to do more work on our optimization, then we'll get good results. There's just no magic here. There's nothing funny going on. A lot of times in the news there's more going on than meets the eye. Other times there is less. There's just not much going on here. But we're all hungry for news, so it's news.