Open Source ARM Mali Driver Runs Q3A Faster Than the Proprietary Driver
An anonymous reader writes "The lima driver project, the open source reverse-engineered graphics driver for the ARM Mali, now has the Quake 3 Arena timedemo running 2% faster than the ARM binary driver."
There's a video showing it off. Naturally, a few caveats apply; the major one is that they don't have a Free shader compiler and are forced to rely on the proprietary one from ARM, for now.
Have Mercy! (Score:1)
Been humpin' on your mom all day!
Re: (Score:3)
The frame rate dropped into the mid-40s during some parts of the Quake 3 timedemo. What you consider a relevant benchmark would be a useless slideshow. Old benchmarks are quite suitable for demonstrating what you can expect from low-powered hardware.
Re: (Score:2, Informative)
We are vastly overcommitted on the fragment shader, and we also have limited CPU cycles to spare on this hw. Body parts flying around drag us down significantly :) Having said that, our average FPS is in the mid-40s; it's just that lima is at 47.2 and the binary is at 46.2 :)
--libv
Re: (Score:2)
and you've just degraded it to the second most useless.
Not much in this driver? (Score:4, Interesting)
Based on the article, it seems like they first ported Q3A from OpenGL ES1 to OpenGL ES2, and then used the closed-source shader compiler to do most of the work (OpenGL ES2 forces most of the code into shaders). It seems like they didn't really write much of an actual driver and just offloaded most of the work to the shaders (I could be wrong, though).
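For context, under OpenGL ES2 everything the ES1 fixed-function pipeline did has to be supplied as shaders, so even a straight port ends up routing its rendering through the shader compiler. A minimal sketch of the kind of shader pair such a port needs (all names and GLSL here are illustrative, not taken from the lima port):

#include <GLES2/gl2.h>

/* the sort of trivial shader pair a GLES1-to-GLES2 port has to supply */
static const char *vert_src =
    "attribute vec4 a_position;\n"
    "attribute vec2 a_texcoord;\n"
    "uniform mat4 u_mvp;\n"
    "varying vec2 v_texcoord;\n"
    "void main() {\n"
    "  v_texcoord = a_texcoord;\n"
    "  gl_Position = u_mvp * a_position;\n"
    "}\n";

static const char *frag_src =
    "precision mediump float;\n"
    "uniform sampler2D u_tex;\n"
    "varying vec2 v_texcoord;\n"
    "void main() { gl_FragColor = texture2D(u_tex, v_texcoord); }\n";

static GLuint compile(GLenum type, const char *src)
{
    /* glCompileShader is where the driver invokes its shader compiler;
       on Mali that is currently ARM's closed one */
    GLuint s = glCreateShader(type);
    glShaderSource(s, 1, &src, NULL);
    glCompileShader(s);
    return s;
}

Calling compile(GL_VERTEX_SHADER, vert_src) at startup is the point where the binary compiler the summary mentions gets pulled in.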
Re: (Score:2)
That sounds like a feature to me, so long as all the pieces are there. I'd sure love a completely-open-to-the-microcode platform, but what I need is something sufficiently open for there to continue to be drivers.
Re: (Score:3, Informative)
Hey there, I'm Connor Abbott, the lima compiler guy. No, porting from GLES1 to GLES2 was not necessary, it was just to debug a performance issue. While it is true that the demo uses the binary compiler, we *do* have the knowledge to write our own shaders - it's just the compiler that's lacking, and maybe my laziness :). For fragment shaders, we could pretty easily write our own shaders in assembly, it just hasn't been done yet (when I get around to it ;) ). For vertex shaders, we can't write anything in assembly yet…
2% isn't "faster", it's a measurement error (Score:1, Insightful)
AIUI the FOSS codebase is based on reverse-engineering the binary driver. So there would be almost no reason to expect it to be faster. There may be some CPU time saved if they can build the command buffer quicker than the binary driver manages, but it's highly unlikely they can create a general solution that reduces GPU time, since they're going to have to send the same commands to the hardware anyway. A better shader compiler might achieve something, but... they don't have that.
Ergo, 2% is a measurement error.
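To put that in code terms: the CPU-side job of a driver is largely filling a command buffer the GPU then consumes, so two drivers sending identical commands can still differ in how many CPU cycles the filling costs. A hypothetical sketch (the opcode and helpers are made up, not real Mali commands):

#include <stddef.h>
#include <stdint.h>

struct cmdbuf {
    uint32_t *words;
    size_t    len;
};

/* append one command word; skipping bounds checks like this is the
   kind of shortcut a lean prototype driver can take */
static inline void emit_u32(struct cmdbuf *cb, uint32_t w)
{
    cb->words[cb->len++] = w;
}

static void emit_draw(struct cmdbuf *cb, uint32_t vertex_count)
{
    emit_u32(cb, 0x1000u);        /* hypothetical DRAW opcode */
    emit_u32(cb, vertex_count);
}

If both drivers end up with the same words in the buffer, the GPU time is identical; only the CPU time spent in functions like these can differ.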
Re:2% isn't "faster", it's a measurement error (Score:5, Interesting)
Quite often binary drivers are written by people who either ported the code from other operating systems, or must maintain it in such a way that the code base can be shared across operating systems with different driver models. A pure free driver can shed a lot of cruft and can often have things like memory management better tuned for the system, or interact with the hardware in more efficient ways.
The NVIDIA Ethernet driver from a few years back was a good example of that. The Linux people created a free driver (forcedeth) that ran a lot faster than the binary driver, forcing NVIDIA to abandon theirs.
Re: (Score:2)
"There may be some CPU time saved if they can create the command buffer quicker than the binary driver manages, but it's highly unlikely they can create a general solution that makes the GPU time reduce, since they're going to have to send the same commands to the hardware anyway"
Or, they don't have to send the same commands, and have implemented a wrapper that actually works more efficiently than the native graphics code.
Re: (Score:3, Informative)
The numbers are in the blog post, which you haven't bothered to look at.
This is an ARM Cortex A8, running at 1GHz, with a Mali-400MP1 at 320MHz, and with 1GB DDR3 at 360MHz. Timedemo is fully consistent, every time. 46.2 for the binary opengles1 driver, 47.2 for the open source driver.
We are getting close to a shader compiler of our own; yesterday we had our first stab at compiling the few shaders needed for q3a. It failed, but we are creeping closer on this insane and massive task of reverse engineering.
Re: (Score:2, Informative)
Furthermore, the blog post extensively explains how well the hardware behaves, and how this 2% is mostly due to the fact that the prototype driver has less checking to do than a proper driver. No special tricks were used, especially none that are Q3A-specific. This is how fast the hardware is, and we succeeded in using it just as efficiently as the binary driver, which is unbelievably significant for a reverse-engineered graphics driver.
Re: (Score:1)
2% is significant. Other tests now run up to a third faster (spinning companion cube). I am sure that when I get my hands on a more powerful Mali and a more powerful CPU, the numbers will improve. But for now we are faster, if only by a small bit, and this is a massive victory for free software in itself.
--libv
Re: (Score:1)
Thanks man, you really know how to spur people on. Man-years of hard labour, all done in spare time, and there is this to show for it, and all you can do is complain that it is only 2% faster.
I hope you feel real good about yourself now.
--libv.
Optimized Code (Score:4, Funny)
if (Quake3) show_fps += 30;
Re: (Score:2, Redundant)
Re:Optimized Code (Score:5, Funny)
It's happened in the past that certain drivers have claimed better performance while completely ignoring certain things they were supposed to be doing in order to get the framerate up. Do the frames end up looking exactly the same with both drivers? What exactly is making it faster? Did they improve a specific part which only helps for the Q3A demo files and doesn't actually make any difference when playing a real game?
All interesting questions. If only there was a long block of text which covered those points. I've never heard of such a thing, though. But I'm going to coin a new term, "TFA", to refer to this hypothetical object.
Anyone with me on this?
Re: (Score:2)
Re: (Score:2)
Sometimes the driver uses special optimized paths depending on the name of the executable. That was known in the past; it let them optimize for specific benchmarks and games. Even certain configurations of GL function calls were faster than others, e.g. glDrawArrays.
http://www.spec.org/gwpg/gpc.static/Rulesv16.html [spec.org]
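The executable-name trick is easy to picture; a sketch, assuming a Linux system (/proc/self/comm is a real interface; the quake3 fast path is hypothetical):

#include <stdio.h>
#include <string.h>

/* decide at startup whether to enable app-specific fast paths */
static int app_is_quake3(void)
{
    char name[64] = "";
    FILE *f = fopen("/proc/self/comm", "r");
    if (!f)
        return 0;
    if (!fgets(name, sizeof(name), f))
        name[0] = '\0';
    fclose(f);
    name[strcspn(name, "\n")] = '\0';
    return strcmp(name, "quake3") == 0;
}

Which is exactly the sort of thing benchmark rules like the SPEC link above exist to police.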
Re: (Score:1)
Yes and no. We are more aggressive in scheduling jobs. This might be completely legal, and it might be that the Falanx guys decided that the higher CPU overhead and increased context switching were too much for low-power, single ARM cores. This does get us the 2% increase in framerate, and gains us insanely more on other, less PP-intensive tests: like 30%. And this for comparable CPU usage, which in the case of the spinning companion cube is around 10%.
The reduced CPU usage is most definitely because we…
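A sketch of the scheduling difference described above, with hypothetical submit_job()/wait_job() helpers standing in for the real kernel interface:

#include <stddef.h>

struct job;                        /* stands in for a real PP job */
void submit_job(struct job *j);    /* hypothetical: queue work on the GPU */
void wait_job(struct job *j);      /* hypothetical: block until it retires */

/* conservative: serialize on every job; cheap on the CPU, but the GPU
   can go idle between jobs */
static void run_serialized(struct job **jobs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        submit_job(jobs[i]);
        wait_job(jobs[i]);
    }
}

/* aggressive: always keep the next job queued, so the GPU never idles;
   costs extra CPU overhead and context switches */
static void run_pipelined(struct job **jobs, size_t n)
{
    if (n == 0)
        return;
    submit_job(jobs[0]);
    for (size_t i = 1; i < n; i++) {
        submit_job(jobs[i]);
        wait_job(jobs[i - 1]);
    }
    wait_job(jobs[n - 1]);
}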
The next step (Score:1)
Re: (Score:2)
Sure, I will have your answer ready next fucking century when that type of hardware is available.
Re: (Score:2)
Oh, thanks for that insightful post, I had no idea that multicore CPUs existed and that we can do GPGPU with CUDA and CL!
Attention, dumbass: I was referring to the lack of CL drivers for mobile GPUs. The Mali T604 does not yet have a CL driver. There are no consumer-available mobile GPUs that ship with CL drivers.
And then on top of that, the OP wanted a CL device based on the T604 to be driven over Thunderbolt. LOL! Like I said, I will get back to you in 10 years when that shit is actually available for purchase.
Re: (Score:2)
That's wonderful that a driver exists in a lab somewhere, I'm very happy for you. But they've had that for years now. Those drivers are STILL not on the market. I would love to be wrong, as I have the cash to make a purchase.
But no mobile device out there has a working CL driver that developers like myself can use. Not even the Arndale board (which uses the T604) has a working driver.
Getting Samsung or some other ARM licensee to then put the T604 (or any other mobile GPU, for that matter) into a Thunderbolt interface…
Re: (Score:2)
Re: (Score:2)
"I have the serial port set to 9600 8N1 since I read somewhere that is standard... I don't know why it's so slow and unreliable.."
Your ATH string is fucked. Perhaps you should look up some old BBS documentation to get up to speed.
"It's super awesome and plays Duke Nukem 3D way better than plain old DOS..."
FTFY.
Re: (Score:2)
ARM having drivers doesn't do anything for me as a developer if I cannot use said drivers. If you actually read the link you posted you would learn that the CL driver you speak of is NOT AVAILABLE.
Re: (Score:2)
If you actually give a shit and aren't just trolling me, then the arndaleboard.org forums will demonstrate that the driver is not yet available.
Re: (Score:3)
Posted Today, 11:40 AM
Hi JimV
Currently the only developer board I am aware of with an OpenCL compatible Mali GPU is the Arndale board. Drivers for this board would have to come from Insignal, but I am not sure what the current status of this support is. The demos themselves will run on desktop, however, if you modify the platform.mk in the root directory to use gcc rather than a cross compiler. Provided the necessary libraries are installed on the host machine, the demos will run. The Nexus 10 tablet also contains an OpenCL-capable Mali GPU…
Re: (Score:2)
That is an Android library that has no API exposed. I'm sure Google is working hard on getting us there, but currently it's not yet done. The Arndale CL driver that will come first is the vanilla Linux version.
To answer your question, I am not sure. I don't know if any mobile SoC has I/O fast enough to feed the 4 PCIe lanes needed for Thunderbolt. It would be cool if there were, but honestly I'd rather use a desktop GPU over Thunderbolt for CL work...
Sorry for being a jerk, was having a really bad day. Thanks.
Re: (Score:1)
"While most GPUs use 8 or 16 lanes, 4 lanes via thunderbolt is viable for compute bound tasks right now."
Not really, seeing as many newer applications are so poorly coded that they need every gigabit of bandwidth they can possibly get.
We moved from AGP to PCI-E without ever saturating AGP 8X bandwidth, and suddenly everything runs like shit on AGP.
We should have stuck with AGP and let CPUs take up the slack. Even today's newest CPU can't compete with the power of a GPU at the same power/TDP.
All it takes is the…
Re: (Score:2)
hey, look, I get modded down for stating a fact.
http://tinypic.com/player.php?v=2il1ydc&s=7 [tinypic.com]
Got a problem, people?
Hmmm (Score:2)
While it's quite nice to have a Quake III bench, and being on a mobile platform means some great fun could be had amongst friends, it's an old bench and an old game.
It used to be something Amiga people benched against in later years to try to suggest continued relevance.
Having capable GPUs in mobile stuff (hi, Intel Atom based netbooks!) is a great idea. All for it, and you have to love the low cost of the platforms making it available to more people.
Re: (Score:2)
While it's quite nice to have a Quake III bench, and being on a mobile platform means some great fun could be had amongst friends, it's an old bench and an old game.
And yet it is the best looking usable game on tablets/phones right now :)
2 whole percent? (Score:5, Interesting)
Re: (Score:1)
It isn't within random fluctuation levels. I would assume the tests were run with a large enough sample size to make random fluctuations statistically insignificant. It's just that 2% is not a significant change for gaming. If we were in the world of high-frequency trading, 2% would be worth billions.
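For what it's worth, the fluctuation question is easy to settle empirically: rerun the timedemo a handful of times and compare the FPS gap to the run-to-run spread. A toy check (the run numbers are made up for illustration):

#include <math.h>
#include <stdio.h>

static void mean_stddev(const double *x, int n, double *mean, double *sd)
{
    double sum = 0.0, ss = 0.0;
    for (int i = 0; i < n; i++)
        sum += x[i];
    *mean = sum / n;
    for (int i = 0; i < n; i++)
        ss += (x[i] - *mean) * (x[i] - *mean);
    *sd = sqrt(ss / (n - 1));   /* sample standard deviation */
}

int main(void)
{
    /* hypothetical repeated timedemo runs, not measured data */
    double lima[]   = { 47.2, 47.2, 47.1, 47.2, 47.2 };
    double binary[] = { 46.2, 46.3, 46.2, 46.2, 46.2 };
    double m, s;
    mean_stddev(lima, 5, &m, &s);
    printf("lima:   %.2f +/- %.2f fps\n", m, s);
    mean_stddev(binary, 5, &m, &s);
    printf("binary: %.2f +/- %.2f fps\n", m, s);
    return 0;
}

If the 1.0 fps gap dwarfs the spread (and libv reports the timedemo is fully consistent every run), it is a real difference, just a small one.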
Re: (Score:2)
So it's a value that's well within random fluctuation levels?
Now compare it to the performance before this update, and get back to us on whether it's news, at least to people who care about this chip.
Re: (Score:1)
The news is that there _is_ an open source, reverse-engineered driver which matches the binary driver in performance. Matching, as there really is little more to gain from this hardware without hacking Q3A itself. This is as fast as the hardware is, and we actually manage to use it just as well as the binary driver, without any Q3A-specific hacks.
--libv
Texture switching (Score:4, Informative)
Re: (Score:2)
>mfw models and textures shouldn't be shit on a more modern system like an ARM core.
Re: (Score:2)
This is probably because Q3 had to work on Voodoo cards, which had very limited texture sizes (256x256, I think? something dumb...), and so you couldn't atlas textures to the extent that you can on today's GPUs.
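For anyone unfamiliar with atlasing: you pack many small textures into one big one and remap each mesh's UVs into its tile, so the driver never has to switch textures between draws. A minimal sketch (the tile layout is made up):

/* a tile's corner coordinates inside the atlas, in [0,1] */
struct tile {
    float u0, v0, u1, v1;
};

/* remap a per-texture UV in [0,1] into the tile's sub-rectangle */
static void atlas_uv(const struct tile *t, float u, float v,
                     float *out_u, float *out_v)
{
    *out_u = t->u0 + u * (t->u1 - t->u0);
    *out_v = t->v0 + v * (t->v1 - t->v0);
}

On a Voodoo capped at 256x256 there is simply no room to pack anything useful, hence Q3's many small textures and all the texture switching.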
Re: (Score:2)
Q3Test was worse in this regard; the Railgun model used around 12 textures.
Re: (Score:1)
Hey Luc, why not drop round the Raspberry Pi forum and tell them about this. As you know they are a friendly bunch of guys and will want to offer you their congratulations.
For the benefit of those who don't realize it, this is sarcasm. Read this and see both Eben and Liz Upton at their "charming" best and you'll understand: http://www.raspberrypi.org/archives/2221 [raspberrypi.org]
It's a pity the mainstream media haven't mentioned these sorts of events, which have occurred numerous times on their forums. The Raspberry Pi Foundation and the Raspberry Pi apologists ought to brace themselves, though: the PR bubble and hype surrounding the Pi won't last forever. Eventually reality will prevail.
Re: (Score:1)
Yeah. I am bracing myself already for when the rpi trolls learn about the content of my talk. They seem worse than some /. users ;)
--libv
Great progress, and thanks for working on this! (Score:2, Insightful)
Your work is appreciated!
Ignore all the idiots who hate their lives, lurking around /. and criticizing every accomplishment of others. /. is starting to suck. Your work, though, is great!