This page covers Benchmark test results using MacBench 5, Throughput 1.5, CineBench 2000, Walker 3D, RaveBench and "Let1KWindowsBloom". (The latter reports the time required to create/close 1000 windows in OS X.)
MacBench 5.0 Results: The graph below shows the results of MacBench's Graphics and Publishing Graphics scripted tests (emulation of Mac applications graphics calls, scrolling, zooming, searching/replacing, etc.). Resolution for both tests was 1600x1200 (millions colors).
Graphics Primitives Results: In MacBench's Quickdraw primitives tests, the Nvidia cards were often 10-15% faster than the ATI cards in most functions except text/character related ones like DrawChar, DrawString and DrawText, in which the ATI cards were faster. I don't know how much weighting they give to some primitives like Blends, but the Nvidia cards were 3,900% faster at Blend primitives than the ATI cards. I've written to my ATI contacts to note this and ask for their comments. (I saved screens of the primitives test results with each card also. If there's enough interest I will post a separate page of those results. The list of tests is very long and would require a separate page.)
Create/Close 1000 Windows (OS X)
A reader sent a Carbon benchmark some time back via email called "Let1kWindowsBloom". Since there are so few benchmarks for OS X I used it with both cards. The time to create/close 1000 windows with each card is shown in the graph below.
Dual Display Tests: I ran the same test with the 8500 in a Dual G4/533 with one and two displays connected. The test was run 3 times and the average time noted in the graph below. (1600x1200 CRT used for Appleworks, 2nd display was a 1024x768 VGA LCD monitor. Both monitors set to millions colors.)
As you can see above, for this simple benchmark the 2nd LCD display had no effect on performance.
(Graph updated 3/17/2002 to show results with OEM GeForce3 in DP533.)
RaveBench 1.11 Test Results
I used VillageTronic's RaveBench 1.11 Benchmark (to my knowledge, only available on the CD supplied with their graphics cards). I ran this comparison at the highest resolution supported by RaveBench - 1024x768. Desktop (Monitors Control Panel) was set to 1280x1024, millions color mode. (The desktop must be set higher than the RaveBench test window.)
By watching the window of each test, you can also see how well the card/drivers support the feature tested (as far as image quality goes). The only apparent issue I saw was that the GeForce4MX, GeForce2MX and GeForce3 (all use the same driver set in OS 9) showed far less actual transparency than the ATI cards. See the image comparisons below. (The blue water should be fully transparent, showing the ocean floor below.)
Nvidia Card's Transparency Image Quality
Although a JPEG image isn't the best for illustrating this, you can see in the ATI image below that the ocean 'floor' is visible through the transparent water.
ATI Card's Transparency Image Quality
Although I've not tested every RAVE app, as an FYI the water in the Unreal Tournament Tutorial appeared fully transparent even with the Nvidia cards.
For more info on Ravebench's tests (and sample scenes) see my 1998 Illustrated Guide to Ravebench.
Walker 1.2 QD3D Tests
I measured the minimum and maximum framerates using Lightwork's Walker program (no longer available at their web site) using the highest polygon scene: Corridor (49,002 polygons). Graphics mode was set to 1280x1024, millions colors. Scores are in Frames Per Second (higher is better) based on min/max rates displayed during ten spins (rotations) of the scene. (In my opinion the most important figure is the lowest framerate during the test, as that indicates how the card handles the most demanding part of the scene.)
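As a rough illustration of what the min/max figures mean, here is a minimal Python sketch. It is not how Walker itself computes its numbers (I don't know that); it simply assumes you have a list of per-frame render times, and the function name and sample values are made up for the example.

```python
def min_max_fps(frame_times_sec):
    """Return (lowest, highest) frames per second from per-frame render times."""
    # Each frame time (in seconds) converts to an instantaneous frame rate.
    rates = [1.0 / t for t in frame_times_sec if t > 0]
    if not rates:
        raise ValueError("need at least one positive frame time")
    return min(rates), max(rates)


# Made-up frame times in seconds, for illustration only (not measured results).
sample_times = [0.020, 0.025, 0.050, 0.040]
lowest, highest = min_max_fps(sample_times)
print(f"min {lowest:.1f} fps, max {highest:.1f} fps")
```

The slowest single frame sets the minimum, which is why that figure says the most about the hardest part of the scene.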
On close inspection, all cards show gaps at the edges (seams) of objects in some areas of the Corridor scene. (For example around the window frames/wall seams.)
CineBench 2000 Tests: Maxon's Cinebench 2000 benchmark (available here) is a cross-platform 3D application simulation benchmark. Cinebench reports a score for Software shading (no OpenGL hardware acceleration), OpenGL Shading (hardware accelerated), a single CPU raytracing score and a Dual CPU raytracing score. (I show both single and dual CPU results just as an FYI, since the Software and OpenGL scores are the main point of graphics card tests.) The test was run under OS 9.2.2 at the recommended 1024x768, millions colors graphics mode.
As you can see from the scores - the graphics card made very little difference in this benchmark. For software that supports dual CPUs, you can see the benefits in the rendering score.
ThroughPut 1.5 Tests: Rene Trost's ThroughPut 1.5 benchmark was also run. Here's how Rene describes this 'pure' benchmark:
"
ThroughPut is a little application, written in PPC assembler, that tests how much data your Mac can
push through the PCI or AGP port to feed your graphics card with data."
See his web page for more details, but basically his benchmark tests the amount of data the system can deliver to the graphics card using several modes - CPU (32bit stream), FPU (64bit stream), Altivec (128bit stream) and CopyBits (256bit busmaster). [Altivec requires a G4 CPU of course.]
Also see below for comments from an ATI engineer in reply to my questions about this benchmark from last year. (I had kept that reply but forgot to post it here originally.)
The CPU/FPU scores of the Radeon 8500 and GeForce4MX card are the highest I've seen by a factor of about 2. I can only guess these cards have fast writes (direct to vram) or write combining enabled perhaps, but will try to find out more. (Some of the results are near the theoretical max 4x AGP bus rate of 1066.67 MB/sec.) The pre-release OS 9 drivers for the 8500 however show slightly lower CopyBits performance than previous Radeon cards.
As the other tests in this article show, these scores do not translate anywhere near linearly into actual application or game performance. (For more info on AGP, see this Intel AGP Technology page. Fast writes and software write combining are covered on this page.)
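For reference, the 1066.67 MB/sec theoretical figure can be worked out with a few lines of Python. This is only a sketch: it assumes the standard AGP parameters (a 32-bit wide bus on a 66.67 MHz base clock, with 2x and 4x modes moving two or four transfers per clock), and the function name is made up for illustration. It has nothing to do with how ThroughPut itself measures.

```python
AGP_BASE_CLOCK_MHZ = 200 / 3   # 66.67 MHz base clock (assumed standard AGP value)
BUS_WIDTH_BYTES = 4            # 32-bit wide data path


def agp_peak_mb_per_sec(multiplier):
    """Theoretical peak rate in MB/sec (1 MB = 1,000,000 bytes) for AGP 1x/2x/4x."""
    return AGP_BASE_CLOCK_MHZ * BUS_WIDTH_BYTES * multiplier


for mode in (1, 2, 4):
    print(f"AGP {mode}x: {agp_peak_mb_per_sec(mode):.2f} MB/sec")
```

Dividing a measured ThroughPut score by the 4x figure gives a quick idea of how close a card gets to the bus limit.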
Here are comments from an ATI engineer in reply to comments I had on this benchmark from about a year ago. (When the benchmark was first released, I had questions about the scores during my previous review of GeForce2MX vs Radeon cards here last March.) Also read Rene's reply to his comments. (Rene wrote the Throughput benchmark.)
"
Mike, here's the response from another engineer here:
Throughput does not really measure AGP performance because:
A) Only the AGP Master, the graphics card, can initiate AGP transfers and
this application does not do that
B) The test actually is measuring RAW slave cycle linear write performance
across the bus (PCI 66 (same for AGP 1x/2x/4x/etc) or PCI 33). This does
not represent real world performance since how often does the CPU start at
the top of the display and update the entire frame buffer? Almost never!
C) The CopyBits test can be considered valid as it is copying a huge amount
of data from the GWorld in system memory to the video cards frame buffer.
However, the likelihood of this type and size of CopyBits operation
occurring under normal application use is almost never.
D) The one thing that this test can determine is whether the graphics card
(ASIC) and the North Bridge support Fast Write and whether that feature is
enabled. This requires a system that is one of the new AGP 4x Apple
machines. However Fast Writes do not directly translate into superior 2D
graphics performance. It can only increase the speed of data being moved
across the bus when the data is at linear addresses, using consecutive
writes. CopyBits (SrcCopy) of large images, as this test does, will show a
significant performance increase but in a real world application, there are
many individual copies to account for the pitch of the image versus the
frame buffer and thus this advantage of Fast Writes isn't fully taken
advantage of.
"
The author of the Throughput benchmark replied to the above comments:
"
I really can't accept some statement he has given. [the comments above-Mike] Sure, ThroughPut does
not reflect real world performance (in most cases) but the reason is not
the kind of the data block being transfered, it is not a huge linear
block, there is a break after each single line of at least 32Bytes. The
data block is just 640 x 480 pixels - I'm sure MANY desktop redraws are
larger.
The routines I used in ThroughPut are nearly identical to them I used in
MacMoorhuhn 1 & 2 to draw all the graphics (in MMH2 most objects are
LUV-AntiAliased - still unique on the Mac - even current GFX-Cards do
not support that kind of blitter). In both games nearly every graphic
block is just 64*12 pixels large, copied out of a 1920*480 large bitmap
(a real cache-line killer!) and that for up to 6 layers + sprites. The
game easily reaches far over 200fps on a G4/400 with a GeForce2MX but
only around 20-30fps if the graphics is done via CopyBits or
DrawSprockets!! This means a speed difference nearly factor 10! Even
with current OpenGL drivers you won't be able to reach such a
performance if you draw 6 1920x480 semi-transparent bitmaps and
hundreds of sprites (especially not with just 2MB VRAM ;)
Everyone who ever traced through CopyBits, to see what is does knows why
(regardless if you have a NVidia, ATI or what ever for a graphics card
installed.) it is so "slow".
One reason why current (and past) system drawing routines are "slow"
compared to the routines used in ThroughPut, MMH1&2 or Pixelstorm (my
first demo project - not very optimized but still fast), it that they
written in higher languages (C/C++ or similar), compiled with compilers
written in higher languages. I've done a lot of performance tests in the
past three years and I didn't find any C compiled routine (CW,MPW or
GNU) that wasn't at least 30% slower as the same routine written in
(unoptimized) PPC-assembler, in fact, in most cases the speed gain was
far behind 100%.
Most 2D drawing routines on the Mac are MUCH slower than similar
functions on the PC, just because they are not that optimized.
Sure, I'm an assembler-junkie, I love it to get all out of my machine
and I see that it is time intensive to optimize code to get most out of
the system architecture but in my eyes it is worth to do, for user
experience and my satisfaction as a programmer. I know assembler code
isn't as portable as C/C++ or other higher programming languages but I
don't need (and I don't want) to care about, regardless of what the target
system is.
I really wish that more developers think that way and would learn more
about PPC-Assembler - not that easy - even most assembler examples on
Apple's site are useless because they are totally out dated (most are
still 68k).
Greetings from Cologne,
Rene Trost."
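Both replies above turn on the same detail: whether the data crosses the bus as one long linear write or as many short per-line runs. Here is a rough Python sketch of that difference. It is not the code used by ThroughPut or CopyBits; the function, the buffer names and the toy frame buffer size are all hypothetical, and it assumes 4 bytes per pixel for millions colors mode.

```python
BYTES_PER_PIXEL = 4  # assumed: millions colors mode stored as 32 bits per pixel


def blit(src, src_pitch, dst, dst_pitch, dst_x, dst_y, width, rows):
    """Copy a width x rows pixel rectangle from src into dst at (dst_x, dst_y).

    Pitches are in bytes per row. Returns the number of separate linear
    write runs the copy needed.
    """
    row_bytes = width * BYTES_PER_PIXEL
    if row_bytes == src_pitch == dst_pitch:
        # Rows sit back to back in both buffers: one long linear write.
        start = dst_y * dst_pitch
        dst[start:start + row_bytes * rows] = src[:row_bytes * rows]
        return 1
    # Otherwise each row is its own short run, with a gap before the next.
    x_bytes = dst_x * BYTES_PER_PIXEL
    for y in range(rows):
        s = y * src_pitch
        d = (dst_y + y) * dst_pitch + x_bytes
        dst[d:d + row_bytes] = src[s:s + row_bytes]
    return rows


# Toy frame buffer: 16 pixels wide, 8 rows (tiny on purpose).
FB_WIDTH, FB_HEIGHT = 16, 8
FB_PITCH = FB_WIDTH * BYTES_PER_PIXEL
frame_buffer = bytearray(FB_PITCH * FB_HEIGHT)

full_image = bytes([0xFF]) * (FB_PITCH * FB_HEIGHT)
print("full-screen copy runs:",
      blit(full_image, FB_PITCH, frame_buffer, FB_PITCH, 0, 0, FB_WIDTH, FB_HEIGHT))

small_image = bytes([0x80]) * (4 * BYTES_PER_PIXEL * 3)
print("4x3 rectangle runs:",
      blit(small_image, 4 * BYTES_PER_PIXEL, frame_buffer, FB_PITCH, 2, 1, 4, 3))
```

A full-width copy collapses into a single run, while anything narrower than the frame buffer pitch becomes one run per line, which is the case the engineer says limits the benefit of Fast Writes.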
The next page covers the 8500's Software Control Panels and TV/Video out options. (OS 9 and OS X examples)
Related Links: For more info on graphics card performance, reviews and other related articles - see the main www.xlr8yourmac.com site's video cards page.