How the PowerPC 7400/G4 differs from the PowerPC 750/G3
Published: 9/08/99
Note! This article was written in 1999. For the latest info and performance tests of G3 vs G4 CPUs, see the
(1999 article follows)
Several readers commented on my first G4 vs G3 performance tests. Those who were aware of Motorola's G4 and G3 specs didn't seem surprised by the results in applications that don't use AltiVec (G4-specific) instructions. One reader in particular sent details from Motorola's documents on the two CPUs. I'm listing them here to help explain the similar performance when running most of today's (non-AltiVec) software and to highlight the differences between the two processors.
Eric Harruff wrote (emphasis mine): "This is from the MPC7400 RISC Microprocessor Technical Summary (available on their web site):"
1.13 Differences between the MPC7400 (G4) and the MPC750 (G3)
The design philosophy of the MPC7400 is to change from the MPC750 base only where required to gain compelling multimedia and multiprocessor performance. The MPC7400's core is essentially the same as the MPC750's, with one exception. The MPC750 has a 6-entry completion queue and slower performance on some double-precision floating-point operations, whereas the MPC7400 has an 8-entry completion queue and a full double-precision FPU. The MPC7400 also adds the AltiVec instruction set, has a new memory subsystem (MSS), and can interface to an improved bus, the MPX bus. The following sections discuss the major changes in more detail.
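The Python toy model below shows why two extra completion-queue entries matter. It is a sketch under stated assumptions and does not describe Motorola's actual pipeline. It assumes that one instruction dispatches per cycle, that up to two finished instructions retire per cycle in program order, and that the latencies are invented. Each instruction holds a completion-queue entry from dispatch until it retires. As a result, a slow instruction at the head of the queue lets at most queue_size - 1 younger instructions dispatch before dispatch stalls. The function name run_cycles and the workload are hypothetical.

```python
from collections import deque

def run_cycles(latencies, queue_size, retire_width=2):
    """Toy model: cycles needed to dispatch and retire every instruction.

    Assumptions (not from Motorola's documentation): one dispatch per cycle,
    in-order retirement of up to `retire_width` finished instructions per
    cycle, and no other structural hazards.
    """
    queue = deque()  # finish cycle of each in-flight instruction, oldest first
    next_instr = 0
    cycle = 0
    while next_instr < len(latencies) or queue:
        # Retire finished instructions strictly in program order.
        retired = 0
        while queue and queue[0] <= cycle and retired < retire_width:
            queue.popleft()
            retired += 1
        # Dispatch one instruction only if a completion-queue entry is free.
        if next_instr < len(latencies) and len(queue) < queue_size:
            queue.append(cycle + latencies[next_instr])
            next_instr += 1
        cycle += 1
    return cycle

# Hypothetical workload: a 10-cycle operation followed by seven 1-cycle ops, repeated.
pattern = ([10] + [1] * 7) * 4
g3_like = run_cycles(pattern, queue_size=6)  # MPC750: 6-entry completion queue
g4_like = run_cycles(pattern, queue_size=8)  # MPC7400: 8-entry completion queue
print(g3_like, g4_like)
```

In this model, each repetition of the pattern takes 12 cycles with a 6-entry queue and 10 cycles with an 8-entry queue, because the next slow operation can dispatch before the larger queue fills. A stream of only 1-cycle instructions never fills either queue, so both sizes finish together.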
Core Sequencing
FPU
AltiVec
AltiVec is designed to improve the performance of vector-intensive code, such as that found in multimedia and digital signal processing applications. AltiVec-targeted code can speed up many two-dimensional and three-dimensional graphics functions by 3-5 times, especially core functions in 3-D engines and game-related 2-D functions.
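AltiVec's speedups come from SIMD: one instruction operates on a 128-bit vector register, for example four 32-bit floats at once. The Python sketch below only counts instructions for a scalar loop and for a 4-lane vector loop. It does not call AltiVec. Python floats are also 64-bit, so the sketch models the lane structure, not the arithmetic precision. The function names are illustrative.

```python
LANES = 4  # a 128-bit AltiVec register holds four 32-bit single-precision floats

def scalar_add(a, b):
    """One add instruction per element."""
    out, instructions = [], 0
    for x, y in zip(a, b):
        out.append(x + y)
        instructions += 1
    return out, instructions

def vector_add(a, b):
    """One 4-lane vector add per chunk of four elements (in the spirit of vaddfp)."""
    if len(a) != len(b) or len(a) % LANES:
        raise ValueError("inputs must be equal length and a multiple of 4")
    out, instructions = [], 0
    for i in range(0, len(a), LANES):
        # All four lanes are computed by a single (modeled) instruction.
        out.extend(x + y for x, y in zip(a[i:i + LANES], b[i:i + LANES]))
        instructions += 1
    return out, instructions

a = [float(i) for i in range(8)]
b = [10.0] * 8
print(scalar_add(a, b))
print(vector_add(a, b))
```

The 4:1 drop in instruction count is not a speed prediction. Actual gains depend on memory traffic and on how much of a routine can be vectorized.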
Memory Subsystem (MSS)
Load miss folding
For example, on the MPC750, if a load or store (access A) misses in the data cache, a subsequent load (access B) to the same cache block must wait until the critical word for A is retired. Because of this, any loads or stores after access B also cannot access the data cache until the reload for access A completes. On the MPC7400, by contrast, if load or store access A misses in the data cache, then while the data is coming back, up to four subsequent misses to the same cache block can be folded into the LFQ, and subsequent instructions can still access the data cache. Loads are blocked only when the reload table or the LFQ is full.
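The Python cycle-count model below compares the two load-miss behaviors. It is an illustrative sketch and does not reproduce Motorola's design. It assumes one access per cycle. It allows only one reload in flight at a time, so it ignores the separate outstanding-miss and miss-under-miss improvements. It also makes a blocked access wait for the whole reload rather than just the critical word. access_cycles and its parameters are hypothetical names. lfq_slots=4 reflects the "up to four" folded misses quoted above.

```python
def access_cycles(trace, resident, reload_latency, fold_misses, lfq_slots=4):
    """Cycle at which each access in `trace` gets through the L1 data cache.

    trace:       cache-block names in program order (at most one per cycle)
    resident:    blocks already valid in the cache
    fold_misses: False ~ MPC750 behavior, True ~ MPC7400 load miss folding
    Simplifications (assumed): one reload outstanding at a time, and a
    blocked access waits for the whole reload, not just the critical word.
    """
    cache = set(resident)
    pending, ready_at, folded = None, 0, 0  # block in flight, done cycle, LFQ use
    cycle, result = 0, []
    for block in trace:
        while True:
            if pending is not None and ready_at <= cycle:
                cache.add(pending)          # reload finished; block now valid
                pending, folded = None, 0
            if block in cache:
                break                       # hit
            if pending is None:
                pending, ready_at = block, cycle + reload_latency
                break                       # primary miss starts a reload
            if fold_misses and block == pending and folded < lfq_slots:
                folded += 1
                break                       # folded into the LFQ; cache stays usable
            cycle = ready_at                # blocked: this and later accesses wait
        result.append(cycle)
        cycle += 1
    return result

trace = ["X", "X", "X", "Y", "Y"]  # A misses on X, B and C reuse X, then hits to Y
print(access_cycles(trace, {"Y"}, 10, fold_misses=False))  # [0, 10, 11, 12, 13]
print(access_cycles(trace, {"Y"}, 10, fold_misses=True))   # [0, 1, 2, 3, 4]
```

In the MPC750-style run, the second access to X cannot use the cache until cycle 10, and the two hits to Y queue up behind it. With folding, every access gets through in consecutive cycles. A sixth access to X, arriving when four misses are already folded, would block, which matches the "LFQ is full" case.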
Store miss merging
Cache allocate on reload
If access A misses in the cache, the MPC750 immediately identifies the victim block (call it X), if there is one, and allocates its space for the new data (call it Y) to be loaded. If a subsequent access (access B) needs this victim block, B will miss even if it occurs before Y has been loaded, because X is no longer valid as soon as its space is allocated. After Y has loaded (and, if X is modified, after X has been cast out), X must be reloaded, and B must wait until its data is valid again. The MPC7400, on the other hand, delays allocation/victimization until the block reload occurs. In the example above, B can hit block X while Y is being loaded, and a different block is victimized. On the MPC7400, allocation occurs in parallel with the reload. This uses the cache more efficiently and can reduce thrashing.
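The difference in victimization timing can be shown with one cache set in Python. To keep the example small, this sketch assumes a 2-way set with true LRU replacement, which is not the real L1 organization. It does not model casting out a modified X or reloading X afterward. Block names X and Y follow the example above, and Z is an extra resident block. The function name is hypothetical.

```python
from collections import OrderedDict

def miss_then_touch_victim(allocate_at_miss):
    """Access A misses on block Y; access B touches block X before Y arrives.

    Models one 2-way set with LRU replacement (assumed for illustration);
    the OrderedDict's first key is least recently used.
    Returns (did B hit?, which block was victimized).
    """
    ways = OrderedDict([("X", True), ("Z", True)])  # block -> valid; X is LRU

    def lookup(block):
        if ways.get(block):              # present and valid
            ways.move_to_end(block)      # mark most recently used
            return True
        return False

    victim = None
    if allocate_at_miss:                 # MPC750-style: victimize at miss time
        victim, _ = ways.popitem(last=False)
        ways["Y"] = False                # space reserved, data not yet valid
    b_hit = lookup("X")                  # access B, while Y is still in flight
    if not allocate_at_miss:             # MPC7400-style: victimize at reload time
        victim, _ = ways.popitem(last=False)
    ways["Y"] = True                     # reload of Y completes
    ways.move_to_end("Y")
    return b_hit, victim

print(miss_then_touch_victim(True))   # (False, 'X')
print(miss_then_touch_victim(False))  # (True, 'Z')
```

In the MPC750-style run, B misses because X's space has already been given to Y. In the MPC7400-style run, B hits, which makes X most recently used, so Z becomes the victim when Y arrives.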
Outstanding misses
Miss under miss
L2 cache
Having fewer sectors per tag allows the cache to be used more efficiently.
The MPC7400 and the MPC750 also have different cache reload policies. On the MPC750, an L1 cache miss that also misses in the L2 causes a reload from the bus to both L1 and L2. On the MPC7400, misses to the L1 instruction cache behave the same way, but misses to the L1 data cache behave differently: data is reloaded into the L1 only. Thus, with respect to the L1 data cache, the L2 holds only blocks that are cast out; it acts as a giant victim cache for the L1 data cache. This improves performance because the same data is duplicated in the L1 data cache and the L2 less often, which leaves more of the L2 for distinct data.
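The victim-cache policy can be sketched in Python by counting bus reloads for a small repeating working set. The sketch makes several assumptions:
- Both caches are tiny, fully associative, and LRU.
- All data is clean, so castouts need no write-back.
- In victim mode, an L2 hit moves the block up to L1 rather than copying it. The summary does not say what happens to the L2 copy.
- Only data accesses are modeled.

LruCache and bus_reloads are hypothetical names.

```python
from collections import OrderedDict

class LruCache:
    """Fully associative cache of block names with LRU replacement (assumed)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def __contains__(self, block):
        return block in self.blocks

    def touch(self, block):
        self.blocks.move_to_end(block)

    def insert(self, block):
        """Insert as most recently used; return the evicted block, if any."""
        self.blocks[block] = True
        self.blocks.move_to_end(block)
        if len(self.blocks) > self.capacity:
            return self.blocks.popitem(last=False)[0]
        return None

    def remove(self, block):
        del self.blocks[block]

def bus_reloads(trace, l1_size, l2_size, victim_l2):
    """Count data accesses that miss in both L1 and L2 and reload from the bus."""
    l1, l2 = LruCache(l1_size), LruCache(l2_size)
    reloads = 0
    for block in trace:
        if block in l1:
            l1.touch(block)
            continue
        if block in l2:
            if victim_l2:
                l2.remove(block)        # assumed: block moves up, not copied
            else:
                l2.touch(block)
        else:
            reloads += 1
            if not victim_l2:
                l2.insert(block)        # MPC750-style: reload fills L1 and L2
        castout = l1.insert(block)
        if victim_l2 and castout is not None:
            l2.insert(castout)          # MPC7400-style: L2 receives L1 castouts
    return reloads

trace = list("ABCDEF") * 3              # working set of 6 blocks, swept 3 times
print(bus_reloads(trace, 2, 4, victim_l2=False))  # 18
print(bus_reloads(trace, 2, 4, victim_l2=True))   # 6
```

This example uses a 2-block L1 and a 4-block L2. The duplicating policy keeps copies of its L1 blocks in the L2, so here it holds only four distinct blocks across both levels, and the 6-block loop misses on every access. The victim policy holds six distinct blocks, so only the first pass reloads from the bus.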
60x bus/MPX (MaxBus) bus
For example, the MPX bus supports data intervention. On the 60x bus, if one processor reads data that is marked Modified in another processor's cache, the transaction is retried and the data is pushed to memory, after which the transaction is restarted. The MPX bus allows the data to be forwarded directly to the requesting processor from the processor that has it cached. (The MPC7400 also supports intervention for data marked Exclusive or Shared.) The MPC7400 can support up to seven simultaneous transactions on the bus interface (60x or MPX bus), while the MPC750 supports only two.
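Both bus improvements can be sketched in Python. The first function lists the steps for a read of a Modified block on each bus, as described above. The second is a simple window model of how many transactions may be in flight at once. The step wording, the function names, and the 10-cycle latency are assumptions for illustration. The model also ignores how the real buses pipeline address and data phases.

```python
def read_of_modified_line(bus):
    """Bus steps when CPU 0 reads a block that CPU 1 holds Modified.

    Step descriptions are informal, not signal or transaction names.
    """
    if bus == "60x":
        return ["CPU0 issues read",
                "CPU1 snoop hits Modified; read is retried",
                "CPU1 pushes the block to memory",
                "CPU0 reissues read",
                "memory returns data to CPU0"]
    if bus == "MPX":
        return ["CPU0 issues read",
                "CPU1 snoop hits Modified",
                "CPU1 forwards data directly to CPU0 (intervention)"]
    raise ValueError(f"unknown bus: {bus}")

def bus_cycles(n, latency, max_outstanding):
    """Cycles for n transactions, issued at most one per cycle, each holding a
    slot for `latency` cycles, with at most `max_outstanding` in flight
    (a simplified window model, assumed for illustration)."""
    finish = []
    last_issue = -1
    for i in range(n):
        issue = last_issue + 1
        if i >= max_outstanding:
            # Wait until the transaction that frees a slot has finished.
            issue = max(issue, finish[i - max_outstanding])
        finish.append(issue + latency)
        last_issue = issue
    return max(finish)

print(len(read_of_modified_line("60x")), len(read_of_modified_line("MPX")))  # 5 3
print(bus_cycles(7, 10, 2), bus_cycles(7, 10, 7))  # 40 16
```

With seven 10-cycle transactions, a window of two (MPC750) finishes at cycle 40, while a window of seven (MPC7400) finishes at cycle 16.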
Copyright © Mike, 1999. No part of this site's content or images may be reproduced or distributed in any form without written permission. Users of the web site must read and are bound by the terms and conditions of use.