Low cost cluster

Tue Feb 25 09:28:34 CET 2003

On Tue, Feb 25, 2003, Jonathan Morton wrote:
> >> Thus I obtained an effective performance figure of 2.0 MFLOPS/MHz,
> >> versus Athlon-XP figure of 0.5 (for both x87 and SSE) and Pentium-4
> >> figure of 0.4 (for SSE).  This is not hype - this is me reading the
> >> documentation and doing the maths.
> >
> >This is an estimate, not a benchmark. It would be interesting what the
> >Apple C compiler would make from your code. Your code seems to map to
> >AltiVec very well, this is pretty rare.
> 
> My algorithm is made up of standard matrix operations, such as 
> multiplies and inversions.  By transposing one of the matrices 
> beforehand, the individual operations are mostly sequential in terms 
> of memory access and uniform in terms of operation, which means it 
> does map very well to vector code.

Hi,

Have you looked into ATLAS for matrix operations?  It achieves a very
high percentage of peak performance on ia32-class processors (as well
as on other architectures), and may well change the assumptions you
are making.  We use it extensively on our 50-node cluster (100 Athlon
MP 2000 CPUs, 100 GB RAM) to do Density Functional Theory.  Just FYI,
FFTW (the Fastest Fourier Transform in the West) is also a
self-optimizing codebase that gives very good performance.  Both are
easily found via Google.
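
For anyone following along, here is a minimal sketch of the
transposition trick quoted above, written in plain Python for brevity.
The function names are purely illustrative and are not taken from
Jonathan's code or from ATLAS; a tuned library does far more than this
(cache blocking, vectorized kernels), so treat it only as a picture of
the access pattern, not as a performance recipe.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul_transposed(a, b):
    """Multiply a (n x k) by b (k x m), transposing b first.

    After the transpose, every output element is the dot product of
    one row of a with one row of bt, so both operands are walked
    sequentially instead of striding down a column of b.
    """
    bt = transpose(b)
    return [
        [sum(x * y for x, y in zip(row_a, row_bt)) for row_bt in bt]
        for row_a in a
    ]


# Small illustrative inputs (not data from the original thread).
a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
c = matmul_transposed(a, b)
```

The inner loop is the same multiply-add for every element, which is
the uniformity that makes this kind of code a good fit for vector
units.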

HTH,
Daniel

-- 
Daniel A. Freedman <freedman at physics.cornell.edu>, Graduate Fellow
  Electronic Structure Calculations, LASSP, Cornell University
Free University Project:   http://www.freeuniversityproject.org
  Help build an accredited open-admission, free-tuition online university!