On Tue, Feb 25, 2003, Jonathan Morton wrote:
Thus I obtained an effective performance figure of 2.0 MFLOPS/MHz, versus an Athlon-XP figure of 0.5 (for both x87 and SSE) and a Pentium-4 figure of 0.4 (for SSE). This is not hype - this is me reading the documentation and doing the maths.
This is an estimate, not a benchmark. It would be interesting to see what the Apple C compiler would make of your code. Your code seems to map to AltiVec very well; that is pretty rare.
My algorithm is made up of standard matrix operations, such as multiplies and inversions. By transposing one of the matrices beforehand, I make the individual operations mostly sequential in terms of memory access and uniform in terms of operation, which means the algorithm does map very well to vector code.
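For readers following along, here is a minimal C sketch of that transposed-multiply layout. It is not the poster's actual code; the single-precision type, square N x N matrices, and function names are my own assumptions for illustration. The point is that once B has been transposed, both operands in the inner loop are walked with unit stride, which is what makes the multiply vectorize well.

#include <stddef.h>

/* Transpose B into Bt once, so the multiply below can read rows of Bt
   sequentially instead of striding down columns of B. */
void transpose(size_t n, const float *b, float *bt)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            bt[j * n + i] = b[i * n + j];
}

/* C = A * B, with B already transposed into Bt.  The inner loop reads
   a row of A and a row of Bt, so both accesses have stride 1. */
void matmul_bt(size_t n, const float *a, const float *bt, float *c)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            float sum = 0.0f;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * bt[j * n + k];
            c[i * n + j] = sum;
        }
    }
}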
Hi,
Have you looked into ATLAS for matrix operations? It achieves a very high percentage of peak performance on IA-32 (as well as on other architectures), and may very well change the assumptions you are making. We use it extensively on our 50-node cluster (100 Athlon MP 2000 CPUs, 100 GB of RAM) to do Density Functional Theory calculations. Just FYI, FFTW (the Fastest Fourier Transform in the West) is also a self-optimizing codebase that gives very good performance. Both are easily found via Google.
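To give a feel for what using ATLAS looks like in practice, here is a rough sketch using the standard CBLAS interface that ATLAS provides; the row-major layout, double precision, and square matrix sizes are my own choices for illustration, not anything from the thread:

#include <cblas.h>   /* supplied by ATLAS, or any BLAS with the C interface */

/* C = A * B for row-major double-precision n-by-n matrices.
   The single dgemm call replaces a hand-written triple loop and lets the
   tuned ATLAS kernel handle blocking and vectorization. */
void matmul_blas(int n, const double *a, const double *b, double *c)
{
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                n, n, n,
                1.0, a, n,
                     b, n,
                0.0, c, n);
}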
HTH, Daniel