INSTALL QUESTIONS:
POST-INSTALL QUESTIONS:
Who uses ATLAS?
ATLAS can be used by anyone needing fast linear algebra routines. ATLAS
is used directly by a great many research scientists. Because of the
open nature of ATLAS, we have no way of knowing how many users of ATLAS
there are. In the following paragraphs, we indicate some of the users
that we know about, but this is far from a complete list.
ATLAS is used, or is planned to be used, in the following PSEs:
Additionally, ATLAS is included in some way by the following OS distributions:
What are the academic references for ATLAS?
The academic references for ATLAS are given in bibtex format below. If
you want to reference one paper only, probably the newest (first shown)
is the best, as it references the others. The first two papers
contain the bulk of the needed information. Referencing the homepage can
help other researchers find the software.
Note that there have been quite a few subsequent papers that discuss ATLAS (with varying degrees of accuracy and detail) written by people not directly involved in ATLAS's production and design. While these papers may be about ATLAS, they are not, obviously, primary sources, and should not be cited as such. If the paper is not authored by Whaley or Petitet, it is not a primary-source ATLAS paper.
@ARTICLE{whaley04,
AUTHOR = "R. Clint Whaley and Antoine Petitet",
TITLE = "Minimizing development and maintenance costs in supporting
persistently optimized {BLAS}",
JOURNAL= "Software: Practice and Experience",
volume = "35",
number = "2",
pages = "101--121",
month = "February",
YEAR = "2005",
NOTE = {\verb+http://www.cs.utsa.edu/~whaley/papers/spercw04.ps+}
}
@ARTICLE{WN147,
AUTHOR = "R. Clint Whaley and Antoine Petitet and Jack J. Dongarra",
TITLE = "Automated Empirical Optimization of Software and the
{ATLAS} Project",
JOURNAL = "Parallel Computing",
VOLUME = "27",
NUMBER = "1--2",
PAGES = "3--35",
YEAR = 2001,
NOTE = "Also available as University of Tennessee LAPACK Working
Note \#147, UT-CS-00-448, 2000
({\tt www.netlib.org/lapack/lawns/lawn147.ps})" }
@inproceedings{atlas_siam,
AUTHOR = {R. Clint Whaley and Jack Dongarra},
TITLE = "{Automatically Tuned Linear Algebra Software}",
BOOKTITLE = "Ninth SIAM Conference on Parallel Processing for
Scientific Computing",
NOTE = "CD-ROM Proceedings",
YEAR = 1999 }
@inproceedings{atlas_sc98,
AUTHOR = "R. Clint Whaley and Jack Dongarra",
TITLE = "Automatically Tuned Linear Algebra Software",
BOOKTITLE = "SuperComputing 1998: High Performance Networking and Computing",
YEAR = "1998",
NOTE = "CD-ROM Proceedings. {\bf Winner, best paper in the systems
category.}\\
URL: \verb+http://www.cs.utsa.edu/~whaley/papers/atlas_sc98.ps+"
}
@techreport{atlas_wn97,
AUTHOR = {R. Clint Whaley and Jack Dongarra},
TITLE = "{Automatically Tuned Linear Algebra Software}",
INSTITUTION = "University of Tennessee",
YEAR = "1997",
MONTH = "December",
NUMBER = "UT-CS-97-366",
NOTE = "URL : \verb+http://www.netlib.org/lapack/lawns/lawn131.ps+"
}
@UNPUBLISHED{atlas-hp,
TITLE = "ATLAS homepage",
AUTHOR = "{See homepage for details}",
NOTE = "http://math-atlas.sourceforge.net/"
}
Does ATLAS run on my platform (OS/hardware)?
ATLAS should produce optimized libraries on almost any platform
possessing an ANSI/ISO C compiler and some Unix-like command-line tools
(e.g., make, cp, etc.). ATLAS runs on pretty much all Unix variants
(including embedded systems), as well as Windows (Windows users must install
the free Cygwin tools).
What software license does ATLAS use
(AKA: in what ways and for what purposes am I allowed to use ATLAS)?
ATLAS uses a BSD-style license, without the advertising clause. ATLAS's
license is taken almost verbatim from the example given at
opensource.org. Here is the relevant portion of the license,
as taken from an ATLAS source file:
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 *   1. Redistributions of source code must retain the above copyright
 *      notice, this list of conditions and the following disclaimer.
 *   2. Redistributions in binary form must reproduce the above copyright
 *      notice, this list of conditions, and the following disclaimer in the
 *      documentation and/or other materials provided with the distribution.
 *   3. The name of the ATLAS group or the names of its contributers may
 *      not be used to endorse or promote products derived from this
 *      software without specific written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
 * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
 * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE ATLAS GROUP OR ITS CONTRIBUTORS
 * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
 * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
 * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
 * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
 * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
 * POSSIBILITY OF SUCH DAMAGE.
To see the exact license, simply open almost any source file in the ATLAS tarfile (e.g., ATLAS/config.c).
There is one way the ATLAS license may be different. If you are on an Alpha and you say "yes" to using the Goto GEMM, this add-on is licensed under the terms of the LGPL, which then "infects" the entire library with the same license. Note, however, that if you don't want this license, you need only answer "no", and the library will retain the above license instead of the LGPL.
How do I get help/technical support with ATLAS?
Your first resource for help should always be
the ATLAS errata file. This file keeps track
of all discovered errors in ATLAS, and their workarounds or fixes. It
also contains workarounds for common system problems (e.g., compiler errors,
non-standard commands, etc.), as well as advice necessary to get
the best performance on various machines.
If you have downloaded the ATLAS source, your ATLAS/doc directory contains some useful documentation, though it is often more dated than the info in the errata and online.
If (and only if) neither of these sources provides the information you need, you can submit a support request to:
Do not, under any circumstances, post your support request to the "bug" tracker. As documented on the tracker itself, this is for developer-confirmed bugs only. All users should use the support or feature request trackers. Things that turn out to be bugs will later be escalated to the bug tracker by the confirming developer. In addition, please understand that the tone of your support request is important, as described here.
You do not need to create a SourceForge account in order to use the tracker (that persistent plea to "please log in" can be ignored), though it makes things easier if you do. In particular, if you don't log in, you won't be able to attach extra files later (you can attach a file in your initial report, but afterwards the tracker cannot verify that you are the original poster, so it won't allow it). So, if you think you may need to do this kind of thing relatively often, an account may be worth creating.
Note that you should upload the error_[ARCH].tgz file as well. If the error killed the ATLAS install before it successfully created the error tarfile, create it yourself by issuing the following command from your ATLAS/ subdirectory:
make error_report arch=[ARCH]
Note that [ARCH] in the above directions should be replaced by the architecture string that ATLAS is using (e.g., Linux_P4SSE1 or SunOS_SunUS4).
What documentation is available (usage info)?
ATLAS's main job is to provide optimized libraries, so most of the
documentation is on the appropriate APIs. ATLAS does provide some
executables, but these are merely testers and timers for the provided
libraries. A very rough description of the operation of these executables
is given in ATLAS/doc/TestTime.txt in your ATLAS source directory.
Here are some pointers to ATLAS documentation:
What mailing lists, archives, and so on does ATLAS have?
ATLAS has the following tracker lists:
ATLAS also has various mailing lists and archives. Anyone can sign up for or post to these lists. They are:
Can I download a prebuilt binary instead of installing
from source?
We provide prebuilt ATLAS libraries for some selected
architectures/configurations. The naming scheme for ATLAS binaries is:
Because of the maintenance costs, binaries tend to lag behind the source code in features and error correction. Therefore, even if you see a prebuilt binary for your system, it is critical that you check the errata file for bugs that have been found since that release. If bugs that affect you have been discovered since the binary was compiled, you will need to compile from source so that you can apply the fixes.
If there is no prebuilt for your architecture or SMP configuration, just compile from source.
Can I get ATLAS in rpm or .deb or some other format?
Our only supported format is a gzipped tarfile. If you really feel the
need for .rpm or .deb versions, other parties (e.g., Debian, SuSE) provide
them (note that we can't answer questions on ATLAS installed in this way,
however, since we don't know much about them).
What does the version number of ATLAS mean?
ATLAS version numbers look like:
<major number>.<minor number>.<update number>.
The meaning of these terms is:
So, 3.2.1 would be a stable release, with one group of fixes already applied. 3.3.12 would be the 12th update (13th release) of the associated developer release.
How can I tell what version of ATLAS I have?
For ATLAS version 3.3.6 or newer, you can find out version and build
information via the routine
ATL_buildinfo. The following complete program will give build
information (including version number) when linked against version 3.3.6
or later libatlas.a's:
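#include <stdlib.h>

void ATL_buildinfo(void);   /* provided by libatlas in ATLAS 3.3.6 and later */

/*
 * Save as, e.g., print_buildinfo.c, then compile, link and run with something like:
 *    gcc -o xprint_buildinfo print_buildinfo.c -L[ATLAS lib dir] -latlas ; ./xprint_buildinfo
 * If the link fails, you are using an ATLAS version older than 3.3.6.
 */
int main(void)
{
   ATL_buildinfo();   /* prints build information, including the version number */
   exit(0);
}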
If you are using an ATLAS version prior to 3.3.6, there is no easy way to find the version information without looking at the source. If you have the source tree around, the easiest approach is to examine pretty much any source file (e.g., ATLAS/config.c); the major and minor version numbers are given in the copyright notice at the top. To find out the update number, you'd have to consult the actual routines updated by the particular update, as given in the ATLAS errata file.
What's the difference between stable and developer
ATLAS releases?
The vast majority of ATLAS users should download and use only the stable
version of ATLAS. Stable versions of ATLAS are released roughly once
a year. The most current stable release has an associated errata file,
which details all errors found in the release. Stable releases are very
well tested, and are the only releases for which the ATLAS team answers
support questions.
Developer releases, on the other hand, are meant to be used, as the name suggests, by ATLAS developers, contributors, and people happy to live on the bleeding edge. Developer releases are meant to allow access to the newest ATLAS sources, and may represent a simple snapshot of the internal developer tree. As such, they are essentially untested, and may not build, much less run, correctly. They are also completely unsupported. So, while they may possess features not available in the current ATLAS release, only the most experienced of users should consider utilizing them.
Developer releases are available from the
developer site,
while stable releases are available from the
ATLAS main page. Stable
and developer releases are also distinguished by their version numbers,
as explained here.
What LAPACK routines does ATLAS provide?
The only way to be sure you have the most up-to-date list is to examine
the source in ATLAS/interfaces/lapack/F77/src/. It is pretty
much a foregone conclusion that any documentation, this page included,
will eventually become out of date. ATLAS 3.6 provided C and Fortran77
interfaces to these routines:
Since LAPACK has no official C API, ATLAS provides its own in ATLAS/interfaces/lapack/C/src/.
What header files does ATLAS provide?
The official header file for the C interface to the BLAS is available
as ATLAS/include/cblas.h. The header file for the
C interface to LAPACK is ATLAS/include/clapack.h.
How can I get dynamic (.so) libraries rather than ATLAS's
default static libraries (.a)?
We have several users who have built ATLAS into .so, so it can be done.
However, we presently don't support it.
What's the best hardware for running ATLAS/what machine
do you recommend I buy for this kind of work?
This is another question that is pretty much impossible to answer
generally or keep up to date. To answer for Level 3 BLAS based algorithms,
as of release 3.2, the clear budget winner for
double precision FLOPS was the AMD Athlon. For single precision, SSE
(which is IEEE compliant, unlike 3DNow!) makes Intel attractive. For
raw performance, the 1.5 GHz P4 is fairly impressive. Clock for clock,
of course, there are a lot of classic RISC computers that do better
(IBM's Power series, Compaq/DEC's Alphas, etc.), but these machines
are usually a lot more expensive, and tend to lag behind in the MHz
race.
Can I use ATLAS with CLAPACK?
Yes. CLAPACK gives you the option to compile CLAPACK to use the standard
C interface to the BLAS, which ATLAS provides. If you run CLAPACK's included
BLAS tester, be sure to turn off error-exit tests, since it can't properly
test the error exits returned by the CBLAS. ATLAS provides essentially the
same testers in ATLAS/interfaces/blas/C/testing, which do
correctly test the error exits, if that's important to you.
How well optimized are the various routines in
ATLAS?
All of the routines in ATLAS tend to be competitive with the machine-specific
versions for most known architectures. However, ATLAS is not just about
working well on known architectures, but also tries to be optimal for
unknown machines. When it comes to the generality of the optimizations
ATLAS uses, there is a definite hierarchy:
I need routine/architecture X optimized, can you do
it?
If you have a particular operation and/or architecture you really need
optimized, you may want to post a mention of that to
the ATLAS feature request tracker.
We don't do optimization on request, but when we have to choose the next
set of operations to support, user input can certainly influence things.
To maximize your chance of swaying us, you'll want to include what percentage
of your application time is spent in the particular operation, etc.
A quicker way to get action is to do it yourself. ATLAS is open source,
and the developer homepage
explains how you can use ATLAS to optimize various operations.
How is ATLAS funded?
The short answer to how ATLAS is funded is that most often, it's not!
I originally started ATLAS development when I worked at the Innovative Computing Laboratory at the University of Tennessee. I got enough of it working to convince Jack to give the development go-ahead on my own time. After that, ATLAS was written into a variety of grants, but was never funded (to my knowledge) solely on its own grant. I believe some of its later development took place under the NSF grant "Linear Algebra Algorithms and Tools for Emerging Computing Environments and User Communities", Grant Number ACI-9813362.
Both Antoine and myself (the two full-time ATLAS researchers and developers) left ICL in 2001. After this date, we have normally worked on ATLAS pretty much unsupported, which has, of course, slowed development considerably. However, in 2003, Advanced Micro Devices funded a year of my graduate studies in return for some Opteron tuning. This has allowed me to spend quite a bit more time on ATLAS than previously.
Who provides infrastructure support?
Obviously, SourceForge provides
the ATLAS main page,
including CVS services, tracker, etc. Also,
netlib provides
access for a large part of the mathematical community through
ATLAS's original homepage.
As far as machine access for tuning:
Who wrote/contributed to ATLAS?
Note that this question addresses package design and code contribution,
not money,
infrastructure or
testing.
R. Clint Whaley founded the
ATLAS project. After the initial release, he was joined on the project
by Antoine Petitet. Between
them, these two individuals are responsible for 95% of the code in ATLAS,
along with pretty much all of the design. That is not to say that others
have not made substantial contributions, however.
In particular, ATLAS has been designed to allow for outside contribution such that a user can provide only a very small kernel, and thus speed up large portions of the library. Many people have contributed in this manner, and this has resulted in extremely large performance improvements for ATLAS on certain architectures. These contributors (in alphabetical order), and a rough sketch of what they have done, are:
There are many places in the search where I could prune things back and
have no effect on performance on any known architecture, but since the
speed is adequate, additional search options are left on in case an unexpected
architectural change is found. I could also utilize more sophisticated
sampling techniques, but these would then need to be validated to work
on the vast array of machines (the present search having been tested
for over seven years, and on innumerable architectures). All this is to say
that speeding up the search is not a bad thing, it just is not that helpful
to the core usage of ATLAS, and so it is not worth the cost/risk of change
at this time. If additional tuning capabilities are added, so that the
search time becomes more critical, then of course the search will be
updated.
Adding a second search for users with greater system control is on my
(almost endless) To Do list. In this search, the cycle accurate walltimers
available on most modern architectures would be used instead of cpu time,
allowing us to sample smaller operations, and repeat them and take the
minimum to get better results. This greater accuracy of install results
would be helpful for machines without architectural defaults, and I hope
to provide it one day.
Empirical searches, when run on real machines experiencing unrelated load,
are almost never strictly repeatable, even in the best of cases. The default
ATLAS search is far from the best case: the sampling and timing mechanisms
are crude, made to work on the lowest-common-denominator setups, up to and
including embedded systems. So, when run in this mode, the search is designed
to give you a library that isn't bad, but is often far from the best.
To get better results (which are then saved as architectural defaults),
I usually run the search multiple times and, if necessary, intervene by
hand to probe promising transformations. Thus, the architectural
defaults can be thought of as the saved result of several installs plus some
user intervention. Also, the architectural defaults are synergistic with the
default compiler flags, so you want to leave both alone for best results.
The first thing to check is that the results you are getting with your
install match those given elsewhere (found for instance on the
atlas timing page,
one of the atlas lists, or via google). If you suspect your performance
is suboptimal, open up a support request and ask.
It is often better in these cases to install an older compiler than to do
your own search. If you do your own search, however,
follow these directions.
There are three main reasons why large blocking factors are a bad idea,
even when they provide a true asymptotic speedup:
Note that points (2) & (3) are very important: GEMM is one of the most
studied performance kernels in the world not for its own sake, but due
to the wide variety of applications whose performance can be improved by
speeding it up. Thus, speeding up GEMM at the expense of application
performance is something that only someone interested in benchmarking
GEMM (as opposed to building a usable library) would want to do.
Therefore, the ATLAS search limits NB to 80. We occasionally relax this
limit (manually, never blindly in the search) when it is absolutely necessary.
For instance, on SPARCs, large NB have proven necessary for decent performance,
and on the Pentium 4 (not P4E), the floating point unit does not make use of
the L1 cache, and so we block for the L2. However, in these cases we first
verified that the win is true and substantial, and we then hand-tuned the
cleanup to ameliorate the effects of large NB as best we could. Even so,
these systems can display very bad performance due to point (3) above, and
we do not actually use the NB that gives the best GEMM performance; we
increase it only enough to get adequate asymptotic performance.
Without examining these tradeoffs, you should never increase NB, unless you
are tuning for a large GEMM benchmark.
Can I vary the number of threads ATLAS uses dynamically?
No. The maximum number of threads to use is determined at compile time.
ATLAS will never use more than this, but may use less if the problem sizes
are too small to get speedup from the additional parallelism.
What's the deal with the RHS in the row-major factorization/solves?
Most users are confused by the row-major factorization and related solves,
and the right-hand side (RHS) vectors are probably the biggest source of
confusion. The RHS array does not represent a matrix in the mathematical
sense; it is instead a pasting together of the various RHS into one array
for calling convenience. As such, each RHS vector is always stored
contiguously, regardless of whether row- or column-major order is chosen.
This means that ldb/ldx is always independent of NRHS and dependent on N,
regardless of the row/col major setting.
Why don't you speed up/improve ATLAS's search?
There are several questions here, handled in their own sub-questions:
As you will see by reading each, only the last of these would actually be
helpful for ATLAS's main use, and it has stayed on the back burner for quite
some time because it's almost always more useful to expand ATLAS's other
capabilities. Note that the majority of users should use the provided
architectural defaults, thus avoiding the search altogether. The search
is there only for exploration by the expert user (in a user-controlled
fashion), or to enable a naive user to get an adequate library on a
truly new architecture (in its fully automatic mode).
Why don't you improve the type of ATLAS's search?
There has been quite a bit of research on fast search techniques. ATLAS
uses a relaxed 1-D line search, where the `relaxed' comes from the fact
that interacting transforms are usually handled by restricted 2/3-D searches.
This is a very basic search technique, and many people wonder why a more
advanced algorithm, such as hill climbing, simulated annealing, or
genetic algorithm isn't used. The real answer is that it is overkill.
Because I understand the transformations ATLAS attempts, and how they
interact, I am able to target the relaxed line search appropriately.
More advanced techniques are more appropriate when you do not know good
start values for the transforms, and know less about the
interactions between optimizations and how to resolve them. The modified line
search has some nice properties: it is easily guided by hand by the
expert user in order to explore spaces more fully, and it is easy to
understand and maintain.
Why don't you improve the speed of ATLAS's search?
I occasionally get suggestions on how to speed up ATLAS's empirical search.
I know of a multitude of ways that I could do this. In my view, however, they
are not worth the effort/risk at the present time. Most users should use the
architectural defaults, skipping the search altogether. The only speed
criterion that went into
the search design was that it needed to be tolerable. The main purpose of
ATLAS is to provide an optimized library, and once the search could produce
that in a period of time O(1 day), that seemed good enough. On many
architectures, of course, the search finishes much faster than that.
Why don't you improve the accuracy of ATLAS's search?
This is the search problem that I am most tempted to fix. The present
search is mainly designed to be usable by an installer with no system
privileges, who must install on stock systems that are experiencing
unrelated load during the installation. Thus, by default ATLAS uses
CPU time for all non-threaded installation decisions, which is extremely
inaccurate. This often leads to the search going awry (i.e., failing
to find a more optimal kernel), which is why the architectural defaults
are so important.
What's the deal with the architectural defaults?
I split this into several separate questions:
Is using the architectural defaults important, rather
than doing my own search?
The short answer is: definitely. As described elsewhere,
the search is designed to be used only when architectural defaults are
unavailable or have become non-optimal due to a compiler change. To understand
this, you need to understand the nature of empirical searches in general.
When should I not use architectural defaults?
As previously mentioned, architectural defaults are usually the result
of several guided installations, and thus represent best of breed installs.
They can become a barrier to performance occasionally, particularly when
a compiler goes through a major release. For instance, if the architectural
defaults were generated with gcc 2.9 and you are presently using 3.2, things
may have changed enough to require new defaults.
If I don't use architectural defaults, how can I
get better performance?
First, make sure your search results really are better than the architectural
defaults by comparing the timings of a default install against your search
install (you can find some timings, including a table of percent of peak, at the
atlas timing page).
Experiment with different compilers and flags to find combinations that better
match both the defaults and your output flags. Be sure to do all the normal
post-install tuning, including tuning
CacheEdge.
Finally, if your install is indeed faster than the arch defaults,
report it.
Why does the search limit NB to 80?
The default ATLAS search limits GEMM's blocking factor to at most 80.
On systems where larger NB actually blocks for the L2, blocking for the
L2 prevents ATLAS from using its multilevel blocking parameter,
CacheEdge.
In this case, larger blockings may result in superior kernel timings (which
do no L2 blocking), but if an L1-contained NB is used, similar or superior
performance may be obtained in full GEMM with a tuned CacheEdge. In this
case, the GEMM speedup is illusory, but the application and small-case gemm
slowdown (discussed below) is quite real. On machines with a large L1, or
a very fast L2, GEMM may indeed get an asymptotic speedup from larger blocking
factors, but it is still almost always a bad idea, as outlined below.