When a computer system is expensive to use or not often available, one may want to tune software for it via analytical models that run on more common, less costly machines. In contrast, if the host system is readily available, the attraction of analytical models is far less: one instead employs the actual system, testing and tuning its software empirically. Two examples of code scalability testing illustrate how these approaches differ in objectives and costs, and how they complement each other in usefulness.

Concurrent computing requires scalable code [1.8, 12]. The success of a parallel application often fuels demands that it handle an expanded range, and it should do so without undue waste of additional system resources. Definitions of scalability vary according to circumstances: when looking for speedup, the problem size is fixed while the host system grows; in another case, one evaluates an enlarged problem together with a larger host. The discussion that follows assumes no particular scalability metric. As others have done, we report our research results only in terms of execution times, leaving the choice of a scalability metric to the user.

SLALOM - the Scalable Language-independent, Ames Laboratory, One-minute Measurement - is a code used here as a concrete example. SLALOM ranks computer systems by the accuracy they achieve on a realistic image-rendering problem in radiosity. Accuracy is defined as the number of geometry patches computed during a test, which SLALOM adjusts automatically to one minute of execution. By fixing time rather than problem size, SLALOM accommodates a very broad spectrum of host systems. SLALOM's original patch generation - used here - is O(N³), a non-linearity that makes interpreting distances between distinct patch ratings less intuitive. An O(N log N) patch generation improves comparisons between systems; however, this variant is not so easily ported to new systems. A sequel benchmark, HINT, is linear in answer quality, memory usage, and number of operations.
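The fixed-time idea behind SLALOM can be illustrated with a minimal sketch: rather than timing a fixed workload, the harness grows the problem size until a run no longer fits the time budget, and rates the system by the largest size completed in time. Everything here is an assumption for illustration - `render_patches` is a hypothetical stand-in for SLALOM's patch computation, and the doubling search is a simplification of its actual size-adjustment logic.

```python
import time

def render_patches(n):
    # Hypothetical stand-in for SLALOM's radiosity patch computation;
    # cost grows superlinearly with n, loosely mimicking the original
    # O(N^3) solver's non-linear behavior.
    acc = 0.0
    for i in range(n):
        for j in range(n):
            acc += (i * j) % 7
    return acc

def fixed_time_rating(budget_seconds, start_n=8):
    """Return the largest problem size whose run fit within the budget.

    This is the fixed-time benchmarking pattern: the time is held
    constant and the problem size (the 'accuracy' rating) is the result.
    """
    n = start_n
    best = None
    while True:
        t0 = time.perf_counter()
        render_patches(n)
        elapsed = time.perf_counter() - t0
        if elapsed > budget_seconds:
            # The last size that finished in time is the rating;
            # if even the starting size overran, report it anyway.
            return best if best is not None else start_n
        best = n
        n *= 2  # double the problem size and try again
```

Because the time budget is fixed, the same harness runs unmodified on machines spanning orders of magnitude in speed - faster systems simply earn a larger rating, which is what lets SLALOM accommodate so broad a spectrum of hosts.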