Reliability analysis
Reliability analysis is an engineering discipline that applies mathematical techniques to the measurement and prediction of the reliability of components and systems. The components under study may be mechanical, electronic, software, or other types. "Systems" could include anything from computers to rail transit. Measurements include failure rates, cumulative failures, and component lifetimes (time until failure). The techniques employed are drawn mainly from probability, statistics, and the theory of stochastic processes.

A closely related subject is risk analysis, which includes methods for the assessment, characterization, and management of risk. "Risk" is generally taken to be the product of the probability of an event and the loss caused by the event (in financial or other measurable terms). Risks are often associated with failures of systems (including natural ecosystems), and thus the quantitative treatment of risk has much in common with reliability analysis.

The mathematics of reliability analysis, even in terms of specific models, has broad applicability. A reliability model with failures and repairs to failed units may have the same form as a population model in which "failures" become deaths and "repairs" become births. A probability distribution of survival times may be used in reliability analysis to model component lifetimes, in medical research to model patient survival in a treatment group, and by actuaries to compute insurance premiums. This generality qualifies reliability analysis for inclusion in that fuzzy set of things known as "systems theory."

People all over the world depend on complex technological systems for things they take for granted in daily life. Other things being equal, increased complexity leads to reduced reliability. (Other things are not always equal, though: there are systems, ranging from living organisms to the worldwide Internet, where new forms of organization have more than kept pace with increased complexity.) In most cases, failure of technology (automobiles, kitchen appliances, etc.) is inconvenient but not dangerous. In other cases (aircraft, hospital equipment, etc.) it can be life-threatening. And in a few cases, such as nuclear reactors, failure can be catastrophic on a large scale. As we become more dependent on technology that is developed in shorter design and test cycles, high-quality reliability analysis becomes more important to everyone.
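As a rough illustration of two ideas from this introduction, the following Python sketch computes the reliability of a series system (one that works only if every component works) and the risk of a failure as probability times loss. It is a minimal sketch under stated assumptions: component lifetimes are taken to be exponentially distributed with a constant failure rate, failures are assumed independent, and the failure rate, mission time, and loss figure are hypothetical values chosen only for the example.

```python
import math


def exponential_reliability(failure_rate, t):
    """Probability that a component survives to time t.

    Assumes a constant failure rate (exponential lifetime distribution).
    """
    return math.exp(-failure_rate * t)


def series_reliability(component_reliabilities):
    """Reliability of a series system.

    The system works only if every component works; component failures
    are assumed to be independent, so the reliabilities multiply.
    """
    return math.prod(component_reliabilities)


def risk(probability, loss):
    """Risk as the product of an event's probability and the loss it causes."""
    return probability * loss


if __name__ == "__main__":
    failure_rate = 1e-4      # failures per hour (hypothetical value)
    mission_time = 1000.0    # hours (hypothetical value)
    loss_if_failed = 50_000  # cost of a system failure (hypothetical value)

    # Single-component reliability: exp(-0.1), roughly 0.905
    r_one = exponential_reliability(failure_rate, mission_time)

    # More components in series means lower system reliability.
    for n in (1, 10, 100):
        r_sys = series_reliability([r_one] * n)
        failure_risk = risk(1.0 - r_sys, loss_if_failed)
        print(f"{n:4d} components: reliability = {r_sys:.5f}, "
              f"risk = {failure_risk:,.0f}")
```

With these numbers, system reliability falls from about 0.905 for one component to about 0.368 for ten, which is the "other things being equal" effect of complexity described above. Redundancy and other forms of organization change the arithmetic, so the series model is only the simplest case.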
Software reliability

Outback Software is particularly interested in the reliability of software systems, or systems in which software is a significant component. Software, particularly in "embedded systems" that are part of everything from automobile engines to kitchen appliances, is increasingly pervasive, and its failure is increasingly consequential.

Reliability models for mechanical or electrical components and systems implicitly rely on the existence of large, homogeneous universes of components for each type, and on the repeatability of experiments, e.g., running a set of ball bearings in some standard environment until they fail. They also typically assume that failures are due to either "wear-out" or random material failure. These assumptions are not valid for software. Software failures are all essentially due to design and production problems, not wear-out or materials. Thus we are really concerned with the reliability of the process used in software development projects. Except for tiny "toy" projects, the universe of such projects is heterogeneous, and individual projects are non-repeatable. In addition, results from small projects don't scale well to larger projects. Thus software reliability analysis tends to be primarily qualitative, and quantitative results will seldom stand up to generalization.

The fact, expressed above, that reliability models have broad applicability is perhaps an example of what Eugene Wigner called "the unreasonable effectiveness of mathematics." It is also, we think, an example of the fact that disparate physical processes in the world can be governed by isomorphic mechanisms and laws. Software failure, apparently, is not governed by the same mechanisms as processes like ball bearing failure, but it may be governed by other laws that have been studied in various fields concerned with the behavior of complex systems.
Though it's tempting to believe that chaotic dynamics, catastrophe
theory, or cellular automata will provide a quantitative lens for understanding software reliability, any
breakthrough, however small, is more likely to come from a new technical tool
for analysis and testing of programs. Meanwhile, managers will live with a
qualitative, "seat-of-the-pants" approach to wringing reliable software out
of quirky, unreliable human development teams.
Use of Mathematica

Where quantitative methods are appropriate, we are interested in the use of Mathematica as a tool. This Mathematica notebook provides some basic illustrations. If you do not have Mathematica, the notebook can be viewed with MathReader, a free reader for notebooks; it provides a read-only view of the notebook that is compatible with full-function Mathematica. You can also view the notebook in HTML format. (Mathematica has built-in capabilities for converting notebooks to a variety of print and display formats.)

Typically, reliability engineers use a hodge-podge of tools: statistical packages, special-purpose software tools, nomograms, probability plotting paper, tables, hand calculators, etc. Though many of these are optimal for one facet of the analysis, a general-purpose tool with mathematical depth, such as Mathematica, may be a better basis for an integrated approach to reliability analysis, especially if Mathematica is supplemented with packages containing functions tailored for reliability analysis, along the lines of those developed in the notebook mentioned above.
This downloadable
Mathematica package contains many of the functions from the reliability
analysis notebook. To see how it's used, download this
Mathematica notebook, which tests the package functions.
Links

Whitepapers, etc. from Outback Software and business associates
General information, whitepapers, etc.
Risk analysis
Logistics, spares provisioning, etc.
Companies and organizations
Reliability analysis product downloads
CARMS reliability/availability model
QuART (Quanterion Automated Reliability Toolkit) is available in free or commercial versions.
MIL-STD, MIL-HDBK online search/archive (requires registration for first-time users)
Rome Laboratory Reliability Engineer's Toolkit (from Quanterion)