_________________________________________________________________

**NOTE** This article was posted on the TQM BBS for public download with the express permission of Joiner Associates Incorporated. But the text is copyrighted and may not be reproduced without permission. For permission, further information, or a list of other publications available from Joiner Associates, call or write:

Susan E. Reynard
Senior Editor
Joiner Associates Incorporated
3800 Regent Street
Madison, Wisconsin 53705
Telephone: 608-238-8234, extension 232
Fax: 608-238-2908
________________________________________________________________

[Editor's note: In this paper, Dr. Little uses several mathematical symbols not available in ASCII format. The Greek letter lambda I have indicated by /\. Superscripts I have rendered by the phrase "to the power of," and subscripts by the use of "sub" before the number--e.g., sub-one, sub-two. --Tom Glenn]

THINKING ABOUT SAFETY

Kevin Little

Abstract

All large industrial enterprises (and many small ones, too) have had to gather and submit data on accidents according to the requirements of the Occupational Safety and Health Act of 1970. These data often show that reportable accidents are in statistical control when plotted on a suitable control chart. What are the implications for managers when such accidents exhibit stability in the control chart sense? This simple statistical perspective provides a basis for this paper. We can analyze both accident data and the ways organizations work to improve safety to illustrate several principles of modern management. Such an analysis, when conducted within an organization, can serve as a case study for in-house process improvement training for managers.

I review basic models of accidents from the statistical literature. Starting from the Poisson model and a basic generalization proposed first by M. Greenwood and G.U. Yule in 1920, I discuss what assumptions these models make about the circumstances of accidents.
I also review the basic understanding of accidents from the industrial engineering literature and relate the engineering models to modern management principles, starting with H.W. Heinrich's domino model, first proposed in 1931.

I summarize basic statistical and engineering approaches to accidents and safety. I relate these approaches to modern management principles and show how these approaches fit into the modern view of an organization as a system that can be improved. I outline how to build a case study for internal training, using an organization's safety data.

I. Introduction

What is an accident? We hear and see reports about events that capture widespread public attention, like the explosion of NASA's Challenger in 1986. Auto accidents in our home towns are reported regularly by the local press. Lacerations, sprains and eye injuries sustained in industrial enterprises are recorded in internal documents and reported to government agencies. What do these examples have in common? An unintended sequence of events leads to injury or death; patterns of work are disrupted. In general, accidents may be defined as "unplanned and uncontrolled events in which the action or reaction of an object, substance, person or radiation results in personal injury or the probability thereof" (Heinrich, Petersen and Roos 1980, 23).

In this paper, I examine the nature of unintended sequences of events, especially those that lead to accidents and injuries. I review briefly the basic statistical and industrial engineering models that have been used in the past to describe or explain accidents. The next-to-last section outlines the implications for managers that we can derive from the general study of errors. The final section ties these implications to the study of accidents and injuries.

II. The Occupational Safety and Health Act of 1970

The Occupational Safety and Health (OSH) Act of 1970 requires that:

"All employers must furnish to employees employment and a place of employment free from recognized hazards that are causing or are likely to cause death or serious harm to employees. Employers must comply with occupational safety and health standards issued under the Act." [1]

This requirement, known as the "General Duty Clause," places a general legal responsibility with the employer. Penalties for violation include fines. In the case of "willful violation resulting in death of an employee," the act provides for criminal penalties for managers held to be responsible.

The OSH Act, administered by the Occupational Safety and Health Administration (OSHA), both reflects and promotes a basic change in industrial safety. [2] Traditionally, writers and practitioners emphasized the control of unsafe acts of employees. The changed approach emphasizes indirect causes (poor design of workplace or tools, poor match between worker and job, poor compliance with established safety standards, lack of management or worker skill).

The OSH Act of 1970 puts the responsibility for safety squarely with management. The act emphasizes that managers are legally responsible for the safety of the systems that comprise an organization. How managers carry out their responsibility depends on their ability to analyze their business as a web of systems. It also depends on their ability to organize their employees to help the managers to study and improve these systems.

The OSH Act also imposes uniform reporting standards on manufacturing companies as well as those in primary industries like mining and agriculture. I refer to the information in the standard reports in section VII.

III. Basic Statistical Models for Accidents

The familiar Poisson model provides the starting point for many statistical descriptions of accidents.
There are two different ways to justify the form that this model takes. [3] Each way justifies a representation of "rare" events and leads to this formula:

    Pr(k accidents in a reporting period) = [exp(-/\) * (/\ to the power of k)] / k!

This formula describes the Poisson distribution with parameter /\. The symbol /\ describes the theoretical average number of accidents we would see in a reporting period. By suitable scaling, we can transform this average number of accidents into a standardized rate of accidents.

Of course, the Poisson distribution provides the basis for the c-chart. The simplest form of the c-chart exploits one of the remarkable properties of the Poisson distribution: the mean of the distribution, /\, is the same as the square of the standard deviation.

An additive property of the Poisson distribution is useful in describing combinations of systems. If we have two systems that independently produce accidents according to the Poisson distribution (with parameters /\ sub-one and /\ sub-two, respectively), can we characterize the variation in accidents of an aggregate that is a combination of the two original systems? The variation in accidents of the aggregate is again given by the Poisson distribution, with parameter /\ = /\ sub-one + /\ sub-two.

There are two ways to apply the Poisson distribution to a study of accidents (or, in fact, to any kind of error). The first leads to the form illustrated by the c-chart. We track the number of accidents for a particular organization (that has a consistent number of individuals at risk) over time. The second way looks at the organization across individuals (or parts of the organization) for a given time, ignoring patterns that may result over time. Such a study may be termed "cross-sectional."

Here are the implications for a system described by a control chart that shows no signals of special causes:

1. We can make at least short-range predictions about the number or rate of accidents, based on the past.
The awareness of accidents in the past does not affect the occurrence of accidents in the future.

2. If we wish to reduce the number or rate of accidents, we must work on the conditions that promote the accidents. All of the accidents are relevant to understanding the nature of the causes of accidents. We will be more effective in improving safety if we study common conditions across all the accidents than if we seek to distinguish why a particular accident is different from other accidents in the same series and from the same organization.

The number of accidents and injuries suffered by employees will vary over time and will vary among employees. The basic Poisson model will not always describe this variation adequately.

M. Greenwood and G.U. Yule (1920) were among the first to investigate how well the Poisson formula described injuries to workers. They examined data from a World War I munitions factory. They found that a common rate of accidents did not apply to all workers. Indeed, some workers were "accident-prone." [4]

Greenwood and Yule worked out a simple generalization as they tried to characterize the frequency of accidents for an individual worker. They supposed that each individual's frequency of injuries could be described by the Poisson formula but that the value of /\ varied from person to person. To arrive at a description of the distribution of injuries that any individual worker might suffer, they essentially took an average over a set of different values of /\. [5]

Greenwood and Yule conducted a cross-sectional study and found that the values of /\ varied from individual to individual. This variation was greater than we would expect if the individuals shared a common injury rate. (In terms of a c-chart, some individuals are outside the upper control limit derived from an average rate over all the workers.)
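
The cross-sectional check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the counts are invented (they are not Greenwood and Yule's data), the function names are hypothetical, and the limits are the usual three-sigma c-chart limits built on the Poisson property that the variance equals the mean.

```python
import math
from statistics import mean, variance


def poisson_pmf(k, lam):
    """Pr(k accidents in a reporting period) for a Poisson model with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def c_chart_limits(counts):
    """Return (lower, centre, upper) three-sigma limits for a c-chart.

    Uses the Poisson property that the variance equals the mean, so the
    standard deviation is estimated as the square root of the average count.
    """
    centre = mean(counts)
    sigma = math.sqrt(centre)
    lower = max(0.0, centre - 3 * sigma)  # a count cannot be negative
    return lower, centre, centre + 3 * sigma


def above_upper_limit(counts):
    """Indices of individuals whose count exceeds the upper control limit."""
    _, _, upper = c_chart_limits(counts)
    return [i for i, c in enumerate(counts) if c > upper]


def dispersion_index(counts):
    """Sample variance divided by the mean.

    Close to 1 when one Poisson rate fits everyone; well above 1 suggests
    that rates differ from individual to individual.
    """
    return variance(counts) / mean(counts)


# Hypothetical accident counts for 12 workers over one period (invented data).
worker_counts = [0, 1, 0, 2, 0, 1, 0, 0, 9, 1, 0, 1]

if __name__ == "__main__":
    lower, centre, upper = c_chart_limits(worker_counts)
    print(f"centre line {centre:.2f}, limits {lower:.2f} to {upper:.2f}")
    print("workers above the upper limit:", above_upper_limit(worker_counts))
    print(f"dispersion index {dispersion_index(worker_counts):.2f}")
    print(f"Pr(no accidents) at the average rate: {poisson_pmf(0, centre):.3f}")
```

With these invented counts, only the worker with nine accidents falls above the upper limit, and the dispersion index is well above 1. Both are the kind of evidence that a single common rate does not describe every individual. A real study would also need equal exposure (hours at risk) for each worker, which this sketch simply assumes.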
Here are some implications for a cross-sectional study that shows Poisson variation for individuals and evidence that one rate will not account for each individual's variation:

1. We can increase our knowledge of causes of accidents by studying the differences between individuals with different rates.

2. If we wish to reduce the overall rate of accidents, we can work to reduce the accident rate of the "accident-prone" individuals or to identify the conditions unique to these individuals.

Once we have determined that individuals have differences in rates of accidents beyond that predicted by the simple Poisson model, we must study to find deeper causes for these differences. [6] We will make poor progress if we merely identify the accident-prone individuals and remove them from their jobs. How will we improve the way we select new employees? How will we improve the way we design new jobs? We also can study job requirements and methods in order to reduce the impact of individual differences on the safety, quality and productivity of the work produced.

Reports of industrial accident data usually show data aggregated over individuals and organizations. The aggregated data will show Poisson variation when the variation of the components is Poisson and independent of the other components. This is a consequence of the additive property of the Poisson distribution.

To the extent that we monitor and react to aggregated data, however, we are in effect treating the aggregate as a single system. The implications listed for a system showing statistical control apply to the aggregate, when it shows statistical control:

1. We can make at least short-range predictions about the number or rate of accidents, based on the past. Awareness of accidents in the past does not affect the occurrence of accidents in the future.

2. If we wish to reduce the number or rate of accidents, we must work on the conditions that promote the accidents.
All of the accidents are relevant to understanding the nature of the causes of accidents. We will be more effective in improving safety if we study common conditions across all the accidents than if we seek to distinguish why a particular accident is different from other accidents in the same series and from the same organization.

In addition, when we study aggregated data, we can learn by making plots and tables that correspond to the components of the aggregate. We may find differences among components of the aggregate that are worth investigating.

IV. A Basic Industrial Engineering Model

H.W. Heinrich developed basic theories from industrial accident data that have shaped much of the subsequent industrial engineering work on accidents and injuries. Heinrich was among the first to point out that the conditions that lead to accidents and injuries are in fact those that lead to excessive costs in production and poor quality. He viewed as an axiom that "methods of most value in accident prevention are analogous with the methods required for the control of the quality, cost, and quantity of production." (Heinrich, Petersen and Roos 1980, 21.)

While many industrial engineering models of occupational accidents and injury have been proposed over the years, [7] Heinrich's "domino model" has been the model most widely discussed and applied since the 1930s. The domino model proposes that any injury consists of a sequence of factors:

1. Ancestry and social environment
2. Fault of person
3. Unsafe act and/or mechanical or physical hazard
4. Accident
5. Injury

"The occurrence of a preventable injury is the natural culmination of a series of events or circumstances, which invariably occur in a fixed and logical order.
One is dependent on another and one follows because of another, thus constituting a sequence that may be compared with a row of dominoes placed on end and in such alignment in relation to one another that the fall of the first domino precipitates the fall of the entire row. An accident is merely one factor in the sequence. If this series is interrupted by the elimination of even one of the several factors that constitute it, the injury cannot possibly occur." (Heinrich, Petersen and Roos 1980, 23.)

Heinrich's work persists in a variety of models subsequently proposed and in safety management applications. Heinrich proposed eliminating the third domino (unsafe acts and/or mechanical or physical hazard) as the way to prevent accidents. He also claimed that the unsafe acts of persons are the dominant source of accidents (up to 90% of accidents are so caused, in his view). Heinrich's work implies that we should train individuals to avoid unsafe acts. His coauthors Petersen and Roos in the 1980 edition of Industrial Accident Prevention repeat the emphasis on unsafe acts of individuals:

"It seems almost unbelievable that with the knowledge that people cause most accidents, knowledge that has been available since the early 1930s, so much time and effort since that time have been spent by industry with primary, often total, attention on physical conditions. It is even more unbelievable that in 1970, some 38 years after this knowledge was available, the United States would turn to a national approach based almost entirely upon the control of physical conditions: the Occupational Safety and Health Act (OSHA). Unbelievable or not, this is precisely what transpired. With almost universal belief in the principle that safety is primarily determined by people, the principle was almost totally rejected by the Congress, who chose to legislate a law based upon a totally opposite principle: that accidents are caused by conditions--by things." (Heinrich, Petersen and Roos 1980, 60.)
It is important to view conditions as distinct from things, however. The conditions for which management is responsible do contribute to accidents and injuries, even if the precipitating act is human error as expressed in an unsafe act. In general, it is true that people primarily determine safety but we must remember the role of managers and not focus exclusively on line employees. Managers have the job to design, maintain and improve effective systems to produce goods and services.

V. Studying Errors and Accidents

Accidents are the result of an unintended sequence of events. The same can be said of any kind of error. (In this paper, I consider errors to be discrete mistakes or omissions, not observational error measured on a continuous scale.) In this section, I describe general problems with studying errors. I apply these general ideas to the study of accidents in the next section.

We often assume that we can find a cause or set of causes for any error. We feel we can construct a plausible sequence of events, based on experience and technical knowledge, that will account for the error that we saw. Here are five problems with our attempts to explain an error:

1. Purposes that may have inherent conflicts;
2. Human memory;
3. Large number of potential causes;
4. Data sparsity;
5. Confusion arising from our experience in studying errors.

First, we may have conflicts in our purposes. We may need to:

a. determine precise circumstances to satisfy legal and insurance requirements. For example, we may need to establish evidence that will support or deny claims of negligence on the part of individuals or organizations.

b. show that we care by acting decisively in the aftermath of an error, particularly those errors that have serious consequences. Errors that result in employee injuries or injuries to people outside the organization are among the errors that demand the attention of managers.

c. understand the error in order to prevent its recurrence.
Purpose (a) may require us to assign blame to individuals. Assigning blame usually makes it more difficult to get at all the information surrounding the problem. People will defend their own interests and may not volunteer help in studying the circumstances of the error. Thus purpose (a) conflicts with purpose (c). Conversely, the effort to understand the error may conflict with allocating blame decisively.

Similarly, purpose (b) may lead us to ask for a quick answer rather than devote enough time to understand the circumstances of the error. Thus, purpose (b) conflicts with purpose (c). Conversely, working to understand the set of causes that generated the error may take time. Such a study may not give quick satisfaction to those demanding an official resolution.

Second, our memory fades quickly, so our impression of the circumstances surrounding an error discovered today will be richer than our recollections of circumstances a week or a month ago. Memory unaided by records that include adequate detail, especially graphical records, may lead us to think that today's error is more unusual than it really is. Furthermore, designing records that capture adequate detail requires at least a rudimentary theory of the potential causes of the error. The frailty of unaided memory is compounded by the next problem.

Third, many errors are caused by a union of circumstances. It is easy to construct reasons for an error because we have so many potential contributing causes. Yet, it is easy to stop before reaching deeply enough into the sea of causes because simplicity is beguiling. Rules of reasoning lead us to prefer simplicity in our theories of cause and effect--but there is a difference between simplicity as a guide in selecting the best current theory and simplicity in conducting the research that creates the theory.

Fourth, if we see an error only rarely, we may think that there must be something special about the circumstances of the error.
We feel we can learn something about the error by concentrating on the circumstances of the error.

Fifth, we may feel we can consider every error as a special event because sometimes this is a successful perspective. We can build theories of cause-and-effect from our knowledge about how the world works and from answers to questions based on "why?" In the aftermath of an error, we can assemble experts and have them ask "why?" relentlessly. They can match the answers they obtain with other experiences.

One reason we can learn about errors is that so many situations are unexamined. We do not usually spend enough time asking "how can we prevent errors?" before any have occurred. That is, we are content to use inadequately designed systems. Many times we can find easy answers. We can learn to prevent reoccurrence simply because no one had thoroughly studied the system before. So, the retrospective question "why did this error occur?" while not as good as the prospective question "why would errors occur?" is better than never asking "why?" at all.

We face a subtle problem, however. There is a danger in asking "why?" only after an error occurs. We may focus on particular circumstances that are confounded with the occurrence of the error but need not be causally related. We can ignore similarities with other situations. We can focus on the particular circumstances of the error under study, especially if we feel that there must be something special about the error we observed. If the kind of error we study is rare, we are encouraged to think this way by the very fact that the error is rare.

W. E. Deming summarizes the issue this way:

"A fault in the interpretation of observations, seen everywhere, is to suppose that every event (defect, mistake, accident) is attributable to someone (usually the one nearest at hand), or is related to some special event. The fact is that most troubles with service and production lie in the system." (Deming 1986, p. 314.)

VI. Improving Safety: Management Options in the Context of Continuous Improvement

Myron Tribus has pointed out that you can get an important clue about the quality of an organization by studying how employees and managers treat safety. In judging an organization as a potential vendor, Tribus suggests that visitors look for adherence to safety procedures by all employees; this provides a sign of how well employees use standard methods. Visitors should also find out how employees are involved in studying and improving safety.

As Heinrich pointed out 50 years ago, conditions that promote quality and safety match. OSHA requirements only reinforce management's role in promoting a safe work environment.

For a company working to improve its operations continuously, managers and employees will have methods and tools to learn from accidents and other kinds of errors. They will be able to tackle the difficulties outlined in the preceding section.

First, managers will understand that there may be conflicting requirements in the investigation of accidents. Nonetheless, managers will work with their employees to design and use safer methods to get the work done. The work to prevent accidents will be on-going and not driven by particular accidents.

Second, as employees study conditions that lead to accidents, they will use simple plots and tables that will aid their analysis. Furthermore, employees will look for predictive indicators of accidents and injuries. These indicators will correspond to unsafe acts and unsafe conditions. Also, employees will know how to ask "why?" relentlessly; they will dig through several layers of causes. They will be able to describe their current theories about accidents in cause-and-effect diagrams or other displays. Such displays make visible the current best known theory about accidents in a work area. "Make it visible" holds for studies of safety as well as for studies of product quality.
The official accident records reported to OSHA ought to show improvement over time but these data, by themselves, are too late and too limited to provide a basis for improving safety within a company.

As managers and employees work on methods to eliminate or reduce accidents, they will be misled less often by circumstances confounded with particular accidents. They will be adept at using simple control charts to help them gauge how special particular errors or accidents are.

The approaches just outlined seem simple but most organizations are not yet at the stage where all managers and employees can use the methods and tools of continuous improvement. What are some steps in the right direction?

1. Begin to use a statistical approach to think about accidents. If you are investigating an accident, you can ask some basic questions:

   * Could this accident have involved people other than those involved?
   * Could this accident have happened yesterday instead of today? Tomorrow instead of today?
   * Could this accident have happened in a different place than where it occurred? [8]

   If most of your answers are "yes" then you have some evidence that you are better off thinking about this accident in terms of a set of causes common to the workplace rather than treating the accident as a unique case.

2. Recognize that employee safety programs that promote contests for no accidents are likely to do more harm than good. They may lead employees not to report minor injuries that could be signs of more serious trouble. To the extent that accidents are generated by a system of causes common to all employees, such contests are very much like lotteries. Such programs rarely promote direct work on conditions that would improve safety. They concentrate on improving the results (accidents) without work on methods to reduce unsafe acts and conditions.

3. Recognize that relying primarily on employee awareness to reduce accidents is a weak method.
We will have better results if we improve designs of our tools and methods so that dangers are eliminated or prevented at the source. This advice matches that of the National Safety Council in their Accident Prevention Manual for Industrial Operations. [9]

4. Help teams to incorporate safety into work methods as you involve employees in creating and maintaining standard work methods.

VII. Building a Case Study for In-house Training

To help managers and employees learn the ideas and methods of continuous improvement, in-house training in statistical methods and thinking can use accident data and the form of the current safety program in a case study. There are two aspects of a case study that can be developed:

1. An exercise that uses the injuries and illnesses reported to OSHA on OSHA form 200 over several years. You will need monthly figures and the total number of hours worked each month in order to standardize for employment level.

2. An exercise that studies the current approach to safety within the company. You will need memos and other documents that describe current safety programs.

Exercise 1.

This exercise emphasizes the use and limitations of the OSHA data. The intent of the exercise is to review these basic ideas and techniques:

a. operational definitions and the purpose of the data
b. statistical control
c. aggregation and stratification.

The instructor should understand the issues involved and draft reasonable answers to questions like those that follow. Individuals or small groups can study the questions and then present their conclusions to each other, guided by the instructor.

Operational definitions and data quality:

* What determines whether or not an injury gets reported?
* Do you think the official definitions of injuries and illnesses are equivalent to operational definitions?
* How could you determine if two supervisors agreed in practice on recording injuries?
* Are there circumstances that make you think the data do not reflect the actual incidence of reportable injuries?
* How could you use the official numbers in efforts to improve safety?
* What other data would you seek to improve safety within your organization?

Statistical control:

* What would be a suitable control chart for these data?
* Under what circumstances would a c-chart be appropriate?
* What about a u-chart?
* Is there any a priori reason to suppose that the underlying Poisson model is not appropriate?
* Construct a suitable control chart and examine it for signals of special causes of variation.

Aggregation and stratification:

* Under what conditions is it appropriate to study the aggregated figures?
* What are some consequences of looking only at aggregated figures?
* How could the overall accident rate for shop workers and office workers together show improvement yet the rates of shop workers and office workers separately show deterioration?

Exercise 2.

This exercise reviews the current approaches to safety. Again, the instructor should prepare appropriate answers and then help individuals and small groups review.

To what extent do the current approaches:

* Encourage managers and line employees to study the conditions that cause accidents and injuries?
* Differentiate between accidents and injuries that are the result of causes common to a group of employees and those that are the result of special causes?
* Involve managers and line employees in incorporating safety into current standard methods?
* Help managers and employees apply basic tools like control charts and cause-and-effect diagrams to reduce the possibility of accidents and injuries?
* Focus on methods to reduce accidents and injuries (other than employee awareness)?
* Indicate that the executives and managers who have approved the current approach understand special and common causes of variation and suitable methods to improve safety?

VIII. Conclusions

What responsibility do managers have for the errors generated by their company? Accidents that lead to employee injuries are just a particular kind of error. Yet the OSH Act of 1970 implies that managers have the legal responsibility to understand and prevent injuries. The methods and ideas of continuous improvement (as outlined by W.E. Deming, for example) provide managers with an effective way to fulfill these responsibilities.

Heinrich's insight that the conditions that promote safety match those that promote quality and productivity is as true today as in 1930. We can update his view by an understanding of special and common causes of variation. Then we are able to use accident prevention as a model for managerial actions for all kinds of errors. The exercises sketched in the last section serve as a basis for investigating other classes of errors.

____________________________________

IX. Endnotes

1. OSHA notice 2203, "Job Safety and Health Protection." 1985.

2. Miller (1982), 6.14.5.

3. Feller (1968) 153-164 discusses the two derivations of the Poisson formula in relatively non-mathematical language. He also presents several interesting applications.

4. Greenwood and Yule's investigation inspired research in Great Britain by the Industrial Health Research Board; four papers, published between 1926 and 1939, investigated psychological aspects that contributed to accidents. The term "accident proneness" first appeared in these reports. See Chambers and Yule (1941).

5. The technical details of their work are interesting. They derive a plausible statistical model, reasoning from data and mathematics. They use a Gamma distribution to represent the distribution of /\ and then integrate out /\ to obtain a distribution that has the form of the negative binomial distribution.
The negative binomial distribution also arises when one uses a certain model of contagion, developed by Polya; note that Greenwood and Yule's derivation rests on statistical independence while Polya's derivation explicitly incorporates dependence. See Feller (1971) 57-58.

6. It is possible to use techniques related to regression analysis to represent cause-and-effect knowledge. Such techniques could be used to study systems that generate accidents according to the Poisson formula. Cox (1970) and McCullagh and Nelder (1983) are basic references that describe the techniques known as logistic regression and log-linear models. McCullagh and Nelder give an analysis of ship accidents that uses several factors to account for the observed variation.

7. See Chapter 2 of Heinrich, Petersen and Roos (1980) for a survey of several classes of models in addition to variants of the domino theory invented by Heinrich.

8. I first heard this sequence of questions expressed by Dr. Gipsie Ranney.

9. "The basic measures for preventing accidental injury in order of effectiveness and preference are:
   i. eliminate the hazard from machine, method, material, or plant structure
   ii. control the hazard by enclosing or guarding it at its source
   iii. train personnel to be aware of hazard and follow safe job procedures to avoid it
   iv. prescribe personal protective equipment for personnel to shield them against the hazard."
   (cited by Miller (1982), 6.14.17). This recommendation matches modern approaches to product quality that emphasize designing in quality rather than relying on inspection ("awareness").

X. Bibliography

Chambers, E.G. and G.U. Yule. "Theory and observation in the investigation of accident causation." Supplement to the Journal of the Royal Statistical Society. 7: 89-101. 1941.

Cox, David R. Analysis of Binary Data. London: Chapman and Hall. 1970.

Deming, W. Edwards. Out of the Crisis. Cambridge: MIT Center for Advanced Engineering Studies. 1986.

Feller, William. Introduction to Probability Theory and Its Applications, Vol. 1. New York: John Wiley. 1968.

Feller, William. Introduction to Probability Theory and Its Applications, Vol. 2. New York: John Wiley. 1971.

Greenwood, M. and G.U. Yule. "An inquiry into the nature of frequency distributions." Journal of the Royal Statistical Society. 83: 255-279. 1920.

Heinrich, H.W., D. Petersen, N. Roos. Industrial Accident Prevention: A Safety Management Approach, 5th ed. New York: McGraw-Hill. 1980.

McCullagh, P. and J.A. Nelder. Generalized Linear Models. London: Chapman and Hall. 1983.

Miller, J.M. "The management of occupational safety" in The Handbook of Industrial Engineering, ed. G. Salvendy. Chapter 6.14. New York: John Wiley. 1982.

Tribus, Myron. "Judging the Quality of an Organization by Direct Observation." Typescript.

U.S. Department of Labor. A brief guide to recordkeeping requirements for occupational injuries and illnesses. 1986.

About the Author

Kevin Little is a consultant and statistician with Joiner Associates Incorporated, working with managers on applying statistical thinking in their jobs. He earned his M.S. and Ph.D. degrees in statistics from the University of Wisconsin-Madison.