Survival analysis is a topic that has captured the interest of many, especially since the turn of the millennium. This essay summarizes the survival function and the hazard function, along with various issues and concepts associated with efforts to estimate the probability of survival beyond some point in time. Many studies assume or confirm that survival time follows an exponential probability distribution. In contrast to linear functions between two variables, the probability of survival beyond a specific event will increase or decrease at an increasing or decreasing rate over time. One way to begin thinking about this topic is to consider that the probability of survival or living beyond age 50 decreases at an increasing rate over time. Analysts can then examine factors that relate to such a finding in their attempt to improve longevity and perhaps the quality of life after a certain age. As a thumbnail sketch of survival analysis and its models, this essay covers bank loan and vehicle replacement decisions as two simple applications of a topic that can be quite complex.
Keywords Exponential Probability Distribution; Hazard Function; Longevity; Survival Analysis; Survival Function; Survival Time
Actuarial Science: Survival Models
Overview
This essay covers many basics of survival analysis. It introduces some terminology and offers a few applications from the areas of gambling, finance, and engineering. From a historical viewpoint, survival analysis drew heavily from statistical concepts such as probability distributions, confidence intervals, estimation, and hypothesis testing. Some readers may recall these concepts from their coursework in statistics. Survival analysis can be a very complex topic even for those with a firm understanding of Bayesian and other statistical methods. In simple terms, survival analysis is a method for estimating the probability of survival beyond some specific point in time.
This essay gets into some specifics in the pages ahead, but readers should know at the outset that survival analysis usually entails the following procedures: establishing the baseline form of the hazard function using a sample set of data; describing the differences that distinguish surviving entities from non-surviving entities in that sample; generalizing, if appropriate, the results to a larger group; and examining, if appropriate, the effect of factors on survival probability. Initiation of these steps requires consideration of the underlying hazard form. Survival time may follow an exponential distribution, which implies a constant rate of failure, or another distribution under which the rate of failure increases or decreases over time. Studies also vary widely in their levels of sophistication with respect to the underlying form of the survival time distribution. For example, researchers at the outset of their study may simply form a convenient assumption about the temporal nature of that probability. Better yet, they may set a course toward confirming whether the rate of failure is constant, increasing, or decreasing over time. Before getting any deeper into the topic by introducing concepts such as survival and hazard functions, a need exists to pause and consider some scholarly advice.
The Martingale Concept
Oakes (2004) asserts that survival analysis is most understandable through the martingale concept. Consider an example from the subject of gambling. As most of us are aware, the first wager usually fails to produce the winnings we sought. There are a number of methods available for gamblers to cope with that initial failure. Some may look at it as a one-time expenditure of effort and currency, but others may take an extended view on the endeavor. In terms of the latter approach, the martingale is an attempt to recover recent losses by doubling the dollar amount of each successive wager.
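The doubling rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the even-money payout, and the reset to the base wager after a win are hypothetical choices that the essay does not specify.

```python
def martingale_wagers(outcomes, base_wager=1.0):
    """Return (wagers, net_result) for a doubling strategy.

    outcomes: sequence of booleans, True for a win and False for a loss.
    Assumption: an even-money game, so a win pays the amount wagered.
    Assumption: the wager returns to base_wager after any win.
    """
    wagers = []
    net = 0.0
    wager = base_wager
    for won in outcomes:
        wagers.append(wager)
        if won:
            net += wager
            wager = base_wager  # assumed reset after a win
        else:
            net -= wager
            wager *= 2          # double the next wager after a loss
    return wagers, net


# Three losses followed by one win.
wagers, net = martingale_wagers([False, False, False, True])
print(wagers)  # [1.0, 2.0, 4.0, 8.0]
print(net)     # 1.0
```

In this sketch a single win recovers the earlier losses plus one base wager, while an unbroken run of losses grows the required wager geometrically.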
Let us disaggregate that process by examining the various time segments represented in the case of this martingale. Limiting our focus to the initial wager, and accepting its occurrence without any data on its actual outcome, there are some key parts to the sequence: the period before that wager; the wager event; and the period after that wager. One approach using survival analysis could begin by marking the initial wager placement as a point in time or a date that separates the past from the future for purposes of the analysis. Let us suppose that a researcher decides to conduct a study of all gamblers and all their bets made over the course of one year since that initial event.
Notice the exclusion of the period prior to the wager event. In terms of its statistical property, whether gamblers accept it or not, the probability of winning or losing on the next wager is largely independent of the outcome seen from a previous wager. Note also that the endpoint for study has no significance other than it exists by virtue of a research design specification. One might imagine the reason for the one-year duration is an attempt by a researcher to replicate a prior study and/or follow the lead of other researchers. Whatever the reason, the researcher in this hypothetical project initiates the data collection phase. S/he begins to record all the bets made by a specific group of gamblers and their respective outcomes (win or lose) since the date of that original event in addition to data on other variables that suit study purposes.
A highly important variable is the one created to record whether a gambler continued or ceased to place wagers during the study period. It makes some sense to classify as survivors those gamblers who continued to place wagers and as non-survivors those who ceased to place them. Suppose for a moment that the researchers conducted an initial descriptive analysis to discern whether the survivors beat the odds for the game they were playing, though they had second thoughts about it shortly afterward.
In their first analytic pass, they estimated the proportion of bets that had favorable or unfavorable outcomes. In doing so, they found that 40 percent of the survivors' bets were winners. As one can imagine, they became very interested in the result, which suggested that the actual win rate was much better than the widely published payout ratio. Shortly afterwards, they came to realize a major flaw in their earlier reasoning. Specifically, it is inappropriate from a statistical or scientific perspective to draw certain conclusions from the aforementioned comparison. For starters, there is a major difference between a frequency distribution and a probability distribution: the former simply captures observations from one sample, whereas the latter reflects expectations taking a much larger number of samples into account.
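A small simulation may make the distinction concrete. It is a sketch under assumptions: the true win probability of 0.30, the sample sizes, and the function name are hypothetical and are not taken from any study mentioned in this essay.

```python
import random


def win_proportion(n_bets, p_win, rng):
    """Observed share of winning bets in one simulated sample."""
    wins = sum(1 for _ in range(n_bets) if rng.random() < p_win)
    return wins / n_bets


rng = random.Random(0)   # fixed seed so the run is repeatable
p_win = 0.30             # hypothetical true probability of winning a bet

# Five small samples: each gives its own observed frequency.
small_samples = [win_proportion(50, p_win, rng) for _ in range(5)]

# One very large sample: the observed frequency settles near p_win.
large_sample = win_proportion(200_000, p_win, rng)

print(small_samples)
print(large_sample)
```

The small samples scatter around the underlying probability, which is why a single observed proportion, such as the one the hypothetical researchers computed, does not by itself establish the true win rate.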
Perhaps those readers who completed a statistics course will recall the differences between a descriptive method of statistical analysis and an inferential method of statistical analysis. Without going into a great amount of detail here and now, the latter utilizes probability distributions and permits analysts to generalize the findings from a sample to the larger population. In other words, analysts infer something about the larger population from the smaller sample. Anyone who reads about survival analysis is likely to find studies that use one or both types of analysis.
The Survival Function
Survival analysis is applicable to many types of decisions and contexts. Usually, the analysis entails the use of dichotomous terms such as survival or failure, gain or loss, stay or leave, and so forth. Those features inform us that survival analysis involves classifying situations into two outcomes with a primary emphasis on the group of survivors. Taking a simple approach to a complex topic, this essay draws on examples such as motor vehicle replacements (Chen & Lin, 2006) and loan application decisions (Morrison, 2004). The reader will soon recognize the value of its applications to inanimate objects and to living beings.
In terms of human or animal life, the survival function expresses the probability of the actual time of death occurring beyond some expected or specific point in time. As the name of the function implies, the group of interest is the survivors or those who are living longer than expected or after some specific date. With the availability of two sets of data and the application of statistical analysis procedures to them, a profile will eventually emerge that will allow analysts to compare and contrast survivors and non-survivors on a set of defining characteristics or features. For example, researchers may be vigilant in their attempts to delineate how an independent time-related variable such as birth year influences a dependent variable such as the probability of living past a presumed age of 110.
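As a rough illustration of the survival function, the sketch below estimates the probability of living beyond a given age as the share of observed lifetimes that exceed it. The data and the function name are hypothetical, and the sketch assumes every lifetime is fully observed, which real survival studies often cannot assume.

```python
def empirical_survival(lifetimes, t):
    """Share of observed lifetimes strictly greater than t.

    Assumption: every lifetime is fully observed (no censoring).
    """
    if not lifetimes:
        raise ValueError("lifetimes must not be empty")
    return sum(1 for x in lifetimes if x > t) / len(lifetimes)


# Hypothetical ages at death for ten individuals.
ages_at_death = [62, 71, 75, 80, 84, 88, 91, 95, 102, 111]

for age in (70, 90, 110):
    print(age, empirical_survival(ages_at_death, age))
# 70 0.9
# 90 0.4
# 110 0.1
```

The estimated probability can only stay level or fall as the age threshold rises, which is the defining shape of a survival function.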
If and when they review the literature and studies related to survival analysis, readers will find references to those types of variables and a number of statistical techniques. These references warrant brief mention here. Regression analysis is a statistical procedure for analyzing the nature of a relationship between an independent variable and a dependent variable. In most instances, the dependent variable is a consequence of the independent variable; for example, income is partially determined by education level. Data on these variables are available in many forms such as continuous (which typically covers all the possible numeric values) and non-continuous.
Data Management & Analysis
Survival analysis entails the applications of...
(The entire section is 4147 words.)