Saturday, November 20, 2010
Math and Computer Science
Ecology, Evolution, and Behavior
It has been a great privilege to spend time in all these departments, and we are tremendously grateful to all participants for the time they spent talking to us about risk, uncertainty and critical thresholds. I have learned a tremendous amount.
Last Wednesday, the RURS team had its second-to-last episode, visiting the statistics department on East Campus.
Participants expressed a strong orientation towards clients – clients provide the meaning, the thresholds relevant to the analysis. As a group, statisticians value their neutrality. They see their role as making sense out of data, to provide clarity. One participant commented that “[t]he career of a statistician exists because of uncertainty.” However, participants were more comfortable describing random processes as variability rather than risk or uncertainty, and their role as quantifying variability. Participants also talked about quantified errors in two different ways: as the Type I error rate, and as the False Discovery Rate.
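To make the difference between those two error measures concrete, here is a minimal sketch in Python. It was not presented by participants; the function names and the counts are hypothetical and chosen only for illustration. The Type I error rate is computed relative to the true null hypotheses, while the false discovery proportion (whose expected value is the False Discovery Rate) is computed relative to the rejections actually made.

```python
def type_i_error_rate(false_rejections, true_nulls):
    """Share of the true null hypotheses that were wrongly rejected."""
    return false_rejections / true_nulls


def false_discovery_proportion(false_rejections, total_rejections):
    """Share of all rejections ("discoveries") that were wrong.

    The False Discovery Rate is the expected value of this proportion.
    """
    if total_rejections == 0:
        return 0.0  # no discoveries, so no false discoveries
    return false_rejections / total_rejections


# Hypothetical study: 1000 tests, 900 of which have a true null hypothesis.
# 45 true nulls are rejected by mistake and 80 real effects are detected.
false_rejections = 45
total_rejections = 45 + 80

per_test_rate = type_i_error_rate(false_rejections, 900)
discovery_error = false_discovery_proportion(false_rejections, total_rejections)
print(per_test_rate, discovery_error)
```

With these made-up counts the same 45 mistakes look small against the 900 true nulls but make up a much larger share of the 125 discoveries, which is why the two quantities answer different client questions.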
Risk is the expected value of a loss function. Risk is the probability of a Type I error, but “[I] would never use risk in a paper”. Risk is the probability of an adverse outcome precipitated by your actions. Risk is a probability but doesn’t have to be negative. Students in introductory statistics learn about “relative risk” – of two options both with some risk, which is the riskier risk?
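Two of these definitions, risk as the expected value of a loss function and "relative risk", are easy to write down. The following is a minimal sketch in Python, assuming a discrete set of outcomes; the function names and all numbers are hypothetical, not anything the participants supplied.

```python
def expected_loss(outcomes):
    """Risk as the expected value of a loss function.

    `outcomes` is a list of (probability, loss) pairs whose
    probabilities must sum to 1.
    """
    total_p = sum(p for p, _ in outcomes)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * loss for p, loss in outcomes)


def relative_risk(events_a, n_a, events_b, n_b):
    """Ratio of the event probability in group A to that in group B."""
    return (events_a / n_a) / (events_b / n_b)


# Hypothetical numbers, chosen only for illustration.
risk = expected_loss([(0.9, 0.0), (0.1, 50.0)])  # 10% chance of losing 50
rr = relative_risk(30, 100, 10, 100)             # 30% vs. 10% event rate
print(risk, rr)
```

A relative risk above 1 says option A is the "riskier risk"; note that the expected-loss definition needs a loss for every outcome, whereas relative risk only compares event probabilities.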
One participant said that all of statistics is about reducing uncertainty, while another characterized it as assigning variation to different sources. Statistics is all about quantifying uncertainty, accepting it for what it is. Another participant described introductory statistics students as uncomfortable with uncertainty, while she is comfortable with uncertainty, with the rules not being clear. Certainty decreases with increasing familiarity with the discipline of statistics, as in life.
Another participant described two types of uncertainties – one type is driven by stochastic processes, and can be quantified with a probability distribution. In contrast, uncertainty about which model is appropriate is not quantifiable with a probability distribution.
An example of the stochastic-process type of uncertainty: one has to use the parentage and genetic makeup of a bull to predict the milk production of his daughters; this is not perfect, because each mating of a bull with a cow produces offspring that differ from one another. The uncertainty matters, because for each bull there are two types of risk: keeping the bull when you shouldn’t, costing money unnecessarily, or castrating the bull and losing access to that genetic potential.
Critical thresholds were expressed as the degree of confidence one has in a result – is it good enough to take action on? And this may vary between individuals – “It is different when talking about your surgery than my surgery.” The key piece is that critical thresholds reflect individuals’ tolerance for risk and uncertainty. Statisticians need to extract these “thresholds” in the form of effect sizes in order to provide advice on sample sizes and experimental design. In the absence of thresholds, there is always the economic limit – “How many reps can you afford?”
An example offered was estimating variation in teacher performance; these point estimates come with an estimate of uncertainty as an interval. Communicating that interval to decision-makers is difficult, and dealing with the population of teachers as a whole is different than making a personal decision about which teacher you want for your child.
Like the political scientists, the statisticians did not discuss car accidents.
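The link between an effect-size "threshold" and a sample size can be sketched with the textbook normal-approximation formula for comparing two group means. This is only an illustration under stated assumptions (two-sided test, equal group sizes, a known common standard deviation); the function name, the default error rates, and the example numbers are hypothetical.

```python
import math
from statistics import NormalDist


def reps_per_group(effect_size, sd, alpha=0.05, power=0.80):
    """Approximate replicates per group for a two-sided, two-sample z-test.

    effect_size: smallest difference in means worth acting on (the "threshold")
    sd: assumed common standard deviation of the response
    alpha: accepted Type I error rate
    power: desired probability of detecting a difference of `effect_size`
    """
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)
    z_power = standard_normal.inv_cdf(power)
    n = 2 * ((z_alpha + z_power) * sd / effect_size) ** 2
    return math.ceil(n)


# Hypothetical client: a difference of 5 units matters, and the sd is about 10.
print(reps_per_group(effect_size=5, sd=10))
# Halving the difference that matters roughly quadruples the replication needed.
print(reps_per_group(effect_size=2.5, sd=10))
```

Without an effect size from the client the formula has nothing to work with, which is exactly when the economic limit takes over.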
Friday, November 19, 2010
One thing that was clear was that risk and uncertainty are concepts about future outcomes. Participants quickly made the distinction between risk and uncertainty with risk being probabilistic, while uncertainty describes a situation without calculable probabilities. Later, this distinction was broken down to some extent when a participant pointed out that uncertainty could be quantified with subjective probability. Another participant added that using subjective probability for uncertainty is a matter of analytical tractability - it makes the mathematical analysis possible.
Economics focuses on the study of marginal conditions, so economists are always dealing with risk and uncertainty. One participant suggested that critical thresholds arise when these marginal approaches fail because of non-differentiability in the functions. Another threshold arises because of one's tolerance for risk; a determinant of what makes a threshold of that type critical is the stakes involved.
Another aspect of critical thresholds that generated some discussion was the idea that they are irreversible in some sense. Some actions permanently close off alternatives, and in neo-classical economics this can be captured with option value approaches. This can put a value on acquiring improved information, if by waiting better information will become available enabling a better choice to be made.
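A toy version of that option-value argument can be written in a few lines of Python. This is a sketch under strong simplifying assumptions, not the participants' model: there are only two possible states of the world, the alternative to committing pays zero, waiting reveals the state perfectly, and the function names, payoffs and discount factor are all hypothetical.

```python
def value_act_now(p_good, payoff_good, payoff_bad):
    """Best expected payoff when the irreversible choice is made immediately."""
    expected_if_commit = p_good * payoff_good + (1 - p_good) * payoff_bad
    return max(expected_if_commit, 0.0)  # 0.0 is the payoff of never committing


def value_wait(p_good, payoff_good, payoff_bad, discount=1.0):
    """Expected payoff when the state is revealed before the choice is made.

    `discount` (between 0 and 1) stands in for the cost of delay.
    """
    best_if_good = max(payoff_good, 0.0)
    best_if_bad = max(payoff_bad, 0.0)
    return discount * (p_good * best_if_good + (1 - p_good) * best_if_bad)


def option_value(p_good, payoff_good, payoff_bad, discount=1.0):
    """What keeping the decision open is worth relative to acting now."""
    wait = value_wait(p_good, payoff_good, payoff_bad, discount)
    now = value_act_now(p_good, payoff_good, payoff_bad)
    return wait - now


# Hypothetical project: equal odds of gaining 100 or losing 80; delay costs 10%.
print(option_value(0.5, 100.0, -80.0, discount=0.9))
```

When the option value is positive, waiting for the better information is worth more than committing today; when delay is costly enough, or the bad state is not very bad, it turns negative and acting now wins.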
There was broad agreement that human beings individually are not behaving rationally - they may not have consistent risk preferences, or be able to use probability information to make decisions. For example, many highly educated and well paid people buy "product protection plans" from retailers for items that they can easily afford to replace.
Another idea was that risk can have a utility - entrepreneurs are those who can exploit that. The irony of tenured faculty making this observation was not lost on the participants.
Institutions, meaning a set of values, norms, ethics and traditions, evolve to deal with uncertainty. One example is the creation of water districts to deal with variability in water availability for farming.
One assertion of economics is that markets work better in the presence of information. However, information is not free. One participant commented that she is unwilling to drive across town to check the price of milk. In addition, one participant noted that the order in which information is presented, the way it is presented, and the amount of information all affect the decisions that economic agents reach.
Complexity of information was another theme - a farmer faces five different areas of risk, and each has a variety of instruments to deal with it. Making the best joint decision is far too complex, so individual farmers make the decisions independently. One participant suggested that a way to deal with such complexity is to manage very conservatively; farmers who can keep debt to asset ratios low have lower consequences of making an incorrect decision.
Another good example of how individual outcomes can vary from population level predictions is the observation that a farmer's career consists of 30-40 crops. An optimal decision based on theoretical probability distributions may never produce the right decision for a single farmer.
And finally, economists did use the risk of driving as an example of a bad outcome, noting in particular that people are more willing to take risks when they have a greater sense of control (driving vs. cancer).
Wednesday, November 17, 2010
Saturday, November 13, 2010
This week the RURS team visited Political Science – Sarah Michael’s home turf. The conversation was very deliberate, with participants expressing strong theoretical frameworks for both risk and uncertainty. In contrast, the idea of critical thresholds took a bit of getting used to – early on participants frequently expressed the idea that critical thresholds were irrelevant to Political Science as a discipline.
Participants quickly expressed a definition of risk as predictable – an objective probability you can calculate. In contrast, uncertainty is unpredictable – you can’t put a probability on it. This conceptual divide was connected to Frank Knight’s distinction between risk and uncertainty. You can’t put a number on Knightian uncertainty; or you might put a number on it, but it would be purely subjective.
Some participants expanded on the definition of risk by adding the notion of an adverse outcome, but this was rejected as a core part of the definition of risk. Notwithstanding this rejection, participants regularly conflated risk with danger throughout the session. A similar addition to the definition of uncertainty to include only events of large magnitude was largely accepted. This idea of uncertainty was connected to Nassim Nicholas Taleb’s “Black Swans”.
The distinction between risk and uncertainty was expressed in a couple of additional ways. First, in a dynamic system, risk is white noise endogenous to the system, while uncertainty is an exogenous shock from outside the system. Second, while risk is a predicted probability of an event, participants indicated that there could be uncertainty about the predicted value. In a similar vein, while the probability of an event is risk, the exact outcome in any single case is uncertain.
This distinction between risk as a population concept and uncertainty about a particular outcome for an individual has practical applications. For example, government intervention to discourage unwanted behavior (e.g. corruption) can be effective if it disconnects population-level risk from uncertainty to the individual (e.g. by distorting their understanding of the risk of getting caught).
Another conceptualization of uncertainty was as ambiguity. The example offered was in political identity research – some people are more tolerant of fuzzy boundaries (i.e. ambiguity about identity) between groups than others.
Participants brought up Prospect theory – the idea that if there’s a potential loss people are more risk-averse than if there’s a potential gain. This notion of how people perceive risk came up later when participants pointed out that objective risk was usually different from perceived risk, and people act on perceived risk. The strongest statement of this idea: “Objective risk is functionally meaningless to people.” Participants extended this idea, acknowledging that people are not calculating their expected utility when they make decisions. As a result, one participant concluded that people live in a world dominated by uncertainty – a conclusion that was soundly rejected by the group. People do weigh outcomes when making decisions; for example, when you’re investing in the stock market you’re managing risk and hoping for uncertainty. The observation that people who play the lottery are not calculating expected payoffs, and tend to be poor was offered as a counter example; the counter-counter example was blackjack, which is played by wealthier people, although it still has a negative expected outcome. A key difference between lottery and blackjack is the number of decision points, which allows skill to play more of a role in blackjack.
One participant brought up an idea from Doug Norris, that specialization creates economic growth, but non-specialization is insurance against uncertainty. Thus, in order to create a functioning capitalist society, the government has to control uncertainty to the point where people are willing to make calculations that taking an action with unknown outcomes will be in their best interests – i.e. that they can calculate a risk.
This notion that risk and uncertainty are two broad domains in which individuals and institutions can find themselves led to some discussion of critical thresholds. When uncertainty dominates risk (or vice versa) institutions change their behavior. The boundary where one domain dominates can be a critical threshold. For example, in the “chicken game”, you want to move your opponent from the domain of risk into the domain of uncertainty with respect to what you will do. One participant offered a concrete example: “… Kim Jong-il isn’t crazy, he’s just trying to make us uncertain.”
After expressing this initial idea of a critical threshold, several other examples were offered. In the context of a project to predict state failure, there was a threshold above which a state was predicted to fail. This threshold was set so as to equalize the number of Type I and Type II errors. A more theoretical example was the threshold of achieving equilibrium, or of the boundary between two equilibrium points in a dynamic system. In the context of foreign policy, national leaders can be either risk averse or risk accepting, and they may switch in response to changing information. In the context of “norms diffusion”, participants identified “tipping points”, such as when enough nations ratify a treaty, then abruptly all nations will. Similarly, in the context of political agenda setting, individuals may ignore an issue until the frequency of information passes a threshold that causes them to pay attention. Another term that came up in this context was “critical junctures”: a point where you can’t undo your action, such as the U.S. invasion of Iraq.
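The state-failure threshold is the most mechanical of these examples, so here is a minimal Python sketch of how such a cut-off might be chosen. It is an illustration only, not the project's actual method: it assumes each state has a numeric score, that failure is predicted when the score is at or above the threshold, and that a Type I error means predicting a failure that does not happen. The function names and the data are hypothetical.

```python
def error_counts(scores, failed, threshold):
    """Return (Type I, Type II) error counts for a given threshold.

    Type I: failure predicted (score >= threshold) but the state did not fail.
    Type II: no failure predicted (score < threshold) but the state failed.
    """
    type_i = sum(1 for s, f in zip(scores, failed) if s >= threshold and not f)
    type_ii = sum(1 for s, f in zip(scores, failed) if s < threshold and f)
    return type_i, type_ii


def balanced_threshold(scores, failed):
    """Pick the observed score at which the two error counts are closest."""

    def imbalance(threshold):
        type_i, type_ii = error_counts(scores, failed, threshold)
        return abs(type_i - type_ii)

    # Ties go to the lowest candidate because min() keeps the first minimum.
    return min(sorted(set(scores)), key=imbalance)


# Hypothetical scores and outcomes for nine states.
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
failed = [False, False, False, True, False, True, False, True, True]

cutoff = balanced_threshold(scores, failed)
print(cutoff, error_counts(scores, failed, cutoff))
```

Equalizing the counts is itself a choice about stakes: it treats a false alarm and a missed failure as equally costly, which need not be the case.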
“…once you break it, you are going to own it.” – Colin Powell
Somewhat refreshingly, nobody mentioned car accidents.
As a doctor, Elliott focuses on individual patients, whereas, as a statistician, I've been trained to focus on the goal of accurately estimating treatment effects. It would seem that this idea has some legs, if not much in the way of actual work devoted to it.
Thursday, November 11, 2010
Well, according to Wikipedia, an experiment is "...the step in the scientific method that arbitrates between competing models or hypotheses." (Aside: it is interesting that there is a distinction made between model and hypothesis - for another time perhaps.) OK, I can't see anything wrong with that, but we need to dig a little bit deeper. There are two additional attributes that are important for distinguishing between methods of arbitrating among hypotheses: the number of simultaneous experimental treatments and the amount of replication within treatments.
The number of simultaneous experimental treatments is fairly obvious - how many different manipulations of the system under study are in use? This could range from one (an observational study of existing conditions) to many (a laboratory study with positive and negative control treatments and a dose response). Is the term "experiment" appropriate across this entire range?
The second attribute is the amount of replication within a treatment - in how many different places and times was the effect of the treatment observed? This too can range from one to many.
I believe that the term experiment is appropriate when the number of simultaneous experimental treatments is greater than one, regardless of how much replication is present. Replication does matter, but it doesn't affect the ability of the experimenter to determine causation. Rather it affects the scope of the causation - with only one replicate per treatment it is not possible to generalize beyond the set of objects studied. Within that set it is still possible to determine if a hypothesis is consistent with the data, and attribute the differences between treatment responses to the manipulation.
This attribution of causation is the reason for the treatments to be "simultaneous", because this reduces the extent of unmeasured differences between the observational units. Simultaneous has the usual temporal meaning, but also carries a certain spatial component. Clearly, two patches of grassland on different continents are unlikely to serve as reasonable replicates of each other - there are simply too many things changing. However, two grassland patches in the same ecoregion may well work for comparing different burning practices.
In adaptive management (AM), the idea of an experiment is to use the management action itself to create the experimental treatments, and in that case the desire to determine causation beyond the current set of objects (e.g. management sites) is less important than figuring out which management actions work the best. An experiment will figure that distinction out more quickly than applying treatments sequentially to a single object, because the simultaneity of the treatments helps to reduce the number of alternative explanations.
While it is true that society often conducts large scale manipulations of ecosystems without simultaneous alternative treatments, I do not believe it is helpful to describe these manipulations as experiments. If we do, then everything is an experiment, and the word ceases to have any value, much like the word sustainability or indeed, adaptive management. An experiment with simultaneous treatments is not the only way to distinguish between competing hypotheses, but it is a very good way when it is possible.
Thursday, November 4, 2010
The RURS team hit the psychology department today, and a stimulating discussion was generated by the 18 participants, a mix of students and faculty. One interesting feature of this group was the presence of participants with professional responsibility – clinical psychologists – and this group had some very interesting things to say about risk.
Participants, both clinicians and others, inevitably discussed risk in the context of making a decision, such as whether to admit a client to hospital or send them home. A second kind of paradigmatic decision involved visual detection tasks, such as examining a radar screen or x-ray machine looking for bombs in luggage. In all cases they clearly identified two kinds of errors, false positives and false negatives, and associated different possible negative outcomes with each error. Participants also discussed critical thresholds in this decision making context: when an indicator of a risky behavior moves past a critical threshold, then a clinical action will be taken.
Exactly how such critical thresholds arise was a topic of some discussion. Data are often continuous, but for convenience are broken into categories for display and analysis. In some cases these arbitrary breaks become decision thresholds by default.
There was much discussion of “implicit cognition inaccessible to verbal processing”; decision making where people have trouble articulating how their decisions are reached. The opposite type of decision making involves consciously analyzing a set of steps to reach a conclusion. This distinction was reflected in discussion about the utility of information from the population scale versus the single individual scale.
Participants distinguished between population scale trends in the likelihood of an adverse outcome, estimated from actuarial data on large populations, versus the clinical setting where a single individual is being treated. Regardless of population level risk factors that are present in that patient, there is uncertainty about the particular outcome for that patient. For example, the best population level predictor of immediate suicide risk is a previous suicide attempt. However, that particular information does not rule out either outcome for the patient right now. Thus clinicians rely more on qualitative heuristics, i.e. implicit decision making, including the situational context for a patient. The context of the patient is an important source of uncertainty, because many aspects of that context are unknown. This notion that the exact future outcome is not known was the dominant definition of uncertainty for this group.
This contrast between the population level and the individual level was expanded on in a discussion of how risk factors are used in diagnosis – “Not all risk factors are created equal.” For example, diagnosis of Attention Deficit Hyperactivity Disorder (ADHD) is done by examining a list of “neurological soft signs” that vary in their predictive ability at the population level. Simply adding up the number of these signs that are present and using that as the heuristic guide leads to over-prediction of ADHD, whereas only using a single strong predictor would lead to under-prediction of ADHD.
Participants described risk as two dimensional – the likelihood of an event and the magnitude of the adverse outcome. They also mentioned that adverse outcomes are difficult to define quantitatively, which creates uncertainty. A participant offered an anecdote illustrating these two dimensions – the tornado and the trick-or-treaters. A trained storm spotter was dispatched to examine a cloud on Halloween – a time of year at which tornadoes are rare. On arrival, the spotter noted “… a rotating wall cloud …”, an indicator that a tornado was possible, although it appeared weak. However, there was a nearby town with many trick-or-treaters out on the streets, so even though the likelihood of an event was low, the potential adverse outcome for even a small storm was great. The decision was made to activate the tornado alarms in the town.
Another repeated point was that the consequences of errors are a shift in critical thresholds, affecting the sensitivity and specificity of decision makers. For example, viewing a radar or x-ray screen for a long time reduces sensitivity, increasing false negative decisions. This can be mitigated by changing personnel regularly. A second example involved what happens during the training of security screeners. After a screener misses a simulated bomb, i.e. makes a false negative decision, their rate of false positive decisions increases – they overcompensate. This occurs even though the actual adverse consequence is very small. A poignant additional example with a larger adverse outcome was offered by a clinician – “You never forget your first suicide.”
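The overcompensation effect can be illustrated by treating the screener's decision rule as a threshold on some internal "suspicion" score. This is a minimal Python sketch under that assumption; the function name, the scores and the two thresholds are hypothetical and were not part of the discussion.

```python
def sensitivity_specificity(scores, has_target, threshold):
    """Sensitivity and specificity when an alarm is raised for score >= threshold.

    Sensitivity: share of real targets that trigger an alarm.
    Specificity: share of harmless items that do not trigger an alarm.
    """
    pairs = list(zip(scores, has_target))
    true_pos = sum(1 for s, y in pairs if y and s >= threshold)
    false_neg = sum(1 for s, y in pairs if y and s < threshold)
    true_neg = sum(1 for s, y in pairs if not y and s < threshold)
    false_pos = sum(1 for s, y in pairs if not y and s >= threshold)
    return true_pos / (true_pos + false_neg), true_neg / (true_neg + false_pos)


# Hypothetical suspicion scores for eight bags, four of which hold a target.
scores = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
has_target = [False, False, False, True, False, True, True, True]

# Before the miss: a fairly strict threshold lets one target through.
print(sensitivity_specificity(scores, has_target, threshold=0.65))
# After the miss: the screener lowers the threshold and false alarms go up.
print(sensitivity_specificity(scores, has_target, threshold=0.35))
```

Lowering the threshold buys sensitivity at the cost of specificity, which is the trade the screeners appear to make after a miss.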
Participants also made a distinction between risk to an individual patient, to the clinician, and to third parties. Suzie Q may be suicidal, which creates a risk to her, but if she is also homicidal then this creates a risk to third parties. This third party risk creates additional uncertainty, because who the third parties are is unknown to the clinician.
Risk to the clinician arises primarily from accountability – if the client injures themselves or someone else, is the clinician legally responsible? The concrete example offered involved a clinician treating a couple, and during the treatment it becomes clear to the clinician that domestic violence is an issue. The clinician is not legally obligated to report the domestic violence, and so is not accountable. However, if there is a child in the home, then there is a legal obligation to report the possibility of child abuse, creating a risk to the clinician if the potential is not reported.
Participants identified an additional trade-off between resource need and availability in the face of uncertainty – for example there are not enough hospital beds for everyone who meets a given level of homicidal tendencies. This was one area where participants agreed that population level data had a role to play, in figuring out whether resources allocated to particular needs were sufficient.
Some participants had studied the anterior cingulate cortex, and found that pretty important and fascinating – although they didn’t expand on it. (That was one of those times when one is abruptly reminded that interdisciplinary work is hard and takes time!) Participants also raised the observation that the ability to perceive and act on risk is something that develops over time – teenagers are particularly bad at it – and in addition that studies show this ability to be variable among people, and genetically heritable.
Participants identified some additional sources of uncertainty arising from data. Measurement uncertainty arises because psychological instruments don’t measure underlying constructs exactly. Alternatively, relevant information on a risk factor or on a client’s context may be missing, creating additional uncertainty. In a slightly different context, there is a desire to be able to eliminate human judgment from risky decisions, for example by using functional brain imaging to detect if someone is being deceptive. This could create a false sense of security, which would be unjustified because of the inability of the instrumentation to attribute a given response to a particular cause in the subject – it is difficult to operationalize the assessment of risk.