Detectability Maximization

Detectability refers to the ability to correctly identify events that are not due to chance, regardless of how dramatic or subtle they are. An emphasis on the detectability or power of a test (the avoidance of misses) generates the opposite pressure to that of reliability (the avoidance of false alarms). The more extreme the criterion for judging that an event is really unlikely by chance (e.g., only if you hear roaring and trees splintering), the less likely you are to mistake chance for a real effect (i.e., make a false alarm) and the more likely the event is to occur again given the same causes. In other words, the more reliable the event will be. Unfortunately, it also makes it more likely that you will overlook subtle but nonetheless real effects (e.g., not running away when a bear is simply walking up to the camp quietly, which is a miss). If you decide to be extremely conservative in deciding when a discovery is present, it is likely that you will miss some that really are there but that are only subtle. Your decision mechanism will have poor detectability.

It can be seen, therefore, that several things can be done to maximize your ability to realize that you have a research or therapeutic discovery when you actually have one, that is, to increase the detectability of your procedures. You may wish to review the logic underlying the declaration of a true effect in Chapter 4 IV. B. 1. b.

The stringency of the criterion for claiming an effect can be decreased (from D to C in the example). In this case, you would then be more likely to claim an effect when there actually was one (fewer misses). Unfortunately, however, you are also more likely to claim an effect when there is no real effect (more false alarms).
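The trade-off can be made concrete with a small sketch. It assumes, purely for illustration, that chance noises and real effects are both normally distributed. The means and spreads, the names NOISE, SIGNAL, and error_rates, and the numeric values assigned to criteria C and D are all hypothetical and are not taken from the example figure.

```python
from statistics import NormalDist

# Hypothetical distributions (illustrative values, not taken from the text):
# loudness of chance noises (wind) and loudness of real events (bear).
NOISE = NormalDist(mu=0.0, sigma=1.0)
SIGNAL = NormalDist(mu=2.0, sigma=1.0)


def error_rates(criterion):
    """Return (false_alarm_rate, miss_rate) for a decision criterion.

    An effect is claimed whenever an observation exceeds the criterion.
    """
    false_alarm = 1.0 - NOISE.cdf(criterion)  # chance noise above the criterion
    miss = SIGNAL.cdf(criterion)              # real signal below the criterion
    return false_alarm, miss


# A stringent criterion (labeled D) and a relaxed one (labeled C);
# the numeric positions are assumptions chosen for the illustration.
for label, criterion in (("D", 2.0), ("C", 1.0)):
    fa, miss = error_rates(criterion)
    print(f"criterion {label} = {criterion}: false alarms {fa:.3f}, misses {miss:.3f}")
```

Under these assumptions, moving the criterion from D to C lowers the miss rate and raises the false alarm rate at the same time.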

This is not an entirely desirable approach, but it does temper the tendency to make extremely stringent reliability requirements. In essence you can find more cures (fewer misses) by claiming that you found a cure for schizophrenia with less substantial results. But we would like to decrease both false alarms and misses.

Experimenter finesse is the term for your ability to devise an experiment which maximally illustrates the effect. This means taking advantage of the special characteristics of a particularly apt subject; using an apparatus which maximally facilitates the effect; following a procedure which maximally reveals the effect; choosing independent variables which produce strong effects; and choosing dependent variables which are very sensitive to the behavior. To exercise greater experimenter finesse in the camping example, string tin cans around the camp. Anything entering will make louder sounds. The signal (bear) will make much more noise than the wind in the trees.

Index the dependent variable over a wider time period. For example, whereas it may be unlikely that a single noise of a particular amplitude is indicative of danger, it is quite a different case if noises of that amplitude occur in a systematic pattern, like footsteps. In that case you should run. You would be able to detect a very quiet bear by sampling information over a wider time period.
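One simple way to see why a wider sampling window helps is to average several readings before deciding. The sketch below is only an illustration under stated assumptions: readings are independent and normally distributed, the decision is based on their mean, and the false alarm rate is held fixed. The function name miss_rate and every number in it are hypothetical. Averaging is used here as a stand-in for detecting a systematic pattern such as footsteps.

```python
from math import sqrt
from statistics import NormalDist


def miss_rate(n_samples, signal_mean=0.5, noise_sd=1.0, false_alarm_rate=0.05):
    """Miss rate when the decision rests on the mean of n_samples readings.

    Assumes independent, normally distributed readings (hypothetical values).
    The criterion is reset for each n so the false alarm rate stays fixed.
    """
    sd_of_mean = noise_sd / sqrt(n_samples)   # averaging shrinks the spread
    noise = NormalDist(0.0, sd_of_mean)       # mean reading when only wind is present
    signal = NormalDist(signal_mean, sd_of_mean)  # mean reading when a bear is present
    criterion = noise.inv_cdf(1.0 - false_alarm_rate)
    return signal.cdf(criterion)              # bear present but mean below criterion


for n in (1, 10, 100):
    print(n, round(miss_rate(n), 3))
```

With these assumed values, a signal that is missed most of the time on a single reading is almost never missed once a hundred readings are pooled.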

The following figure illustrates the effect of increasing the signal (bear) to noise (wind) ratio by increasing the magnitude of the signal (bear), for example by stringing cans around your camp. If a bear is around, the quietest noise it can make is the sound of tin cans. If a bear makes a noise at all, it tends to be very loud (amplify or multiply all scores in the bear noise distribution by ten). Where you were initially only able to detect signals of size “thirty” or greater, you can now detect signals of size three with the same number of false alarms, because every bear sound is multiplied by ten. What this does geometrically is to spread out the bear noises. Each point on the “bear curve” is moved away from zero by a factor of ten. A noise that had been above the 3 on the x axis is moved to the right to 30. Because the same scores are now spread over a wider range, the curve is lower at every point, which gives the appearance of shifting the bear curve down.
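The arithmetic of the tin can example can be sketched as follows. The criterion of 30 and the gain of ten come from the example above; the shape of the bear noise distribution (a normal distribution with the mean and spread shown) and the names detected and miss_rate are assumptions made only for illustration.

```python
from statistics import NormalDist

CRITERION = 30.0  # loudness required before you decide "bear" (from the example)
GAIN = 10.0       # tin cans multiply every bear noise by ten (from the example)

# Hypothetical distribution of unamplified bear noises.
BEAR = NormalDist(mu=10.0, sigma=5.0)


def detected(bear_noise, gain=1.0, criterion=CRITERION):
    """True if a bear noise, after amplification, reaches the criterion."""
    return bear_noise * gain >= criterion


def miss_rate(gain):
    """Proportion of bear noises that stay below the unchanged criterion."""
    amplified = BEAR * gain  # scales both the mean and the spread
    return amplified.cdf(CRITERION)


print(detected(3.0), detected(3.0, gain=GAIN))
print(round(miss_rate(1.0), 4), round(miss_rate(GAIN), 4))
```

The criterion, and therefore the false alarm rate produced by the wind, is untouched; only the bear noises are moved, so fewer of them fall below the criterion.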

Exert tighter technological control to reduce noise levels. In the camping story, flatten the woods and pave them over. You will then be able to easily hear a bear roar. Run pigeons in sealed, unchanging chambers with few distractions. The variations in behavior caused by chance or irrelevant factors would then drop to a very low level, and even very small treatment effects (signals) would be made obvious. If you wish to hear a quiet doorbell at a party, tell the people making noise to be quieter.

The following figure illustrates the effect of increasing the signal (bear) to noise (wind) ratio by decreasing the magnitude of the noise (wind). Very few samples of ten consecutive chance noises get very loud (divide all the scores in the noise distribution by some value). What this does geometrically is to squeeze in the noise distribution.
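Squeezing in the noise distribution can be sketched the same way. The example assumes normally distributed wind and bear noises with hypothetical means and spreads, and a fixed false alarm rate; the names NOISE, SIGNAL, and rates_after_noise_reduction are illustrative only.

```python
from statistics import NormalDist

# Hypothetical distributions (illustrative values, not taken from the text).
NOISE = NormalDist(mu=0.0, sigma=4.0)   # wind
SIGNAL = NormalDist(mu=5.0, sigma=1.0)  # bear


def rates_after_noise_reduction(divisor, false_alarm_rate=0.05):
    """Return (criterion, miss_rate) after dividing all noise scores by divisor.

    The criterion is placed so the false alarm rate stays the same.
    """
    quieter = NOISE / divisor  # squeezes the noise distribution toward zero
    criterion = quieter.inv_cdf(1.0 - false_alarm_rate)
    return criterion, SIGNAL.cdf(criterion)


for divisor in (1, 2, 4):
    criterion, miss = rates_after_noise_reduction(divisor)
    print(divisor, round(criterion, 2), round(miss, 4))
```

As the noise is squeezed in, the criterion that yields the same false alarm rate moves down, and a signal of unchanged size is missed less often.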

As you have seen, statistical decisions are based on an explicit acceptance of the trade-off between better detectability (decreasing misses) and better reliability (decreasing false alarms). In statistics we decided to minimize misses consistent with no more than 5 false alarm errors per 100. We accepted some errors of one type in order to attenuate errors of the other type.

A similar problem faces us when we make decisions with respect to our paradigm (i.e., choose a frame of reference within which we conceptualize our results). This choice governs the types of questions we will ask and what we accept as acceptable answers. There is a trade-off between the absence of serious research on phenomena (and/or their interpretation) which may really be correct but which are inconsistent with the existing paradigm, and the necessity of science to provide a credible theoretical framework for the phenomena on which it is focused. Consistency with the paradigm and paradigm revolution are the opposite poles of interparadigm controversies, just as reliability and detectability are opposite poles of statistical decisions. Unfortunately, paradigmatic controversies are rarely presented in terms of how many false alarms we are willing to risk in order to decrease our misses by how much, even though the statistical analog of this problem is well thought through. For example, at what point is it worth serious effort to investigate extraterrestrial causation of ancient artifacts? It is certainly a possibility. We must simultaneously consider two comparisons. If we claim that extraterrestrials built the pyramids, we must weigh the gain if that claim is true against the cost if it is false. If we claim that humans built the pyramids, we must weigh the gain if extraterrestrial causation is false against the loss if it is true. (Note that there is no disagreement concerning the existence of the pyramids, just where they came from.)

Our paradigm or our theory could be wrong in two ways: 1) claiming that spacemen built the pyramids when humans built them, and 2) claiming that humans built them when spacemen really built them. Each view has its advantages and its problems, and each has its own likelihood of being true. There is a cost-benefit ratio to the decision. We must weigh all aspects of all factors.
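The weighing of gains and costs can be written as a simple expected value. The sketch below is only an illustration: the probability that extraterrestrials built the pyramids, and the sizes of the gains and costs, are invented numbers with no empirical standing, and the function name expected_value is hypothetical.

```python
def expected_value(p_true, gain_if_true, cost_if_false):
    """Expected payoff of making a claim that is true with probability p_true."""
    return p_true * gain_if_true - (1.0 - p_true) * cost_if_false


# Invented numbers, chosen only to show the form of the comparison.
P_EXTRATERRESTRIAL = 0.001   # assumed chance that spacemen built the pyramids
BIG_STAKE = 1000.0           # gain from correctly claiming spacemen (and loss from wrongly claiming humans)
SMALL_STAKE = 10.0           # gain from correctly claiming humans (and cost of wrongly claiming spacemen)

claim_spacemen = expected_value(P_EXTRATERRESTRIAL, BIG_STAKE, SMALL_STAKE)
claim_humans = expected_value(1.0 - P_EXTRATERRESTRIAL, SMALL_STAKE, BIG_STAKE)

print(f"claim spacemen: {claim_spacemen:.2f}")
print(f"claim humans:   {claim_humans:.2f}")
```

Even a very large gain does not justify a claim if its probability of being true is small enough; with different assumed probabilities or stakes the comparison could come out the other way.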

The issues involved can best be illustrated with several figures. They represent the factors underlying decisions and therefore are somewhat similar to those used in statistical decisions. However, their details are quite different. This first figure is intended to illustrate that: 1) only findings and their associated interpretations within the range of the paradigm's tolerance are accepted as plausible findings and acceptable interpretations; 2) some facts may be inappropriately integrated within the existing paradigm (a missed opportunity to advance to a better paradigm); 3) some facts could falsely be attributed to an alternate paradigm when they are better integrated within the existing paradigm (a false alarm); and 4) some findings are so aberrant that we adjust our paradigm in order to accommodate them, and we are correct in doing so.

Note that we presume that the existing paradigm is correct (temporarily switching to the wind and bear example, even though it is a strained example, we have total confidence that there are no bears). There are three aspects of this position: 1) all events are the result of the accepted paradigmatic laws (everything is wind); 2) there are no events which are the result of a different paradigm (there are no bears); and 3) the most deviant event possible is either the operation of the accepted paradigm in an unusual situation (strong wind) or a mistake. This confidence in the paradigm is best illustrated by a magician. To the child, a new paradigm (magic causation) is required by the very abnormal event (a coin disappearing). To the physicist, the event was, by definition, an illusion and represented the operation of the accepted paradigm just as well as any other observation. There can be no such thing as a disappearing coin.

Both misses and false alarms can happen. Regardless of the confidence in the current paradigm, it is still possible for it to be wrong. Misses are those findings which appeared to be well explained and integrated by the existing paradigm, but which are, in fact, clear instances of a different paradigm. False alarms are those findings which are given completely different interpretations (which define a new paradigm), but which are, in fact, simple examples of the current paradigm.

The preceding figure illustrates that as findings and their interpretation deviate from the normally accepted paradigm, the potential for gain if they are correct increases. However, their probability of being correct dramatically decreases. A choice of a paradigm can therefore be seen as somewhat like the choice between taking a single step toward your destination or jumping 100 yards in a direction which is most likely wrong but could be right. The paradigm specifies a range of views which optimize the gain/risk decision. Many small but correct steps gain more in the long run than a few big steps likely to be wrong.

The next three figures illustrate the factors underlying an actual choice of a paradigm. In order to illustrate the factors, a three-dimensional figure would be needed. You must, therefore, combine the next figure, which gives the front view (right to left), with the third figure, which gives the side view (front to back).

Our paradigm is that perspective which minimizes complexity to the best of our knowledge. Paradigm shifts in the past have occurred whenever net complexity was reduced by so changing (e.g., Ptolemy to Copernicus). All attempted shifts to some other paradigm could be seen, therefore, as either decreasing the eventual best possible net complexity for a particular paradigm, or increasing net complexity of formulation. Those paradigm changes which reduce best net complexity are seen as good, those changes increasing net complexity are seen as foolish.

These next two figures illustrate the source of conflict among competing paradigms. If different paradigms handled findings more or less well, but within each paradigm every conceivable finding was handled equally well, then net complexity and local complexity would be the same thing, and the following figure would provide the front to back perspective of the preceding figure. Imagine looking at the above figure from the left side, with the right side away from you.

But the above figure does not represent how various paradigms actually handle various local findings. Each paradigm does not handle every finding equally well; net complexity is made up of many local phenomena, and how well the paradigm solves each of them varies. For example, burning down your house is a very simple and fast way to get rid of mice. That approach is extremely good at solving the mouse problem. However, you create a very large problem concerning where you are going to sleep at night. Solutions must be good in their net result. The following figure more correctly illustrates variations in the ability of various paradigms to handle both local complexity and net complexity.

As can be seen, the net or average complexity of paradigm A is greater than that of the existing paradigm, and the net complexity of paradigm B is less than that of the existing paradigm. This is the case even though for phenomenon I paradigm A is less complex than the existing paradigm and for phenomenon II paradigm B is worse than even paradigm A. Paradigm A may offer the mirage of simplicity when it claims that space men built the pyramids. The good part is that it solves, in a very simple manner, the origin of the pyramids. The problem is the other increases in complexity that that notion brings. Is there any evidence at all that space men did it other than our inability to figure out how the pyramids were built? How did the space men get there? Why did they use wood and stone tools? Where did the space men come from? Are the artifacts on their home planet the result of other ancient astronauts, perhaps earth men? How do we explain the artifacts on the first planet in the universe? Where does it end?
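The difference between local and net complexity can be illustrated with a small table of invented scores. Net complexity is taken here to be the simple average of the local scores, as the phrase "net or average complexity" suggests. Every number below is hypothetical, as is phenomenon III, which is added only so that the averages come out in the order described; none of these values is read from the figure.

```python
from statistics import mean

# Hypothetical local complexity scores (lower = simpler account).
# Phenomenon III is an invented third phenomenon used only for illustration.
complexity = {
    "Existing": {"I": 0.6, "II": 0.4, "III": 0.5},
    "A":        {"I": 0.2, "II": 0.7, "III": 0.9},
    "B":        {"I": 0.3, "II": 0.8, "III": 0.1},
}

# Net complexity: the average over all local phenomena.
net = {name: mean(scores.values()) for name, scores in complexity.items()}

for name in complexity:
    print(f"{name}: local {complexity[name]}, net {net[name]:.2f}")
```

With these invented scores, paradigm A looks best if only phenomenon I is examined, yet it has the worst net complexity of the three.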

The problem is how to gain by reducing our net false alarms without unacceptable increases in our net miss rate. Historically, those people who have been most concerned with eliminating false alarms at the risk of accepting a few misses (sticking with the existing paradigm even when it seems like it should be abandoned) have been dramatically more productive both at arriving at the truth and at helping mankind. Wild, flashy theories, like wild goose chases, have just not paid off. Still, we can think through those factors which govern the trade-off. The following figure is an expansion of Figure X with normal science and paradigm shift provided in more detail.

Advancement in understanding nature within a particular paradigm can be conceptualized as movement toward less net complexity of formulation by moving down the columns of numbers which represent degree of “perfection.” Normal science can be seen as moving the implementation of a paradigm to its best possible implementation (movement from 1.00 toward 0.00 down a vertical line which represents the reduction of net complexity within the paradigm). At any point a researcher can conduct normal science and better account for phenomena within the paradigm (i.e., movement from 1.00 toward 0.00), or the researcher can move to another paradigm, such as a paradigm shift to Paradigm A or to Paradigm B. For example, from "Existing Paradigm Position 0.60" a researcher could develop the following views: Paradigm A Position 0.70 (an increase in complexity); A 0.60 (no change in complexity); A 0.50 (a decrease in complexity); Existing Paradigm Position 0.50 (a decrease in complexity); B 0.70 (an increase in complexity); B 0.60 (no change in complexity); or finally B 0.50 (a decrease in complexity).

Shifts to alternate paradigms are usually made because the researcher sees a local reduction in complexity (Point I in figure x). However, the chances are high that there will be an increase in net complexity. An additional problem is that science will advance by shifting to Paradigm A 0.50 but A at its best (0.40) will have more complexity than the Existing Paradigm at its best (0.30).

The problem of choosing when to conduct normal science (E 0.60 to E 0.50) and when to be the leader of revolutionary science (E 0.60 to A or B 0.50 or 0.60) is more apparent than real. The articulation of a new paradigm (A or B) is extremely rare and is universally rejected until the existing paradigm is at its maximum capacity (E 0.30), so that movement to a different paradigm is assured to be in the right direction (E 0.30 to B 0.20). Going from E 0.60 to A 0.50 appears to be a good move but is a dead end. Movement to paradigm A will not occur when the existing paradigm is at its limit: E 0.30 to A 0.40 will be an obvious increase in complexity.
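The moves discussed here can be classified with a few lines of code. The best attainable positions (E 0.30, A 0.40, B 0.20) are those given in the text; the function name evaluate_shift, the three labels it returns, and the rule used to assign them are assumptions made for this sketch, with lower numbers meaning less net complexity.

```python
# Best attainable net complexity for each paradigm, as given in the text.
BEST = {"E": 0.30, "A": 0.40, "B": 0.20}


def evaluate_shift(from_paradigm, from_position, to_paradigm, to_position):
    """Classify a move between paradigm positions (lower = less complex)."""
    immediate_gain = to_position < from_position
    # A target paradigm is a sound direction only if its best possible
    # implementation is at least as simple as that of the paradigm being left.
    long_run_ok = BEST[to_paradigm] <= BEST[from_paradigm]
    if immediate_gain and long_run_ok:
        return "advance"
    if immediate_gain:
        return "dead end"
    return "increase in complexity"


print(evaluate_shift("E", 0.60, "E", 0.50))  # normal science
print(evaluate_shift("E", 0.60, "A", 0.50))  # looks good locally
print(evaluate_shift("E", 0.30, "B", 0.20))  # shift at the paradigm's limit
print(evaluate_shift("E", 0.30, "A", 0.40))  # shift to a worse paradigm
```

Under this rule, the only moves that count as advances are normal science within the existing paradigm and a shift to a paradigm whose best attainable position is lower.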

Given these problems we can consider those factors which may increase our ability to know when to abandon the existing paradigm and accept a new paradigm (our detectability). That is, we can try to identify those totally anomalous but real phenomena or totally alien but correct interpretations which require a paradigm shift because they deviate more than is acceptable from the existing paradigm. If we can correctly spot them early in the game we are in a better position to leapfrog toward “total wisdom.”

One could hope that by decreasing the demand for paradigmatic consistency we may increase the likelihood of arriving at a better paradigm, but this is both true and false. It will increase the probability of finding a better paradigm (correctly detecting new paradigms), but it will also increase the likelihood of arriving at a wrong one (a false alarm). In point of fact, it is more likely to arrive at error. Because the number of ways to be wrong vastly outnumbers the number of ways to be right, poorly thought through theories or sloppy research cannot be justified simply by claiming that they will decrease misses, even though to decrease misses we may need to be less stringent in our emphasis on factors which maximize theoretical consistency. The problem is that increasing consistency may decrease detectability, but decreasing consistency does not necessarily increase detectability. Sloppy thinking or sloppy research is unlikely to discover a better paradigm than research with an extremely high criterion for consistency. This issue can be illustrated with the following figure. As can be seen, it is essentially the same as the one used to illustrate decision theory trade-offs.

Variations in the criterion for acceptable hypotheses vary the frequency of accepting false paradigms and of missing correct paradigms. More stringent criteria are the result of demanding more consistency. Less stringent criteria attempt to increase detectability by decreasing the demand for consistency. It is a dilemma.

In general, increasing demands for reliability or consistency will exclude more and more phenomena and interpretations from consideration. Typically, consistency demands are deliberate and are intended to exclude questionable phenomena or interpretations. At other times, however, the deliberate aim is to find phenomena rather than to assure that a finding is capable of supporting a large theoretical structure.

What we could do is to decrease the stringency of the criterion for accepting a new paradigm. We could be less demanding with respect to reliability. For example, we could demand less explicit definitions, less quantified descriptions, fewer lines of convergent evidence, etc. This would allow us to abandon our paradigm with less provocation (because we cannot make the paradigm explain our findings). If we cannot figure out how the Egyptians drew straight lines, we can propose that space ships landed and constructed the pyramids.

The cost is an increase in theoretical false alarms in order to gain a decrease in the number of theories that we miss. We will more often propose erroneous paradigms which waste a lot of researchers' time and, in fact, their life's work (including our own), but we will also find more new theories. This solution is obviously suboptimal.

What this does is to "decrease" the number of phenomena which the existing paradigm handles well by pointing out inconsistencies. Unfortunately, this is equivalent to increasing the heat under the "frying pan" in which someone sits. Without a better alternative to which to jump, the "fire" is as likely to be the result of the jump as anything else.