Decisions based on survey results can often be reduced to a choice between ‘yes’ or ‘no’, such as determining whether or not a survey unit meets the release criterion. When viewed in this way, two types of incorrect decisions, or decision errors, are identified:
- Incorrectly deciding that the answer is ‘yes’ when the true answer is ‘no’, and
- Incorrectly deciding the answer is ‘no’ when the true answer is ‘yes’.
The distinctions between these two types of errors are important for two reasons:
- The consequences of making one type of error versus the other may be very different, and
- The methods for controlling these errors are different and involve tradeoffs.
For these reasons, the decision maker should specify levels for each type of decision error.
The purpose of this section is to specify the decision maker’s limits on decision errors, which are used to establish performance goals for the data collection design. The goal of the planning team is to develop a survey design that reduces the chance of making a decision error.
While the possibility of a decision error can never be totally eliminated, it can be controlled. To control the possibility of making decision errors, the planning team attempts to control uncertainty in the survey results caused by sampling design error and measurement error. Sampling design error may be controlled by collecting a large number of samples. Using more precise measurement techniques or field duplicate analyses can reduce measurement error. Better sampling designs can also be developed to collect data that more accurately and efficiently represent the parameter of interest. Every survey will use a slightly different method of controlling decision errors, depending on the largest source of error and the ease of reducing those error components.
The estimate of the standard deviation for the measurements performed in a survey unit (σ_{s}) includes the individual measurement uncertainty as well as the spatial and temporal variations captured by the survey design. For this reason, individual measurement uncertainties are not used during the final status survey data assessment. However, individual measurement uncertainties may be useful for determining an a priori estimate of σ_{s} during survey planning. Since a larger value of σ_{s} results in an increased number of measurements needed to demonstrate compliance during the final status survey, the decision maker may seek to reduce measurement uncertainty through various methods (e.g., different instrumentation). There are tradeoffs that should be considered during survey planning. For example, the costs associated with performing additional measurements with an inexpensive measurement system may be less than the costs associated with a measurement system with better sensitivity (i.e., lower measurement uncertainty, lower minimum detectable concentration). However, the more expensive measurement system with better sensitivity may reduce σ_{s} and the number of measurements needed to demonstrate compliance to the point where it becomes the more cost-effective choice.

For surveys in the early stages of the Radiation Survey and Site Investigation Process, the measurement uncertainty and instrument sensitivity become even more important. During scoping, characterization, and remedial action support surveys, decisions about classification and remediation are made based on a limited number of measurements. When the measurement uncertainty or the instrument sensitivity values approach the value of the DCGL, it becomes more difficult to make these decisions.
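The cost tradeoff described above can be sketched numerically. The sketch below compares two hypothetical measurement systems using the common normal-approximation sample-size formula n = ((z_{1-α} + z_{1-β})·σ/Δ)². All costs, standard deviations, and error rates here are illustrative assumptions, not values from this manual.

```python
import math

Z_95 = 1.645  # one-sided z-value for alpha = beta = 0.05 (assumed error rates)

def n_required(sigma, delta, z_alpha=Z_95, z_beta=Z_95):
    """Normal-approximation estimate of the number of measurements needed."""
    return math.ceil(((z_alpha + z_beta) * sigma / delta) ** 2)

delta = 50.0  # assumed width of the gray region, Bq/kg

# System A: cheap per measurement, but a larger sigma_s
n_a = n_required(sigma=60.0, delta=delta)
cost_a = n_a * 20.0  # assumed cost per measurement

# System B: more expensive per measurement, better sensitivity (smaller sigma_s)
n_b = n_required(sigma=25.0, delta=delta)
cost_b = n_b * 80.0

print(n_a, cost_a)
print(n_b, cost_b)
```

With these assumed numbers, the more sensitive system needs far fewer measurements (3 versus 16) and ends up cheaper in total despite the higher unit cost, illustrating why the more expensive instrument can still be the cost-effective choice.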
From an operational standpoint, when operators of a measurement system have an a priori understanding of the sensitivity and potential measurement uncertainties, they are able to recognize and respond to conditions that may warrant further investigation – e.g., changes in background radiation levels, the presence of areas of elevated activity, measurement system failure or degradation, etc.
The probability of making decision errors can be controlled by adopting a scientific approach, called hypothesis testing. In this approach, the survey results are used to select between one condition of the environment (the null hypothesis, H_{0}) and an alternative condition (the alternative hypothesis, H_{a}). The null hypothesis is treated like a baseline condition that is assumed to be true in the absence of strong evidence to the contrary. Acceptance or rejection of the null hypothesis depends upon whether or not the particular survey results are consistent with the hypothesis.
A decision error occurs when the decision maker rejects the null hypothesis when it is true, or accepts the null hypothesis when it is false. These two types of decision errors are classified as Type I and Type II decision errors, and can be represented by a table as shown in Table B.1.
A Type I decision error occurs when the null hypothesis is rejected when it is true, and is sometimes referred to as a false positive error. The probability of making a Type I decision error, or the level of significance, is denoted by alpha (α). Alpha reflects the amount of evidence the decision maker would like to see before abandoning the null hypothesis, and is also referred to as the size of the test.
A Type II decision error occurs when the null hypothesis is accepted when it is false. This is sometimes referred to as a false negative error. The probability of making a Type II decision error is denoted by beta (β). The term (1 − β) is the probability of rejecting the null hypothesis when it is false, and is also referred to as the power of the test.
H_{0}: The residual activity in the survey unit exceeds the release criterion.

                                        DECISION
TRUE CONDITION OF          Reject H_{0}                    Accept H_{0}
SURVEY UNIT                (Meets Release Criterion)       (Exceeds Release Criterion)
--------------------------------------------------------------------------------------
Meets Release Criterion    (No decision error)             Incorrectly Fail to Release
                                                           Survey Unit (Type II)
Exceeds Release Criterion  Incorrectly Release Survey      (No decision error)
                           Unit (Type I)

Table B.1 Example representation of decision errors for a final status survey
There is a relationship between α and β that is used in developing a survey design. In general, increasing α decreases β and vice versa, holding all other variables constant. Increasing the number of measurements typically results in a decrease in both α and β. The number of measurements that will produce the desired values of α and β from the statistical test can be estimated from α, β, the DCGL_{W}, and the estimated variance of the distribution of the parameter of interest.
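This dependence can be illustrated with the standard normal-approximation formula n = ((z_{1-α} + z_{1-β})·σ/Δ)², where Δ is the shift between the DCGL_{W} and the lower bound of the gray region discussed later in this section. This is a planning sketch only, not the exact sample-size calculation for the nonparametric tests used in the final status survey; the input values are illustrative.

```python
from math import ceil
from statistics import NormalDist

def n_required(alpha, beta, dcgl_w, lbgr, sigma):
    """Normal-approximation estimate of the number of measurements."""
    z = NormalDist()
    delta = dcgl_w - lbgr          # shift: width of the gray region
    z_a = z.inv_cdf(1 - alpha)     # quantile for the Type I error rate
    z_b = z.inv_cdf(1 - beta)      # quantile for the Type II error rate
    return ceil(((z_a + z_b) * sigma / delta) ** 2)

# Illustrative values: DCGL_W = 100 Bq/kg, LBGR = 50 Bq/kg, sigma = 50 Bq/kg
print(n_required(alpha=0.05, beta=0.05, dcgl_w=100, lbgr=50, sigma=50))
```

Tightening either error rate, or a larger σ relative to the width of the gray region, drives the required number of measurements up.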
There are five activities associated with specifying limits on decision errors:
- Determining the possible range of the parameter of interest. Establish the range by estimating the likely upper and lower bounds based on professional judgement.
- Identifying the decision errors and choosing the null hypothesis:
  - Define both types of decision errors (Type I and Type II) and establish the true condition of the survey unit for each decision error.
  - Specify and evaluate the potential consequences of each decision error.
  - Establish which decision error has more severe consequences near the action level. Consequences include health, ecological, political, social, and resource risks.
  - Define the null hypothesis and the alternative hypothesis and assign the terms ‘Type I’ and ‘Type II’ to the appropriate decision error.
- Specifying a range of possible parameter values, a gray region, where the consequences of decision errors are relatively minor. It is necessary to specify a gray region because variability in the parameter of interest and unavoidable imprecision in the measurement system combine to produce variability in the data such that a decision may be ‘too close to call’ when the true but unknown value of the parameter of interest is very near the action level.
- Assigning probability limits to points above and below the gray region that reflect the probability for the occurrence of decision errors.
- Graphically representing the decision rule.
The expected outputs of this step are decision error rates based on the consequences of making an incorrect decision. Certain aspects of the site investigation process, such as the historical site assessment (HSA), are not so quantitative that numerical values for decision errors can be specified. Nevertheless, a ‘comfort region’ should be identified where the consequences of decision errors are relatively minor.
In the above section, ‘Development of a decision rule’, the parameter of interest was defined as the difference between the survey unit mean concentration of residual radioactivity and the reference area mean concentration in the two-sample case, or simply the survey unit mean concentration in the one-sample case. The possible range of values for the parameter of interest is determined based on existing information (such as the historical site assessment or previous surveys) and best professional judgement. The likely lower bound is either background or zero. For a final status survey, the residual radioactivity is expected to meet the release criterion, so a conservative upper bound might be approximately three times the DCGL_{W}.
Hypothesis testing is used to determine whether or not a statement concerning the parameter of interest should be verified. The statement about the parameter of interest is called the null hypothesis. The alternative hypothesis is the opposite of what is stated in the null hypothesis. The decision maker needs to choose between two courses of action, one associated with the null hypothesis and one associated with the alternative hypothesis.
To make a decision using hypothesis testing, a test statistic is compared to a critical value. The test statistic^{1} is a number calculated using data from the survey. The critical value of the test statistic defines a rejection region based on some assumptions about the true distribution of data in the survey unit. If the value of the test statistic falls within the rejection region, the null hypothesis is rejected. The decision rule, developed in this Appendix, is used to describe the relationship between the test statistic and the critical value.
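As a simplified, concrete illustration of a test statistic and critical value, the sketch below uses a Sign-test-style statistic: the number of measurements falling below the DCGL_{W}, compared against a binomial critical value. This is a hedged sketch of the general mechanism, not the exact test prescribed elsewhere in this manual, and the data are hypothetical.

```python
from math import comb

def critical_value(n, alpha):
    """Smallest k with P(S+ >= k) <= alpha when each result falls below the
    DCGL_W with probability 0.5 (the boundary of the null hypothesis)."""
    for k in range(n + 1):
        tail = sum(comb(n, j) for j in range(k, n + 1)) / 2 ** n
        if tail <= alpha:
            return k
    return n + 1  # alpha unattainable with so few measurements

def release_decision(measurements, dcgl_w, alpha=0.05):
    """Reject H0 (unit exceeds the criterion) if the test statistic S+
    falls in the rejection region S+ >= critical value."""
    s_plus = sum(1 for m in measurements if m < dcgl_w)  # test statistic
    return s_plus >= critical_value(len(measurements), alpha)

data = [42, 55, 38, 61, 47, 52, 44, 58, 40, 49]  # hypothetical results, Bq/kg
print(release_decision(data, dcgl_w=100))
```

With all ten hypothetical results well below a DCGL_{W} of 100 Bq/kg, the statistic falls in the rejection region and the unit would be released; if most results exceeded the DCGL_{W}, the null hypothesis would be retained.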
EURSSEM considers two ways to state H_{0} for a final status survey. The primary consideration in most situations will be compliance with the release criterion. This is shown as Scenario A in Figure B.3. The null hypothesis is that the survey unit exceeds the release criterion. Using this statement of H_{0} means that significant evidence that the survey unit does not exceed the release criterion is required before the survey unit would be released.
In some situations, however, the primary consideration may be determining if any residual radioactivity at the site is distinguishable from background, shown as Scenario B in Figure B.4. In this manual, Scenario A is used as an illustration because it directly addresses the compliance issue and allows consideration of decision errors.
For Scenario A, the null hypothesis is that the survey unit does not meet the release criterion. A Type I decision error would result in the release of a survey unit containing residual radioactivity above the release criterion. The probability of making this error is α. Setting a high value for α would result in a higher risk that survey units that might be somewhat in excess of the release criterion would be passed as meeting the release criterion. Setting a low value for α would result in fewer survey units where the null hypothesis is rejected. However, the cost of setting a low value for α is either a higher value for β or an increased number of samples used to demonstrate compliance.
For Scenario A, the alternative hypothesis is that the survey unit does meet the release criterion. A Type II decision error would result in either unnecessary costs due to remediation of survey units that are truly below the release criterion or additional survey activities to demonstrate compliance. The probability of making a Type II error is β. Selecting a high value for β (low power) would result in a higher risk that survey units that actually meet the release criterion are subject to further investigation. Selecting a low value for β (high power) will minimize these investigations, but the tradeoff is either a higher value for α or an increased number of measurements used to demonstrate compliance. Setting acceptable values for α and β, as well as determining an appropriate gray region, is a crucial step in the DQO process.
SCENARIO A
Assume as a null hypothesis that the survey unit exceeds the release criterion. This requires significant evidence that the residual radioactivity in the survey unit is less than the release criterion to reject the null hypothesis (and pass the survey unit). If the evidence is not significant at level α, the null hypothesis of a non-complying survey unit is accepted (and the survey unit fails).

HYPOTHESIS TEST
H_{0}: Survey unit does not meet the release criterion
H_{a}: Survey unit does meet the release criterion
The survey unit passes if and only if the test statistic falls in the rejection region.

This test directly addresses the compliance question. The mean shift for the survey unit must be significantly below the release criterion for the null hypothesis to be rejected. With this test, site owners face a tradeoff between additional sampling costs and unnecessary remediation costs. They may choose to increase the number of measurements in order to decrease the number of Type II decision errors (reduce the chance of remediating a clean survey unit) for survey units at or near background levels. Distinguishability from background is not directly addressed. However, sample sizes may be selected to provide adequate power at or near background levels, hence ensuring that most survey units near background would pass. Additional analyses, such as point estimates and/or confidence intervals, may be used to address this question. A high percentage of survey units slightly below the release criterion may fail unless large numbers of measurements are used. This achieves a high degree of assurance that most survey units that are at or above the release criterion will not be improperly released.
Figure B.3 Possible statement of the null hypothesis for the final status survey addressing the issue of compliance
In the EURSSEM framework, the gray region is always bounded from above by the DCGL corresponding to the release criterion. The lower bound of the gray region (LBGR) is selected during the DQO process along with the target values for α and β. The width of the gray region, equal to (DCGL – LBGR), is a parameter that is central to the nonparametric tests discussed in this manual. It is also referred to as the shift, Δ. The absolute size of the shift is actually of less importance than the relative shift Δ/σ, where σ is an estimate of the standard deviation of the measured values in the survey unit. The estimated standard deviation, σ, includes both the real spatial variability in the quantity being measured, and the precision of the chosen measurement method. The relative shift, Δ/σ, is an expression of the resolution of the measurements in units of measurement uncertainty. Expressed in this way, it is easy to see that relative shifts of less than one standard deviation, Δ/σ < 1, will be difficult to detect. On the other hand, relative shifts of more than three standard deviations, Δ/σ > 3, are generally easier to detect. The number of measurements that will be required to achieve given error rates, α and β, depends almost entirely on the value of Δ/σ (see Section 3.5).
Since small values of Δ/σ result in large numbers of samples, it is important to design for Δ/σ > 1 whenever possible. There are two obvious ways to increase Δ/σ. The first is to increase the width of the gray region by making LBGR small. Only Type II decision errors occur in the gray region. The disadvantage of making this gray region larger is that the probability of incorrectly failing to release a survey unit will increase. The target false negative rate β will be specified at lower residual radioactivity levels, i.e., a survey unit will generally have to be lower in residual radioactivity to have a high probability of being judged to meet the release criterion. The second way to increase Δ/σ is to make σ smaller. One way to make σ small is by having survey units that are relatively homogeneous in the amount of measured radioactivity. This is an important consideration in selecting survey units that have both relatively uniform levels of residual radioactivity and also have relatively uniform background radiation levels. Another way to make σ small is by using more precise measurement methods. The more precise methods might be more expensive, but this may be compensated for by the decrease in the number of required measurements. One example would be in using a radionuclide specific method rather than gross radioactivity measurements for residual radioactivity that does not appear in background. This would eliminate the variability in background from σ, and would also eliminate the need for reference area measurements.
SCENARIO B
Assume as a null hypothesis that the survey unit is indistinguishable from background. This requires significant evidence that the survey unit residual radioactivity is greater than background to reject the null hypothesis (and fail the survey unit). If the evidence is not significant at level α, the null hypothesis of a clean survey unit is accepted (and the survey unit passes).

HYPOTHESIS TEST
H_{0}: Survey unit is indistinguishable from background
H_{a}: Survey unit is distinguishable from background
The survey unit fails if and only if the test statistic falls in the rejection region.

Distinguishability from background may be of primary importance to some stakeholders. The residual radioactivity in the survey unit must be significantly above background for the null hypothesis to be rejected. Compliance with the DCGLs is not directly addressed. However, the number of measurements may be selected to provide adequate power at or near the DCGL, hence ensuring that most survey units near the DCGL would not be improperly released. Additional analysis, based on point estimates and/or confidence intervals, is required to determine compliance if the null hypothesis is rejected by the test. A high percentage of survey units slightly below the release criterion will fail unless large numbers of measurements are used. This is necessary to achieve a high degree of assurance that most survey units at or above the release criterion will not be improperly released.
Figure B.4 Possible statement of the null hypothesis for the final status survey addressing the issue of indistinguishability from background.
The effect of changing the width of the gray region and/or changing the measurement variability on the estimated number of measurements (and cost) can be investigated using the Decision Error Feasibility Trials (DEFT) software developed by EPA [EPA1995]. This program can only give approximate sample sizes and costs, since it assumes that the measurement data are normally distributed and that a Student’s t test will be used to evaluate the data; there is currently no provision for comparison to a reference area. Nevertheless, as a rough rule of thumb, the sample sizes calculated by DEFT are about 85% of those required by the one-sample nonparametric tests recommended in this manual. This rule of thumb works better for large numbers of measurements than for smaller numbers, but can be very useful for estimating the relative impact on costs of decisions made during the planning process.
Generally, the design goal should be to achieve Δ/σ values between one and three. The number of samples needed rises dramatically when Δ/σ is smaller than one. Conversely, little is usually gained by making Δ/σ larger than about three. If Δ/σ is greater than three or four, one should take advantage of the measurement precision available by making the width of the gray region smaller. It is even more important, however, that overly optimistic estimates for σ be avoided. The consequence of taking fewer samples than are needed given the actual measurement variations will be unnecessary remediations (increased Type II decision errors).
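The sensitivity of the sample size to Δ/σ can be seen directly from a normal-approximation calculation, here with assumed error rates α = β = 0.05:

```python
from math import ceil

Z_SUM = 1.645 + 1.645  # z_{0.95} + z_{0.95}: assumed alpha = beta = 0.05

# Normal-approximation sample size as a function of the relative shift
n_by_shift = {s: ceil((Z_SUM / s) ** 2) for s in (0.5, 1.0, 2.0, 3.0, 4.0)}
for shift, n in n_by_shift.items():
    print(shift, n)
```

The count rises steeply below Δ/σ = 1 (44 measurements at 0.5 versus 11 at 1.0) while little is gained above 3, matching the design guidance above.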
Once the preliminary estimates of Δ and σ are available, target values for α and β can be selected. The values of α and β should reflect the risks involved in making Type I and Type II decision errors, respectively.
One consideration in setting the false positive rate is the health risk associated with releasing a survey unit that might actually contain residual radioactivity in excess of the DCGL_{W}. If a survey unit did exceed the DCGL_{W}, the first question that arises is ‘How much above the DCGL_{W} is the residual radioactivity likely to be?’ The DEFT software can be used to evaluate this.
For example, if the DCGL_{W} is 100 Bq/kg (2.7 pCi/g), the LBGR is 50 Bq/kg (1.4 pCi/g), σ is 50 Bq/kg (1.4 pCi/g), α = 0.10 and β = 0.05, the DEFT calculations show that while a survey unit with residual radioactivity equal to the DCGL_{W} has a 10% chance of being released, a survey unit at a level of 115 Bq/kg (3.1 pCi/g) has less than a 5% chance of being released, and a survey unit at a level of 165 Bq/kg (4.5 pCi/g) has virtually no chance of being released. However, a survey unit with a residual radioactivity level of 65 Bq/kg (1.8 pCi/g) will have about an 80% chance of being released and a survey unit with a residual radioactivity level of 80 Bq/kg (2.2 pCi/g) will only have about a 40% chance of being released. Therefore, it is important to examine the probability of deciding that the survey unit does not meet the release criterion over the entire range of possible residual radioactivity values, and not only at the boundaries of the gray region. Of course, the gray region can be made narrower, but at the cost of additional sampling. Since the equations governing the process are not linear, small changes can lead to substantial changes in survey costs.
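This example can be approximately reproduced with a normal-theory calculation. The sketch below is an assumption about how such release probabilities are computed, not DEFT itself; DEFT's t-based results differ somewhat (e.g., near 80 Bq/kg).

```python
from math import ceil, sqrt
from statistics import NormalDist

z = NormalDist()
dcgl_w, lbgr, sigma = 100.0, 50.0, 50.0   # Bq/kg, values from the example
alpha, beta = 0.10, 0.05

z_a, z_b = z.inv_cdf(1 - alpha), z.inv_cdf(1 - beta)
n = ceil(((z_a + z_b) * sigma / (dcgl_w - lbgr)) ** 2)  # planned sample size

def p_release(true_mean):
    """Probability of rejecting H0 (releasing the unit) at a given true mean."""
    return z.cdf(sqrt(n) * (dcgl_w - true_mean) / sigma - z_a)

for mean in (65, 80, 100, 115, 165):
    print(mean, round(p_release(mean), 3))
```

The release probability is exactly α at the DCGL_{W}, falls rapidly above it, and rises toward certainty well below the gray region, which is the curve the text asks the planner to examine over the full range of possible values.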
As stated earlier, the values of α and β that are selected in the DQO process should reflect the risk involved in making a decision error. In setting values for α, the following are important considerations:
- In radiation protection practice, public health risk is modelled as a linear function of dose. Therefore a 10% change in dose, say from 15 to 16.5, results in a 10% change in risk. This situation is quite different from one in which there is a threshold. In the latter case, the risk associated with a decision error can be quite high, and low values of α should be selected. When the risk is linear, much higher values of α at the release criterion might be considered adequately protective when the survey design results in smaller decision error rates at doses or risks greater than the release criterion. False positives will tend to be balanced by false negatives across sites and survey units, resulting in approximately equal human health risks.
- The DCGL itself is not free of error. The dose or risk cannot be measured directly, and many assumptions are made in converting doses or risks to derived concentrations. To be adequately protective of public health, these models are generally designed to over-predict the dose or risk. Unfortunately, it is difficult to quantify this. Nonetheless, it is probably safe to say that most models have uncertainty sufficiently large that the true dose or risk delivered by residual radioactivity at the DCGL is very likely to be lower than the release criterion. This is an additional consideration for setting the value of α that could support the use of larger values in some situations. In this case, one would prospectively address, as part of the DQO process, the magnitude, significance, and potential consequences of decision errors at values above the release criterion. The assumptions made in any model used to predict DCGLs for a site should be examined carefully to determine if the use of site-specific parameters results in large changes in the DCGLs, or whether a site-specific model should be developed rather than designing a survey around DCGLs that may be too conservative.
- The risk of making the second type of decision error, β, is the risk of requiring additional remediation when a survey unit already meets the release criterion. Unlike the health risk, the cost associated with this type of error may be highly nonlinear. The costs will depend on whether the survey unit has already had remediation work performed on it, and on the type of residual radioactivity present. There may be a threshold below which the remediation cost rises very rapidly. If so, a low value for β is appropriate at that threshold value. This is primarily an issue for survey units that have a substantial likelihood of falling at or above the gray region for residual radioactivity. For survey units that are very lightly contaminated, or have been so thoroughly remediated that any residual radioactivity is expected to be far below the DCGL, larger values of β may be appropriate, especially if final status survey sampling costs are a concern. Again, it is important to examine the probability of deciding that the survey unit does not meet the release criterion over the entire range of possible residual radioactivity values, below as well as above the gray region.
- Lower decision error rates may be possible if alternative sampling and analysis techniques can be used that result in higher precision. The same might be achieved with moderate increases in sample sizes. These alternatives should be explored before accepting higher design error rates. However, in some circumstances, such as high background variations, lack of a radionuclide-specific technique, and/or radionuclides that are very difficult and expensive to quantify, error rates that are lower than the uncertainties in the dose or risk estimates may be neither cost-effective nor necessary for adequate radiation protection.
None of the above discussion is meant to suggest that under any circumstances a less than rigorous, thorough, and professional approach to final status surveys would be satisfactory. The decisions made and the rationale for making these decisions should be thoroughly documented.
For Class 1 survey units, the number of samples may be driven more by the need to detect small areas of elevated activity than by the requirements of the statistical tests. This in turn will depend primarily on the sensitivity of available scanning instrumentation, the size of the area of elevated activity, and the dose or risk model. A given concentration of residual radioactivity spread over a smaller area will, in general, result in a smaller dose or risk. Thus, the DCGL_{EMC} used for the elevated measurement comparison is usually larger than the DCGL_{W} used for the statistical test. In some cases, especially radionuclides that deliver dose or risk primarily via internal pathways, dose or risk is approximately proportional to inventory, and so the difference in the DCGLs is approximately proportional to the areas.
However, this may not be the case for radionuclides that deliver a significant portion of the dose or risk via external exposure. The exact relationship between the DCGL_{EMC} and the DCGL_{W} is a complicated function of the dose or risk modelling pathways, but area factors to relate the two DCGLs can be tabulated for most radionuclides, and sitespecific area factors can also be developed.
For many radionuclides, scanning instrumentation is readily available that is sensitive enough to detect residual radioactivity concentrations at the DCGL_{EMC} derived for the sampling grid of direct measurements used in the statistical tests. Where instrumentation of sufficient sensitivity (MDC, see Section 3.3.7) is not available, the number of samples in the survey unit can be increased until the area between sampling points is small enough (and the resulting area factor is large enough) that the DCGL_{EMC} can be detected by scanning. For some radionuclides (e.g., ^{3}H) the scanning sensitivity is so low that this process would never terminate, i.e., the number of samples required could increase without limit.
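The grid-tightening described above can be sketched with a hypothetical area-factor table. All numbers here (the DCGL_{W}, the scan MDC, and the area factors) are assumed for illustration, not site-specific values.

```python
dcgl_w = 100.0    # Bq/kg (assumed)
scan_mdc = 350.0  # Bq/kg, assumed scanning sensitivity (MDC)

# Hypothetical area factors: grid area (m^2) -> factor relating the
# DCGL_EMC for an elevated area of that size to the DCGL_W
area_factors = {100: 1.0, 50: 1.6, 25: 2.5, 10: 4.8, 1: 12.0}

chosen_area = None
# Walk from the coarsest grid to the finest until scanning can detect DCGL_EMC
for area in sorted(area_factors, reverse=True):
    dcgl_emc = dcgl_w * area_factors[area]
    if scan_mdc <= dcgl_emc:
        chosen_area = area  # first grid refinement for which scanning suffices
        break

print(chosen_area)
```

If the loop never triggers, as for a very low scanning sensitivity (e.g., ^{3}H), chosen_area stays None, reflecting the non-terminating case described in the text.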
Thus, an important part of the DQO process is to determine the smallest size of an area of elevated activity that it is important to detect, A_{min}, and an acceptable level of risk, R_{A}, that it may go undetected. The probability of sampling a circular area of size A with either a square or triangular sampling pattern is shown in Figure B.5.
Figure B.5 Geometric probability of sampling at least one point of an area of elevated activity as a function of sample density with either a square or triangular sampling pattern.
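The geometric probability in Figure B.5 can be approximated by simulation. The sketch below estimates, for the square pattern only, the chance that a circular elevated area of a given size (relative to the grid cell area, with spacing L = 1) contains at least one sampling point; the triangular pattern would need a different nearest-point calculation.

```python
import math
import random

def hit_probability(area_fraction, trials=100_000, seed=1):
    """Monte Carlo estimate for a square sampling pattern with spacing 1."""
    rng = random.Random(seed)
    r = math.sqrt(area_fraction / math.pi)  # radius of the elevated area
    hits = 0
    for _ in range(trials):
        # Random centre within one grid cell; by symmetry this covers all
        # placements of the elevated area relative to the infinite grid.
        x, y = rng.random(), rng.random()
        # The nearest sampling point is one of the four cell corners.
        dx, dy = min(x, 1 - x), min(y, 1 - y)
        if dx * dx + dy * dy <= r * r:
            hits += 1
    return hits / trials

print(round(hit_probability(0.5), 2))
```

For elevated areas small relative to the grid cell, the hit probability is approximately A/L² itself, rising to certainty once the area becomes comparable to the cell.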
In this part of the DQO process, the concern is less with areas of elevated activity that are found than with providing adequate assurance that negative scanning results truly demonstrate the absence of such areas. In selecting acceptable values for A_{min} and R_{A}, maximum use should be made of information from the historical site assessment and all surveys prior to the final status survey to determine what sort of areas of elevated activity could possibly exist, their potential size and shape, and how likely they are to exist. When the detection limit of the scanning technique is very large relative to the DCGL_{EMC}, the number of measurements estimated to demonstrate compliance using the statistical tests may become unreasonably large. In this situation an evaluation of the survey objectives and considerations should be performed. These considerations may include the survey design and measurement methodology, exposure pathway modelling assumptions and parameter values used to determine the DCGLs, historical site assessment conclusions concerning source terms and radionuclide distributions, and the results of scoping and characterization surveys. In most cases the results of this evaluation are not expected to justify an unreasonably large number of measurements.
A convenient method for visualizing the decision rule is to graph the probability of deciding that the survey unit does not meet the release criterion, i.e., that the null hypothesis of Scenario A is accepted. An example of such a chart is shown in Figure B.6.
In this example α is 0.025 and β is 0.05, providing an expected power (1 − β) of 0.95 for the test. A second method for presenting the information is shown in Figure B.7. This figure shows the probability of making a decision error for possible values of the parameter of interest, and is referred to as an error chart. In both examples a gray region, where the consequences of decision errors are deemed to be relatively minor, is shown. These charts are used in the final step of the DQO Process, combined with the outputs from the previous steps, to produce an efficient and cost-effective survey design. It is clear that setting acceptable values for α and β, as well as determining an appropriate gray region, is a crucial step in the DQO Process. Instructions for creating a prospective power curve, which can also be used to visualize the decision rule, are provided in Appendix E.
After the survey design is implemented, the expected values of α and β determined in this step are compared to the actual significance level and power of the statistical test based on the measurement results during the assessment phase of the data life cycle. This comparison is used to verify that the objectives of the survey have been achieved.
Due to the basic hypothesis testing philosophy, the null hypothesis is generally specified in terms of the status quo (e.g., no change or action will take place if the null hypothesis is not rejected). Also, since the classical hypothesis testing approach exercises direct control over the Type I (false positive) error rate, this rate is generally associated with the error of most concern. In the case of the null hypothesis in which the residual radioactivity in the survey unit exceeds the release criterion, a Type I decision error would conclude that the residual activity was less than the release criterion when in fact it was above the release criterion. One difficulty, therefore, may be obtaining a consensus on which error should be of most concern (i.e., releasing a site where the residual activity exceeds the release criterion or failing to release a site where the residual activity is less than the release criterion). It is likely that the regulatory agency’s public health-based protection viewpoint will differ from the viewpoint of the regulated party. The ideal approach is not only to define the null hypothesis in such a way that the Type I decision error protects human health and the environment but also in a way that encourages quality (high precision and accuracy) and minimizes expenditure of resources in situations where decisions are relatively “easy” (e.g., all observations are far below the threshold level of interest or DCGL).
To avoid excessive expense in performing measurements, compromises are sometimes necessary. For example, suppose that a significance level (α) of 0.05 is to be used. However, the affordable sample size may be expected to yield a test with power (1 − β) of only 0.40 at some specified parameter value chosen to have practical significance. One possible compromise may be to relax the Type I decision error rate (α) and use a value of 0.10, 0.15, or even 0.20. By relaxing the Type I decision error rate, a higher power (i.e., a lower Type II decision error rate) can be achieved. An argument can be made that survey designs should be developed and numbers of measurements determined in such a way that both the Type I (α) and Type II (β) decision error rates are treated simultaneously and in a balanced manner (i.e., α = β = 0.15). This approach of treating the Type I and Type II decision error rates simultaneously is taken by the DQO Process. It is recommended that several different values for α and β be investigated before specific values are selected.
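The compromise described above can be quantified with a normal-theory power calculation. The sample size (n = 5) and relative shift (Δ/σ = 0.62, chosen so that power at α = 0.05 comes out near 0.40) are assumptions for illustration only.

```python
from math import sqrt
from statistics import NormalDist

z = NormalDist()

def power(alpha, n, rel_shift):
    """Power (1 - beta) of a one-sided normal-theory test at the LBGR."""
    return z.cdf(sqrt(n) * rel_shift - z.inv_cdf(1 - alpha))

n, rel_shift = 5, 0.62  # assumed affordable sample size and Delta/sigma
for a in (0.05, 0.10, 0.15, 0.20):
    print(a, round(power(a, n, rel_shift), 2))
```

Relaxing α from 0.05 to 0.20 raises the power markedly at the same sample size; adding measurements instead (a larger n) achieves the same gain without loosening the Type I rate.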
^{1} The test statistic is not necessarily identical to the parameter of interest, but is functionally related to it through the statistical analysis.