**Contents**

3.3.9.1 Introduction

3.3.9.2 Data quality indicators

3.3.9.3 Precision

3.3.9.4 Bias

3.3.9.4.1 Bias assessments for radio-analytical measurements

3.3.9.4.2 Scanning and direct measurements

3.3.9.5 Accuracy

3.3.9.6 Representativeness

3.3.9.7 Comparability

3.3.9.8 Completeness

3.3.9.9 Other sources of uncertainty

3.3.9.10 Uncertainty introduced by the applied statistical method(s)

3.3.9.11 Uncertainty in data interpretation

3.3.9.12 Number of quality control measurements

3.3.9.13 Controlling sources of error

#### 3.3.9.1 Introduction

Site surveys should be performed in a manner that ensures results are accurate and sources of uncertainty are identified and controlled. This is especially the case for final status surveys that are vital to demonstrating a facility satisfies pre-established release criteria. Quality control (QC) and quality assurance (QA) are initiated at the start of a project and integrated into all surveys as data quality objectives (DQOs) are developed. This carries over to the writing of a Quality Assurance Project Plan (QAPP), which applies to each aspect of a survey (see Section 2.13). Data quality is routinely a concern throughout the environmental remediation process, and one should recognize that QA/QC procedures will change as data are collected and analyzed, and as DQOs become more rigorous for the different types of surveys that lead up to a final status survey.

In general, surveys should be performed by trained individuals and should be conducted with approved written procedures and properly calibrated instruments that are sensitive to the suspected contaminant(s) present. However, even the best approaches for properly performing measurements and acquiring accurate data need to consider quality control activities. QC activities are necessary to obtain additional quantitative information to demonstrate that measurement results have the required precision and are sufficiently free of errors to accurately represent the site being investigated. The following two questions are the main focus of the rationale for the assessment of errors in environmental data collection activities:

- How many and what type of measurements are required to assess the quality of data from an environmental survey?
- How can the information from the quality assessment measurements be used to identify and control sources of error and uncertainties in the measurement process?

These questions are introduced as part of guidance that also includes an example illustrating the planning process for determining a reasonable number of quality control (QC) measurements. This guidance also demonstrates how the information from the process may be used to document the quality of the measurement data. This process was developed in terms of soil samples collected in the field and then sent to a laboratory for analysis. For EURSSEM, these questions may be asked in relation to measurements of surface soils and building surfaces, both of which include sampling, scanning, and direct measurements.

Quality control may be thought of in three parts:

- Determining the type of QC samples needed to detect precision or bias;
- Determining the number of samples as part of the survey design; and
- Scheduling sample collections throughout the survey process to identify and control sources of error and uncertainties.

Overall, survey activities associated with EURSSEM include obtaining the additional information related to QA of both field and laboratory activities.

The following factors should be considered when evaluating sources of bias, error, and uncertainty. Cross contamination is an added factor to consider for each of the following items:

- Sample collection methods;
- Handling and preparation of samples;
- Homogenization and aliquots of laboratory samples;
- Field methods for sampling, scanning, or direct measurements;
- Laboratory analytical process;
- Total bias contributed by all sources.

Systematic investigations of field or laboratory processes can be initiated to assess and identify the extent of errors, bias, and data variability and to determine if the data quality objectives (DQOs) are achieved. An important aspect of each QC determination is the representative nature of a sample or measurement (see Section 3.3.9.6 for a description of representativeness). If additional samples or measurements are not taken according to the appropriate method, the resulting QC information will be invalid or unusable. For example, if an inadequate amount of sample is collected, the laboratory analytical procedure may not yield a proper result. The QC sample must represent the sample population being studied. Misrepresentation itself creates a bias that, if undetected, leads to inaccurate conclusions concerning an analysis. At the very least, misrepresentation leads to a need for additional QA investigation.

#### 3.3.9.2 Data quality indicators

The assessment of the data quality indicators presented in this section is important for determining data usability. The principal data quality indicators are precision, bias, accuracy, representativeness, comparability, and completeness. Other data quality indicators affecting the RSSI process include the selection and classification of survey units, Type I and Type II decision error rates, the variability in the radionuclide concentration measured within the survey unit, and the lower bound of the gray region (see Appendix B).

In some instances, the data quality indicator requirements will help in the selection of a measurement system. In other cases, the requirements of the measurement system will assist in the selection of appropriate levels for the data quality indicators.

Of the six principal data quality indicators:

- Precision and bias are quantitative measures.
- Representativeness and comparability are qualitative.
- Completeness is a combination of both qualitative and quantitative measures.
- Accuracy is a combination of precision and bias.
- The selection and classification of survey units is qualitative.
- Decision error rates, variability, and the lower bound of the gray region are quantitative measures.

Determining the usability of analytical results begins with the review of QC measurements (see Section 3.4.13) and qualifiers to assess the measurement result and the performance of the analytical method. If an error in the data is discovered, it is more important to evaluate the effect of the error on the data than to determine the source of the error. The documentation described in Section 3.11 is reviewed as a whole for some criteria. Data are reviewed at the measurement level for other criteria.

Factors affecting the accuracy of identification and the precision and bias of quantification of individual radionuclides, such as calibration and recoveries, should be examined radionuclide by radionuclide.

Table 3.20 presents a summary of QC measurements and the data use implications.

| Quality control criterion | Effect on identification when criterion is not met | Quantitative bias | Use |
|---|---|---|---|
| Spikes (higher than expected result) | Potential for incorrectly deciding a survey unit does not meet the release criterion (Type II decision error) | High | Use data as upper limit |
| Spikes (lower than expected result) | Potential for incorrectly deciding a survey unit does meet the release criterion^{a} (Type I decision error) | Low | Use data as lower limit |
| Replicates (inconsistent) | None, unless analyte found in one duplicate and not the other – then either Type I or Type II decision error | High or low^{b} | Use data as estimate – poor precision |
| Blanks (contaminated) | Potential for incorrectly deciding a survey unit does not meet the release criterion (Type II decision error) | High | Check for gross contamination or instrument malfunction |
| Calibration (bias) | Potential for Type I or Type II decision errors | High or low^{b} | Use data as estimate unless problem is extreme |

Table 3.20 Use of quality control data

^{a} Only likely if recovery is near zero.

^{b} Effect on bias determined by examination of data for each radio-nuclide.

#### 3.3.9.3 Precision

Precision is a measure of agreement among replicate measurements of the same property under prescribed similar conditions. This agreement is calculated as either the range or the standard deviation. It may also be expressed as a percentage of the mean of the measurements such as relative range (for duplicates) or coefficient of variation.
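The precision measures named above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the function name `precision_summary`, the choice of the sample (n - 1) standard deviation, and the example values are assumptions made for this sketch.

```python
import statistics


def precision_summary(values):
    """Summarize agreement among replicate measurements of one property.

    Returns the range, the sample standard deviation, and both expressed
    as a percentage of the mean (relative range and coefficient of variation).
    """
    if len(values) < 2:
        raise ValueError("need at least two replicate measurements")
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("relative measures are undefined for a zero mean")
    value_range = max(values) - min(values)
    std_dev = statistics.stdev(values)  # sample standard deviation (n - 1)
    return {
        "range": value_range,
        "std_dev": std_dev,
        "relative_range_pct": 100.0 * value_range / mean,
        "cv_pct": 100.0 * std_dev / mean,
    }


# Hypothetical duplicate results (for example in Bq/kg); not real survey data.
duplicates = precision_summary([90.0, 110.0])
print(duplicates)
```

For a pair of duplicates the relative range is the natural summary; for larger sets of replicates the coefficient of variation is usually more informative.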

For scanning and direct measurements, precision may be specified for a single person performing the measurement or as a comparison between people performing the same measurement. For laboratory analyses, precision may be specified as either intra-laboratory (within a laboratory) or inter-laboratory (between laboratories).

Precision estimates based on a single surveyor or laboratory represent the agreement expected when the same person or laboratory uses the same method to perform multiple measurements of the same location. Precision estimates based on two or more surveyors or laboratories refer to the agreement expected when different people or laboratories perform the same measurement using the same method.

Determining precision by replicating measurements with results at or near the detection limit of the measurement system is not recommended because the measurement uncertainty is usually greater than the desired level of precision. The types of replicate measurements applied to scanning and direct measurements are limited by the relatively uncomplicated measurement system (i.e., the uncertainties associated with sample collection and preparation are eliminated). However, the uncertainties associated with applying a single calibration factor to a wide variety of site conditions mean these measurements are very useful for assessing data quality.

There are several types of replicate analyses available to determine the level of precision, and these replicates are typically distinguished by the point in the sample collection and analysis process where the sample is divided.

*Collocated Samples.* Collocated samples are samples collected adjacent to the routine field sample to determine local variability of the radionuclide concentration. Typically, collocated samples are collected about one-half to three feet away from the selected sample location. Analytical results from collocated samples can be used to assess site variation, but only in the immediate sampling area. Collocated samples should not be used to assess variability across a site and are not recommended for assessing error. Collocated samples can be non-blind, single-blind, or double-blind.

*Field Replicates.* Field replicates are samples obtained from one location, homogenized, divided into separate containers and treated as separate samples throughout the remaining sample handling and analytical processes. These samples are used to assess error associated with sample heterogeneity, sample methodology and analytical procedures. Field replicates are used when determining total error for critical samples with contamination concentrations near the action level. For statistical analysis to be valid in such a case, a minimum of eight replicate samples would be required [EPA-1991]. Field replicates (or field split samples) can be non-blind, single-blind, or double-blind and are recommended for determining the level of precision for a radiation survey or site investigation.

*Replicates to Measure Operator Precision.* For scanning and direct measurements, replicates to measure operator precision provide an estimate of precision for the operator and the Standard Operating Procedure (SOP) or protocol used to perform the measurement. Replicates to measure operator precision are measurements performed using the same instrument at the same location, but with a different operator. Replicates to measure operator precision are usually non-blind or single-blind measurements.

*Replicates to Measure Instrument Precision.* For scanning and direct measurements, replicates to measure instrument precision provide an estimate of precision for the type of instrument, the calibration, and the SOP or protocol used to perform the measurement. Replicates to measure instrument precision are measurements performed by the same operator at the same location, but with a different instrument. Replicates to measure instrument precision are usually non-blind or single-blind measurements.

*Analytical Laboratory Replicate.* An analytical laboratory replicate is a sub-sample of a routine sample that is homogenized, divided into separate containers, and analyzed using the same analytical method. It is used to determine method precision, but because it is a non-blind sample, or known to the analyst, it can only be used by the analyst as an internal control tool and not as an unbiased estimate of analytical precision [EPA-1990].

*Laboratory Instrument Replicate.* A laboratory instrument replicate is the repeated measurement of a sample that has been prepared for counting (i.e., laboratory sample preparation and radiochemical procedures have been completed). It is used to determine precision for the instrument (repeated measurements using same instrument) and the instrument calibration (repeated measurements using different instruments, such as two different germanium detectors with multi-channel analyzers). A laboratory instrument replicate is generally performed as part of the laboratory QC program and is a non-blind sample. It is typically used as an internal control tool and not as an unbiased estimate of analytical precision.

For many surveys, a combination of sample, operator, and laboratory replicates is used to provide an estimate of overall precision for both scanning and direct measurements. Replicates of direct measurements can be compared with one another in the same way as the analytical results for samples. Results for scanning replicates may be obtained by stopping and recording instrument readings at specific intervals during the scanning survey (effectively performing direct measurements at specified locations). An alternative method for estimating the precision of scanning is to evaluate the effectiveness of the scanning survey for identifying areas of elevated activity. The results of scanning are usually locations that are identified for further investigation. A comparison of the areas identified by the replicate scanning surveys can be performed either quantitatively (using statistical methods) or qualitatively (using professional judgment). Such a comparison must evaluate both whether the replicates identified the same number of locations and whether the identified locations are the same, which makes it difficult to define precision for scanning as a DQO that can be evaluated.
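The two questions raised for replicate scanning surveys (same number of flagged locations, and same locations) can be illustrated with a small sketch. The grid-cell labels, the function name `compare_scan_replicates`, and the use of a shared-over-total agreement ratio (the Jaccard index) are assumptions chosen for illustration; the guidance itself does not prescribe a particular statistic.

```python
def compare_scan_replicates(first, second):
    """Compare the locations flagged by two replicate scanning surveys.

    `first` and `second` are iterables of location identifiers (here,
    hypothetical grid-cell labels) flagged for further investigation.
    """
    first, second = set(first), set(second)
    shared = first & second
    union = first | second
    return {
        "count_first": len(first),
        "count_second": len(second),
        "same_count": len(first) == len(second),
        "shared": sorted(shared),
        "only_first": sorted(first - second),
        "only_second": sorted(second - first),
        # Fraction of all flagged locations that both replicates agree on;
        # defined as 1.0 when neither replicate flagged anything.
        "agreement": len(shared) / len(union) if union else 1.0,
    }


# Hypothetical example: both surveys flag three cells, but not the same three.
result = compare_scan_replicates(["A1", "B3", "C2"], ["A1", "C2", "D4"])
print(result)
```

The example shows why counting alone is not enough: the replicates flag the same number of locations, yet they agree on only two of the four distinct cells.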

The two basic activities performed in the assessment of precision are estimating the radionuclide concentration variability from the measurement locations and estimating the measurement error attributable to the data collection process. The level for each of these performance measures should be specified during development of DQOs. If the statistical performance objectives are not met, additional measurements should be taken or one (or more) of the performance parameters changed.

Measurement error is estimated using the results of replicate measurements, as discussed in Section 3.9.2.9 for both field and laboratory measurements. When collocated measurements are performed (in the field or in the laboratory) an estimate of total precision is obtained. When collocated samples are not available for laboratory analysis, a sample subdivided in the field and preserved separately can be used to assess the variability of sample handling, preservation, and storage along with the variability in the analytical process, but variability in sample acquisition is not included. When only variability in the analytical process is desired, a sample can be subdivided in the laboratory prior to analysis.

Summary statistics such as sample mean and sample variance can provide an assessment of the precision of a measurement system or component thereof for a project. These statistics may be used to estimate precision at discrete concentration levels, average estimated precision over applicable concentration ranges, or provide the basis for a continual assessment of precision for future measurements. Methods for calculating and reporting precision are provided in EPA guidance for quality assurance project plans.

Table 3.21 presents the minimum considerations, impacts if the considerations are not met, and corrective actions for precision.

| Minimum considerations for precision | Impact when minimum considerations are not met | Corrective action |
|---|---|---|
| Confidence level as specified in DQOs. Power as specified in DQOs. Minimum detectable relative differences specified in the survey design and modified after analysis of background measurements if necessary. One set of field duplicates or more as specified in the survey design. Analytical duplicates and splits as specified in the survey design. Measurement error specified. | Errors in decisions to act or not to act based on analytical data. Unacceptable level of uncertainty. Increased variability of quantitative results. Potential for incorrectly deciding a survey unit does meet the release criterion for measurements near the detection limits (Type I decision error). | For surveying and sampling: add survey or sample locations based on information from available data that are known to be representative; adjust performance objectives. For analysis: analyze new duplicate samples; review laboratory protocols to ensure comparability; use precision measurements to determine confidence limits for the effects on the data. The investigator can use the maximum measurement results to set an upper bound on the uncertainty if there is too much variability in the analyses. |

Table 3.21 Minimum considerations for precision, impact if not met and corrective actions

#### 3.3.9.4 Bias

Bias is the systematic or persistent distortion of a measurement process. It results from faults in sampling designs and procedures, analytical procedures, sample contamination, losses, interactions with containers, deterioration, inaccurate instrument calibration, and other sources. Bias causes the mean value of the sample data to be consistently higher or lower than the true mean value.

##### 3.3.9.4.1 Bias assessments for radio-analytical measurements

Bias assessments for radio-analytical measurements should be made using personnel, equipment, and spiking materials or reference materials as independent as possible from those used in the calibration of the measurement system. QC samples used to determine bias should be included as early in the analytical process as possible.

*Reference Material.* A reference material is a material or substance one or more of whose property values are sufficiently homogeneous and well established to be used for the calibration of an apparatus, the assessment of a measurement method, or for assigning values to materials [ISO-1993]. A certified reference material is a reference material for which each certified property value is accompanied by an uncertainty at a stated level of confidence. Radioactive reference materials may be available for certain radionuclides in soil (e.g., uranium in soil), but reference building materials may not be available. Because reference materials are prepared and homogenized as part of the certification process, they are rarely available as double-blind samples. When appropriate reference materials are available (i.e., proper matrix, proper radionuclide, proper concentration range), they are recommended for use in determining the overall bias for a measurement system.

*Performance Evaluation Samples.* Performance evaluation samples are samples that evaluate the overall bias of the analytical laboratory and detect any error in the analytical method used. These samples are usually prepared by a third party, using a quantity of analyte(s) which is known to the preparer but unknown to the laboratory, and always undergo certification analysis. The analyte(s) used to prepare the performance evaluation sample is the same as the analyte(s) of interest. Laboratory procedural error is evaluated by the percentage of analyte identified in the performance evaluation sample. Performance evaluation samples are recommended for use in determining overall bias for a measurement system when appropriate reference materials are not available. Performance evaluation samples are equivalent to matrix spikes prepared by a third party that undergo certification analysis and can be non-blind, single-blind, or double-blind.

*Matrix Spike Samples.* Matrix spike samples are environmental samples that are spiked in the laboratory with a known concentration of a target analyte(s) to verify percent recoveries. They are used primarily to check sample matrix interferences but can also be used to monitor laboratory performance. However, a data set of at least three results is necessary to distinguish between laboratory performance and matrix interference. Matrix spike samples are often replicated to monitor method performance and evaluate error due to laboratory bias and precision (when four or more pairs are analyzed). These replicates are often collectively referred to as a matrix spike/matrix spike duplicate.
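The percent recovery verified with a matrix spike can be illustrated as follows. This sketch assumes the conventional definition (spiked result minus unspiked result, divided by the activity added); the function names and the numerical values are hypothetical and are not taken from this guidance.

```python
def percent_recovery(spiked_result, unspiked_result, spike_added):
    """Percent of the added spike activity that the analysis recovered.

    All three arguments must be in the same units (for example Bq/kg).
    """
    if spike_added <= 0:
        raise ValueError("spike_added must be positive")
    return 100.0 * (spiked_result - unspiked_result) / spike_added


def percent_bias(recovery_pct):
    """Signed bias implied by a recovery: negative means biased low."""
    return recovery_pct - 100.0


# Hypothetical matrix spike: 100 units added to a sample that measured 50
# before spiking and 145 after spiking.
recovery = percent_recovery(spiked_result=145.0, unspiked_result=50.0, spike_added=100.0)
bias = percent_bias(recovery)
print(f"recovery = {recovery:.1f} %, bias = {bias:+.1f} %")
```

A recovery below 100 % suggests results may underestimate the true concentration, and a recovery above 100 % suggests they may overestimate it.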

There are several additional terms applied to samples prepared by adding a known amount of the radionuclide of interest to the sample. The majority of these samples are designed to isolate individual sources of bias within a measurement system by preparing pre- and post-operation spikes. For example, the bias from the digestion phase of the measurement system can be determined by comparing the result from a pre-digest spike to the result from a post-digest spike.

When possible, bias assessments should be based on certified reference materials rather than matrix spikes or water spikes so that the effects of the matrix and of the chemical composition of the contamination are incorporated into the assessment. While matrix spikes include matrix effects, the addition of a small amount of liquid spike does not always reflect the chemical composition of the contamination in the sample matrix. Water spikes do not account for either matrix effects or chemical composition of the contamination. When spikes are used to assess bias, a documented spiking protocol and consistency in following that protocol are important to obtaining meaningful data quality estimates.

Activity levels for bias assessment measurements should cover the range of expected contaminant concentrations, although the minimum activity is usually at least five times the MDC. For many final status surveys, the expected contaminant concentration is zero or background, so the highest activity will be associated with the bias assessment measurements. The minimum and maximum concentrations allowable in bias assessment samples should be agreed on during survey planning activities to prevent accidental contamination of the environment or an environmental level radio-analytical laboratory.

##### 3.3.9.4.2 Scanning and direct measurements

Field work using scanning or direct measurements eliminates some sources of error because samples are not removed, containerized, or transported to another location for analysis. The operator’s technique or field instrument becomes the source of bias. In this case, detecting bias might incorporate field replicates (see Section 3.3.9.3) by having a second operator revisit measurement locations and follow the same procedure with the same instrument as was used by the first operator. This is an approach used to assess precision of measurements. A field instrument’s calibration can also be checked by one or more operators during the course of a survey and recorded on a control chart. Differences in set up or handling of instruments by different operators may reveal a significant source of bias that is quite different from sources of bias associated with laboratory work.

For scanning and direct measurements there are a limited number of options available for performing bias assessment measurements. Perhaps the best estimate of bias for scanning and direct measurements is to collect samples from locations where scans or direct measurements were performed, analyze the samples in a laboratory, and compare the results. Problems associated with this method include the time required to obtain the results and the difficulty in obtaining samples that are representative of the field measurement to provide comparable results. A simple method of demonstrating that analytical bias is not a significant problem for scanning or direct measurements is to use the instrument performance checks to demonstrate the lack of analytical bias. A control chart can be used to determine the variability of a specific instrument and track the instrument performance throughout the course of the survey. Field background measurements can also be plotted on a control chart to estimate bias caused by contamination of the instrument.
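A control chart for instrument performance checks can be sketched as below. The warning and control limits at two and three standard deviations from the baseline mean follow a common Shewhart-style convention; they are an assumption of this sketch rather than a requirement stated here, and the readings are hypothetical.

```python
import statistics


def control_limits(baseline, warning_k=2.0, control_k=3.0):
    """Derive chart limits from baseline performance-check readings."""
    if len(baseline) < 2:
        raise ValueError("need at least two baseline readings")
    mean = statistics.mean(baseline)
    sd = statistics.stdev(baseline)  # sample standard deviation
    return {
        "mean": mean,
        "warning": (mean - warning_k * sd, mean + warning_k * sd),
        "control": (mean - control_k * sd, mean + control_k * sd),
    }


def classify(reading, limits):
    """Place one daily check reading relative to the chart limits."""
    low, high = limits["control"]
    if reading < low or reading > high:
        return "out of control"
    low, high = limits["warning"]
    if reading < low or reading > high:
        return "warning"
    return "in control"


# Hypothetical check-source readings (counts per minute) taken before the survey.
limits = control_limits([100.0, 102.0, 98.0, 101.0, 99.0])
for reading in (100.5, 104.0, 106.0):
    print(reading, classify(reading, limits))
```

The same functions can be applied to field background readings, where a sustained upward drift may indicate contamination of the instrument.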

There are also several types of samples used to estimate bias caused by contamination:

*Background Sample.* A background sample is a sample collected up-gradient of the area of potential contamination (either on-site or off-site) where there is little or no chance of migration of the contaminants of concern. Background samples are collected from the background reference area, determine the natural composition and variability of the soil (especially important in areas with high concentrations of naturally occurring radio-nuclides), and are considered “clean” samples. They provide a basis for comparison of contaminant concentration levels with samples collected from the survey unit when the statistical tests described in Section 3.9.2.10 are performed.

*Field Blanks.* Field blanks are samples prepared in the field using certified clean sand or soil and then submitted to the laboratory for analysis. A field blank is used to evaluate contamination error associated with sampling methodology and laboratory procedures. It also provides information about contaminants that may be introduced during sample collection, storage, and transport. Field blanks are recommended for determining bias resulting from contamination for a radiation survey or site investigation.

*Method Blank.* A method blank is an analytical control sample used to demonstrate that reported analytical results are not the result of laboratory contamination. It contains distilled or deionised water and reagents, and is carried through the entire analytical procedure (laboratory sample preparation, digestion, and analysis). The method blank is also referred to as a reagent blank. The method blank is generally used as an internal control tool by the laboratory because it is a non-blind sample.

Table 3.22 presents the minimum considerations, impacts if the considerations are not met, and corrective actions for bias.

| Minimum considerations for bias | Impact when minimum considerations are not met | Corrective action |
|---|---|---|
| Matrix spikes to assess bias of non-detects and positive sample results if specified in the survey design. Analytical spikes as specified in the survey design. Use analytical methods (routine methods whenever possible) that specify expected or required recovery ranges using spikes or other QC measures. No radio-nuclides of potential concern detected in the blanks. | Potential for incorrectly deciding a survey unit does meet the release criterion (Type I decision error): if spike recovery is low, it is probable that the method or analysis is biased low for that radionuclide and values of all related samples may underestimate the actual concentration. Potential for incorrectly deciding a survey unit does not meet the release criterion (Type II decision error): if spike recovery exceeds 100%, interferences may be present, and it is probable that the method or analysis is biased high. Analytical results overestimate the true concentration of the spiked radio-nuclide. | Consider re-sampling at affected locations. If recoveries are extremely low or extremely high, the investigator should consult with a radio-chemist or health physicist to identify a more appropriate method for reanalysis of the samples. |

Table 3.22 Minimum considerations for bias, impact if not met and corrective actions

#### 3.3.9.5 Accuracy

Accuracy is a measure of the closeness of an individual measurement or the average of a number of measurements to the true value. Accuracy includes a combination of random error (precision) and systematic error (bias) components that result from performing measurements. Systematic and random uncertainties (or errors) are discussed in more detail in Section 3.9.2.10.

Accuracy is determined by analyzing a reference material of known contaminant concentration or by reanalyzing material to which a known concentration of contaminant has been added. To be accurate, data must be both precise and unbiased. Using the analogy of archery, to be accurate one’s arrows must land close together and, on average, at the spot where they are aimed. That is, the arrows must all land near the bull’s eye (see Figure 3.4).

Accuracy is usually expressed either as a percent recovery or as a percent bias. Determination of accuracy always includes the effects of variability (precision); therefore, accuracy is used as a combination of bias and precision. The combination is known statistically as mean square error. Mean square error is the quantitative term for overall quality of individual measurements or estimators.

Mean square error is the sum of the variance plus the square of the bias. (The bias is squared to eliminate concern over whether the bias is positive or negative.) Frequently it is impossible to quantify all of the components of the mean square error – especially the biases – but it is important to attempt to quantify the magnitude of such potential biases, often by comparison with auxiliary data.
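The decomposition of mean square error into variance plus squared bias can be checked numerically. In this sketch the function name, the reference value, and the measurements are hypothetical; the population variance is used so that the identity holds exactly for the data at hand, which is a choice made for illustration.

```python
import statistics


def mean_square_error(measurements, true_value):
    """Return (mse, variance, bias) for repeated measurements of a known value."""
    mean = statistics.mean(measurements)
    bias = mean - true_value
    # Population variance (divide by n) makes mse == variance + bias**2 exact.
    variance = statistics.pvariance(measurements)
    mse = variance + bias ** 2
    return mse, variance, bias


# Hypothetical repeated analyses of a reference material with known value 10.
data = [9.0, 11.0, 10.0, 12.0]
mse, variance, bias = mean_square_error(data, true_value=10.0)

# Direct definition: the average squared deviation from the true value.
mse_direct = statistics.mean([(x - 10.0) ** 2 for x in data])
print(mse, variance, bias, mse_direct)
```

Both routes give the same number, which shows that a measurement system can fail an accuracy objective through poor precision, through bias, or through both.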

#### 3.3.9.6 Representativeness

Representativeness is a measure of the degree to which data accurately and precisely represent a characteristic of a population parameter at a sampling point or for a process condition or environmental condition. Representativeness is a qualitative term that should be evaluated to determine whether in-situ and other measurements are made and physical samples collected in such a manner that the resulting data appropriately reflect the media and contamination measured or studied.

Representativeness of data is critical to data usability assessments. The results of the environmental radiological survey will be biased to the degree that the data do not reflect the radio-nuclides and concentrations present at the site. Non-representative radionuclide identification may result in false negatives. Non-representative estimates of concentrations may be higher or lower than the true concentration. With few exceptions, non-representative measurements are only resolved by additional measurements. Sample collection and analysis is typically less representative of true radionuclide concentrations at a specific measurement location than performing a direct measurement. This is caused by the additional steps required in collecting and analyzing samples, such as sample collection, field sample preparation, laboratory sample preparation, and radiochemical analysis. However, direct measurement techniques with acceptable detection limits are not always available. When sampling is required as part of a survey design, it is critical that the sample collection procedures consider representativeness.

Representativeness is primarily a planning concern. The solution to enhancing representativeness is in the design of the survey plan. Representativeness is determined by examining the survey plan. Analytical data quality affects representativeness since data of low quality may be rejected for use.

Table 3.23 presents the minimum considerations, impacts if the considerations are not met, and corrective actions for representativeness.

| Minimum considerations for representativeness | Impact when minimum considerations are not met | Corrective action |
|---|---|---|
| Survey data representative of survey unit. Documented sample preparation procedures. Filtering, compositing, and sample preservation may affect representativeness. Documented analytical data as specified in the survey design. | Bias high or low in estimate of extent and quantity of contaminated material. Potential for incorrectly deciding a survey unit does meet the release criterion (Type I decision error). Inaccurate identification or estimate of concentration of a radio-nuclide. Remaining data may no longer sufficiently represent the site if a large portion of the data are rejected, or if all data from measurements at a specific location are rejected. | Additional surveying or sampling. Examination of effects of sample preparation procedures. Re-analysis of samples, or re-surveying or re-sampling of the affected site areas. If the re-surveying, re-sampling, or re-analyses cannot be performed, document in the site environmental radiological survey report what areas of the site are not represented due to poor quality of analytical data. |

Table 3.23 Minimum considerations for representativeness, impact if not met and corrective actions

#### 3.3.9.7 Comparability

Comparability is the qualitative term that expresses the confidence that two data sets can contribute to a common analysis and interpolation. Comparability should be carefully evaluated to establish whether two data sets can be considered equivalent in regard to the measurement of a specific variable or groups of variables.

Comparability is not compromised provided that the survey design is unbiased, and the survey design or analytical methods are not changed over time. Comparability is a very important qualitative data indicator for analytical assessment and is a critical parameter when considering the combination of data sets from different analyses for the same radio-nuclides. The assessment of data quality indicators determines if analytical results being reported are equivalent to data obtained from similar analyses. Only comparable data sets can be readily combined.

The use of routine methods (e.g., sampling, sample preparation and preservation, see Section 3.4) simplifies the determination of comparability because all laboratories use the same standardized procedures and reporting parameters. In other cases, the decision maker may have to consult with a health physicist and/or radio-chemist to evaluate whether different methods are sufficiently comparable to combine data sets.

There are a number of issues that can make two data sets comparable, and the presence of each of the following items enhances their comparability:

- Two data sets should contain the same set of variables of interest.
- Units in which these variables were measured should be convertible to a common metric.
- Similar analytic procedures and quality assurance should be used to collect data for both data sets.
- Time of measurements of certain characteristics (variables) should be similar for both data sets.
- Measuring devices used for both data sets should have approximately similar detection levels.
- Rules for excluding certain types of observations from both samples should be similar.
- Samples within data sets should be selected in a similar manner.
- Sampling frames from which the samples were selected should be similar.
- Number of observations in both data sets should be of the same order of magnitude.

These characteristics vary in importance depending on the final use of the data. The closer two data sets are with regard to these characteristics, the more appropriate it will be to compare them. Large differences between characteristics may be of only minor importance depending on the decision that is to be made from the data.

Table 3.24 presents the minimum considerations, impacts if they are not met, and corrective actions for comparability.

| Minimum considerations for comparability | Impact when minimum considerations are not met | Corrective action |
|---|---|---|
| Unbiased survey design or documented reasons for selecting another survey design. The analytical methods used should have common analytical parameters. Same units of measure used in reporting. Similar detection limits. Equivalent sample preparation techniques. Analytical equipment with similar efficiencies or the efficiencies should be factored into the results. | Non-additivity of survey results. Reduced confidence, power, and ability to detect differences, given the number of measurements available. Increased overall error. | For surveying and sampling: statistical analysis of effects of bias. For analytical data: preferentially use those data that provide the most definitive identification and quantification of the radio-nuclides of potential concern. For quantification, examine the precision and accuracy data along with the reported detection limits. Reanalysis using comparable methods. |

Table 3.24 Minimum considerations for comparability, impact if not met and corrective actions

#### 3.3.9.8 Completeness

Completeness is a measure of the amount of valid data obtained from the measurement system, expressed as a percentage of the number of valid measurements that should have been collected (i.e., measurements that were planned to be collected).

Completeness for measurements is calculated by the following formula:

% Completeness = { (Number of valid measurements) x 100 } / ( Total number of measurements planned )

Completeness is not intended to be a measure of representativeness; that is, it does not describe how closely the measured results reflect the actual concentration or distribution of the contaminant in the media being measured. A project could produce 100% data completeness (i.e., all planned measurements were actually performed and found valid), but the results may not be representative of the actual contaminant concentration.

Alternatively, there could be only 70% data completeness (30% of the planned measurements lost or found invalid), but, due to the nature of the survey design, the results could still be representative of the target population and yield valid estimates. The degree to which lack of completeness affects the outcome of the survey is a function of many variables ranging from deficiencies in the number of measurements to failure to analyze as many replications as deemed necessary by the QAPP and DQOs. The intensity of effect due to incompleteness of data is sometimes best expressed as a qualitative measure and not just as a quantitative percentage.
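
The completeness formula given above can be sketched in a few lines of Python. The function name and the survey numbers are hypothetical and serve only to illustrate the arithmetic.

```python
def percent_completeness(valid, planned):
    """Return % completeness = valid measurements x 100 / planned measurements."""
    if planned <= 0:
        raise ValueError("planned must be a positive number of measurements")
    if not 0 <= valid <= planned:
        raise ValueError("valid must lie between 0 and planned")
    return 100.0 * valid / planned


# Hypothetical survey: 100 measurements planned, 30 lost or found invalid
print(percent_completeness(valid=70, planned=100))  # 70.0
```

The percentage alone says nothing about which measurements were lost, so it should be read together with the qualitative assessment described in the text.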

Completeness can have an effect on the DQO parameters. Lack of completeness may require reconsideration of the limits for decision error rates because insufficient completeness will decrease the power of the statistical tests described in Section 3.9.2.9.

For most final status surveys, the issue of completeness only arises when the survey unit demonstrates compliance with the release criterion and less than 100% of the measurements are determined to be acceptable. The question now becomes whether the number of measurements is sufficient to support the decision to release the survey unit. This question can be answered by constructing a power curve as described in 0 and evaluating the results. An alternative method is to consider that the number of measurements estimated to demonstrate compliance in Section 3.5 was increased by 20% to account for lost or rejected data and uncertainty in the calculation of the number of measurements. This means a survey with 80% completeness may still have sufficient power to support a decision to release the survey unit.
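
The arithmetic behind the 20% allowance can be made explicit. In this sketch, $N$ is the number of measurements required by the statistical design and $c$ is the fraction of planned measurements that turn out to be valid; both symbols are introduced here for illustration, and rounding up to whole measurements is ignored.

```latex
N_{\text{planned}} = 1.2\,N, \qquad
N_{\text{valid}} = c \cdot 1.2\,N, \qquad
N_{\text{valid}} \ge N \iff c \ge \frac{1}{1.2} \approx 0.83
```

At $c = 0.80$ the valid count is $0.96\,N$, marginally below $N$. This is consistent with the statement that such a survey may still have sufficient power, but whether it does in a particular case is best confirmed with the power curve.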

Completeness is of greater concern for laboratory analyses than for direct measurements because the consequences of incomplete data often require the collection of additional samples. Direct measurements can usually be repeated fairly easily. The collection of additional samples generally requires a remobilization of sample collection personnel, which can be expensive. Conditions at the site may also have changed, making it difficult or impossible to collect representative and comparable samples without repeating the entire survey.

On the other hand, if it is simply an analytical problem and sufficient sample was originally collected, the analysis can be repeated using archived sample material. Samples collected on a grid to locate areas of elevated activity are also a concern for completeness. If one sample analysis is not valid, the entire survey design for locating areas of elevated activity may be invalidated.

Table 3.25 presents the minimum considerations, impacts if the considerations are not met, and corrective actions for completeness.

| Minimum considerations for completeness | Impact when minimum considerations are not met | Corrective action |
|---|---|---|
| Percentage of measurement completeness determined during planning to meet specified performance measures. | Higher potential for incorrectly deciding a survey unit does not meet the release criterion (Type II decision error). Reduction in power. A reduction in the number of measurements reduces site coverage and may affect representativeness. Reduced ability to differentiate site levels from background. Impact of incompleteness generally decreases as the number of measurements increases. | Resurveying, re-sampling, or reanalysis to fill data gaps. Additional analysis of samples already in laboratory. Determine whether the missing data are crucial to the survey. |

Table 3.25 Minimum considerations for completeness, impact if not met and corrective actions

#### 3.3.9.9 Other sources of uncertainty

Counting errors are often not the limiting factor in the repeatability or accuracy of results. Whenever samples are taken from a heterogeneous medium such as soil, there will usually be a large sample-to-sample variation. In general, the larger the sample size taken, the more statistically valid the result will be. Where gamma spectrometry is being undertaken, the use of a Marinelli beaker, which surrounds the sensitive volume of the detector, will give an optimum geometry in terms of sensitivity and in terms of maximizing the sample size. If this approach is taken, care should be taken that:

- True coincidence summing does not adversely affect the results at a significant level.
- The range of gamma rays in the sample medium is not much less than the thickness of the sample (otherwise, the detector will be sensitive to a much smaller volume of sample than might have been believed).

The latter effect will be adequately compensated if either:

- The calibration standard used is similar in density to the sample, or
- A detector efficiency program is applied to calculate the effect.

#### 3.3.9.10 Uncertainty introduced by the applied statistical method(s)

EURSSEM encourages the use of statistics to provide a quantitative estimate of the probability that the release criterion is not exceeded at a site. While it is unlikely that any site will be able to demonstrate compliance with a dose- or risk-based regulation without at least considering the use of statistics, EURSSEM recognizes that the use of statistical tests may not always provide the most effective method for demonstrating compliance.

For example, EURSSEM recommends a simple comparison to an investigation level to evaluate the presence of small areas of elevated activity in place of complicated statistical tests. At some sites, a simple comparison of each measurement result to the derived concentration guideline level (DCGL_{W}), to demonstrate that all the measurement results are below the release criterion, may be more effective than statistical tests for the overall demonstration of compliance with the regulation, provided an adequate number of measurements is performed.

EURSSEM recommends the use of *non-parametric statistical tests* for evaluating environmental data.

There are two reasons for this recommendation:

- Environmental data is usually not normally distributed.
- There are often a significant number of qualitative survey results (e.g., less than Minimum Detectable Concentration – MDC).

Either one of these conditions means that parametric statistical tests may not be appropriate. If one can demonstrate that the data follow the distribution assumed by a given parametric test, and that there are a sufficient number of results to support a decision concerning the survey unit, parametric tests will generally provide higher power (or require fewer measurements to support a decision concerning the survey unit). However, the tests needed to demonstrate that the data follow the assumed distribution generally require more measurements than the non-parametric tests.

The parameter of interest is the mean concentration in the survey unit. The non-parametric tests recommended in this manual, in their most general form, are tests of the median. If one assumes that the data are from a symmetric distribution – where the median and the mean are effectively equal – these are also tests of the mean.

If the assumption of symmetry is violated, then non-parametric tests of the median approximately test the mean. That is, the correct decision will be made about whether or not the mean concentration exceeds the derived concentration guideline level (DCGL), even when the data come from a skewed distribution. In this regard, the nonparametric tests are found to be correct more often than the commonly used Student’s t-test. The robust performance of the Sign and Wilcoxon Rank Sum (WRS) tests over a wide range of conditions is the reason that they are recommended in this manual.
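
To make the idea of a non-parametric test of the median concrete, the following is a minimal sketch of a one-sided exact Sign test. It is an illustration under stated assumptions, not the procedure prescribed by this manual: the null hypothesis is assumed to be that the median concentration is at or above the DCGL, and the function name, the data and the DCGL value are all hypothetical.

```python
from math import comb


def sign_test_p_value(measurements, dcgl):
    """One-sided exact Sign test.

    Assumed null hypothesis: the median concentration is at or above `dcgl`.
    Returns (s_plus, n, p_value), where s_plus counts results below the DCGL.
    """
    # Results exactly equal to the DCGL carry no sign information and are dropped
    diffs = [dcgl - x for x in measurements if x != dcgl]
    n = len(diffs)
    s_plus = sum(1 for d in diffs if d > 0)
    # Probability of s_plus or more positive signs under Binomial(n, 0.5)
    p_value = sum(comb(n, k) for k in range(s_plus, n + 1)) / 2 ** n
    return s_plus, n, p_value


# Hypothetical data: ten results, nine of them below a DCGL of 1.0
results = [0.2, 0.5, 0.3, 0.8, 0.6, 0.4, 0.7, 0.9, 1.3, 0.1]
s_plus, n, p = sign_test_p_value(results, dcgl=1.0)
print(s_plus, n, round(p, 4))  # 9 10 0.0107
```

Only the signs of the differences are used, which is why the test makes no assumption about the shape of the distribution and why results reported as "less than MDC" can still contribute as long as they are known to lie below the DCGL.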

There are a wide variety of statistical tests designed for use in specific situations. These tests may be preferable to the generic non-parametric statistical tests recommended in EURSSEM when the underlying assumptions for these tests can be verified.

When a given set of assumptions is true, a parametric test designed for exactly that set of conditions will have the highest power. For example, if the data are from a normal distribution, the Student’s t-test will have higher power than the non-parametric tests. It should be noted that for large enough sample sizes (e.g., large number of measurements), the Student’s t-test is not a great deal more powerful than the non-parametric tests. On the other hand, when the assumption of normality is violated, the non-parametric tests can be very much more powerful than the Student’s t-test. Therefore, any statistical test may be used provided that the data are consistent with the assumptions underlying their use. When these assumptions are violated, the prudent approach is to use the non-parametric tests which generally involve fewer assumptions than their parametric equivalents.

Table 3.26 lists several examples of statistical tests that may be considered for use at individual sites or survey units. A brief description of the tests and references for obtaining additional information on these tests are also listed in the table. Applying these tests may require consultation with a statistician.

| Alternate tests | Probability model assumed | Type of test | Advantages | Disadvantages |
|---|---|---|---|---|
| **Alternate 1-sample tests (no reference area measurements)** | | | | |
| Student’s t Test [EPA-1996b] | Normal | Parametric test for H_{o}: Mean < L | Appropriate if data appears to be normally distributed and symmetric. | Relies on a non-robust estimator for μ and σ. Sensitive to outliers and departures from normality. |
| t Test Applied to Logarithms [EPA-1996b] | Lognormal | Parametric test for H_{o}: Mean < L | This is a well-known and easy-to-apply test. Useful for a quick summary of the situation if the data is skewed to right. | Relies on a non-robust estimator for σ. Sensitive to outliers and departures from log-normality. |
| Minimum Variance Unbiased Estimator for Lognormal Mean [GILBERT] | Lognormal | Parametric estimates for mean and variance of lognormal distribution. | A good parametric test to use if the data is lognormal. | Inappropriate if the data is not lognormal. |
| Chen Test [JASA] | Skewed to right, including Lognormal | Parametric test for H_{o}: Mean > 0 | A good parametric test to use if the data is lognormal. | Applicable only for testing H_{o}: “survey unit is clean”. Survey unit must be significantly greater than 0 to fail. Inappropriate if the data is not skewed to the right. |
| Bayesian approaches [DEGROOT] | Varies, but a family of probability distributions must be selected. | Parametric test for H_{o}: Mean < L | Permits use of subjective “expert judgment” in interpretation of data. | Decisions based on expert judgment may be difficult to explain and defend. |
| Bootstrap [HALL] | No restriction | Non-parametric. Uses re-sampling methods to estimate sampling variance. | Avoids assumptions concerning the type of distribution. | Computer intensive analysis required. Accuracy of the results can be difficult to assess. |
| Lognormal Confidence Intervals using Bootstrap [ANGUS] | Lognormal | Uses re-sampling methods to estimate one-sided confidence interval for lognormal mean. | Non-parametric method applied within a parametric lognormal model. | Computer intensive analysis required. Accuracy of the results can be difficult to assess. |
| **Alternate 2-sample tests (reference area measurements are required)** | | | | |
| Student’s t Test [EPA-1996b] | Symmetric, normal | Parametric test for difference in means H_{o}: μ_{x} < μ_{y} | Easy to apply. Performance for non-normal data is acceptable. | Relies on a non-robust estimator for σ, therefore test results are sensitive to outliers. |
| Mann-Whitney Test [HOLLANDER] | No restrictions | Non-parametric test for difference in location H_{o}: μ_{x} < μ_{y} | Equivalent to the WRS test, but used less often. Similar to re-sampling, because test is based on set of all possible differences between the two data sets. | Assumes that the only difference between the test and reference areas is a shift in location. |
| Kolmogorov-Smirnov [HOLLANDER] | No restrictions | Non-parametric test for any difference between the 2 distributions | A robust test for equality of two sample distributions against all alternatives. | May reject because variance is high, although mean is in compliance. |
| Bayesian approaches [BOX] | Varies, but a family of probability distributions must be selected | Parametric tests for difference in means or difference in variance. | Permits use of “expert judgment” in the interpretation of data. | Decisions based on expert judgment may be difficult to explain and defend. |
| 2-Sample Quantile Test [EPA-1992] | No restrictions | Non-parametric test for difference in shape and location. | Will detect if survey unit distribution exceeds reference distribution in the upper quantiles. | Applicable only for testing H_{o}: “survey unit is clean”. Survey unit must be significantly greater than 0 to fail. |
| Simultaneous WRS and Quantile Test [EPA-1992] | No restrictions | Non-parametric test for difference in shape and location. | Additional level of protection provided by using two tests. Has advantages of both tests. | Cannot be combined with the WRS test that uses H_{o}: “survey unit is not clean”. Should only be combined with WRS test for H_{o}: “survey unit is clean”. |
| Bootstrap and other Re-sampling methods [HALL] | No restrictions | Non-parametric. Uses re-sampling methods to estimate sampling variance. | Avoids assumptions concerning the type of distribution. Generates informative re-sampling distributions for graphing. | Computer intensive analysis required. |
| **Alternate to statistical tests** | | | | |
| Decision Theory [DOE] | No restrictions | Incorporates loss function in the decision theory approach. | Combines elements of cost-benefit analysis and risk assessment into the planning process. | Limited experience in applying the method to compliance demonstration and decommissioning. Computer intensive analysis required. |

Table 3.26 Examples of alternate statistical tests
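
As an illustration of the re-sampling idea behind the bootstrap entries in Table 3.26, the sketch below estimates the standard error of the mean and a one-sided upper bound on the mean. The choice of the simple percentile method, the function name, the number of resamples and the data are all assumptions made for this example. It is not a substitute for the cited references or for consultation with a statistician.

```python
import random
from statistics import mean, stdev


def bootstrap_mean(data, confidence=0.95, n_resamples=2000, seed=0):
    """Return (standard error of the mean, percentile upper bound on the mean)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(data)
    # Draw n values with replacement and record the mean of each resample
    means = sorted(mean(rng.choices(data, k=n)) for _ in range(n_resamples))
    std_error = stdev(means)
    # Percentile method: read the upper bound off the sorted resample means
    upper = means[min(n_resamples - 1, int(confidence * n_resamples))]
    return std_error, upper


# Hypothetical, right-skewed survey results in arbitrary concentration units
results = [0.4, 0.7, 0.5, 1.9, 0.6, 0.8, 0.3, 1.1]
std_error, upper = bootstrap_mean(results)
print(f"standard error ~ {std_error:.2f}, 95% upper bound ~ {upper:.2f}")
```

No distribution is assumed, which matches the advantage listed in the table. The listed disadvantage also applies: with only a handful of measurements, the accuracy of the resulting bound is difficult to assess.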

#### 3.3.9.11 Uncertainty in data interpretation

It should be recognized that there will always be an element of uncertainty in the interpretation of site characterization data. This needs to be acknowledged in the reporting and quantified where possible. The significance of the uncertainty and methods of reducing it should also be explained to stakeholders. There are three aspects to site characterization data uncertainty:

*Conceptual model uncertainty.* The initial conceptual model of the site will have formed the basis for identification of potential pollutant linkages and for the design of the survey. The site characterisation will have focused on reducing those uncertainties in the preliminary conceptual model that are of greatest significance to possible adverse impacts on receptors. Nevertheless, some residual uncertainty will remain at the end of the site characterisation process. For example, there may be uncertainty regarding the presence of preferential flowpaths at the site (perhaps associated with sub-surface services or made ground). Areas of remaining uncertainty should be identified for phased investigation or other potential uncertainty reducing measures such as increased numbers of samples, real-time data collection to identify target areas or use of the Triad approach. The greater the natural or inherent variation in residual radioactivity, the greater the uncertainty associated with a decision based on the survey results.

*Data uncertainty.* Only a very small fraction of the site will have been directly sampled. It is important to evaluate the extent to which data obtained are representative of the site. Key issues to consider for the acquired data are set out in Table 3.27. The unanswered question is: “How well do the survey results represent the true level of residual radioactivity in the survey unit?”

*Measurement errors.* These create uncertainty by masking the true level of residual radioactivity and may be classified as random or systematic errors. Random errors affect the precision of the measurement system, and show up as variations among repeated measurements. Systematic errors show up as measurements that are biased to give results that are consistently higher or lower than the true value.

| Area of uncertainty | Potential solutions |
|---|---|
| Sample heterogeneity (sub-sampling errors: see Section 3.4). | Ensure representative sample mixing, splitting etc. |
| Spatial variability of the parameter being measured. | Optimised contaminated land investigation approach (design). |
| Systematic measurement biases (gross alpha/beta analysis of soil: see Section 3.4 and Appendix D). | Use laboratory practices to reduce uncertainty. |

Table 3.27 Key issues in data uncertainty

#### 3.3.9.12 Number of quality control measurements

The number of QC measurements is determined by the available resources and the degree to which one needs assurance that a measurement process is adequately controlled. The process is simplified, for example, when the scope of a survey is narrowed to a single method, one sampling crew, and a single laboratory to analyze field samples. Increasing the number of samples and scheduling sample collections and analyses over time or at different laboratories increases the level of difficulty and necessitates increasing the number of QC measurements. The number of QC measurements may also be driven upward as the action level approaches a given instrument’s detection limit. This number is determined on a case-by-case basis, where the specific contaminant and instruments are assessed for detecting a particular radionuclide.

A widely used standard practice is to collect a set percentage, such as 5%, of samples for QA purposes [EPA-1987]. However, this practice has disadvantages. For example, it provides no real assessment of the uncertainties for a relatively small sample size. For surveys where the required number of measurements increases, there may be a point beyond which there is little added value in performing additional QC measurements. Aside from cost, determining the appropriate number of QC measurements essentially depends on site-specific factors. For example, soil may present a complex and variable matrix requiring many more QC measurements for surface soils than for building surfaces.

A performance based alternative to a set percentage or rule of thumb can be implemented [EPA-1990]. First, potential sources of error or uncertainty, the likelihood of occurrence, and the consequences in the context of the DQOs should be determined. Then, the appropriate type and number of QC measurements based on the potential error or uncertainty are determined. For example, field replicate samples (i.e., a single sample that is collected, homogenized, and split into equivalent fractions in the field) are used to estimate the combined contribution of several sources of variation. Hence, the number of field replicate samples to be obtained in the study should be dictated by how precise the estimate of the total measurement should be.

Factors influencing this estimate include:

- The number of measurements;
- The number and experience of personnel involved;
- The current and historical performance of sampling and analytical procedures used;
- The variability of survey unit and background reference area radioactivity measurement systems used;
- The number of laboratories used;
- The level of radioactivity in the survey unit (which for a final status survey should be low);
- How close an action level (e.g., DCGL) is to a detection limit (which may represent a greater concern after reducing or removing radionuclide concentrations by remediation).

| Degrees of freedom ^{1} | 90% confidence | 95% confidence | 97.5% confidence | 99% confidence |
|---|---|---|---|---|
| 2 | 9.49 | 19.49 | 39.21 | 99.50 |
| 5 | 3.10 | 4.34 | 6.02 | 9.02 |
| 10 | 2.05 | 2.54 | 3.08 | 3.91 |
| 15 | 1.76 | 2.07 | 2.40 | 2.87 |
| 20 | 1.61 | 1.84 | 2.08 | 2.42 |
| 25 | 1.52 | 1.71 | 1.91 | 2.17 |
| 30 | 1.46 | 1.62 | 1.78 | 2.01 |
| 40 | 1.38 | 1.51 | 1.64 | 1.80 |
| 50 | 1.33 | 1.44 | 1.61 | 1.68 |
| 100 | 1.21 | 1.28 | 1.35 | 1.43 |

Table 3.28 Upper confidence limits for the true variance as a function of the number of QC measurements used to determine the estimated variance [EPA-1990].

^{1} To obtain the necessary number of quality control measurements, add one to the degrees of freedom.

The precision of an estimate of the “true” variance for precision or bias within a survey design depends on the number of degrees of freedom used to provide the estimate. Table 3.28 provides the one-sided upper confidence limits for selected degrees of freedom assuming the results of the measurements are normally distributed. Confidence limits are provided for 90, 95, 97.5, and 99 percent confidence levels. At the stated level of confidence, the “true” variance of the estimate of precision or bias for a specified number of QC measurements will be between zero and the multiple of the estimated variance listed in Table 3.28. For example, for five degrees of freedom one would be 90% confident that the true variance for precision falls between zero and 3.10 times the estimated variance. The number of QC measurements is equal to one greater than the degrees of freedom.

When planning surveys, the number of each type of QC measurement can be obtained from Table 3.28. For example, if the survey objective is to estimate the variance in the bias for a specific measurement system between zero and two times the estimated variance at a 95% confidence level, 15 degrees of freedom or 16 measurements of a material with known concentration (e.g., performance evaluation samples) would be indicated. EURSSEM recommends that the survey objective be set such that the true variance falls between zero and two times the estimated variance. The level of confidence is then determined on a site-specific basis to adjust the number of each type of QC measurement to the appropriate level (i.e., 11, 16, 21 or 31 measurements). The results of the QC measurements are evaluated during the assessment phase of the data life cycle (see Section 3.10.8 and Section 2.13).
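
The planning lookup described above can be sketched as follows. The multipliers are transcribed from Table 3.28; the function name and the tolerance of 2.1 are assumptions made here, the latter chosen so that tabulated values of 2.01 to 2.08 count as "two times the estimated variance", as the text treats them.

```python
# Multipliers transcribed from Table 3.28:
# {degrees of freedom: {confidence level in %: multiple of the estimated variance}}
UCL_MULTIPLIER = {
    2: {90: 9.49, 95: 19.49, 97.5: 39.21, 99: 99.50},
    5: {90: 3.10, 95: 4.34, 97.5: 6.02, 99: 9.02},
    10: {90: 2.05, 95: 2.54, 97.5: 3.08, 99: 3.91},
    15: {90: 1.76, 95: 2.07, 97.5: 2.40, 99: 2.87},
    20: {90: 1.61, 95: 1.84, 97.5: 2.08, 99: 2.42},
    25: {90: 1.52, 95: 1.71, 97.5: 1.91, 99: 2.17},
    30: {90: 1.46, 95: 1.62, 97.5: 1.78, 99: 2.01},
    40: {90: 1.38, 95: 1.51, 97.5: 1.64, 99: 1.80},
    50: {90: 1.33, 95: 1.44, 97.5: 1.61, 99: 1.68},
    100: {90: 1.21, 95: 1.28, 97.5: 1.35, 99: 1.43},
}


def qc_measurements_needed(confidence, max_multiplier=2.1):
    """Smallest tabulated number of QC measurements meeting the objective.

    Returns None if no tabulated row satisfies `max_multiplier`.
    """
    for dof in sorted(UCL_MULTIPLIER):
        if UCL_MULTIPLIER[dof][confidence] <= max_multiplier:
            return dof + 1  # number of measurements = degrees of freedom + 1
    return None


for level in (90, 95, 97.5, 99):
    print(level, qc_measurements_needed(level))
# 90 11
# 95 16
# 97.5 21
# 99 31
```

Because only the tabulated degrees of freedom are searched, the result is the smallest tabulated value rather than the exact minimum; intermediate values would require the underlying chi-square calculation.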

Example 3.13: A contaminated site with ^{60}Co and consisting of four Class 1 interior survey units

A site is contaminated with ^{60}Co and consists of four Class 1 interior survey units, nine Class 2 interior survey units, two Class 3 interior survey units, and one Class 3 exterior survey unit. Three different measurement systems are specified in the survey design for performing scanning surveys, one measurement system is specified for performing direct measurements for interior survey units, and one measurement system is specified for measuring samples collected from the exterior survey unit.

Repeated measurements are used to estimate precision. For scan surveys there is not a specified number of measurements. 10% of the scans in each Class 1 survey unit were repeated as replicates to measure operator precision (see Section 3.3.9.2) within 24 hours of the original scan survey. 5% of each Class 2 and Class 3 survey unit were similarly repeated as replicates to measure operator precision. The results of the repeated scans were evaluated based on professional judgment.

For direct measurements and sample collection activities, a 95% confidence level was selected as consistent with the objectives of the survey. Using Table 3.28, it was determined that 16 repeated measurements were required for both the direct measurement technique and the sample collection and laboratory measurement technique. Because 72 direct measurements would be performed in Class 1 survey units, 99 in Class 2 survey units, and 20 in Class 3 survey units, it was anticipated that at least 16 direct measurements would have sufficient activity above background to perform repeated measurements and obtain usable results (see Section 3.5 for guidance on determining the number of measurements). The 16 direct measurement locations to be repeated would be selected based on the results of the direct measurements and would represent the entire usable range of activity found in the survey units rather than measuring the 16 locations with the highest activities. (The usable range of activity includes the highest measurement result in the survey unit and the lowest measurement result with an acceptable measurement uncertainty compared to the desired level of precision.) The repeated measurements would be performed by different operators using the same equipment, but they would not know the results of the original survey. To ensure that the measurements would be valid, the QC measurements to check for contamination would be performed at the same time.

Because the laboratory’s QA program called for periodic checks on the precision of the laboratory instruments, the total survey design precision for laboratory measurements was measured. Because the only samples collected would come from a Class 3 area, the sample activities were expected to be close to or below the measurement system MDC. This meant that field replicate samples would not provide any usable information. Also, QC samples for bias were repeated to obtain a usable estimate of precision for the survey design.

Measurements of materials with known concentrations above background (e.g., performance evaluation samples) and known concentrations at or below background (e.g., field blanks) are used to estimate bias. For scan surveys, the repeated scanning performed to estimate precision would also serve as a check for contamination using blanks. Because there was no appropriate material of known concentration on which to perform bias measurements, the calibration checks were used to demonstrate that the instruments were reading properly during the surveys. A control chart was developed using the instrument response for an uncalibrated check source. Measurements were obtained using a specified source-detector alignment that could be easily repeated. Measurements were obtained at several times during the day over a period of several weeks prior to taking the instruments into the field. Calibration checks were performed before and after each survey period in the field, and the results were immediately plotted on the control chart to determine if the instrument was performing properly. This method was also adopted for the direct measurement system.

Twenty samples were required by the survey design for the Class 3 exterior survey unit. To ensure that the samples were truly blind for the laboratory, samples three times the requested volume were collected. These samples were sent to a second laboratory for preparation. Each sample was weighed, dried, and reweighed to determine the moisture content. Then each sample was ground to a uniform particle size of 1 mm (approximately 16 mesh) and divided into three aliquots of equal size. For each sample, one aliquot was packaged for transport to the laboratory performing the analysis. After these samples were packaged, 16 of the samples had both of the remaining aliquots spiked with the same level of activity using a source solution traceable to the National Institute of Standards and Technology (NIST). The 16 samples each had a different level of activity within a range that was accepted by the laboratory performing the analysis. These 32 spiked samples were also packaged for transport to the laboratory. In addition, 16 samples of a soil similar to the soil at the site were prepared as blanks to check against contamination. The 20 samples, 32 spikes, and 16 blanks were transported to the laboratory performing the analyses in a single shipment so that all samples were indistinguishable from each other except by the sample identification.
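The check-source control chart described above can be illustrated with a short sketch. It assumes the common convention of warning limits at two standard deviations and action limits at three standard deviations around the baseline mean; the text does not specify which limits were used, so these values, the function names, and the baseline readings are all hypothetical.

```python
from statistics import mean, stdev


def control_limits(baseline):
    """Derive chart limits from baseline check-source readings.

    Assumption: warning limits at +/- 2 sigma and action limits at
    +/- 3 sigma, a common convention not stated in the source text.
    """
    center = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation
    return {
        "center": center,
        "warning": (center - 2 * sigma, center + 2 * sigma),
        "action": (center - 3 * sigma, center + 3 * sigma),
    }


def classify_check(reading, limits):
    """Classify one pre- or post-survey check against the chart limits."""
    action_low, action_high = limits["action"]
    warning_low, warning_high = limits["warning"]
    if reading < action_low or reading > action_high:
        return "out of control"
    if reading < warning_low or reading > warning_high:
        return "warning"
    return "in control"


# Hypothetical baseline readings (counts per minute) collected with a
# fixed source-detector alignment before the instrument goes to the field.
baseline_cpm = [1000, 1010, 990, 1005, 995, 1002, 998, 1008, 992, 1000]
limits = control_limits(baseline_cpm)

# Hypothetical checks made before and after a survey period.
for reading in (1003, 1015, 1025):
    print(reading, classify_check(reading, limits))
```

A reading flagged as "out of control" would typically prompt an investigation of the instrument before the associated survey data are accepted; the exact response is a project decision that belongs in the QAPP.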

#### 3.3.9.13 Controlling sources of error

During the performance of a survey, it is important to identify sources of error and uncertainty early in the process so that problems can be resolved. The timing of the QC measurements within the survey design can be very important. In order to identify problems as early as possible, it may be necessary to perform a significant number of QC measurements early in the survey. This can be especially important for surveys utilizing an innovative or untested survey design. Survey designs that have been used previously and produced reliable results may be able to space the QC measurements evenly throughout the survey, or even wait to have samples analyzed at the end of the survey, as long as the objectives of the survey are achieved.

For example, a survey design requires a new scanning method to be used for several survey units when few performance data are available for this technique. To ensure that the technique is working properly, the first few survey units are re-scanned to provide an initial estimate of the precision and bias. After the initial performance of the technique has been verified, a small percentage of the remaining survey units is re-scanned to demonstrate that the technique is operating properly for the duration of the survey.
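The front-loaded re-scan strategy in this example can be sketched as a simple selection routine. The number of initial units (three) and the later fraction (10%) are hypothetical stand-ins for "the first few" and "a small percentage"; the function name and survey unit identifiers are likewise illustrative only.

```python
import random


def select_rescans(unit_ids, initial_count=3, later_fraction=0.1, seed=0):
    """Choose survey units for QC re-scanning.

    All of the first `initial_count` units are re-scanned, followed by a
    random sample of the remaining units. Both parameters are assumed
    values, not figures taken from the guidance.
    """
    initial = list(unit_ids[:initial_count])
    remaining = list(unit_ids[initial_count:])
    if not remaining:
        return initial
    n_later = max(1, round(len(remaining) * later_fraction))
    rng = random.Random(seed)  # fixed seed so the plan is reproducible
    chosen = set(rng.sample(remaining, n_later))
    # Keep the selected units in survey order.
    later = [unit for unit in remaining if unit in chosen]
    return initial + later


# Hypothetical list of 20 survey units in planned survey order.
units = [f"SU-{i:02d}" for i in range(1, 21)]
rescan_plan = select_rescans(units)
print(rescan_plan)
```

Selecting the later units at random, instead of at fixed intervals, is one way to avoid operators anticipating which units will be checked; a fixed interval would be an equally valid choice if the survey plan calls for it.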

Identifying sources of error and uncertainty is only the first step. Once the sources of uncertainty have been identified, they should be minimized and controlled for the rest of the survey. Section 3.10.8 discusses the assessment of survey data and provides guidance on corrective actions that may be appropriate for controlling sources of error or uncertainty after they have been identified.

– by Rafael Garcia-Bermejo Fernandez about 6 years ago