Allocation concealment
When neither the participant nor the person assigning the participant to a study arm knows which study arm the participant will be randomized into until the moment of randomization. Any baseline assessments should take place either before randomization or be conducted by a staff member who does not know the treatment group to which the participant has been assigned. Lack of allocation concealment can introduce selection bias.
Autoregressive Integrated Moving Average (ARIMA)
A data analysis technique appropriate for time series data, where autocorrelation (successive or seasonal observations that are serially dependent) is likely. This approach is particularly appropriate for identifying significant shifts in data associated with policy or other population-level interventions, independent of the observed regularities in the history of the dependent variable. The ARIMA model describes the stochastic autocorrelation structure of the data series and, in effect, filters out any variance in a dependent variable that is predictable on the basis of the past history of that variable.
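Since the original contains no code, here is a minimal pure-Python sketch (function name my own) of the lag-1 autocorrelation that makes ordinary regression inappropriate for such series; a real interrupted time-series analysis would use a statistics package's ARIMA routines rather than this illustration.

```python
# Lag-1 autocorrelation: the correlation between each observation and
# the one immediately before it. Strong serial dependence of this kind
# is what ARIMA models account for.

def lag1_autocorrelation(series):
    n = len(series)
    mean = sum(series) / n
    numerator = sum(
        (series[t] - mean) * (series[t - 1] - mean) for t in range(1, n)
    )
    denominator = sum((x - mean) ** 2 for x in series)
    return numerator / denominator

# A steadily trending series is positively autocorrelated ...
print(lag1_autocorrelation([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 0.7
# ... while a strictly alternating series is negatively autocorrelated.
print(lag1_autocorrelation([1, 5, 1, 5, 1, 5, 1, 5]))  # -0.875
```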
Baseline (in single subject research designs)
The period of time in which the target behavior (dependent variable) is observed and recorded as it occurs without a special or new intervention. Can also refer to a period of time following a treatment in which conditions match what was present in the original baseline.
Baseline (in controlled trials)
The initial assessment, which takes place before the intervention has been implemented.
Bias
Influences on a study that can lead to invalid conclusions about a treatment or intervention. Bias in research can make a treatment look better or worse than it really is. Bias can even make it look as if the treatment works when it actually doesn't. Bias can occur by chance or as a result of systematic errors in the design and execution of a study. Bias can occur at different stages in the research process, e.g. in the collection, analysis, interpretation, publication or review of research data. Some commonly referred to types of bias are:
Non-response bias. When people who do not participate in a study, or who do not complete follow-up assessments, are systematically different from those who do.
Performance bias. Systematic differences in care provided apart from the intervention being evaluated. For example, if study participants know they are in the control group they may be more likely to use other forms of care; people who know they are in the experimental group may experience placebo effects, and care providers may treat patients differently according to what group they are in. Masking (blinding) of both the recipients and providers of care is used to protect against performance bias.
Publication bias. Studies with statistically significant results are more likely to be published than those with non-significant results. Meta-analyses that are based exclusively on published literature may therefore produce biased results. This type of bias can be assessed with a funnel plot or L'Abbé plot.
Recall bias. When the study groups are systematically different in their ability to recall events that are key to assessing the effect of the intervention.
Reporting bias. When studies selectively report only those outcomes that are statistically significant.
Selection bias. Selection bias has occurred if:
- the characteristics of the sample differ from those of the wider population from which the sample has been drawn, OR
- there are systematic differences between comparison groups of patients in a study in terms of prognosis or responsiveness to treatment.
Blinding (a.k.a., Masking)
The practice of keeping the research staff or participants of a study ignorant of the group to which a participant has been assigned. For example, a clinical trial in which the participating patients or their doctors are unaware of whether they (the patients) are taking the experimental drug or a placebo (dummy treatment). The purpose of 'blinding' or 'masking' is to protect against bias. Unless a placebo medication is involved, it is usually impossible to blind the interventionist and participants to the group assignment of the participants, but it is critically important that staff conducting the assessments be blind to the group assignment.
Celeration line approach
Also called the split-middle method of trend estimation. The procedure is designed to identify the trend of the data. A trend line is computed using data in the baseline phase and is then extended into the treatment phase to evaluate the effect of the intervention on the subject's performance. The proportion of data points above and below the trend line is compared between the baseline and treatment phases. If the treatment has no effect, the proportions of data points above and below the line should be equivalent in the baseline and treatment phases.
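The procedure above can be sketched in Python as follows. This is a simplification (helper names are my own): the full split-middle method also includes an adjustment step so that exactly half of the baseline points fall on each side of the line.

```python
# Simplified split-middle (celeration line) sketch: draw the trend line
# through the medians of the first and second halves of the baseline,
# extend it into the treatment phase, and count points above it.
from statistics import median

def celeration_line(baseline):
    """Return (slope, intercept) of the split-middle trend line."""
    n = len(baseline)
    first, second = baseline[: n // 2], baseline[(n + 1) // 2 :]
    # Median time point and median value within each half.
    x1, y1 = (len(first) - 1) / 2, median(first)
    x2 = (n + 1) // 2 + (len(second) - 1) / 2
    y2 = median(second)
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1

def proportion_above(slope, intercept, phase, start_index):
    """Share of a phase's points lying above the extended trend line."""
    above = sum(
        1 for i, y in enumerate(phase, start=start_index)
        if y > slope * i + intercept
    )
    return above / len(phase)

baseline = [3, 4, 3, 5, 4, 5]     # slight upward baseline trend
treatment = [7, 8, 8, 9, 10, 9]   # clear jump after the intervention
slope, intercept = celeration_line(baseline)
print(proportion_above(slope, intercept, treatment, start_index=len(baseline)))
```

With no treatment effect the printed proportion would stay near 0.5; here most treatment points lie above the extended baseline trend.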
Confounding factor
Something that influences a study and can contribute to misleading findings if it is not understood or appropriately dealt with. For example, if a group of people exercising regularly and a group of people who do not exercise have an important age difference then any difference found in outcomes about heart disease could well be due to one group being older than the other rather than due to the exercising. Age is the confounding factor here and the effect of exercising on heart disease cannot be assessed without adjusting for age differences in some way.
Construct validity
The degree to which the observed pattern of how our treatment and assessments work corresponds to our theory of how they should work. The researcher has a theory of how the measures relate to one another and how the treatment works and relates to the outcome. When how things work in reality - as captured by the study's assessments - matches up with how we theorized they should work, that provides evidence of construct validity.
C-statistic
A statistical test for change in time-series data that can be applied to a data series with as few as eight observations. The C-statistic produces a z value, which is interpreted using the normal probability table for z scores. The C-statistic is first calculated for the baseline data. If the baseline data do not contain a significant trend, the baseline and intervention data are combined and the C-statistic is computed again to determine whether a statistically significant change has occurred.
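One commonly cited formulation (a sketch, not taken from the original) defines C from the squared successive differences relative to the total variability, with standard error sqrt((n-2)/((n-1)(n+1))):

```python
# C-statistic sketch:
#   C  = 1 - sum((x[i] - x[i+1])^2) / (2 * sum((x[i] - mean)^2))
#   SE = sqrt((n - 2) / ((n - 1) * (n + 1)));   z = C / SE
from math import sqrt

def c_statistic(series):
    """Return (C, z) under one common formulation of the C-statistic."""
    n = len(series)
    mean = sum(series) / n
    successive = sum((series[i] - series[i + 1]) ** 2 for i in range(n - 1))
    total = sum((x - mean) ** 2 for x in series)
    c = 1 - successive / (2 * total)
    standard_error = sqrt((n - 2) / ((n - 1) * (n + 1)))
    return c, c / standard_error

# A flat baseline yields a non-significant z ...
_, z = c_statistic([4, 5, 4, 6, 5, 4, 5, 5])
print(z < 1.64)   # True: no significant baseline trend
# ... while a steadily rising series yields z > 1.64 (p < .05, one-tailed).
_, z = c_statistic([1, 2, 3, 4, 5, 6, 7, 8])
print(z > 1.64)   # True: significant trend
```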
Effect size
The magnitude of a treatment effect, independent of sample size. The effect size can be measured as either: a) the standardized difference between the treatment and control group means, or b) the correlation between the treatment group assignment (independent variable) and the outcome (dependent variable).
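Option (a), the standardized mean difference (Cohen's d), can be computed as a short sketch (the data and function name here are hypothetical, not from the original):

```python
# Cohen's d: difference between group means divided by the pooled SD.
from math import sqrt

def cohens_d(treatment, control):
    """Standardized difference between two group means (pooled SD)."""
    n1, n2 = len(treatment), len(control)
    m1, m2 = sum(treatment) / n1, sum(control) / n2
    ss1 = sum((x - m1) ** 2 for x in treatment)
    ss2 = sum((x - m2) ** 2 for x in control)
    pooled_sd = sqrt((ss1 + ss2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd

# Hypothetical outcome scores for two small groups:
print(round(cohens_d([10, 12, 11, 13, 14], [8, 9, 10, 9, 9]), 2))  # 2.45
```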
Exclusion criteria
Specified characteristics that would prevent a potential participant from being included in the study.
External validity
The degree to which the results of a study hold true in non-study situations, e.g. in routine clinical practice. May also be referred to as the generalizability of study results to non-study patients or populations.
Fidelity (of intervention)
The degree to which the intervention was delivered as planned and was differentiated from the control condition as planned, at both conceptual and pragmatic levels. Conceptually, the issue is whether the intervention, but not the control condition, captured the theoretical constructs that the researcher believes produce its positive effect. Pragmatically, the issues are whether the interventionists followed the treatment plan by delivering the intended intervention elements and by not delivering any elements that were proscribed.
Forest plot
A graphical display of results from individual studies on a common scale, allowing visual comparison of results and examination of the degree of heterogeneity between studies.
Funnel plot
Funnel plots are simple scatter plots on a graph. They show the treatment effects estimated from separate studies on the horizontal axis against a measure of sample size on the vertical axis. Publication bias may lead to asymmetry in funnel plots.
Inclusion criteria
Specified characteristics that would enable a potential participant to be included in the study.
Internal validity
Refers to the integrity of the study design. Experimental studies with a high degree of internal validity give a high degree of confidence that differences seen between the groups are due to the intervention.
Interrater reliability
The degree to which two or more different raters give the same results to the same rating opportunity. This measures consistency between raters.
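For categorical ratings, consistency between raters is often quantified with Cohen's kappa, which corrects raw agreement for the agreement expected by chance. A sketch (data and function name are my own illustration):

```python
# Cohen's kappa: (observed agreement - chance agreement) / (1 - chance).
def cohens_kappa(rater1, rater2):
    """Chance-corrected agreement between two raters on the same items."""
    n = len(rater1)
    categories = set(rater1) | set(rater2)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    expected = sum(
        (rater1.count(c) / n) * (rater2.count(c) / n) for c in categories
    )
    return (observed - expected) / (1 - expected)

# Two raters classify eight observations as "yes" or "no":
r1 = ["yes", "yes", "no", "yes", "no", "no", "yes", "no"]
r2 = ["yes", "no", "no", "yes", "no", "yes", "yes", "no"]
print(cohens_kappa(r1, r2))  # 0.5: moderate agreement beyond chance
```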
Intrarater reliability
The degree to which a single rater is consistent in their assessment results.
Power analysis
An analysis that allows the researcher to estimate how many participants would be needed to achieve statistical significance for a given expected effect size. Power is the ability of a study to demonstrate an association or causal relationship between two variables, given that an association exists. For example, 80% power in a clinical trial means that the study has an 80% chance of ending up with a p value of less than 5% in a statistical test (i.e. a statistically significant treatment effect) if there really was an important difference (e.g. 10% versus 5% mortality) between treatments. If the statistical power of a study is low, the study results will be questionable (the study might have been too small to detect any differences). By convention, 80% is an acceptable level of power.
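A hedged sketch of such a calculation for comparing two group means, using the usual normal-approximation formula (not taken from the original; t-based software will give slightly larger answers):

```python
# Sample size per group for a two-sample comparison of means:
#   n = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2
# where d is the standardized effect size (Cohen's d).
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)

# A medium effect (d = 0.5) at 80% power, two-sided alpha = 0.05:
print(n_per_group(0.5))  # 63 per group under the normal approximation
```

Note how strongly the required n depends on the expected effect size: halving d roughly quadruples the sample needed.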
Reliability
Reliability refers to a method of measurement that consistently gives the same results. For example, someone who has a high score on one occasion tends to have a high score if measured on another occasion very soon afterwards. If different clinicians make independent assessments in quick succession, and their assessments tend to agree, then the method of assessment is said to be reliable (see also Interrater reliability and Intrarater reliability).
Replication
When a study is repeated in a different sample. Replications sometimes involve minor changes that add new information in addition to confirming results of the previous study.
Sequence generation
Creating a master list that assigns participants to a study arm. A random number table or computer-generated random numbers provide the basis for adequate sequence generation.
Statistical validity (or statistical conclusion validity)
The degree to which a statistical result reflects real differences rather than chance or random error. This has to do with the proper use and interpretation of statistical tests.
Two-Standard Deviation Band Method
Also called the Shewhart chart method. First, the standard deviation is computed for the baseline data. Bands are then drawn on the graph containing scores within two standard deviations of the baseline mean. The treatment effect is considered significant if at least two consecutive data points lie outside the bands.
Validity
There are a number of types of validity, all of which describe the degree to which we can "believe" the results of a study, either for the specific sample of participants, or for how the results apply to other samples. Commonly referenced types of validity are internal, external, statistical, and construct.
I² (I-squared) statistic
A measure of the degree of inconsistency in study results, indicating the percentage of total variation across studies that is due to genuine differences between the studies rather than chance; calculated as 100% × (Q - df) / Q, where Q is Cochran's Q and df is the degrees of freedom.
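The formula is direct to compute; a small sketch (example Q value is hypothetical, and the result is conventionally truncated at zero when Q < df):

```python
# I-squared: percentage of total variation across studies that reflects
# genuine heterogeneity rather than chance.
def i_squared(q, df):
    """100% * (Q - df) / Q, truncated at zero when Q < df."""
    return max(0.0, 100 * (q - df) / q)

# Ten studies (df = 10 - 1 = 9) with Cochran's Q = 20:
print(i_squared(20, 9))  # 55.0, often read as moderate heterogeneity
```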