December 23, 2002
Contact: Policy Bureau Enquiries
Our file number: 02-122028-691
The Health Canada Toxicological Evaluation guidances (revised 1996) are being withdrawn following an internal review by a Safety Expert Working Group which concluded that they no longer reflected current toxicological methodologies. Furthermore, the review revealed substantial areas of overlap and inconsistency between these guidances and their more recently adopted ICH counterparts.
The following Health Canada-adopted ICH Safety (Nonclinical) guidances, previously available as part of the Toxicological Evaluation guidances, are being re-issued as stand-alone documents:
These ICH guidances have been developed by the appropriate ICH Expert Working Group and have been subject to consultation by the regulatory parties, in accordance with the ICH Process. The ICH Steering Committee has endorsed the final draft and recommended its adoption by the regulatory bodies of the European Union, Japan and USA.
In adopting these ICH guidances, Health Canada, as an observer to ICH, endorses the principles and practices described therein. These documents should be read in conjunction with this covering notice and with the relevant sections of other applicable Health Canada guidances.
These and other guidance documents are currently available on the Therapeutic Products Directorate / Biologics and Genetic Therapies Directorate Website(s) (http://www.hc-sc.gc.ca/hpfb-dgpsa/tpd-dpt/). The availability of printed copies of guidance documents may be confirmed by consulting the Guidelines and Publications Order Forms (available on the TPD/BGTD Website) or by contacting the Publications Coordinator.
Should you have any questions regarding the content of the guidance, please contact:
Colette F. Strnad, B. Sc., Ph.D.
Title: Senior Scientific Advisor
Office of Science
Therapeutic Products Directorate
Holland Cross, Tower B, 2nd Floor,
A.L. 3102C3 1600 Scott Street
Ottawa, Ontario K1A 1B6
telephone: (613) 941-3693
fax: (613) 941-5035
email: colette_strnad@hc-sc.gc.ca
Published by authority of the Minister of Health
1996
Health Products and Food Branch Guidance Document
© Minister of Public Works and Government Services Canada 1996
Available in Canada through
Health Canada - Publications
Brooke Claxton Building,
A.L. #0913A Tunney's Pasture
Ottawa, Ontario K1A 0K9
Tel: (613) 954-5995
Fax: (613) 941-5366
This guidance has been developed by the appropriate ICH Expert Working Group and has been subject to consultation by the regulatory parties, in accordance with the ICH Process. The ICH Steering Committee has endorsed the final draft and recommended its adoption by the regulatory bodies of the European Union, Japan and USA.
In adopting this ICH guidance, Health Canada endorses the principles and practices described therein. This document should be read in conjunction with the accompanying notice and the relevant sections of other applicable guidances.
Guidance documents are meant to provide assistance to industry and health care professionals on how to comply with the policies and governing statutes and regulations. They also serve to provide review and compliance guidance to staff, thereby ensuring that mandates are implemented in a fair, consistent and effective manner.
Guidance documents are administrative instruments not having force of law and, as such, allow for flexibility in approach. Alternate approaches to the principles and practices described in this document may be acceptable provided they are supported by adequate scientific justification. Alternate approaches should be discussed in advance with the relevant program area to avoid the possible finding that applicable statutory or regulatory requirements have not been met.
As a corollary to the above, it is equally important to note that Health Canada reserves the right to request information or material, or define conditions not specifically described in this guidance, in order to allow the Department to adequately assess the safety, efficacy or quality of a therapeutic product. Health Canada is committed to ensuring that such requests are justifiable and that decisions are clearly documented.
There is considerable overlap in the methodology that could be used to test chemicals and medicinal products for potential reproductive toxicity. As a first step towards using this wider methodology for efficient testing, this guidance document attempts to consolidate a strategy based on study designs currently in use for testing medicinal products; it should encourage full assessment of the safety of chemicals with respect to the development of the offspring. Tests in which animals are treated during defined stages of reproduction are perceived to better reflect human exposure to medicinal products and to allow more specific identification of stages at risk. While this approach may be useful for most medicines, long-term exposure to low doses does occur and may be better represented by a one- or two-generation study approach.
The actual testing strategy should be determined by:
To employ this concept successfully, flexibility is needed (Note 1). No guidance can provide sufficient information to cover all possible cases. All persons involved should be willing to discuss and consider variations in test strategy according to the state of the art and ethical standards in human and animal experimentation.
Note 1 (1.1) Scientific Flexibility
These guidances are not mandatory rules; they are a starting point rather than an end point. They provide a basis from which an investigator can devise a strategy for testing according to available knowledge of the test material and the "state of the art." For encouragement, some alternative test designs have been mentioned in this document but there are others that can be sought out or devised. In devising a strategy, the primary objective should be to detect and bring to light any indication of toxicity to reproduction.
Fine details of study design and technical procedures have been omitted from the text. Such decisions rightly belong to the investigator, since a technique that is suitable for one laboratory may not be suitable for another. The investigator needs to use the available staff and resources to achieve the best possible result, and should know how to do this better than any outsider; human attributes of attitude, ability, and consistency are more important than material facilities. For compliance with GLP, reference is made to the applicable regulations.
The aim of reproduction toxicity studies is to reveal any effect of one or more active substance(s) on mammalian reproduction. For this purpose, both the investigations and the interpretation of the results should be related to all other pharmacological and toxicological data available to determine whether potential reproductive risks to humans are greater than, less than, or equal to those posed by other toxicological manifestations. Further, repeated dose toxicity studies can provide important information regarding potential effects on reproduction, particularly male fertility. To extrapolate the results to humans (assess the relevance), data on likely human exposures, comparative kinetics, and mechanisms of reproductive toxicity may be helpful.
The combination of studies selected should allow exposure of mature adults and all stages of development from conception to sexual maturity. To allow detection of immediate and latent effects of exposure, observations should be continued through one complete life cycle (i.e., from conception in one generation through conception in the following generation). For convenience of testing, this integrated sequence can be subdivided into the following stages.
For timing conventions, see Note 2.
Note 2 (1.2) Timing Conventions
In this guidance document, the convention for timing of pregnancy is to refer to the day that a sperm-positive vaginal smear and/or plug is observed as day 0 of pregnancy even if mating occurs overnight. Unless shown otherwise, it is assumed that, for rats, mice, and rabbits, implantation occurs on day 6 to 7 of pregnancy, and closure of the hard palate on day 15 to 18 of pregnancy.
Other conventions are equally acceptable but MUST be defined in reports. Also, the investigator must be consistent in different studies to assure that no gaps in treatment occur. It is an advisable precaution to provide an overlap of at least one day in the exposure period of related studies.
The accuracy of the time of mating should be specified since this will affect the variability of fetal and neonatal parameters.
Similarly, for reared litters, the day offspring are born will be considered as post-natal or lactation day 0, unless otherwise specified. However, particularly with regard to delays in, or prolongation of, parturition, reference to a postcoital time frame may be useful.
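The day-0 convention and the recommended overlap of at least one day between related studies can be checked mechanically. The following is a minimal Python sketch; the function names and the two dosing windows are hypothetical illustrations, not values prescribed by this guidance.

```python
from datetime import date

def gestation_day(day0: date, on: date) -> int:
    """Gestation day, counting the day a sperm-positive smear or plug is found as day 0."""
    return (on - day0).days

def overlap_days(window_a: tuple, window_b: tuple) -> int:
    """Days dosed in both windows; each window is an inclusive (first_day, last_day) pair."""
    first = max(window_a[0], window_b[0])
    last = min(window_a[1], window_b[1])
    return max(0, last - first + 1)

# Hypothetical dosing windows, in gestation days, for two related studies.
fertility_window = (0, 7)       # assumed: females dosed until implantation (day 6 to 7)
embryo_fetal_window = (6, 17)   # assumed: implantation to hard-palate closure

print(gestation_day(date(2002, 12, 2), date(2002, 12, 8)))  # 6
gap_free = overlap_days(fertility_window, embryo_fetal_window) >= 1
print(overlap_days(fertility_window, embryo_fetal_window), gap_free)  # 2 True
```

A report using another convention (for example, calling the day the plug is found day 1) would shift all of these numbers consistently, and that convention would need to be stated.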
The guidance addresses the design of studies primarily for detection of effects on reproduction. When an effect is detected, further studies to characterise fully the nature of the response have to be designed on a case-by-case basis (Note 3). The rationale for the set of studies chosen should be given and should include an explanation for the choice of dosages.
Studies should be planned according to the "state of the art," and take into account pre-existing knowledge of class-related effects on reproduction. They should avoid suffering and should use the minimum number of animals necessary to achieve the overall objectives. If a preliminary study is performed, the results should be considered and discussed in the overall evaluation (Note 4).
Note 3 (1.3) First Pass and Secondary Testing
To a greater or lesser degree, all first pass (guideline) tests are apical in nature (i.e., an effect on one endpoint may have several different origins). A reduced litter size at birth may be due to a reduced ovulation rate (corpora lutea count), a higher rate of pre-implantation deaths, a higher rate of post-implantation deaths, or immediate post-natal deaths. In turn, these deaths may be the consequence of an earlier physical malformation that can no longer be observed due to subsequent secondary changes, and so on. Particularly for effects with a low natural frequency among controls, discrimination between treatment-induced and coincidental occurrence depends upon association with other types of effects.
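The decomposition described above can be expressed as simple per-litter ratios. This Python sketch, offered as an illustration only, computes pre-implantation loss from corpora lutea and implantation counts and post-implantation loss from implantation and live-fetus counts; the class name and example counts are hypothetical, and these commonly used ratios are not formulas defined by this guidance.

```python
from dataclasses import dataclass

@dataclass
class LitterCounts:
    corpora_lutea: int   # proxy for the number of ova shed (ovulation rate)
    implantations: int   # implantation sites found in the uterus
    live_fetuses: int    # live fetuses at terminal sacrifice

    def pre_implantation_loss_pct(self) -> float:
        """Percentage of ova shed that did not implant."""
        if self.corpora_lutea <= 0:
            raise ValueError("no corpora lutea: pre-implantation loss is undefined")
        return 100.0 * (self.corpora_lutea - self.implantations) / self.corpora_lutea

    def post_implantation_loss_pct(self) -> float:
        """Percentage of implantations that did not yield a live fetus."""
        if self.implantations <= 0:
            raise ValueError("no implantations: post-implantation loss is undefined")
        return 100.0 * (self.implantations - self.live_fetuses) / self.implantations

# Hypothetical counts for a single litter.
litter = LitterCounts(corpora_lutea=14, implantations=12, live_fetuses=9)
print(round(litter.pre_implantation_loss_pct(), 1))   # 14.3
print(round(litter.post_implantation_loss_pct(), 1))  # 25.0
```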
A toxicant usually induces more than one type of effect in a dose-dependent manner. For example, induction of malformation is almost invariably associated with increased embryonic death and an increased incidence of less severe structural changes. Given an effect on one endpoint, secondary investigations for possible associations should be considered (i.e., the nature, scope, and origins of the substance's toxicity should be characterized). Characterization should also include identification of dose-response relationships to facilitate risk assessment; this differs from the situation in first pass tests, where the presence or absence of a dose-response relationship assists discrimination between treatment-related and coincidental differences.
Note 4 (1.3) Preliminary Studies
At the time most reproduction studies are planned or initiated, information is usually available from acute toxicity studies and from repeated dose toxicity studies of at least one month's duration. This information can be expected to be sufficient to identify doses for reproductive studies. If adequate preliminary studies are performed, they form part of the justification for the choice of dose in the main study. In principle, such studies should be submitted regardless of their GLP status. This may avoid unnecessary use of animals.
The animals used must be well defined with respect to their health, fertility, fecundity, prevalence of abnormalities, embryofetal deaths and the consistency they display from study to study. Within and between studies, animals should be of comparable age, weight, and parity at the start; the easiest way to fulfil these criteria is to use animals that are young, mature adults at the time of mating, with the females being virgin.
Studies should be conducted in mammalian species. It is generally desirable to use the same species and strain as in other toxicological studies. Reasons for using rats as the predominant rodent species are practicality, comparability with other results obtained in this species, and the large amount of background knowledge accumulated.
In embryotoxicity studies only, a second mammalian species traditionally has been required, the rabbit being the preferred choice as a "non-rodent." Reasons for using rabbits in embryotoxicity studies include the extensive background knowledge that has accumulated, as well as availability and practicality. Where the rabbit is unsuitable, an alternative non-rodent or a second rodent species may be acceptable and should be considered on a case-by-case basis (Note 5).
Note 5 (2.1) Selection of Species and Strains
In choosing an animal species and strain for reproductive toxicity testing, care should be given to select a relevant model. Selection of the species and strain used in other toxicology studies may avoid the need for additional preliminary studies. If it can be shown - by means of kinetic, pharmacological, and toxicological data - that the species selected is a relevant model for the human, a single species can be sufficient. There is little value in using a second species if it does not show the same similarities to humans. Advantages and disadvantages of species (strains) should be considered in relation to the substance to be tested, the selected study design, and in the subsequent interpretation of the results.
All species have their advantages. Rats, and to a lesser extent mice, are good general-purpose models. The rabbit has been somewhat neglected as a "non-rodent" species for repeated dose toxicity studies and for reproduction studies other than embryotoxicity testing, although it has attributes that would make it a useful model for fertility studies, especially male fertility. For both rabbits and dogs (which are often used as a second species for chronic toxicity studies), semen samples for longitudinal semen analysis can be obtained without resorting to painful techniques such as electroejaculation. Most other species are not good general-purpose models and are probably best used only for very specific investigations.
All species have their disadvantages, for example:
Other test systems are considered to be any developing mammalian and non-mammalian cell systems, tissues, organs, or organism cultures developing independently in vitro or in vivo. Integrated with whole animal studies either for priority selection within homologous series or as secondary investigations to elucidate mechanisms of action, these systems can provide invaluable information and, indirectly, reduce the numbers of animals used in experimentation. However, they lack the complexity of the developmental processes and the dynamic interchange between the maternal and the developing organisms. These systems cannot provide assurance of the absence of effect, nor provide perspective with respect to risk or exposure. In short, there are no alternative test systems to whole animals currently available for reproduction toxicity testing with the aims set out in the introduction (Note 6).
Note 6 (2.2) Uses of Other Test Systems than Whole Animals
Other test systems have been developed and used in preliminary investigations ("pre-screening" or priority selection) and secondary testing.
For preliminary investigation of a range of analogue series of substances, it is essential that the potential outcome in whole animals is known for at least one member of the series to be studied (by inference, effects are expected). With this strategy, substances can be selected for higher-level testing.
For secondary testing or further substance characterization, other test systems offer the possibility to study some of the observable developmental processes in detail (e.g., to reveal specific mechanisms of toxicity, to establish concentration-response relationships, to select "sensitive periods", or to detect effects of defined metabolites).
Selection of dosages is one of the most critical issues in design of the reproductive toxicity study. The choice of the high dose should be based on data from all available studies (pharmacology, acute and chronic toxicity, and kinetic studies) (Note 7). A repeated dose toxicity study of about 2 to 4 weeks duration provides a close approximation to the duration of treatment in segmental designs of reproductive studies. When sufficient information is not available, preliminary studies are advisable (see Note 4).
Having determined the high dosage, lower dosages should be selected in a descending sequence, the intervals depending on kinetic and other toxicity factors. While it is desirable to be able to determine a "no observed adverse effect level," priority should be given to setting dosage intervals close enough to reveal any dosage related trends that may be present (Note 8).
Note 7 (3.1) Selection of Dosages
Using similar doses in the reproductive toxicity studies as in the repeated dose toxicity studies will allow interpretation of any potential effects on fertility in context with general systemic toxicity.
Some minimal toxicity is expected to be induced in the high-dose dams.
According to the specific compound, factors limiting the high dosage determined from repeat dose toxicity studies or from preliminary reproduction studies could include:
Note 8 (3.1) Determination of Dose-Response Relationships
For many of the variables in reproduction studies, the power to discriminate between random variation and treatment effect is poor and the presence or absence of a dosage-related trend can be a critical means of determining the probability of a treatment effect. It has to be kept in mind that in these studies, dose responses may be steep, and wide intervals between doses would be inadvisable. If an analysis of dose-response relationships for the effects observed is attempted in a single study, it is recommended to use at least three dose levels and appropriate control groups. If in doubt, a fourth dose group should be added to avoid excessive dosage intervals. Such a strategy should provide a "no observed adverse effect level" for reproductive aspects. If not, the implication is that the test substance merits a greater depth of investigation and further studies.
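As an illustration of the spacing considerations above, the following Python sketch places three test doses at a constant ratio below a chosen high dose and adds a fourth group when adjacent doses would be too far apart. The geometric spacing and the maximum ratio of 4 are assumptions made for the example; the guidance leaves interval selection to kinetic and other toxicity factors.

```python
def geometric_doses(high: float, low: float, n_levels: int) -> list:
    """n_levels doses spaced by a constant ratio from high down to low (inclusive)."""
    if n_levels < 2 or not 0 < low < high:
        raise ValueError("need n_levels >= 2 and 0 < low < high")
    ratio = (high / low) ** (1 / (n_levels - 1))
    return [high / ratio ** i for i in range(n_levels)]

def dose_groups(high: float, low: float, max_ratio: float = 4.0) -> list:
    """Control (0) plus three test doses, or four if adjacent doses would differ by more than max_ratio."""
    doses = geometric_doses(high, low, 3)
    if doses[0] / doses[1] > max_ratio:
        doses = geometric_doses(high, low, 4)  # add a fourth group to narrow the intervals
    return [0.0] + sorted(doses)  # 0.0 stands for the vehicle control group

print(dose_groups(400, 25))         # [0.0, 25.0, 100.0, 400.0]
print(len(dose_groups(1000, 10)))   # 5
```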
In general the route or routes of administration should be similar to those intended for human usage. One route of substance administration may be acceptable if it can be shown that a similar distribution (kinetic profile) results from different routes (Note 9).
The usual frequency of administration is once daily but consideration should be given to use either more frequent or less frequent administration taking kinetic variables into account (Note 10).
Note 9 (3.2) Exposure by Different Routes of Administration
If it can be shown that one route provides a greater body burden (e.g., area under the curve (AUC)), there seems little reason to investigate routes that would provide a lesser body burden or that present severe practical difficulties (e.g., inhalation). Before designing studies for a new route of administration, existing kinetic data should be used to determine whether another study is necessary.
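Body burden comparisons of this kind usually rest on AUC. A minimal sketch, assuming sampled plasma concentrations are available for each route, estimates AUC over the sampling interval with the linear trapezoidal rule (no extrapolation beyond the last sample); the concentrations below are hypothetical.

```python
def auc_trapezoid(times_h: list, conc: list) -> float:
    """Area under the concentration-time curve by the linear trapezoidal rule."""
    if len(times_h) != len(conc) or len(times_h) < 2:
        raise ValueError("need matching time and concentration lists with at least two points")
    return sum(
        (t1 - t0) * (c0 + c1) / 2
        for t0, t1, c0, c1 in zip(times_h, times_h[1:], conc, conc[1:])
    )

# Hypothetical plasma concentrations after the same dose by two routes.
times_h = [0, 1, 2, 4, 8, 24]
route_a = [0.0, 4.0, 6.0, 5.0, 3.0, 0.5]
route_b = [0.0, 2.0, 3.0, 3.0, 2.0, 0.5]

auc_a, auc_b = auc_trapezoid(times_h, route_a), auc_trapezoid(times_h, route_b)
print(auc_a, auc_b)  # 62.0 39.5
```

Under these assumed data, route A gives the greater body burden, which is the kind of evidence the paragraph above suggests could make further studies by route B unnecessary.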
It is preferable to have some information on kinetics before initiating reproduction studies, since this may suggest the need to adjust choice of species, study design, and dosing schedules. At this time the information need not be sophisticated nor derived from pregnant or lactating animals.
At the time of study evaluation, further information on kinetics in pregnant or lactating animals may be required according to the results obtained (Note 10).
Note 10 (3.3) Kinetics in Pregnant Animals
Kinetic investigations in pregnant and lactating animals may pose some problems due to the rapid changes in physiology. It is best to consider this as a two- or three-phase approach. In planning studies, kinetic data (often from non-pregnant animals) provide information on the general suitability of the species, and can assist in deciding study designs and choice of dosage. During a study, kinetic investigations can provide assurance of accurate dosing or indicate marked deviations from expected patterns.
It is recommended that control animals be dosed with the vehicle at the same rate as test-group animals. When the vehicle may cause effects or affect the action of the test substance, a second (sham or untreated) control group should be considered.
All available pharmacological, kinetic, and toxicological data for the test compound and similar substances should be considered in deciding the most appropriate strategy and choice of study design. It is anticipated that, initially, preference will be given to designs that do not differ too radically from those of established guidances for medicinal products (the most probable option). For most medicinal products, the three-study design will usually be adequate. Other strategies, combinations of studies, and study designs could be as valid as, or more valid than, the "most probable option" according to circumstances. The key factor is that, in total, they leave no gaps between stages and allow direct or indirect evaluation of all stages of the reproductive process (Note 11).
Designs should be justified.
Note 11 (4) Examples of Choosing Other Options
For compounds causing no lethality at 2 g/kg and no evidence of repeated dose toxicity at 1 g/kg, conduct of a single two-generation study with one control and two test groups (0.5 and 1.0 g/kg) would seem sufficient. However, it might pose the question as to whether the correct species had been chosen or whether the compound was an effective medicine.
For compounds that may be given as a single dose once-in-a-lifetime (e.g., diagnostics, medicines used in operations) it may be impossible to administer repeated dosages more than twice the human therapeutic dosage for any length of time. A reduced period of treatment allowing a higher dose would seem more appropriate. For females, considerations of human exposure suggest little or no need for exposures beyond the embryonic period.
For dopamine agonists or compounds reducing circulating prolactin levels, female rats are poor models; the rabbit would probably make a better choice for all the reproductive toxicity studies, but it does not appear to have been attempted. This also applies to other types of compound when the rabbit shows a pattern of metabolism considerably closer to humans than the rat.
For drugs where alterations in plasma kinetics are seen following repeated administration, the potential for adverse effects on embryo-fetal development may not be fully evaluated in studies conducted according to Section 4.1.3. In such cases, it may be desirable to extend the period of drug administration to females in a Section 4.1.1 study to day 17. With sacrifice at term, both fertility and embryo-fetal development can be assessed.
The "most probable option" can be equated to a combination of studies for effects on:
Sperm analysis can be used as an optional procedure for confirmation or better characterisation of the effects observed (Note 12).
Note 12 (4.1.1) Premating Treatment
The design of the fertility study, especially the reduction in the premating period for males, is based on evidence accumulated and on re-appraisal of the basic research on the process of spermatogenesis. Compounds inducing selective effects on male reproduction are rare; compounds affecting spermatogenesis almost invariably affect post-meiotic stages and testis weight; and mating with females is an insensitive means of detecting effects on spermatogenesis. Histopathology of the testis has been shown to be the most sensitive method for the detection of effects on spermatogenesis. Good pathological and histopathological examination of the male reproductive organs (e.g., by employing Bouin's fixation, paraffin embedding, transverse sections of 2-4 microns for testes, longitudinal sections for epididymides, and PAS and haematoxylin staining) provides a direct means of detection. Sperm analysis (sperm counts, sperm motility, sperm morphology) can be used as an optional method to confirm findings by other methods and to characterise effects further. Sperm analysis data are considered more relevant for fertility assessment when samples from the vas deferens or the cauda epididymis are used. Information on potential effects on spermatogenesis (and on female reproductive organs) can be derived from repeated dose toxicity studies or reproductive toxicity studies.
For detection of effects not detectable by histopathology of male reproductive organs and sperm analysis, mating with females after a premating treatment of 4 weeks has been shown to be at least as efficient as mating after a longer duration of treatment. Two weeks may be acceptable in some cases; however, when a 2-week treatment period is selected, more convincing justification should be provided. When the available evidence suggests that the scope of investigations in the fertility study should be increased, appropriate studies should be designed to characterise the effects further.
Note 13 (4.1.1, 4.1.2, 4.1.3) Number of Animals
There is very little scientific basis for the group sizes specified in past and existing guidances, or in this one. The numbers specified are educated guesses governed by the maximum study size that can be managed without undue loss of overall study control. This is indicated by the fact that the more expensive an animal is to obtain or keep, the smaller the proposed group size. Ideally, at least the same group size should be required for all species, and there is a case for using larger group sizes for less frequently used species such as primates.
It should also be made clear that the number required depends on whether or not the group is expected to demonstrate an effect. For a high-frequency effect, few animals are required. To presume the absence of an effect, the number required varies according to the variable (endpoint) being considered and its prevalence in control populations (rare or categorical events) or its dispersion around the central tendency (continuous or semicontinuous variables). (See also Note 23.)
For all but the rarest events (such as malformations, abortions, and total litter loss), evaluation of 16 to 20 litters for rodents and rabbits tends to provide a degree of consistency between studies. Below 16 litters per evaluation, between-study results become inconsistent. Above 20 to 24 litters per group, consistency and precision are not greatly enhanced. These numbers relate to evaluation. If groups are subdivided for different evaluations, the number of animals starting the study should be doubled. Similarly, in studies with two breeding generations, 16 to 20 litters would be required for the final evaluation of the litters of the F1 generation; to allow for natural wastage, the starting group size of the F0 generation must be larger.
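The arithmetic above can be captured in a small helper. This Python sketch is hypothetical: the function name is illustrative, and the expected pregnancy rate used to allow for wastage is an assumption, since the guidance gives no figure for it.

```python
import math

MIN_LITTERS = 16  # below about 16 litters per evaluation, between-study results become inconsistent

def starting_group_size(litters_per_evaluation: int, n_evaluations: int = 1,
                        expected_pregnancy_rate: float = 1.0) -> int:
    """Females to start per group so every planned evaluation retains enough litters."""
    if litters_per_evaluation < MIN_LITTERS:
        raise ValueError("fewer than 16 litters per evaluation")
    if not 0 < expected_pregnancy_rate <= 1:
        raise ValueError("expected_pregnancy_rate must be in (0, 1]")
    # Each subdivision of the group needs its own set of litters.
    litters_needed = litters_per_evaluation * n_evaluations
    # Assumption: inflate the start size for females that fail to produce a litter.
    return math.ceil(litters_needed / expected_pregnancy_rate)

print(starting_group_size(20))                                 # 20
print(starting_group_size(20, n_evaluations=2))                # 40
print(starting_group_size(20, expected_pregnancy_rate=0.875))  # 23
```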
Note 14 (4.1.1) Mating
Note 15 (4.1.1) Terminal Sacrifice
Note 16 (4.1.1, 4.1.2, 4.1.3) Observations
Daily weighing of pregnant females during treatment can provide useful information. Weighing an animal more frequently than twice weekly during periods other than pregnancy (premating, mating, lactation) may also be advisable for some compounds.
For apparently non-pregnant rats or mice (but not rabbits), ammonium sulphide staining of the uterus might be useful to identify peri-implantation death of embryos.
Note 17 (4.1.2) Treatment of Offspring
Consequent to derivation from existing guidances for medicines, this guidance document does not fully cover exposures from weaning through puberty, nor does it deal with the possibility of reduced reproductive life span.
To detect adverse effects for medicinal products that may be used in infants and juveniles, special studies (case-by-case designs) involving direct treatment of offspring, at ages to be specified, should be considered.
Note 18 (4.1.2) Separate Embryotoxicity and Peri-/Post-natal Studies
If a pre- and post-natal study is separated into two studies, one covering the embryonic period and the other covering the fetal period, parturition, and lactation, post-natal evaluation of offspring is required in both studies.
Note 19 (4.1.2) F1-Animals
The guidance suggests selection of one male and one female per litter, on the evidence that it is feasible to conduct behavioural and other functional tests on the same F1 individuals that will be used for assessment of reproductive function. This has the advantage of allowing cross-referencing of performance in different tests at the individual level. It is recognized, however, that some laboratories prefer to select separate sets of animals for behaviour testing and for assessment of reproductive function. Whichever is the most suitable for an individual laboratory will depend upon the combination of tests used and the resources available.
Note 20 (4.1.2) Reduction of Litter Size
The value of culling or not culling for detection of effects on reproduction is still under discussion. Whether or not culling is performed, it should be explained by the investigator.
Note 21 (4.1.2) Physical Development, Sensory Functions, Reflexes, and Behaviour
The best indicator of physical development is bodyweight. Achievement of preweaning landmarks of development such as pinna unfolding, coat growth, incisor eruption, etc., is highly correlated with pup bodyweight. This weight is better related to postcoital time than post-natal time, at least when significant differences in gestation length occur. Reflexes, surface righting, auditory startle, air righting, and response to light are also dependent on physical development.
Two post-weaning landmarks of development that are advised are vaginal opening of females and cleavage of the balanopreputial gland of males. The latter is associated with increasing testosterone levels, whereas testis descent is not. These landmarks indicate the onset of sexual maturity and it is advised that bodyweight be recorded at the time of attainment to determine whether any differences from control are specific or related to general growth.
Note 22 (4.1.3) Individual Identification and Evaluation of Fetuses
It must be possible to relate all findings by different techniques (i.e., body weight, external inspection, visceral and/or skeletal examinations) to a single specimen in order to detect patterns of abnormalities. The examination of mid- and low-dose fetuses for visceral and/or skeletal abnormalities may not be necessary where the evaluation of the high dose and the control groups did not reveal any relevant differences. It is advisable, however, to store the fixed specimens for possible later examination. If fresh dissection techniques are normally used, difficulties with later comparisons involving fixed fetuses should be anticipated.
If the dosing periods of the fertility study and the pre- and post-natal study are combined into a single investigation, this comprises evaluation of stages A to F of the reproductive process (see Section 1.2). If such a study includes fetal examinations and provides clearly negative results at sufficiently high exposure, no further reproduction studies in rodents should be required. Fetal examinations for structural abnormalities can also be supplemented with an embryo-fetal development study (or studies) to make a two-study approach (Notes 3, 11).
Results from a study for effects on embryo-fetal development in a second species are expected (see also Section 4.1.3).
The simplest two-segment design would consist of the fertility study and the pre- and post-natal development study, if it includes fetal examinations. It can be assumed, however, that if the pre- and post-natal development study provided no indication of prenatal effects at adequate margins above human exposure, the additional fetal examinations (as made in Section 4.1.3) are most unlikely to provide a major change in the assessment of risk.
Alternatively, female treatment in the fertility study (Section 4.1.1) could be continued until closure of the hard palate, and fetuses examined according to the procedures of the embryo-fetal development study (in Section 4.1.3). This, combined with the pre- and post-natal study (Section 4.1.2), would provide all the examinations required in "the most probable option" but would use considerably fewer animals (Notes 3, 11).
Results from a study for effects on embryo-fetal development in a second species are expected (see also Section 4.1.3).
Analysis of the statistics of a study is the means by which results are interpreted. The most important part of this analysis is to establish the relationship between the different variables and their distribution (descriptive statistics), since these determine how groups should be compared. The distributions of the endpoints observed in reproductive tests are usually non-normal, and extend from almost continuous to the extreme categorical.
When employing inferential statistics (determination of statistical significance) the mating pair or litter, not the fetus or neonate, should be used as the basic unit of comparison. The tests used should be justified (Note 23).
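To illustrate why the litter rather than the fetus is the unit of comparison, the following Python sketch first collapses fetus-level findings into one proportion per litter and then, as one possible approach (not one prescribed here), computes a percentile bootstrap confidence interval for the difference in mean litter proportions by resampling litters. The data are hypothetical and far smaller than a real study group.

```python
import random
from collections import defaultdict
from statistics import mean

def litter_proportions(fetuses):
    """Collapse fetus-level records (litter_id, affected) to one proportion per litter."""
    counts = defaultdict(lambda: [0, 0])  # litter_id -> [affected, examined]
    for litter_id, affected in fetuses:
        counts[litter_id][0] += int(affected)
        counts[litter_id][1] += 1
    return [affected / examined for affected, examined in counts.values()]

def bootstrap_diff_ci(treated, control, n_boot=2000, alpha=0.05, seed=1):
    """Percentile bootstrap CI for the difference in mean litter proportions.

    Litters, not fetuses, are resampled, so each litter stays one unit of comparison.
    """
    rng = random.Random(seed)
    diffs = sorted(
        mean(rng.choices(treated, k=len(treated))) - mean(rng.choices(control, k=len(control)))
        for _ in range(n_boot)
    )
    return diffs[int(n_boot * alpha / 2)], diffs[int(n_boot * (1 - alpha / 2)) - 1]

# Hypothetical, deliberately tiny data set (a real group would have 16 to 20 litters).
control = [("C1", False)] * 12 + [("C2", False)] * 11 + [("C2", True)] + [("C3", False)] * 13
treated = [("T1", False)] * 9 + [("T1", True)] * 3 + [("T2", False)] * 10 + [("T3", False)] * 8 + [("T3", True)] * 2

c_props, t_props = litter_proportions(control), litter_proportions(treated)
print(len(control), "fetuses but", len(c_props), "control litters")  # 37 fetuses but 3 control litters
print(t_props)  # [0.25, 0.0, 0.2]
print(bootstrap_diff_ci(t_props, c_props))
```

Treating the 37 control fetuses as independent observations would overstate the effective sample size; in this sketch each group contributes only three units.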
Note 23 (5) Inferential Statistics
"Significance" tests (inferential statistics) can be used only as a support for the interpretation of results. The interpretation itself must be based on biological plausibility. It is unwise to assume that a difference from control values is not biologically relevant simply because it is not "statistically significant." To a lesser extent, it can be unwise to assume that a "statistically significant" difference must be biologically relevant. Particularly for low-frequency events (e.g., embryonic death, malformations) with one-sided distributions, the statistical power of studies is low. Confidence intervals for relevant quantities can indicate the likely size of the effect. When using statistical procedures, the experimental unit of comparison should be considered: the litter, not the individual conceptus; the mating pair, when both sexes are treated; and the mating pair of the parent generation in a two-generation study.
The key to good reporting is the tabulation of individual values in a clear, concise manner to account for every animal that was entered into the study. A reader should be able to follow the history of any individual animal from initiation to termination and should be able to deduce with ease the contribution that the individual has made to any group summary values. Group summary values should be presented in a form that is biologically plausible (i.e., avoid false precision) and that reflects the distribution of the variable. Appendices or tabulations of individual values such as bodyweight, food consumption, litter values should be concise and, as far as possible, consist of absolute rather than calculated values; unnecessary duplication should be avoided.
For tabulation of low-frequency observations such as clinical signs, autopsy findings, abnormalities, etc., it is advisable to group together the (few) individuals with a positive recording. Especially in the presentation of data on structural changes (fetal abnormalities), the primary listing (tabulation) should clearly identify the litters containing abnormal fetuses, identify the affected fetus in the litter, and report all the changes observed in the affected fetuses. Secondary listings by type of change can be derived from this, if necessary.
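The primary and secondary listings described above can be sketched as two groupings of the same records. In this Python example, the record layout, group labels, and findings are hypothetical; only affected fetuses appear, each with all of its recorded changes, and the listing by type of change is derived from the primary one.

```python
from collections import defaultdict

# Hypothetical fetal findings: (dose group, litter id, fetus number, finding).
findings = [
    ("High", "H07", 3, "cleft palate"),
    ("High", "H07", 3, "short tail"),
    ("High", "H12", 1, "wavy ribs"),
    ("Control", "C04", 8, "wavy ribs"),
]

def primary_listing(records):
    """Map each affected litter to its affected fetuses and all changes seen in each."""
    table = defaultdict(lambda: defaultdict(list))
    for group, litter, fetus, finding in records:
        table[(group, litter)][fetus].append(finding)
    return {key: dict(fetuses) for key, fetuses in table.items()}

def secondary_listing(primary):
    """Derive a listing by type of change: finding -> [(group, litter, fetus), ...]."""
    by_type = defaultdict(list)
    for (group, litter), fetuses in primary.items():
        for fetus, changes in fetuses.items():
            for change in changes:
                by_type[change].append((group, litter, fetus))
    return dict(by_type)

primary = primary_listing(findings)
for (group, litter), fetuses in primary.items():
    for fetus, changes in fetuses.items():
        print(f"{group:<8} litter {litter} fetus {fetus}: {'; '.join(changes)}")
print(secondary_listing(primary)["wavy ribs"])  # [('High', 'H12', 1), ('Control', 'C04', 8)]
```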
Besides effects on the reproductive competence of adult animals, toxicity to reproduction includes: