Frequently Asked Questions
- What are molecular markers?
Any identifiable feature of the DNA sequence that can be reliably assayed to detect genetic differences between individuals or populations. Commonly used molecular marker assays are designed to detect Single Nucleotide Polymorphisms (SNPs), Simple Sequence Repeats (SSRs) or Restriction Fragment Length Polymorphisms (RFLPs).
- What are Single Nucleotide Polymorphisms (SNPs)?
Nucleotide positions along the chromosomes where the DNA sequences differ between individuals.
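As a minimal sketch of the idea, the snippet below compares two aligned DNA fragments and reports the positions where they differ. The function name `snp_positions` and the two sequences are invented for illustration; real SNP discovery works on many individuals and has to account for sequencing and alignment error.

```python
def snp_positions(seq_a, seq_b):
    """Return 0-based positions where two aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


# Hypothetical aligned fragments from two individuals (not real rice data).
individual_1 = "ATGCCGTA"
individual_2 = "ATGTCGTC"

print(snp_positions(individual_1, individual_2))  # [3, 7]
```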
- What are Simple Sequence Repeats (SSRs)?
SSRs are tandem arrays of short repeated motifs (mono-, di-, tri-, tetra- or penta-nucleotide) commonly found in the DNA of eukaryotic organisms. SSRs represent a rapidly evolving (highly mutable) fraction of eukaryotic genomes. They are useful as genetic markers because differences in the length of the array (number of repeated motifs) can be easily assayed via PCR to identify genetic similarities and differences between individuals at specific loci.
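To illustrate what "differences in the length of the array" means, the sketch below counts the longest run of a repeated motif in two alleles. The sequences, the GA motif, and the function name `longest_repeat_count` are assumptions made up for this example; in practice the length difference is read from PCR fragment sizes rather than from the sequence itself.

```python
import re


def longest_repeat_count(sequence, motif):
    """Return the largest number of consecutive copies of `motif` in `sequence`."""
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    runs = [len(m.group(0)) // len(motif) for m in pattern.finditer(sequence)]
    return max(runs, default=0)


# Hypothetical alleles at one SSR locus: same flanking sequence,
# different numbers of the GA motif.
allele_a = "TTGC" + "GA" * 12 + "CCTA"
allele_b = "TTGC" + "GA" * 15 + "CCTA"

count_a = longest_repeat_count(allele_a, "GA")
count_b = longest_repeat_count(allele_b, "GA")
print(count_a, count_b)                 # repeat numbers of the two alleles
print(len(allele_b) - len(allele_a))    # fragment length difference in bp
```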
- What is population structure?
A population is said to be structured if individuals do not mate at random (i.e., genotype frequencies deviate from Hardy-Weinberg equilibrium). For example, individuals that are found in geographical proximity may be more likely to mate with one another than with individuals that are geographically distant from each other. Natural or artificial selection also contributes to non-random mating among members of a population.
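One simple way to see a deviation from Hardy-Weinberg equilibrium is to compare observed genotype counts at a single biallelic locus with the counts expected under random mating. The sketch below does this for invented counts; the function name and the data are assumptions, and it reports a chi-square statistic and the heterozygote deficit (1 minus observed over expected heterozygosity) without performing a formal significance test.

```python
def hardy_weinberg_summary(n_AA, n_Aa, n_aa):
    """Compare observed genotype counts with Hardy-Weinberg expectations.

    Assumes one biallelic locus with both alleles present in the sample.
    Returns (p, expected_counts, chi_square, heterozygote_deficit).
    """
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)      # frequency of allele A
    q = 1 - p                            # frequency of allele a
    if p == 0 or q == 0:
        raise ValueError("locus must be polymorphic")
    expected = {"AA": p * p * n, "Aa": 2 * p * q * n, "aa": q * q * n}
    observed = {"AA": n_AA, "Aa": n_Aa, "aa": n_aa}
    chi_square = sum(
        (observed[g] - expected[g]) ** 2 / expected[g] for g in expected
    )
    deficit = 1 - n_Aa / expected["Aa"]  # 0 under random mating
    return p, expected, chi_square, deficit


# Hypothetical sample with far fewer heterozygotes than random mating predicts,
# as might be seen in a structured or largely selfing population.
p, expected, chi_square, deficit = hardy_weinberg_summary(45, 10, 45)
print(p, expected, chi_square, deficit)
```

With these made-up counts, only 10 heterozygotes are observed where 50 would be expected, which is the kind of pattern that non-random mating produces.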
- Why is the sub-population structure of rice important for breeding?
O. sativa is composed of at least five genetically identifiable sub-populations, and there is sub-population structure in O. rufipogon as well. Some sub-populations are more closely related than others, and the genetic distance between sub-populations can be used to predict both the frequency of obtaining transgressive segregation among the progeny of a cross and the likelihood of encountering fertility problems in distant crosses. A majority of the world's rice is produced from inbred (pure line) varieties and historically, inbred variety development has focused almost exclusively on crosses between members of the same sub-population (indica X indica) or between related sub-populations (i.e., tropical japonica X temperate japonica). This is largely due to the prevalence of sub-population incompatibilities that lead to sterility and make it difficult to obtain a random array of fertile recombinants from indica X japonica crosses.
- What are the problems associated with hybrid rice breeding?
In contrast to inbred rice variety development, hybrid rice breeding is more complicated, costly, and technologically intensive. F1 hybrid seed that is sold to farmers is produced every year by crossing two genetically different inbred lines; the parental inbreds are selected for crossing because they give rise to highly productive, heterotic F1 hybrids. Because rice is a naturally inbreeding species, it is difficult to obtain high levels of outcrossing, due in large part to the enclosed floret morphology that helps to enforce the inbreeding habit. To encourage outcrossing, hybrid rice breeders use male-sterile lines (generally in either a two-line or a three-line system), but it remains challenging to achieve reliable and economically viable levels of outcrossing during F1 hybrid seed production.
- What is meant by "transgressive variation"?
Transgressive variation refers to the production of offspring with phenotypic traits or characteristics (such as flowering time or number of seeds per plant) that exceed those of the parents. This generally results from cooperation or interaction between the genes present in the two parental types. For example, genes from one of the parents could be activators or master switches at the top of the regulatory network leading to a phenotype.
- How can transgressive variation be useful for rice breeding?
Transgressive variation is frequently observed in populations that are derived from genetically divergent parents. Divergent parents differ in allelic constitution at many loci, giving rise to individuals that are both "better" and "worse" for specific traits than either parent. Crossing genetically diverse lines provides the breeder with a larger array of genetic variation in the offspring from which to make selections, enhancing the possibility of identifying offspring that outperform the "better parent" by a substantial amount. However, because most offspring contain some traits that are "better" and others that are "worse" than their parents, breeders have to spend time selecting against the deleterious alleles as well as for the advantageous alleles in each generation. The use of molecular markers can greatly improve the efficiency of the process, and markers are often used in combination with backcrossing to capture positive transgressive variation for rice improvement.
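The way divergent parents can yield offspring that are both "better" and "worse" than either parent can be sketched with a toy model. The example below assumes a purely additive trait controlled by four loci, with each parent carrying favorable alleles at different loci; the effect sizes, parental genotypes, and function names are all hypothetical, and real traits may also involve interactions between genes.

```python
from itertools import product

# Hypothetical additive effect of the favorable allele at each of four loci.
EFFECTS = [2.0, 1.0, 3.0, 1.5]

# 1 = favorable allele fixed, 0 = unfavorable allele fixed.
PARENT_A = (1, 1, 0, 0)   # favorable at loci 1 and 2
PARENT_B = (0, 0, 1, 1)   # favorable at loci 3 and 4


def trait_value(genotype, baseline=10.0):
    """Trait value of a homozygous line under the assumed additive model."""
    return baseline + sum(e for e, allele in zip(EFFECTS, genotype) if allele)


def transgressive_lines():
    """List homozygous recombinant genotypes outside the parental range."""
    low, high = sorted((trait_value(PARENT_A), trait_value(PARENT_B)))
    # The parents differ at every locus, so every combination can be recovered.
    lines = list(product((0, 1), repeat=len(EFFECTS)))
    above = [g for g in lines if trait_value(g) > high]
    below = [g for g in lines if trait_value(g) < low]
    return above, below


above, below = transgressive_lines()
print("parents:", trait_value(PARENT_A), trait_value(PARENT_B))
print("lines exceeding the better parent:", above)
print("lines below the poorer parent:", below)
```

In this toy case neither parent carries all the favorable alleles, so recombinant lines that combine them exceed the better parent, while lines that combine the unfavorable alleles fall below the poorer one.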
- What is "marker-assisted backcross introgression" and what does it accomplish?
Introgression involves the movement of a gene or genes from one species into the gene pool of another. This is often accomplished by generating an interspecific F1 hybrid (such as O. rufipogon/O. sativa) and then backcrossing the F1 to one of its parents for one or more generations (BC1, BC2, etc.). Using molecular markers, it is possible to identify specific regions of one parent's genome that consistently confer enhanced trait performance in the genetic background of the other parent. Thus, this approach to breeding provides a way to introduce specific genes or QTLs from a wild, ancestral line into a cultivated variety without the necessity for whole-genome compatibility.
- What are near-isogenic lines (NILs)?
A set of lines that are genetically identical, except for one or a few loci. Such lines can be created by crossing a donor line (containing a gene or trait of interest) with a recurrent parent to produce a heterozygous F1, and then repeatedly backcrossing the offspring to the recurrent parent (BC1, BC2, etc.), retaining the donor gene or trait in each successive generation. Marker-assisted selection (MAS) can be used to increase the efficiency of NIL development by screening individuals in each generation for the presence of the target locus (gene) and for the absence of extraneous donor DNA throughout the rest of the genome, which speeds up the return to the recurrent parent type.
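The "return to recurrent parent type" can be quantified with a standard expectation: without any marker selection, each backcross halves the remaining donor contribution on average, so a BCn individual is expected to carry a fraction 1 - (1/2)^(n+1) of recurrent parent genome at loci unlinked to the target. The sketch below tabulates this; the function name is an assumption, and the figures are averages that ignore linkage drag around the target locus.

```python
def expected_recurrent_fraction(backcross_generation):
    """Expected recurrent-parent genome fraction in generation BCn.

    backcross_generation = 0 corresponds to the F1. The value is an average
    for loci unlinked to the selected target, with no marker selection.
    """
    if backcross_generation < 0:
        raise ValueError("backcross generation must be >= 0")
    return 1 - 0.5 ** (backcross_generation + 1)


for n in range(0, 7):
    label = "F1" if n == 0 else f"BC{n}"
    print(f"{label}: {expected_recurrent_fraction(n):.2%}")
```

Marker-assisted selection against donor segments aims to reach a given recovery level in fewer generations than these unselected averages imply.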
- What is linkage disequilibrium (LD)?
LD refers to non-random statistical associations between alleles at different loci (which need not be physically linked) in individuals randomly sampled from a population. Alleles in LD are co-inherited more often than would be expected by chance alone. Such allelic associations can arise because of non-random mating among individuals. When specific, unlinked alleles in the genome are known to be in LD, this indicates that the ancestries of the genes that are physically linked to those alleles (haplotypes) will also tend to be correlated.
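Two widely used summaries of LD between a pair of biallelic loci are D, the difference between the observed haplotype frequency and the product of the allele frequencies, and r², which scales D² by the allele frequencies at both loci. The sketch below computes both from haplotype counts; the counts and the function name are invented for illustration, and it assumes phased haplotypes are available.

```python
def ld_statistics(hap_counts):
    """Compute D and r^2 for two biallelic loci from haplotype counts.

    hap_counts maps the four haplotypes 'AB', 'Ab', 'aB', 'ab' to counts.
    """
    total = sum(hap_counts[h] for h in ("AB", "Ab", "aB", "ab"))
    p_AB = hap_counts["AB"] / total
    p_A = (hap_counts["AB"] + hap_counts["Ab"]) / total
    p_B = (hap_counts["AB"] + hap_counts["aB"]) / total
    d = p_AB - p_A * p_B                      # zero when alleles are independent
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d, d * d / denom


# Hypothetical sample of 100 haplotypes in which A tends to occur with B.
d, r_squared = ld_statistics({"AB": 40, "Ab": 10, "aB": 10, "ab": 40})
print(d, r_squared)
```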
- What is epistasis?
Epistasis is the modification of the action of one gene by one or several other genes. Epistatic genes may be linked or may assort independently, but their effects must reside at different loci in the genome. Phenotypes that result from epistatic interactions among genes differ from what would be expected if the loci were acting independently.
- What is association mapping?
Association mapping involves the use of molecular markers to find statistical associations between genes (or markers) and traits. This method works at a population level and relies on LD between an array of linked markers and the functional mutations responsible for trait variation. Patterns of LD in a sample reflect the effects of many historical recombination events over thousands of generations and may therefore permit fine-scale mapping of a trait. Because meiotic recombination shuffles genetic material between chromosomes and causes LD to decay with distance, markers that show high LD with a trait are likely to be physically linked to the functional mutation.
- How does association mapping differ from linkage mapping?
Although the goal of both association mapping and linkage mapping is to find associations between phenotypes and genes (or molecular markers), there are some important differences. Linkage mapping is usually done in the context of closely related individuals having known relationships, such as the offspring of a controlled cross or the members of a family where the pedigree is known. Since the number of recombination events in these cases is relatively small, genes or quantitative traits are mapped to large chromosomal blocks, and the resolution is low (Mb scale). On the other hand, association mapping is done using distantly related individuals with unknown relationships, randomly chosen from a natural population. If the population has a long history of inbreeding, LD will decay slowly and the resolution of the association mapping study will be quite low. However, if the population is genetically diverse and naturally outcrossing, LD will decay rapidly, increasing the resolution of the association mapping study. Patterns of LD are highly variable across a genome and reflect the effects of many historical recombination events over several thousand generations. The extent of LD in the population(s) used for association mapping ultimately determines the resolution of the study.
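The contrast between slow LD decay under inbreeding and rapid decay under outcrossing can be sketched with the textbook recursion D_t = D_0 (1 - c)^t, where c is the recombination fraction between two loci. To represent selfing, the sketch uses a commonly cited approximation in which recombination is scaled by (1 - F), with F = s / (2 - s) for selfing rate s; treat that scaling, the parameter values, and the function name as assumptions of this illustration rather than a model fitted to rice.

```python
def ld_after_generations(d0, recomb_rate, generations, selfing_rate=0.0):
    """Expected LD after a number of generations of recombination.

    Uses D_t = D_0 * (1 - c_eff)^t, where c_eff = c * (1 - F) and
    F = s / (2 - s) approximates the inbreeding level under selfing rate s.
    """
    if not 0.0 <= selfing_rate <= 1.0:
        raise ValueError("selfing_rate must be between 0 and 1")
    inbreeding_f = selfing_rate / (2 - selfing_rate)
    effective_c = recomb_rate * (1 - inbreeding_f)
    return d0 * (1 - effective_c) ** generations


# Hypothetical pair of loci with c = 0.01, starting at D = 0.25.
for s in (0.0, 0.5, 0.95):
    remaining = ld_after_generations(0.25, 0.01, 200, selfing_rate=s)
    print(f"selfing rate {s:.2f}: D after 200 generations = {remaining:.4f}")
```

Under these assumptions, the more a population selfs, the more LD remains between the same pair of loci after the same number of generations, which is why mapping resolution is expected to be lower in highly inbred populations.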
- What are the challenges associated with association mapping in rice?
There are many statistical challenges associated with association mapping in rice. The primary caveat is that population structure can lead to false signals of association between markers and traits: if individuals that share a trait are also more closely related to each other than to those that do not share the trait, then a significant number of unlinked markers will co-segregate with the trait due to common ancestry rather than physical linkage. Secondly, current models need to allow for epistatic and population-specific effects (i.e., where the same allele can have different phenotypic effects in different populations because of variation in genetic backgrounds), since these are likely to be important in rice. Also, because partial selfing in rice can lead to spurious inferences of population structure, it is necessary to modify existing methodologies to jointly infer population structure and population-specific selfing rates. Finally, it is necessary to properly control genome-wide false positive rates arising from multiple hypothesis testing when doing whole-genome association mapping, since in a large collection of SNPs, some associations will exist simply by chance.
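The multiple-testing point can be made concrete with two standard corrections: the Bonferroni threshold, which divides the significance level by the number of tests, and the Benjamini-Hochberg procedure, which controls the false discovery rate. The sketch below applies both to a short list of invented p-values; neither method is stated in the text above as the one to use, and a real genome scan would involve many thousands of SNPs.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that controls the family-wise error rate."""
    return alpha / n_tests


def benjamini_hochberg(p_values, fdr=0.05):
    """Return indices of tests declared significant by Benjamini-Hochberg."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    # Find the largest rank k whose p-value is at most k * fdr / m.
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * fdr / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


# Hypothetical p-values from ten marker-trait tests.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]

threshold = bonferroni_threshold(0.05, len(p_values))
bonferroni_hits = [i for i, p in enumerate(p_values) if p <= threshold]
bh_hits = benjamini_hochberg(p_values, fdr=0.05)

print("uncorrected hits:", [i for i, p in enumerate(p_values) if p <= 0.05])
print("Bonferroni hits:", bonferroni_hits)
print("Benjamini-Hochberg hits:", bh_hits)
```

With these made-up values, five tests pass an uncorrected 0.05 cutoff, but only one survives Bonferroni and two survive Benjamini-Hochberg, showing how many nominal associations can disappear once the number of tests is accounted for.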