(American Journal of Botany. 2002;89:1651-1669.)
© 2002 Botanical Society of America, Inc.


Systematics and Phytogeography

Group II introns as phylogenetic tools: structure, function, and evolutionary constraints1

Scot A. Kelchner

Centre for Plant Biodiversity Research, Commonwealth Scientific and Industrial Research Organisation, Division of Plant Industry, Canberra, ACT 2601 Australia; School of Botany and Zoology, The Australian National University, Canberra, ACT 2601 Australia

Received for publication January 3, 2002. Accepted for publication May 3, 2002.


    ABSTRACT
 TOP
 ABSTRACT
 INTRODUCTION
 GROUP II INTRON STRUCTURE...
 MITOCHONDRIAL GROUP II INTRONS
 CHLOROPLAST INTRON LOSS IN...
 MUTATION PATTERNS IN GROUP...
 TECHNIQUES
 ALIGNMENT AND ANALYSIS
 GROUP II INTRONS: A...
 LITERATURE CITED
 
Group II introns comprise the majority of noncoding DNA in many plant chloroplast genomes and include the commonly sequenced regions trnK/matK, the rps16 intron, and the rpl16 intron. As demand increases for nucleotide characters at lower taxonomic levels, chloroplast introns may come to provide the bulk of plastome sequence data for assessment of evolutionary relationships in infrageneric, intergeneric, and interfamilial studies. Group II introns have many attractive properties for the molecular systematist: they are confined to organellar genomes in eukaryotes and the majority are single-copy; they share a well-defined and empirically tested secondary and tertiary structure; and many are easily amplified due to highly conserved sequence in flanking exons. However, structure-linked mutation patterns in group II intron sequences are more complex than generally supposed and have important implications for aligning nucleotides, assessing mutational biases in the data, and selecting appropriate models of character evolution for phylogenetic analysis. This paper presents a summary of group II intron function and structure, reviews the link between that structure and specific mutational constraints in group II intron sequences, and discusses strategies for accommodating the resulting complex mutational patterns in subsequent phylogenetic analyses.

Key Words: chloroplast noncoding DNA • group II introns • molecular evolution • phylogenetic analysis • RNA structure • rpl16 intron


    INTRODUCTION
Group II (G2) introns comprise the majority of noncoding DNA in most chloroplast genomes (Jurica and Stoddard, 1999 ). Some are among the fastest evolving regions known in the plastome (Wolfe, Li, and Sharp, 1987 ; Downie, Katz-Downie, and Cho, 1996 ; Small et al., 1998 ; Downie, Katz-Downie, and Watson, 2000 ). As molecular systematists focus more attention on relationships at lower taxonomic levels in plants, chloroplast group II introns are increasingly providing a rich source of sequence characters for infrageneric and intrafamilial phylogeny estimation (Table 1).


Table 1. Phylogenetic analyses of land plants using chloroplast group II intron sequence data. Taxon level comparisons between studies do not necessarily reflect equivalent evolutionary time scales

 
There has been much speculation among molecular systematists as to the nature of group II introns and their possible evolutionary constraints (e.g., Clegg et al., 1994 ; Downie, Katz-Downie, and Cho, 1996 ; Jordan, Courtney, and Neigel, 1996 ; Kelchner and Clark, 1997 ; Downie et al., 1998b ). For many years, introns have been considered "junk" DNA and were widely presumed to evolve under minimal selective constraints in a fashion consistent with the neutral theory of sequence evolution (sensu Kimura, 1968 , 1983 ). The legacy of this expectation persists in molecular systematics, possibly due to the absence of readily attainable information to the contrary. There have been many recent advances in our understanding of group II intron structure and function, however, and this paper aims to provide a synopsis of that material in the context of its importance to phylogenetic methodology.

One of the most exciting features of group II introns as phylogenetic tools is the uniformity of their structure and function. It should be clear in the following discussion that functional requirements induce structural constraints on group II introns and that these constraints may contribute to heterogeneous mutation patterns across G2 intron sequences. Understanding the connection between structure, function, and evolutionary constraints in a G2 intron is therefore fundamental to improving all levels of phylogenetic analysis based on G2 intron sequence data.

The information presented here is intended to assist molecular systematists in the use of G2 intron sequences for phylogeny estimation in higher plants. This review does not cover the special case of the trnL intron, the sole group I intron in the chloroplast genome; it is expected, however, that most of the methodological approaches described here for G2 intron analysis will apply to similarly structured RNA molecules, including group I introns.


    GROUP II INTRON STRUCTURE AND FUNCTION
Introns are classified on the basis of their conserved RNA folding patterns according to the nomenclature of Michel and Dujon (1983) and Michel, Umesono, and Ozeki (1989). Each structural class, referred to as group I, group II, and group III, is characterized by a diagnostic secondary structure configuration. Group II introns include two subclasses, IIA and IIB, each consisting of two forms (IIA1 and IIA2, IIB1 and IIB2; Michel, Umesono, and Ozeki, 1989).

Whereas group I introns are found in all genomes of prokaryotic and eukaryotic organisms, group II introns are restricted to plant and fungal organelles and certain prokaryotes of cyanobacterial and proteobacterial lineages. Mitochondrial genomes of plants and fungi have their own set of group II introns that appear to differ historically from those found in chloroplast genomes (Michel and Ferat, 1995; Toor, Hausner, and Zimmerly, 2001). It has been demonstrated that each organelle may have maintained its unique group II intron assembly by vertical descent since the incorporation of the organelle into the eukaryotic cell (Toor, Hausner, and Zimmerly, 2001), and there is no contradictory evidence as yet to reject this conclusion. The situation is markedly different in organellar group I introns, several of which seem to have repeatedly and independently invaded mitochondrial genomes (Turmel et al., 1995; Cho et al., 1998; Cho and Palmer, 1999; Goddard and Burt, 1999; Holst-Jensen et al., 1999; Palmer et al., 2000). The sole group I intron in plastomes, that of the trnL gene, may predate endosymbiosis and is not thought to derive from post-endosymbiotic invasion of the genome (Besendahl et al., 2000).

Function
The primary function of a group II intron, whether in a chloroplast, mitochondrion, or prokaryote genome, is to self-direct its extrication from gene transcripts prior to translation of the mRNA into a protein. This process requires two rounds of autocatalytic chemical reactions, termed "splicing reactions." Splicing refers to the capacity of the intron to break the ribonucleic acid chain of the pre-mRNA transcript at the exon boundaries and reconnect the strand after the intron's removal. Splicing completely excises the intron from the disrupted gene transcript, allowing the pre-mRNA to continue its maturation pathway and subsequent translation. Failure to properly or efficiently remove the intron from the transcript prevents further transcript processing and translation; the protein is not synthesized, and presumably both the organism and the intron are strongly selected against.

These reactions define the primary functional phase of an intron and occur while the intron and host gene are a pre-mRNA transcript. Thus, in terms of evolutionary constraints, changes in the intron DNA sequence should be considered in terms of its RNA counterpart. This detail has significant implications when using G2 introns for comparative sequence analysis and phylogeny construction.

The two stages of intron-directed cis-splicing reactions are as follows (Jacquier, 1996; Podar, Perlman, and Padgett, 1998; Holländer and Kück, 1999; Jurica and Stoddard, 1999; Costa, Michel, and Westhof, 2000). The first stage consists of the folding of a pre-mRNA intron transcript into its secondary and tertiary structural formation, called a "ribozyme" (a term designating a catalytic RNA). This folding brings a single adenine in domain VI (Fig. 1) against the now neighboring intron/exon boundaries. The proximity of the adenine to the G1 nucleotide (the first nucleotide of the 5' end of the intron) triggers a nucleophilic attack, and a transesterification reaction cleaves the ribonucleic acid at the 5' intron/exon boundary to form a structure known as a "lariat" (Schmelzer and Müller, 1987; Jacquier and Jacquesson-Breuleux, 1991). The second stage of splicing consists of another transesterification reaction at the 3' intron/exon boundary, which joins the exon pre-mRNA fragments together and releases the intron, still in lariat form.



Fig. 1. Stylized structure of group IIB introns (the most frequent class of G2 introns in the plastome), showing secondary structure and tertiary interactions based on Michel, Umesono, and Ozeki (1989), Jacquier (1996), and Toor, Hausner, and Zimmerly (2001). Nomenclature of structural features follows Michel, Umesono, and Ozeki (1989). Core structural elements consistent across all group II introns are drawn here in bold; light gray outlines indicate structures that may vary in length or terminal loop configurations. Roman numerals I–VI mark the six domains of group II introns. Lowercase letters (a, b, c, and d) mark subdomain helices of domain I. Lowercase roman numerals (i–iv) indicate stem segments within each subdomain. IBS1, IBS2, and IBS3 are tertiary interaction sites in the 5' and 3' exons that correspond to intron domain I elements EBS1, EBS2, and EBS3, respectively. Other known tertiary interactions are indicated by Greek symbols next to the interactive sites. Conserved nucleotides are indicated as a single character state (e.g., A, G, or length-restricted sites N) or as purines (R) and pyrimidines (Y). The oversized "A" in domain VI (also marked with an asterisk) is the initiation nucleotide for group II intron splicing reactions. Nucleotide positions represented as dots signify partial 5' and 3' exon sequences. Group IIA introns differ slightly in structure in the domain I d helix: EBS2 is the terminal loop of a small helix just upstream of d2, and there is an additional small helix at the EBS3 region (see Toor, Hausner, and Zimmerly, 2001). The rpl16 intron is missing subdomains a and b in domain I and consequently the α tertiary interaction with the d3 stem bulge.

 
Although autocatalytic splicing occurs in vitro under conditions unusual to the cell (e.g., Peebles et al., 1986 ; Schmelzer and Schweyen, 1986 ), efficient in vivo splicing almost always requires the assistance of a catalytic enzyme (Michel and Ferat, 1995 ; Jacquier, 1996 ; Tanner, 1999 ). This enzyme, a reverse-transcriptase-like protein referred to as a "maturase," joins the pre-mRNA transcript prior to the splicing reactions to form a ribonucleoprotein complex (RNP complex; see Zimmerly et al., 1995a ; Cousineau et al., 2000 ). The RNP complex consists of at least the ribozyme form of the intron and a maturase. In many prokaryotic G2 introns, the maturase of the RNP complex is intron-specific and is coded for by the intron itself. When present, the maturase open reading frame (ORF) in both prokaryotes and eukaryotes resides in domain IV (Mohr, Perlman, and Lambowitz, 1993 ), which may also serve as a binding site for the maturase in the RNP complex (Wank et al., 1999 ). This gives rise to the curious circumstance in the trnK sequence region of a gene (matK) within an intron (trnK intron) within another gene (trnK).

Toor, Hausner, and Zimmerly (2001) surveyed 39 maturase ORFs from prokaryotic and organellar G2 introns and identified six phylogenetic lineages that correspond closely with individual G2 intron structural categories. For example, all maturases from class IIA1 mitochondrial introns belong to a single maturase lineage. Furthermore, their survey of 142 known group II introns revealed that most prokaryote G2 introns have ORFs, about half of lower eukaryote G2 introns maintain ORFs, but only two of the 42 higher eukaryote organellar G2 introns still have ORFs coding for a functional maturase (matK in the chloroplast, matR in the mitochondrion). Their data suggest a model of G2 intron evolution in which nearly all chloroplast and mitochondrial G2 introns have lost their functional maturases, and those G2 introns still maintaining them have each coevolved with their maturase since ancient times. These findings concur with those of an earlier study by Mohr, Perlman, and Lambowitz (1993) .

Chloroplast introns have managed to persist despite the loss of their unique maturases. They still seem to require an RNP complex for efficient splicing in vivo (e.g., Holländer and Kück, 1999 ), but it is thought that the maturase matK can successfully form RNP complexes with any of the G2 introns in the chloroplast (Mohr, Perlman, and Lambowitz, 1993 ; Ems et al., 1995 ; Vogel, Börner, and Hess, 1999 ). This arrangement may have freed the remaining G2 introns in the plastome from having to maintain their own ORFs, a situation that could in part be responsible for the high levels of sequence variation noted in many chloroplast intron domain IVs (e.g., Learn et al., 1992 ; Downie et al., 1998b ; Downie, Katz-Downie, and Watson, 2000 ). Not surprisingly, there is evidence that many chloroplast G2 introns contain degenerate maturase ORFs in their domain IV helix (Toor, Hausner, and Zimmerly, 2001 ).

Maturases are also reverse transcriptases and have several reverse transcriptase (RT) domains in their coding sequence. This enables some G2 introns to move about in their host genomes (Lambowitz and Belfort, 1993 ; Mueller et al., 1993 ; Zimmerly et al., 1995b ; Eickbush, 1999 ; Jurica and Stoddard, 1999 ). Mobile G2 introns are mostly known from prokaryotes and yeast, and several examples of G2 intron transpositioning have been demonstrated within the yeast mitochondrial genome (Mueller et al., 1993 ; Lazowska, Meunier, and Macadre, 1994 ; Moran et al., 1995 ; Yang et al., 1998 ; Sellem, Begel, and Sainsard-Chanet, 2000 ). Besides having complete RT domains, the typical maturase of a mobile group II intron contains a "zinc finger" that aids in targeting by sequence recognition (a process known as "homing"; Jurica and Stoddard, 1999 ; Mohr et al., 2000 ). It is this feature of some G2 introns that has made them candidates for medical applications, such as gene therapy in humans (Tanner, 1999 ; Guo et al., 2000 ; Mohr et al., 2000 ).

Mobility in chloroplast group II introns has not been detected. The lack of a zinc finger and complete reverse transcriptase domains in matK (Mohr, Perlman, and Lambowitz, 1993 ; Young and dePamphilis, 2000 ) is consistent with the expectation that matK RNP complexes do not possess homing capabilities.

A recent discovery has suggested that splicing of at least some chloroplast G2 introns in maize may also involve two nuclear-coded gene products, CRS1 and CRS2 (Jenkins, Kulhanek, and Barkan, 1997 ; Vogel, Börner, and Hess, 1999 ; Till et al., 2001 ). It is suspected that CRS1 is a required cofactor in atpF splicing reactions and that CRS2 may be a cofactor in group IIB intron RNPs, the most frequent class of introns in the plastome.

There is sound in vivo and in vitro experimental evidence that group II intron function is largely consistent in both the mitochondrion and the chloroplast (Herdenberger, Holländer, and Kück, 1994; Holländer and Kück, 1998, 1999). One in vivo system developed by Herdenberger, Holländer, and Kück (1994) for point mutation studies of group II introns introduces the mitochondrial rI1 group II intron from the green alga Scenedesmus obliquus into the chloroplast gene tscA of Chlamydomonas reinhardtii. The inserted intron efficiently completes its splicing reactions in the chloroplast, enabling proper translation of the tscA protein. Such studies highlight the probable universality of mechanisms involved in G2 intron splicing reactions.

Function is inextricably linked to structure in G2 introns. We can therefore infer that site-specific mutations that terminate function in one G2 intron will likely have the same effect in other G2 introns if such mutations occur in homologous structural positions. As discussed in the final section of this paper, this concept may have important applications when using G2 introns for molecular phylogenetics.

Structure
Jacquier (1996) estimated that a G2 intron would need no fewer than 600 nucleotides to maintain all structural features involved in proper splicing. The standard group II intron folding model was created by identifying conserved secondary structures of folded RNA intron sequences among a wide variety of organisms and genomes (Michel and Dujon, 1983; Michel, Umesono, and Ozeki, 1989; Michel and Ferat, 1995). The Michel, Umesono, and Ozeki (1989) model remains the best estimate of G2 intron structure and has largely been validated by in vitro and in vivo experimental investigations, including point mutation studies (Peebles et al., 1995; Abramovitz, Friedman, and Pyle, 1996; Holländer and Kück, 1999), chemical footprinting (Konforti, Liu, and Pyle, 1998; Costa, Michel, and Westhof, 2000), and NAIM analysis (Boudvillain and Pyle, 1998; Boudvillain, de Lencastre, and Pyle, 2000). Updated detailed models of the four group II intron structural categories can be found in Toor, Hausner, and Zimmerly (2001).

Although group II introns from organisms as diverse as cyanobacteria, Euglena, higher plants, and fungi share little in the way of nucleotide sequence similarity, sequences from each organism can be folded into the same core secondary structure. The Michel, Umesono, and Ozeki (1989) model has six main domains, multiple subdomains, and a nomenclatural system (Fig. 1). The general structure consists of six major structural helices that radiate from a "central wheel" of single-stranded RNA segments. Domain I (D1; Table 2) is the most complex, with many structurally important subhelices, and it typically comprises more than half of the total intron sequence. This domain interacts strongly at the tertiary level with domains V and VI (Boudvillain and Pyle, 1998 ; Konforti, Liu, and Pyle, 1998 ) and with external binding sites in the flanking exons. Domains II (D2) and III (D3) are considerably shorter and vary in length in plants (Learn et al., 1992 ). These domains seem to contribute relatively little to tertiary structure and splicing efficiency (Kwakman et al., 1989 ; Koch et al., 1992 ; Konforti, Liu, and Pyle, 1998 ). Domain IV (D4) can be quite large in chloroplast G2 introns, and in the trnK intron it is the site of the maturase ORF, matK. Domain V (D5) is the most highly restricted in terms of length and sequence variation (Michel, Umesono, and Ozeki, 1989 ; Learn et al., 1992 ) and is almost always 34 nucleotides in length in plants. Domain V is not known to possess any binding sites with the mRNA substrate, and its high degree of conservation is most likely due to its fundamental role in ribozyme folding (Peebles et al., 1995 ; Jacquier, 1996 ; Konforti, Liu, and Pyle, 1998 ). Domain VI (D6) has tertiary interactions with domains I and V (Koch et al., 1992 ; Dib-Hajj et al., 1993 ; Podar and Perlman, 1999 ) and may vary in length, usually in its terminal loop.


Table 2. Function and interactions of the six group II intron domains

 
Tertiary interactive sites include both stem and nonpairing nucleotides that interact internally with nucleotides in other regions of the ribozyme or externally with the flanking exons (the external and internal binding sites, EBS and IBS, respectively). Helix D1d (Domain I, subdomain d) contains the three known external binding sites, EBS1, EBS2, and EBS3. These segments of the ribozyme interact directly with exon internal binding site sequences IBS1, IBS2, and IBS3, respectively (Michel, Umesono, and Ozeki, 1989 ; Costa, Michel, and Westhof, 2000 ).

There are several intraribozymic interactive sites in a group II intron sequence that assist in giving the mature ribozyme its functional tertiary structure. Those proposed or identified by experimental evidence are indicated by Greek symbols in Fig. 1 (see Table 3 for details). Some of these interactions are essential for splicing, an example being γγ'. Holländer and Kück (1999) were able to demonstrate in vivo that splicing of an intron in the chloroplast depends on the ability of γγ' to form a Watson-Crick (canonical) pairing. The ζζ' interaction conserves the sequence identity of the terminal helix in domain V and the bulge in D1d1 (Peebles et al., 1995). Other interactions, such as ηη', may not be as essential, for the absence of this interaction does not significantly diminish splicing efficiency in those introns investigated (Koch et al., 1992).
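The requirement that a tertiary interaction form a Watson-Crick (canonical) pairing can be expressed as a simple test on RNA nucleotides. The following Python sketch is illustrative only: the function names and the example segments are hypothetical and are not taken from any intron discussed here. It treats only A-U and G-C as canonical, so a G-U wobble pair is rejected.

```python
# Canonical (Watson-Crick) RNA base pairs; G-U wobble pairs are deliberately excluded.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def is_watson_crick(base_a, base_b):
    """Return True if two RNA nucleotides form a canonical pair."""
    return (base_a.upper(), base_b.upper()) in WATSON_CRICK


def segments_pair(segment_a, segment_b):
    """Return True if two RNA segments, both written 5' to 3', can pair
    in antiparallel orientation using canonical pairs only."""
    if len(segment_a) != len(segment_b):
        return False
    # Antiparallel pairing: first base of one strand faces last base of the other.
    return all(is_watson_crick(a, b) for a, b in zip(segment_a, reversed(segment_b)))


# Hypothetical single-nucleotide interaction: a G-to-A change breaks the pair.
print(is_watson_crick("G", "C"))
print(is_watson_crick("A", "C"))
```

Under this assumption, a substitution at one partner of a paired site is tolerated only if the other partner carries a compensating change, which is one way structure can constrain mutation.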


Table 3. Tertiary interaction sites in group II introns determined by experimental investigations

 
In terms of functional importance of secondary structures, domains I and V are by far the most crucial for catalytic requirements of the ribozyme, followed by domain VI (Dib-Hajj et al., 1993 ; Pyle, 1996 ; Boudvillain and Pyle, 1998 ; Konforti, Liu, and Pyle, 1998 ; Costa and Michel, 1999 ). Domains II, III, and IV are of lesser importance (Koch et al., 1992 ), although each probably participates either in a particular step of splicing or in enhancing the overall catalytic reaction (Costa et al., 1997 ; Boudvillain and Pyle, 1998 ). All domain boundaries (the primary stem of domain helices) are highly conserved in G2 introns (Michel, Umesono, and Ozeki, 1989 ; Saldanha et al., 1993 ).


    MITOCHONDRIAL GROUP II INTRONS
There are nearly as many group II introns in plant mitochondrial genomes (mtDNA) as in plant chloroplast genomes. Like plastome introns, mitochondrial G2 introns are thought to be derived by vertical descent (Palmer et al., 2000). Only a few studies to date have assessed the phylogenetic utility of mitochondrial group II introns in plants. The rps3 intron has shown minor variation at the interspecific level in Betula (Laroche and Bousquet, 1999), and the nad1b/c intron has recently been surveyed for phylogenetic characters in higher plants (Demesure, Sodzi, and Petit, 1995; Bakker et al., 2000; Freudenstein and Chase, 2001).

Two factors of mtDNA evolution may be partly responsible for the apparent difficulty in developing mitochondrial group II introns as phylogenetic tools. The first factor is the slow rate of synonymous substitution in mitochondrial DNA, which is estimated to be almost five times slower than that of chloroplast DNA (Wolfe, Li, and Sharp, 1987 ; Schuster and Brennicke, 1994 ). The second factor is the frequency of recombination in plant mtDNA that can sometimes relocate exon and intron elements of a disrupted gene to separate regions of the genome. The result can be a fragmented intron, of which one or more domains are scattered through the genome (Chapdelaine and Bonen, 1991 ; Wissinger, Schuster, and Brennicke, 1991 ; Knoop and Brennicke, 1993 ; Malek and Knoop, 1998 ; Sainsard-Chanet, Begel, and d'Aubenton-Carafa, 1998 ). One such recombination event gave rise to a "tripartite" intron in Oenothera berteriana (Knoop, Altwasser, and Brennicke, 1997 ), in which three separate fragments of a group II intron must now be brought together as post-transcriptional pre-mRNAs to form a functional splicing ribozyme. The reaction is referred to as trans-splicing (see Bonen, 1993 ; Knoop and Brennicke, 1993 ; Doetsch et al., 2001 ) and is thought to be the general mechanism for the splicing of all fragmented G2 introns in the mitochondrial genome. A targeted mitochondrial intron that is fragmented could make polymerase chain reaction (PCR) amplification difficult or impractical.


    CHLOROPLAST INTRON LOSS IN ANGIOSPERMS
A few group II introns have been lost in certain chloroplast genomes. Independent cases of intron loss with host gene persistence are rare in higher plants (Downie et al., 1991 ; Palmer and Delwiche, 1998 ), although there are documented cases of multiple independent intron losses in closely related taxa (e.g., Bauhinia, Lai et al., 1997 ; Medicago, Downie et al., 1998a ). Intron absence from a plastome can merely reflect the loss of the host gene and not a specific intron loss event (Doyle, Doyle, and Palmer, 1995 ). In cases when the host gene is still present but the intron is not, the loss of the intron is presumed to have occurred by recombination between a post-splicing cDNA and the original disrupted gene sequence (Palmer and Delwiche, 1998 ).

Several higher plant lineages have been extensively surveyed for the presence of chloroplast group II introns. The rpl2 intron has the most widespread reported losses, with absences in members of at least 17 angiosperm families: Aizoaceae, Amaranthaceae, Basellaceae, Cactaceae, Caryophyllaceae, Chenopodiaceae, Convolvulaceae, Cuscutaceae, Didiereaceae, Droseraceae, Fabaceae, Geraniaceae, Menyanthaceae, Nyctaginaceae, Phytolaccaceae, Portulacaceae, and Saxifragaceae (Downie et al., 1991; Doyle, Doyle, and Palmer, 1995; Lai et al., 1997). The rpl16 intron is absent in certain members of the Geraniaceae, Goodeniaceae, and Plumbaginaceae (Downie and Palmer, 1994; Campagna and Downie, 1998). The rpoC1 intron is missing in members of the Aizoaceae, Cactaceae, Fabaceae, Goodeniaceae, Passifloraceae, and Poaceae (Downie, Llanas, and Katz-Downie, 1996; Katayama and Ogihara, 1996; Downie et al., 1998a). The rps12 intron has been lost in at least three members of Anemone (Hoot and Palmer, 1994). The absence of the rps16 intron in Epifagus (Wolfe, Morden, and Palmer, 1992) and several genera in Fabaceae (Downie and Palmer, 1992; Doyle, Doyle, and Palmer, 1995) is due to loss of the rps16 gene itself in these plastomes.


    MUTATION PATTERNS IN GROUP II INTRONS
Molecular function and structural requirements are widely expected to be a source of complex mutation patterns in nucleic acids (e.g., Learn et al., 1992 ; Downie, Katz-Downie, and Cho, 1996 ; Hickson et al., 1996 ; Kellogg and Juliano, 1997 ; Ballard et al., 1998 ; Soltis and Soltis, 1998 ; Hershkovitz, Zimmer, and Hahn, 1999 ). All biological molecules have physical structure, and selection preserving that structure for functional purposes may translate to heterogeneous mutation patterns in its underlying DNA sequence. As such, it is expected there is a direct link between molecular function, molecular structure, and mutational dynamics in nucleic acids.

For example, conservation of amino acids in a protein can lead to codon-specific substitution patterns in protein-coding sequences due to the flexibility of the genetic code (e.g., Reeves, 1992 ; Olmstead, Reeves, and Yen, 1998 ; Berg, 1999 ; McClellan, 2000 ). Conservation of the active site in the RuBisCo protein complex restricts the number of mutable sites in the rbcL sequence of plant chloroplast genomes (Kellogg and Juliano, 1997 ). Conformational requirements of ribosomal RNA can influence the degree, distribution, and nature of nucleotide change in rDNA (e.g., Hickson et al., 1996 ; Soltis and Soltis, 1998 ; Hershkovitz, Zimmer, and Hahn, 1999 ). Finally, group II intron function in higher plants may create mutation rate variation among ribozyme structures experiencing differing functional constraints (Learn et al., 1992 ; Clegg et al., 1994 ; Downie et al., 1998b ).

Heterogeneous mutation patterns in the rpl16 intron in Myoporaceae
As part of an ongoing phylogenetic analysis of Myoporaceae (a probable lineage of Scrophulariaceae sensu Olmstead et al. [2001] ), the chloroplast rpl16 intron was sequenced by the author for 46 taxa representing nearly 30 morphologically defined lineages. The rpl16 intron is among the fastest evolving sequence regions in the plastome (Wolfe, Li, and Sharp, 1987 ; Small et al., 1998 ; Downie, Katz-Downie, and Watson, 2000 ) and is often used for inter- and infrageneric phylogeny estimation in plants (Table 1). Mutation patterns were assessed across the entire sequence as well as on a partition by partition basis. The results were presented at a recent conference (Kelchner et al., 2000 ) and will be summarized here to illustrate the heterogeneous manner of mutation accumulation in a chloroplast group II intron.

The mutation data were compiled by a direct tally of mutations across a matrix of aligned sequences. In general, a tally of mutations is not an optimal approach to mutation pattern assessment for at least three reasons: first, a tally cannot accommodate superimposed mutation events; second, it does not take into account any influence that historical relationship may have on the distribution of character state variation; and third, it does not adequately test the possibility that observed heterogeneity in mutation patterns may still be the result of a largely stochastic mutation process under a uniform model. However, in this study sequence variation is very low, what variation exists is largely autapomorphic, and there is no available independently derived phylogeny for the taxa. A statistical test of the difference between observed and expected mutation patterns is not readily applicable, for without a model of mutation and a phylogeny to map change upon, it is difficult to determine expected values for the manner and distribution of mutations in these sequences.
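A direct tally of this kind can be sketched in a few lines of Python. The sketch below is not the procedure used in the study; it is a simplified illustration that classifies alignment columns rather than counting individual substitution events, and its function names and example alignment are hypothetical. A column is called "autapomorphic" when every variant state is confined to a single sequence and "informative" when at least two states each occur in two or more sequences. Gaps and ambiguity codes are ignored, which is an assumption of the sketch.

```python
from collections import Counter


def classify_column(column):
    """Classify one alignment column as constant, autapomorphic, or informative."""
    # Only unambiguous bases are tallied; gaps and ambiguity codes are skipped.
    counts = Counter(base for base in column.upper() if base in "ACGT")
    if len(counts) <= 1:
        return "constant"
    # Potentially informative: two or more states each shared by >= 2 sequences.
    shared_states = sum(1 for n in counts.values() if n >= 2)
    return "informative" if shared_states >= 2 else "autapomorphic"


def tally_columns(alignment):
    """Count column classes across a list of equal-length aligned sequences."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all sequences must be aligned to the same length")
    return Counter(classify_column("".join(col)) for col in zip(*alignment))


# Hypothetical five-taxon alignment, for illustration only.
example = ["ACGTAC", "ACGTAC", "ACTTAC", "AGGTTC", "AGGTTC"]
print(tally_columns(example))
```

As the text notes, such a tally cannot detect superimposed changes and ignores phylogenetic structure, so it is best reserved for matrices with very low divergence.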

The probable recency of the family's origin and the abundance of autapomorphic change in the rpl16 intron sequences provide an interesting opportunity for estimating mutational "tendencies" in a group II intron. The majority of observed rpl16 intron mutations across Myoporaceae are autapomorphic (110 substitutions), and potentially informative character state transformations are relatively few in number (38 substitutions). Tallying mutations across such a sequence matrix should minimize the potential influence of hierarchical structure on observed mutation patterns. From the very low p distance values between sequences (0.00% to 1.73%) we would expect that superimposed substitutions are limited in number (but certainly not impossible; see Kelchner and Clark, 1997 ).
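
The uncorrected p distance referred to above is simply the proportion of compared sites at which two aligned sequences differ. A minimal Python sketch follows; the function name, the decision to skip sites with a gap or ambiguity code in either sequence, and the toy sequences are assumptions made for illustration and are not the Myoporaceae data.

```python
def p_distance(seq_a, seq_b):
    """Proportion of compared sites at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Assumption: sites with a gap or ambiguity code are not compared.
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


# Hypothetical aligned pair: 20 comparable sites, one difference.
toy_a = "ACGTACGTAC-GTACGTACGT"
toy_b = "ACGTACGTACAGTACGTACGA"
print(f"{100 * p_distance(toy_a, toy_b):.2f}%")
```

Because the measure is uncorrected, it will underestimate divergence wherever superimposed substitutions have occurred, which is why it is only trusted here at very low values.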

Sequence alignment followed the criterion of Kelchner (2000) , which integrates structural and mutation class arguments for character homology with the conventional sequence similarity approach. Secondary structures for each sequence in its RNA form were determined using the domain-by-domain folding method (see below, Techniques: Inferring G2 intron secondary structures). Three data partitions were considered: partitioning by each of six G2 intron domains (D1–D6); partitioning by four structural categories of stem, loop, bulge, and single-strand interhelix sequence (in the manner of Vawter and Brown, 1993 ); and a partition consisting of the entire intron sequence.

All rpl16 intron nucleotide characters in Myoporaceae were readily assigned to domain partitions because of the distinctiveness of domain boundary sequences (Michel, Umesono, and Ozeki, 1989 ). The classification of nucleotides into four structural categories was more difficult. Multiple minimum free energy foldings exist for terminal loops in helices D3 and D4. In these cases, nucleotides not decisively placed in a structural category were classified as "ambiguous" and removed from the analysis. Difficulties notwithstanding, 822 of the 953 nucleotides in the aligned matrix (86.25%) could be assigned to a structural class.

Rate heterogeneity (α parameter) for the unpartitioned data set was estimated under the HKY85 + Γ likelihood model. The tree and parameter estimation analysis took nearly three weeks on a G3 computer using PAUP*4 beta 4 (Swofford, 1998 ). Substitution class frequencies, base composition, and distribution of variable and potentially informative characters were calculated for all partitions.

If selective constraints are consistent across all nucleotide sites in a G2 intron sequence, then each subset (partition) of an intron sequence should reflect the mutation pattern of the entire sequence as a whole. Any strong deviation of mutation patterns among partitions or between a partition and the entire sequence should indicate the presence of heterogeneous mutation processes in the data. Furthermore, if the group II intron sequences were under minimal selective constraints and evolving in a neutral fashion, we would expect to find nearly equal base composition (i.e., 25% frequency of each nucleotide), a more or less equal distribution of substitutions among sites, and about twice as many transversions as transitions. Each character partition, if sufficiently large in sample size, would also be expected to show these patterns of mutation.
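
The expectation of about twice as many transversions as transitions follows from counting the possible changes among four nucleotides, if every change is assumed to be equally probable. The short Python sketch below enumerates them; the equal-probability assumption and the function name are illustrative only.

```python
from itertools import permutations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_class(source, target):
    """Return 'transition' or 'transversion' for a nucleotide change."""
    pair = {source, target}
    same_class = pair <= PURINES or pair <= PYRIMIDINES
    return "transition" if same_class else "transversion"


# Enumerate all 12 directed changes among the four nucleotides.
counts = {"transition": 0, "transversion": 0}
for source, target in permutations("ACGT", 2):
    counts[substitution_class(source, target)] += 1

print(counts)  # {'transition': 4, 'transversion': 8}
```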

The results of the mutation pattern assessment for the rpl16 intron in Myoporaceae are presented in Fig. 2. Overall, mutation patterns are not consistent across partitions, suggesting that a heterogeneous mutation process underlies these data. Several of the following points are particularly interesting in the context of phylogenetic analysis.



Fig. 2. Mutation patterns in character partitions of 46 rpl16 intron sequences in Myoporaceae. Partitions were created for all domains (domain I–VI; D1–D6) and four structural categories (S, stems; L, loops; B, bulges; I, interhelical sequence). All nucleotides were allocated to one of six domain categories; 822 of 953 nucleotides (86%) could be unambiguously assigned to stem, loop, bulge, or interhelical sequence partitions. (A) Relative number of variable and potentially informative character changes by domain; domain V (D5) showed no observable mutations. (B) Relative number of variable and potentially informative character state changes by structural category; "All" indicates the value when assessed across the entire sequence matrix. (C) Frequency of substitution classes across the entire sequence matrix. (D) Transition bias in each partition, expressed as the transition : transversion (Ti : Tv) ratio. (E) Base composition by structural category. (F) Base composition by domain

 
In the Myoporaceae rpl16 intron matrix, the percentage of potentially informative characters for phylogenetic analysis seems evenly distributed between domain and structural classes (Fig. 2A, B). This is despite partition differences in the number of variable characters, which are particularly high in both domain II and the bulge nucleotides. Some authors (e.g., Dixon and Hillis, 1993 ; Miyamoto et al., 1994 ) have considered an a priori weighting scheme for characters occurring in stem structures with the expectation that stem nucleotides have lower rates of mutation due to their role in secondary structure formation. However, there is a nearly equal distribution of informative characters between all partitions of the Myoporaceae data, even though the number of variable characters in the stem partition is lowest among the structural categories. This suggests that at the level of divergence in Myoporaceae rpl16 intron sequences, no particular domain or structural category would be expected to contribute unequally to a phylogenetic analysis.

There is also evidence of variation in substitution types by partition, as well as unequal frequency of substitution classes across the entire sequence. The most common substitution class in the matrix is A/G, which is ten times more frequent than C/G substitutions (Fig. 2C). Transitions are more frequent than transversions when averaged across the matrix, but there is variation in the degree of the transition : transversion ratio between partitions. The stem category of structural partitions shows a very high transition rate, nearly four times higher than the transition rate across the entire matrix (Fig. 2D). This is perhaps the most dramatic example in the study of a structural partition deviating from an averaged sequence value for a mutation category.

In terms of base composition, loops, bulges, and interhelical sequences are particularly rich in A, loops having almost twice the A content of stems (Fig. 2E). In the domain partitions, domains I–IV and domain VI are all high in A/T content, although domain IV has nearly twice the frequency of T as domain V (Fig. 2F). Base composition in domain V approaches equivalency for all nucleotide states, perhaps due to strong functional constraints resulting in a relative increase in G/C content.
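
Base composition by partition can be computed by labeling each nucleotide with its structural category and counting within labels. The following Python sketch assumes a single sequence with one category label per site; the function name, the one-letter labels, and the toy hairpin are hypothetical and are not taken from the rpl16 data.

```python
from collections import Counter, defaultdict


def composition_by_partition(sequence, labels):
    """Base frequencies for each partition label along one sequence."""
    if len(sequence) != len(labels):
        raise ValueError("need one label per nucleotide")
    counts = defaultdict(Counter)
    for base, label in zip(sequence.upper(), labels):
        if base in "ACGT":
            counts[label][base] += 1
    return {
        label: {base: tally[base] / sum(tally.values()) for base in "ACGT"}
        for label, tally in counts.items()
    }


# Hypothetical hairpin: S = stem, L = loop (not real rpl16 data).
toy_sequence = "GGCCAAAAGGCC"
toy_labels = "SSSSLLLLSSSS"
for label, freqs in composition_by_partition(toy_sequence, toy_labels).items():
    print(label, freqs)
```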

Although this was not a statistical test of mutation dynamics, the variation of mutation patterns between subsets of a group II intron sequence is consistent with the findings of other researchers (e.g., Learn et al., 1992 ; Downie, Katz-Downie, and Cho, 1996 ; Downie, Katz-Downie, and Watson, 2000 ) and suggests that heterogeneous modes of mutation may be a general feature of group II introns. The presence of heterogeneous processes in G2 intron sequence data has important implications for their use in phylogenetic analysis (see below, Alignment and analysis).

Parameter values in likelihood analysis try to account for such mutational biases, but are usually estimated in a likelihood framework using the entire sequence. Figure 2, however, illustrates that parameter values derived from the entire sequence may differ from those estimates derived within specific partitions. For example, a transition : transversion ratio for the entire rpl16 intron data in Myoporaceae is 1.37 : 1, but for nucleotides in the stem category this ratio is more than 5 : 1. Applying the total sequence average of 1.37 to an analysis of Myoporaceae rpl16 intron sequences would be treating 57% of the categorized nucleotides (those in RNA stem positions) under an improper value for transition rate.
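
A direct tally of the kind used here can be sketched in Python to show how a partition value may diverge from the whole-sequence value. The sketch scores only alignment columns with exactly two nucleotide states, counting each as one change; that scoring rule, the function name, and the toy alignment are assumptions for illustration, and the approach carries the limitations of tallies noted earlier.

```python
from collections import defaultdict

TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def ti_tv_by_partition(alignment, labels):
    """Tally transitions and transversions per partition.

    Only columns with exactly two nucleotide states are scored, each as a
    single change; this is a direct tally, not a model-based estimate.
    """
    tallies = defaultdict(lambda: {"ti": 0, "tv": 0})
    for column, label in zip(zip(*alignment), labels):
        states = {base for base in column if base in "ACGT"}
        if len(states) != 2:
            continue
        kind = "ti" if frozenset(states) in TRANSITIONS else "tv"
        tallies[label][kind] += 1
        tallies["All"][kind] += 1  # running total for the whole matrix
    return dict(tallies)


# Hypothetical aligned sequences and labels (S = stem, L = loop).
toy_alignment = [
    "GGCAAATGCC",
    "GACAAATGTC",
    "GGCATATGCC",
]
toy_labels = "SSSLLLLSSS"
for label, tally in ti_tv_by_partition(toy_alignment, toy_labels).items():
    print(label, tally)
```

In this toy case both stem changes are transitions while the matrix as a whole contains a transversion, so a single whole-matrix ratio would misdescribe the stem sites.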

Constraints on sequence evolution
Site-specific limitations on character state transformation
Several sites in G2 intron sequences experience a restriction in potential character states due to specific functional requirements of a nucleotide or secondary structure. Many of these sites have been tested experimentally by point-mutation studies to assess their influence on splicing reaction efficiency. Figure 1 indicates the many sites that are thought to be restricted solely to purines (R) or pyrimidines (Y). Most of these nucleotides are involved in tertiary interactions with other regions of the ribozyme. Other nucleotides are highly conserved in all group II introns and are examples of the "invariable" character in phylogenetic analysis (see Lockhart et al., 1996 ; Steel, Huson, and Lockhart, 2000 ). Two specific cases include the A in the D6 bulge (marked in bold with an asterisk, Fig. 1) and the primary 5' intron nucleotide G (also referred to as the "G1" nucleotide; Holländer and Kück, 1999 ). Both nucleotides are involved in the transesterification reactions that cleave the pre-mRNA substrate, and any change in character state for either site will prevent splicing.

Mikheeva et al. (2000) recently investigated the conservation of the sequential GA nucleotide couplet just upstream of domain III (Fig. 1). The GA couplet is present at this structural location in almost all group II introns, which suggests these nucleotides possess a functional importance. Mikheeva et al. (2000) found that the dinucleotide contributes to the second step in intron splicing reactions and occupies an important spatial position in the ribozyme tertiary structure. Alteration or deletion of the GA led to highly reduced splicing efficiency.

The γγ' interaction must pair canonically (G-C, A-U) or significant reduction in splicing efficiency results (Holländer and Kück, 1999 ). We may therefore infer that any change in one γ nucleotide must be accompanied by a simultaneous and compatible mutation in its γ partner to retain Watson-Crick pairing of these two sites. Site mutations in the D5 loop sequence, GAAA, greatly reduced splicing efficiency (Chanfreau and Jacquier, 1994 ), probably by limiting its ζζ' tertiary interaction with domain I (Costa and Michel, 1995 ; Jacquier, 1996 ).

Group II intron loops consistently demonstrate very high A and T content in comparison to the relatively G/C rich stem nucleotides, a feature reminiscent of loop sequence in group I introns and mitochondrial and nuclear RNAs (e.g., Ballard et al., 1998 ; Gutell et al., 2000 ). Additional character state restrictions that may be present in RNA stems are discussed below (High transition rates in stems).

G-U "wobble" base pairing
The G-U, or "wobble," base pair is a non-Watson-Crick association that is fundamental to nearly every class of RNA, including introns (Varani and McClain, 2000 ). Group II intron folding models invoke a great number of G-U pairings, emphasizing why it is essential to fold intron DNA sequences as RNA transcripts. In their review of G-U pairing and its importance in biological systems, Varani and McClain (2000) list several properties of the pairing that may be invaluable for catalytic RNA. Among these properties are the conformational flexibility that permits "sharp turns" in RNA structures, the positioning of metal ions at active sites, increased electronegativity in the major groove of paired nucleic acid strands to create a recognition site by induced fit or chemical identity, and provision of a thermodynamically viable alternative to canonical base pairing. In group II introns, some G-U pairing can be highly conserved, two examples being the functionally important G-U pair in domain V (Peebles et al., 1995 ; Abramovitz, Friedman, and Pyle, 1996 ; Konforti, Liu, and Pyle, 1998 ; see Fig. 1) and the G-U pairs often surrounding the branch site (the A bulge) in domain VI (Chu et al., 1998 ).

Though less energetically stable, C-A pairing may also be prevalent in group II intron structures. Varani and McClain (2000) suggest that C-A bonding can provide many of the structural features of the G-U pairing. More importantly, perhaps, for mutation dynamics in G2 introns, C-A pairs may be of minimal hindrance in the formation of key stem structures and therefore may not be strongly selected against in certain structural positions.

High transition rates in stems
Stem nucleotide substitutions may be more likely to persist in group II intron sequences if they are transitions. This can be understood in terms of selective constraints conserving stem structure. If stem formation in a ribozyme must be maintained for a functional reason, substitutions occurring in stem nucleotides should only persist if they do not significantly reduce the likelihood of proper stem formation. In nuclear RNA, it has been reasoned that mutation leading to nonpairing nucleotides within a stem must eventually result in compensatory mutation to maintain stem structure (Wheeler and Honeycutt, 1988 ; Dixon and Hillis, 1993 ; Muse, 1995 ; Springer, Hollar, and Burk, 1995 ; Hickson et al., 1996 ). It is proposed that such compensatory mutations happen in a step-wise process over time (Rousset, Pelandakis, and Solignac, 1991 ; Kraus et al., 1992 ; Gatesy et al., 1994 ), which requires the mispairing to persist until a compensatory mutation can take place.

Consider a case, however, in which an RNA structure may be so highly constrained that any mispairing would result in an altered structure and loss of function. For example, the domain V helix of group II introns is nearly invariable in its stem and loop lengths and is an essential pillar in the tertiary folding of the intron ribozyme (Fig. 3). Point mutation studies of this domain reveal a link between precise structure formation and splicing efficiency of the intron (Peebles et al., 1995 ; Holländer and Kück, 1999 ). In such a structure, selection may only tolerate those mutations that maintain immediate pairing after the substitution; in other words, it would not be possible in such cases to achieve compensatory mutation by a stepwise process.



Fig. 3. The unequal effect of transversion and transition substitutions on the energy of structure formation in a highly constrained group II intron helix, D5ii. (A) A transversion of C to G (circled nucleotides) requires a compensatory change in a second nucleotide site to restore proper structural configuration. Compensatory mutation generally requires the helix to persist as an altered, suboptimal structure until pairing can be restored with a second substitution event. In this case, there is a tenfold increase in energy required to fold the sequence (ΔG becomes nearly zero, at –0.6 kcal/mol) after the primary transversion event. (B) A transition at the same site in RNA can maintain relatively favorable folding conditions (in terms of ΔG) due to acceptable noncanonical pairings in RNA (G-U and C-A). In functionally constrained formations such as those of group II introns, alteration of the structure may block ribozyme function. Therefore, transitions would be more likely to persist in these stem structures, perhaps contributing to the high frequency of transition mutations observed in group II intron stem nucleotides

 
The need to preserve an RNA stem in a nearly exact form would allow only four possible mutations to persist, each resulting in an acceptable (G-U, C-A) non-canonical pairing for RNA: G to A, C to T, A to G, and T to C. Notably, each of these substitutions is a transition. Mutations may still occur relatively frequently in functionally constrained secondary structures, but we would expect only transitions to persist in those stems that are under the most intense degree of selection.
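
This reasoning can be checked by enumeration. The Python sketch below takes each nucleotide of a Watson-Crick pair, substitutes every alternative base, and keeps only those changes that leave an acceptable pairing (Watson-Crick, G-U, or C-A). The set of acceptable pairings is taken from the argument above; the function names are hypothetical, and bases are written in RNA form (U for T).

```python
# Pairings treated as acceptable in a tightly constrained RNA stem:
# Watson-Crick pairs plus the G-U and C-A pairs named in the text.
ACCEPTABLE = {frozenset("GC"), frozenset("AU"), frozenset("GU"), frozenset("CA")}
PURINES = {"A", "G"}


def is_transition(old, new):
    """True when both bases are purines or both are pyrimidines."""
    return (old in PURINES) == (new in PURINES)


def persisting_substitutions():
    """Single-site changes to a Watson-Crick pair that leave it pairable."""
    kept = []
    for old, partner in [("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")]:
        for new in "ACGU":
            if new != old and frozenset((new, partner)) in ACCEPTABLE:
                kept.append((old, new))
    return kept


for old, new in persisting_substitutions():
    print(f"{old} to {new}: transition={is_transition(old, new)}")
```

Every one of the eight possible transversions produces a pair outside the acceptable set, so only the four transitions remain.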

If correct, this reasoning provides at least one possible explanation for the low number or total absence of observed compensatory mutations in some group II intron studies (e.g., Laroche and Bousquet, 1999 ; this paper) and the high rate of transitions in intron stems. It also suggests that any RNA sequence demonstrating exceptionally high transition rates may be under intense structural conservation as a stem structure.

Positional rate heterogeneity
As a consequence of variable functional constraints on different structural features, we would expect a certain heterogeneity in mutation rates per site in group II intron sequence data. The phenomenon is referred to as "positional" (Steel and Penny, 2000 ) or "among-site" rate heterogeneity (Kuhner and Felsenstein, 1994 ; Yang, 1994 , 1996 ). Some sites are immutable in G2 introns (discussed above); others, such as nonpairing nucleotides in RNA, may experience particularly high levels of substitution (e.g., Hickson et al., 1996 ; Downie, Katz-Downie, and Watson, 2000 ; and Fig. 2B).

Under a neutral evolution hypothesis, site mutation probabilities are expected to follow a normal distribution. When positional rate heterogeneity is present, site mutation probabilities more closely follow a gamma distribution. The α parameter of likelihood/distance models describes the shape of a gamma distribution function for site-mutation probabilities. A low α value (near zero) indicates a highly skewed distribution of mutation rates and strong positional rate heterogeneity; a higher value describes a more equal mutation probability per site (Yang, 1994 ; Swofford et al., 1996 ).
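
The effect of the shape parameter can be illustrated numerically. The Python sketch below assumes the usual parameterization in which relative site rates are drawn from a gamma distribution with shape α and mean 1, so that the variance among sites is 1/α; the α values, the rate threshold of 0.1, and the function names are arbitrary choices for illustration.

```python
import random


def rate_variance(alpha):
    """Variance of a mean-1 gamma distribution with shape alpha."""
    return 1.0 / alpha


def simulate_site_rates(alpha, n_sites, seed=0):
    """Draw relative site rates from a gamma with shape alpha and mean 1."""
    rng = random.Random(seed)
    # random.gammavariate takes shape and scale; scale 1/alpha gives mean 1.
    return [rng.gammavariate(alpha, 1.0 / alpha) for _ in range(n_sites)]


# Illustrative shape values only: low, moderate, and very high.
for alpha in (0.5, 2.0, 200.0):
    rates = simulate_site_rates(alpha, n_sites=10000)
    slow = sum(rate < 0.1 for rate in rates) / len(rates)
    print(f"alpha={alpha}: variance={rate_variance(alpha):.3f}, "
          f"share of sites with rate < 0.1 = {slow:.2f}")
```

As α grows, the variance among sites shrinks toward zero and nearly all sites share the same rate, which is the sense in which a very large α corresponds to equal mutation probability per site.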

We might expect a certain level of substitution rate heterogeneity between sites in group II intron sequences due to structural constraints that may occasion heterogeneous mutation processes. Interestingly, for the 46 rpl16 intron sequences in Myoporaceae, the α parameter estimate under an HKY85 + Γ model was "infinite." An infinite value for the α parameter signifies that site mutation probability is equivalent for all sites; in other words, the estimation method has not detected significant positional rate heterogeneity, as assessed by a full-sequence estimate of the α parameter under the HKY85 + Γ model in Myoporaceae.

Does this mean a group II intron may have no significant positional rate differences in its sequence? The reality of site mutation rates in a G2 intron may be more complex than it first appears. The author ran the same likelihood analysis (with α parameter estimated under the given model) on the aligned sequences in Table 4 for the conserved regions of rpl16 intron sequences for 21 higher plants. This time, the α parameter estimate was 0.367, indicating significant positional rate heterogeneity in these partitioned sequences from structurally conserved regions. Therefore, at this high taxonomic level across angiosperms, a partition of conserved characters in a group II intron shows a skew in site mutation probabilities that is not detectable in a lower-level analysis of complete intron sequences.


Table 4. Matrix of 185 structure-forming nucleotides of the chloroplast rpl16 intron in 22 species of higher plants. Secondary structure nomenclature above each sequence corresponds with the general group II intron model (Fig. 1). Domains I–VI are designated as D1–D6. The 5' and 3' modifiers refer to relative sequence position in a folded single-strand RNA stem (e.g., di5' and di3' together make up the di stem in domain I). The number of nucleotides separating each structural element is included in brackets between labeled structural features. The ψ symbol indicates the position of the d3 bulge in domain I, which is lacking the α tertiary interaction in the rpl16 intron and can be highly variable in length. Question marks indicate missing data, usually due to the proximity of the F71 primer to the 5' intron boundary

 


 
Apparently, we have more to learn about the nature and possible temporal aspect of positional rate heterogeneity in group II introns (e.g., Sullivan, Holsinger, and Simon, 1996 ). It may be that the infinite parameter value here is a function of the fairly even distribution of informative sites between structural and domain categories in rpl16 intron sequences of Myoporaceae. It may also be a result of assessing the gamma parameter across the entire sequence and not by individual partitions. At present, it may be worth the practitioner's time to carefully assess the value of the α parameter in terms of both the entire sequence and each constituent partition (structural and domain) before incorporating an α parameter estimate in a likelihood analysis of G2 intron sequences.

Linked mutations in G2 introns
Nonindependence of nucleotide characters is plainly as much an issue in group II intron sequences as it is in any sequence underlying a structured molecule (Kjer, 1995 ; Huelsenbeck and Nielsen, 1999 ; Kelchner, 2000 ; Tufféry and Darlu, 2000 ; Felsenstein, 2001 ). Nucleotide sites in conserved intron secondary structures evolve in conjunction with their pairing nucleotides in an RNA stem. In the 46 Myoporaceae taxa of this study, 57% of all categorized nucleotides in each rpl16 intron sequence occurred in stem formations, illustrating the extent of nonindependent characters in a G2 intron sequence.
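
Identifying which sites are linked in this way amounts to recovering the pairing partner of each stem nucleotide from a folded structure. The Python sketch below assumes the structure is supplied in dot-bracket notation (matched parentheses for paired sites, dots for unpaired sites), a common output format of RNA folding programs that is not used elsewhere in this paper; the function name and the toy structure are hypothetical.

```python
def paired_sites(dot_bracket):
    """Map each stem position to its pairing partner in a dot-bracket string."""
    partner = {}
    open_positions = []
    for index, symbol in enumerate(dot_bracket):
        if symbol == "(":
            open_positions.append(index)
        elif symbol == ")":
            if not open_positions:
                raise ValueError("unbalanced structure")
            start = open_positions.pop()
            partner[start] = index
            partner[index] = start
    if open_positions:
        raise ValueError("unbalanced structure")
    return partner


# Hypothetical structure with two hairpins (not a real intron fold).
toy_structure = "((((....))))..((...))"
pairs = paired_sites(toy_structure)
print(f"{len(pairs)} of {len(toy_structure)} sites are paired")
```

Each key in the returned mapping is a site whose character state cannot be treated as independent of the site it maps to.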

Huelsenbeck and Nielsen (1999) discuss a form of character nonindependence that involves correlated mutation events through time; for example, one mutation may increase the likelihood of additional mutations that are linked with the primary event. Temporally correlated mutations that may occur in group II introns include length variation due to increased slipped-strand mispairing activity in a region of numerous adjacent sequence repeat units (Kelchner, 2000 ). Another example, described by Kelchner and Wendel (1996) , is the set of multiple minute inversion events linked with the formation of a hairpin just downstream of helix D1d2 in the rpl16 intron. Both situations may give rise to accelerated rates of mutation due to the presence of a "mutational trigger" (Kelchner and Clark, 1997 ), that is, a specific sequence pattern that increases the likelihood of subsequent mutation events linked to that sequence pattern (see also Graham et al., 2000 ).

Nonindependence of characters in a sequence data set can be somewhat alleviated by the application of complex models of character evolution during phylogeny estimation and evaluation of clade support. Compensating for nonindependent characters in G2 intron sequence analysis is discussed in the section ALIGNMENT AND ANALYSIS.


    TECHNIQUES
 
PCR amplification and sequencing
Polymerase chain reaction amplification of organellar group II introns is usually straightforward. In chloroplast genomes, many G2 introns are 600–1200 residues in length, the main exception being trnK, which still carries its functional maturase ORF, matK. The size of these sequence regions makes them easy to amplify in standard PCR reactions. Because G2 introns are generally employed at moderate or low taxonomic levels, primers are placed in the surrounding exons of the host gene, which generally demonstrate a much higher level of sequence conservation than the intron as a whole.

Highly conserved structural features of a G2 intron may serve as excellent sites for internal primers, such as the 3' primary stem sequence of domain III and the adjacent 5' primary stem sequence of domain IV. Although this is a case of how intron structure can be exploited for more efficient PCR reactions, some phenomena related to intron structure may negatively affect amplification of double-stranded intronic DNA. Lengthy stem structures can form in single-stranded DNA template during the PCR reaction, particularly if the DNA version of the RNA stem is composed solely of canonical pairings (G-C, A-T). If such a stem has a strong (exceptionally negative) ΔG value, this structure in the PCR template could make amplification and sequencing difficult.
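
Whether the DNA version of a stem consists solely of canonical pairings can be checked by comparing one arm with the reverse complement of the other. The Python sketch below does only that; it does not estimate ΔG, and the function names and toy arms are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna):
    """Reverse complement of a DNA string containing only A, C, G, and T."""
    return "".join(COMPLEMENT[base] for base in reversed(dna.upper()))


def is_fully_canonical_stem(five_prime_arm, three_prime_arm):
    """True if the arms pair along their whole length using only G-C and A-T."""
    return reverse_complement(five_prime_arm) == three_prime_arm.upper()


# Hypothetical stem arms, both written 5' to 3'.
print(is_fully_canonical_stem("GGCAT", "ATGCC"))  # every position pairs canonically
print(is_fully_canonical_stem("GGCAT", "ATGTC"))  # one G-T position, a G-U wobble in RNA
```

A stem that relies on G-U pairing in the RNA transcript fails this check in its DNA form, so it is the fully canonical stems that are the more likely candidates for troublesome template structure.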

One may address this difficulty in a similar manner as countering secondary structure-based problems in ITS and other rDNA. Baldwin et al. (1995) suggest using high-temperature PCR and sequencing reactions in such cases to assist in disassociating secondary structures in the template. Dimethyl sulfoxide (DMSO) can also be helpful in limiting structure formation (Winship, 1989 ), as well as formam