(American Journal of Botany. 2002;89:1057-1073.)
© 2002 Botanical Society of America, Inc.


Systematics

Divergent and reticulate species relationships in Leucaena (Fabaceae) inferred from multiple data sources: insights into polyploid origins and nrDNA polymorphism

Colin E. Hughes, C. Donovan Bailey and Stephen A. Harris

Department of Plant Sciences, University of Oxford, South Parks Rd., Oxford, OX1 3RB, UK

Received for publication October 2, 2001. Accepted for publication January 17, 2002.


    ABSTRACT
 
Previous analyses of species relationships and polyploid origins in the mimosoid legume genus Leucaena have used chloroplast DNA (cpDNA) restriction site data and morphology. Here we present an analysis of a new DNA sequence data set for the nuclear ribosomal DNA (nrDNA) 5.8S subunit and flanking ITS 1 and ITS 2 spacers, a simultaneous analysis of the morphology, ITS and cpDNA data sets for the diploid species, and a detailed comparison of the cpDNA and ITS gene trees, which include multiple accessions of all five tetraploid species. Significant new insights into species relationships and polyploid origins, including that of the economically important tropical forage tree L. leucocephala, are discussed. Heterogeneous ITS copy types, including 26 putative pseudogene sequences, were found within individuals of four of the five tetraploid and one diploid species. Potential pseudogenes were identified using two pairwise comparison approaches as well as a tree-based method that compares observed and expected proportions of total ITS variation contributed by the 5.8S subunit optimized onto branches of one of the ITS gene trees. Inclusion of putative pseudogene sequences in the analysis provided evidence that some pseudogenes in allopolyploid L. leucocephala are not the result of post-allopolyploidization gene silencing, but were inherited from its putative diploid maternal progenitor L. pulverulenta.

Key Words: allopolyploid • domestication • Fabaceae • hybridization • Leucaena • nrDNA • pseudogene • rDNA


    INTRODUCTION
 
There is abundant evidence to suggest that interspecific hybridization and polyploidy have been important processes in the evolution of the small mimosoid legume genus Leucaena Benth. The occurrence of polyploidy within Leucaena has long been known from chromosome counts of L. leucocephala (Tijo, 1948; Frahm-Leliveld, 1957; Shibata, 1962; González, Brewbaker, and Hamill, 1967). Subsequent studies have shown that both diploids and tetraploids occur and that within each ploidy level there are two chromosome numbers (2n = 2x = 52 and 2n = 2x = 56; 2n = 4x = 104 and 2n = 4x = 112), with five tetraploid species: L. diversifolia, L. confertiflora, L. involucrata, L. leucocephala, and L. pallida (Pan and Brewbaker, 1988; Palomino, Romo, and Zárate, 1995; Schifino-Wittmann et al., 2000; Cardoso, Schifino-Wittmann, and Bodanse-Zanettini, 2000). Molecular, morphological, and cytogenetic evidence suggests that at least one of these tetraploids, L. leucocephala, is of allopolyploid origin (Harris et al., 1994; Hartman et al., 2000). Secondly, the hybridity and parentage of two spontaneous hybrid species (L. ×mixtec and L. ×spontanea) occurring within the native range of Leucaena in south-central Mexico have been confirmed beyond reasonable doubt based on molecular, morphological, and cytogenetic criteria (Hughes and Harris, 1994, 1998; Hartman et al., 2000). Furthermore, the occurrence of five other putative spontaneous hybrids, which do not match any known species and which occur in areas of species overlap, has been suggested (Hughes, 1998b). Thirdly, analysis of chloroplast DNA (cpDNA) restriction fragment data revealed several instances of incongruence with morphology, suggestive of cpDNA introgression (Harris et al., 1994).
Finally, evidence for potential hybridization is provided by artificial crossing experiments among 15 species that showed that crossability among Leucaena species is high, with 77% of 120 possible two-way interspecific hybridizations producing viable seed (Sorensson and Brewbaker, 1994 ), and by the growing use of artificial hybrids in domestication and breeding of Leucaena as a forage and agroforestry tree (Brewbaker, Sorensson, and Wheeler, 1989 ; Brewbaker and Sorensson, 1990 ; Sorensson, 1995 ; Hughes, 1998b ).

In a recent monographic treatment of the genus, Hughes (1998a) showed Leucaena to comprise 24 species, including two named hybrids, along with six infraspecific taxa. The genus ranges from Texas, in the United States, to central Peru in South America, with the greatest diversity of species in south-central Mexico and northwest Central America. All species are small- to medium-sized trees that grow mainly in seasonally dry deciduous tropical forests and to a lesser extent in semi-arid thorn scrub forest, dry mid-elevation matorral, and, in the north, subtropical or warm temperate habitats. Several species of Leucaena are widely cultivated for the production of livestock feed, green manure, small wood products, and for soil conservation (Pound and Martínez-Cairo, 1983 ; National Academy of Sciences, 1984 ; Brewbaker, 1987 ; Hughes, 1998b ), and one species, L. leucocephala, is pantropically naturalized and weedy (Hughes and Jones, 1999 ), making Leucaena one of the most common and familiar trees of the tropics. Leucaena is also an interesting genus for evaluating ideas about indigenous plant domestication processes. The indigenous use of the unripe seeds of Leucaena species for food in many parts of south-central Mexico has been widely documented (Whitaker and Cutler, 1966 ; Zárate, 1987 , 1994 , 1997 , 1998 , 1999 ; Casas and Caballero, 1996 ; Hughes, 1998b ), although the full extent and implications of this indigenous use in terms of cultivation, translocation, incipient domestication, and spontaneous hybridization, are only now starting to be more fully understood (Hughes, 1998b ; Zárate, 1998 , 1999 ).

The occurrence of hybrids and allopolyploid species with their reticulate as opposed to divergent histories complicates conventional analyses of species relationships. Joint application of cpDNA and nrDNA markers is well suited to unraveling reticulate from divergent relationships (e.g., Soltis, Doyle, and Soltis, 1992 ). The chloroplast genome is usually nonrecombining and uniparentally inherited, making it useful for tracking haplotype lineages and distinguishing maternal from paternal parents. In contrast, nuclear ribosomal DNA (nrDNA) provides recombining, biparentally inherited markers, potentially identifying hybrid origins that may not be revealed by analysis of cpDNA data alone. Previous work to estimate species relationships within Leucaena and understand polyploid origins has relied on morphological data (Zárate, 1994 ; Hughes, 1998a ), with its inherent limitations for detecting reticulations and disentangling them from divergent relationships (McDade, 1990 , 1992 , 1995 ; Rieseberg and Ellstrand, 1993 ; Rieseberg, 1995 ; Hughes, 1998a ), on analysis of cpDNA restriction fragment data (Harris et al., 1994 ), which, given the maternal inheritance of the chloroplast genome in Leucaena (S. A. Harris, unpublished data), are also of limited value as a sole source of evidence for estimating species relationships or detecting hybrids, and on cytological data (Hartman et al., 2000 ). To address this gap, a new DNA sequence data set for the 5.8S subunit and flanking internal transcribed spacer regions (ITS 1 and ITS 2) of nrDNA has been assembled for a substantial subset of the accessions used in Harris et al.'s (1994) cpDNA study.

Unlike the sole use of cpDNA data, nrDNA data alone can provide direct evidence of reticulate evolution if concerted evolution fails to act across the repeat units contributed by different parent species (Doyle, Doyle, and Brown, 1990; Baldwin et al., 1995; Buckler and Holtsford, 1996a; Waters and Schaal, 1996; Hershkovitz, Zimmer, and Hahn, 1999; Zhang and Sang, 1999), and there is a growing number of reports of intraspecific and intra-accession ITS polymorphism potentially attributable to interspecific hybridization (Suh et al., 1993; Sang, Crawford, and Stuessy, 1995; O'Kane, Schaal, and Al-Shehbaz, 1996; Buckler, Ippolito, and Holtsford, 1997; Campbell et al., 1997; Emshwiller and Doyle, 1998; Jobst, King, and Hemleben, 1998; Fuertes-Aguilar, Rosello, and Feliner, 1999; Kuzoff et al., 1999; Vargas et al., 1999; Widmer and Baltisberger, 1999; Gaut et al., 2000). Conversely, there are reports that suggest that concerted evolution has proceeded to homogenize ITS repeat units, even in recent allopolyploids (Wendel, Schnabel, and Seelanan, 1995; Ainouche and Bayer, 1997). However, it is also apparent that detecting divergent repeat types, especially where they occur at low frequencies, may not be straightforward, suggesting that unless specific search strategies are used, divergent repeat types may be missed (e.g., Buckler, Ippolito, and Holtsford, 1997; Lim et al., 2000). Where concerted evolution has proceeded such that only a single copy type is present, direct evidence of hybrid parentage is lost, but nrDNA may still provide important evidence of hybrid parentage when compared with other data (e.g., cpDNA). In this paper we analyze the ITS data and explore the implications of these data in combination with a reanalysis of the cpDNA restriction fragment length polymorphism (RFLP) and morphological data for understanding diploid species relationships and polyploid origins within Leucaena.
We present evidence of divergent ITS paralogues, including putative pseudogenes within accessions of four tetraploid and one diploid species of Leucaena, and discuss what this means for the origins of these species.


    MATERIALS AND METHODS
 
Taxon sampling
All species and infraspecific taxa of Leucaena except for the two hybrid species are represented in both the cpDNA (101 accessions) and ITS (65 accessions) data sets with most taxa represented by multiple accessions. Taxonomy follows Hughes (1998a) . Accessions and taxon authorities are listed in Appendix 1. Accessions within taxa are numbered 1, 2, 3, etc. for ease of reference in the text, and sequences from different clones within accessions are indicated with letters, e.g., 1a, 1b, 1c. Full voucher details and GenBank accession numbers are archived on the American Journal of Botany Supplementary Data Site at http://ajbsupp.botany.org/v89/hughes/hughes-voucher.doc. Two accessions, L. shannonii-4 and L. diversifolia-8, which were subsumed within composite polymorphic species terminals in the earlier cpDNA analysis of Harris et al. (1994) , were excluded from the cpDNA data matrix analyzed here due to likely contamination or sample mix-up problems. This came to light following reanalysis of the cpDNA data, and initial analysis of ITS sequences from the same DNA samples showed discordant placements of both these accessions in both analyses (C. D. Bailey, unpublished data). The ITS sequences based on new DNA isolations from the original herbarium voucher specimens were both placed with other accessions of the same species.

Outgroup selection
Desmanthus fruticosus and Schleinitzia novoguineensis were chosen as outgroups based on a series of recent morphological and DNA sequence analyses (Luckow, 1997; Hughes, 1998a; Luckow, White, and Bruneau, 2000; C. E. Hughes et al., unpublished data) that consistently placed Desmanthus Willd. and Schleinitzia Warburg ex Nevling & Niezgoda as sister groups in a clade that is sister group to Leucaena. These three genera, together with the poorly known monotypic genus Kanaloa Lorence & Wood, form the recently re-circumscribed informal Leucaena group within the tribe Mimoseae (Luckow, 1997; Luckow, White, and Bruneau, 2000). The original cpDNA analysis (Harris et al., 1994) included an additional outgroup, Microlobius foetidus (Jacq.) Sousa & Andrade, but recent analyses (Luckow, White, and Bruneau, 2000) indicate that Microlobius C. Presl. is distantly related, and it is not included in the present study. Inclusion of Microlobius created significant restriction site mapping difficulties in the original cpDNA analysis, prompting adoption of the fragment occurrence analysis approach used in that study (see below). Furthermore, inclusion of ITS sequences of distantly related mimosoid taxa (C. E. Hughes et al., unpublished data) significantly complicates alignment of the more variable regions in ITS 1 and ITS 2 and would have necessitated omission of part of the ITS 1 sequence data from the analysis.

Morphology
The morphological data matrix used here comprises 24 characters and is the same as that presented by Hughes (1998a : Table 5) with the following modifications. Firstly, in the interest of maximizing taxon matching in the combined data set, two outgroup species, Calliandropsis nervosus (Britton & Rose) H. M. Hern. & Guinet and Desmanthus balsensis J. L. Contr., were omitted. This means that the following five characters from the original 29-character morphology data matrix are no longer potentially informative: character 3, brachyblasts present/absent; 8, involucel present/absent; 12, staminodial flowers present/absent; 13, floral bracts peltate or sessile; 27, pod dehiscence. Secondly, new chromosome counts (Cardoso, Schifino-Wittmann, and Bodanse-Zanettini, 2000 ; Schifino-Wittmann et al., 2000 ) have been added to replace data previously missing from the original matrix (character 29 in Hughes, 1998a ).

DNA extraction
DNAs were extracted from fresh leaves of greenhouse-grown plants (from seed), herbarium specimens, or silica-gel-dried samples of field-collected leaf material (details at http://ajbsupp.botany.org/v89/hughes/hughes-voucher.doc). DNA isolation followed the cetyltrimethyl ammonium bromide (CTAB) technique of Doyle and Doyle (1987). Most samples were further purified using caesium chloride gradients (Maniatis, Fritsch, and Sambrook, 1982), and DNAs were resuspended in tris-EDTA (TE) or water and stored at –20°C.

Chloroplast DNA restriction data
The cpDNA restriction site data used in this study were previously reported by Harris et al. (1994). Fourteen 6-base pair (bp) cutters (BamH-I, Bcl-I, Bgl-II, Bsc-I, EcoR-I, EcoR-V, Hind-III, Nru-I, Nsi-I, Pst-I, Pvu-II, Sac-I, Stu-I, and Xho-I) were used to digest total DNA, and the digests were probed with six Vigna Savi chloroplast DNA sequences (MB1, MB2, MB3, MB5 + MB7, MB9, MB11 + MB12; Palmer and Thompson, 1981) for a total of 84 probe-enzyme combinations (listed at: http://ajbsupp.botany.org/v89/hughes/hughes-cpDNA.doc). For the purpose of this study, the original autoradiograms were rescored for the presence/absence of restriction sites (rather than fragments, as previously treated by Harris et al., 1994) to minimize the potential problem of scoring two fragments resulting from one restriction site as independent characters (Bremer, 1991). In the present analysis, all nonidentical accessions were retained as terminals, where previously Harris et al. (1994), with a few exceptions, had treated variation between accessions of a taxon as single polymorphic terminals.

Nuclear ribosomal DNA ITS
Polymerase chain reactions (PCR) were run using Qiagen (Qiagen, Crawley, West Sussex, UK) Taq polymerase (final concentrations: about 1.5 units Taq, 100 µmol/L of each dNTP, 1%[v/v] PCR buffer, and 1%[v/v] Q solution, and 0.5 µmol/L of each primer). Amplifications were performed on a Progene thermocycler (Techne Limited, Cambridge, UK). Several combinations of ITS4/ITS5 (White et al., 1990 ) and 17SE/26SE (Sun et al., 1994 ) primers were used to obtain amplifications from all the taxa of interest. All amplifications began with a 3-min 94°C denaturation step, followed by 35 rounds of (1) 1 min at 94°C denaturation; (2) 1 min annealing at 48°C (primer combinations ITS4 + ITS5 and 17SE + ITS4), or 53°C (primer combination ITS5 + 26SE); and (3) a 1-min 72°C extension. The PCR products were cleaned using the Concert Purification System (Life Technologies, Paisley, UK) or Qiagen Gel Extraction Kits for direct sequencing or cloning. Both strands were sequenced using the PCR primers and "Big Dye" termination chemistry (Applied Biosystems, Warrington, UK). The PCR band polymorphism or "dirty" sequence traces for several templates identified the potential for heterogeneous copy types. These products were cloned (pGEM; Promega, Madison, Wisconsin, USA) using one-half the reaction volume described by the manufacturer. Clones were screened for the presence of an ITS insert using the PCR amplification primers and subsequently sequenced. In order to detect the range of maintained ITS polymorphism within accessions of all three subspecies of L. leucocephala, a more elaborate amplification/restriction digestion procedure using an Acc-I restriction site identified in the 5.8S subunit of some L. leucocephala sequences was required (see below).

Sequence alignment
Sequence fragments were edited and joined into contigs using Sequencher (Gene Codes, Ann Arbor, Michigan, USA). Complete sequences were provisionally aligned using ClustalX version 1.8 (Thompson et al., 1997 ) and then adjusted by eye in WinClada (Nixon, 1999a ). ClustalX default parameters for multiple alignments were changed to a gap opening cost of eight and gap extension cost of six to generate reasonable starting alignments. Contiguous gaps were scored as characters using the "simple gap coding" method formalized by Simmons and Ochoterena (2000) . Sequences are available in GenBank (accession numbers are available on the American Journal of Botany Supplementary Data Site at: http://ajbsupp.botany.org/v89/hughes/hughes-voucher.doc).
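
The gap-coding step can be illustrated with a minimal Python sketch. It reflects our reading of simple gap coding: each gap with distinct 5' and 3' termini becomes one presence/absence character, and a sequence carrying a longer gap that spans the coded gap is scored as inapplicable for it. The function names and the "?" symbol are illustrative assumptions, not part of WinClada or any published program, and terminal runs of missing data are treated here like ordinary gaps for simplicity.

```python
def find_gaps(seq):
    """Return (start, end) pairs (end exclusive) for each run of '-' in seq."""
    gaps, start = [], None
    for i, ch in enumerate(seq):
        if ch == "-" and start is None:
            start = i
        elif ch != "-" and start is not None:
            gaps.append((start, i))
            start = None
    if start is not None:
        gaps.append((start, len(seq)))
    return gaps


def simple_gap_coding(alignment):
    """Code every distinct gap in an alignment as one binary character.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths).
    Returns (all_gaps, codes); codes[name] holds one symbol per gap:
    '1' = gap with identical termini present, '0' = absent,
    '?' = inapplicable (a longer gap in this sequence spans the coded gap).
    """
    per_seq = {name: find_gaps(seq) for name, seq in alignment.items()}
    all_gaps = sorted({g for gaps in per_seq.values() for g in gaps})
    codes = {}
    for name, gaps in per_seq.items():
        row = []
        for start, end in all_gaps:
            if (start, end) in gaps:
                row.append("1")
            elif any(s <= start and end <= e for s, e in gaps):
                row.append("?")
            else:
                row.append("0")
        codes[name] = "".join(row)
    return all_gaps, codes
```

The resulting gap characters would be appended to the nucleotide matrix as additional columns.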

The ITS pseudogenes
A number of ITS studies have reported the occurrence of potentially nonfunctional pseudogene sequences (Buckler, Ippolito, and Holtsford, 1997 ; Yang et al., 1999 ; Hartmann, Nason, and Bhattacharya, 2001 ). Most attempts to distinguish pseudogenes from functional copies have used pairwise comparisons of base pair differences and the occurrence of insertions/deletions (indels) across sequences of the normally highly conserved 5.8S subunit, with 5.8S variability taken as an indicator of nonfunctionality (Buckler, Ippolito, and Holtsford, 1997 ; Hershkovitz, Zimmer, and Hahn, 1999 ; Yang et al., 1999 ). Other criteria, such as stability of secondary structure and substitution rates at methylation sites, were also used by Buckler, Ippolito, and Holtsford (1997) . They concluded that individual criteria may not be sufficient to identify pseudogenes unambiguously and recommended examination of a suite of sequence characteristics.

Here we have attempted to identify putative pseudogenes using two approaches. The first approach used two different types of pairwise comparisons. First, we identified all base pair and indel differences within the 5.8S region between the outgroup Desmanthus fruticosus and all the other sequences. The absolute number of 5.8S base pair differences, or presence of indels, were assessed as indicators of potential pseudogenes. Second, instead of simply counting absolute differences in the 5.8S region, the percentage of total ITS variation contributed by the 5.8S was calculated by dividing the 5.8S contribution by the total number of base pair differences across the ITS region (ITS 1, 5.8S, ITS 2). The observed percentages of the overall ITS region (corrected for length differences due to indels) made up by the 5.8S were then compared to values that would be expected for a relatively unconstrained 5.8S; i.e., if the 5.8S region contributed variation levels similar to the ITS 1 and ITS 2, the sequence was considered to have been released from selective constraints and to be a putative pseudogene.
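
The two pairwise comparisons can be sketched as follows. This is a minimal illustration under stated assumptions rather than the procedure actually used: the 5.8S region is assumed to occupy alignment columns [start, end), positions gapped in either sequence are skipped when counting substitutions, and the "expected" value is taken as the share of comparable positions that lie within the 5.8S. All function names are hypothetical.

```python
def count_differences(ref, seq):
    """Count aligned positions that differ, ignoring positions gapped in either sequence."""
    if len(ref) != len(seq):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(ref, seq) if a != b and a != "-" and b != "-")


def pct_variation_from_58s(ref, seq, start, end):
    """Percentage of all ITS-region differences that fall within the 5.8S columns."""
    total = count_differences(ref, seq)
    if total == 0:
        return 0.0
    in_58s = count_differences(ref[start:end], seq[start:end])
    return 100.0 * in_58s / total


def expected_pct_if_unconstrained(ref, seq, start, end):
    """Percentage of comparable (ungapped) positions that lie within the 5.8S columns.

    If the 5.8S were evolving as freely as ITS 1 and ITS 2, its share of the
    variation should approach this length-based share.
    """
    comparable = [i for i, (a, b) in enumerate(zip(ref, seq)) if a != "-" and b != "-"]
    if not comparable:
        return 0.0
    inside = sum(1 for i in comparable if start <= i < end)
    return 100.0 * inside / len(comparable)
```

A sequence whose observed percentage approaches the expected percentage would be flagged as a putative pseudogene; a functional copy should fall well below it.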

Alongside these two pairwise comparison approaches, a tree-based approach to identify possible pseudogenes was also used. This method is based on the principle that relatively constrained (e.g., 5.8S) and relatively unconstrained (ITS 1 and ITS 2) regions can be compared to identify whether a branch has changed in a manner consistent with a functional or nonfunctional copy. If the region that is typically highly constrained is changing at a rate similar to the relatively unconstrained region, the pattern of substitution is not consistent with functionality. The percentage of variation across the entire sequence contributed by the constrained region should be much less than the representative base contribution, i.e., length, of the constrained region. Thus, if a constrained region such as the 5.8S subunit represents X percent of the total sequence length, we would expect the variation contributed by the constrained region to be roughly X percent for a pseudogene branch and much less than X percent for a branch changing in a manner consistent with function. In this case it was expected that a functional 5.8S region would show a much lower rate of change than the two ITS regions, which are considered to be relatively unconstrained and more freely evolving (although short <26-bp conserved ITS regions have been reported by Liu and Schardl, 1994; Buckler and Holtsford, 1996b; Gernandt and Liston, 1999; Hershkovitz, Zimmer, and Hahn, 1999). The observed percentage of 5.8S change for all branches of ten or more steps was calculated by summing the total number of 5.8S substitutions unambiguously optimized (including autapomorphies) to the branch and then dividing this value by the total number of substitutions (ITS 1, 5.8S, and ITS 2) optimized along the branch (e.g., 12 5.8S substitutions ÷ 45 total substitutions = 27%).
An expected pseudogene percentage for the branch was then calculated by dividing the total length of 5.8S sequence optimized to the branch by the total length of the entire sequence optimized to the branch (e.g., 164 bp 5.8S ÷ 600-bp ITS region = 27% sequence contribution from the 5.8S). Highly constrained, presumably functional, 5.8S regions should be easily detectable from those consistent with lack of function that are changing at rates equivalent to ITS 1 and ITS 2. These pseudogene detection methods are the focus of more detailed discussion elsewhere (C. D. Bailey et al., unpublished data).
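
The observed and expected percentages for a branch can be reproduced with a short Python sketch that uses the worked figures given above (12 of 45 substitutions; 164 bp of 600 bp). The function name, the returned ratio, and the handling of short branches are illustrative assumptions; no numerical cut-off between functional and pseudogene branches is implied.

```python
def branch_summary(subs_58s, subs_total, len_58s, len_total, min_steps=10):
    """Compare observed and expected 5.8S contributions for one branch.

    subs_58s   : 5.8S substitutions unambiguously optimized to the branch
    subs_total : all substitutions (ITS 1, 5.8S, ITS 2) optimized to the branch
    len_58s    : length of the 5.8S region in bp
    len_total  : length of the whole ITS region in bp
    Returns None for branches shorter than min_steps, which are not assessed.
    """
    if subs_total < min_steps:
        return None
    observed = 100.0 * subs_58s / subs_total
    expected = 100.0 * len_58s / len_total
    return {
        "observed": observed,          # share of change contributed by the 5.8S
        "expected": expected,          # share expected if the 5.8S is unconstrained
        "ratio": observed / expected,  # near 1 suggests a pseudogene-like branch
    }


example = branch_summary(12, 45, 164, 600)
print(round(example["observed"]), round(example["expected"]))
```

For the worked example both percentages round to 27, the pattern expected for a branch leading to a putative pseudogene.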

The limited size of the combined 5.8S, ITS 1, and ITS 2 region reduces the scope for statistical testing of any of these comparisons, whether absolute or tree-based. We used t tests to assess the significance of the differences between mean numbers of 5.8S substitutions and percentages of ITS variation from the 5.8S for putative functional and pseudogene copies presented in Table 1.
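
The form of t test is not specified here. As one possible illustration, the sketch below computes Welch's unequal-variance t statistic and its approximate degrees of freedom using only the Python standard library. The choice of Welch's test is our assumption, and the two samples are hypothetical numbers for demonstration, not values from Table 1.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t, df) for Welch's two-sample t test with unequal variances."""
    na, nb = len(sample_a), len(sample_b)
    # Squared standard errors of the two sample means.
    va = variance(sample_a) / na
    vb = variance(sample_b) / nb
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Hypothetical counts of 5.8S substitutions (illustration only).
functional = [1, 2, 3, 4, 5]
pseudogene = [2, 4, 6, 8, 10]
t, df = welch_t(functional, pseudogene)
```

The statistic would then be referred to a t distribution with df degrees of freedom to obtain a P value.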


 
Table 1. Putative nrDNA pseudogenes

 
Data set compatibility and combination for simultaneous analysis
In this study we present the complete cpDNA and ITS gene trees as separate analyses in order to investigate polyploid origins. However, in order to derive a more robust and well-resolved estimate of divergent diploid species relationships we also present a simultaneous analysis of the combined morphology, cpDNA, and ITS data sets. To do this we excluded the five tetraploid species from the combined data matrix in order to circumvent the obvious problems of data set incongruence caused by reticulation. A similar approach was adopted in the separate morphological analysis where blurring of otherwise distinct character states, distortion of species relationships among diploid species, and loss of resolution made inclusion of the tetraploid species problematic (Hughes, 1998a ). The simultaneous analysis thus focuses on relationships amongst the diploid species of Leucaena.

Prior to simultaneous analysis of the combined data, each partition of the combined matrix (i.e., morphology, ITS, and cpDNA) was tested one against another for matrix compatibility using the incongruence length difference (ILD) test (Mickevich and Farris, 1981 ; Farris et al., 1995 ) implemented in WinClada (Nixon, 1999a ). Each of the three pairwise comparisons was made using 1000 random partitions, each analyzed with ten random addition sequences holding one tree per random addition sequence followed by swapping to a maximum of 100 equally most parsimonious trees (program commands: 1000 replicates; mult*10/per replicate; hold/1 per random addition; max*; hold100 per replicate).
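
The arithmetic behind the ILD test can be sketched in Python. The sketch assumes that the tree lengths for the defined and randomly drawn partitions have already been obtained from a parsimony program; it performs no tree searching, and the function names and example lengths are hypothetical.

```python
def ild_statistic(len_combined, partition_lengths):
    """Incongruence length difference: extra steps required when partitions are combined."""
    return len_combined - sum(partition_lengths)


def ild_p_value(observed, random_ilds):
    """Proportion of random partitions at least as incongruent as the defined partitions."""
    hits = sum(1 for d in random_ilds if d >= observed)
    return hits / len(random_ilds)


# Hypothetical lengths: combined tree of 120 steps, partitions of 70 and 45 steps.
observed = ild_statistic(120, [70, 45])
# Hypothetical null distribution from 1000 random partitions.
null = [5] * 232 + [2] * 768
p = ild_p_value(observed, null)
```

With these illustrative numbers, 232 of the 1000 random partitions are at least as incongruent as the defined partitions, giving P = 0.232.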

In practical terms, the combined matrix was constructed using a concatenation approach (Nixon and Carpenter, 1996 ) fusing individual accessions into a single potentially polymorphic terminal representing each species/infraspecific taxon. For example, the L. pulverulenta terminal with 828 characters combined from ITS (708), cpDNA (96), and morphology (24) encompasses character information from all six L. pulverulenta accessions studied, even if an accession was only present in one of the three matrices. Multiple character states for individual characters were scored as subset or full polymorphisms to encompass precisely all the variation observed for a taxon.
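
The fusion of accessions into a single, potentially polymorphic terminal can be illustrated as follows. This is a minimal sketch assuming each accession is a string of single-symbol character states with "?" for missing data; an accession absent from one of the three matrices would simply be padded with "?" for that partition. The bracket notation for polymorphisms and the function names are illustrative assumptions.

```python
MISSING = "?"


def fuse_accessions(rows):
    """Fuse accession rows of one taxon into per-character sets of observed states.

    rows: list of equal-length strings, one per accession.
    An empty set means no accession was scored for that character.
    """
    n_chars = len(rows[0])
    if any(len(row) != n_chars for row in rows):
        raise ValueError("all accessions must have the same number of characters")
    return [{row[i] for row in rows if row[i] != MISSING} for i in range(n_chars)]


def format_terminal(terminal):
    """Write a fused terminal as text, with polymorphisms in square brackets."""
    out = []
    for states in terminal:
        if not states:
            out.append(MISSING)
        elif len(states) == 1:
            out.append(next(iter(states)))
        else:
            out.append("[" + "".join(sorted(states)) + "]")
    return "".join(out)


print(format_terminal(fuse_accessions(["0101", "0111", "01?1"])))
```

Only the states actually observed among the accessions enter each polymorphism, so the terminal encompasses the observed variation and nothing more.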

Phylogenetic analysis
All characters were scored as unordered and equally weighted. Parsimony-based analyses were conducted with NONA (Goloboff, 2000) generated from WinClada (Nixon, 1999a) using 1000 random addition sequences, tree bisection and reconnection (TBR), holding 100 trees per replication, and attempting to swap to completion (program commands: hold/100; mult*1000; max*). Preliminary analysis of the cpDNA data suggested that swapping all equally most parsimonious trees to completion would not be possible. Therefore, the parsimony ratchet (Nixon, 1999b) was also used in an attempt to search a greater portion of tree space than is typically explored using a standard random addition sequence approach. This method involves iterative character weighted and unweighted steps holding few trees per replication in conjunction with many replications to search more efficiently for most parsimonious trees among tree islands (Nixon, 1999b). Ratchets were conducted using NONA (Goloboff, 2000) generated from WinClada (Nixon, 1999a). Following the guidelines presented by Nixon (1999b), 100 iterations per ratchet were performed perturbing 20 (about 20%) of the informative characters (weighted step), constraining 10% of the nodes, and holding one tree per iteration. One hundred of these ratchet replicates were run on the cpDNA matrix.

The strict consensus bootstrap approach, which only considers clades to be supported if they are present in all of the equally most parsimonious trees identified within a replicate, was used here as it provides a more accurate and conservative measure of branch support than the more traditional "within replicate" measures (Davis et al., 1998). One thousand strict consensus bootstrap replicates, each comprising ten random addition sequences and holding 100 trees (program commands: hold/100; mult*10), were spawned from WinClada into NONA.
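
The counting rule of the strict consensus bootstrap can be sketched in Python. The sketch assumes that character resampling and tree searching have already been done elsewhere and that each tree is supplied as a set of clades, with each clade a frozenset of taxon names. The data structure and function name are illustrative assumptions.

```python
from collections import Counter


def strict_consensus_support(replicates):
    """Percentage of replicates in which each clade occurs in every most parsimonious tree.

    replicates: list of replicates; each replicate is a non-empty list of trees;
    each tree is a set of clades (frozensets of taxon names).
    """
    counts = Counter()
    for trees in replicates:
        # A clade counts for this replicate only if all of its trees contain it,
        # i.e., only if it appears in the replicate's strict consensus.
        shared = set.intersection(*(set(tree) for tree in trees))
        counts.update(shared)
    n = len(replicates)
    return {clade: 100.0 * c / n for clade, c in counts.items()}
```

A clade found in only some of the equally most parsimonious trees of a replicate contributes nothing for that replicate, which is what makes the measure conservative.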


    RESULTS
 
Simultaneous diploid analysis
Data set combination remains a controversial issue in systematics. The ILD tests showed that while there is no significant incongruence between the cpDNA and ITS data sets (P = 0.232, i.e., 232 out of 1000 random partitions show the same or greater levels of incongruence than the defined partitions), or between the cpDNA and morphology data sets (P = 0.104), comparison of the ITS and morphology data sets shows significant incongruence (P = 0.013). Strict application of the prior agreement approach (Bull et al., 1993 ; Huelsenbeck, Bull, and Cunningham, 1996 ) relying on ILD test results to arbitrate, would suggest that simultaneous analysis of all three data sets is inadvisable. However, even in the face of "significant" incongruence between data sets we believe that data set combination and simultaneous analysis is still valid (Nixon and Carpenter, 1996 ). In cases where incongruence can be attributed to a specific cause, such as reticulation, then exclusion of taxa or sequences to eliminate that incongruence, as was done in the case of tetraploid species, is justified. To that extent we agree with the prior agreement approach. However, in cases where incongruence cannot be attributed to any specific cause, simultaneous analysis remains the best approach to maximize congruence among different independent sources of data to produce the best-supported hypothesis of species relationships (Nixon and Carpenter, 1996 ). In such circumstances a decision not to combine is not based on any real process partitions but is simply arbitrary (Siddall, 1997 ), especially if, as here, the patterns of incongruence are inconsistent, suggesting that interpretation of ILD test statistics as decision-making criteria for whether partitions should be combined may be misplaced (Baker and DeSalle, 1997 ). 
Furthermore, as currently implemented, the ILD test can confound incongruence and noise, such that significance can be due to high levels of homoplasy in one data partition (Dolphin et al., 2000 ).

Concatenation of the morphology, cpDNA, and ITS data sets into a single combined diploid data matrix and fusion of accessions into 22 single species/infraspecific terminals left 164 potentially informative characters (data matrix available at http://ajbsupp.botany.org/v89/hughes/hughes-matrix1.txt). Standard parsimony analysis identified eight equally most parsimonious trees (length, L = 325; consistency index, CI = 0.60; retention index, RI = 0.70), and the strict consensus is presented in Fig. 1. The combined analysis supports a monophyletic Leucaena (100% bootstrap) with three main clades resolved within the genus. When compared to results of separate analyses of the individual data sets (data not shown), the simultaneous analysis showed greater resolution, especially within Clade 1 (albeit with most subclades only weakly or moderately supported), more robust support for the three main clades, and a novel placement of L. cuspidata (whose position was unstable or unresolved in analyses of the three individual data sets) within Clade 3, albeit this last also with only moderate bootstrap support. Relationships among the three main clades remained unresolved in the combined analysis.



 
Fig. 1. Simultaneous analysis of the combined morphology, cpDNA, and ITS data sets for diploid species of Leucaena (Fabaceae). Strict consensus of eight equally most parsimonious trees: L = 325, CI = 0.60, RI = 0.70. Strict consensus bootstrap values are given above each branch.

 
Polyploid analyses
cpDNA
One hundred and thirty-four restriction site characters (listed at http://ajbsupp.botany.org/v89/hughes/hughes-cpDNA.doc), 96 of which were potentially informative, were scored for the 101 accessions in the cpDNA matrix. Fifty-five of the accessions shared identical character state distributions with one or more accessions and these were fused into single terminals prior to analysis, giving a final matrix of 60 terminals (data matrix available at http://ajbsupp.botany.org/v89/hughes/hughes-matrix2.txt).
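
The fusion of accessions with identical character state distributions can be sketched as a grouping step. This is a minimal illustration, assuming each accession is represented by a string of character states; the accession labels and state strings below are hypothetical and do not come from the cpDNA matrix.

```python
from collections import defaultdict
from typing import Dict, List


def fuse_identical(matrix: Dict[str, str]) -> Dict[str, List[str]]:
    """Group accessions that share an identical character state string.

    Returns a mapping from each distinct state string to the list of
    accessions fused into that single terminal."""
    terminals: Dict[str, List[str]] = defaultdict(list)
    for accession, states in matrix.items():
        terminals[states].append(accession)
    return dict(terminals)


# Hypothetical five-accession matrix of presence/absence characters.
toy_matrix = {
    "acc-1": "0110",
    "acc-2": "0110",
    "acc-3": "1110",
    "acc-4": "0011",
    "acc-5": "1110",
}
fused = fuse_identical(toy_matrix)
print(len(fused), "terminals from", len(toy_matrix), "accessions")
```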

Standard parsimony analysis was interrupted when 105 000 trees (L = 177, CI = 0.54, RI = 0.85) had been discovered. From the 100 parsimony ratchet runs, 4897 (of 20 000 saved) additional equally most parsimonious trees were found (L = 177). The strict consensus trees calculated from these two analyses were identical (Fig. 2), suggesting that additional searching would be unlikely to identify shorter trees or cause further loss of resolution.
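
The comparison of strict consensus trees from the two searches rests on a simple idea: a strict consensus retains only those clades present in every tree. A minimal sketch follows, assuming each tree is represented as a set of clades and each clade as a frozenset of terminal names; this representation and the toy trees are illustrative assumptions, not the software used for the analyses.

```python
from typing import FrozenSet, Iterable, Set

Clade = FrozenSet[str]


def strict_consensus(trees: Iterable[Set[Clade]]) -> Set[Clade]:
    """Return the clades shared by every tree in the collection."""
    tree_list = list(trees)
    if not tree_list:
        raise ValueError("at least one tree is required")
    shared = set(tree_list[0])
    for tree in tree_list[1:]:
        shared &= tree
    return shared


# Two toy trees on terminals A-D that disagree within the (A, B, C) clade.
tree_1 = {frozenset("AB"), frozenset("ABC")}
tree_2 = {frozenset("BC"), frozenset("ABC")}
consensus = strict_consensus([tree_1, tree_2])
print(sorted("".join(sorted(clade)) for clade in consensus))
```

Two sets of trees yield identical strict consensus trees when the clade sets returned for each are equal, which is the check described for the standard and ratchet searches.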



Fig. 2. The cpDNA gene tree for 101 accessions of Leucaena (Fabaceae) species. Strict consensus of 105 000 equally most parsimonious trees: L = 177, CI = 0.54, and RI = 0.85. Strict consensus bootstrap values are given above their respective branches. Accessions of tetraploid species are marked with an asterisk. The numbers following taxon names refer to different accessions of that taxon as listed in Appendix 1 with full details at http://ajbsupp.botany.org/v89/hughes/hughes-voucher.doc.

 
Three main clades with moderate or, in the case of Clade 3, high bootstrap support were resolved in the cpDNA analysis (Fig. 2). However, the analysis was uninformative with respect to relationships among these three groups. In addition, the relationships of L. cuspidata, L. matudae, and a clade comprising L. greggii and L. retusa were unresolved with respect to the three major clades. There is variable resolution within clades, but Clade 1, which contains 10 out of 17 of the diploid species, is virtually completely unresolved beyond grouping accessions within species.

These results are largely consistent with the earlier cpDNA analysis of Harris et al. (1994). However, beyond the substantial revision of names used by Harris et al. (1994), there are also a number of minor differences in resolution and placement of a few taxa. Differences of this magnitude are to be expected given the differences in methods used, i.e., fragment occurrences rescored as presence/absence of mapped restriction sites, treatment of all accessions as separate terminals rather than as single composite polymorphic terminals, and removal of a number of putative hybrid accessions as well as L. shannonii–4 and L. diversifolia–8, which represented probable contaminant DNAs. Placement of the tetraploid species largely mirrors that found in the earlier cpDNA gene tree of Harris et al. (1994), with strong bootstrap support (98%) for Clade 3. Within Clade 3, the two tetraploid species L. diversifolia and L. leucocephala were placed in a group with the single diploid species L. pulverulenta, while L. pallida was placed in Clade 2 as sister to L. pueblana. However, one important difference in this analysis was the placement of L. involucrata in Clade 2 with moderate 73% bootstrap support.

nrDNA ITS
In the early rounds of sequencing, 35 Leucaena accessions produced clean, readily readable traces with few or no polymorphisms. However, preliminary sequencing from about 15 accessions of four of the five tetraploid species, L. confertiflora (both subspecies), L. involucrata, L. leucocephala (all three subspecies), L. pallida, as well as diploid L. pulverulenta, and Schleinitzia novoguineensis, produced "dirty" or overlapping traces, suggesting that heterogeneous ITS arrays might be present (e.g., Baldwin et al., 1995; Buckler, Ippolito, and Holtsford, 1997; Hershkovitz, Zimmer, and Hahn, 1999). These PCR products were cloned and resequenced. Intra-accession size variation between clones warranted sequencing of the two or more size classes and at least two clones were sequenced from each accession. Approximately 15 accessions added in the later stages of the study were cloned without preliminary sequencing of PCR products to further sample potential ITS diversity.

Preliminary analysis of cloned sequences of the three allopolyploid L. leucocephala subspecies identified two divergent ITS types. Eight L. leucocephala accessions grouped with what was then a single accession of L. pulverulenta-6, a result that was in line with the maternally inherited cpDNA restriction site analysis (Fig. 2). However, a single highly divergent clone from L. leucocephala ssp. leucocephala-1 grouped with L. lanceolata ssp. sousae-1. Six of the eight L. leucocephala sequences in the core L. leucocephala clade included an Acc-I restriction site in the 5.8S subunit that was absent from all other ITS haplotypes. In order to explore whether other ITS types might be found in other L. leucocephala accessions, 10–20 clones from each of five accessions (L. leucocephala ssp. glabrata-5,6; L. leucocephala ssp. ixtahuacana-1,2; and L. leucocephala ssp. leucocephala-3) were screened for the presence/absence of the Acc-I restriction site. A single clone from L. leucocephala ssp. leucocephala-1 lacked the Acc-I site. The precloning PCR products from these accessions were digested to clarify whether other types might have been amplified; all showed little or no uncut PCR product. Alternative PCR strategies and cocktails were explored but never produced significant amplification of the uncut type in the presence of the Acc-I-digestible type. To further investigate the possible extent of polymorphism, genomic DNAs from accessions of each of the L. leucocephala subspecies were then digested to remove haplotypes containing the Acc-I site as potential PCR templates. Given that the Acc-I restriction enzyme is methylation insensitive, digestion should have removed virtually all Acc-I types and would thus not tend to introduce any bias towards methylated haplotypes. The cleaved genomic DNAs were re-amplified, cloned, and further screened with Acc-I. Subsequent sequencing of clones lacking the Acc-I restriction site identified four additional L. leucocephala sequences representing each of the subspecies that grouped with L. lanceolata ssp. sousae-1 outside the previously identified core L. leucocephala group.
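
The presence/absence screening described above was done by restriction digestion, but the same question can be asked of sequenced clones in silico. The sketch below assumes the commonly documented Acc-I recognition sequence GTMKAC (M = A or C, K = G or T); that recognition sequence is not stated in the text, and the clone sequences shown are hypothetical fragments, not Leucaena data.

```python
import re

# Assumed Acc-I recognition sequence GTMKAC, written as a regular expression.
# The degenerate site is its own reverse complement, so one strand suffices.
ACC_I_SITE = re.compile(r"GT[AC][GT]AC")


def has_acc_i_site(sequence: str) -> bool:
    """Return True if the sequence contains at least one Acc-I site."""
    return ACC_I_SITE.search(sequence.upper()) is not None


# Hypothetical clone fragments.
clones = {
    "clone-a": "AAGTCGACTT",  # contains GTCGAC
    "clone-b": "AAGTTGACTT",  # GTTGAC does not match GTMKAC
}
for name, seq in clones.items():
    print(name, "Acc-I site present" if has_acc_i_site(seq) else "Acc-I site absent")
```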

Internal transcribed spacer variation was also detected within accessions of three other tetraploids, L. confertiflora, L. involucrata, and L. pallida, as well as in the diploid species L. pulverulenta in the initial round of cloning and sequencing.

A total of 87 ITS sequences from 65 accessions were generated for the ITS analysis. Alignment and indel coding were relatively straightforward. The final matrix included 671 aligned bases representing 309 potentially informative substitution characters and 37 potentially informative gap characters (data matrix available at http://ajbsupp.botany.org/v89/hughes/hughes-matrix3.txt). Standard parsimony analysis swapped to completion and identified 3618 equally most parsimonious trees (L = 885, CI = 0.54, RI = 0.85); the strict consensus of these trees is presented in Fig. 3.



Fig. 3. The nrDNA ITS gene tree for 65 accessions (87 sequences) of Leucaena (Fabaceae) species. Strict consensus of 3618 equally most parsimonious trees: L = 885, CI = 0.54, and RI = 0.85. Strict consensus bootstrap values are given above each respective branch. Values below selected branches represent strict consensus bootstrap values for the respective clade identified in the analysis excluding potential pseudogene sequences (see DISCUSSION). Sequences of tetraploid species are marked with an asterisk. The numbers following taxon names refer to different accessions of that taxon as listed in Appendix 1 with full details at http://ajbsupp.botany.org/v89/hughes/hughes-voucher.doc. The suffixes (a, b, c) after accession numbers indicate sequences of different clones from a single individual.

 
The ITS gene tree was more resolved than the cpDNA tree and, with the exception of a few unresolved taxa and accessions, the three main clades in the ITS strict consensus tree (Fig. 3) are common to the combined diploid and cpDNA analyses (Figs. 1 and 2). Resolution among these three clades was weakly supported in the ITS analysis, and the lack of resolution in Clade 1 mirrors the cpDNA result.

One unexpected feature of the ITS analysis is the strongly supported (100% bootstrap; six unique synapomorphies) grouping of two of the three accessions of L. collinsii ssp. zacapana with the two accessions of L. magnifica. The third accession of L. collinsii ssp. zacapana is placed in a clade with L. collinsii ssp. collinsii and L. trichandra.

Pseudogenes
In both of the pairwise comparisons used to identify pseudogenes, all other sequences were compared against the Desmanthus fruticosus outgroup sequence. This assumes that the Desmanthus sequence itself represents a functional ITS copy type. Several observations suggest that this assumption is reasonable. Firstly, the Desmanthus 5.8S sequence differs from previously published sequences in GenBank across divergent eudicot families including Fabaceae, Scrophulariaceae, Araliaceae, Lythraceae, and Solanaceae, by only one or two base pairs. Secondly, maximally divergent putative functional copies of Leucaena (identified by comparison with D. fruticosus) also differed from members of these same families by at most two base pairs. In contrast, minimally divergent putative 5.8S pseudogenes differed from their closest matching previously published GenBank sequences by at least ten nucleotides across the 5.8S region (maximally divergent types differed by as many as 23 sites). The marked discrepancies between low sequence variation in presumed functional copies across widely divergent taxonomic groups and high variation in presumed pseudogenes support the functionality hypothesis for the Desmanthus sequence.

Based on the assumption that the Desmanthus sequence represents a functional type, simple pairwise comparisons of all 5.8S sequences identified 26 potential pseudogene sequences from the 87 ITS sequences (Table 1; Fig. 4). The division between putatively functional and nonfunctional types was a discrete one given this measure. Putative functional copies differed from Desmanthus by a maximum of 5 bp, while sequences with 11–20 differences were interpreted as potential pseudogenes, and the difference between the mean values for putative functional and pseudogene copies is highly significant (Table 1). The discrepancies between functional and nonfunctional types are further exaggerated by the occurrences of multibase deletions (13–31 bp) in the 5.8S subunit in four of the presumed pseudogene sequences (Table 1). Deletions from the highly constrained functional 5.8S subunit are often taken as an indicator of lack of function (e.g., Buckler, Ippolito, and Holtsford, 1997).
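
The first pairwise comparison can be sketched as a count of differing positions between each aligned 5.8S sequence and the outgroup, followed by classification against the ranges reported above (at most 5 differences for putative functional copies, 11 or more for potential pseudogenes). The treatment of gap positions as ordinary mismatches and the short sequences below are assumptions made for illustration only.

```python
def count_differences(seq_a: str, seq_b: str) -> int:
    """Count aligned positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def classify_copy(n_differences: int,
                  functional_max: int = 5,
                  pseudogene_min: int = 11) -> str:
    """Classify a 5.8S copy by its number of differences from the outgroup."""
    if n_differences <= functional_max:
        return "putative functional"
    if n_differences >= pseudogene_min:
        return "potential pseudogene"
    return "unassigned"


# Hypothetical aligned fragments (not real 5.8S sequence data).
outgroup = "ACGTACGTACGTACGT"
clone = "ACGTACGAACGTACGT"
n = count_differences(outgroup, clone)
print(n, classify_copy(n))
```

No observed sequence fell between the two ranges, so the "unassigned" category is included only to keep the sketch complete.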



Fig. 4. One of the equally parsimonious ITS gene trees of Leucaena (Fabaceae) species (for which the strict consensus is shown in Fig. 3) showing putative ITS pseudogene clades identified using a tree-based approach. Names of clades containing pseudogene sequences and their included terminals are in bold large font. The percentage of variation provided by the 5.8S subunit relative to the overall pairwise divergence is shown for each branch of ten steps or longer. Figures above branches are observed/expected 5.8S percentages of overall variation. Figures below branches are the actual number of 5.8S substitutions optimized on that branch/overall ITS region substitutions (i.e., branch length). Values in italics indicate nonpseudogene lineages while values in bold indicate pseudogene lineages. Sequences of tetraploid species are marked with an asterisk. The numbers following taxon names refer to different accessions of that taxon as listed in Appendix 1 with full details at http://ajbsupp.botany.org/v89/hughes/hughes-voucher.doc. The suffixes (a, b, c) after accession numbers indicate sequences of different clones from a single individual.

 
The second pairwise comparison method (5.8S differences relative to all differences across the complete ITS region) provided a less clear-cut but still highly significant distinction between pseudogenes and functional copies. Presumed functional copies (identified using the first method) showed 2.7–7.5% pairwise divergence contributed by their 5.8S subunits, while presumed pseudogenes showed 9.5–18.5% variation from the 5.8S (Table 1). No putatively nonfunctional ITS types reached the level of 5.8S variation that would be expected for a pseudogene, i.e., that it was evolving at a rate similar to the flanking ITS regions (Table 1). Although less clear-cut than the absolute 5.8S comparisons, these results do not conflict with those comparisons, i.e., all putative pseudogenes had >9.5% divergence and all putative functional copies had <7.5% divergence.
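
The second measure is a ratio: the number of 5.8S differences as a percentage of all differences across the complete ITS region for the same pairwise comparison. A minimal sketch follows; the difference counts are hypothetical and are chosen only to fall inside the two reported ranges.

```python
def percent_from_58s(differences_58s: int, differences_total: int) -> float:
    """Percentage of total pairwise ITS-region differences that fall
    within the 5.8S subunit."""
    if differences_total <= 0:
        raise ValueError("total differences must be positive")
    return 100.0 * differences_58s / differences_total


# Hypothetical comparisons against the outgroup sequence.
print(percent_from_58s(3, 60))   # within the 2.7-7.5% range of functional copies
print(percent_from_58s(12, 80))  # within the 9.5-18.5% range of pseudogenes
```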

The tree-based approach to distinguishing functionality identified four clades containing potential pseudogenes whose constituent sequences correspond to those identified in the pairwise comparisons (Fig. 4). The observed percentage divergences for the 5.8S regions of the putative nonfunctional copies closely match the expected values for a pseudogene sequence (assuming equal rates of change across the entire ITS region) in three of the four pseudogene clades (Fig. 4). One subclade of pseudogene clade D (L. leucocephala ssp. leucocephala-3b and 1c) showed a lower than expected level of variation.
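
The branch-by-branch comparison can be sketched as follows. The sketch assumes that the expected 5.8S percentage under equal rates is the fraction of aligned sites belonging to the 5.8S subunit, and that branches shorter than ten steps are skipped, as in Fig. 4; the exact calculation of the expected value, the region lengths, and the branch counts below are assumptions for illustration and are not taken from Table 1 or Fig. 4.

```python
from typing import Dict, Tuple


def branch_58s_report(
    branches: Dict[str, Tuple[int, int]],
    length_58s: int,
    length_total: int,
    min_steps: int = 10,
) -> Dict[str, Tuple[float, float]]:
    """For each sufficiently long branch, return (observed %, expected %)
    of substitutions falling in the 5.8S subunit.

    branches maps a branch label to (5.8S substitutions, total steps)."""
    expected = 100.0 * length_58s / length_total
    report = {}
    for label, (subs_58s, steps) in branches.items():
        if steps < min_steps:
            continue  # too few steps for a meaningful percentage
        report[label] = (100.0 * subs_58s / steps, expected)
    return report


# Hypothetical branches: label -> (5.8S substitutions, branch length).
toy_branches = {"branch-x": (5, 20), "branch-y": (1, 8), "branch-z": (2, 40)}
report = branch_58s_report(toy_branches, length_58s=25, length_total=100)
for label, (observed, expected) in report.items():
    print(f"{label}: observed {observed:.1f}% / expected {expected:.1f}%")
```

A branch whose observed percentage approaches the expected value behaves like unconstrained sequence, whereas a much lower observed value is consistent with a functionally constrained 5.8S subunit.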

Accurate detection of potential pseudogene sequences is important in order to be able to assess how comprehensively nrDNA diversity has been sampled. We have detected and sequenced no functional nrDNA copies for three out of the five L. pulverulenta accessions, three out of the nine L. leucocephala accessions, and one out of five L. confertiflora accessions suggesting that ITS diversity still remains undersampled in this study. However, all taxa except L. leucocephala ssp. leucocephala are represented by at least one putatively functional nrDNA copy.


    DISCUSSION
Diploid species relationships
Simultaneous analysis of the combined morphology, cpDNA, and ITS data sets provided a more robust and resolved hypothesis of diploid species relationships (Fig. 1) than separate analyses (results not shown) of any of the individual data partitions. There was strong support for three major clades within Leucaena, although the placement of L. cuspidata within Clade 3 had only moderate support. None of the three data sets, nor the simultaneous analysis, provided any support for the two sections Macrophylla and Leucaena formally designated by Zárate (1994) , nor for recognition of L. greggii and L. retusa as segregate genera as advocated by Britton and Rose (1928) , corroborating previous discussion of the inadequacies of these classification schemes (Hughes, 1998a ). The three clades recognized within Leucaena are clearly correlated with geography; Clade 3 species are restricted to northeast Mexico and Texas (north of the volcanic axis and east of the Chihuahuan desert), Clade 2 species (excluding a few probably cultivated outliers of L. esculenta; see Hughes, 1998a for discussion) occur at mid-elevation inland sites in south-central Mexico, south of the volcanic axis and north of the Tehuantepec Isthmus, and Clade 1 species comprise a larger group that occurs in seasonally dry tropical forest in Pacific coastal/southern Mexico, and Central and South America. Lack of resolution among these three clades precludes more detailed biogeographic interpretation.

The great morphological diversity among diploid species within Clade 1 stands in contrast to the marked lack of molecular variation and resolution within this clade in the cpDNA and ITS gene trees (Figs. 2 and 3). The Clade 1 diploid species (Fig. 1) encompass the full range of quantitative leaf diversity and all three pollen types found within the genus as a whole, as well as diverse flowering shoot and anther gland types (Hughes, 1998a), but the separate cpDNA and ITS gene trees provide minimal resolution within Clade 1. Comparable examples of morphological change outstripping molecular change have been found in other groups (e.g., Oxalis, Emshwhiller and Doyle, 1998; Aframomum, Harris et al., 2000). This means that the resolution within Clade 1 in the simultaneous analysis, albeit much of it weakly supported, is mainly provided by the morphological data, and the Clade 1 topology (Fig. 1) mirrors that found in the analysis of morphology alone (Hughes, 1998a; data not shown). The relevance of the simultaneous analysis including morphology is also demonstrated by the strong support for inclusion of L. matudae in Clade 2 and L. greggii and L. retusa in Clade 3 (Fig. 1), which contrasts with their unresolved placement in the cpDNA gene tree (Fig. 2) and weakly supported placement in the ITS gene tree (Fig. 3).

Leucaena magnifica and L. collinsii ssp. zacapana
The unexpected placement of two accessions of L. magnifica in a strongly supported group with two accessions of L. collinsii ssp. zacapana in the ITS gene tree is not mirrored in the cpDNA analysis, where L. magnifica was placed in a weakly supported group with L. shannonii and several accessions of L. trichandra. In the morphological analysis (data not shown), L. magnifica was placed in a clade with L. shannonii, L. salvadorensis, and L. lempirana, while in the combined analysis (Fig. 1) it was placed as sister to L. shannonii. Since its discovery in 1984, L. magnifica has always been considered to be either a sister species or, as originally described, a subspecies of L. shannonii (Hughes, 1991 , 1998a ; Harris et al., 1994 ). However, two other studies provide evidence suggesting a possible association between L. magnifica and L. collinsii ssp. zacapana. Firstly, isozyme studies by Chamberlain (1993) showed that one population of L. magnifica shared isocitrate dehydrogenase (IDH) isozyme patterns with L. collinsii ssp. zacapana that were not present in other nearby L. magnifica populations, nor parapatric populations of L. shannonii. Secondly, Harris (1995) presented an analysis of RAPD data that grouped L. collinsii ssp. zacapana with L. magnifica (referred to in that study as L. shannonii ssp. magnifica). Taken together these data suggest possible gene exchange between L. magnifica and L. collinsii ssp. zacapana. The distributions of these two taxa confirm that gene exchange between them is a possibility. Leucaena magnifica is endemic to a small area in the Department of Chiquimula in southeast Guatemala, adjoining the distribution of L. collinsii ssp. zacapana, which is endemic to the Motagua Valley system (distribution maps in Hughes, 1998a ). Populations of the two taxa occur in close proximity to each other near the villages of Ipala, San Jose La Arada, and El Carrizal 10–20 km south of Chiquimula. 
Further work to improve population sampling and resolution and support within Clade 1 is needed to shed light on the potentially reticulate relationships of L. magnifica.

Internal transcribed spacer polymorphism
Identification of heterogeneous intra-individual nrDNA arrays is a critical issue for understanding ITS gene trees and has important implications for inferring species phylogenies (Sanderson and Doyle, 1992 ; Buckler, Ippolito, and Holtsford, 1997 ; Denduangboripat and Cronk, 2000 ). Early ITS studies rarely detected multiple types within individuals (Baldwin et al., 1995 ), even in allopolyploids (e.g., Wendel, Schnabel, and Seelanan, 1995 ; Ainouche and Bayer, 1997 ; Yang et al., 1999 ). However, reports of ITS diversity within genomes are now much more common (e.g., Suh et al., 1993 ; Sang, Crawford, and Stuessy, 1995 ; O'Kane, Schaal, and Al-Shebaz, 1996 ; Campbell et al., 1997 ; Emshwhiller and Doyle, 1998 ; Jobst, King, and Hemleben, 1998 ; Fuertes-Aguilar, Rosello, and Feliner, 1999 ; Kuzoff et al., 1999 ; Vargas et al., 1999 ; Widmer and Baltisberger, 1999 ; Gaut et al., 2000 ), suggesting that such variation may be the rule rather than the exception (Buckler, Ippolito, and Holtsford, 1997 ; Hershkovitz, Zimmer, and Hahn, 1999 ). At the same time, it is increasingly clear that detection of intra-accession ITS variation may not always be straightforward. The effects of PCR selection, PCR drift, secondary structure, and copy number (e.g., Baldwin et al., 1995 ; Buckler, Ippolito, and Holtsford, 1997 ; Hershkovitz, Zimmer, and Hahn, 1999 ; Lim et al., 2000 ) mean that direct sequencing of pooled PCR amplification products, and even Southern analysis, may fail to detect ITS diversity within genomes (e.g., Volkov et al., 1999 ). Our attempts to sample ITS diversity within accessions of the tetraploid L. leucocephala, for which an elaborate amplification/restriction digestion procedure was needed to identify the range of maintained polymorphism, bear this out. 
Other recent studies have also revealed some of the complexities and difficulties associated with sampling ITS diversity (Buckler, Ippolito, and Holtsford, 1997 ; Lim et al., 2000 ; Hartmann, Nason, and Bhattacharya, 2001 ), suggesting that directed strategies using more sensitive techniques, such as those used here, are needed. Given that most nrDNA studies have not used such strategies (e.g., Kovarík et al., 1996 ; Hershkovitz, Zimmer, and Hahn, 1999 ; Volkov et al., 1999 ; Yang et al., 1999 ), negative results, especially for known polyploids, need to be interpreted with caution.

With few exceptions, heterogeneous intra-individual ITS arrays have been associated with polyploidy or multiple nucleolar organizing regions (NORs) (e.g., Campbell et al., 1997 ; Hershkovitz, Zimmer, and Hahn, 1999 ). The question remains why such polymorphisms persist in the face of concerted evolution, which in many cases, and even for some apparently recently derived allopolyploids (e.g., Wendel, Schnabel, and Seelanan, 1995 ; Ainouche and Bayer, 1997 ; Yang et al., 1999 ), appears to be highly effective across ITS homeologues (Baldwin et al., 1995 ). Internal transcribed spacer polymorphisms may persist when concerted evolution is not fast enough to eliminate different repeat types in the face of high rates of mutation or gene flow/migration (e.g., Hartmann, Nason and Bhattacharya, 2001 ) or recent interspecific hybridization (Campbell et al., 1997 ). Concerted evolution has been suggested to proceed faster within than between rDNA loci (Arnheim, 1983 ; O'Kane, Schaal, and Al-Shebaz, 1996 ; Campbell et al., 1997 ; Hershkovitz, Zimmer, and Hahn, 1999 ), so for at least some allopolyploids, concerted evolution may proceed independently in each of the parental genomic contributions (Suh et al., 1993 ), allowing two different nrDNA types to persist for longer than if the species was a typical diploid (Campbell et al., 1997 ). Concerted evolution can also be disrupted due to loss of sexual recombination or to location of nrDNA loci on nonhomologous chromosomes (Campbell et al., 1997 ).

Obviously one or more of these mechanisms may be involved in maintaining ITS polymorphism in Leucaena. With the exception of the diploid L. pulverulenta, all the Leucaena ITS polymorphisms occur in polyploids, suggesting that maintained polymorphisms were mostly associated with mechanisms linked to polyploidy (Thompson and Lumaret, 1992 ; Soltis and Soltis, 1999 ). There is circumstantial ethnobotanical and biogeographic evidence to suggest that L. leucocephala may be a recent hybrid (Hughes, 1998a ) and therefore that concerted evolution within this species has simply not reached completion. However, this alone is unlikely to explain the ITS diversity identified within all the polyploid Leucaena species or even all the variation found within L. leucocephala. The ITS polymorphism for L. leucocephala that exists within Clade 2 is most parsimoniously interpreted as having been passed to L. leucocephala from one of its diploid parents, L. pulverulenta (further discussion below). In this case, multiple NORs on nonhomologous chromosomes in L. pulverulenta, and its allopolyploid derivative, are the most likely cause of the maintained polymorphism. Cytogenetic evidence to support this hypothesis was found by Hartman et al. (2000) , who identified six major and two minor NORs in L. leucocephala.

For the other Leucaena polyploids there is neither a clear indication of recent origin, nor of an obvious pattern of maintained polymorphism derived from a diploid progenitor. In these cases, and for polymorphism identified in L. leucocephala that was not obviously derived from a diploid progenitor, persistence of ITS variation is more likely attributable to limited parental genome interaction in the combined genomes of these polyploid Leucaena species.

Pseudogene identification
The discovery of intra-accession ITS polymorphism raised the possibility that some sequences represent nonfunctional nrDNA pseudogenes (e.g., Buckler, Ippolito, and Holtsford, 1997 ). The two pairwise tests used here agreed with respect to which sequences represent potential pseudogenes. However, the second test, based on the relative contribution of 5.8S variation, was less decisive because the variation across the putative pseudogene 5.8S subunit was somewhat lower than would be expected for relatively unconstrained variation (Table 1). Buckler, Ippolito, and Holtsford (1997) , using Kimura distances, observed similarly lower than expected levels of 5.8S variation among putative pseudogenes in Gossypium, Nicotiana, Tripsacum/Oryza, Winteraceae, and Zea. They suggested two possible explanations for these discrepancies. First, they point out that when two functional nrDNA arrays diverge (within a genome), the ITS regions will diverge faster than the 5.8S subunit, until functionality is lost, as discussed by Baldwin et al. (1995) . Second, they observed that the base composition substitution model for the ITS vs. 5.8S comparisons might be too simple.

The tree-based approach presented here for characterizing pseudogenes identified four clades containing potential pseudogenes. These included all the putative pseudogenes identified by the pairwise comparisons (Fig. 4; Table 1). The percentage of variation from the 5.8S subunit was calculated on all branches longer than ten steps. Shorter branches were not considered because the level of variation was too small to provide a meaningful comparison. What is striking is that nearly all pseudogene branches analyzed in this way showed levels of variation in their 5.8S region close to that expected for ITS 1 and ITS 2, contrary to the comparable pairwise comparison method. Thus, the tree-based approach removed from consideration the possibility that ITS 1 and ITS 2 variation, prior to silencing, might be confounding estimates of relative 5.8S variation in those putative pseudogenes (Buckler, Ippolito, and Holtsford, 1997 ).

One pseudogene lineage (the L. leucocephala ssp. leucocephala-1b and 3c subclade in pseudogene Clade D) showed a lower percentage of 5.8S variation than would be expected for a pseudogene. Given that this branch has a length of 46 steps, this result is unlikely to be due to random bias caused by short branch length. Base substitution model discrepancies, as suggested by Buckler, Ippolito, and Holtsford (1997) , provide a potential explanation. However, in this case we cannot rule out the possibility that ITS 1 and ITS 2 variation, prior to silencing, might still be confounding estimates of relative 5.8S variation (Buckler, Ippolito, and Holtsford, 1997 ).

Nearly all nonpseudogene clades and some subclades that include pseudogenes were subtended by branches that were too short (≤10 steps) to assess subsequent behavior of derived branches. Within the clades that contain pseudogenes, branches derived from a pseudogene node were all considered to be potential pseudogenes, although this need not be the case. Additional character information might suggest whether potential reversions back to functionality following a pseudogene event would be possible.

Phylogenetic analysis of pseudogenes
Potential nrDNA pseudogenes are sometimes removed a priori from phylogenetic consideration (e.g., Yang et al., 1999 ). Inability to align sequences is one good reason for excluding them. Another concern associated with the inclusion of pseudogenes in phylogenetic analyses is the potentially spurious placement of terminals due to long-branch attraction (Felsenstein, 1978 ). While this is clearly a legitimate worry, it is not, a priori, a reason to exclude pseudogenes. The Leucaena ITS gene tree including all functional and potential pseudogene sequences (Fig. 4) shows three groupings, none of which were strongly supported, that may be the result of long-branch attraction; viz. two of the branches supporting a close association of the three basal sequences in pseudogene Clade A, two branches supporting the three L. leucocephala sequences in pseudogene Clade D, and the terminal branches of the L. pallida/L. involucrata subclade of pseudogene Clade B. These all have long branches subtended by relatively short and weakly supported nodes, and placement of these long-branch sequences should be viewed with caution.

In order to assess the effect of the pseudogene sequences on the phylogeny of functional ITS copies, an analysis excluding potential pseudogene sequences was conducted. The strict consensus of this analysis does not differ in topology (minus the pseudogene sequences) from Fig. 3, except in the placement of the L. leucocephala ssp. glabrata-4b sequence, which is transferred from Clade 1 to an unresolved position relative to the three major clades. Exclusion of pseudogenes from the ITS analysis does provide higher bootstrap support for the major clades (data also shown on Fig. 3) although this could be affected by the reduced number of terminals in the analysis.

We believe that inclusion of all available relevant information should provide the most complete understanding of gene diversification and that this is essential for inferring accurate species phylogenies. Inclusion of pseudogenes is potentially even more critical and useful when trying to unravel reticulate relationships among hybrid/allopolyploid taxa where duplication of function may lead to pseudogene formation. This is borne out by the current analysis of pseudogene sequences. First, inclusion of pseudogene sequences revealed the greater extent of ITS polymorphism providing additional insights into polyploid origins. Second, the resolution provided among accessions of L. leucocephala and L. pulverulenta in pseudogene Clade A is greater than that revealed by functional copies and provides possible evidence of multiple origins of tetraploid L. leucocephala.

Implications for polyploid species origins
The ITS data set provides significant new insights into the origins of the five tetraploid species of Leucaena (summarized in Fig. 5), particularly when viewed alongside the simultaneous analysis of diploid species (Fig. 1) and the maternally inherited cpDNA gene tree (Fig. 2). Each polyploid Leucaena species had at least one sequence whose pl