Systematics
2Department of Botany, The Field Museum of Natural History, 1400 S. Lake Shore Drive, Chicago, Illinois 60605-2496 USA; 3L. H. Bailey Hortorium and Department of Plant Biology, Cornell University, 466 Mann Library Building, Ithaca, New York 14853-4301 USA
Received for publication August 30, 2001. Accepted for publication February 5, 2002.
| ABSTRACT |
|---|
Key Words: crop evolution; glutamine synthetase; ncpGS; oca; Oxalidaceae; Oxalis tuberosa; PCR recombination; polyploidy
| INTRODUCTION |
|---|
We have continued the study of the origins of O. tuberosa using DNA sequence data from another independently evolving locus, the nuclear gene encoding chloroplast-expressed glutamine synthetase (ncpGS). This locus is single copy in Oxalis, as in most taxa studied to date, and it diverged long ago from the cytosolic-expressed isozymes (Pesole et al., 1991), so that primers have been designed that amplify only the chloroplast-expressed form (Emshwiller and Doyle, 1999). In a pilot study, the gene tree of ncpGS was generally congruent with that of ITS, with somewhat more variation among the ncpGS sequences from the species studied than among their ITS sequences (Emshwiller and Doyle, 1999). Initial attempts to sequence ncpGS from cultivated O. tuberosa directly from polymerase chain reaction (PCR) amplification products showed clear signs of multiple sequences within individual oca plants. This intraindividual sequence heterogeneity of ncpGS suggested that this locus could provide evidence of the origins of all of oca's genomes. Here we report the results of analysis of ncpGS data from cultivated oca and wild Andean Oxalis taxa as they contribute to the elucidation of the origins of the crop.
A note is in order to define what is meant here by the informal name "Oxalis tuberosa alliance," first proposed by de Azkue and Martínez (1990) for a dozen morphologically similar x = 8 species, but here including additional species, as mentioned above. The alliance probably comprises at least 40–50 species. In the past, we have used the "O. tuberosa alliance" and "x = 8 group" interchangeably. However, O. andina has recently been reported to have 16 chromosomes (de Azkue, 2000), indicating that the "x = 8 group" may also include the clade that was sister to the alliance ("O. andina clade" in Fig. 1) in our prior ITS study (Emshwiller and Doyle, 1998). We use "O. tuberosa alliance" here in a narrower sense than "x = 8 group," to refer to the clade, on both the ITS and ncpGS gene trees (see RESULTS and Emshwiller and Doyle, 1998, 1999; Emshwiller, 1999), that includes all of the sequences of cultivated O. tuberosa, along with other taxa that either are reported to be based on x = 8 or lack cytological data, but excluding the "O. andina clade."
| MATERIALS AND METHODS |
|---|
Outgroup sampling for ncpGS included three representative species from among those included in the main ITS analyses, including one representative (O. andina) of the clade that was sister to the known O. tuberosa alliance clade on the ITS tree (see above). Two outgroup taxa from among Peruvian Oxalis accessions were added, one of which, O. laxa var. hispidissima (EE744), is a member of a morphological group within Oxalis that had not been included in the previous study. The second additional outgroup, O. megalorrhiza (EE773), is the type species of section Carnosae Reiche (O. megalorrhiza is often referred to by the misapplied name O. carnosa Molina; see Dandy and Young [1959] for an explanation of the history of this name). Its inclusion was intended to test congruence of the molecular tree with cytological data (see below) and morphologically based sectional classifications, because it had published chromosome counts (albeit conflicting, 2n = 14 or 18, see below) and it is morphologically very similar to O. pachyrrhiza, which nonetheless was classified by Knuth (1930) in a different section.
As we outlined earlier (Emshwiller and Doyle, 1998), the search for the origins of oca has been impeded by the confused state of Oxalis taxonomy. Hybridization and lack of breeding barriers may be obscuring species limits (Emshwiller and Doyle, 1998), and it is possible that Andean Oxalis populations experienced repeated cycles of divergence and contact during the history of mountain uplift and later glaciation, leading to a multitude of forms whose species membership is uncertain. In addition, the dependence of taxonomic workers on herbarium specimens has obscured species limits, because some important characteristics are lost upon drying (e.g., distribution of pigments, see also Salter, 1944; Emshwiller, 1999, 2002a) and others are subject to phenotypic plasticity (e.g., swollen petioles). The recent revision of the genus by Lourteig (1994, 2000) has greatly clarified its taxonomy. She recognizes 280 species in Oxalis (excluding the South African section Cernuae treated by Salter [1944]), classified in four subgenera and 28 sections. These sections are in closer agreement with the O. tuberosa alliance recognized here than the prior infrageneric taxonomy of Knuth (e.g., 1930). Specifically, the species of the alliance were placed in three sections (Lotoideae Lourt., Herrerae Knuth, and Ortgieseae Knuth) by Lourteig (2000), whereas they were placed in seven sections by Knuth. However, there are some cases in which we at least tentatively retain the use of species names that Lourteig (2000) has reduced to synonymy, because the plants concerned have dissimilar ploidy levels or ncpGS sequences and/or they have morphological differences that persist in common garden conditions (see also Emshwiller, 1999, 2002a). In some cases fixed differences support the retention of species names (e.g., O. picchensis, O. unduavensis), but in other cases the morphologically different accessions may or may not ultimately be found to be distinct at the species level. Nonetheless, the provisional use of these subsumed names (e.g., O. weberbaueri, O. oblongiformis, O. staffordiana, and others in the group designated here as the "O. peduncularis clade" [see below]) reflects the observed differences among the populations.
Gene amplification and sequencing
DNA isolations either followed Doyle and Doyle (1990) or used DNeasy Plant Mini Kits (QIAGEN, Valencia, California, USA) according to the manufacturer's instructions. Amplification of a region of the ncpGS locus that includes four introns was performed using primers GScp687f and GScp994r (Fig. 2) and thermocycling conditions described in Emshwiller and Doyle (1999). Amplification products either were cloned (see below) or were sequenced directly, either by manual sequencing as described previously (Emshwiller and Doyle, 1998) or with an ABI 377 automated sequencer operated by the Cornell Biotechnology Center. Electropherograms were examined and edited using either Chromas 1.43 (McCarthy, 1997) or Sequencher 3.1 (Gene Codes Corporation, Ann Arbor, Michigan, USA). Direct sequencing of PCR products used the amplification primers or internal primers GScp853f, GScp856r, or GScp911r (Emshwiller and Doyle, 1999), whereas sequencing of clones used standard primers that anneal to the plasmid. Sequences were determined in both directions, with the exception of individuals that were heterozygous for insertion/deletion differences (indels). Thus, sequences could only be determined in one direction for accessions EE190, EE871, EE960, and ORT1, and part of the sequence is missing for accessions EE511 and EE512, because the latter plants were heterozygous for two indels.
Multiple sequence types within individuals: screening by direct sequencing
Once the various sequence classes of cultivated oca and the wild tuber-bearing plant had been identified by cloning and sequencing, a strategy was developed to test whether these same sequence classes were present in a larger sample of individual plants without sequencing large numbers of clones from many individuals. The sample for direct sequencing of amplification products included the same plants that had been used in the molecular cloning experiments, along with six additional morphologically distinct accessions of cultivated oca and three of the wild Bolivian taxon (http://ajbsupp.botany.org/v89/emshwiller/table1). There are relatively few changes that distinguish the various sequence classes found among the clones from these individuals (see below). Thus, electropherogram traces were examined to determine whether there were double peaks (two nucleotides at a single site) at the particular sites that distinguish the sequence classes, or in cases of sequence classes that differ by an indel character, whether the sequence became unreadable (mostly double peaks) past the location of the indel (see Cronn et al. [2002] for a similar strategy). Because the plants whose sequences were cloned were all heterozygous for several indels, much of the sequence was unreadable if the amplification primers were used for sequencing (although GScp994r was used in one case). Primer GScp911r was designed in exon 10 (Fig. 2) to screen for heterozygosity at a particular site in intron 9, without interference from the indel in intron 10. To reduce the effect of "PCR drift" (Wagner et al., 1993), the products of several (usually 3–5) 25-µL reactions were pooled, rather than performing a single larger (100-µL) reaction.
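The screening logic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the trace is represented as a hypothetical dictionary of per-site peak heights, and the thresholds (`min_ratio`, `window`, `min_fraction`) are invented for the example; the study itself examined electropherograms by eye in Chromas or Sequencher.

```python
from typing import Dict, List

# Hypothetical representation: peak height of each base at one aligned site.
SitePeaks = Dict[str, float]


def called_bases(site: SitePeaks, min_ratio: float = 0.3) -> List[str]:
    """Return the bases whose peak is at least min_ratio of the tallest peak."""
    tallest = max(site.values())
    if tallest <= 0:
        return []
    return sorted(b for b, h in site.items() if h >= min_ratio * tallest)


def screen_sites(trace: Dict[int, SitePeaks], diagnostic_sites: List[int],
                 min_ratio: float = 0.3) -> Dict[int, List[str]]:
    """Report the bases called at each diagnostic site present in the trace."""
    return {s: called_bases(trace[s], min_ratio)
            for s in diagnostic_sites if s in trace}


def unreadable_after(trace: Dict[int, SitePeaks], indel_site: int,
                     window: int = 20, min_fraction: float = 0.5,
                     min_ratio: float = 0.3) -> bool:
    """True if at least min_fraction of the sites just past the indel show double peaks."""
    sites = [s for s in sorted(trace) if indel_site < s <= indel_site + window]
    if not sites:
        return False
    doubles = sum(len(called_bases(trace[s], min_ratio)) > 1 for s in sites)
    return doubles / len(sites) >= min_fraction
```

A site that returns two bases from `called_bases` corresponds to a double peak, and a `True` result from `unreadable_after` corresponds to a read that degrades past a heterozygous indel.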
Sequence alignment and phylogenetic analysis
As noted previously (Emshwiller and Doyle, 1999), alignment of the ncpGS sequences of the O. tuberosa alliance is generally straightforward and unambiguous. Therefore, DNA sequences were aligned by visual inspection, using Microsoft WordPad (Windows 95 accessory, Microsoft, California, USA), and further data entry and editing used Dada version 12 (Nixon, 1998). Ambiguity exists in placement of some gaps in areas of sequence repeats, but because the different placements do not overlap other informative characters, the different placements of gaps should have equal effect on the results of analysis. Overlapping gaps in intron 8 were treated as multistate characters, while binary characters were added to the matrix to represent other gap characters. Heterozygous sites and indels in individuals that were not cloned were coded as subset polymorphisms. Discussion of indel and substitution characters below will follow the numbering of aligned sites as presented in an alignment of ncpGS sequences (Emshwiller, 1999, Appendix 5.1), which is available as: (1) an aligned "popset" in GenBank including all of the sequence between the amplification primers (729 aligned nucleotide sites), associated with the individual accessions GBAN-AF470234 to GBAN-AF470317 and GBAN-AF098977 to GBAN-AF098984 (the prefix "GBAN-" has been added to GenBank accession numbers to link the online version of the American Journal of Botany with GenBank but is not part of the actual accession number) and (2) an alignment that also includes 23 gap characters, archived at http://ajbsupp.botany.org/v89/emshwiller/appendix1.
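As a rough illustration of the coding scheme just described, the sketch below scores non-overlapping gaps as binary characters and represents a heterozygous site as a set of states. The function names, the toy alignment, and the region coordinates are hypothetical, and the multistate coding used for the overlapping gaps in intron 8 is not covered.

```python
from typing import Dict, FrozenSet, List, Tuple


def code_site(bases_observed: str) -> FrozenSet[str]:
    """Code one site as a state set: one base if homozygous, several if heterozygous."""
    return frozenset(bases_observed.upper())


def gap_characters(alignment: Dict[str, str],
                   gap_regions: List[Tuple[int, int]]) -> Dict[str, List[int]]:
    """Score each (start, end) region (0-based, end exclusive) as 1 if the
    taxon has gaps across the whole region, otherwise 0."""
    coded = {}
    for taxon, seq in alignment.items():
        coded[taxon] = [int(all(c == "-" for c in seq[start:end]))
                        for start, end in gap_regions]
    return coded


# Toy alignment (invented sequences, not data from this study).
toy = {"t1": "ACG--TA", "t2": "ACGTTTA", "t3": "A-G--TA"}
print(gap_characters(toy, [(3, 5), (1, 2)]))
print(sorted(code_site("AG")))
```

Keeping gap characters separate from the nucleotide columns means that a multi-base indel counts as one event rather than as several independent missing sites.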
Phylogenetic analyses were performed using Nona (version 1.6 for Windows NT; Goloboff, 1998), using the search strategy hold/50; hold*; mult*100; max* (i.e., initially 50 trees are held from each of 100 replicate analyses, followed by tree bisection-reconnection (TBR) branch swapping on all of the trees found). Clados (version 17; Nixon, 1996) or WinClada (version 0.9.99m24[beta]; Nixon, 2000) was used to examine the maximally parsimonious trees (MPTs) and character optimizations. The sources of homoplasy and causes of multiple topologies were explored by running additional analyses to determine the effect of the inclusion and exclusion of particular sequences on the results (e.g., cloned sequences of cultivated oca that appeared to be recombined and some direct sequences from heterozygous accessions, which caused an increase in the number of trees and loss of resolution in the consensus trees; see RESULTS). Trees were rooted with O. laxa var. hispidissima and/or O. pachyrrhiza and O. megalorrhiza, which were suggested to be "basal" (earliest diverging) among the taxa sampled here by the results of independent analysis of 5.8S and ITS2 sequences of a diverse sample of Oxalis species (data not shown) as discussed in Emshwiller and Doyle (1998).
| RESULTS |
|---|
Indel variation among ncpGS sequences
The indels in intron 8 are more numerous, and many are larger, than those in the other three introns. Greater length variation in intron 8 has also been observed in sampled taxa of legumes (J. L. Doyle and J. J. Doyle, Cornell University, unpublished data). Although this length variation could potentially create problems for alignment among more divergent sequences, it was a good source of characters in this study. The discussion of indel variation below refers to positions in the alignment archived at http://ajbsupp.botany.org/v89/emshwiller/appendix1, and the indels as designated at http://ajbsupp.botany.org/v89/emshwiller/appendix2.
Many of the indels (both small and large) seem to have resulted from slippage-like processes (Levinson and Gutman, 1987; Hancock, 1995), which can make alignment ambiguous, but not necessarily problematic as long as the indels do not overlap other informative characters. Examples include single base indels in small homopolymer runs (e.g., sites 117, 137, and 608), addition or deletion of dinucleotide repeats (sites 614–615 and 620–621), and a 20-base duplication (sites 314–333). Notably, two deletions that each involve a unique sequence flanked by repeated segments (137–148 and 379–398) remove most of the T-rich region near the 3' intron splice junction, which is putatively important for intron splicing (Csank, Taylor, and Martindale, 1990; Ko et al., 1998). Perhaps these latter deletions are only tolerated because the plants have other functional copies of ncpGS, as they are both probably polyploid (i.e., one cloned accession of oca and one of the wild tuber-bearing Oxalis of Bolivia).
Alignment is uncertain for indels in homopolymer runs, because the indel might be at any position along the run. This is the case for the sixth thymine inserted in a run of five (site 117), for which accession EE249 of O. spiralis is homozygous and accession EE500 of O. picchensis is heterozygous. Coding both plants that have six thymines with the same character state treats these individuals as sharing a homologous indel. However, the separate placement of these two species on the ncpGS trees (see below) indicates that these probably represent separate insertion events.
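The positional ambiguity of an indel within a homopolymer run can be checked with a small sketch. The sequence below is invented for illustration and is not taken from the ncpGS alignment; only the run of five thymines mirrors the case at site 117.

```python
from typing import Set


def insert_base(seq: str, pos: int, base: str) -> str:
    """Insert one base before position pos (0-based)."""
    return seq[:pos] + base + seq[pos:]


def equivalent_insertions(seq: str, run_start: int, run_len: int, base: str) -> Set[str]:
    """Collect the distinct sequences produced by inserting base at every
    position within, or immediately after, the homopolymer run."""
    return {insert_base(seq, p, base)
            for p in range(run_start, run_start + run_len + 1)}


# A run of five T's starting at index 2 of a made-up sequence.
results = equivalent_insertions("GATTTTTCA", run_start=2, run_len=5, base="T")
print(results)
```

All six candidate insertion points yield the same sequence, so the data cannot show where in the run the sixth thymine was added, and cannot show by themselves whether two plants with six thymines share one insertion event.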
Sequence heterozygosity
In addition to the ncpGS sequence heterozygosity in cultivated O. tuberosa and in the wild tuber-bearing populations of Bolivia, which is discussed separately below, some of the other wild Oxalis sampled were heterozygous for either substitutions or indels (whereas 24 other sampled plants had a single sequence type for the amplified region of ncpGS). These other heterozygous plants, whose ncpGS sequences were not cloned, were coded as polymorphic for their heterozygous sites or indel characters. We recognize that this expedient would not have been acceptable if the objective of the study were to reconstruct a complete gene tree of all alleles for all accessions, as it did not accurately represent the situation of having multiple sequences in an individual plant. Some cases of polymorphism coding of heterozygous individuals had little or no effect on the analyses, whereas other cases had greater effect. Nonetheless, molecular cloning of the individuals concerned was still not performed because doing so was not expected to contribute to understanding the origins of oca.
This form of coding did not create problems when the heterozygosity occurred in noninformative characters. Unique changes would be autapomorphic in the phylogenetic analyses, but coded as polymorphic the sequences are treated as identical. When the characters were informative, polymorphic coding had variable effects on the phylogenetic analyses. Two polymorphic characters (sites 559 and 591) did not appear as steps on the trees at all. If the alleles in heterozygous EE797 had been coded separately, the apomorphic state at site 559 would have united one of them with those of EE807. Site 591, however, would have been homoplasious, as even if EE746 had been homozygous for the apomorphic state at site 591, it would have appeared on the MPTs as a separate origin from the alleles in EE916 and EE504. Polymorphic coding of four informative substitutions (sites 1, 7, 144, 390) in EE871, a plant from an area in which hybridization may have been occurring, caused additional rearrangements in the "O. lucumayensis group" (see below), but the consensus tree was unchanged.
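One way to see why a polymorphic character can fail to appear as a step is a minimal Fitch parsimony pass in which each terminal carries a set of states. This is a generic sketch and an assumption on our part, not a description of how Nona treats subset polymorphisms; the tree and states are invented.

```python
from typing import Dict, FrozenSet, Tuple, Union

# A tree is either a terminal name or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, states: Dict[str, FrozenSet[str]]) -> Tuple[FrozenSet[str], int]:
    """Return (state set at this node, number of steps at or below it) for one character."""
    if isinstance(tree, str):
        return states[tree], 0
    left, right = tree
    lset, lsteps = fitch(left, states)
    rset, rsteps = fitch(right, states)
    shared = lset & rset
    if shared:
        # The descendants can agree on a state, so no change is needed here.
        return shared, lsteps + rsteps
    # No shared state: one change must be counted at this node.
    return lset | rset, lsteps + rsteps + 1


tree = ((("a", "b"), "c"), "d")
polymorphic = {"a": frozenset("A"), "b": frozenset("AG"),
               "c": frozenset("A"), "d": frozenset("A")}
separate_allele = dict(polymorphic, b=frozenset("G"))
print(fitch(tree, polymorphic)[1], fitch(tree, separate_allele)[1])
```

With terminal `b` coded as polymorphic (A or G) the character costs no steps, whereas coding the G allele as its own terminal costs one step, which parallels the behavior described for sites 559 and 591.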
Considerably more effect on the phylogenetic analyses was caused by EE511 and EE512, which were heterozygous for two indels ("s" and "u"). Part of the sequences between those two indels could not be read without cloning, which was not undertaken due to the focus of this project on the origins of oca. Indel "s" is shared with several other taxa and is a non-homoplasious character on the MPTs, whereas indel "u" was found only in EE511, EE512, and EE960 (this deletion was not observed in homozygous form in any of the plants sampled). EE511 and EE512 are also heterozygous for three informative substitutions, which can be used to infer the positions of the (presumably) two sequences on the MPTs (see Phylogenetic results 2, below). As expected, inclusion of these sequences, coded as polymorphic, caused a dramatic increase in the number of MPTs and loss of resolution in the O. peduncularis clade. On the other hand, although EE960 is heterozygous for six substitutions, it had only a single indel, so its entire sequence was read in at least one direction. Inclusion of EE960, even when coded as polymorphic at the pertinent sites, did not cause an increase in the number of trees; instead, EE960 joined the base of the clade that would have included both of its separate sequences (see Phylogenetic results 2, below), which are not as divergent as those of EE511 and EE512. While this resembles the behavior of hybrids in morphological analyses (e.g., McDade, 1990, 1992), it does not truly represent the situation of having two different alleles in the plant, since if the two sequences had been cloned, they might have each joined separate branches.
Most of the accessions in the examples above, or different plants of the same population, have been inferred to be diploids by flow cytometry (Emshwiller, 2002b), with the exception of EE960, which was no longer alive at the time of the flow cytometry study. Thus, their sequence heterozygosity seems to represent either normal allelic polymorphism or interspecific hybridization. Accession EE500, in contrast, is inferred to be tetraploid, but it may be autopolyploid, as its sequences differ only very slightly (see description of its heterozygosity for a homoplasious single base insertion in a homopolymer region, above).
Cloned sequences of cultivated oca and wild tuber-bearing Oxalis
Chloroplast-expressed GS sequences were determined for 36 molecular clones from three morphotypes of cultivated oca (11 clones from MHG884, 10 from MHG913, and 15 from 35·04) and eight clones from one accession (EE259) of the unnamed Bolivian wild tuber-bearing taxon. However, some of these clones (originally designated class "A") were later determined to be contaminants in the PCR reactions (six of those of MHG884 and one of MHG913, but none of 35·04). Figure 3 shows only the variable sites (and indel characters) among the cloned sequences, excluding the contaminant sequence class (these sequences were similar or identical to that of accession EE184 of O. spiralis and EE359 of O. mollissima). A "hypothetical ancestor" sequence was included (showing the states at the base of the x = 8 group) so that the apomorphic states of each sequence can be seen more easily.
Among these kinds of possible PCR artifacts, shuffling of sequences by PCR recombination was especially challenging for the interpretation of possible homeologous loci. PCR recombination is the formation of artifactual sequences in vitro that combine the features of different template sequences in a heterogeneous reaction mixture, such as different alleles, paralogues, or homeologues, and is thought to occur when uncompleted PCR products act as primers in subsequent cycles, re-annealing to different templates (reviewed in Cronn et al. [2002]). Rather than being a rare phenomenon, recombinants can make up a large proportion of PCR products, particularly when cloning is used as an intermediate step (Jansen and Ledley, 1990 [25% recombinants]; Bradley and Hillis, 1997 [43%]; Cronn et al., 2002 [up to at least 89% in different gene systems]). In theory, the proportion of recombinant artifacts might be even higher in an octoploid, because, with a higher number of different template sequences, there might be a correspondingly higher chance of re-annealing to one of the "wrong" sequence types. Thus, although recombination among sequence classes could truly occur in an octoploid, we suspect that the recombination observed here is probably an artifact of cloning PCR products.
Certain of the cloned ncpGS sequences appear to be recombined because they have a mixture of the character states found in other sequences (Fig. 3). A simple example is EE259 clone 1, which resembles class "B" clones 2, 3, 8, 9, 10, and 11 (from the same plant) at the 5' end and middle, but resembles class "D" clone 7 at the 3' end. In this case a single recombination event would be inferred, but other examples may represent the products of several recombinations (e.g., MHG884 clone m2; Fig. 3). In this study, unlike the situation in cotton (Cronn et al., 2002), the diploid progenitors are not known a priori. Nevertheless, in most cases the nonrecombinant conditions could be inferred through a comparison of diagnostic nucleotides in ncpGS sequences of wild Oxalis species, as well as information from the phylogenetic context (see below). In the example above, the character states at the 5' and middle of EE259 clone 1 would place it in one part of the tree, while those at the 3' end would place it in a different part of the tree. In some cases the inclusion of the putative recombined sequences in cladistic analyses increased the numbers of MPTs and caused collapse of some nodes in the consensus tree; in other cases their inclusion only increased homoplasy and tree length, as the recombined characters appeared as extra steps on the trees. Clones that do not appear to be recombined cause neither loss of resolution nor increase in homoplasy and join the same three positions on the ncpGS tree that are discussed below (Phylogenetic results 2).
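The reasoning used to spot putative recombinants can be sketched as follows. The class reference strings and the clone strings are invented and contain only hypothetical diagnostic sites; they are not the states shown in Fig. 3, and the inference in this study also relied on comparison with wild species and on phylogenetic context, which the sketch does not attempt.

```python
from typing import Dict, List, Optional


def assign_sites(clone: str, classes: Dict[str, str]) -> List[Optional[str]]:
    """For each diagnostic site, name the single class whose state the clone
    matches; return None where the clone matches no class or more than one."""
    labels: List[Optional[str]] = []
    for i, base in enumerate(clone):
        matches = [name for name, ref in classes.items() if ref[i] == base]
        labels.append(matches[0] if len(matches) == 1 else None)
    return labels


def count_switches(labels: List[Optional[str]]) -> int:
    """Count changes of class between consecutive informative sites."""
    informative = [x for x in labels if x is not None]
    return sum(1 for a, b in zip(informative, informative[1:]) if a != b)


# Hypothetical diagnostic states for two sequence classes.
classes = {"B": "ACGTAC", "D": "TGCATG"}
for clone in ("ACGTAC", "ACGTTG", "AGGACG"):
    labels = assign_sites(clone, classes)
    print(clone, labels, count_switches(labels))
```

A clone with zero switches is consistent with a single class, one switch suggests a single recombination event (as inferred for EE259 clone 1), and several switches suggest repeated template switching.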
DNA polymerases that lack a proofreading ability, such as were used here, are prone to substitute incorrect nucleotides occasionally, a phenomenon commonly referred to as Taq error. Although these data alone cannot distinguish artifacts from real substitutions with certainty, singleton changes (shown in lowercase letters in Fig. 3) that occur in only one cloned sequence and are not shared by any sequences from wild species are more likely to be the results of Taq error than substitutions that are shared by different clones, plants, or species. Some singleton substitutions might be mutations that do exist in the plant, but the underlined bases would represent non-synonymous substitutions and so may be particularly suspect. Mistaken nucleotides that result from Taq error will usually be autapomorphic in the results of phylogenetic analysis, so Taq error is considered less of a problem for this study than PCR recombination.
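The singleton criterion described above lends itself to a simple check. In this sketch the clone and wild-species sequences are invented, equal-length, already aligned strings, and the function name is hypothetical; a flagged site is only a candidate for Taq error, since the data alone cannot separate artifacts from real mutations.

```python
from collections import Counter
from typing import Dict, List


def singleton_changes(clones: Dict[str, str], wild: List[str]) -> Dict[str, List[int]]:
    """For each clone, list the 0-based sites where its base occurs in no
    other clone and in no wild-species sequence."""
    length = len(next(iter(clones.values())))
    flagged: Dict[str, List[int]] = {name: [] for name in clones}
    for i in range(length):
        clone_counts = Counter(seq[i] for seq in clones.values())
        wild_bases = {seq[i] for seq in wild}
        for name, seq in clones.items():
            if clone_counts[seq[i]] == 1 and seq[i] not in wild_bases:
                flagged[name].append(i)
    return flagged


# Invented example: c3 has a unique T at site 1; c4 has a unique A at site 3,
# but that A is shared with a wild sequence, so it is not flagged.
clones = {"c1": "ACGT", "c2": "ACGT", "c3": "ATGT", "c4": "ACGA"}
print(singleton_changes(clones, wild=["ACGA"]))
```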
Variation within cloned sequence classes
The cloned sequences that do not appear to be recombinants seem to fall into three different classes, designated at the right of Fig. 3 as classes "B," "C," and "D." These sequence classes group in different places on the MPTs in phylogenetic analyses (see below). All three classes were present in each of the three plants of cultivated oca in the cloned sample, whereas class C was absent in one of nine oca plants screened by direct sequencing (see below). Two sequence classes, B and D, are present in the cloned accession (EE259) of the wild tuber-bearing taxon of Bolivia and in three additional accessions of that taxon that were sequenced directly.
Variation was also observed among the sequences within each of the sequence classes. Although some of the differences among sequences may be artifactual, others probably represent true allelic variation. For example, accessions 35·04 and EE259 each have deletions that are not found in the other oca or Bolivian wild tuber-bearing accessions that were sequenced directly. Although length variation can also be an artifact of PCR (Fenton, Malloch, and Germa, 1998), the deletion in EE259 was confirmed in direct sequences performed under the same conditions as the other accessions. Some substitutions were shared by clones from more than one of the plants, so they were probably real variants, although they sometimes appear in more than one sequence class, making it uncertain to which class they really belong (e.g., it is unclear which sequence class was truly on the same strand with the transition substitution at site 1 in both the alignment and Fig. 3).
It is noteworthy to have encountered ncpGS sequence variation among plants in this very small sample of three cloned accessions. Oca cultivars are variable in traits such as tuber morphology and pigmentation, nutritional factors, insect resistance, phenology, and yield (Castillo, 1974; Poma Machaca, 1976; Bustinza López, 1979; Cortés Bravo, 1984; King and Gershoff, 1987; Arbizu et al., 1997). However, among molecular markers only low levels of variation are reported for isozymes (del Río, 1990), tuber proteins (Stegemann, Majino, and Schmiediche, 1988; Shah, Stegemann, and Galvez, 1993), and random amplified polymorphic DNAs ([RAPDs]; A. Donayre, Universidad Nacional Mayor San Marcos, Lima, Peru, personal communication; G. Piedra, Instituto Nacional de Investigación Agropecuaria, Quito, Ecuador, personal communication), but variability appears to be greater in AFLP markers in initial assessments (Tosto and Hopp, 2000; E. Emshwiller, unpublished data). At this level of sampling it is not possible to determine whether the ncpGS sequence variation among plants represents multiple origins of polyploidy or domestication, mutations that arose after the origin of the crop, or loss of alleles through sexual recombination.
Phylogenetic results 1: analyses excluding cloned sequences
Separate analyses were run with and without two sets of sequences: the cloned sequences of cultivated oca and the Bolivian wild tuber-bearing taxon, and the accessions with sequence heterogeneity and missing data that caused loss of resolution when included, as discussed above (i.e., EE511, EE512, and EE960). One of the 20 MPTs that resulted from an analysis that excluded these sequences is shown in Fig. 4, which also indicates the branches that collapse in the consensus tree. For purposes of the following discussion three of the clades in Fig. 4 are designated as the "O. lotoides group," the "O. lucumayensis group," and the "O. peduncularis clade." Although the latter clade is resolved as monophyletic in all analyses of ncpGS data, the O. lotoides group and O. lucumayensis group are resolved in various ways in the analyses that include the cloned sequences, so they are not necessarily monophyletic groups (in an analysis of combined ITS and ncpGS data these two groups join a single clade referred to as the "O. lotoides clade" in Emshwiller [2002a]).
Phylogenetic results 2: analyses including cloned sequences
Analyses that included the three sequence classes of cultivated oca and two classes of the wild tuber-bearing plant EE259 found 208 MPTs, one of which is shown in Fig. 5. Cloned sequences that were judged to be PCR recombinants or contaminants (see above) were excluded to avoid problems associated with including putative recombined sequences in analyses. Heterozygous accession EE871 of O. lucumayensis ssp. lucumayensis was included in the analysis shown (Fig. 5). Analyses that excluded this sequence had only 72 MPTs, but the strict consensus was the same in either case. As above, this analysis excluded the sequences of EE511 and EE512, because when included, their missing data and polymorphic coding caused an increase in the number of MPTs (to 810 trees) and significant loss of resolution (results not shown). However, by assuming nonrecombination of characters found in the parts of these sequences that were determined, the positions on the tree that these sequences would join could be inferred, indicated by the asterisks in Fig. 5.
Comparison with previous ITS results and cytology
The results of phylogenetic analysis of ncpGS sequences are congruent overall with our previous ITS results (Emshwiller and Doyle, 1998). One character on the ITS gene tree conflicts with each of the ncpGS characters that unite the O. lotoides group, the O. lucumayensis group, and the sequences of O. spiralis and O. mollissima (Figs. 4 and 5; see also fig. 3 in Emshwiller and Doyle, 1999). In the latter case the ncpGS data seem to be more congruent with morphology and species boundaries than those of ITS.
There is somewhat more divergence overall among sequences of ncpGS than ITS, allowing more resolution of relationships, particularly in the O. peduncularis clade (Figs. 4 and 5). Only one uninformative substitution in the entire ITS region distinguished the sequences of the purchased plants of O. peduncularis (PED1) and O. herrerae (HERR1) (Emshwiller and Doyle, 1998), and the sequence of the latter was identical to that found in O. peduncularis, O. villosula, O. tabaconasensis, and O. herrerae by Tosto and Hopp (1996). Additional sampling has not revealed any informative ITS variation among the taxa in this clade (Emshwiller, 2002a), whereas their ncpGS sequences form a clade supported by a total of nine characters. The greater divergence in ncpGS sequences than those of ITS is not seen among all taxa, however. The members of the O. lotoides group have little divergence in either ncpGS or ITS, in spite of considerable morphological diversity. Their ncpGS sequences are not identical, however, as the apparent branch lengths of zero for most of the species in this group (Figs. 4 and 5) mask heterozygosity in at least one autapomorphic site in nearly all of these plants. Nevertheless, although the accessions in the O. lotoides group were collected from a broad geographical area (from northern Peru to Bolivia) and are members of several distinct species, they have little molecular variation in the two loci studied so far.
The only member of the O. lucumayensis group included in our prior ITS study was accession EE289 of O. lucumayensis ssp. subiens (determined as O. sp. aff. distincta at that time; Emshwiller and Doyle, 1998). Subsequent sampling for ITS in this group (Emshwiller, 2002a) found two ITS sequence types (designated "B" and "C" in Emshwiller and Doyle, 1998), differing by a single substitution, that are also found in the O. lotoides group. The occurrence of hybridization among species in the O. lucumayensis group, suggested at first by the observation of morphologically intermediate individuals and supported by multiple ITS sequences in accession EE294 of O. lucumayensis ssp. subiens (Emshwiller and Doyle, 1998), is further supported by heterozygosity of ncpGS in accession EE871 of O. lucumayensis ssp. lucumayensis. Further study might clarify whether these observations are due to hybridization or simply to highly polymorphic species in this group.
Our previous ITS results (Emshwiller and Doyle, 1998) supported the monophyly of the cytologically based Oxalis tuberosa alliance (de Azkue and Martínez, 1990) with the inclusion of additional species for which cytological data are as yet unavailable. The ncpGS data add more support to the alliance, not only because of the congruence of the gene trees and the addition of more molecular synapomorphies of the alliance (including a 31 bp deletion), but also because of the addition of more species reported to share x = 8, such as O. lotoides, O. medicaginea, O. tabaconasensis, O. oblongiformis, and O. ptychoclada (Favarger and Huynh, 1965; Huynh, 1965; de Azkue and Martínez, 1990). The x = 8 clade is retained and enlarged when sequences of these taxa are added.
As mentioned above, O. andina has recently been reported to have 16 chromosomes (de Azkue, 2000), and thus it also shares x = 8. However, neither ITS (Emshwiller and Doyle, 1998) nor ncpGS (see below) indicates that O. andina or its allies were involved in the origins of oca. Thus, we consider the "O. andina clade" (Fig. 1) to be the sister group of the O. tuberosa alliance, rather than part of the alliance itself. Nevertheless, the discovery that O. andina also has x = 8 is consistent with a single origin of this base chromosome number within Oxalis, with the modification that the x = 8 clade circumscribes a larger group than the O. tuberosa alliance sensu stricto. Although cytological data are still lacking for many of the taxa whose sequences group in this clade, there are as yet no members of this larger clade known to have a base chromosome number other than eight. The converse is also true, in that no Oxalis taxa outside of this clade are reported to have x = 8. Although fewer highly divergent taxa were included in the ncpGS sample than in that of ITS, the monophyly of the x = 8 alliance is further upheld with the inclusion of the additional outgroups O. laxa var. hispidissima and O. megalorrhiza, neither of which is based on x = 8. Although there are no chromosome number reports for Oxalis laxa var. hispidissima, there is a report of 2n = 18 for the morphologically similar O. micrantha (Naranjo et al., 1982). Species boundaries between these taxa have been delimited in various ways by different workers (as inferred from specimen annotations), and O. micrantha var. setifera is considered a synonym of O. laxa var. hispidissima by Lourteig (1988, 2000). A similar confusion surrounds chromosome number reports for O. megalorrhiza. A count of 2n = 18 is reported by de Azkue (2000) for O. pachyrrhiza and for O. carnosa (= O. megalorrhiza, see above). The same number was reported by Diers (1961) for O. solarensis Knuth, also considered to be a synonym of O. megalorrhiza (Pool in Brako and Zarucchi, 1993; Lourteig, 2000). However, Heitz (1926; cited in Federov, 1974) reported a count of 2n = 14 for O. carnosa. Nevertheless, neither of these conflicting counts would include O. megalorrhiza in the x = 8 alliance.
Screening of oca and wild tuber-bearing accession EE259 by direct sequencing
Direct sequencing of ncpGS from accessions of cultivated oca and the wild tuber-bearing taxon using internal primers was employed at first to test whether all of the sequence classes (possible homeologues) found among the cloned sequences were present in a larger sample and later to confirm that the cloned class A sequences had derived from contamination. In the course of this screening of direct sequences, one other anomalous result was encountered. In eight out of nine oca accessions it was possible to confirm the presence of sequence classes B, C, and D. However, direct sequences of accession 02·08 showed no sign of the two characters, an indel and a substitution (positions 38 and 49, respectively, in Fig. 3; and positions 503 and 654 in alignment), that distinguish the class C sequences. The absence of this sequence class from accession 02·08 was confirmed by direct sequencing of two separate preparations of PCR products (from the same template genomic DNA), each of which comprised pooled products of 3–5 amplification reactions. Thus this plant has sequence classes B and D, but does not have class C.
Variation was also observed among direct sequences of ncpGS of three additional plants from the Bolivian wild tuber-bearing populations, only one of which (EE260) was collected from a locality relatively close to that of EE259. All three plants lacked two autapomorphies found in the cloned individual EE259 (a 20-base deletion and a nearby substitution) that distinguish the class B sequences of EE259 from those of oca. Thus, the class B sequences of these other three plants are better matches with those of oca than are the cloned class B sequences of EE259. Additional variation appears in the form of an apomorphic character state that occurs (as heterozygous) in two of the four plants. Little can be concluded from this variability at this level of sampling. However, it does indicate that diversity among the wild populations may allow future studies to identify which were involved in the origins of cultivated oca and to study the possibility of multiple origins of polyploidy or domestication.
| DISCUSSION |
|---|
The paucity of information about oca makes the interpretation of its sequence data even more complicated. Unlike the situation in more thoroughly studied polyploid crops, there are no prior hypotheses of its origin derived from morphological or cytological data for the molecular data to test, and some of the cytological reports are conflicting. Most recent workers have found 2n = 8x = 64 in well over 100 oca accessions from diverse areas of the Andes (de Azkue and Martínez, 1990; Medina Hinostroza, 1994; Valladolid, Arbizu, and Talledo, 1994; Valladolid, 1996; Vinueza Vela, 1997). Vinueza Vela (1997) grouped oca chromosomes into eight homologous sets according to their form and banding pattern, but homeologous genomes were not distinguished. However, conflicting chromosome counts have appeared in both older and more recent reports (e.g., Heitz, 1927; Talledo and Escobar, 1995; Hayano Kanashiro, 1998). Lower euploid chromosome numbers have been reported for some cultivated oca accessions, notably from communities in areas of Bolivia where wild tuber-bearing populations occur (Guamán, 1997). Screening of ploidy levels in ten cultivated oca accessions by flow cytometry found that these plants had DNA contents roughly four times that of diploid species in the alliance, thus confirming that they are in the octoploid range (Emshwiller, 2002b). This does not preclude the possibility of aneuploidy, however. Vegetative propagation and dispersal by humans mean that oca may not be under the same selection pressures to regain fertility by reducing meiotic abnormalities that operate in seed-propagated species. It is also possible that populations of different ploidy levels were domesticated or that the octoploid level was reached after original domestication at lower ploidy levels. Further screening of other oca cultivars may resolve whether some have different ploidy levels.
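The step from relative DNA content to ploidy level can be sketched as simple arithmetic. The following is a minimal illustration only, assuming that DNA content scales linearly with the number of chromosome sets and using the base number x = 8 of the alliance. The function name and the numeric inputs are hypothetical and are not measurements from the flow cytometry study.

```python
def infer_ploidy(sample_dna, diploid_dna, base_number=8):
    """Estimate ploidy level and the expected euploid chromosome number.

    Assumes DNA content is proportional to the number of chromosome sets
    and that the reference plant is diploid (2x). Inputs are illustrative.
    """
    ratio = sample_dna / diploid_dna       # roughly 4 for the oca accessions
    ploidy = round(2 * ratio)              # the diploid reference carries 2 sets
    return ploidy, ploidy * base_number    # (chromosome sets, expected 2n)


# Hypothetical relative DNA contents: sample about four times the diploid value.
ploidy, chromosomes = infer_ploidy(sample_dna=3.9, diploid_dna=1.0)
print(ploidy, chromosomes)  # 8 64
```

Because the estimate is rounded to a whole number of chromosome sets, the gain or loss of a few chromosomes would go undetected in such a calculation, which is why a DNA content in the octoploid range cannot by itself exclude aneuploidy.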
Genomes of Oxalis tuberosa
In the absence of prior information from chromosome pairing studies, the results of analysis of ncpGS sequences of oca and wild Oxalis provide the first evidence of the different genomes present in Oxalis tuberosa.
Among the three ncpGS sequence classes found within each of the three genotypes of oca, two classes, B and D, were found in all nine oca accessions sampled by either molecular cloning or direct sequencing. Thus these two sequence classes exhibited fixed heterozygosity, which is consistent with the hypothesis that oca is allopolyploid. There can be problems in using fixed heterozygosity in a single gene, by itself, to infer allopolyploidy. Clonal propagation, which is the rule in cultivated oca, has been shown to maintain fixed heterozygosity in diploid parasitic protozoa (Tibayrenc, Kjellberg, and Ayala, 1990). Autopolyploids can also appear to exhibit fixed heterozygosity because polysomic segregation leads to rare homozygotes, which can escape detection if sampling is insufficient (Vogel et al., 1999). However, in the case of ncpGS data from oca, we have information about the orthologous sequences present in diploids, as well as phylogenetic information from the ncpGS gene tree. Sequence classes B and D grouped in morphologically divergent subclades within the O. tuberosa alliance (Figs. 4 and 5), and no diploids were found with sequences from both of these clades. With this phylogenetic information the occurrence of fixed heterozygosity is stronger evidence of allopolyploidy, an interpretation that is also reasonable because higher level polyploids (such as octoploids) in nature are rarely complete autopolyploids (Stebbins, 1947, 1950). Future studies are planned to test whether fixed heterozygosity is also present at other nuclear loci in oca.
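The tabulation behind the statement of fixed heterozygosity can be written out as a short sketch. It assumes only the presence or absence of each sequence class in the nine sampled accessions, as reported in the direct-sequencing screen. Apart from 02·08, the accession labels are hypothetical placeholders, and the function names are illustrative.

```python
# Sequence classes detected per oca accession. Only "02·08" is an accession
# label taken from the text; the other eight labels are hypothetical.
classes_by_accession = {f"oca_{i}": {"B", "C", "D"} for i in range(1, 9)}
classes_by_accession["02·08"] = {"B", "D"}


def fixed_classes(observations):
    """Return the sequence classes present in every sampled accession."""
    return set.intersection(*observations.values())


def variable_classes(observations):
    """Return the classes found in some, but not all, accessions."""
    return set.union(*observations.values()) - fixed_classes(observations)


print(sorted(fixed_classes(classes_by_accession)))     # ['B', 'D']
print(sorted(variable_classes(classes_by_accession)))  # ['C']
```

Such a tabulation only shows which classes co-occur in every plant. As noted above, it cannot distinguish allopolyploidy from clonally maintained heterozygosity or from undersampled polysomic segregation without the additional phylogenetic information from the gene tree.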
The third sequence class, C, is present in eight out of nine oca plants sampled by either cloning or direct sequencing. The class C sequences appear to be another homeologous locus, representing a third genome type in octoploid oca, an interpretation that would be straightforward if this class were present in all plants sampled. However, its absence from one oca accession (02·08) leaves open several possible explanations (see also Fig. 5.7 in Emshwiller, 1999): (1) It is possible that 02·08 is not octoploid, and so it does not have all the genomes present in other accessions, which would imply that there is ploidy level variation in cultivated oca. This plant was no longer available alive at the time that the flow cytometry study was conducted, so its ploidy level is unknown. (2) All oca accessions may be octoploid, but may have had multiple origins of polyploidy, in which some of the octoploids were formed without the contribution of the genome donor with the class C ncpGS sequence. (3) The class C sequence may have been lost in at least one oca lineage. Recent studies have demonstrated rapid sequence elimination in some polyploids even within a few generations after their formation (Parokonny et al., 1994; Song et al., 1995; Escalante et al., 1998; Liu, Vega, and Feldman, 1998; Liu et al., 1998; Ozkan, Levy, and Feldman, 2001; Shaked et al., 2001; see also reviews by Soltis and Soltis, 1999; Wendel, 2000; and Pikaard, 2001). (4) The class C sequence may represent gene flow (e.g., introgression between wild and cultivated populations) after the origins of the octoploid. The class C sequence of oca is not geographically restricted to the range of the wild species, O. picchensis, that also has this sequence class (see below). Thus, it would be necessary to suppose that the oca genotypes that contain this sequence were extensively selected and dispersed, by either natural or human means, to explain their predominance (eight out of nine) in the sampled accessions. More problematic is the requirement of this hypothesis for gene flow across differing ploidy levels. (5) The class C and class D sequences join different branches within the same subclade (the O. peduncularis clade) on the ncpGS gene tree. Some diploid taxa have been found to be polymorphic for sequence types that fall in similarly separated parts of that subclade (see asterisks in Figs. 4 and 5), suggesting the possibility that classes C and D represent alleles at homologous loci. However, other wild Oxalis taxa have been found that have better matches for each of these two sequence classes, and none have been found to be polymorphic for sequence classes C and D or sequences that join them on the gene tree, so it seems more likely that these two sequence classes in oca derive from separate species.
The first three possibilities above are consistent with the idea that the class C sequences are indeed homeologous loci and thus that at least some oca cultivars have three different genomes, represented by the B, C, and D sequence classes. Given current data, these possibilities seem less problematic than the latter two. As an octoploid, the crop may theoretically be derived ultimately from four diploid progenitor species, or at least have four homeologous paired sets of chromosomes. However, if the ncpGS sequence classes that have been distinguished among the cultivated oca clones do indeed represent the homeologous loci, there appear to be three classes, rather than the four homeologous loci that might theoretically be possible. Thus oca seems to be an autoallopolyploid, but its mode of origin and which of the genomes might be present in greater copy number than the others are as yet unknown.
Putative progenitors of O. tuberosa
Among the wild Oxalis populations that were sampled for ncpGS, two taxa have sequences that match those of the different sequence classes of cultivated oca. One of these is the unnamed wild tuber-bearing taxon from Bolivia, in which the different populations sampled (one accession whose sequences were cloned and three that were sequenced directly) all have sequence classes B and D. Two of the cloned class D sequences of oca (accession 35·04, clones 11 and 16) are identical to one sequence from the wild taxon (accession EE259, clone 7). There is intraspecific variation among the class B sequences of both oca and the wild tuber-bearing populations, so they are not necessarily identical, but they do share a set of characters (see above and Fig. 6) that do not occur together in any other Oxalis sampled. As in the case of cultivated oca, the fixed heterozygosity of these two sequence classes, which join morphologically different subclades within the O. tuberosa alliance, provides evidence to support the conclusion that the wild Bolivian tuber-bearing populations are probably also allopolyploids. However, cytological information for these populations is as yet unavailable, because living material was not available for analysis. Although triploid numbers have been reported for some Bolivian wild tuber-bearing Oxalis (Guamán, 1997), these counts have not been independently confirmed. Some of the wild tuber-bearing Oxalis sampled in this study (i.e., EE259 and EE260) were collected from populations with all three style morphs present (most Oxalis species are tristylous), suggesting that the plants in these populations are reproducing by seed (Emshwiller and Doyle, 1998). Thus, it is unlikely that they could be odd polyploids, which are usually sterile (Allard, 1960).
The class C sequences of oca, on the other hand, were shared with O. picchensis, another wild tuber-bearing species found in the department of Cusco, Peru (the sequences of MHG913 clone 8 and 35·04 clones 3 and 5 are identical to that of O. picchensis). Estimation of DNA content by flow cytometry indicates that this taxon is tetraploid (Emshwiller, 2002b). It is probably autotetraploid because the two plants sequenced had a single sequence class (one was heterozygous for a single one-base indel).
One interpretation of these data is that these wild tuber-bearing taxa (the populations of Bolivia on the one hand and O. picchensis on the other) may both be progenitors of domesticated oca. The Bolivian wild tuber-bearing Oxalis taxon may itself be a hybrid of two as yet unknown progenitors and possibly may be either tetraploid or hexaploid. Further hybridization with O. picchensis may have resulted in octoploid O. tuberosa.
In an autoallopolyploid, one of the homeologous genomes is present in greater copy number than the other(s). In the absence of information on chromosome pairing behavior, the ploidy level of the wild tuber-bearing taxon of Bolivia, or the mode of polyploidization (e.g., "asexual polyploidization," "unilateral sexual polyploidization," or "bilateral sexual polyploidization" sensu Mendiburu and Peloquin, 1976), we can only speculate about the dosage of each genome. The relative intensity of the class C peaks in direct sequences of oca is much lower than the others, which might argue for the possibility that the octoploid could have been formed by unilateral sexual polyploidization (i.e., if the wild tuber-bearing taxon of Bolivia were hexaploid, it might have contributed a 2n [=6x] gamete that joined with a normal 1n [=2x] gamete from O. picchensis). This cannot be considered definitive evidence of gene dosage, however, because PCR amplification conditions can favor one sequence type over another (Wagner et al., 1993).
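The gamete arithmetic behind the sexual polyploidization scenarios can be made explicit with a small enumeration. This is a speculative sketch under stated assumptions: O. picchensis is taken as tetraploid, the Bolivian wild taxon as either tetraploid or hexaploid (its ploidy level is unknown), a reduced gamete carries half of the parental chromosome sets, and an unreduced (2n) gamete carries all of them. Asexual (somatic) doubling is not modeled, and all names in the code are hypothetical.

```python
from itertools import product


def zygote_ploidy(parent_a, parent_b, unreduced_a=False, unreduced_b=False):
    """Ploidy of the offspring of two even-ploidy parents.

    A reduced gamete carries half of the parental chromosome sets;
    an unreduced (2n) gamete carries all of them.
    """
    gamete_a = parent_a if unreduced_a else parent_a // 2
    gamete_b = parent_b if unreduced_b else parent_b // 2
    return gamete_a + gamete_b


PICCHENSIS = 4  # tetraploid, following the flow cytometry estimate

# Each route: (ploidy of Bolivian wild taxon, its gamete unreduced?,
#              O. picchensis gamete unreduced?)
routes_to_octoploid = [
    (wild, unreduced_wild, unreduced_picchensis)
    for wild, unreduced_wild, unreduced_picchensis
    in product((4, 6), (False, True), (False, True))
    if zygote_ploidy(wild, PICCHENSIS, unreduced_wild, unreduced_picchensis) == 8
]

for route in routes_to_octoploid:
    print(route)
# (4, True, True)
# (6, True, False)
```

Under these assumptions only two crosses yield an octoploid: bilateral sexual polyploidization between two tetraploids, or unilateral sexual polyploidization in which a hexaploid wild taxon contributes an unreduced 6x gamete. In the latter case O. picchensis would contribute two of the eight chromosome sets, which would be compatible with, though not demonstrated by, the weaker class C peaks.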
Even with the caveats discussed above, current data and sampling support both of the wild tuber-bearing taxa as the best candidates as progenitors of domesticated O. tuberosa. These two taxa were the only ones sampled that had sequences that matched those of the various sequence classes of oca and grouped in the same places with the oca sequences on the ncpGS tree. These are also the only members of the O. tuberosa alliance that bear tubers. Tubers have also been observed in accessions of O. boliviana Britton (or perhaps O. rigidicaulis Knuth, which is usually considered a synonym of O. boliviana, e.g., Lourteig, 2000, but which may be distinct from that taxon) from Oxapampa, in the department of Pasco, Peru (AAV5413, housed in living collections of the International Potato Center, Lima, Peru). Ho