Patterns of Nuclear Genetic Variation in the Poecilogonous Polychaete Streblospio benedicti
- Department of Biology and Center for Genomics & Systems Biology, New York University, 12 Waverly Place, New York, NY 10003, USA
The evolution of marine larvae is replete with transitions in trophic mode, but little is known about how, at the genetic level, these transitions are achieved. Basic parameters, including the number of underlying loci, their molecular characteristics, and the population-genetic processes that drive transitions remain unknown. Streblospio benedicti, an abundant benthic polychaete with heritable poecilogony, provides a unique genetically tractable system for addressing these issues. Individuals of S. benedicti vary in diverse aspects of development. Some females produce small, planktotrophic larvae, whereas others produce large, yolky larvae capable of settling without feeding. Here, I present estimates of basic features of nuclear genetic variation in S. benedicti to lay the foundations for subsequent efforts to understand the genetic basis of poecilogony. Sequence of ∼20 kb of random nuclear DNA indicates that the nucleotide composition, at 62.1% A + T, is typical of lophotrochozoan genomes. Population-genetic data, acquired by sequencing two loci (∼2500 bp) in multiple animals of each developmental morph, indicate that the morphs exhibit very little differentiation at random loci. Nucleotide heterozygosity () is ∼0.5–1% per site, and linkage disequilibrium decays within a few kilobases ( ∼ 3 × 10−3 per site). These data suggest that genetic mapping by association will require a high density of markers, but linkage mapping and identification of regions of elevated inter-morph differentiation hold great promise.
The evolution of marine larvae is characterized by multiple independent transitions from obligate planktotrophy, with a long phase of larval dispersal, to obligate lecithotrophy, with dispersal abbreviated or abandoned (Strathmann 1978; Wray and Raff 1991; Smith et al. 2007). Many species exhibit intermediate phenotypic states, including facultative planktotrophy (Allen and Pernet 2007) and environmentally plastic development (Rice and Rice 2009). Evolutionary transitions from one fixed mode of development to another require, in addition, intermediate population-genetic states, as alleles that contribute to lecithotrophic development accumulate in genetically variable populations. Although all populations with derived lecithotrophic larvae have necessarily passed through a stage of developmental polymorphism, with heritable variation in developmental mode due to segregating alleles with effects on life history, only one species is known to exhibit such polymorphism in contemporary populations: the spionid polychaete Streblospio benedicti.
Since the discovery of developmental variation within and among S. benedicti populations (Levin 1984), this species has become a paragon of poecilogony and the subject of an extensive literature on development and ecology. Because trophic mode in S. benedicti is now understood to include facultative planktotrophy (Pernet and McArthur 2006), I refer to the two ends of the developmental spectrum as the “small-egg” and “large-egg” morphs, with the former encompassing the obligate planktotrophs and the latter including both facultative planktotrophs and obligate lecithotrophs.
The two developmental morphs differ in a large and diverse array of features, starting with the ova, which differ in volume by a factor of six (Levin and Bridges 1994). In the earliest stages of embryogenesis, the proportion of the zygote segregated into the polar lobe differs between morphs, as do the relative sizes of the blastomeres and micromeres (McCain 2008). Later in development, the morphs exhibit differences in the timing and pattern of development of the gut and coelem (Gibson et al. 2010; Pernet and McHugh 2010). Several traits exhibit discrete differences: larvae from small eggs produce long, conspicuous larval bristles that are absent from larvae from large eggs (Levin 1984), and they also have several cell types, including bacillary cells and some mucous cells, that large-egg larvae fail to generate (Gibson et al. 2010). Phenotypic differences between larval morphs are reflected in differences in transcriptome complexity (Marsh and Fielman 2005). Animals of the small-egg morph are released from the maternal brood pouches with fewer setigers than is true of the large-egg morphs and they spend longer in the water prior to settlement (Levin and Bridges 1994). Later still, animals of the two morphs differ in age and size at maturity and adults differ in brood size and in the investments of carbon and nitrogen in each brood (reviewed by Levin and Bridges 1994).
The heritable nature of peocilogony in S. benedicti was established through a series of meticulous and creative studies (reviewed by Levin and Bridges 1994). Experimental crosses confirmed the substantially genetic basis for the alternative larval types (Levin et al. 1991). Importantly, intermediate forms, putative hybrids between the discrete morphs described earlier, occur in natural populations (Levin and Huggett 1990), and their attributes, along with those of experimental intercross progeny, demonstrate that the genetic basis of developmental mode in S. benedicti is complex and multifarious, with different genetic bases for different aspects of the larval syndromes. Moreover, molecular population-genetic data from the mitochondrial genome indicate that the two developmental modes share alleles and are not genetically differentiated, at least at this single locus (Schulze et al. 2000; Mahon et al. 2009). These findings suggest that the molecular genetic basis for larval evolution in S. benedicti must involve multiple unlinked loci that are differentiated between the larval morphs on a genetic background homogenized by gene flow.
The experimental tractability of a species with heritable poecilogony holds out promise for the identification and molecular characterization of the causal alleles that underlie life-history transitions. The distribution of these causal alleles in natural populations and their temporal and spatial fluctuations could, in principle, be directly measured. Such studies would provide exceptional insight into the intersection of development, evolution and ecology.
Here, I provide a preliminary account of several aspects of the nuclear genome and its patterns of variation in S. benedicti as a first step toward creating tools to discover the causal alleles underlying the diverse differences between planktotrophic and lecithotrophic modes of larval development. I describe four attributes—nucleotide composition, heterozygosity, linkage disequilibrium, and population differentiation—that will be valuable for future efforts to map genes by linkage and association.
Materials and methods
Adult S. benedicti were collected from Newark Bay, New Jersey, and Long Beach, California. I collected animals from intertidal mud at the north end of Bayonne Park, Bayonne, New Jersey (40°41′11″N, 74°06′48″W). Animals from Long Beach were a generous gift from Bruno Pernet. From each locality, broods from at least six females were observed and in every case the Bayonne females produced small feeding larvae with long larval bristles and the Long Beach females produced large larvae, lacking larval bristles, and capable of settling without feeding. These animals are facultative planktotrophs and larvae fed Dunaliella salina ate it. The Bayonne population has been observed for 3.5 years and the larval phenotype has been consistent. Californian populations consist exclusively of the large-egg developmental phenotype (Levin 1984). Streblospio benedicti on the West Coast of North America is likely a recent introduction from East-Coast populations (Schulze et al. 2000).
Individual worms were maintained in artificial seawater (Instant Ocean®). After worms were isolated from all mud for at least 24 h to purge their guts and were washed several times with artificial seawater, they were frozen at −80°C. DNA was isolated from each worm either by phenol–chloroform extraction or by Qiagen’s DNeasy kit.
To generate genomic clones for sequencing, I digested 400 ng of genomic DNA isolated from a pool of eight Bayonne individuals with the restriction enzymes EcoRI and HindIII (NEB) for 8 h at 37°C and recovered digested DNA in several size ranges by extraction from a 0.8% agarose gel using Qiagen’s gel extraction kit. I ligated this DNA to the plasmid pUC19, which I had similarly double-digested and gel extracted and had also treated with shrimp alkaline phosphatase for 1 h at 37°C and then cleaned with a Qiagen PCR cleanup kit, and transformed One Shot® TOP10 chemically competent Escherischia coli (Invitrogen) with the ligation product. After blue-white screening on LB Amp X-gal plates, I picked colonies and miniprepped clones with Qiagen’s miniprep kit. Insert size and number were checked by digestion. The ends of 11 single-insert clones were Sanger sequenced by Genewiz, Inc. (South Plainfield, NJ, USA). The resulting sequences are deposited in Genbank as JS684826–JS684847.
To compare AT/GC ratios to other lophotrochozoans, I generated data sets of 22 fragments, size-matched to the S. benedicti clone-end reads, from among the largest 100 scaffolds in four genome assemblies, all downloaded January 9, 2012: Capitella_sp1.fasta.gz (Capitella teleta), Helobdella_robusta.fasta.gz (Helobdella robusta), Lotgi1_assembly_scaffolds.fasta.gz (Lottia gigantea), and aplCal1.fa.gz (Aplysia californica). The first 3 genomes were downloaded from the website of the Joint Genome Institute, and A. californica from the UCSC Genome Bioinformatics Site. I performed only this limited analysis of these genomes as a condition of their use prior to publication.
For five clones, I designed polymerase chain reaction (PCR) primers in Primer3 (Rozen and Skaletsky 2000) to amplify regions of ∼700 bp at each end. Three of these clones were not pursued further, as PCR amplified many nonspecific products or direct sequencing of PCR products revealed problems with paralogy. For the remaining two inserts, named locus 6B1 and locus 6B3, each end was amplified and bidirectionally sequenced from a panel of individual worms from each population. Chromatograms were analyzed using CodonCode Aligner 126.96.36.199.
Because of the considerable heterozygosity at indel and short tandem repeat loci, the haplotypes of many individuals were determined experimentally, either by isolating bands corresponding to individual alleles on agarose gels or by deconvoluting overlapping sequences from chromatograms. The remaining haplotypes for each sequenced region were inferred in PHASE 2.1.1 (Stephens et al. 2001; Stephens and Donnelly 2003), with experimentally determined haplotypes input as known. The number of sites with ambiguous phasing was very small and these were all singletons with two exceptions. Haplotypic phase connecting the two sequenced ends of each locus was also determined by PHASE, with the original clone sequence provided as a known haplotype.
Population genetic summary statistics (, , Tajima’s D, Rmin) for each of the four sequenced regions were estimated from phased haplotypes using libsequence (Thornton 2003), which also estimated P-values, from 1000 coalescent simulations, for the null hypothesis that Tajima’s D is drawn from a population at neutral mutation–drift equilibrium. Biallelic indel polymorphisms were treated as single nucleotide polymorphisms (SNPs) in these analyses.
The population-effective recombination rate, , was estimated in PHASE as the median of 10,000 samples from the posterior distribution of estimates of background recombination rate (Li and Stephens 2003). Patterns of linkage disequilibrium were analyzed in terms of pairwise r2 and four-gamete tests in Haploview (Barrett et al. 2005), using unphased data for all biallelic variants. These pairwise r2 values were also used to estimate from the relationship between r2 and physical distance, by nonlinear regression according to the formula of Weir and Hill (1986), implemented by the nls function in R 2.14.1 (R Development Core Team 2011).
I tested for differentiation between the Bayonne and Long Beach populations at each variable site using Fisher’s exact test, implemented in R, and a haplotype-based permutation test, implemented in PHASE. The latter test takes into account similarity among haplotypes and phase uncertainty, as described in the PHASE documentation, and estimates of P-values are based on 1000 permutations.
Eleven genomic fragments were cloned and end sequenced. The resulting 19,191 bp of sequence is 62.1% A + T, with the 22 sequenced regions ranging from 48.6% to 72.4%. This nucleotide composition is typical of lophotrochozoans, based on computational analysis of similar data sets for the annelids C. teleta and H. robusta and the mollusks L. gigantea and A. californica (Fig. 1).
To estimate the population-genetic parameters that govern association mapping power and precision, I generated population resequencing data for both ends (labeled a and b) of two clones, 6B1, and 6B3. The resulting data sets included 2532 bp of sequence (as measured on the reference clone sequence) for each of 12–13 animals of the large-egg morph, from Long Beach, California, and 17 animals of the small-egg morph, from Bayonne, New Jersey. Data are presented in Fig. 2 and summaries are provided in Table 1. The two loci are putatively autosomal, as animals of both sexes exhibited heterozygosity at each locus.
The sequenced regions contained 105 variants, including 11 variants inside a 1036 bp insertion polymorphism (as this insertion was not completely sequenced in most animals, these 11 variants were not included in subsequent analyses and are not represented in Fig. 2 or Table 1). The variants include 13 biallelic insertion/deletion polymorphisms, 89 SNPs, and 3 multiallelic repeat polymorphisms (two poly-A tracts of variable length and one complex polymorphism involving adjacent variable-length tracts of poly-G and poly-C; the latter could not be scored reliably and was excluded from further analysis). Of the 89 SNPs, two were triallelic. Counting each of these sites as both a transition and a transversion, the SNPs include 52 transitions and 39 transversions.
For three of the four sequenced regions, single samples exhibited homozygosity for rare alleles, suggestive of allelic dropout. At locus 6B1b, two samples failed to amplify entirely. Overall, however, the distribution of P-values for tests of Hardy–Weinberg equilibrium was consistent with expectations based on equilibrium.
Comparing the two populations, there are no fixed differences and no high-frequency private alleles (Fig. 2). At locus 6B3, none of the variable sites exhibits significant differentiation between populations, according to Fisher’s exact test applied to allele counts. One site reaches nominal significance (P = 0.016), with a minor allele frequency of 0.03 at Bayonne and 0.29 at Long Beach, but the statistical significance does not withstand correction for multiple tests. At locus 6B1, 11 of 50 sites are differentiated at P < 0.05, and although none withstands Bonferroni correction, the distribution of P-values is inconsistent with homogeneity of allele frequencies across populations according to Fisher’s combined probability test (χ2 = 144.03, df = 100, P = 0.003). The two most differentiated sites are a SNP at position 666, with allele frequencies of 0.23 at Bayonne and 0.62 at Long Beach, and a SNP at position 3921, which has a private allele at frequency 0.24 in the Long Beach population.
The results of these single-marker tests, which ignore linkage and genealogy, are corroborated by a haplotype-based permutation test, implemented in PHASE, which takes into account similarity among haplotypes and uncertainty in estimation of phase. The haplotype test finds significant differentiation between the populations at locus 6B1 (P = 0.012) but not at locus 6B3 (P = 0.150).
The estimated nucleotide heterozygosities () at each locus and for each population are very similar, ranging from 0.4% to 1.2% (Table 1). The allele frequency spectrum is consistent with neutral equilibrium, as indicated by the similarity of the two estimators of the population-effective mutation rate ( = 4Neµ, where Ne is the effective population size and µ is the mutation rate), and , and the consequent nonsignificance of Tajima’s D statistic.
All four of the sequenced regions show evidence of recombination (Rmin > 0). Estimates of the population-effective recombination rate ( = 4Nec, where c is the meiotic recombination rate) are similar in magnitude to the estimated population-effective mutation rate, , at 3.42 × 10−3 for locus 6B1 and 1.47 × 10−3 for locus 6B3. The history of recombination is reflected by the significant decay of linkage disequilibrium with distance (Fig. 3). Using the formula of Weir and Hill (1986), nonlinear regression of r2 on distance (for pairs of loci with minor allele frequencies of 0.1 or greater) is significant at each locus (6B1: t = 4.25, P = 3.08 × 10−5; 6B3: t = 2.41, P = 0.018) and provides estimates of similar to those found by PHASE ( = 3.88 × 10−3 for locus 6B1, 3.93 × 10−3 for 6B3).
The extent of linkage disequilibrium is illustrated in Fig. 4 by the pattern of r2 between each pair of loci with frequencies of minor allele greater than 0.1, and by the pattern of locus-pairs that exhibit all four allelic combinations (i.e. that pass the four-gamete test and therefore have been separated by historical recombination). These plots demonstrate that LD extends over hundreds of basepairs but is largely absent on kilobase scales. Over short physical distances, LD is heterogeneous, with some adjacent SNPs exhibiting complete equilibrium.
The findings reported here provide an outline of the features of nuclear genetic variation that will help guide subsequent efforts to map the alleles responsible for developmental evolution in S. benedicti.
Although nucleotide composition varies substantially among species of animals (Lynch 2007), S. benedicti’s nucleotide composition, estimated from ∼20 kb of data sampled at 11 random loci, is very similar to that of other lophotrochozoans. Knowledge of nuclear base composition plays a role in genetic mapping by facilitating the selection of restriction enzymes with the optimal density for reduced-representation methods of genome sequencing (Etter et al. 2011), which will be useful for mapping by both linkage and association.
Although I sampled only two loci for population-genetic analysis, they provide a remarkably consistent picture of nuclear genetic variation. Nucleotide heterozygosity is on the order of 0.5–1% per site, linkage disequilibrium decays on a scale of hundreds of base pairs, and differentiation between populations that differ in developmental mode is nearly absent.
The findings presented here extend the mitochondrial results of Schulze et al. (2000), Mahon et al. (2009), and Gibson et al. (2010) to the nuclear genome: random loci show extensive sharing of alleles between developmental morphs.
The estimate of nuclear nucleotide heterozygosity is about half that found previously for mitochondrial protein-coding sequence (Schulze et al. 2000; Mahon et al. 2009; Gibson et al. 2010). Given the evolutionary constraints on protein-coding sequence, the lower effective population size of the mitochondrial genome, and the greater susceptibility of mitochondrial genes to variation-reducing effects of selection at linked sites, these results imply that the nuclear genome has a much lower mutation rate than does the mitochondrial genome, a well-known phenomenon in animals.
Estimated nuclear nucleotide heterozygosity is relatively low for an invertebrate (Lynch 2007) and is perhaps surprising given the extraordinarily large census size of S. benedicti (Stehlik et al. 2005). One explanation may be that the enormous contemporary population size of S. benedicti is a recent phenomenon, a scenario made plausible by the colonization of the entire west coast of the United States by S. benedicti introduced there within the last 100 years (Schulze et al. 2000). However, recent expansion of the population is not evident in Tajima’s D statistic.
The pattern of linkage disequilibrium in S. benedicti suggests that the discovery of causal variants by genome-wide association mapping will be a challenging task. LD, as measured by the conventional disequilibrium statistic, r2, governs the power to detect a causal locus by genotyping a nearby locus. If a sample of N individuals genotyped at the causal site provides sufficient power to detect an association, equivalent power at a site in LD will require increasing the sample size by a factor of 1/r2 (Pritchard and Przeworski 2001). Consequently, the structure of LD and the extent of nucleotide heterozygosity together determine the number of sites that must be genotyped to effectively cover the genome in a genome-wide association study. For common alleles, the data indicate that variants within several hundred base pairs tend to exhibit high LD, such that one site is a useful proxy for others. At spatial scales measured in kilobases, LD decays to background levels. Consequently, a marker density of more than 1 kb−1 is probably required to effectively track the entire genome.
The most promising finding for future efforts in mapping is that random nuclear loci exhibit little differentiation between developmental morphs. Loci responsible for phenotypic divergence will therefore stand out as regions of high differentiation on a low-differentiation background. The width of these regions will be governed by interactions between the age of the causal alleles, the pattern and extent of gene flow, the strength of selection, the local rates of recombination, and the epistatic dependencies of the loci in generating larval phenotypes. The addition of transcriptome sequence and high-throughput genotyping to the arsenal of research on S. benedicti (Zakas et al. 2012) points toward a future in which the molecular genetic basis of heritable poecilogony will be revealed.
This work was supported by startup funds from New York University. The symposium was sponsored by the United States National Science Foundation (IOS-1157279), the Company of Biologists, Ltd., the American Microscopical Society, and the Society for Integrative and Comparative Biology, including SICB divisions DEDB, DEE, and DIZ.
I am very grateful to the symposium organizers for the opportunity to participate. I thank Jasmine Nicodemus for assistance with DNA preparations and Bruno Pernet for his advice and gift of animals from Long Beach.
From the symposium “Poecilogony as a Window on Larval Evolution: Polymorphism of Developmental Mode within Marine Invertebrate Species” presented at the annual meeting of the Society for Integrative and Comparative Biology, January 3–7, 2012 at Charleston, South Carolina.
- © The Author 2012. Published by Oxford University Press on behalf of the Society for Integrative and Comparative Biology. All rights reserved. For permissions please email: email@example.com.