FastSimCoal: A Beginner’s Guide to Demographic Inference
What FastSimCoal does
FastSimCoal (fsc or fastsimcoal2) is a coalescent-based simulator and inference tool used to model genetic variation under complex demographic scenarios. It simulates genetic data under specified models (population splits, size changes, migration, admixture) and estimates parameters by comparing observed and simulated site frequency spectra (SFS).
When to use it
- You have genome-wide SNP data summarized as an SFS.
- You want to estimate parameters like divergence times, effective population sizes, migration rates, and admixture proportions.
- You need to compare alternative demographic models using likelihood-based model selection.
Key concepts
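- Site Frequency Spectrum (SFS): the number of polymorphic sites at which the derived allele (unfolded SFS) or the minor allele (folded SFS) is found in 1, 2, ..., n-1 of n sampled gene copies; the primary data summary used by FastSimCoal.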
- Coalescent simulation: backward-time simulation of genealogies under demographic models to generate expected SFS.
- Composite likelihood: FastSimCoal computes a composite likelihood of the observed SFS given parameters; it assumes independence among sites.
- Parameter estimation via optimization: for each candidate parameter set, the program approximates the expected SFS from many coalescent simulations, and an ECM (expectation-conditional maximization) algorithm searches for the parameter values that maximize the composite likelihood.
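The composite likelihood in the two points above can be written out for a single SFS. The sketch below sums m_i × log10(p_i) over SFS entries, where m_i is the observed number of sites in entry i and p_i the expected proportion of sites in that entry, dropping the constant multinomial term. The function name log10_composite_lhood is hypothetical and not part of fsc; fsc also reports likelihoods in log10 units, but its internal details (such as handling entries with zero expected probability) are not reproduced here.

```sh
#!/bin/sh
# Hypothetical helper: log10 composite likelihood of an observed SFS.
# $1: observed counts m_i, space-separated (one per SFS entry)
# $2: expected proportions p_i for the same entries (each must be > 0)
# Every site is treated as an independent multinomial draw; the constant
# multinomial coefficient is dropped.
log10_composite_lhood() {
  awk -v obs="$1" -v expected="$2" 'BEGIN {
    k = split(obs, m, " ")
    if (split(expected, p, " ") != k) {
      print "observed and expected lists differ in length" > "/dev/stderr"
      exit 1
    }
    total = 0
    for (i = 1; i <= k; i++) {
      if (m[i] == 0) continue              # empty entries add nothing
      total += m[i] * log(p[i]) / log(10)  # awk log() is the natural log
    }
    printf "%.5f\n", total
  }'
}

# Usage: two SFS entries holding 2 and 1 sites, expected 50/50.
log10_composite_lhood "2 1" "0.5 0.5"
```

This prints -0.90309. Plugging the observed proportions themselves in as p_i gives the highest value attainable for that SFS, a useful reference point when judging how far a fitted model falls short.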
Input data and formats
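- Observed SFS: a one-dimensional SFS per population, or a joint (multi-dimensional) SFS across populations, either unfolded (derived allele, DAF) or folded (minor allele, MAF). The .obs files are named after the template, e.g. model_jointMAFpop1_0.obs for a folded two-population SFS.
- .est file: lists the parameters to estimate and the search range (distribution and bounds) for each; starting values are drawn at random from these ranges.
- .tpl file: model template describing populations, sample sizes, historical events, migration matrices and loci, with parameter names in place of values. A .par file uses the same layout with fixed values and is used for plain simulation.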
- VCF or genotype data: processed into SFS using tools like easySFS, dadi’s scripts, or custom converters.
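The conversion from genotype data to an SFS can be sketched with standard Unix tools. The helpers below are hypothetical (not easySFS or dadi code) and assume a simple whitespace-separated input of chromosome, position and derived-allele count per SNP, sorted by chromosome and position, with no missing data. Real pipelines must also handle missing genotypes (easySFS, for example, projects down to smaller sample sizes) and multiple populations, and fsc expects its own .obs layout, so check the manual before writing files.

```sh
#!/bin/sh
# Hypothetical helpers for turning per-SNP derived-allele counts into a 1D SFS.
# Input lines: <chrom> <pos> <derived_count>, sorted by chrom then pos.

# Keep a SNP only if it is at least $1 bp from the last kept SNP on the same
# chromosome (a crude way to reduce linkage between retained sites).
thin_sites() {
  awk -v d="$1" '$1 != chrom || $2 - last >= d { print; chrom = $1; last = $2 }'
}

# Tally derived-allele counts (one per line) into an unfolded SFS with
# entries 0..n, where n is the number of sampled gene copies ($1).
sfs_from_counts() {
  awk -v n="$1" '
    { sfs[$1 + 0]++ }
    END {
      for (i = 0; i <= n; i++)
        printf "%d%s", sfs[i] + 0, (i < n ? " " : "\n")
    }'
}

# Fold an unfolded SFS (one line, entries 0..n) into minor-allele classes
# 0..floor(n/2) by adding entry j to entry n - j.
fold_sfs() {
  awk '{
    n = NF - 1
    out = ""
    for (j = 0; j <= int(n / 2); j++) {
      v = $(j + 1)
      if (j != n - j) v += $(n - j + 1)
      out = out (j > 0 ? " " : "") v
    }
    print out
  }'
}

# Usage with four toy SNPs and n = 4 gene copies (2 diploids):
printf 'chr1 100 1\nchr1 150 2\nchr1 1200 1\nchr2 50 3\n' |
  thin_sites 1000 | awk '{ print $3 }' | sfs_from_counts 4
echo "10 4 3 2 1" | fold_sfs
```

The pipeline drops the SNP at chr1:150 and prints 0 2 0 1 0; folding prints 11 6 3. Entry 0 is empty because only SNPs were counted; if you want the mutation rate to set absolute scales, the monomorphic entry has to come from the count of callable sites.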
Installing FastSimCoal
- Download fastsimcoal2 binary from the official repository or release page.
- On macOS or Linux, unpack the archive, make the binary executable (chmod +x), and either move it to a directory on your PATH or run it from its folder.
- Ensure required dependencies for pre-processing (Python, easySFS) are installed if converting VCFs.
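The installation steps can be scripted. This is a minimal sketch: the helper name install_fsc and the default destination $HOME/bin are assumptions, and the binary name (fsc26 here, matching the commands in this guide) depends on the release you downloaded.

```sh
#!/bin/sh
# Hypothetical helper: copy a downloaded, unpacked fastsimcoal2 binary into a
# directory and make it executable.
# $1: path to the binary (e.g. ./fsc26); $2: destination (default $HOME/bin)
install_fsc() {
  src=$1
  dest=${2:-"$HOME/bin"}
  if [ ! -f "$src" ]; then
    echo "binary not found: $src" >&2
    return 1
  fi
  mkdir -p "$dest"
  cp "$src" "$dest"/
  chmod +x "$dest/$(basename "$src")"
  # Remind the user if the destination is not yet on PATH.
  case ":$PATH:" in
    *":$dest:"*) ;;
    *) echo "add to your shell profile: export PATH=\"$dest:\$PATH\"" ;;
  esac
}

# Usage (after unpacking the archive you downloaded):
# install_fsc ./fsc26
```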
Building a simple model (example)
- Define a two-population split with constant sizes and no migration.
- Create a template (.tpl) that defines the population sizes, sample sizes, the split event, and a locus block (for SFS data, the FREQ data type with a mutation rate).
- Create an .est file with search ranges for N1, N2 and the split time T, using biologically realistic bounds.
- Prepare the observed 2D SFS from your data (folded if no outgroup).
- Run fastsimcoal2 to estimate parameters and compute likelihoods:
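./fsc26 -t model.tpl -e model.est -m -M -n100000 -L40 -q
- -t: template file; -e: estimation (.est) file; -m: the observed SFS is folded (minor allele; use -d for an unfolded, derived-allele SFS); -M: estimate parameters by maximizing the composite likelihood; -n: number of coalescent simulations used to approximate the expected SFS for each parameter set; -L: maximum number of ECM cycles; -q: quiet mode.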
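For the two-population model above, the template and estimation files might look like the sketch below, written from the shell with here-documents. The layout follows the comment-line structure of the example files distributed with fastsimcoal2 as we understand it; compare it against the examples for your version before relying on it. The sample sizes (20 gene copies per population), the mutation rate (2.5e-8 per site per generation) and the search ranges are placeholders, and the split time T is named TDIV. Population sizes are in haploid gene copies, and the mutation rate in the FREQ line only gives correct absolute scales for N and T if the observed SFS's monomorphic entry reflects all sequenced sites.

```sh
#!/bin/sh
# Sketch: hypothetical files for a two-population split with constant sizes
# and no migration. Deme 1 merges into deme 0 at time TDIV (generations);
# the ancestral population keeps deme 0's size (relative size 1).
cat > model.tpl <<'EOF'
//Number of population samples (demes)
2
//Population effective sizes (number of genes)
N1
N2
//Sample sizes
20
20
//Growth rates: negative growth implies population expansion
0
0
//Number of migration matrices : 0 implies no migration between demes
0
//historical event: time, source, sink, migrants, new size, new growth rate, migr. matrix
1 historical event
TDIV 1 0 1 1 0 0
//Number of independent loci [chromosome]
1 0
//Per chromosome: Number of linkage blocks
1
//per Block: data type, num loci, rec. rate and mut rate + optional parameters
FREQ 1 0 2.5e-8 OUTEXP
EOF

cat > model.est <<'EOF'
// Search ranges for the parameters named in model.tpl
[PARAMETERS]
//#isInt? #name #dist. #min #max
//all Ns are in number of haploid individuals
1 N1   logunif 100 1e6 output
1 N2   logunif 100 1e6 output
1 TDIV logunif 10  1e5 output

[RULES]

[COMPLEX PARAMETERS]
EOF
```

With a folded joint SFS saved as model_jointMAFpop1_0.obs in the same folder, these two files are what the -t and -e options point to.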
Practical tips for beginners
- Start simple: fit basic models first before adding migration or size changes.
- Use folded SFS if you lack a reliable ancestral state.
- Set sensible parameter bounds to avoid long searches in unrealistic space.
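- Increase the number of simulations (-n) and ECM cycles (-L) for final runs to get stable estimates; use smaller values for testing.
- Run many independent replicates (each starts from different random parameter values) and keep the one with the highest likelihood; similar estimates across the best runs suggest convergence.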
- Parallelize by running independent replicates on multiple cores or nodes.
- Check identifiability: some parameters (e.g., migration vs. recent divergence) can be confounded; use model comparison and prior biological knowledge.
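Running replicates in parallel and picking the best one can be sketched as two shell functions. The helper names, the runN/ folder layout and the DRY_RUN switch are hypothetical. The sketch also assumes, as we understand fsc's output, that each run writes a folder named after the template prefix containing <prefix>.bestlhoods, whose header line includes a MaxEstLhood column; check the output of your fsc version.

```sh
#!/bin/sh
# Hypothetical helpers for independent fsc replicates.
# Assumes <prefix>.tpl, <prefix>.est and <prefix>_*.obs in the current
# directory. $FSC defaults to fsc26 in the current directory; give an
# absolute path if you override it, because each run cds into its folder.

# Launch $1 replicates, each in its own runN/ folder, in the background.
# Set DRY_RUN=1 to print the commands instead of running them.
run_replicates() {
  n_reps=$1
  prefix=$2
  fsc=${FSC:-"$PWD/fsc26"}
  i=1
  while [ "$i" -le "$n_reps" ]; do
    dir="run$i"
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "cd $dir && $fsc -t $prefix.tpl -e $prefix.est -m -M -n100000 -L40 -q"
    else
      mkdir -p "$dir"
      cp "$prefix.tpl" "$prefix.est" "$prefix"_*.obs "$dir"/
      (cd "$dir" && "$fsc" -t "$prefix.tpl" -e "$prefix.est" -m -M -n100000 -L40 -q) &
    fi
    i=$((i + 1))
  done
  wait
}

# Print the runN folder whose estimate has the highest MaxEstLhood,
# read from runN/<prefix>/<prefix>.bestlhoods (header line, then values).
best_replicate() {
  prefix=$1
  for f in run*/"$prefix"/"$prefix".bestlhoods; do
    [ -f "$f" ] || continue
    awk -v run="${f%%/*}" '
      NR == 1 { for (c = 1; c <= NF; c++) if ($c == "MaxEstLhood") col = c; next }
      NR == 2 && col { print $col, run }' "$f"
  done | sort -n -r | head -n 1 | cut -d ' ' -f 2
}

# Usage: run_replicates 20 model && best_replicate model
```

All replicates start at once, one process each; with more replicates than cores, launch them in batches or through a job scheduler instead.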
Model comparison and validation
- Compare models with the Akaike Information Criterion (AIC) or, for nested models, likelihood ratio tests; because the likelihood is composite, treat both as approximate.
- Perform parametric bootstraps: simulate data under the inferred model, re-estimate parameters, and assess confidence intervals and biases.
- Visualize predicted vs. observed SFS residuals to identify misfit.
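Because fsc reports likelihoods in log10 units, AIC needs a change of base: AIC = 2k - 2 ln L, with ln L = ln(10) × log10 L. A minimal sketch follows; the helper name and the two example likelihoods are hypothetical. fsc's likelihoods are themselves approximated from simulations, so small AIC differences between models may not be meaningful.

```sh
#!/bin/sh
# Hypothetical helper: AIC from a log10 likelihood.
# $1: number of estimated parameters k; $2: log10 likelihood (e.g. MaxEstLhood)
aic_from_log10() {
  awk -v k="$1" -v l10="$2" 'BEGIN { printf "%.2f\n", 2 * k - 2 * log(10) * l10 }'
}

# Usage: compare two hypothetical models; the lower AIC is preferred.
aic_from_log10 3 -1000     # e.g. split without migration
aic_from_log10 4 -995.5    # e.g. split with one migration rate
```

The calls print 4611.17 and 4592.45, so here the extra migration parameter would be favoured.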
Common pitfalls
- Mis-specified sequence lengths (including monomorphic sites) or mutation rates, which rescale the estimated times and effective sizes.
- Overparameterized models that the data cannot inform.
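- Ignoring linkage: the composite likelihood treats sites as independent, so linked SNPs make likelihood-based uncertainty and model comparisons overconfident; thin the data to approximately unlinked SNPs or use a block bootstrap over genomic regions.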
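The scaling pitfall can be made concrete with Watterson's estimator. This is a textbook formula swapped in for illustration, not anything fsc computes: θ_W = S / a_n with a_n = 1 + 1/2 + ... + 1/(n-1), and diploid N_e = θ_W / (4 μ L), where S is the number of segregating sites, n the number of gene copies, μ the per-site mutation rate and L the total sequence length including monomorphic sites. Because μ and L sit in the denominator, an error in either rescales N_e by the same factor, and times scaled by N_e shift with it.

```sh
#!/bin/sh
# Illustration only (hypothetical helper, Watterson's estimator):
# $1: segregating sites S; $2: gene copies n; $3: mutation rate mu per site;
# $4: total sequence length L in bp, including monomorphic sites.
# Prints diploid Ne = (S / a_n) / (4 * mu * L), rounded to an integer.
watterson_ne() {
  awk -v S="$1" -v n="$2" -v mu="$3" -v L="$4" 'BEGIN {
    a = 0
    for (i = 1; i < n; i++) a += 1 / i   # harmonic number a_n
    printf "%.0f\n", (S / a) / (4 * mu * L)
  }'
}

watterson_ne 1000 4 1e-8 1000000   # length L as assumed
watterson_ne 1000 4 1e-8 2000000   # same data, L doubled
```

The two calls print 13636 and 6818: doubling the assumed length halves the inferred effective size.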
Example workflow checklist
- Convert VCF → filtered SNPs → unlinked set.
- Generate folded/unfolded SFS.
- Draft simple model (.tpl/.est).
- Test with low N/L, inspect outputs.
- Refine model, increase N/L, run multiple replicates.
- Perform bootstraps for CIs and model checks.
- Report parameter estimates with uncertainty and biological interpretation.
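For the bootstrap step in this checklist, a percentile interval can be read off the replicate estimates. This sketch uses the nearest-rank percentile method (one of several conventions) and takes one estimate per line on standard input, for example one parameter column collected from each bootstrap run's output. The helper name percentile_ci is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: nearest-rank percentile interval.
# stdin: one bootstrap estimate per line; $1: coverage (default 0.95).
# Prints "lower upper".
percentile_ci() {
  level=${1:-0.95}
  sort -n | awk -v level="$level" '
    function rank(p, n,   r) {
      r = int(p * n)
      if (r < p * n) r++   # ceiling of p * n
      if (r < 1) r = 1
      return r
    }
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      lo_p = (1 - level) / 2
      print v[rank(lo_p, NR)], v[rank(1 - lo_p, NR)]
    }'
}

# Usage: percentile_ci < tdiv_bootstrap_estimates.txt   (hypothetical file)
```

With 100 estimates the 95% interval spans the 3rd and 98th sorted values; with few bootstrap replicates the interval endpoints are coarse, so run enough replicates for the coverage you report.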
Further learning resources
- FastSimCoal user manual and example files (included with the software).
- Tutorials converting VCF to SFS (easySFS, dadi docs).
- Papers applying FastSimCoal for demographic inference to follow practical examples.
Short example command
./fsc26 -t example.tpl -e example.est -m -M -n50000 -L40 -q
This guide gives a concise starting path for using FastSimCoal to infer demographic history from SFS data.