FAQ


Which version of the human genome was used during alignment ?

For the current release, all samples were aligned to hs37d5, a modified version of the GRCh37 reference genome developed during the 1000 Genomes Project that improves the quality of variant calling from short-read alignments. The reference is available at ftp://ftp.ncbi.nih.gov/1000genomes/ftp/technical/reference/phase2_reference_assembly_sequence/.

Which transcript set is used ?

For the current release, the transcript set corresponds to Ensembl Version 75, the final full release pertaining to GRCh37. See http://feb2014.archive.ensembl.org/index.html.

Which alignment and variant calling software were used to identify variants ?

For the current release, we used BWA-MEM (v0.7.8) for alignment and GATK HaplotypeCaller (v3.6) to generate gVCFs.

Are any variants filtered prior to being uploaded to the platform ?

We perform minimal filtering to remove very poor-quality variants: any variant for which the read depth (DP) is less than 8, or the genotype quality (GQ), as provided by GATK, is less than 20, is removed.
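The two thresholds can be read as a simple predicate. The following is a minimal Python sketch for illustration only, not the platform's actual code; the function name and the dictionary layout of a call are assumptions.

```python
MIN_DP = 8   # minimum read depth stated above
MIN_GQ = 20  # minimum genotype quality stated above


def passes_upload_filter(call):
    """Return True if a call survives the minimal pre-upload filter.

    `call` is a hypothetical dict with integer 'DP' and 'GQ' keys.
    """
    return call["DP"] >= MIN_DP and call["GQ"] >= MIN_GQ


# Made-up calls, for illustration only.
calls = [
    {"id": "a", "DP": 35, "GQ": 99},  # kept
    {"id": "b", "DP": 7, "GQ": 99},   # removed: DP is less than 8
    {"id": "c", "DP": 30, "GQ": 19},  # removed: GQ is less than 20
]
kept = [c["id"] for c in calls if passes_upload_filter(c)]
```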

Why do I see my variant when I search for it in the individual, but when I perform a trio analysis the variant no longer shows ?

This is most likely because, in one of the other members of your trio, the corresponding position fails one or more of your filtering thresholds. For example, suppose DP is set to 20 and GQ to 50, and your candidate de novo variant at Chr1:100,200,345 has a DP of 35 and a GQ of 99; it will show in the individual analysis. However, with the same filters set for all samples, if the mother is homozygous reference at that position with a DP of only 18 (and a GQ of 99), the position will not be returned by the platform, because every sample must pass the set thresholds for a position to be displayed. To make the variant reappear, lower the DP threshold for the mother's sample to 10 (or any value below 19).
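The rule that every sample must pass its own thresholds can be sketched as follows. This is an illustrative Python sketch, not the platform's implementation; the function name, the data layout, and the father's values are assumptions.

```python
def position_is_displayed(calls, thresholds):
    """A position is shown only if every sample passes its own thresholds.

    `calls` and `thresholds` are hypothetical dicts keyed by sample name.
    """
    return all(
        calls[s]["DP"] >= thresholds[s]["DP"]
        and calls[s]["GQ"] >= thresholds[s]["GQ"]
        for s in calls
    )


# Values at the candidate de novo position from the example above.
# The father's values are made up for illustration.
calls = {
    "proband": {"DP": 35, "GQ": 99},
    "mother": {"DP": 18, "GQ": 99},
    "father": {"DP": 40, "GQ": 99},
}

# Same filters for all samples: the mother's DP of 18 fails DP >= 20.
same_for_all = {s: {"DP": 20, "GQ": 50} for s in calls}
hidden = not position_is_displayed(calls, same_for_all)

# Lowering the DP threshold for the mother only brings the position back.
relaxed = dict(same_for_all, mother={"DP": 10, "GQ": 50})
shown = position_is_displayed(calls, relaxed)
```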

When I use the SNV Effect Prediction filters, are they implemented using a logical OR, or a logical AND ?

Currently the filters are implemented as a logical OR, i.e. if you select D for MutationTaster and D for SIFT, the platform will return variants flagged as D in either or both.
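As a minimal Python sketch of this OR behaviour (the function name and data layout are assumptions, and the non-D flags are placeholders chosen for illustration):

```python
def matches_any(variant, selected):
    """Logical OR: keep the variant if at least one selected predictor matches.

    `variant` maps a predictor name to its flag; `selected` maps a predictor
    name to the flag chosen in the filter. Both layouts are hypothetical.
    """
    return any(variant.get(tool) == flag for tool, flag in selected.items())


selected = {"MutationTaster": "D", "SIFT": "D"}
variants = {
    "v1": {"MutationTaster": "D", "SIFT": "T"},  # D in one: returned
    "v2": {"MutationTaster": "D", "SIFT": "D"},  # D in both: returned
    "v3": {"MutationTaster": "N", "SIFT": "T"},  # D in neither: not returned
}
returned = [name for name, v in variants.items() if matches_any(v, selected)]
```

Under a logical AND, only v2 would be returned.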

How are candidate compound-heterozygote mutations identified within the platform ?

Compound heterozygotes are identified at the level of the transcript. For any particular transcript, any affected individuals must have at least one pair of heterozygous mutations that are not found co-segregating in any non-affected individual, nor may either be found in a homozygous alternative state in any non-affected individual. Note that any filters defined prior to performing a compound heterozygote search will be required to be fulfilled by both variants for them to be returned as candidates by the platform.
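One possible reading of these criteria, for a single transcript, is sketched below in Python. It is not the platform's implementation: the function name, genotype encoding, and data layout are assumptions, "co-segregating" is interpreted as an unaffected individual being heterozygous for both variants, and phasing is ignored.

```python
from itertools import combinations

HET, HOM_ALT, HOM_REF = "0/1", "1/1", "0/0"


def compound_het_pairs(variants, affected, unaffected):
    """Return candidate variant-id pairs within one transcript.

    `variants` maps a variant id to {sample: genotype}; this layout is
    hypothetical.
    """
    pairs = []
    for (id_a, gt_a), (id_b, gt_b) in combinations(variants.items(), 2):
        # Every affected individual must be heterozygous for both variants.
        if not all(gt_a[s] == HET and gt_b[s] == HET for s in affected):
            continue
        # No unaffected individual may be heterozygous for both variants.
        if any(gt_a[s] == HET and gt_b[s] == HET for s in unaffected):
            continue
        # Neither variant may be homozygous alternative in an unaffected individual.
        if any(HOM_ALT in (gt_a[s], gt_b[s]) for s in unaffected):
            continue
        pairs.append((id_a, id_b))
    return pairs


# Made-up genotypes for three variants in one transcript.
transcript_variants = {
    "varA": {"child": HET, "mother": HET, "father": HOM_REF},
    "varB": {"child": HET, "mother": HOM_REF, "father": HET},
    "varC": {"child": HET, "mother": HET, "father": HOM_REF},
}
candidates = compound_het_pairs(
    transcript_variants, affected=["child"], unaffected=["mother", "father"]
)
```

Here varA and varC are both carried by the unaffected mother, so that pair is excluded, while the pairs involving varB remain candidates.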

Can I upload a list of candidate genes ?

Yes. You can paste a comma-separated list of HGNC gene symbols into the Gene Name(s) box in the Gene and Chromosome Coordinates section, and filter based only on those genes. We do not currently validate gene names, so if a gene name is not a correct HGNC symbol (consistent with those used by Ensembl Version 75), no variants will be returned for that name. If in any doubt, please consult http://www.genenames.org/.

Which versions of ExAC, CADD and 1000 Genomes Project data does the platform use ?

The platform currently uses annotation data from ExAC version 0.1 (ftp://ftp.broadinstitute.org/pub/ExAC_release/release0.1/), CADD version 1.0 (http://cadd.gs.washington.edu/download), and the first release of 1000 Genomes Project (1000GP) frequencies.

What is meant by High, Moderate etc. in the Variant Class filter ?

This annotation is provided by SnpEff. Please consult the Effect prediction details section of the SnpEff manual for further detail.

When I examined my data in a different variant browser, I saw fewer variants. Why ?

One reason may be that we have not restricted our calls to the capture regions. If a call appears to be of sufficiently high quality to be regarded as bona fide, it is included in our call set even if it lies outside the target region of the capture kit.
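To illustrate the effect, the Python sketch below counts calls with and without a restriction to capture targets. The regions, calls, and function name are made up for illustration, and intervals are treated as 1-based and inclusive for simplicity.

```python
def in_capture_regions(chrom, pos, regions):
    """True if a position falls in any (chrom, start, end) interval.

    Intervals are treated as 1-based and inclusive in this sketch.
    """
    return any(c == chrom and start <= pos <= end for c, start, end in regions)


# Hypothetical capture targets and call positions.
regions = [("1", 1000, 2000), ("2", 500, 800)]
calls = [("1", 1500), ("1", 2500), ("2", 600)]

all_calls = len(calls)  # what an unrestricted call set contains
on_target = sum(in_capture_regions(c, p, regions) for c, p in calls)
```

A browser that restricts to the capture targets would show only the on-target calls, which is fewer than the unrestricted set.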