Data Sources
In addition to what is described in this section, please refer to the following sources of information to fully understand the data served by the CIP-API:
- Cancer analysis technical documentation: https://www.genomicsengland.co.uk/about-genomics-england/the-100000-genomes-project/information-for-gmc-staff/cancer-programme/cancer-genome-analysis/
- Cancer additional information: https://www.genomicsengland.co.uk/about-genomics-england/the-100000-genomes-project/information-for-gmc-staff/cancer-programme/genome-analysis/
- Rare disease analysis technical documentation: https://www.genomicsengland.co.uk/?wpdmdl=15664
- CIP-API user documentation: https://cipapi-documentation.genomicsengland.co.uk/
- Gel Models definition: https://gelreportmodels.genomicsengland.co.uk/ (Please note that this documentation contains both older and newer versions of the models; the CIP-API user documentation describes which version applies in each case.)
- Interpretation Portal Documentation: https://ip-documentation.genomicsengland.co.uk/
These documents describe not only the methodology but also the sources used.
Data sources
The following table details the data sources and versions used by the Tiering Interpretation Service when populating Interpreted Genomes in the CIP-API.
Tiering Interpretation Service (Cancer and Rare Disease)
Data Source Name | Data Source Reference | Resource Type | Analysis Type (cancer / rare disease / both) | Resource Version |
---|---|---|---|---|
ENSEMBL_gene (data loaded in CellBase) | Collection of GTF, FASTA and GFF files from Ensembl: ftp://ftp.ensembl.org/pub/release-90/gtf/homo_sapiens/*90.gtf.gz ftp://ftp.ensembl.org/pub/release-90/fasta/homo_sapiens/pep/.pep.all.fa.gz ftp://ftp.ensembl.org/pub/release-90/fasta/homo_sapiens/cdna/.cdna.all.fa.gz ftp://ftp.ensembl.org/pub/release-90/regulation/homo_sapiens/MotifFeatures.gff.gz | Genomic Entities, e.g. genes, transcripts | Both | 90 |
ClinVar (data loaded in CellBase) | CellBase ClinVar release data: ftp://ftp.ebi.ac.uk/pub/databases/eva/ClinVar/2015/ClinVar_Traits_EFO_Names_260615.csv ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/tab_delimited/variant_summary.txt.gz ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/tab_delimited/variation_allele.txt.gz | Clinically Relevant Variants | Both | 2019-06 |
COSMIC (data loaded in CellBase) | https://cancer.sanger.ac.uk/cosmic | Clinically Relevant Variants | Cancer | v89 |
1000 genomes project (data loaded in CellBase) | https://www.internationalgenome.org/about | Variant Population Frequencies (aka allele frequencies) | Both | Phase3 (original data in assembly GRCh37, lifted over to GRCh38) |
DiscovEHR (data loaded in CellBase) | http://www.discovehrshare.com/ | Variant Population Frequencies (aka allele frequencies) | Both | GHS Freeze 50 (original data in assembly GRCh37, lifted over to GRCh38) |
GoNL (data loaded in CellBase) | http://www.nlgenome.nl/ | Variant Population Frequencies (aka allele frequencies) | Both | Release 5 (original data in assembly GRCh37, lifted over to GRCh38) |
gnomAD (data loaded in CellBase) | https://gnomad.broadinstitute.org/ | Variant Population Frequencies (aka allele frequencies) | Both | 2.0.1 (original data in assembly GRCh37, lifted over to GRCh38) |
UK10K project (data loaded in CellBase) | https://www.uk10k.org/ | Variant Population Frequencies (aka allele frequencies) | Both | N/A, data obtained on 2016-02-15 (original data in assembly GRCh37, lifted over to GRCh38) |
Cancer Analysis Resources | A set of gene lists used in cancer analysis. They are described and versioned at: https://www.genomicsengland.co.uk/about-genomics-england/the-100000-genomes-project/information-for-gmc-staff/cancer-programme/cancer-genome-analysis/ | Additional Resources | Cancer | v1.11 |
CellBase database version | The CellBase database version, used internally to control the overall version of our data sources | - | Both | 2.4.0 |

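
For readers who want to check which resource versions apply to a given analysis type, the table can be sketched as a small lookup. This is an illustration only and is not part of the CIP-API or CellBase: the `DataSource` class, the `DATA_SOURCES` list and the `sources_for` function are hypothetical names, and the values are copied by hand from the table above with the version notes shortened.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DataSource:
    """One row of the data sources table (hypothetical structure)."""
    name: str
    resource_type: str
    analysis_type: str  # "Both" or "Cancer", as listed in the table
    version: str


# Values transcribed from the table; not fetched from any service.
DATA_SOURCES = [
    DataSource("ENSEMBL_gene", "Genomic Entities", "Both", "90"),
    DataSource("ClinVar", "Clinically Relevant Variants", "Both", "2019-06"),
    DataSource("COSMIC", "Clinically Relevant Variants", "Cancer", "v89"),
    DataSource("1000 genomes project", "Variant Population Frequencies", "Both", "Phase3"),
    DataSource("DiscovEHR", "Variant Population Frequencies", "Both", "GHS Freeze 50"),
    DataSource("GoNL", "Variant Population Frequencies", "Both", "Release 5"),
    DataSource("gnomAD", "Variant Population Frequencies", "Both", "2.0.1"),
    DataSource("UK10K project", "Variant Population Frequencies", "Both", "N/A"),
    DataSource("Cancer Analysis Resources", "Additional Resources", "Cancer", "v1.11"),
    DataSource("CellBase database version", "-", "Both", "2.4.0"),
]


def sources_for(analysis: str) -> List[DataSource]:
    """Return the sources used for an analysis type ("cancer" or "rare disease").

    A source applies if it is listed for that analysis type or for "Both".
    """
    wanted = analysis.strip().lower()
    return [s for s in DATA_SOURCES if s.analysis_type.lower() in (wanted, "both")]


# Example: list the sources and versions relevant to rare disease tiering.
for source in sources_for("rare disease"):
    print(f"{source.name}: {source.version}")
```

Under these assumptions, COSMIC and the Cancer Analysis Resources appear only when `sources_for("cancer")` is called, since the table lists them as cancer-only.
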
Last update:
2023-03-01