GA4GH SchemaBlocks

Building Blocks and Schemas for GA4GH Implementations

View the Project on GitHub ga4gh-metadata/SchemaBlocks

GA4GH variant

This document describes the attributes of the variant object. In its current implementation, variant (and related genomic objects such as callset) represents an extended version of the original, VCF-derived GA4GH schema. This format may be superseded or augmented based on current developments in the GA4GH::GKS work stream.

The schema is defined in the accompanying YAML file.

Variant

Properties of the Variant class

| Property | Type | Format | Description |
| --- | --- | --- | --- |
| alternate_bases | string | | one or more bases relative to the start position in the reference genome, replacing the reference_bases value; for precise variants |
| biosample_id | | | The identifier ("biosample.id") of the biosample this variant was reported from. This is a shortcut to using the variant -> callset -> biosample chaining. |
| callset_id | string | | The identifier ("callset.id") of the callset this variant is part of. |
| created | string | | The creation time of this record, in ISO 8601 format |
| digest | string | | concatenation of the elements that uniquely identify the variant |
| end | array | int64 | array of 0 integers (for precise sequence variants), 1 integer, or 2 integers (for an imprecise end position of a structural variant) |
| genotype | array | | list of strings representing the (phased) alleles in which the variant was observed |
| id | string | | The local-unique identifier of this variant (referenced as "variant_id"). |
| info | | | additional variant information, as defined in the example and accompanying documentation |
| mate_name | string | | Mate name (chromosome) for fusion (BRK) events; otherwise left empty. Accepted values: 1-22, X, Y. |
| reference_bases | string | | one or more bases at the start position in the reference genome, which have been replaced by the alternate_bases value; for precise variants |
| reference_name | string | | Reference name (chromosome). Accepted values: 1-22, X, Y. |
| start | array | int64 | array of 1 integer, or 2 integers for an imprecise start position of a structural variant |
| updated | string | | The time of the last edit of this record, in ISO 8601 format |
| variant_type | string | | the variant type in case of a named (structural) variant (e.g. DUP, DEL, BRK ...) |
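
As a rough orientation, the properties listed above could be mirrored in a typed container such as the following. This is only a sketch: the class name `Variant`, the choice of defaults, and the length checks are assumptions made for illustration and are not part of the YAML schema.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Variant:
    """Illustrative container for the listed properties (not the official schema)."""

    id: str
    reference_name: str                              # "1" to "22", "X" or "Y"
    start: List[int]                                 # 1 or 2 integers
    end: List[int] = field(default_factory=list)     # 0, 1 or 2 integers
    variant_type: Optional[str] = None               # e.g. "DUP", "DEL", "BRK"
    reference_bases: Optional[str] = None            # precise variants only
    alternate_bases: Optional[str] = None            # precise variants only
    mate_name: Optional[str] = None                  # fusion (BRK) events only
    genotype: List[str] = field(default_factory=list)
    digest: Optional[str] = None
    callset_id: Optional[str] = None
    biosample_id: Optional[str] = None
    created: Optional[str] = None                    # ISO 8601 timestamp
    updated: Optional[str] = None                    # ISO 8601 timestamp
    info: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Enforce the array lengths stated in the property descriptions.
        if len(self.start) not in (1, 2):
            raise ValueError("start must hold 1 or 2 integers")
        if len(self.end) > 2:
            raise ValueError("end must hold 0, 1 or 2 integers")


# Values borrowed from separate examples in this document;
# their combination into one record is purely illustrative.
deletion = Variant(
    id="amvar-8754-7751-1119-8539",
    reference_name="8",
    start=[20867740],
    variant_type="DEL",
)
```

Keeping `start` and `end` as lists, rather than single integers, preserves the distinction between precise and imprecise positions described in the table.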

Extended notes and examples on the Variant properties


alternate_bases

one or more bases relative to the start position in the reference genome, replacing the reference_bases value; for precise variants

Example

'alternate_bases' : "AC"

biosample_id

The identifier (“biosample.id”) of the biosample this variant was reported from. This is a shortcut to using the variant -> callset -> biosample chaining.

Example

'biosample_id' : "pgx-bs-987647"
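
The shortcut described here can be pictured with two lookups that arrive at the same biosample. The sketch below uses plain dictionaries as stand-ins for the collections; the `biosample_id` field on the callset record and the pairing of these particular identifiers are assumptions made for illustration.

```python
# Hypothetical in-memory collections keyed by id. That a callset carries a
# "biosample_id" field is an assumption of this sketch.
callsets = {
    "PGX_AM_CS_GSM1690424": {
        "id": "PGX_AM_CS_GSM1690424",
        "biosample_id": "pgx-bs-987647",
    }
}
biosamples = {"pgx-bs-987647": {"id": "pgx-bs-987647"}}

# Identifiers reused from the examples in this document; combining them
# into one record is illustrative only.
variant = {
    "id": "amvar-8754-7751-1119-8539",
    "callset_id": "PGX_AM_CS_GSM1690424",
    "biosample_id": "pgx-bs-987647",
}


def biosample_via_callset(record):
    """Follow the variant -> callset -> biosample chain (two lookups)."""
    callset = callsets[record["callset_id"]]
    return biosamples[callset["biosample_id"]]


def biosample_direct(record):
    """Use the biosample_id shortcut stored on the variant (one lookup)."""
    return biosamples[record["biosample_id"]]
```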

callset_id

The identifier (“callset.id”) of the callset this variant is part of.

Example

'callset_id' : "PGX_AM_CS_GSM1690424"

created

The creation time of this record, in ISO 8601 format

Example

'created' : "2017-10-25T07:06:03Z"
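
A timestamp in this form can be read and written with the Python standard library, as sketched below. The helper names are hypothetical, and the sketch assumes that timestamps always use second precision and the UTC designator `Z`, as in the example.

```python
from datetime import datetime, timezone

# Layout matching the example value: date, "T", time, literal "Z" for UTC.
TIMESTAMP_LAYOUT = "%Y-%m-%dT%H:%M:%SZ"


def parse_timestamp(value: str) -> datetime:
    """Parse a 'created' or 'updated' string into a UTC-aware datetime."""
    return datetime.strptime(value, TIMESTAMP_LAYOUT).replace(tzinfo=timezone.utc)


def format_timestamp(moment: datetime) -> str:
    """Render an aware datetime back into the same string layout."""
    return moment.astimezone(timezone.utc).strftime(TIMESTAMP_LAYOUT)


created = parse_timestamp("2017-10-25T07:06:03Z")
```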

digest

concatenation of the elements that uniquely identify the variant

Example

'digest' : "4:12282-46465:DEL"
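
The schema does not state how the digest is assembled. Judging only from the example value, one plausible layout is `reference_name:start-end:variant_type`; the sketch below follows that guess. The function name and the choice of the first start and last end value are assumptions.

```python
from typing import List


def make_digest(
    reference_name: str, start: List[int], end: List[int], variant_type: str
) -> str:
    """Join the identifying elements in the layout inferred from the example.

    Assumption: the outermost coordinates (first start, last end) are used.
    """
    return f"{reference_name}:{start[0]}-{end[-1]}:{variant_type}"


digest = make_digest("4", [12282], [46465], "DEL")
```

With the values shown, the result matches the example string `4:12282-46465:DEL`.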

end

array of 0 integers (for precise sequence variants), 1 integer, or 2 integers (for an imprecise end position of a structural variant)

Example

'end' : [
  21977798,
  21978106
]

Queries:

The following query returns all DEL variants with any overlap of the CDKN2A CDR:

db.variants.find( { "reference_name" : 9,  "variant_type" : "DEL", "start" : { $lte : 21975098 }, "end" : { $gte : 21967753 } } )
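
The range test in the query above is the usual interval-overlap condition: a variant overlaps the region if it starts at or before the region's end and ends at or after the region's start. The following sketch applies the same test to in-memory records. The function names and the sample records are hypothetical, and the "any element matches" handling of the `start` and `end` arrays is meant to imitate how MongoDB compares array fields.

```python
from typing import Any, Dict, List


def overlaps(variant: Dict[str, Any], query_start: int, query_end: int) -> bool:
    """True if some start <= query_end and some end >= query_start."""
    return any(s <= query_end for s in variant["start"]) and any(
        e >= query_start for e in variant["end"]
    )


def find_overlapping(
    variants: List[Dict[str, Any]],
    reference_name: Any,
    variant_type: str,
    query_start: int,
    query_end: int,
) -> List[Dict[str, Any]]:
    """Filter records by chromosome, variant type and positional overlap."""
    return [
        v
        for v in variants
        # Compare as strings, since reference names appear both quoted and
        # unquoted in this document.
        if str(v["reference_name"]) == str(reference_name)
        and v["variant_type"] == variant_type
        and overlaps(v, query_start, query_end)
    ]


# Hypothetical records, invented for this illustration.
variants = [
    {"id": "v1", "reference_name": 9, "variant_type": "DEL",
     "start": [21900000], "end": [21977798, 21978106]},
    {"id": "v2", "reference_name": 9, "variant_type": "DEL",
     "start": [22000000], "end": [22100000]},
    {"id": "v3", "reference_name": 8, "variant_type": "DEL",
     "start": [21900000], "end": [21980000]},
]

hits = find_overlapping(variants, 9, "DEL", 21967753, 21975098)
```

Only the record `v1` satisfies all three conditions here. A record with an empty `end` array (a precise sequence variant) never matches, because no element can satisfy the lower bound.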

genotype

list of strings representing the (phased) alleles in which the variant was observed

Example

'genotype' : [
  '1',
  '.'
]

id

The local-unique identifier of this variant (referenced as “variant_id”).

Example

'id' : "amvar-8754-7751-1119-8539"

info

additional variant information, as defined in the example and accompanying documentation

Example

'info' : {
  'cnv_value' : '-0.294',
  'cnv_length' : 1205290
}

mate_name

Mate name (chromosome) for fusion (BRK) events; otherwise left empty. Accepting values 1-22, X, Y.

Example

'mate_name' : 14

reference_bases

one or more bases at the start position in the reference genome, which have been replaced by the alternate_bases value; for precise variants

Example

'reference_bases' : "G"

reference_name

Reference name (chromosome). Accepting values 1-22, X, Y.

Example

'reference_name' : 8
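
The accepted values (1-22, X, Y) lend themselves to a simple membership check, sketched below. The names are hypothetical; the check normalises its input to a string so that both quoted and unquoted chromosome numbers pass, and the same test could be applied to mate_name, which accepts the same values.

```python
# The 24 accepted reference names: "1" to "22", plus "X" and "Y".
VALID_REFERENCE_NAMES = {str(n) for n in range(1, 23)} | {"X", "Y"}


def is_valid_reference_name(value) -> bool:
    """Check a reference_name (or mate_name) against the accepted values."""
    return str(value) in VALID_REFERENCE_NAMES
```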

start

array of 1 integer, or 2 integers for an imprecise start position of a structural variant

Example

'start' : [
  20867740
]

updated

The time of the last edit of this record, in ISO 8601 format

Example

'updated' : "2022-11-11T09:45:13Z"

variant_type

the variant type in case of a named (structural) variant (e.g. DUP, DEL, BRK …)

Example

'variant_type' : "DEL"