Building Blocks and Schemas for GA4GH Implementations
In this schema, a “biosample” is the source of the material for a molecular analysis (e.g. genomic array, sequencing) and represents the main “biological item” against which molecular variants are referenced.

PXF: See https://github.com/ga4gh-metadata/metadata-schemas/blob/d4ca1b4b36a5e7b3a17db79da9ae03a2114cfcaf/schemas/biometadata.proto#L84-L138

A Biosample refers to a unit of biological material from which the substrate molecules (e.g. genomic DNA, RNA, proteins) for molecular analyses (e.g. sequencing, array hybridisation, mass spectrometry) are extracted. Examples would be a tissue biopsy, a single cell from a culture for single-cell genome sequencing, or a protein fraction from a gradient centrifugation. Several instances (e.g. technical replicates) or types of experiments (e.g. genomic array as well as RNA-seq experiments) may refer to the same Biosample.

FHIR mapping: Specimen (http://www.hl7.org/fhir/specimen.html).
The schema is defined in the YAML file.
Property | Type | Format | Description |
---|---|---|---|
id | string | | The local-unique identifier of this biosample (referenced as "biosample_id"). This is unique in the context of the server instance. |
name | string | | A short descriptive name for the sample which should be sufficient to distinguish it from other samples in the project or collection. This is a label or symbolic identifier for the biosample. |
description | string | | A free text description of the biosample. This should not contain any structured data. |
data_use_conditions | | | Data use conditions applying to data from this biosample, as ontology object (e.g. DUO). |
project_id | string | | The id attribute of the project that this biosample was collected in. |
individual_id | string | | In a complete data model "individual_id" points to the "id" of the individual ("donor") this biosample was derived from. In a local context this could be the id attribute in a corresponding "individuals" collection. |
external_references | array | | List of reference_class objects, each with a properly prefixed (e.g. identifiers.org) external identifier and a term describing the relationship. |
geo_provenance | | | This geo_class attribute ideally describes the geographic location where the sample was extracted. In practice the value frequently reflects either the location of the laboratory where the analysis was performed or the corresponding author's institution. |
age_at_collection | | | The age of the individual at time of biosample collection, as Age_class object. |
biocharacteristics | array | | "biocharacteristics" represents a wrapper list of "Phenotype_class" objects with properly prefixed term ids, describing features of the biosample. Examples would be phenotypes, disease codes or other ontology classes specific to this biosample. In a complete data model (variants - (callsets) - biosamples - individuals), characteristics applying to the individual (e.g. sex, most phenotypes) should be annotated on the individual. |
info | | | This is a wrapper for objects without further specification in the schema. |
created | timestamp | | The creation time of this record, in ISO 8601 format. |
updated | timestamp | | The time of the last edit of this record, in ISO 8601 format. |
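As a minimal sketch of how the types in the table could be checked against a record, the following JavaScript reports properties whose values do not have the listed type. The helper name `findTypeMismatches` and the decision to check only properties that are present are assumptions made for illustration; they are not part of the schema. The example values are taken from this page.

```javascript
// Properties listed with type "string" in the table above.
const STRING_PROPERTIES = ["id", "name", "description", "project_id", "individual_id"];
// Properties listed with type "array" in the table above.
const ARRAY_PROPERTIES = ["external_references", "biocharacteristics"];

// Hypothetical helper (not part of the schema): returns the names of
// properties that are present in the record but do not have the listed type.
// Absent properties are not reported; whether any are required is not
// stated in the table, so this sketch does not assume it.
function findTypeMismatches(biosample) {
  const mismatches = [];
  for (const key of STRING_PROPERTIES) {
    if (key in biosample && typeof biosample[key] !== "string") {
      mismatches.push(key);
    }
  }
  for (const key of ARRAY_PROPERTIES) {
    if (key in biosample && !Array.isArray(biosample[key])) {
      mismatches.push(key);
    }
  }
  return mismatches;
}

const biosample = {
  id: "AM_BS__NCBISKYCGH-1993",
  name: "Sample BRCA-00429, 2nd replicate",
  individual_id: "ind-cnhl-1293347-004",
  external_references: []
};

console.log(findTypeMismatches(biosample)); // []
console.log(findTypeMismatches({ id: 42, biocharacteristics: {} })); // [ 'id', 'biocharacteristics' ]
```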
The local-unique identifier of this biosample (referenced as “biosample_id”). This is unique in the context of the server instance.
'id' : "AM_BS__NCBISKYCGH-1993"
A short descriptive name for the sample which should be sufficient to distinguish it from other samples in the project or collection. This is a label or symbolic identifier for the biosample.
'name' : "Sample BRCA-00429, 2nd replicate"
A free text description of the biosample. This should not contain any structured data.
'description' : "Burkitt lymphoma, cell line Namalwa"
Data use conditions applying to data from this biosample, as ontology object (e.g. DUO).
'data_use_conditions' : {
  'label' : 'no restriction',
  'id' : 'DUO:0000004'
}
The id attribute of the project that this biosample was collected in.
'project_id' : "ind-cnhl-1293347-004"
In a complete data model “individual_id” points to the “id” of the individual (“donor”) this biosample was derived from.
In a local context this could be the id attribute in a corresponding “individuals” collection.
'individual_id' : "ind-cnhl-1293347-004"
List of reference_class objects, each with a properly prefixed (e.g. identifiers.org) external identifier and a term describing the relationship.
'external_references' : [
  {
    'type' : {
      'id' : 'cellosaurus:CVCL_0312',
      'label' : 'HOS'
    },
    'description' : 'Cellosaurus cell line identifier',
    'relation' : 'provenance'
  },
  {
    'relation' : 'report',
    'description' : 'PubMed reference',
    'type' : {
      'label' : 'Rearrangement of the p53 gene in human osteogenic sarcomas.',
      'id' : 'pubmed:2823272'
    }
  }
]
This query returns all biosamples reported in the publication identified by “pubmed:17440070”.
db.biosamples.find( { "external_references.type.id" : "pubmed:17440070" } )
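To illustrate what the dot-notation query above matches, here is a rough in-memory counterpart in plain JavaScript: a biosample is selected if any entry of its “external_references” list carries the requested “type.id”. The function name `findByExternalReference` and the biosample ids are hypothetical and only serve this sketch; the reference entries are the ones from the example above.

```javascript
// Hypothetical in-memory counterpart of the MongoDB query
//   db.biosamples.find( { "external_references.type.id" : referenceId } )
// A biosample matches if ANY entry of its external_references list
// carries the requested type.id.
function findByExternalReference(biosamples, referenceId) {
  return biosamples.filter((sample) =>
    (sample.external_references || []).some(
      (reference) => reference.type !== undefined && reference.type.id === referenceId
    )
  );
}

// Illustrative records; the biosample ids are made up for this sketch.
const biosamples = [
  {
    id: "example-biosample-1",
    external_references: [
      { type: { id: "cellosaurus:CVCL_0312", label: "HOS" }, relation: "provenance" },
      {
        type: {
          id: "pubmed:2823272",
          label: "Rearrangement of the p53 gene in human osteogenic sarcomas."
        },
        relation: "report"
      }
    ]
  },
  { id: "example-biosample-2", external_references: [] },
  { id: "example-biosample-3" } // no external_references at all
];

const hits = findByExternalReference(biosamples, "pubmed:2823272");
console.log(hits.map((sample) => sample.id)); // [ 'example-biosample-1' ]
```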
This geo_class attribute ideally describes the geographic location where the sample was extracted. In practice the value frequently reflects either the location of the laboratory where the analysis was performed or the corresponding author’s institution.
'geo_provenance' : {
  'label' : 'Str Marasesti 5, 300077 Timisoara, Romania',
  'latitude' : 45.75,
  'altitude' : 94,
  'longitude' : 21.23,
  'country' : 'Romania',
  'city' : 'Timisoara'
}
The age of the individual at time of biosample collection, as Age_class object.
'age_at_collection' : {
  'age' : 'P56Y',
  'age_class' : {
    'label' : 'Juvenile onset',
    'id' : 'HP:0003621'
  }
}
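The “age” value above is an ISO 8601 period string. As a minimal sketch of how such a value could be read by client code, the following JavaScript parses the date part of a period (years, months, days). The function name `parseIsoPeriod` is hypothetical, and the restriction to whole-number Y, M and D components (no weeks, no time part) is a simplifying assumption, not a statement about what the schema permits.

```javascript
// Hypothetical helper: parses a simplified ISO 8601 period such as "P56Y"
// or "P1Y6M" into its components. Only whole-number years (Y), months (M)
// and days (D) are handled; weeks and time components ("T...") are not.
// Returns null for anything that does not fit this simplified pattern.
function parseIsoPeriod(value) {
  const match = /^P(?:(\d+)Y)?(?:(\d+)M)?(?:(\d+)D)?$/.exec(value);
  if (match === null) {
    return null;
  }
  const parts = match.slice(1);
  // A bare "P" matches the pattern but carries no information.
  if (parts.every((part) => part === undefined)) {
    return null;
  }
  const [years, months, days] = parts.map((part) =>
    part === undefined ? 0 : Number(part)
  );
  return { years, months, days };
}

console.log(parseIsoPeriod("P56Y")); // { years: 56, months: 0, days: 0 }
console.log(parseIsoPeriod("P14M")); // { years: 0, months: 14, days: 0 }
console.log(parseIsoPeriod("56 years")); // null
```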
“biocharacteristics” represents a wrapper list of “Phenotype_class” objects with properly prefixed term ids, describing features of the biosample. Examples would be phenotypes, disease codes or other ontology classes specific to this biosample. In a complete data model (variants - (callsets) - biosamples - individuals), characteristics applying to the individual (e.g. sex, most phenotypes) should be annotated on the individual.
'biocharacteristics' : [
  {
    'description' : 'Pancreatic Adenocarcinoma',
    'type' : {
      'id' : 'icdot:C25.9',
      'label' : 'Pancreas, NOS'
    }
  },
  {
    'description' : 'Pancreatic Adenocarcinoma',
    'type' : {
      'label' : 'Adenocarcinoma, NOS',
      'id' : 'icdom:81403'
    }
  },
  {
    'type' : {
      'id' : 'ncit:C8294',
      'label' : 'Pancreatic Adenocarcinoma'
    },
    'description' : 'Pancreatic Adenocarcinoma'
  }
]
This query returns all biosamples with an (exact) type.id of “icdom:81403” in their “biocharacteristics” object list.
db.biosamples.find( { "biocharacteristics.type.id" : "icdom:81403" } )
This call to the distinct function returns all bioterm ids for samples having some ncit id; to retrieve only the ncit ids, it has to be followed by a regex filter (/^ncit/).
db.biosamples.distinct( "biocharacteristics.type.id", { "biocharacteristics.type.id" : { $regex : /ncit/ } } )
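The follow-up filter mentioned above can be sketched in plain JavaScript. The example mimics the three steps in memory: select samples that have some ncit id, collect the distinct term ids of those samples, then keep only the ids starting with “ncit”. The helper `termIds` and the biosample ids are hypothetical; with a real database, only the final `filter` call would be applied to the array returned by `distinct`.

```javascript
// Illustrative records; the biosample ids are made up for this sketch,
// the terms are the ones from the example above.
const biosamples = [
  {
    id: "example-biosample-1",
    biocharacteristics: [
      { type: { id: "icdot:C25.9", label: "Pancreas, NOS" } },
      { type: { id: "icdom:81403", label: "Adenocarcinoma, NOS" } },
      { type: { id: "ncit:C8294", label: "Pancreatic Adenocarcinoma" } }
    ]
  },
  {
    id: "example-biosample-2",
    biocharacteristics: [{ type: { id: "icdot:C25.9", label: "Pancreas, NOS" } }]
  }
];

// Hypothetical helper: all term ids of one biosample.
function termIds(sample) {
  return (sample.biocharacteristics || [])
    .map((term) => term.type && term.type.id)
    .filter(Boolean);
}

// Step 1: the query part of distinct() selects samples having some ncit id.
const selected = biosamples.filter((sample) =>
  termIds(sample).some((id) => /ncit/.test(id))
);

// Step 2: distinct() collects every term id of the selected samples,
// including the non-ncit ones.
const distinctIds = [...new Set(selected.flatMap(termIds))];

// Step 3: the follow-up regex filter keeps only the ncit ids.
const ncitIds = distinctIds.filter((id) => /^ncit/.test(id));

console.log(distinctIds); // [ 'icdot:C25.9', 'icdom:81403', 'ncit:C8294' ]
console.log(ncitIds); // [ 'ncit:C8294' ]
```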
This is a wrapper for objects without further specification in the schema.
'info' : {
  'followup_time' : 'P14M',
  'death' : 1
}
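This query retrieves biosamples with an ISO 8601 period value for “followup_time” and a boolean “true” for “death”. Both keys are addressed with dot notation, since “info” is a single object rather than an array; note that a record storing “death” as the number 1, as in the example above, is not matched by a boolean true.
db.biosamples.find( { "info.followup_time" : { $regex : /^P/ }, "info.death" : true } )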
The creation time of this record, in ISO 8601 format.
'created' : "2017-10-25T07:06:03Z"
The time of the last edit of this record, in ISO 8601 format.
'updated' : "2022-11-11T09:45:13Z"