Annotations

The TCGA Annotations BigQuery table was created from the contents of the JSON file obtained from the TCGA Annotation Manager Web Service API. The deeply nested JSON file was first flattened, and then a subset of the fields was selected to be loaded into the BigQuery table. In the flattening process, sub-level field names were prefixed with the parent name, separated by an underscore. These field names have since been updated to match the names used in the Genomic Data Commons (GDC) Annotations API; the table below maps each original name to its new name. Please refer directly to BigQuery for the full table schema.
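The flattening rule described above can be sketched in a few lines of Python. This is only an illustration of the naming scheme, not the code that was used to build the table: the function name `flatten` and the sample record are hypothetical, and the sketch assumes that nested values are plain dictionaries (how lists in the original JSON were handled is not stated here).

```python
def flatten(record, parent=""):
    """Flatten nested dicts, joining parent and child keys with an underscore."""
    flat = {}
    for key, value in record.items():
        name = f"{parent}_{key}" if parent else key
        if isinstance(value, dict):
            # Recurse, carrying the accumulated prefix down one level.
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


# Hypothetical nested record, shaped only to illustrate the naming scheme.
nested = {
    "annotationCategory": {
        "categoryName": "example category",
        "annotationClassification": {
            "annotationClassificationName": "example classification",
        },
    },
    "notes": {"noteText": "example note"},
}

flat = flatten(nested)
for name in flat:
    print(name)
```

The printed names have the same form as the original field names in the table, for example `annotationCategory_categoryName`.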

Original field name                                                        New field name
items_disease_abbreviation                                                 project_short_name
items_item                                                                 entity_barcode
items_itemType_itemTypeName                                                entity_type
annotationCategory_categoryName                                            category
annotationCategory_annotationClassification_annotationClassificationName   classification
notes_noteText                                                             notes
notes_dateAdded                                                            date_created
notes_dateEdited                                                           date_edited
not available                                                              case_gdc_id
not available                                                              case_barcode
not available                                                              sample_barcode
not available                                                              aliquot_barcode
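For code that still uses the original field names, the mapping above can be applied mechanically. The following is a minimal sketch: the dictionary is transcribed from the table, while `FIELD_RENAMES`, `rename_fields`, and the sample row are hypothetical names and values introduced for illustration. Fields marked "not available" have no original counterpart and are therefore absent from the mapping.

```python
# Original field name -> new field name, transcribed from the table above.
FIELD_RENAMES = {
    "items_disease_abbreviation": "project_short_name",
    "items_item": "entity_barcode",
    "items_itemType_itemTypeName": "entity_type",
    "annotationCategory_categoryName": "category",
    "annotationCategory_annotationClassification_annotationClassificationName": "classification",
    "notes_noteText": "notes",
    "notes_dateAdded": "date_created",
    "notes_dateEdited": "date_edited",
}


def rename_fields(row):
    """Return a copy of row with original field names replaced by new ones."""
    # Names that are not in the mapping are passed through unchanged.
    return {FIELD_RENAMES.get(name, name): value for name, value in row.items()}


# Hypothetical row keyed by the original names.
old_row = {"items_item": "TCGA-02-0001", "notes_noteText": "example note"}
new_row = rename_fields(old_row)
print(new_row)
```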

The “sample_barcode” and “case_barcode” fields are filled in (i.e., not null) whenever the “entity_barcode” is at least 16 or 12 characters long, respectively. For example, an annotation on a “Shipped Portion” has both the “case_barcode” and “sample_barcode” fields filled in. Please note, however, that an annotation applies only to the item specified in the “entity_barcode” field, whose type is given in the “entity_type” field, with the following caveat: an annotation on a case applies to all of its samples; an annotation on a sample applies to all of its portions but not to other samples from that case; and so on down to the aliquot, where an annotation applies only to that aliquot.
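The length rule and the scoping rule can be illustrated with a short Python sketch. It rests on two assumptions that are not stated above: that the case and sample barcodes are the first 12 and 16 characters of the entity barcode, and that a child item's barcode begins with its parent's barcode, as in the hierarchical TCGA barcode scheme. The function names and the example barcodes are illustrative only.

```python
CASE_LEN = 12    # length of a case barcode, e.g. "TCGA-02-0001"
SAMPLE_LEN = 16  # length of a sample barcode, e.g. "TCGA-02-0001-01C"


def derive_barcodes(entity_barcode):
    """Return case and sample barcodes, or None when the entity barcode is too short."""
    # Assumption: the parent barcodes are leading prefixes of the entity barcode.
    case = entity_barcode[:CASE_LEN] if len(entity_barcode) >= CASE_LEN else None
    sample = entity_barcode[:SAMPLE_LEN] if len(entity_barcode) >= SAMPLE_LEN else None
    return {"case_barcode": case, "sample_barcode": sample}


def annotation_applies_to(entity_barcode, target_barcode):
    """True if an annotation on entity_barcode covers target_barcode.

    Assumption: an item is covered when it is the annotated entity itself
    or one of its descendants, i.e. its barcode starts with the entity barcode.
    """
    return target_barcode.startswith(entity_barcode)


aliquot = "TCGA-02-0001-01C-01D-0182-01"  # illustrative aliquot-level barcode

print(derive_barcodes("TCGA-02-0001"))
print(derive_barcodes(aliquot))
print(annotation_applies_to("TCGA-02-0001-01C", aliquot))
print(annotation_applies_to("TCGA-02-0001-10A", aliquot))
```

Under these assumptions, an annotation on the case `TCGA-02-0001` covers the aliquot, as does one on the sample `TCGA-02-0001-01C`, while an annotation on a different sample of the same case does not.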

Please note that the TCGA annotations available at the NCI Genomic Data Commons may differ from those found in this BigQuery table. Please let us know if you have questions or concerns about the contents of this table.


Have feedback or corrections? You can file an issue here or email us at feedback@isb-cgc.org.