Biospecimen
The Biospecimen table contains one row per TCGA sample. Each TCGA sample is uniquely identified by a TCGA barcode of length 16, e.g. TCGA-2G-AAM4-10A. (For more information on how TCGA barcodes were created and how to “read” a TCGA barcode, click on the preceding link.)
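As a rough illustration of how a 16-character sample barcode breaks down, the Python sketch below splits the example barcode into its dash-separated parts. The function name `parse_sample_barcode` and the dictionary keys are hypothetical. The reading of the parts (project, tissue source site, participant, then a two-digit sample type followed by a vial letter) follows the commonly documented TCGA barcode layout and should be checked against the TCGA barcode documentation.

```python
def parse_sample_barcode(barcode):
    """Split a 16-character TCGA sample barcode into its parts.

    The key names in the returned dict are illustrative, not column names
    from the Biospecimen table.
    """
    if len(barcode) != 16:
        raise ValueError(f"expected 16 characters, got {len(barcode)}: {barcode!r}")
    parts = barcode.split("-")
    if len(parts) != 4:
        raise ValueError(f"expected 4 dash-separated parts: {barcode!r}")
    project, tss, participant, sample_vial = parts
    return {
        "project": project,
        "tissue_source_site": tss,
        "participant": participant,
        "sample_type": sample_vial[:2],  # two-digit sample type code
        "vial": sample_vial[2:],         # single vial letter
    }


print(parse_sample_barcode("TCGA-2G-AAM4-10A"))
```

A shorter, participant-level barcode such as TCGA-2G-AAM4 would be rejected by the length check, which matches the table's one-row-per-sample granularity.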
XML Parsing
The TCGA data at the DCC exist as XML files, which have been uploaded to Google Cloud Storage. Selected fields from these XML files were then extracted and loaded into the “Biospecimen” table in BigQuery.
Some of the biospecimen values in the XML files are available on a per-slide and/or per-portion basis; these have been aggregated and averaged for each sample. The number of slides and the number of portions per sample are also included in the table.
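The averaging described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the actual loading code: the function name `aggregate_per_sample`, the `sample_barcode` key, the `percent_value` field, and the choice to skip missing values are all assumptions, and the numbers are made up.

```python
from collections import defaultdict
from statistics import mean


def aggregate_per_sample(slide_rows, value_field):
    """Average a per-slide value for each sample and count its slides.

    slide_rows: iterable of dicts with a "sample_barcode" key and a numeric
    (or None) entry under value_field. Both key names are assumptions.
    """
    values = defaultdict(list)
    slide_counts = defaultdict(int)
    for row in slide_rows:
        sample = row["sample_barcode"]
        slide_counts[sample] += 1
        # Assumption: missing values are left out of the average.
        if row.get(value_field) is not None:
            values[sample].append(row[value_field])
    return {
        sample: {
            "num_slides": slide_counts[sample],
            "avg_value": mean(values[sample]) if values[sample] else None,
        }
        for sample in slide_counts
    }


# Made-up per-slide rows for a single sample.
slides = [
    {"sample_barcode": "TCGA-2G-AAM4-10A", "percent_value": 80},
    {"sample_barcode": "TCGA-2G-AAM4-10A", "percent_value": 90},
    {"sample_barcode": "TCGA-2G-AAM4-10A", "percent_value": None},
]
print(aggregate_per_sample(slides, "percent_value"))
```

The same pattern would apply to per-portion values, with a portion count kept alongside the slide count.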
Filters
- Samples for which is\_ffpe=True were removed.
- Patients or samples for which the Project value was not TCGA were removed.
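The two filters can be expressed as a single predicate. The sketch below is a minimal Python version under stated assumptions: `passes_filters` is a hypothetical name, `is_ffpe` is assumed to be stored as a boolean, `Project` as a plain string, and a row with no `Project` value is treated as not TCGA.

```python
def passes_filters(row):
    """Return True if a row survives both filters listed above."""
    # Assumption: is_ffpe is a boolean; a missing value counts as not FFPE.
    if row.get("is_ffpe") is True:
        return False
    # Assumption: Project is a plain string; a missing value counts as not TCGA.
    if row.get("Project") != "TCGA":
        return False
    return True


rows = [
    {"sample": "A", "is_ffpe": False, "Project": "TCGA"},
    {"sample": "B", "is_ffpe": True, "Project": "TCGA"},
    {"sample": "C", "is_ffpe": False, "Project": "OTHER"},
]
kept = [r["sample"] for r in rows if passes_filters(r)]
print(kept)
```

If the source data encode the FFPE flag as text rather than a boolean, the first comparison would need to be adapted accordingly.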
The following fields were extracted from the ssf XML file:
- days\_to\_sample\_procurement
- tissue\_anatomic\_site
- tissue\_anatomic\_site\_description
- tissue\_anatomic\_site
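For readers who want to pull the same fields out of an XML file themselves, the following Python sketch uses the standard-library `xml.etree.ElementTree` module to collect elements by name while ignoring namespace prefixes. The XML snippet, its namespace URI, its root element, and its values are hypothetical placeholders and do not reflect the real ssf schema. The helper names `local_name` and `extract_fields` are likewise illustrative.

```python
import xml.etree.ElementTree as ET

# Field names taken from the list above.
SSF_FIELDS = (
    "days_to_sample_procurement",
    "tissue_anatomic_site",
    "tissue_anatomic_site_description",
)


def local_name(tag):
    """Drop a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def extract_fields(xml_text, wanted=SSF_FIELDS):
    """Return {field: text} for the first element found with each wanted name."""
    root = ET.fromstring(xml_text)
    found = {}
    for elem in root.iter():
        name = local_name(elem.tag)
        if name in wanted and name not in found:
            text = (elem.text or "").strip()
            found[name] = text or None  # empty elements become None
    return found


# Hypothetical, heavily simplified document; real files are larger.
example = """<ssf xmlns:s="http://example.org/ssf">
  <s:days_to_sample_procurement>12</s:days_to_sample_procurement>
  <s:tissue_anatomic_site>Example site</s:tissue_anatomic_site>
  <s:tissue_anatomic_site_description/>
</ssf>"""

print(extract_fields(example))
```

Matching on the local element name keeps the sketch independent of whichever namespace prefixes a given file happens to use.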