DNA Copy-Number Segments

The Copy_Number_segments table contains one row per copy-number segment per TCGA aliquot. Each TCGA aliquot is uniquely identified by a 28-character TCGA barcode, e.g. TCGA-04-1517-01A-01D-0533-01. (For more information on how TCGA barcodes were created and how to “read” a TCGA barcode, click on the preceding link.)
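As a rough illustration of how an aliquot barcode such as the example above breaks down, the following Python sketch splits it into its dash-separated fields. The field layout follows the public TCGA barcode convention; the function name and the group labels are hypothetical and are not column names from the table.

```python
import re

# Illustrative pattern for an aliquot-level TCGA barcode.
# The group names are labels chosen for this sketch only.
ALIQUOT_BARCODE = re.compile(
    r"^(?P<project>TCGA)"
    r"-(?P<tss>[A-Z0-9]{2})"             # tissue source site
    r"-(?P<participant>[A-Z0-9]{4})"     # participant (case) code
    r"-(?P<sample>\d{2})(?P<vial>[A-Z])"       # sample type + vial
    r"-(?P<portion>\d{2})(?P<analyte>[A-Z])"   # portion + analyte
    r"-(?P<plate>[A-Z0-9]{4})"           # plate identifier
    r"-(?P<center>\d{2})$"               # characterization center code
)


def parse_aliquot_barcode(barcode):
    """Return the barcode fields as a dict, or raise ValueError."""
    match = ALIQUOT_BARCODE.match(barcode)
    if match is None:
        raise ValueError("not an aliquot-level TCGA barcode: %r" % barcode)
    return match.groupdict()


fields = parse_aliquot_barcode("TCGA-04-1517-01A-01D-0533-01")
print(fields["participant"], fields["sample"], fields["analyte"])
```

Shorter barcodes (for example participant-level or sample-level ones) deliberately fail this pattern, since the table is keyed on full aliquot barcodes.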

There is also a GDC Copy_Number_segments table that has been reprocessed against the HG38 genomic build.

Platform

DNA Copy-Number data was generated for the TCGA project using the Affymetrix Genome-Wide Human SNP Array 6.0.

Pipeline

DNA Copy-Number data was generated for the TCGA project at the Broad Genome Characterization Center. A DESCRIPTION.txt file is included with each data archive at the DCC describing the algorithms, methods, and protocols used to produce the Level-1, Level-2, and Level-3 data.

ETL Details

Each Level-3 data archive contains four output files per sample assayed: two based on the hg18 reference and two based on the hg19 reference. The TCGA HG19 BigQuery table is populated only from the files ending with nocnv_hg19.seg.txt. The num_probes and segment_mean fields in the raw files are sometimes written in exponential (scientific) notation (e.g. 8.7E+07) and were interpreted as integer and floating-point values, respectively.
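The file selection and numeric handling described above can be sketched in Python as follows. This is a minimal illustration, not the actual ETL code: the function names are hypothetical, and the sketch assumes that a num_probes value written in exponential notation always denotes a whole number.

```python
def is_loaded_file(filename):
    """True for the Level-3 files that populate the hg19 table."""
    return filename.endswith("nocnv_hg19.seg.txt")


def parse_num_probes(text):
    """Parse a probe count that may be written like '8.7E+07'.

    int('8.7E+07') raises ValueError, so the value goes through
    float first and is then checked to be a whole number.
    """
    value = float(text)
    if not value.is_integer():
        raise ValueError("num_probes is not a whole number: %r" % text)
    return int(value)


def parse_segment_mean(text):
    """Parse a segment mean; float() accepts both plain and exponential forms."""
    return float(text)


print(is_loaded_file("example.nocnv_hg19.seg.txt"))  # hypothetical file name
print(parse_num_probes("8.7E+07"))
print(parse_segment_mean("-1.2E-01"))
```

Going through float before int is a simple choice that works for values of this size; an ETL handling very large counts might prefer decimal.Decimal to avoid any floating-point rounding.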

The mapping between TCGA aliquot barcodes and Level-3 data files was obtained from the SDRF file.

