************************
ISB-CGC Hosted Data Sets
************************

Part of the mission of the ISB-CGC has been to explore the best ways to use the available cloud technologies to provide access to the hosted data. To this end, the hosted data is made available using these three main Google Cloud Platform technologies:

* Google BigQuery (BQ), a massively parallel analytics engine, is ideal for working with data that is essentially tabular in nature. This includes the high-level clinical, biospecimen, and molecular data from the main NCI programs. It is also where we store a large amount of metadata about files that are more appropriately stored in Google Cloud Storage, as well as genome reference sources (*e.g.* GENCODE, miRBase, *etc.*). All of these datasets and tables are completely *open access* and available to the research community.

* Google Cloud Storage (GCS), a cloud-hosted object store, is used to store other types of (typically binary) data that is usually processed by custom software pipelines. In our case this means the low-level sequence data, in BAM or FASTQ format, as well as pathology and radiology images (in SVS or DICOM format). All controlled-access data is currently available only in GCS; access to this data requires that a user walk through the required authentication and authorization steps.

* Google Genomics (GG) provides a new way to work with sequence-level data, via the GA4GH API. If and when the research community shifts away from BAM files towards using the GA4GH API, using this technology as our primary data store may make more sense.

Please refer to the sections below for more details about the data available in these three Google Cloud technologies:

.. toctree::
   :maxdepth: 1

   data2/data_in_BQ.rst
   data2/data_in_GCS.rst
   data2/data_in_GG.rst