Programmatic Interfaces
The changes needed to support multiple programs have rendered the V1 and V2 APIs non-functional,
so users must migrate all API calls to the V3 version. In most cases this requires
only a minor adjustment to the URL. Some of the examples in the GitHub repository
may still reference the V1 or V2 API.
Programmatic access to molecular data and metadata within the ISB-CGC platform uses a combination of ISB-CGC APIs and Google APIs, as illustrated by the block diagram on the front page of this documentation.
- The ISB-CGC API provides programmatic access to data and metadata stored in CloudSQL. This includes information describing TCGA patients and samples, data availability, user-created cohorts, etc. In this section of our documentation, you will find more details about using the ISB-CGC API.
- Native Google APIs are used for optimized, high-speed programmatic access to molecular data in BigQuery, Google Cloud Storage, or Google Genomics. Code examples illustrating usage of these Google APIs are available in the ISB-CGC code repositories on GitHub. Additional Google Cloud Platform documentation for some of the key technologies used by the ISB-CGC platform can be found by following these links:
ISB-CGC API
The ISB-CGC API provides an interface to the ISB-CGC metadata stored in CloudSQL and consists of several “endpoints” implemented using Google Cloud Endpoints. Details about these endpoints can be found here, and examples illustrating usage from R and Python can be found in our examples-R and examples-Python repositories on GitHub.
Some example use-cases include:
- obtaining a list of patient identifiers based on a defined set of criteria;
- obtaining a list of sample identifiers, associated with a specific patient;
- obtaining detailed metadata about a particular patient or sample;
- creating (or retrieving a previously saved) cohort of patients and samples, based on a defined set of criteria;
- obtaining a list of data files in Cloud Storage, associated with a specific sample, cohort, platform, or data-type (or any combination thereof).
The APIs Explorer can be used to see details about each endpoint, and also provides a convenient interface to test an endpoint through your web browser. Following the link in the previous sentence will take you to a page with a list of APIs, in which each API consists of a set of functionally-related endpoints. Together, these individual APIs make up the ISB-CGC API. (Note that not all of these APIs are intended for direct use by end-users: some are intended for use only by the ISB-CGC Web-App, as described in the information on the first APIs Explorer page mentioned above.)
Cohorts are the primary organizing principle for subsetting and working with the TCGA data.
A cohort is a list of samples and a list of patients.
Users may create and share cohorts using the ISB-CGC web-app and then programmatically
access these cohorts using this API.
(TCGA samples are identified using a
16-character “barcode”, e.g., TCGA-B9-7268-01A,
while patients are identified using the 12-character prefix of the sample barcode, i.e., TCGA-B9-7268.
Other datasets, such as CCLE, may use other, less standardized naming conventions.)
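The barcode relationship described above can be expressed in a few lines of Python. This is a minimal sketch: the regular expression `SAMPLE_BARCODE` is our own simplified assumption that covers only the 16-character TCGA form, so it deliberately rejects CCLE or TARGET identifiers.

```python
import re

# Simplified pattern (an assumption, not an official spec) for 16-character
# TCGA sample barcodes such as "TCGA-B9-7268-01A".
SAMPLE_BARCODE = re.compile(r"^TCGA-[A-Z0-9]{2}-[A-Z0-9]{4}-\d{2}[A-Z]$")


def patient_barcode(sample_barcode: str) -> str:
    """Return the 12-character patient barcode for a TCGA sample barcode."""
    if not SAMPLE_BARCODE.match(sample_barcode):
        raise ValueError(f"not a 16-character TCGA sample barcode: {sample_barcode!r}")
    # The patient barcode is simply the first 12 characters of the sample barcode.
    return sample_barcode[:12]


if __name__ == "__main__":
    print(patient_barcode("TCGA-B9-7268-01A"))  # prints TCGA-B9-7268
```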
Usage
Endpoints are simple HTTPS GET or PUT requests, e.g.:
V3 TCGA - GET https://api-dot-isb-cgc.appspot.com/_ah/api/isb_cgc_tcga_api/v3/tcga/cases/TCGA-B9-7268
V3 TARGET - GET https://api-dot-isb-cgc.appspot.com/_ah/api/isb_cgc_target_api/v3/target/cases/TARGET-20-PABLDZ
V3 CCLE - GET https://api-dot-isb-cgc.appspot.com/_ah/api/isb_cgc_ccle_api/v3/ccle/cases/FU-OV-1
The three GET requests above illustrate usage of the new program-specific V3 endpoints.
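The three example URLs share a common pattern, so a small helper can build the V3 case URL for any program. This is a sketch under the assumption that the pattern inferred from these three examples (`isb_cgc_<program>_api/v3/<program>/cases/<barcode>`) holds for other case barcodes; it is not taken from an API specification, and `case_url` is a hypothetical helper name.

```python
from urllib.parse import quote

# Base URL taken from the example requests above.
API_ROOT = "https://api-dot-isb-cgc.appspot.com/_ah/api"

# Programs that have program-specific V3 APIs in the examples above.
PROGRAMS = {"tcga", "target", "ccle"}


def case_url(program: str, case_barcode: str) -> str:
    """Build the V3 'cases' URL for one case of the given program."""
    program = program.lower()
    if program not in PROGRAMS:
        raise ValueError(f"unknown program: {program!r}")
    # Percent-encode the barcode so unexpected characters cannot alter the path.
    barcode = quote(case_barcode, safe="")
    return f"{API_ROOT}/isb_cgc_{program}_api/v3/{program}/cases/{barcode}"


if __name__ == "__main__":
    print(case_url("TCGA", "TCGA-B9-7268"))
```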
Each URL (without the “GET”) can also be pasted directly into your browser. Packages are available in most languages to make HTTPS GET and PUT requests easy, such as the httr package for R and the Python requests library.
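A GET request against one of these endpoints takes only a few lines. To keep the sketch free of third-party dependencies it uses the standard library's `urllib.request` in place of the requests library mentioned above; `get_json` is a hypothetical helper name, and the example call assumes the service is still reachable.

```python
import json
import urllib.request


def get_json(url: str, timeout: float = 30.0):
    """Send a GET request to an endpoint and decode the JSON response body."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # One of the V3 example URLs from above; requires network access.
    case = get_json(
        "https://api-dot-isb-cgc.appspot.com/_ah/api/"
        "isb_cgc_tcga_api/v3/tcga/cases/TCGA-B9-7268"
    )
    print(json.dumps(case, indent=2))
```

With requests, the body of `get_json` would reduce to `requests.get(url, timeout=timeout).json()`.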
In addition, the Google Python API Client Library
can be used to build a service object that provides a functional interface to the resources defined by the API.
(Examples of this approach can be found in the examples-Python GitHub repo, specifically the
api_test_service*.py scripts.)
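A minimal sketch of that approach with `googleapiclient.discovery.build`. It assumes the ISB-CGC APIs publish the standard Cloud Endpoints discovery document under `/_ah/api/discovery/v1/apis/{api}/{apiVersion}/rest`; that path, the helper name `get_service`, and the API name in the usage block are assumptions, so check the api_test_service*.py scripts for the exact form. If the discovery document declares OAuth scopes and no `credentials` are passed, the client library may try to fall back to Application Default Credentials.

```python
# Assumed discovery URL template; {api} and {apiVersion} are filled in by the
# client library when it fetches the discovery document.
DISCOVERY_URL = (
    "https://api-dot-isb-cgc.appspot.com/_ah/api/discovery/v1/"
    "apis/{api}/{apiVersion}/rest"
)


def get_service(api_name: str, version: str = "v3", credentials=None):
    """Build a service object for one of the ISB-CGC APIs."""
    # Imported lazily so this module loads even where the
    # google-api-python-client package is not installed.
    from googleapiclient.discovery import build

    return build(
        api_name,
        version,
        discoveryServiceUrl=DISCOVERY_URL,
        credentials=credentials,
    )


if __name__ == "__main__":
    # Requires network access; prints the public attributes (resources and
    # helper methods) of the generated service object.
    service = get_service("isb_cgc_tcga_api")
    print([name for name in dir(service) if not name.startswith("_")])
```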
Authorization
Some, but not all, of the endpoints require authorization. This authorization is not related to controlled-access data: these endpoints do not operate on or directly return any controlled data. Instead, authorization is related to saving or retrieving cohorts because cohorts are private to the user who created the cohort (and anyone the cohort owner has chosen to share the cohort with). Helper scripts, described below, are provided to access these endpoints from the command line.
Note: Prior to using any endpoints that require authorization, a user must have signed into the web application at least once.
Examples
from Python
Step 1: A Python helper script,
isb_auth.py,
can be used to start the OAuth flow and store the user's credentials in a file named ~/.isb_credentials:
$ python isb_auth.py
This script will open a new tab in your browser and ask you to sign in with your Google identity
(e.g., your Gmail address). The first time, you will also be asked to grant the ISB-CGC application
permission to see your email address.
Once authenticated, your access and refresh tokens are written to
~/.isb_credentials. You may use the --verbose flag when running this script
to see the contents and name of this file.
If you are running this script over SSH (or from Cloud Shell),
the --noauth_local_webserver flag lets you obtain a verification code through your local browser.
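Other scripts can reuse the cached tokens by reading ~/.isb_credentials directly. This sketch assumes, without confirmation from this documentation, that the file is JSON containing `access_token` and `refresh_token` fields (the layout written by oauth2client's file storage); running isb_auth.py with --verbose shows the actual contents. `parse_tokens` and `load_tokens` are hypothetical helper names.

```python
import json
from pathlib import Path

# Default location used by isb_auth.py.
CREDENTIALS_PATH = Path.home() / ".isb_credentials"

# Assumption: these are the field names in the cached-credentials JSON.
REQUIRED_FIELDS = ("access_token", "refresh_token")


def parse_tokens(text: str) -> dict:
    """Extract the OAuth tokens from the JSON text of a credentials file."""
    data = json.loads(text)
    missing = [field for field in REQUIRED_FIELDS if field not in data]
    if missing:
        raise KeyError(f"credentials are missing: {', '.join(missing)}")
    return {field: data[field] for field in REQUIRED_FIELDS}


def load_tokens(path: Path = CREDENTIALS_PATH) -> dict:
    """Read and parse the cached credentials file written by isb_auth.py."""
    return parse_tokens(Path(path).read_text(encoding="utf-8"))


if __name__ == "__main__":
    tokens = load_tokens()
    # Print only a prefix so the full token does not end up in logs.
    print("access token starts with:", tokens["access_token"][:8])
```

Keeping parsing separate from file I/O makes the format assumption easy to test and to change.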
Step 2: Once you have a ~/.isb_credentials file
(either locally on your laptop, on a GCE VM, or in Cloud Shell),
you can access any API requiring authentication using another helper script,
isb_curl.py:
$ ## usage: python isb_curl.py {ENDPOINT_URL}
$ python isb_curl.py https://api-dot-isb-cgc.appspot.com/_ah/api/isb_cgc_api/v2/cohorts
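Authenticated endpoints expect the OAuth 2.0 access token in a standard `Authorization: Bearer <token>` header. The sketch below mimics the usage line above (endpoint URL as the first argument) and is not the actual isb_curl.py implementation. It assumes ~/.isb_credentials is JSON with an `access_token` field, and unlike a full client it does not refresh an expired access token, so it only works while the cached token is still valid. `authorized_request` and `call_endpoint` are hypothetical names.

```python
import json
import sys
import urllib.request
from pathlib import Path


def authorized_request(url: str, access_token: str) -> urllib.request.Request:
    """Build a GET request that carries an OAuth 2.0 bearer token."""
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Bearer {access_token}",
            "Accept": "application/json",
        },
    )


def call_endpoint(url: str, access_token: str, timeout: float = 30.0):
    """Send the authorized request and decode the JSON response."""
    request = authorized_request(url, access_token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Usage: python this_script.py {ENDPOINT_URL}
    # Assumption: the credentials file is JSON with an "access_token" field.
    credentials = json.loads((Path.home() / ".isb_credentials").read_text())
    print(json.dumps(call_endpoint(sys.argv[1], credentials["access_token"]), indent=2))
```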
from R
The Examples-R (ISBCGCExamples) package contains a number of functions that “wrap” the HTTP endpoint calls, making it easier to access your cohorts and query the database.
Step 1: After starting R and loading the ISBCGCExamples package, you can use the helper function isb_init
to go through the authentication process:
> library(ISBCGCExamples)
> token <- isb_init()
Use a local file to cache OAuth access credentials between R sessions?
1: Yes
2: No
Selection: 1
Waiting for authentication in browser...
Press Esc/Ctrl + C to abort
Authentication complete.
The isb_init function will open a new tab in your browser and ask you to sign in with your Google
identity (e.g., your Gmail address). The first time, you will also be asked to grant the ISB-CGC
application permission to see your email address.
Once authenticated, your access and refresh tokens are written to your working directory in a
file named .httr-oauth.
Step 2: Using the endpoints
After authentication, any of the example endpoint functions can be used, for example:
list_cohorts(token)
which returns a list of the user’s previously created cohorts. Documentation for these functions can be found in the ISB-CGC GitHub repo Examples-R, under ‘API Endpoints Interface’.
ISB-CGC API (v3)
The endpoints have been reorganized to support the multiple programs that now have data in the ISB-CGC. These endpoints are now organized into four different sections: TCGA, CCLE, TARGET and common endpoints.
Please note: for every program, the create.cohort API requires you to click between the brackets of the request body (for example, in the APIs Explorer) to view the filters available for the cohort being built.
Details for each of these endpoints can be found below:
Universal Endpoints
TCGA Endpoints
TARGET Endpoints
CCLE Endpoints