The Genomic Data Commons (GDC) is a research program of the National Cancer Institute (NCI).
The mission of the GDC is to provide the cancer research community with a unified data repository that enables data sharing across cancer genomic studies in support of precision medicine.
In Brief
- Download data with different methods
- Data wrangling
Example 1 Standardized Download
- Click on Repository and choose Cases to set up the data of interest: TCGA-LUNG.
- Choose Files, select the data of interest, and then click on Manifest to download data1.
- Next, select Isoform Expression Quantification as the Data Type and click on Manifest to download data2.
- Keep the Cases selection and remove all previous selections on the Files page.
- Select clinical from Data Category and bcr xml from Data Format, then download data3.
- Count the number of lines in all downloaded data.
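The line count for the two manifests can also be done from R, as in the minimal sketch below. It assumes the manifests were saved under the hypothetical names data1.txt and data2.txt in the working directory, and that each manifest starts with a single header line, so subtracting one from each count gives the number of files it lists. count_lines() is a hypothetical helper.

```r
# Sketch: count the lines of each downloaded manifest.
# ASSUMPTION: the manifests were saved under the hypothetical names
# "data1.txt" and "data2.txt" in the working directory.
count_lines <- function(paths) {
  # readLines() returns one element per line, so length() is the line count.
  vapply(paths, function(p) length(readLines(p, warn = FALSE)), integer(1))
}

manifests <- c("data1.txt", "data2.txt")
if (all(file.exists(manifests))) {
  n_lines <- count_lines(manifests)
  print(n_lines)
  # ASSUMPTION: one header line per manifest, so this is the number of files.
  print(n_lines - 1L)
}
```

The result is a vector named by file path, so the counts for data1 and data2 can be read off directly.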
- Download the GDC Data Transfer Tool Client.
- Check the usage of gdc-client.
- Check the usage of gdc-client download.
- Download the files listed in the manifests with gdc-client download.
- Check the information of the downloaded files.
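After gdc-client has fetched the files (for example with gdc-client download -m data1.txt), the result can be checked against the manifest. The R sketch below is a minimal example under two assumptions to verify on your own download: the manifest is tab-separated with id, filename and size columns, and gdc-client saved each file as <download_dir>/<id>/<filename>. check_downloads() is a hypothetical helper, not part of gdc-client.

```r
# Sketch: compare the entries of a GDC manifest with the files on disk.
# ASSUMPTIONS: tab-separated manifest with "id", "filename" and "size" columns;
# gdc-client stored each file as <download_dir>/<id>/<filename>.
check_downloads <- function(manifest_path, download_dir = ".") {
  manifest <- read.delim(manifest_path, stringsAsFactors = FALSE)
  expected <- file.path(download_dir, manifest$id, manifest$filename)
  actual_size <- file.size(expected)  # NA when the file is missing
  data.frame(
    id = manifest$id,
    filename = manifest$filename,
    present = file.exists(expected),
    size_ok = !is.na(actual_size) & actual_size == manifest$size,
    stringsAsFactors = FALSE
  )
}

# Hypothetical usage: summarise the status of the files listed in data1.txt.
if (file.exists("data1.txt")) {
  status <- check_downloads("data1.txt")
  print(table(present = status$present, size_ok = status$size_ok))
}
```

Comparing sizes as well as presence catches partially downloaded files, which a plain existence check would miss.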
Data Wrangling
Overview of the Downloaded Data
R Scripts for a Single Sample
- Randomly choose one clinical file to see the format of a sample.
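The random choice can be made from R with a minimal sketch like the one below. It assumes the clinical files sit somewhere under a download directory and that their names contain "clinical" and end in ".xml", like nationwidechildrens.org_clinical.TCGA-75-7030.xml; pick_random_clinical() is a hypothetical helper.

```r
# Sketch: pick one clinical XML file at random to inspect its format.
# ASSUMPTION: the clinical files sit somewhere under download_dir and their
# names contain "clinical" and end in ".xml".
pick_random_clinical <- function(download_dir = ".") {
  files <- list.files(download_dir, pattern = "clinical.*\\.xml$",
                      recursive = TRUE, full.names = TRUE)
  if (length(files) == 0) {
    stop("no clinical XML files found under ", download_dir)
  }
  sample(files, 1)
}
```

Calling set.seed() with a fixed value before pick_random_clinical() makes the choice reproducible.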
- Follow the usage described in R - XML Files.
- Open R.
R reads an XML file with the function xmlParse() from the XML package; the parsed document can then be navigated node by node, much like a list.
- Read the XML file.
```r
# Load the package required to read XML files.
library("XML")
# Also load the other required package.
library("methods")
# Give the input file name to the function.
result <- xmlParse(file = "nationwidechildrens.org_clinical.TCGA-75-7030.xml")
# Print the result.
print(result)
```
- Get the number of nodes present in the XML file.
```r
# Extract the root node from the XML file.
rootnode <- xmlRoot(result)
# Find the number of nodes in the root.
rootsize <- xmlSize(rootnode)
# Print the result.
print(rootsize)
# Output: [1] 2
```
- Show the details of the first and second nodes.
```r
# Extract the root node from the XML file.
rootnode <- xmlRoot(result)
# Print the first and second child nodes of the root.
print(rootnode[1])
print(rootnode[2])
```
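- Convert the XML to a data frame.
```r
# Convert the second child node of the root to a data frame.
xmldataframe <- xmlToDataFrame(rootnode[2])
print(xmldataframe)
# Transpose so that each field appears on its own row.
print(t(xmldataframe))
# Write the transposed table to a file named "tmp".
write.table(t(xmldataframe), 'tmp')
```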
Run R Scripts for All Files
- Complete the scripts based on the single-sample version.
- Run the scripts.
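One way to extend the single-sample steps to all files is sketched below. It reuses xmlParse(), xmlRoot() and xmlToDataFrame(rootnode[2]) as in the single-sample example, loops over every clinical XML file found under a directory, and stacks the one-row results, filling fields that a file lacks with NA. The helper names (clinical_to_df(), bind_rows_fill(), run_all()), the file-name pattern and the extra source_file column are assumptions for illustration.

```r
# Sketch: run the single-sample steps on every clinical XML file and stack
# the results into one table with one row per file.
# ASSUMPTIONS: file names match "clinical.*\\.xml$", and the second child of
# each root node is the record to keep, as in the single-sample example.
library("XML")

# Convert one clinical XML file to a one-row data frame.
clinical_to_df <- function(path) {
  rootnode <- xmlRoot(xmlParse(file = path))
  df <- xmlToDataFrame(rootnode[2], stringsAsFactors = FALSE)
  df$source_file <- basename(path)  # remember where each row came from
  df
}

# Stack data frames whose columns may differ, filling absent fields with NA.
bind_rows_fill <- function(dfs) {
  all_cols <- unique(unlist(lapply(dfs, names)))
  aligned <- lapply(dfs, function(df) {
    missing_cols <- setdiff(all_cols, names(df))
    if (length(missing_cols) > 0) df[missing_cols] <- NA
    df[all_cols]
  })
  do.call(rbind, aligned)
}

# Process every clinical XML file found under download_dir.
run_all <- function(download_dir = ".") {
  files <- list.files(download_dir, pattern = "clinical.*\\.xml$",
                      recursive = TRUE, full.names = TRUE)
  bind_rows_fill(lapply(files, clinical_to_df))
}
```

For example, all_clinical <- run_all("gdc_download") followed by write.table(all_clinical, "all_clinical.tsv", sep = "\t", quote = FALSE, row.names = FALSE) would save one row per file; "gdc_download" is a hypothetical directory name. Filling missing fields with NA matters because patients do not always share the same set of clinical fields.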
Example 2 Fast Download
- Enter the keywords tcga gdc lusc in Google and search.
- Click on the first result and download the file data.
- Choose and download the clinical data of interest.
- Choose and download the transcript data of interest.
Run R Scripts
- Follow the usage described in R - JSON Files and R - XML Files.
- Run the rjson script to match case IDs with file names.
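The matching step can be sketched with rjson as follows. The sketch assumes a GDC file-metadata JSON array whose elements contain file_name and an associated_entities list with case_id and entity_submitter_id fields, and it keeps only the first associated entity of each file. These field names are assumptions to check against your own metadata file, and match_files_to_cases() is a hypothetical helper.

```r
# Sketch: build a file-name-to-case table from GDC file metadata with rjson.
# ASSUMPTIONS: metadata_path is a JSON array of file records, each with
# "file_name" and an "associated_entities" array whose first entry has
# "case_id" and "entity_submitter_id"; only that first entry is used.
library("rjson")

match_files_to_cases <- function(metadata_path) {
  records <- fromJSON(file = metadata_path)
  rows <- lapply(records, function(rec) {
    entity <- rec$associated_entities[[1]]
    data.frame(file_name = rec$file_name,
               case_id = entity$case_id,
               entity_submitter_id = entity$entity_submitter_id,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}
```

For example, matched <- match_files_to_cases("metadata.json") (a hypothetical file name) gives one row per file, which can then be joined to the transcript data by file_name.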
In summary
- For more details, see the official GDC Documentation online.