Types of Data
We process and analyze many different
data types, such as:
- RNA sequencing
- gene expression microarray
- whole-exome and whole-genome sequencing
- ATAC and ChIP sequencing
- single-cell RNA sequencing
- mass cytometry (CyTOF)
- small RNA sequencing
Machine Learning
We use artificial intelligence to help you test your hypotheses.
We build models for classification, outcome prediction, and biomarker discovery.
Unsupervised Machine Learning
We apply principal component analysis and a range of clustering methods
to help you understand the structure of your data.
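As a rough illustration of the principal component step, the sketch below finds the first principal component of a small, made-up expression table using power iteration on the covariance matrix. The function name `first_principal_component`, the toy values, and the choice of power iteration are assumptions for illustration only; a real analysis would typically rely on an established numerical library.

```python
import math
import random


def first_principal_component(rows, n_iter=100, seed=0):
    """Return (unit direction, per-sample scores) for the first principal component.

    rows: list of samples, each a list of feature values (e.g. genes).
    Uses power iteration on the sample covariance matrix.
    """
    n, p = len(rows), len(rows[0])
    # Center each feature on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix (p x p).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    # Power iteration: repeatedly multiply by the covariance matrix and renormalize.
    rng = random.Random(seed)
    v = [rng.random() for _ in range(p)]
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project each centered sample onto the component.
    scores = [sum(c[j] * v[j] for j in range(p)) for c in centered]
    return v, scores


# Made-up data: six samples (rows) by three features (columns), two obvious groups.
toy_expression = [
    [1.0, 5.0, 0.2],
    [1.2, 5.1, 0.1],
    [0.9, 4.9, 0.3],
    [5.0, 1.0, 0.2],
    [5.2, 1.1, 0.1],
    [4.9, 0.8, 0.3],
]
component, scores = first_principal_component(toy_expression)
```

In this toy table, the first three samples and the last three samples land on opposite sides of zero along the first component, which is the kind of group structure this analysis is meant to reveal.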
Supervised Machine Learning
We use single or integrated datasets to build models that can identify biomarkers and predict an outcome,
such as drug sensitivity or a class label. We use cross-validation
together with machine learning techniques such as elastic net regression, random forests,
and support vector machines.
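To make the cross-validation idea concrete, here is a minimal sketch of k-fold cross-validation in plain Python. The nearest-centroid classifier is only a simple stand-in so the example stays self-contained; it is not one of the methods named above. All function names, the labels, and the toy measurements are hypothetical.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k disjoint folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_centroids(X, y):
    """Stand-in model: the mean feature vector of each class."""
    centroids = {}
    for label in set(y):
        members = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def predict(centroids, x):
    """Assign x to the class whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def cross_validated_accuracy(X, y, k=3, seed=0):
    """Train on k-1 folds, test on the held-out fold, and pool the results."""
    correct = 0
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train_X = [X[i] for i in range(len(X)) if i not in held]
        train_y = [y[i] for i in range(len(X)) if i not in held]
        # Assumes every class still appears in the training folds.
        model = fit_centroids(train_X, train_y)
        correct += sum(predict(model, X[i]) == y[i] for i in held_out)
    return correct / len(X)


# Made-up measurements for six samples with hypothetical drug-response labels.
X = [[1.0, 5.0], [1.2, 5.1], [0.9, 4.9], [5.0, 1.0], [5.2, 1.1], [4.9, 0.8]]
y = ["sensitive"] * 3 + ["resistant"] * 3
folds = k_fold_indices(len(X), 3)
accuracy = cross_validated_accuracy(X, y, k=3)
```

Each sample is held out exactly once, so the pooled accuracy estimates how the model would perform on data it was not trained on.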
Statistical Analysis
Using either individual datasets or multiple
integrated datasets, we can test hypotheses with
statistical methods such as Student's t-test, correlation, regression,
and survival analysis. For example, we can use logistic
regression to determine whether a phenotype of interest is
associated with a somatic mutation. We can perform
survival analysis with a Cox proportional hazards model
to determine whether a somatic mutation is prognostic or
predictive of treatment response. We can compute either a
simple correlation between two variables or pairwise
correlations across an entire dataset.
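The pairwise correlation mentioned above can be sketched as follows, assuming Pearson correlation and a small table keyed by feature name. The names `pearson` and `pairwise_correlations`, the `GENE_*` labels, and the values are made up for illustration.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists.

    Assumes neither list is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def pairwise_correlations(table):
    """Correlate every pair of features in a {name: values} table."""
    return {(a, b): pearson(table[a], table[b])
            for a, b in combinations(sorted(table), 2)}


# Made-up values for three hypothetical features across four samples.
toy_table = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.0],
    "GENE_C": [4.0, 3.0, 2.0, 1.0],
}
correlations = pairwise_correlations(toy_table)
```

With these toy values, `GENE_A` and `GENE_B` are perfectly positively correlated, and `GENE_C` is perfectly negatively correlated with both.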
Data Mining
There is a wealth of publicly available datasets that can help you learn more about your own data.
We can mine these datasets, perform analyses, and integrate them with other datasets, including your own data.
Common Analyses:
- Cohort characterization: using The Cancer Genome Atlas, GENIE, MSK-IMPACT, and other cohorts, we can compare
  gene expression or determine mutation frequencies across cancer types.
- Cell line characterization: analyzing the Cancer Cell Line Encyclopedia, Genomics of Drug Sensitivity in
  Cancer, Project Achilles, and other publicly available data. These projects offer numerous data types,
  including knockout and knockdown screens, gene expression, mutation, drug sensitivity, and more. We can calculate associations,
  combine these data with your own, or build models.
- Tissue characterization: using GTEx and the Human Protein Atlas, we can determine which genes or proteins are expressed
  in different healthy human tissues.
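As one example of the cohort characterization described above, the sketch below computes the fraction of samples carrying a mutation in a given gene, per cancer type. The record layout, the function name `mutation_frequency`, and the `TYPE_*` and `GENE_*` labels are hypothetical, and the records are invented; they are not drawn from any of the resources listed.

```python
from collections import defaultdict


def mutation_frequency(records, gene):
    """Fraction of samples per cancer type with a mutation in `gene`.

    records: iterable of (cancer_type, set_of_mutated_genes), one per sample.
    """
    totals = defaultdict(int)
    mutated = defaultdict(int)
    for cancer_type, mutated_genes in records:
        totals[cancer_type] += 1
        if gene in mutated_genes:
            mutated[cancer_type] += 1
    return {ct: mutated[ct] / totals[ct] for ct in totals}


# Invented per-sample records for two hypothetical cancer types.
toy_records = [
    ("TYPE_A", {"GENE_X", "GENE_Y"}),
    ("TYPE_A", {"GENE_Z"}),
    ("TYPE_B", {"GENE_X"}),
    ("TYPE_B", {"GENE_X", "GENE_Y"}),
]
frequencies = mutation_frequency(toy_records, "GENE_X")
```

Here `GENE_X` is mutated in half of the `TYPE_A` samples and all of the `TYPE_B` samples; the same tally applied to a real cohort table would give per-cancer-type mutation frequencies.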