What We Do

Types of Data

We process and analyze many different data types, such as:

  • RNA-sequencing
  • gene expression microarray
  • whole-exome and whole-genome sequencing
  • ATAC-sequencing and ChIP-sequencing
  • single-cell RNA-sequencing
  • mass cytometry (CyTOF)
  • small RNA-sequencing

Machine Learning

We use artificial intelligence to help you test your hypotheses. We build models for classification, outcome prediction, and biomarker discovery.

Unsupervised Machine Learning

We apply principal component analysis (PCA) and a range of clustering methods to help you understand the structure of your data.
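As a minimal sketch of what PCA does, the Python example below centers a samples-by-features matrix and projects it onto its top principal components using a singular value decomposition. The data are synthetic, and the function name `pca` and the group sizes are illustrative assumptions, not a description of any specific pipeline.

```python
import numpy as np

def pca(X, n_components=2):
    """Project rows of X (samples x features) onto the top principal components."""
    Xc = X - X.mean(axis=0)                      # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]   # sample coordinates on the PCs
    explained = (S ** 2) / np.sum(S ** 2)             # fraction of variance per PC
    return scores, explained[:n_components]

# Synthetic data (assumption): two groups of 20 samples, 50 features each
rng = np.random.default_rng(0)
group_a = rng.normal(0.0, 1.0, size=(20, 50))
group_b = rng.normal(3.0, 1.0, size=(20, 50))
X = np.vstack([group_a, group_b])

scores, explained = pca(X, n_components=2)
print("Variance explained by PC1 and PC2:", explained)
```

In this toy case the two groups separate along the first component. A clustering method could then be run on `scores` or on the original matrix.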

Supervised Machine Learning

We use single or integrated datasets to build models that identify biomarkers and predict an outcome, such as drug sensitivity or class membership. We use cross-validation together with machine learning techniques such as elastic net regression, random forests, and support vector machines.
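To illustrate the idea of cross-validation, here is a small Python sketch of k-fold cross-validation. To keep it self-contained it uses a simple nearest-centroid classifier as a stand-in for the models named above (it is not elastic net, random forest, or a support vector machine). The data are synthetic and all function names are hypothetical.

```python
import numpy as np

def kfold_indices(n_samples, k, rng):
    """Shuffle sample indices and split them into k roughly equal folds."""
    return np.array_split(rng.permutation(n_samples), k)

def nearest_centroid_predict(X_train, y_train, X_test):
    """Assign each test sample to the class with the closest training centroid."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(X_test[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dists, axis=1)]

def cross_validate(X, y, k=5, seed=0):
    """Return the held-out accuracy for each of the k folds."""
    rng = np.random.default_rng(seed)
    folds = kfold_indices(len(y), k, rng)
    accuracies = []
    for i, test_idx in enumerate(folds):
        # Train on every fold except the one held out for testing
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        pred = nearest_centroid_predict(X[train_idx], y[train_idx], X[test_idx])
        accuracies.append(np.mean(pred == y[test_idx]))
    return np.array(accuracies)

# Synthetic data (assumption): 30 samples per class, 10 features
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (30, 10)), rng.normal(2, 1, (30, 10))])
y = np.array([0] * 30 + [1] * 30)

acc = cross_validate(X, y, k=5)
print("Per-fold accuracy:", acc, "mean:", acc.mean())
```

Each sample is used for testing exactly once, so the mean accuracy estimates how the model performs on data it was not trained on.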

Statistical Analysis

Using individual datasets or integrating multiple datasets, we can test hypotheses with statistical methods such as Student's t-test, correlation, regression, and survival analysis. For example, we can use logistic regression to determine whether a phenotype of interest is associated with a somatic mutation. We can perform survival analysis with a Cox proportional hazards model to determine whether a somatic mutation is prognostic or predictive of treatment response. We can also compute a simple two-way correlation or pairwise correlations across an entire dataset.
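The difference between a two-way correlation and pairwise correlation across a dataset can be sketched in a few lines of Python. The example uses NumPy's `corrcoef` to compute Pearson correlations. The variables `gene_a`, `gene_b`, and `gene_c` are synthetic, hypothetical measurements.

```python
import numpy as np

# Synthetic data (assumption): 100 samples, three hypothetical features
rng = np.random.default_rng(2)
n_samples = 100
gene_a = rng.normal(size=n_samples)
gene_b = 0.8 * gene_a + 0.2 * rng.normal(size=n_samples)  # built to track gene_a
gene_c = rng.normal(size=n_samples)                       # independent of the others

# Two-way correlation: a single Pearson coefficient between two variables
two_way_r = np.corrcoef(gene_a, gene_b)[0, 1]

# Pairwise correlation: one coefficient for every pair of columns
data = np.column_stack([gene_a, gene_b, gene_c])
corr_matrix = np.corrcoef(data, rowvar=False)   # columns are variables

print("gene_a vs gene_b:", two_way_r)
print("Pairwise correlation matrix:\n", corr_matrix)
```

The pairwise result is a symmetric matrix with ones on the diagonal, and entry (i, j) is the correlation between columns i and j.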

Data Mining

There is a wealth of publicly available datasets that can help you learn more about your own data. We can mine these datasets, perform analyses, and integrate them with other datasets, including your own.

Common Analyses:

  • Cohort characterization: analyzing The Cancer Genome Atlas, GENIE, MSK-IMPACT, and others to compare gene expression or determine mutation frequency across different cancer types.
  • Cell line characterization: analyzing Cancer Cell Line Encyclopedia, Genomics of Drug Sensitivity in Cancer, Project Achilles, and other publicly available data. These projects provide numerous data types, including knock-out/knock-down screens, gene expression, mutation, drug sensitivity, and more. We can calculate associations, combine these data with your own, or build models.
  • Tissue characterization: using GTEx and Human Protein Atlas, we can determine which genes or proteins are expressed in different healthy human tissues.