FAQ

General FAQs

Correlation Engine applications ("apps") can be accessed from the top of every page within Correlation Engine. This area, called the app menu, allows you to query any app whenever you need it.

To query an app:

Go to the top of any page within Correlation Engine.
Enter your term in the query field (under the app menu).
Click the icon for the app you want to query.

If you're already viewing results for a query term, you can use that same query term to query a different app. Just click the icon for that app.

How do I query an app with a bioset?

This process works a little differently.

To query an app with a bioset:

Find a bioset of interest. (You can find biosets you've imported in the My Studies page, expand results from other apps, or browse the Curated Studies app page.)
Click the mini-QuickView icon next to the bioset's name.
Click the icon for the app you want to query.

You'll also see app icons in results pages: next to a bioset name, or in a menu when you move the cursor over an acceptable query term (e.g., the name of a gene). In these cases, just click the app icon to query it with that particular bioset or query term.

How do I query an app with a sequence region?

To query an app with a sequence region:

Click the "Search sequence regions" link next to the query button, under the app menu.
Choose an organism and a chromosome from the drop-down menus.
Enter the start and stop coordinates. (The region must be less than 10 million bases.)
Click the icon for the app you want to query.

If you want to query with a regular query term, click the "Go back to main search" link.

You can query QuickView, Curated Studies, Disease Atlas, Pharmaco Atlas, Knockdown Atlas, and Genome Browser with a sequence region. If you query Genome Browser with a sequence region, Genome Browser will launch in a new window, focused on that region.
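As a rough illustration of the size limit mentioned above, the following Python sketch checks whether a pair of coordinates would describe an acceptable region. The function names are hypothetical, and the sketch assumes inclusive, 1-based coordinates; Correlation Engine's own validation may count bases differently.

```python
# Limit stated in this FAQ: the region must be less than 10 million bases.
MAX_REGION_BASES = 10_000_000


def region_length(start: int, stop: int) -> int:
    """Number of bases in the region (assumption: inclusive, 1-based coordinates)."""
    return stop - start + 1


def is_valid_region(start: int, stop: int) -> bool:
    """Return True if the coordinates are ordered and the region is under the limit."""
    if start < 1 or stop < start:
        return False
    return region_length(start, stop) < MAX_REGION_BASES


print(is_valid_region(1_000_000, 5_000_000))  # a 4,000,001-base region
print(is_valid_region(1, 20_000_000))         # a 20,000,000-base region
```

Checking coordinates this way before you enter them can save a round trip when you are preparing many regions to query.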

How does Correlation Engine determine the content of query results?

Correlation Engine applications ("apps") use public and private experimental data as their sources, not PubMed or other scientific literature sources. This means that you may discover information that was absent or not well-supported in the original research literature.

For example: If you query Disease Atlas with a gene, the results will display diseases highly correlated with that gene. These results come from experimental studies in which a significant result was found for that gene.

Correlation Engine has scored and ranked all listed diseases using factors such as study tags, the gene's significance within a study, and the total number of studies for a disease in which that gene was measured.

What are Correlation Engine's data sources?

Correlation Engine uses a combination of public, private, and proprietary information.

Data correlations. The Correlation Engine library of genomic data comes from several public sources, including:

Correlation Engine's curation team also manually curates studies from published literature.

Gene/SNP identifiers. Correlation Engine recognizes commonly used public gene identifiers as well as specific vendor identifiers. Correlation Engine maps individual gene identifiers to standard reference identifiers using the following sources:

To enable seamless comparison across different species, Correlation Engine uses ortholog information from:

Ontologies for semantic tagging. Correlation Engine has developed standardized vocabularies with which to tag its biosets. Sources include:

Phenotypes: SNOMED® Clinical Terms
Tissues: MeSH
Compounds: MeSH and PubChem

How were the auto-complete terms selected?

Correlation Engine's auto-complete and tag cloud terms come from the following ontologies and indexes:

Genes and SNPs:
- Entrez Gene
- UniGene
- Ensembl
- RefSeq
- GenBank
- dbSNP
Phenotypes: SNOMED® Clinical Terms
Tissues: MeSH
Compounds:
- MeSH
- PubChem
Biogroups:
Organisms: MeSH
Authors: PubMed

How are Correlation Engine results scored and ranked?

Correlation Engine uses proprietary algorithms to calculate and rank the diseases most significantly correlated with a queried gene, SNP, sequence region, bioset, or biogroup.

First, we identify individual biosets that are significantly correlated with your query term. Based on the statistical significance of these correlations, we then rank all of the studies that contain correlated biosets.

For example, when ranking Disease Atlas results for a queried gene, we consider the following:

The total number of disease-specific studies in which the gene was measured;
The number of disease-specific studies in which the gene was found to be significant;
The ranks of the gene in each of the disease studies;
The consistency of the gene's association across the disease studies.

Depending on which app you've queried, we group correlated studies together based on our gene indexes, standardized vocabularies, and semantic tags (e.g., Disease Atlas results are grouped by disease). We call this process "categorization".

During categorization, we apply additional statistical criteria, such as correction for multiple hypothesis testing. Then we rank the diseases by statistical significance. We assign a numerical score of 100 to the most significant result, and normalize the other results' scores to the top-ranked result.
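The last step, scoring relative to the top-ranked result, can be pictured with a small Python sketch. This is only an illustration: the ranking algorithms are proprietary, and the linear scaling, the function name, and the example values below are all assumptions rather than Correlation Engine's actual formula.

```python
def normalize_scores(raw_scores: dict) -> dict:
    """Scale raw significance values so the largest becomes 100.

    Assumption: a simple linear rescaling. The real scoring may use a
    different transformation.
    """
    if not raw_scores:
        return {}
    top = max(raw_scores.values())
    if top <= 0:
        raise ValueError("expected at least one positive raw score")
    return {name: 100.0 * value / top for name, value in raw_scores.items()}


# Hypothetical raw significance values for three diseases.
raw = {"disease A": 8.0, "disease B": 4.0, "disease C": 1.0}
print(normalize_scores(raw))
```

Under this kind of scaling, a score only tells you how a result compares with the top-ranked result for the same query. Scores from different queries are not directly comparable.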

For a detailed description of Correlation Engine's methods, please see our paper, "Ontology-Based Meta-Analysis of Global Collections of High-Throughput Public Data" (Kupershmidt et al., 2010) in PLoS ONE.

Is a low-scoring result nonsignificant?

No. Correlation Engine scores all results relative to the top-ranked result, whose score is set to 100. Although a low-scoring result might have less statistical significance compared to the top-ranked result, it could still have real biological relevance.

What do the Venn diagrams in my expanded results mean?

The Venn diagram shows you the following information:

How many genes are in your queried bioset or biogroup;
How many genes are in the correlated bioset or biogroup;
How many genes are in both.

The p-value is calculated as the probability that such an overlap would occur by chance (assuming that your bioset or biogroup is not actually correlated with the target bioset or biogroup).
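One common way to compute a chance-overlap probability of this kind is a hypergeometric test. The Python sketch below uses that test purely as an illustration. It is an assumption, not a description of the statistic Correlation Engine applies, and the background gene count (`universe`) is a hypothetical parameter.

```python
from math import comb


def overlap_p_value(universe: int, query_size: int, target_size: int, overlap: int) -> float:
    """Probability of seeing at least `overlap` shared genes by chance.

    Models drawing `query_size` genes at random from `universe` genes,
    of which `target_size` belong to the target set (hypergeometric tail).
    """
    max_overlap = min(query_size, target_size)
    total = comb(universe, query_size)
    tail = sum(
        comb(target_size, k) * comb(universe - target_size, query_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical numbers: 20,000 background genes, two sets of 200 and 300 genes
# that share 25 genes.
print(overlap_p_value(20_000, 200, 300, 25))
```

The smaller the value, the less likely it is that an overlap of that size would arise if the two sets were unrelated.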

How can I download a list of the studies shown in my query results?

To download a list of studies:

From your results page, click the Correlated Studies tab.
Click the Export Results button.

The Export Results button is available throughout the Correlation Engine website for paid subscribers. This button will export all appropriate details for the results page you are viewing.

What do the abbreviations in the Supporting Data Types column mean?

Here are the definitions for the Supporting Data Types:

CN: DNA Copy Number
DT: Therapeutic
GM: Germline Mutation
GT: SNP GWAS
HA: Histone Acetylation
HM: Histone Methylation
HU: Histone Ubiquitination
ME: DNA Methylation
MI: miRNA Expression
MU: Mutations/Phenotypic
PD: Protein-DNA Binding
RE: RNA Expression
SM: Somatic Mutation
AT: ATAC-Seq

If you're on a results page, you can also move the mouse cursor over each abbreviation to see its definition.

QuickView FAQs

What is QuickView?

QuickView is a Correlation Engine application ("app") that you can query to get a quick, top-level view of all the data and information Correlation Engine has about a particular gene, SNP, sequence region, biogroup, bioset, phenotype, tissue, or compound.

There are several ways to query QuickView:

By default. Go to any non-app page (e.g., Home, My Studies, My Projects), and enter a term into the query field at the top of the page. Then press the Enter key on your keyboard.
By clicking its icon in the app menu. Go to the top of any app page and enter a term into the query field. Then click the QuickView icon.
By clicking its icon in expanded results. Click the mini-QuickView icon next to any query term listed in a results page. Your action will query QuickView with that term.

What can my QuickView results tell me?

QuickView has two tabs that organize your results:

Correlation Engine Summary. This tab lists the information that is available only through Correlation Engine:
- The top five data correlations between your query term and each relevant Correlation Engine app;
- The top five correlated studies from Curated Studies;
- The top five hits from Literature;
- The top five hits from Clinical Trials.
Click any result name to view its expanded result within that app. To see the full query results for that app, you can click an app name, an app icon, or the Explore Results link.
General Info. This tab displays general biomedical knowledge for your query term. This allows you to view a vast range of public information and Correlation Engine data correlations together in one place.

What are the sources for the General Info tab in QuickView?

QuickView compiles information from PubMed, Gene Ontology, dbSNP, PubChem, MSigDB and other public data sources.

Curated Studies FAQs

What is Curated Studies?

Curated Studies is a Correlation Engine application ("app") that lets you browse or query all datasets that you or Correlation Engine have imported.

To query Curated Studies:

Go to the top of any page within Correlation Engine.
Enter your term in the query field (under the app menu).
Click the Curated Studies icon above the query field.

How do I query Curated Studies with a bioset?

This process works a little differently.

To query Curated Studies with a bioset:

Find a bioset of interest. (You can find biosets you've imported in the My Studies page, expand results from other apps, or browse the Curated Studies app page.)
Click the mini-QuickView icon next to the bioset's name.
Click the Curated Studies icon above the query field.

What kinds of query terms can I use with Curated Studies?

You can query Curated Studies with just about any query term or keyword. Results will depend on what type of query term you enter.

For genes, SNPs, sequence regions, biogroups, or biosets: Your results will display as a ranked list of studies in which your query term was found to be significantly correlated.
For concepts (phenotypes, tissues, or compounds): Your results will display as a list of studies that mention, or are tagged with, your query term.

You can also query Curated Studies with keywords that don't belong to the above types of query terms. In this case, Curated Studies will show studies resulting from a text-based search with your query term.

Finally, you can browse studies without using a query term as a starting point from within the app. To do this, go to the Curated Studies home page and click the All Curated Studies button. You'll then see a list of all curated studies that you can filter and browse.

How does Curated Studies differ from other apps?

Results from Curated Studies are only lists of individual studies correlated with or referencing your query term. They are not categorized or grouped in any way. By contrast, Disease Atlas groups studies according to disease, and Genetic Markers groups studies according to gene or SNP.

How do I go back to All Available Studies from a Curated Studies query result?

Click the "All Curated Studies" button.

What can I do with Curated Studies?

You can use Curated Studies to directly inspect public genomic data. This can be useful if you're interested in all studies (general, or correlated to a query term) that have been performed on a specific data type, experimental design, or species.

Because other Correlation Engine apps rank results by statistical significance to your query term, you can also use Curated Studies to look up negative results. For example, you can find out whether a gene of interest was not significant in a particular kind of experimental study.

How do I see the statistics for a specific bioset?

To see the statistics (e.g., p-value, fold change, copy number change, score) for a specific bioset:

Click a study title. This will show the bioset(s) within the study that correlate to your query term.
Click a bioset title. The column types displayed will depend on the data type. For example, a public RNA expression bioset will contain fold change and p-value, while a copy number bioset will show either copy number change or Z-score.
On the bioset view, click the scatter plot option to visualize bioset data as a two-dimensional plot of genes based on the available statistical data. You can use the drop-down menus to assign statistics to the X and Y axes and generate a new plot. When a statistic is recognized as a p-value, q-value, or similar, Correlation Engine also computes the negative log10 of that statistic and offers it as an option.
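The negative log10 transform mentioned in the last step is easy to reproduce if you export bioset statistics and want to plot them yourself. The following is a minimal Python sketch. The function name and the `floor` parameter (used so that a p-value of 0 does not cause an error) are assumptions, not part of Correlation Engine.

```python
import math


def neg_log10(p_value: float, floor: float = 1e-300) -> float:
    """Return -log10(p), clamping p to a small floor so that p = 0 does not fail."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p-value must be between 0 and 1")
    return -math.log10(max(p_value, floor))


# Smaller p-values map to larger plotted values.
for p in (0.05, 0.001, 1e-8):
    print(p, neg_log10(p))
```

Plotting the transformed value puts the most significant genes at the top or far end of an axis, which is why it is often paired with fold change in a scatter plot.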

How do I see the details of an individual study?

To see the detailed information for a study, click "Full Study Details" under the entry on the Curated Studies page. The study details tab provides background information on the study from the original source. The biosets tab shows the list of biosets in the study, the tags associated with those biosets, and supplementary files that you can download. The related genes and biogroups table shows a meta-analysis of all the gene signatures in the study and a meta-analysis of all biosets in the study with pathway enrichment. The images tab displays available associated plots and metrics for the study.

Body Atlas FAQs

What is Body Atlas?

Body Atlas is a unique tool that you can use to find normalized gene expression across all tissues, cell types, cell lines, and stem cells in the Correlation Engine library.

A Body Atlas query for a gene, bioset or biogroup will produce a list of tissues and cell types ranked by relevance. You can also sort your results by absolute gene expression or body system, or across all body systems.

A tissue or cell line (biosource) query will result in a list of genes ranked by expression levels in the queried tissue or cell line. You can also view genes ranked by tissue-specific expression, or by cell line-specific copy number variations and mutations.

Use Body Atlas to:

Identify where previously uncharacterized genes are expressed. Use Body Atlas to look for high- or low-expressing cell types and cell lines that can serve as useful model systems for your queried gene.
Identify gene expression patterns. Use Body Atlas to look for tissue- or cell line-specific gene expression levels that can serve as genetic markers for your experiments.
Find biogroup information. Biogroup results are assigned a p-value based on the overlap between the biogroup and each tissue or cell type. A directional arrow indicates whether overlapping genes are predominantly up- or down-regulated.
Find bioset information. Bioset results are designated as positively or negatively correlated with a tissue or cell type. (This is because biosets contain directional data.)

To query Body Atlas:

Go to the top of any page within Correlation Engine.
Enter your term in the query field (just below the app menu).
Click the Body Atlas icon above the query field.

How do I query Body Atlas with a bioset?

This process works a little differently.

To query Body Atlas with a bioset:

Find a bioset of interest. (You can find biosets you've imported in the My Studies page, expand results from other apps, or browse the Curated Studies app page.)
Click the mini-QuickView icon next to the bioset's name.
Click the Body Atlas icon above the query field.

What tissues and cell types are covered in Body Atlas?

The Body Atlas biosets have been drawn from all RNA expression studies that have used the Affymetrix GeneChip© Human Genome U133 Plus 2.0 Array for human studies, and the Affymetrix GeneChip© Mouse Genome 430A 2.0 Array for mouse studies.

We incorporated our data as follows:

128 human tissues from 1,068 arrays
170 human cell types from 1,125 arrays
748 human cell lines from 881 arrays
52 stem cells from 141 arrays
151 mouse tissues from 2,730 arrays
409 mouse cell types from 1,585 arrays

How has Correlation Engine normalized gene expression across tissues and cells?

First, we perform a per-chip median normalization on probesets common to all platforms. We then combine these probesets using quantile normalization.

Intensities for probesets unique to particular platforms are rescaled to the same per-chip median; we then fit them by linear interpolation, using the intensities of the common probesets between platforms as a reference.
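The first two steps (per-chip median normalization, then quantile normalization of the common probesets) can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the function names are hypothetical, ties are handled naively, and the interpolation step for platform-specific probesets is not covered. It is not Correlation Engine's production pipeline.

```python
from statistics import median


def median_normalize(chip: list, target: float = 1.0) -> list:
    """Rescale one chip's intensities so that their median equals `target`."""
    chip_median = median(chip)
    return [value * target / chip_median for value in chip]


def quantile_normalize(chips: list) -> list:
    """Give every chip the same intensity distribution.

    `chips` is a list of equal-length lists, one per chip, with probesets in
    the same order. Each value is replaced by the mean, across chips, of the
    values that share its rank.
    """
    n = len(chips[0])
    sorted_chips = [sorted(chip) for chip in chips]
    rank_means = [sum(sc[i] for sc in sorted_chips) / len(chips) for i in range(n)]
    result = []
    for chip in chips:
        order = sorted(range(n), key=lambda i: chip[i])  # probeset indices by rank
        normalized = [0.0] * n
        for rank, index in enumerate(order):
            normalized[index] = rank_means[rank]
        result.append(normalized)
    return result


# Hypothetical intensities for three common probesets on two chips.
chips = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
scaled = [median_normalize(chip) for chip in chips]
print(quantile_normalize(scaled))
```

After quantile normalization, every chip contains the same set of values, assigned according to each probeset's rank on that chip, which is what makes intensities comparable across tissues and cells.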

How does Correlation Engine calculate its scores for biogroup or bioset query results?

The score for a given tissue or cell represents the magnitude of the correlation score between a queried bioset or biogroup, and the gene expression bioset for that tissue or cell.

When you query Body Atlas with a biogroup (a nondirectional set of genes), results include a direction column that indicates the sign of the correlation score.

When you query Body Atlas with a bioset that contains directional information (e.g., gene expression fold change for a condition of interest), the results include a correlation column. This column indicates whether correlation was positive or negative.
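The relationship between the displayed score and the direction or correlation column can be pictured as splitting a signed correlation value into its magnitude and its sign. The Python sketch below is illustrative only. The function name and the labels it returns are assumptions.

```python
def split_correlation(correlation_score: float) -> tuple:
    """Split a signed correlation value into (magnitude, direction label)."""
    magnitude = abs(correlation_score)
    if correlation_score > 0:
        direction = "positive"
    elif correlation_score < 0:
        direction = "negative"
    else:
        direction = "none"
    return magnitude, direction


# Hypothetical signed correlation values for two tissues.
print(split_correlation(-42.5))
print(split_correlation(17.0))
```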

What are the different ways to view Body Atlas results?

You can view Body Atlas content in two ways. Gene, biogroup or bioset queries display a list of tissues, cell types, cell lines or stem cells related to the query term. Tissue or cell line queries display a list of gene expression levels corresponding to the query term.

Gene, biogroup or bioset queries in Body Atlas will display tissues grouped by body system as the default view. Click the corresponding tab to view cell types, cell lines, and stem cells. Clicking the name of a body system will jump to that group's results.

To rank Body Atlas results strictly by degree of expression or correlation—without categorizing them into groups—choose "Ranks" from the "View by:" menu. Clicking the name of a body system will highlight tissues and cell types that belong to that body system.

Tissue or cell line queries will display a list of expression levels of all genes in the queried biosource or tissue. Click the corresponding tab to view tissue-specific gene expression or a complete list of somatic mutations and copy number changes in a particular cell line.

What is Body Atlas RNA-seq based (GTEx)?

The Body Atlas biosets have been drawn from RNA-seq expression studies taken from the Genotype-Tissue Expression project (GTEx).

The GTEx project is a publicly funded project that aims to provide a comprehensive atlas of gene expression and regulation across multiple human tissues. Additional information is available on the GTEx project website.

RNA samples used in GTEx were extracted from normal human tissues, poly-A selected, and sequenced using Illumina 74bp paired-end technology.

For Correlation Engine Body Atlas, we downloaded the raw read data, subjected it to stringent quality controls, and processed it using the RNA Express 1.0 pipeline. A subset of 505 high-confidence samples was used.

The RNA Express pipeline was developed at Illumina and is available on BaseSpace.

Tissue-specific gene ranks were derived from differential expression p-values (tissue of interest vs. all tissues). P-values were calculated using the edgeR package.

The Genotype-Tissue Expression (GTEx) Project was supported by the Common Fund of the Office of the Director of the National Institutes of Health (commonfund.nih.gov/GTEx). Additional funds were provided by the NCI, NHGRI, NHLBI, NIDA, NIMH, and NINDS. Donors were enrolled at Biospecimen Source Sites funded by NCI\Leidos Biomedical Research, Inc. subcontracts to the National Disease Research Interchange (10XS170), Roswell Park Cancer Institute (10XS171), and Science Care, Inc. (X10S172). The Laboratory, Data Analysis, and Coordinating Center (LDACC) was funded through a contract (HHSN268201000029C) to The Broad Institute, Inc. Biorepository operations were funded through a Leidos Biomedical Research, Inc. subcontract to Van Andel Research Institute (10ST1035). Additional data repository and project management were provided by Leidos Biomedical Research, Inc. (HHSN261200800001E). The Brain Bank was supported by supplements to University of Miami grant DA006227. Statistical Methods development grants were made to the University of Geneva (MH090941 & MH101814), the University of Chicago (MH090951, MH090937, MH101825, & MH101820), the University of North Carolina - Chapel Hill (MH090936), North Carolina State University (MH101819), Harvard University (MH090948), Stanford University (MH101782), Washington University (MH101810), and to the University of Pennsylvania (MH101822). The datasets used for the analyses described in this manuscript were obtained from dbGaP at http://www.ncbi.nlm.nih.gov/gap through dbGaP accession number phs000424.v4.p1.

What tissues and cell types are covered in Body Atlas RNA-seq based (GTEx)?

We incorporated our data as follows:

50 human tissues from 505 RNA-seq samples are represented.

What do the Venn diagrams in expanded Body Atlas results mean?

The Venn diagram shows you the following information:

How many genes are in your queried biogroup or bioset;
How many genes are differentially expressed in the correlated tissue or cell type (which is treated as a bioset);
How many genes are in both.

The p-value is calculated as the probability that such an overlap would occur by chance under the assumption that your bioset is not actually correlated with the tissue or cell type.

How do I use the Body System locator?

When viewing by category, click the name of a body system to jump to that category in the results.

When viewing by rank, click the name of a body system to highlight all tissues or cell types belonging to that category in the results.

How can I export Body Atlas results?

To export Body Atlas results as a .csv file:

Go to your Body Atlas results page.
Click the Export button on the right side of the page.

Genetic Markers FAQs

What is Genetic Markers?

Genetic Markers is a Correlation Engine application ("app") that finds genes and SNPs significantly correlated to a phenotype or compound. Results are ranked in order of statistical significance.

Correlation Engine determines a marker's significance through a meta-analysis that takes into account the marker's rank across all studies tagged with the queried phenotype or compound.

To query Genetic Markers:

Go to the top of any page within Correlation Engine.
Enter your term in the query field (under the app menu).
Click the Genetic Markers icon above the query field.

What kinds of query terms can I use with Genetic Markers?

You can query Genetic Markers with a phenotype or compound. The first tab of the results page (Correlated Genes) will show a list of genes correlated with your query term, ranked by statistical significance. Click the Correlated SNPs tab to view correlated SNPs.

To see a list of studies that support the correlations, click a gene or SNP name.

What types of data are used to rank Genetic Markers results?

Correlation Engine integrates multiple types of genomic data to rank the significance of genes and SNPs tagged with a given phenotype or compound. These data types may include RNA and miRNA expression, SNPs identified through GWAS, epigenetic data, CNVs, and mutation data. In addition, Genetic Markers uses curated data from OMIM, Jackson Labs, and DrugBank.

I queried Genetic Markers with a gene or SNP, but got an error message. How do I find all of the studies or biosets related to a gene or SNP?

To find all of the curated studies that are correlated with a gene or SNP, query Curated Studies. (Genetic Markers only accepts phenotypes or compounds as query terms.)

Querying Curated Studies will return a ranked list of all studies that contain your gene or SNP as a significant result.

I don't see a particular gene or SNP listed in my Genetic Markers results. Does this mean that the gene or SNP is not associated with my query term?

Possibly. A gene or SNP may be absent from results because either a) Correlation Engine has no studies tagged with your query term, or b) studies tagged with your query term are not significantly correlated with the gene or SNP you're interested in.

To find data for a gene or SNP not listed in Genetic Markers results, browse Curated Studies:

Click the Curated Studies icon in the app menu.
Click the All Curated Studies button.
Click the Keyword link in the filter bar.
Enter the name of the gene or SNP. (You can also select the name from the auto-complete menu.)
Click the Apply Filter button.

How do I see the statistics for a specific gene or SNP listed in Genetic Markers results?

To see the statistics (e.g., p-value, fold change, copy number change, score) for a specific gene/SNP:

Click the Correlated Genes or Correlated SNPs tab.
Click a gene or SNP name to see a breakdown of supporting studies by organism and data type.
Click the View Individual Studies button to see all studies tagged with your query term in which the specific gene/SNP was found to be significant.

To expand each study:

Click a study name.
This shows the bioset(s) within the study that correlate to your query term.

To see statistics and scores from a bioset:

Click a bioset name.
The column types displayed will depend on the data type. For example, a public RNA expression bioset will contain fold change and p-value, while a copy number bioset will show either copy number change or Z-score.

Why is the number of studies shown in Correlated Genes or Correlated SNPs different from the number shown in the Correlated Studies tab?

There are two possibilities. The Correlated Studies view shows all of the public or private studies that contain a significant correlation with your query term. The Correlated Genes and Correlated SNPs views, on the other hand, group these studies by category (in this case, by gene or by SNP).

Studies that appear in the Correlated Studies view might be excluded from the Correlated Genes or Correlated SNPs view if they've failed—when grouped—to meet additional scoring and ranking significance criteria (e.g., correction for multiple hypothesis testing).

Differences between study counts can also occur if private studies—studies you imported or those that others have imported and shared with you—contain significant correlations with your query term. Private studies with correlations are shown in the Correlated Studies view, but are not included in the categorization process.

(If you are a Correlation Engine Enterprise user, studies that belong to Enterprise projects are included in categorized results. However, studies that you have created or imported on your own are not.)

Disease Atlas FAQs

What is Disease Atlas?

Disease Atlas is a Correlation Engine application ("app") that finds diseases, traits, conditions, and surrogate endpoints associated with a gene, sequence region, SNP, biogroup, or bioset. Results are grouped by disease and ranked according to statistical significance.

Disease Atlas categories only include the subset of phenotypes that have been specifically tagged "disease". So while you can query other apps with the phenotype "aging", you won't find "aging" among the categories listed in Disease Atlas results.

To query Disease Atlas:

Go to the top of any page within Correlation Engine.
Enter your term in the query field (under the app menu).
Click the Disease Atlas icon above the query field.

How do I query Disease Atlas with a bioset?

This process works a little differently.

To query Disease Atlas with a bioset:

Find a bioset of interest. (You can find biosets you've imported in the My Studies page, expand results from other apps, or browse the Curated Studies app page.)
Click the mini-QuickView icon next to the bioset's name.
Click the Disease Atlas icon above the query field.

What kinds of query terms can I use with Disease Atlas?

You can query Disease Atlas with a gene, sequence region, SNP, biogroup, or bioset. The Correlated Diseases tab will show a list of diseases correlated with your query term, grouped into broad disease categories. Clicking the name of a disease category will jump to the results for that category.

To display diseases ranked by statistical significance, choose "Ranks" from the "View by:" menu. Clicking the name of a disease category will highlight results for that category.

To see a list of studies that support the correlations, click a disease name.

I queried Disease Atlas with a disease but got an error message. How do I find all of the studies or biosets related to a disease?

To find all of the curated studies that are correlated with a disease or other phenotype, query Curated Studies. (Disease Atlas only accepts genes, sequence regions, SNPs, biogroups, or biosets as query terms.)

Querying Curated Studies will return a ranked list of all studies that are tagged with your disease or other phenotype.

I don't see a particular disease listed in my Disease Atlas results. Does this mean that the disease is not associated with my query term?

Possibly. A disease may be absent from results because either a) Correlation Engine has no studies tagged with the disease, or b) studies tagged with the disease are not significantly correlated with your query term.

To find data for a disease not listed in Disease Atlas results, browse Curated Studies:

Click the Curated Studies icon in the app menu.
Click the All Curated Studies button.
Click the Keyword link in the filter bar.
Enter the name of the disease. (You can also select the name from the auto-complete menu.)
Click the Apply Filter button.

How do I see statistics for a specific disease in my results?

To see the statistics (e.g., p-value, fold change, copy number change, score) for a specific disease:

Click the Correlated Diseases tab.
Click a disease to see a breakdown of supporting studies by organism and data type.
Click the View Individual Studies button to see all studies tagged with the specific disease that have a significant correlation to your query term.

To expand each study:

Click a study name.
This shows the bioset(s) within the study that correlate to your query term.

To see statistics and scores from the expanded bioset:

Click a bioset name.
The column types displayed will depend on the data type. For example, a public RNA expression bioset will contain fold change and p-value, while a copy number bioset will show either copy number change or Z-score.

How do I use the Disease Category locator?

When viewing by category, click the name of a disease category to jump to that category in the results.

When viewing by rank, click the name of a disease category to highlight all diseases belonging to that category in the results.

Why is the number of studies shown in Correlated Diseases different from the number shown in the Studies For... tab?

There are two possibilities. Both tabs show all of the public and private studies tagged with a disease that contain a significant correlation with your query term.

The Correlated Diseases view, however, shows your results categorized by disease.

Studies that appear in the Studies For... tab might be excluded from the Correlated Diseases tab if they've failed—when grouped by disease—to meet additional scoring and ranking significance criteria (e.g., correction for multiple hypothesis testing).

Differences between study counts can also occur if private studies—studies you imported or those that others have imported and shared with you—contain significant correlations with your query term. Private studies with correlations are shown in the Studies For... tab, but are not included in the categorization process.

(If you are a Correlation Engine Enterprise user, studies that belong to Enterprise projects are included in categorized results. However, studies you have created or imported on your own are not.)

Pharmaco Atlas FAQs

What is Pharmaco Atlas?

Pharmaco Atlas is a Correlation Engine application ("app") that finds compounds and treatments significantly correlated to a gene, sequence region, biogroup, or bioset. Results are ranked in order of statistical significance.

To use Pharmaco Atlas:

Go to the top of any page within Correlation Engine.
Enter your term in the query field (under the app menu).
Click the Pharmaco Atlas icon above the query field.

How do I query Pharmaco Atlas with a bioset?

This process works a little differently.

To query Pharmaco Atlas with a bioset:

Find a bioset of interest. (You can find biosets you've imported in the My Studies page, expand results from other apps, or browse the Curated Studies app page.)
Click the mini-QuickView icon next to the bioset's name.
Click the Pharmaco Atlas icon above the query field.

What kinds of query terms can I use with Pharmaco Atlas?

You can query Pharmaco Atlas with a gene, sequence region, biogroup, or bioset. The Correlated Compounds tab (the first tab of the results page) will show a list of compounds and treatments that are correlated with your query term, ranked by statistical significance.

To see a list of studies that support the correlations, click a compound name.

I queried Pharmaco Atlas with a compound, but got an error message. How do I find all of the studies or biosets related to a compound?

To find all of the curated studies that are correlated with a compound, query Curated Studies. (Pharmaco Atlas only accepts genes, sequence regions, biogroups, or biosets as query terms.)

Querying Curated Studies will return a ranked list of all studies that are tagged with your compound.

I don't see a particular compound listed in my Pharmaco Atlas results. Does this mean that the compound is not associated with my query term?

Possibly. A compound may be absent from results because either a) Correlation Engine has no studies tagged with the compound, or b) studies tagged with the compound are not significantly correlated with your query term.

To find data for a compound not listed in Pharmaco Atlas results, browse Curated Studies:

Click the Curated Studies icon in the app menu.
Click the All Curated Studies button.
Click the Keyword link in the filter bar.
Enter the name of the compound. (You can also select the name from the auto-complete menu.)
Click the Apply Filter button.

How do I see statistics for a specific compound in my Pharmaco Atlas results?

To see the statistics (e.g., p-value, fold change, copy number change, score) for a specific compound:

Click the Correlated Compounds tab.
Click a compound name to see a breakdown of supporting studies by organism and data type.
Click the View Individual Studies button to see all studies that have a significant correlation to your query term, and which have also been tagged with the specific compound.

To expand a study for more detail:

Click a study name.
This shows the bioset(s) within the study that correlate to your query term.

To see statistics and scores from a bioset:

Click a bioset name.
The column types displayed will depend on the data type. For example, a public RNA expression bioset will contain fold change and p-value, while a copy number bioset will show either copy number change or Z-score.

How do I use the Compound Category locator?

When viewing by category, click the name of a compound category to jump to that category in the results.

When viewing by rank, click the name of a compound category to highlight all compounds belonging to that category in the results.

Why is the number of studies shown in the Correlated Compounds tab different from the number shown in the Studies For... tab?

There are two possibilities. Both tabs show all of the public and private studies tagged with a compound that contain a significant correlation with your query term.

The Correlated Compounds tab, however, shows you results categorized by compound and treatment.

Studies that appear in the Studies For... tab might be excluded from the Correlated Compounds tab if they've failed—when grouped by compound—to meet additional scoring and ranking significance criteria (e.g., correction for multiple hypothesis testing).

Differences between study counts can also occur if private studies—studies you imported, or those that others have imported and shared with you—contain significant correlations with your query term. Private studies with correlations are shown in the Studies For... tab, but are not included in the categorization process.

Knockdown Atlas FAQs

What is Knockdown Atlas?

Knockdown Atlas is a Correlation Engine application ("app") that finds genes whose perturbation affects your query term. Querying Knockdown Atlas is like performing a knockdown, knockout, or overexpression experiment in reverse: You can see which genetic perturbations affect a gene, and how.

To query Knockdown Atlas:

Go to the top of any page within Correlation Engine.
Enter your term in the query field (under the app menu).
Click the Knockdown Atlas icon above the query field.

How do I query Knockdown Atlas with a bioset?

This process works a little differently.

To query Knockdown Atlas with a bioset:

Find a bioset of interest. (You can find biosets you've imported in the My Studies page, expand results from other apps, or browse the Curated Studies app page.)
Click the mini-QuickView icon next to the bioset's name.
Click the Knockdown Atlas icon above the query field.

What kinds of query terms can I use with Knockdown Atlas?

You can query Knockdown Atlas with a gene, sequence region, biogroup, or bioset. The Perturbed Genes tab (the first tab of the results page) will show a list of genes whose perturbation is correlated with your query term, ranked by statistical significance.

To see a list of studies that support the correlations, click the name of a perturbed gene.

Why is the app called Knockdown Atlas if it also contains results from overexpression studies?

Knockdown Atlas shows results from any genetic perturbation experiment, including knockout, gene silencing, and overexpression experiments. However, knockdowns and knockouts are the predominant type of experiment covered.

How do I find all the genes affected by perturbing a specific gene?

To get a list of studies in which a specific gene is perturbed, query Curated Studies.

To query Curated Studies for a specific perturbed gene:

Go to the top of the page and clear any query terms from the query field.
Click the Curated Studies icon.
Click the All Curated Studies button.
Click the Keyword link in the filter bar.
Enter the name of the perturbed gene. (You can also select a name from the auto-complete menu.)
Click the Apply Filter button.
Click the Advanced link.
In the Experiment Design menu, check the Mutant vs. wildtype box.
Click the Apply Filter button.

(Note: Currently, Correlation Engine does not group these studies by gene.)

I don't see a particular gene listed in my Knockdown Atlas results. Does this mean that perturbation of the gene does not affect my queried gene?

Possibly. A perturbed gene may be absent from results because either a) Correlation Engine has no studies in which the gene was perturbed, or b) studies in which the gene was perturbed are not significantly correlated with your query term.

To find data for a perturbed gene not listed in Knockdown Atlas results, browse Curated Studies:

Click the Curated Studies icon in the app menu.
Click the All Curated Studies button.
Click the Keyword link in the filter bar.
Enter the name of the perturbed gene. (You can also select the name from the auto-complete menu.)
Click the Apply Filter button.
Click the Advanced link.
In the Experiment Design menu, check the Genetic Perturbation box.
Click the Apply Filter button.

How do I see the statistics for a specific genetic perturbation listed in Knockdown Atlas results?

To see the statistics (e.g., p-value, fold change, copy number change, score) for a specific genetic perturbation:

Click the Perturbed Genes tab.
Click a gene name to see a breakdown of supporting studies by organism and data type.
Click the View Individual Studies button to see all studies in which the specific genetic perturbation was found to significantly affect your query term.

To expand each study for more details:

Click a study name.
This shows the bioset(s) within the study that correlate to your query term.

To see statistics and scores from a bioset:

Click a bioset name.
The column types displayed will depend on the data type. For example, a public RNA expression bioset will contain fold change and p-value, while a copy number bioset will show either copy number change or Z-score. The direction of the arrow shows the effect on the query term (i.e., an up arrow for up-regulated, a down arrow for down-regulated).

Why is the number of studies shown in the Perturbed Genes tab different from the number shown in the Studies For... tab?

There are two possibilities. Both tabs show all of the public and private studies in which a genetic perturbation significantly affected your query term.

The Perturbed Genes tab, however, shows your results categorized by perturbed gene.

Studies that appear in the Studies For... tab might be excluded from the Perturbed Genes tab if they've failed—when grouped by perturbed gene—to meet additional scoring and ranking significance criteria (e.g., correction for multiple hypothesis testing).

Differences between study counts can also occur if private studies—studies you imported or those that others have imported and shared with you—contain significant correlations with your query term. Private studies with correlations are shown in the Studies For... tab, but are not included in the categorization process.

Biogroups FAQs

What is Biogroups?

Biogroups is a Correlation Engine application (“app”) that shows biogroups for which your queried bioset, phenotype or compound is highly enriched. When you query Biogroups with a bioset, you'll receive a ranked list of biogroups that highly overlap with the bioset.

When you query Biogroups with a phenotype or compound, you'll receive a ranked list of biogroups that highly overlap with biosets tagged with your query term.

To query Biogroups with a phenotype or compound:

Go to the top of any page within Correlation Engine.
Enter your query in the query field (under the app menu).
Click the Pathway Enrichment icon above the query field.

How do I query Biogroups with a bioset?

This process works a little differently.

To query Biogroups with a bioset:

Find a bioset of interest. (You can find biosets you've imported in the My Studies page, expand results from other apps, or browse the Curated Studies app page.)
Click the mini-QuickView icon next to the bioset's name.
Click the Pathway Enrichment icon above the query field.

What does "biogroup" mean?

A biogroup is a collection of genes that are associated with a specific biological function, pathway, or similar criteria. No numerical information is directly associated with a biogroup.

Gene lists represented as biogroups in Correlation Engine come from the following sources:

Gene Ontology (biological processes, cellular components, molecular functions)
MSigDB (canonical pathways, positional gene sets, regulatory motif gene sets)
InterPro (protein families)
TargetScan (predicted miRNA targets)

How do I see the statistics for a specific biogroup listed in my Biogroups results?

To see the statistics for a specific biogroup:

Click a biogroup name to see a breakdown of supporting studies by organism and data type.
Click the View Individual Studies button to see studies containing biosets that highly overlap with the specific biogroup, and which have been tagged with your queried phenotype or compound.

To expand each study:

Click a study name.
This shows the bioset(s) within the study that overlap with the biogroup.

To see statistics and scores for a bioset-biogroup correlation:

Click a bioset name.
The Venn diagram shows statistics describing the bioset-biogroup overlap. Use the drop-down menu to include all genes, only up-regulated genes, or only down-regulated genes in the comparison.
The column types displayed in the feature list below the Venn diagram will depend on the data type. For example, a public RNA expression bioset will contain fold change and p-value, while a copy number bioset will show either copy number change or Z-score.

I queried Biogroups with a biogroup, but got an error message. How do I find a list of the genes within a pathway, molecular function, protein family, or biological process?

To find all of the genes contained in a biogroup, query QuickView and click General Info on the results page. (Biogroups only accepts phenotypes, compounds, and biosets as query terms.)

How do I find out which biogroups a gene belongs to?

To find out which biogroups a gene belongs to, query QuickView.

To query QuickView:

Go to the top of the page and enter the name of a gene in the query field.
Click the QuickView icon.
On your results page, click the General Info tab.
View the Transcription Factor Binding Sites biogroups at the top of the page.
Scroll down to the bottom of the page to view the Gene Ontology, Pathways, and Protein Family biogroups.

When you see the QuickView icon next to a gene name, you can also click the icon to query QuickView with that gene.

What do the Venn diagrams in expanded Biogroups results mean?

If you queried with a bioset, the Venn diagram shows you the following information:

How many genes are in your queried bioset;
How many genes are in the correlated biogroup;
How many genes are in both.

If you queried with a phenotype or compound, the diagram shows the overlap between the correlated biogroup and a bioset from a study tagged with your query term.

The p-value is calculated as the probability that such an overlap would occur by chance under the assumption that there is no biological link between your bioset and the biogroup.
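
As a rough illustration of the kind of overlap p-value described above, the sketch below computes a one-sided hypergeometric tail probability: the chance of seeing at least the observed number of shared genes if the bioset were drawn at random from a background of genes. This is a generic textbook test, not Correlation Engine's own algorithm (which is described elsewhere in this FAQ as proprietary and rank-based). The function name, the notion of a fixed background size, and the example counts are all assumptions made for illustration.

```python
from math import comb


def overlap_p_value(background: int, bioset_size: int,
                    biogroup_size: int, overlap: int) -> float:
    """One-sided hypergeometric p-value: P(shared genes >= overlap).

    Illustrative only; `background` (the number of genes that could have
    been measured) is an assumed input, not a documented parameter.
    """
    if max(bioset_size, biogroup_size) > background:
        raise ValueError("gene sets cannot be larger than the background")
    largest_possible = min(bioset_size, biogroup_size)
    if not 0 <= overlap <= largest_possible:
        raise ValueError("overlap must fit inside both gene sets")

    # Number of ways to draw a bioset of this size from the background.
    total = comb(background, bioset_size)
    # Count the draws that share k genes with the biogroup, for k >= overlap.
    tail = sum(
        comb(biogroup_size, k) * comb(background - biogroup_size, bioset_size - k)
        for k in range(overlap, largest_possible + 1)
    )
    return tail / total
```

For example, with a background of 4 genes, a 2-gene bioset, and a 2-gene biogroup, `overlap_p_value(4, 2, 2, 2)` is 1/6: only one of the six possible 2-gene biosets matches the biogroup exactly.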

How do I export results from a Biogroups query?

Click the Export Results button at the top of a Biogroups results page. This will download a list of all correlated biogroups.

To download a list of the genes common to your queried bioset and a specific biogroup:

Click a biogroup name.
Click the Export Data button.

To download the Venn diagram and associated statistics, click the Export Image button.

How does Biogroups compare to other pathway enrichment analysis tools?

Biogroups performs enrichment analysis using canonical gene lists that represent not just pathways, but also protein families, molecular functions, and biological processes.

In addition, Correlation Engine has developed advanced gene set enrichment analysis algorithms that take into account the direction of each gene within a bioset (e.g., up-/down-regulation or amplification), as well as its rank. For more details, please see our paper, "Ontology-Based Meta-Analysis of Global Collections of High-Throughput Public Data" (Kupershmidt et al. 2010), in PLoS ONE.

Literature FAQs

What is the Literature app?

Literature is a Correlation Engine application ("app") that shows a list of PubMed publications that match your query term. Results are listed in order of relevance. To sort the listed articles by publication date, select Date from the drop-down menu above the results.

Correlation Engine provides innovative filtering options by extracting key biomedical terms from the abstracts (and full text, when available) and displaying them as a tag cloud. To further filter and refine your results, click any term in the tag cloud. To specify which kinds of tags are shown in the tag cloud, click any of the filter categories to the right of the blue "filter terms" arrow (e.g., phenotype, tissue).

You can also filter by keyword. To do this, enter a term into the field below the tag cloud and click Filter. Click the "Clear all" button to return to your original, unfiltered results.

Literature differs from other Correlation Engine genomic apps (such as Disease Atlas) in that its results come from text-based searches, rather than data correlations.

What does the News tab do?

The News tab shows news articles related to your search term, sourced from hundreds of publicly available biology- and health-related news publications.

As with Literature and Clinical Trials results, news articles are listed by relevance. The News page, however, allows you to click the Date link to see the list ordered by date. The filter terms option bar (located just above the tag cloud) lets you view only certain subcategories of terms, such as phenotypes.

How does Correlation Engine rank relevant literature matches for a search?

Correlation Engine indexes over 19 million abstracts from PubMed and over 130,000 full-text publications from PubMed Central. For Literature searches, Correlation Engine uses a number of heuristics, including:

An extensive ontology with relationships between terms, synonyms, as well as a term hierarchy;
A customized, domain-specific stop word list and analyzer that emphasizes ontology terms;
The overall authority of the journal in which the paper was published;
Date of publication.

What is a tag cloud?

A tag cloud is a list of relevant terms ("tags") that have been extracted from the text results of your search. These tags are terms that appear throughout the abstracts and article text in your search results. Seeing tags displayed in a tag cloud can help you discover associations you might not have thought of before.

Tags are listed alphabetically within the tag cloud; tags that appear in a larger typeface are more strongly associated to your search term.

Correlation Engine uses only the top 50 results to construct the tag cloud. To see a more informative tag cloud, select 200 or 1000 from the drop-down menu to the far right of the filter options bar. The tag cloud will then be constructed from that number of results. (Note: Including more results will also increase computing time.)

Clinical Trials FAQs

What is Clinical Trials?

Clinical Trials is a Correlation Engine application ("app") that shows all clinical trials from ClinicalTrials.gov that match your query term. Results are listed in order of relevance. To sort the listed trials by date of last update, select Date from the drop-down menu above the results.

Correlation Engine provides innovative filtering options by extracting key biomedical terms from the trial descriptions and displaying them in a tag cloud. To further filter and refine your results, click any term in the tag cloud. To specify which kinds of tags are shown in the tag cloud, click any of the filter categories to the right of the blue "filter terms" arrow (e.g., phenotype, tissue).

You can also filter by keyword. To do this, enter a term into the field below the tag cloud and click Filter. Click the "Clear all" button to return to your original, unfiltered results.

Clinical Trials differs from other Correlation Engine genomic apps (such as Disease Atlas) in that its results come from text-based searches, rather than data correlations.

What is a tag cloud?

Tags are listed alphabetically within the tag cloud; tags that appear in a larger typeface are more strongly associated to your search term.

Search-Related Questions

What does the Studies & Projects page display?

The Studies & Projects section allows you to access studies according to your permissions. In other words, you can view just your studies, just those studies you have access to through specific projects, just public studies, or all studies.

For a gene search, what does the corresponding link to the Gene Details show?

The Gene Details section provides a summary of gene information. Here you can find information such as alternate names, links to gene orthologs, known transcription factor and miRNA binding sites for the gene, and membership in existing gene ontology lists, pathways and protein families. Additionally, Correlation Engine Professional and Enterprise users can launch the Correlation Engine Genome Browser from this page to view a graphic representation of the gene in chromosomal context.

How does the "auto-complete" function work?

The auto-complete function simplifies the selection of genes, pathways, tissues, authors, SNPs, and other biomedical concepts by providing a drop-down list of matches for you to pick from as you type. In order to provide the most appropriate suggestions, it uses a combination of biological and medical ontologies and other proprietary heuristics. The use of "auto-complete" is optional, and you can simply type in your term and press the Enter key to bypass it.

How does Correlation Engine rank Data Correlation results when I search for my gene of interest?

Correlation Engine ranks all of the studies for a given gene based on the activity of that gene in each individual experiment. For example, if a drug induces the activity of "Gene A" more than any other gene in a dataset (or "bioset"), Gene A will get the highest rank among all genes profiled in that individual study. If you query Gene A in Correlation Engine, the bioset mentioned above, which gives Gene A a rank of 1, will show up ranked higher than another bioset where, for instance, it is the 5th highest-induced gene and given rank 5. Correlation Engine's algorithms normalize gene ranks based on platform size and other factors.
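
The ordering described above can be sketched in a few lines. The actual normalization is proprietary and is not described here, so the sketch simply assumes that a gene's rank is divided by the platform size. The bioset names, the field names, and the numbers are hypothetical.

```python
def normalized_rank(gene_rank: int, platform_size: int) -> float:
    """Assumed normalization: rank as a fraction of the features profiled."""
    return gene_rank / platform_size


def order_biosets(biosets: list[dict]) -> list[str]:
    """Return bioset names, best (smallest normalized rank) first."""
    ranked = sorted(
        biosets,
        key=lambda b: normalized_rank(b["gene_rank"], b["platform_size"]),
    )
    return [b["name"] for b in ranked]


# Hypothetical query results for "Gene A" in three biosets.
example_biosets = [
    {"name": "Bioset B", "gene_rank": 5, "platform_size": 20000},
    {"name": "Bioset C", "gene_rank": 3, "platform_size": 1000},
    {"name": "Bioset A", "gene_rank": 1, "platform_size": 20000},
]
```

Here `order_biosets(example_biosets)` returns Bioset A, then Bioset B, then Bioset C. Bioset C comes last even though its raw rank of 3 beats Bioset B's rank of 5, because rank 3 out of 1,000 features is a weaker signal than rank 5 out of 20,000 under this assumed normalization.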

How does Correlation Engine rank Data Correlation results for a biogroup of interest?

Biogroups represent any set of genes or proteins that share some biological property, such as function or common regulatory motif. Correlation Engine uses proprietary rank-based statistics to correlate biogroups with experimental data. If the majority of genes encoding proteins involved in the MAPK signaling pathway are highly active in a given bioset (resulting in a correspondingly low, i.e. significant, p-value), this bioset will be highly ranked in the results of a Correlation Engine search for the MAPK signaling pathway.

How does Correlation Engine rank results for my tissue, phenotype, or compound of interest?

Correlation Engine uses a combination of its proprietary rank-based statistics and various meta-analysis techniques to compute the most significant genes and biogroups associated with a tissue, phenotype, or a compound under investigation. To perform this calculation, Correlation Engine combines all studies related to a given topic to identify the most significant genes and functional trends. This enables you to glean information from the standpoint of "collective experimental intelligence". On the top of the Data Correlations results page, you can see a list of the best-correlated genes and biogroups for your search term, and a list of all the relevant studies below. When you select a ranked gene or a biogroup of interest, you can then access a subset of studies related to your original term but limited to those matching the selected gene or biogroup.

How does filtering of results work?

On the Data Correlations search page you can use filters to narrow down a large list of matching results to a restricted subset according to organism, data type or keyword. Within text-based search results (Literature, Clinical Trials and News), enter any term in the filter box to narrow results.

What criteria does Correlation Engine use to rank relevant literature matches for a search?

Correlation Engine indexes over 19 million abstracts from PubMed and over 130,000 full-text publications from PubMed Central. For its literature search, Correlation Engine uses a number of heuristics, including:

Extensive ontology with relationships between terms, synonyms, as well as a term hierarchy
A customized domain-specific stop word list and analyzer that emphasizes ontology terms
The authority of the journal where the paper was published
Date of publication

Community and My Correlation Engine

Why should I create a personal user profile on Correlation Engine?

A personal profile page on Correlation Engine is an online scientific CV, where you can list positions, degrees, and publications. It is only visible within your organization. Your Correlation Engine profile is linked to our comprehensive literature search, so you can easily claim your journal articles as your own. With a personal profile you can save search results and organize them by project. You can also join the Correlation Engine community once you create a personal profile.

Can I control who sees my profile?

Your personal profile can only be seen by other users within your organization. Users have complete privacy control over their own profiles. Correlation Engine makes it easy for your profile to be seen by only your groups and contacts, only registered members of Correlation Engine, or all users of Correlation Engine. Read our privacy policy.

What is the Correlation Engine Community?

The Correlation Engine community is made up of users and groups in the Correlation Engine system within your enterprise.

What are Correlation Engine contacts?

Correlation Engine contacts are your personal online scientific community: colleagues, lab mates, and collaborators within your organization.

Who can I add as a Correlation Engine contact?

Any registered Correlation Engine user within your organization can be added as a Correlation Engine contact. People who are not yet members of the Correlation Engine community can also be easily invited to join as a contact.

Who can I search for under "People"?

You can search for all registered users of Correlation Engine who have allowed their profiles to be searchable. Please see our privacy page to set your personal privacy settings. If you do not see a colleague listed on Correlation Engine, it is still easy to invite them to join the Correlation Engine community.

Why create groups?

Groups are an easy way to collaborate and communicate with a small group of people you work with or a large number of users with research interests similar to your own. Currently, you can share studies and participate in discussions with other members of the groups. In the future, Correlation Engine will add the ability to share publications, bookmarks, and other types of information with group members.

Can I control who can join a group?

As the creator or administrator of a group, you have full control over group privacy and membership. Groups can be public, allowing all members of the Correlation Engine community to join, or private, requiring an invitation from the administrator. Groups can also be invisible to the public, so that they do not appear in search results.

Can I share my data with other group members?

With Correlation Engine Enterprise, you can easily upload your own data, compare it with all public data in the Correlation Engine system, and share it with group members. We value your privacy and data security at Correlation Engine. You have complete control over who sees your own data.

What happens when I archive a message?

Archiving messages removes them from your inbox without deleting them. In the inbox, click "archived inbox" in the drop-down menu to see all archived messages.

Data Import Questions

What column headers or labels should I have in my data?

The system looks for required key column labels to identify the data section of an uploaded file. The required column(s) must be the first in the dataset. For analysis of gene, miRNA or protein expression, the column label can be 'gene', 'accession', 'symbol', 'protein', and more. For analyses where DNA coordinates are provided, the first columns are chrid, start, and stop. For SNP/GWAS, the header is 'snp' or other alternatives.

For a complete list of required, recommended, and optional custom columns, please download the complete listing in either PDF or Excel format.

Can I import my own data privately?

As a Correlation Engine Professional or Correlation Engine Enterprise user, you can upload, save, and correlate your own data with public data.

What is the acceptable data format?

You can easily upload data files to the Correlation Engine platform as processed raw data: results of statistical analysis consisting of gene, transcript, protein, SNP, or other identifiers, or chromosomal coordinates (chromosome, start, stop), together with associated statistics. Acceptable formats include text, .csv, and Excel (both .xls and .xlsx files).

Correlation Engine lets you import standard statistical columns (fold change/log2 fold change/0-N fold change, p-value, score, rank, correlation) and custom columns with numbers and any user-defined titles (a maximum of 5 columns).

The gene identifier column should be the left-most column or should have the header "Gene name" to be recognized. If the identifiers are DNA coordinates, such as for ChIP-Seq or ATAC-Seq data, the first three columns should be "chromosome, start, stop." Example files can be downloaded under the Sample Import files section on the left of the import page.

The minimum requirement for upload is that your file contains a list of recognizable identifiers (e.g., a set of genes or coordinates). For experimental data, we strongly recommend including associated statistics in order to improve the quality of the correlation with other data within Correlation Engine.

You can import individual files by adding them one by one, or you can zip them into a single file for easier upload.

What should I upload as associated files?

You can upload reports, presentations, and any other files associated with a given study. They don't need to be in any particular format, but each is limited to 1 MB. These files are not required to complete data import, and they can be added at any time.

How does Correlation Engine rank features in my dataset during import?

Correlation Engine uses standard fields described above to rank features in your gene/protein set. If more than one standard statistical column is present, Correlation Engine automatically picks one of the following columns (in order) for ranking:

Fold change/log2 fold change/0-N fold change
- (Note: log2 and 0-N values are converted to +/-fold change on upload)
P-value
Score
Rank

What type of gene and protein identifiers does Correlation Engine support?

Correlation Engine recognizes most public and standard commercial platform identifiers, including NCBI Gene IDs, gene symbols, NCBI accession numbers, ENSEMBL IDs, RefSeq identifiers, IPI IDs, and custom IDs from most Affymetrix, Illumina, Agilent, and GE Healthcare platforms.

Can I upload more biosets into an existing study?

Yes, you can upload bioset files into a new study or an existing study, provided the biosets are from the same organism and data type as the target study.

Why should I tag my data?

Tagging is an important process which provides semantic structure to your data. While it takes just a few seconds to tag data, the benefits are significant. Search results are significantly improved once the data is tagged. Furthermore, tagging can be used to associate your study within an appropriate context and can help contribute to additional computations (Enterprise users). Tagging also helps your colleagues and collaborators quickly understand the biological background of the study.

What criteria should I use to tag my data?

You should tag each of your datasets with the following: 1) the tissue or cell line under study, 2) the phenotype, if applicable, and 3) genetic or chemical modifications (compound or a gene, if applicable). In general, tagging should only describe the main attributes of the experimental design and not of the experimental result or observation (e.g., you shouldn't tag your data with a highly expressed gene you detected in your microarray results).

How does Correlation Engine correlate my data with other data?

Correlation Engine uses proprietary rank-based statistics to compute associations between the data you import and all other experimental data. This allows you to place your experimental results within the context of the world's experiments in order to validate your study results, discover novel associations and trends, and design new experiments. Correlation Engine correlates your data with all biogroups as well, allowing you to discover common features among the genes or proteins that comprise your study. This, in turn, provides a greater understanding of the cellular events contributing to your study's results.

How do I edit studies that I have already imported?

To edit an existing study, click on the "Studies & Projects" link in the left vertical panel of the Correlation Engine homepage. Select the "My Studies" tab, and click on the "Full Study Details" button corresponding to the study of interest. The pencil icon or an "Add" link indicates sections where you can apply changes to "Study Details", "Bioset details", and "associated files". You can also delete any or all biosets within your individual studies.

How do I edit the tags for studies that I have already imported?

To edit tags for an existing study, click the "My Studies & Projects" link in the left vertical panel of the Correlation Engine homepage. Select the "My Studies" tab, and click on the "Full Study Details" button corresponding to the study of interest. Click on "+Add Tags" or "Edit" in the tag section under the "Biosets" tab.

Can I import multiple files as a zip archive?

Yes, you can import multiple files as a zip archive, as long as no file exceeds 35 MB. When using zip archive tools, take care not to include hidden operating system files, such as the .DS_Store files created by macOS.
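
One way to build such an archive without hidden files is sketched below with Python's standard zipfile module. The function name is hypothetical, and two points are assumptions: that the 35 MB limit applies to each file's uncompressed size, and that a megabyte means 1024 * 1024 bytes.

```python
import zipfile
from pathlib import Path

# Per-file limit from the answer above; binary megabytes are assumed.
MAX_FILE_BYTES = 35 * 1024 * 1024


def build_import_zip(source_dir, zip_path):
    """Zip the visible files in source_dir, skipping hidden OS files.

    zip_path should live outside source_dir so the archive does not
    try to include itself.
    """
    source = Path(source_dir)
    added = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source.rglob("*")):
            if not path.is_file():
                continue
            rel = path.relative_to(source)
            # Skip dotfiles such as .DS_Store and macOS __MACOSX folders.
            if any(part.startswith(".") or part == "__MACOSX" for part in rel.parts):
                continue
            if path.stat().st_size > MAX_FILE_BYTES:
                raise ValueError(f"{rel} exceeds the 35 MB per-file limit")
            archive.write(path, arcname=rel.as_posix())
            added.append(rel.as_posix())
    return added
```

The function returns the names it added, so you can confirm that only your bioset files went into the archive before importing it.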

How can I bring in results from BaseSpace Sequence Hub applications?

You can upload data files to Correlation Engine as processed raw data, meaning results of statistical analysis consisting of genes/proteins or custom IDs and associated statistics (in text, CSV, or Excel file formats). Correlation Engine lets you import standard statistical column fields (fold change/log2 fold change/0-N fold change, p-value, score, rank, correlation) and up to 5 custom numeric columns with any user-defined titles.

The Gene identifier column should be in the left-most column or should have the header "Gene name" to be recognized (see the Sample Import files on the left of the import page). The minimum requirement for upload of your data is that your file contains a list of recognizable identifiers (e.g., a set of genes). For experimental data, we strongly recommend including associated statistics in order to improve the quality of the correlation with other data within Correlation Engine. You can import individual files by adding them one by one, or you can zip them into a single file for easier upload. Acceptable formats include text, .csv and Excel (including both .xls and .xlsx files).
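
The identifier-column rule above can be checked locally before upload. The sketch below is illustrative only: the function names are hypothetical, the first row is assumed to be a header row, and the "Gene name" match is assumed to be case-insensitive.

```python
import csv
import io


def find_identifier_column(header_row):
    """Return the index of the identifier column per the rule above."""
    for index, name in enumerate(header_row):
        if name.strip().lower() == "gene name":
            return index
    return 0  # fall back to the left-most column


def read_identifiers(text, delimiter=","):
    """Return the non-empty identifiers found in delimited text."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        raise ValueError("file is empty")
    column = find_identifier_column(rows[0])
    return [
        row[column].strip()
        for row in rows[1:]
        if len(row) > column and row[column].strip()
    ]
```

For example, read_identifiers("Fold change,Gene name\n2.5,TP53\n") picks the second column because of its header, while a file without a "Gene name" header falls back to the left-most column.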

How to use BaseSpace Sequence Hub Apps for getting RNA-seq data into Correlation Engine:

For details on uploading the filtered table file from Cufflinks Assembly & DE click here.

For details on uploading the *.deseq.res.csv file from RNAExpress click here.

For details on uploading the Reference FPKM gene values file from RNA-Seq Alignment click here.

For details on uploading the *_ChIP-Seq_peaks.narrowPeak or *_ChIP-Seq_peaks.xls file from ChIP-Seq click here.

Meta-Analysis Questions

What is Meta-Analysis?

Meta-Analysis enables users to query with a collection of individual biosets to derive a consensus gene signature and/or discover sets of commonly regulated biogroups. This allows you to identify the most consistently and highly regulated genes across multiple biosets. Biosets can be mixed and matched from both your private data library as well as the Correlation Engine public library. Meta-Analysis allows users to search up to 150 biosets at a time for correlating genes and biogroups. Alternatively, users may select up to 10 biosets to create a Meta-Analysis query to search all Correlation Engine biosets for correlating signatures.

How do I run Meta-Analysis?

Find biosets to add to Meta-Analysis by clicking on the name of a study of interest. Click the icon that appears to the left of any bioset. The icon's appearance will update to show you that the bioset has been added, and the Meta-Analysis icon at the top of the page will update to display the number of biosets that have been added so far. You can also browse to other pages to find more biosets to add; Meta-Analysis will remember all the biosets you've added.

When you're ready to run Meta-Analysis, click the large Meta-Analysis icon to go to the Meta-Analysis Setup tab. From the Setup tab, you can remove any or all of the biosets from the query. You can also drag and drop individual biosets to reorder how they appear in results.

To view Meta-Analysis results, click a results tab. Results can be viewed as correlated genes, biogroups, biosets, or SNPs (if appropriate).

How do I save the results from Meta-Analysis?

From the results page, click "Export Results" to export the results to an Excel file. Alternatively, you can save the results page as a bookmark for later access.

What can I put into Meta-Analysis?

You can add up to 150 biosets spanning different platforms, organisms, projects and libraries.

Can I change my Meta-Analysis after running it?

Correlation Engine remembers your most recent Meta-Analysis until you sign out. You can continue to add or remove biosets and run the altered query. To start a new query, click the "remove all" link from the Meta-Analysis box.

How do you compute the most significant genes in the Meta-Analysis?

Several parameters are used to compute the most relevant genes. The two most important are the activity level of a gene in each bioset and the specificity (the number of biosets in which the gene is active).

How do I see SNP Results?

The SNP Results tab is only available if your query includes at least one bioset containing SNP or mutation data. Biosets of the following data types enable the SNP Results tab:

SNP GWAS
Somatic Mutation
Germline Mutation

How do I interpret the score matrix in Meta-Analysis bioset results?

Each correlated bioset has an associated score matrix. The height of each vertical bar in the score matrix represents the score of the correlation between the queried bioset and the correlated bioset.

Absence of a colored bar means that the correlation is insignificant.

The color and direction of each colored bar depend on whether the biosets involved in the correlation are both directional (e.g., RNA expression or CNVs), both non-directional (e.g., SNPs or mutations), or one of each.

Both biosets are directional. The correlation's score is based on the strength of the overlap, or enrichment, between the two biosets. The bar is colored red and appears above the midline if there is an overall positive correlation in the directionality of the overlapping genes—for example, if most of the overlapping genes are down-regulated in both biosets. The bar is colored green and appears below the midline if there is an overall negative correlation in the directionality of the overlapping genes.
Both biosets are non-directional. The correlation's score is based on the strength of the overlap, or enrichment, between the two biosets. The bar is colored orange and always appears above the midline, since there is no directionality to the enrichment.
One bioset is directional and the other is non-directional. We show a bar representing the score of the strongest overlap, or enrichment, between the two biosets. If the strongest overlap is with up-regulated genes, the bar is colored red and appears above the midline. The bar is colored green and appears below the midline if the strongest overlap is with down-regulated genes.
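
The three cases above reduce to a small decision rule. The sketch below only restates that rule in code. The names are hypothetical, and the `positive` flag is an assumed simplification: it stands for a positive correlation when both biosets are directional, and for the strongest overlap being with up-regulated genes in the mixed case.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Bar:
    color: str
    position: str  # "above" or "below" the midline


def bioset_bar(query_directional: bool,
               match_directional: bool,
               significant: bool,
               positive: Optional[bool] = None) -> Optional[Bar]:
    """Return the bar drawn for one correlated bioset, or None if no bar."""
    if not significant:
        return None  # an absent bar means an insignificant correlation
    if not query_directional and not match_directional:
        return Bar("orange", "above")  # no directionality to the enrichment
    if positive is None:
        raise ValueError("direction is required when a bioset is directional")
    return Bar("red", "above") if positive else Bar("green", "below")
```

For instance, two RNA expression biosets with an overall negative correlation give Bar("green", "below"), while two SNP biosets always give an orange bar above the midline.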

How do I interpret the score matrix in Meta-Analysis biogroup results?

Each correlated biogroup has an associated score matrix. The height of each vertical bar in the score matrix represents the score of the correlation between the queried bioset and the biogroup.

Absence of a colored bar means that the correlation is insignificant.

The color and direction of the colored bars depend on whether the queried bioset is directional (e.g., RNA expression or CNVs) or non-directional (e.g., SNPs or mutations).

The queried bioset is directional. Two scores are represented—one as a red bar above the midline and the other as a green bar below the midline. The first score is based on the strength of the overlap, or enrichment, between the biogroup and the up-regulated genes in the queried bioset. The second score is based on the overlap between the biogroup and the down-regulated genes in the queried bioset.
The queried bioset is non-directional. The correlation's score is based on the strength of the overlap, or enrichment, between the biogroup and the queried bioset. The bar is colored orange and always appears above the midline, since there is no directionality to the enrichment.
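
Unlike bioset results, a directional query can produce two bars per biogroup. A minimal sketch of that rule follows. The function name and the tuple layout are hypothetical, and a score of None is an assumed way to represent an insignificant correlation.

```python
from typing import List, Optional, Tuple

# (color, position relative to the midline, score)
Bar = Tuple[str, str, float]


def biogroup_bars(directional: bool,
                  score: Optional[float],
                  down_score: Optional[float] = None) -> List[Bar]:
    """Return the bars drawn for one biogroup.

    For a directional query, `score` is the overlap with up-regulated
    genes and `down_score` is the overlap with down-regulated genes.
    For a non-directional query, only `score` is used.
    """
    bars: List[Bar] = []
    if directional:
        if score is not None:
            bars.append(("red", "above", score))
        if down_score is not None:
            bars.append(("green", "below", down_score))
    elif score is not None:
        bars.append(("orange", "above", score))
    return bars
```

A biogroup enriched among both the up-regulated and down-regulated genes of a directional query therefore shows a red bar and a green bar in the same cell.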

How do I interpret the score matrix in Meta-Analysis gene results?

Each correlated gene has an associated score matrix. The height of each vertical bar in the score matrix represents the score of the correlation between the queried bioset and the gene.

Absence of a colored bar means that the correlation is insignificant.

(Only the top 5,000 gene features in a queried bioset are considered in order to decrease potential noise.)

The color and direction of the colored bars depend on whether the queried bioset is directional (e.g., RNA expression or CNVs) or non-directional (e.g., SNPs or mutations).

The queried bioset is directional. The correlation's score is based on the significance of the measurement made for the gene in the queried bioset. The bar is red and appears above the midline if the gene was up-regulated or amplified. The bar is green and appears below the midline if the gene was down-regulated or deleted.
The queried bioset is non-directional. The correlation's score is based on the significance of the measurement for the gene in the queried bioset (or for the associated gene(s) if querying with SNP or mutation biosets). The bar is colored orange and always appears above the midline, since there is no directionality to the measurement.

How do I interpret the score matrix in Meta-Analysis SNP results?

Each correlated SNP has an associated score matrix. The height of each vertical bar in the score matrix represents the score of the correlation between the queried bioset and the SNP. (For non-SNP biosets, bar height represents the score of the correlation with the gene(s) associated with the SNP.)

Absence of a colored bar means that the correlation is insignificant.

The color and direction of the colored bars depend on whether the queried bioset is directional (e.g., RNA expression or CNVs) or non-directional (e.g., SNPs or mutations).

The queried bioset is directional. The correlation's score is based on the significance of the measurement made for the gene. The bar is red and appears above the midline if the gene was up-regulated or amplified in the queried bioset. The bar is green and appears below the midline if the gene was down-regulated or deleted in the queried bioset.
The queried bioset is non-directional. The correlation's score is based on the significance of the measurement for the gene (or for the associated gene(s) if querying with SNP or mutation biosets). The bar is colored orange and always appears above the midline, since there is no directionality to the measurement.

What are Meta-Analysis Visualizations?

Meta-Analysis Visualizations display heatmaps of the information presented in the main Gene Results or Biogroup Results tabs. The images are rendered in R (R Core Team (2023). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/) using the heatmaply package (Galili, Tal, O'Callaghan, Alan, Sidi, Jonathan, Sievert, Carson (2017). "heatmaply: an R package for creating interactive cluster heatmaps for online publishing." Bioinformatics. doi:10.1093/bioinformatics/btx657) and supporting packages. The data is rendered using the default distance and clustering functions.

Visualization supports a minimum of 2 and a maximum of 5 biosets. Supported data types are gene expression, protein expression, DNA methylation, and ATAC-Seq. The displayed results are limited to the top 500 consensus-ranked genes or biogroups returned. The full data can be exported as usual from the respective results tabs using the export button.
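
The limits above can be expressed as a quick eligibility check. This is a sketch only: the function names and data type spellings are assumptions, as is the reading that every bioset in the query must be of a supported data type.

```python
# Limits taken from the text above; the string spellings are assumed.
SUPPORTED_DATA_TYPES = {
    "gene expression",
    "protein expression",
    "DNA methylation",
    "ATAC-Seq",
}
MIN_BIOSETS = 2
MAX_BIOSETS = 5
MAX_ROWS = 500


def can_visualize(data_types):
    """data_types holds one data type string per bioset in the query."""
    if not MIN_BIOSETS <= len(data_types) <= MAX_BIOSETS:
        return False
    return all(t in SUPPORTED_DATA_TYPES for t in data_types)


def rows_for_heatmap(ranked_rows):
    """Keep only the top consensus-ranked genes or biogroups."""
    return ranked_rows[:MAX_ROWS]
```

A query with six biosets, or one that includes a SNP GWAS bioset, would fail this check under these assumptions.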

Library-Related Questions

What is the Public Domain Library?

The Correlation Engine Library contains all public data studies, organized into projects so that you can easily navigate all Correlation Engine public content. You can pick any studies or biosets of interest and set up advanced queries or just browse through available content. None of the studies or biosets in this library can be edited.

What is the "Company X" Library?

This is the library containing studies and projects proprietary to your specific organization. Only users within your company have permission to access it (unless specified otherwise by an administrator). Moving data from your private project into this library, which has organization-wide access, requires special permission. Please contact Correlation Engine to request it. Within the next several release cycles, we'll enable your organization's Correlation Engine administrator to set these permissions without Correlation Engine assistance.

Enterprise-Related Questions

Will my organization's data and user activity on Correlation Engine Enterprise be secure?

Correlation Engine provides a highly secure solution for its enterprise customers. Please refer to the Correlation Engine APIs page for more details.

How can my organization upload studies in bulk?

Correlation Engine provides simple APIs to enable you to import studies in batch mode. Please refer to the Correlation Engine APIs page for more details.

Does Correlation Engine provide APIs?

We make a number of APIs available to enable you to bring data into and out of Correlation Engine. Please refer to the Correlation Engine APIs page for more details.

How can we control data sharing and collaboration among different groups?

Correlation Engine provides a feature where each user can create a private group and collaborate and share data only with users within this group.

Can we keep some data private from other users within an organization?

Users can easily control who views and has access to their data, both within their own organization and outside, through privacy settings. You can share data selectively with other individuals by creating a custom group and giving access to only those users that you choose.