Correlation Engine applications ("apps") can be accessed from the top of every page within Correlation Engine. This area, called the app menu, allows you to query any app whenever you need it.
To query an app:
If you're already viewing results for a query term, you can use that same query term to query a different app. Just click the icon for that app.
Querying an app with a bioset works a little differently.
To query an app with a bioset:
You'll also see app icons in results pages: next to a bioset name, or in a menu when you move the cursor over an acceptable query term (e.g., the name of a gene). In these cases, just click the app icon to query it with that particular bioset or query term.
To query an app with a sequence region:
If you want to query with a regular query term, click the "Go back to main search" link.
You can query QuickView, Curated Studies, Disease Atlas, Pharmaco Atlas, Knockdown Atlas, and Genome Browser with a sequence region. If you query Genome Browser with a sequence region, Genome Browser will launch in a new window, focused on that region.
Correlation Engine applications ("apps") use public and private experimental data as their sources—not PubMed or other scientific literature sources. This means that you may potentially discover information that was absent or not well-supported in the original research literature.
For example: If you query Disease Atlas with a gene, the results will display diseases highly correlated with that gene. These results come from experimental studies in which a significant result was found for that gene.
Correlation Engine has scored and ranked all listed diseases using factors such as: study tags, the gene's significance within a study, and the total number of studies for a disease in which that gene was measured.
Correlation Engine uses a combination of public, private, and proprietary information.
Data correlations. The Correlation Engine library of genomic data comes from several public sources, including:
Correlation Engine's curation team also manually curates studies from published literature.
Gene/SNP identifiers. Correlation Engine recognizes commonly used public gene identifiers as well as specific vendor identifiers. Correlation Engine maps individual gene identifiers to standard reference identifiers using the following sources:
To enable seamless comparison across different species, Correlation Engine uses ortholog information from:
Ontologies for semantic tagging. Correlation Engine has developed standardized vocabularies with which to tag its biosets. Sources include:
Correlation Engine's auto-complete and tag cloud terms come from the following ontologies and indexes:
Correlation Engine uses proprietary algorithms to calculate and rank the diseases most significantly correlated with a queried gene, SNP, sequence region, bioset, or biogroup.
First, we identify individual biosets that are significantly correlated with your query term. Based on the statistical significance of these correlations, we then rank all of the studies that contain correlated biosets.
For example, when ranking Disease Atlas results for a queried gene, we consider the following:
Depending on which app you've queried, we group correlated studies together based on our gene indexes, standardized vocabularies, and semantic tags (e.g., Disease Atlas results are grouped by disease). We call this process "categorization".
During categorization, we apply additional statistical criteria, such as correction for multiple hypothesis testing. Then we rank the diseases by statistical significance. We assign a numerical score of 100 to the most significant result, and normalize the other results' scores to the top-ranked result.
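The final scaling step described above can be sketched as follows. This is only an illustration: it assumes each result already has a positive raw significance value (the hypothetical `raw_significance` input, for example a negative log p-value) and that normalization is a simple linear rescaling. The actual scoring is proprietary and may differ.

```python
def normalize_scores(raw_significance):
    """Rescale raw significance values so the top result scores 100.

    raw_significance: dict mapping a result name to a positive number
    (hypothetical input; larger means more significant).
    Returns a list of (name, score) pairs, most significant first.
    """
    if not raw_significance:
        return []
    top = max(raw_significance.values())
    if top <= 0:
        raise ValueError("the top raw significance must be positive")
    ranked = sorted(raw_significance.items(), key=lambda kv: kv[1], reverse=True)
    # Linear rescaling is an assumption made for this sketch.
    return [(name, round(100.0 * value / top, 1)) for name, value in ranked]


example = {"disease A": 12.0, "disease B": 6.0, "disease C": 3.0}
scores = normalize_scores(example)
```

Because every score is expressed relative to the top result, a low score only says that a result is less significant than the top one.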
For a detailed description of Correlation Engine's methods, please see our paper, "Ontology-Based Meta-Analysis of Global Collections of High-Throughput Public Data" (Kupershmidt et al., 2010) in PLoS ONE.
No. Correlation Engine scores all results relative to the top-ranked result, whose score is set to 100. Although a low-scoring result might have less statistical significance compared to the top-ranked result, it could still have real biological relevance.
The Venn diagram shows you the following information:
The p-value is calculated as the probability that such an overlap would occur by chance (assuming that your bioset or biogroup is not actually correlated with the target bioset or biogroup).
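As an illustration of an overlap p-value of this kind, the sketch below computes a hypergeometric upper tail: the chance of seeing at least the observed overlap if one gene set were a random draw from the measured genes. The source does not say which test Correlation Engine uses, so the choice of test, the function name `overlap_p_value`, and the example numbers are all assumptions.

```python
from math import comb


def overlap_p_value(universe_size, set_a_size, set_b_size, overlap):
    """Probability of seeing at least `overlap` shared genes by chance.

    Assumes set B is a random draw of `set_b_size` genes from a universe of
    `universe_size` genes, of which `set_a_size` belong to set A
    (hypergeometric upper tail).
    """
    max_overlap = min(set_a_size, set_b_size)
    total = comb(universe_size, set_b_size)
    tail = sum(
        comb(set_a_size, k) * comb(universe_size - set_a_size, set_b_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical numbers: 20,000 measured genes, sets of 200 and 150 genes,
# 20 genes in common.
p = overlap_p_value(20000, 200, 150, 20)
```

A small p-value means an overlap this large would rarely arise if the two sets were unrelated.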
To download a list of studies:
The Export Results button is available throughout the Correlation Engine website for paid subscribers. This button will export all appropriate details for the results page you are viewing.
Here are the definitions for the Supporting Data Types:
If you're on a results page, you can also move the mouse cursor over each abbreviation to see its definition.
QuickView is a Correlation Engine application ("app") that you can query to get a quick, top-level view of all the data and information Correlation Engine has about a particular gene, SNP, sequence region, biogroup, bioset, phenotype, tissue, or compound.
There are several ways to query QuickView:
QuickView has two tabs that organize your results:
Click any result name to view its expanded result within that app. To see the full query results for that app, you can click an app name, an app icon, or the Explore Results link.
QuickView compiles information from PubMed, Gene Ontology, dbSNP, PubChem, MSigDB and other public data sources.
Curated Studies is a Correlation Engine application ("app") that lets you browse or query all datasets that you or Correlation Engine have imported.
To query Curated Studies:
Querying Curated Studies with a bioset works a little differently.
To query Curated Studies with a bioset:
You can query Curated Studies with just about any query term or keyword. Results will depend on what type of query term you enter.
You can also query Curated Studies with keywords that don't belong to the above types of query terms. In this case, Curated Studies will show studies resulting from a text-based search with your query term.
Finally, you can browse studies without using a query term as a starting point from within the app. To do this, go to the Curated Studies home page and click the All Curated Studies button. You'll then see a list of all curated studies that you can filter and browse.
Results from Curated Studies are only lists of individual studies correlated with or referencing your query term. They are not categorized or grouped in any way. By contrast, Disease Atlas groups studies according to disease, and Genetic Markers groups studies according to gene or SNP.
Click the "All Curated Studies" button.
You can use Curated Studies to directly inspect public genomic data. This can be useful if you're interested in all studies (general, or correlated to a query term) that have been performed on a specific data type, experimental design, or species.
Because other Correlation Engine apps rank results by statistical significance to your query term, you can also use Curated Studies to look up negative results. For example, you can find out whether a gene of interest was not significant in a particular kind of experimental study.
To see the statistics (e.g., p-value, fold change, copy number change, score) for a specific bioset:
To see the detailed information for a study, click on "Full Study Details" under the entry on the curated studies page. The study details tab provides background information on the study from the original source. The biosets tab shows the list of biosets in the study, the tags associated with those biosets and supplementary files that a user can download. The related genes and biogroups table shows a meta-analysis of all the gene signatures in the study and a meta-analysis of all biosets in the study with pathway enrichment. The images tab displays available associated plots and metrics for the study.
Body Atlas is a unique tool that you can use to find normalized gene expression across all tissues, cell types, cell lines, and stem cells in the Correlation Engine library.
A Body Atlas query for a gene, bioset or biogroup will produce a list of tissues and cell types ranked by relevance. You can also sort your results by absolute gene expression or body system, or across all body systems.
A tissue or cell line (biosource) query will result in a list of genes ranked by expression levels in the queried tissue or cell line. You can also view genes ranked by tissue-specific expression, or by cell line-specific copy number variations and mutations.
Use Body Atlas to:
To query Body Atlas:
Querying Body Atlas with a bioset works a little differently.
To query Body Atlas with a bioset:
The Body Atlas biosets have been drawn from all RNA expression studies that have used the Affymetrix GeneChip© Human Genome U133 Plus 2.0 Array for human studies, and the Affymetrix GeneChip© Mouse Genome 430A 2.0 Arrays for mouse studies.
We incorporated our data as follows:
First, we perform a per-chip median normalization on probesets common to all platforms. We then combine these probesets using quantile normalization.
Intensities for probesets unique to particular platforms are rescaled to the same per-chip median; we then fit them by linear interpolation, using the intensities of the common probesets between platforms as a reference.
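The first two steps (per-chip median normalization, then quantile normalization of the common probesets) can be sketched in plain Python. This is a textbook version with made-up intensities, not Correlation Engine's actual pipeline. The function names are hypothetical, and the interpolation step for platform-specific probesets is not shown.

```python
from statistics import median


def median_normalize(chip, target=1.0):
    """Scale one chip's intensities so that their median equals `target`."""
    m = median(chip)
    return [value * target / m for value in chip]


def quantile_normalize(chips):
    """Give every chip the same intensity distribution.

    chips: list of equal-length lists, one list per chip, with probesets in
    the same order on every chip. Ties are broken by position.
    """
    n_probes = len(chips[0])
    # Probeset indices ordered from lowest to highest intensity, per chip.
    orders = [sorted(range(n_probes), key=chip.__getitem__) for chip in chips]
    # Reference distribution: mean intensity at each rank across chips.
    reference = [
        sum(chip[order[rank]] for chip, order in zip(chips, orders)) / len(chips)
        for rank in range(n_probes)
    ]
    normalized = []
    for order in orders:
        out = [0.0] * n_probes
        for rank, probe_index in enumerate(order):
            out[probe_index] = reference[rank]
        normalized.append(out)
    return normalized


# Two made-up chips measuring the same three common probesets.
chips = [[2.0, 4.0, 6.0], [40.0, 10.0, 20.0]]
scaled = [median_normalize(chip) for chip in chips]
result = quantile_normalize(scaled)
```

After the second step both chips contain the same set of values, assigned according to each probeset's rank within its own chip.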
The score for a given tissue or cell represents the magnitude of the correlation score between a queried bioset or biogroup, and the gene expression bioset for that tissue or cell.
When you query Body Atlas with a biogroup (a nondirectional set of genes), results include a direction column that indicates the sign of the correlation score.
When you query Body Atlas with a bioset that contains directional information (e.g., gene expression fold change for a condition of interest), the results include a correlation column. This column indicates whether correlation was positive or negative.
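The relationship between the score and the direction or correlation column can be pictured as splitting a signed correlation into its magnitude and its sign. The sketch below assumes a hypothetical mapping from tissue name to signed correlation score; the names and numbers are invented for illustration.

```python
def rank_tissues(signed_scores):
    """Rank tissues by the magnitude of a signed correlation score.

    signed_scores: dict mapping a tissue name to a signed score
    (hypothetical input). Returns (tissue, magnitude, direction) rows,
    strongest correlation first.
    """
    rows = []
    for tissue, score in signed_scores.items():
        if score > 0:
            direction = "positive"
        elif score < 0:
            direction = "negative"
        else:
            direction = "none"
        rows.append((tissue, abs(score), direction))
    return sorted(rows, key=lambda row: row[1], reverse=True)


rows = rank_tissues({"liver": -8.5, "kidney": 3.0, "heart": 6.25})
```

Here the strongest result is a negative correlation: it ranks first because ranking uses the magnitude, while the sign is reported separately.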
You can view Body Atlas content in two ways. Gene, biogroup or bioset queries display a list of tissues, cell types, cell lines or stem cells related to the query term. Tissue or cell line queries display a list of gene expression levels corresponding to the query term.
Gene, biogroup or bioset queries in Body Atlas will display tissues grouped by body system as the default view. Click the corresponding tab to view cell types, cell lines, and stem cells. Clicking the name of a body system will jump to that group's results.
To rank Body Atlas results strictly by degree of expression or correlation—without categorizing them into groups—choose "Ranks" from the "View by:" menu. Clicking the name of a body system will highlight tissues and cell types that belong to that body system.
Tissue or cell line queries will display a list of expression levels of all genes in the queried biosource or tissue. Click the corresponding tab to view tissue-specific gene expression or a complete list of somatic mutations and copy number changes in a particular cell line.
The Body Atlas biosets have been drawn from RNA-seq expression studies taken from the Genotype-Tissue Expression project (GTEx).
The GTEx project is a publicly funded project that aims to provide a comprehensive atlas of gene expression and regulation across multiple human tissues; additional information can be found on the GTEx project website.
RNA samples used in GTEx were extracted from normal human tissues, poly-A selected, and sequenced using Illumina 74bp paired-end technology.
For Correlation Engine Body Atlas, we downloaded the raw read data, subjected it to stringent quality controls, and processed it using the RNA Express 1.0 pipeline. A subset of 505 high-confidence samples was used.
The RNA Express pipeline was developed at Illumina and is available on BaseSpace.
Tissue-specific gene ranks were derived from differential expression p-values (tissue of interest vs. all tissues). P-values were calculated with the edgeR package.
The Genotype-Tissue Expression (GTEx) Project was supported by the Common Fund of the Office of the Director of the National Institutes of Health (commonfund.nih.gov/GTEx). Additional funds were provided by the NCI, NHGRI, NHLBI, NIDA, NIMH, and NINDS. Donors were enrolled at Biospecimen Source Sites funded by NCI\Leidos Biomedical Research, Inc. subcontracts to the National Disease Research Interchange (10XS170), Roswell Park Cancer Institute (10XS171), and Science Care, Inc. (X10S172). The Laboratory, Data Analysis, and Coordinating Center (LDACC) was funded through a contract (HHSN268201000029C) to The Broad Institute, Inc. Biorepository operations were funded through a Leidos Biomedical Research, Inc. subcontract to Van Andel Research Institute (10ST1035). Additional data repository and project management were provided by Leidos Biomedical Research, Inc. (HHSN261200800001E). The Brain Bank was supported by supplements to University of Miami grant DA006227. Statistical Methods development grants were made to the University of Geneva (MH090941 & MH101814), the University of Chicago (MH090951, MH090937, MH101825, & MH101820), the University of North Carolina - Chapel Hill (MH090936), North Carolina State University (MH101819), Harvard University (MH090948), Stanford University (MH101782), Washington University (MH101810), and to the University of Pennsylvania (MH101822). The datasets used for the analyses described in this manuscript were obtained from dbGaP at http://www.ncbi.nlm.nih.gov/gap through dbGaP accession number phs000424.v4.p1.
We incorporated our data as follows:
The Venn diagram shows you the following information:
The p-value is calculated as the probability that such an overlap would occur by chance under the assumption that your bioset is not actually correlated with the tissue or cell type.
When viewing by category, click the name of a body system to jump to that category in the results.
When viewing by rank, click the name of a body system to highlight all tissues or cell types belonging to that category in the results.
To export Body Atlas results as a .csv file:
Genetic Markers is a Correlation Engine application ("app") that finds genes and SNPs significantly correlated to a phenotype or compound. Results are ranked in order of statistical significance.
Correlation Engine determines a marker's significance through a meta-analysis that takes into account the marker's rank across all studies tagged with the queried phenotype or compound.
To query Genetic Markers:
You can query Genetic Markers with a phenotype or compound. The first tab of the results page (Correlated Genes) will show a list of genes correlated with your query term, ranked by statistical significance. Click the Correlated SNPs tab to view correlated SNPs.
To see a list of studies that support the correlations, click a gene or SNP name.
Correlation Engine integrates multiple types of genomic data to rank the significance of genes and SNPs tagged with a given phenotype or compound. These data types may include RNA and miRNA expression, SNPs identified through GWAS, epigenetic data, CNVs, and mutation data. In addition, Genetic Markers uses curated data from OMIM, Jackson Labs, and DrugBank.
To find all of the curated studies that are correlated with a gene or SNP, query Curated Studies. (Genetic Markers only accepts phenotypes or compounds as query terms.)
Querying Curated Studies will return a ranked list of all studies that contain your gene or SNP as a significant result.
Possibly. A gene or SNP may be absent from results because either a) Correlation Engine has no studies tagged with your query term, or b) studies tagged with your query term are not significantly correlated with the gene or SNP you're interested in.
To find data for a gene or SNP not listed in Genetic Markers results, browse Curated Studies:
To see the statistics (e.g., p-value, fold change, copy number change, score) for a specific gene/SNP:
To expand each study:
To see statistics and scores from a bioset:
There are two possibilities. The Correlated Studies view shows all of the public or private studies that contain a significant correlation with your query term. The Correlated Genes and Correlated SNPs views, on the other hand, group these studies by category (in this case, by gene or by SNP).
Studies that appear in the Correlated Studies view might be excluded from the Correlated Genes or Correlated SNPs view if they've failed—when grouped—to meet additional scoring and ranking significance criteria (e.g., correction for multiple hypothesis testing).
Differences between study counts can also occur if private studies—studies you imported or those that others have imported and shared with you—contain significant correlations with your query term. Private studies with correlations are shown in the Correlated Studies view, but are not included in the categorization process.
(If you are a Correlation Engine Enterprise user, studies that belong to Enterprise projects are included in categorized results. However, studies that you have created or imported on your own are not.)
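To see how correction for multiple hypothesis testing can remove results that look significant on their own, consider the sketch below. The source does not say which correction Correlation Engine applies. The Benjamini-Hochberg procedure is used here purely as a familiar example, and the p-values and the 0.05 threshold are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * n / rank)
        adjusted[index] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.20]  # hypothetical per-study p-values
adjusted = benjamini_hochberg(raw)
kept = [p for p, q in zip(raw, adjusted) if q <= 0.05]
```

In this example three of the four raw p-values are below 0.05, but only one survives the correction. That is the kind of difference that can appear between an uncategorized study list and a categorized view.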
Disease Atlas is a Correlation Engine application ("app") that finds diseases, traits, conditions, and surrogate endpoints associated with a gene, sequence region, SNP, biogroup, or bioset. Results are grouped by disease and ranked according to statistical significance.
Disease Atlas categories only include the subset of phenotypes that have been specifically tagged "disease". So while you can query other apps with the phenotype "aging", you won't find "aging" among the categories listed in Disease Atlas results.
To query Disease Atlas:
Querying Disease Atlas with a bioset works a little differently.
To query Disease Atlas with a bioset:
You can query Disease Atlas with a gene, sequence region, biogroup, or bioset. The Correlated Diseases tab will show a list of diseases correlated with your query term, grouped into broad disease categories. Clicking the name of a disease category will jump to the results for that category.
To display diseases ranked by statistical significance, choose "Ranks" from the "View by:" menu. Clicking the name of a disease category will highlight results for that category.
To see a list of studies that support the correlations, click a disease name.
To find all of the curated studies that are correlated with a disease or other phenotype, query Curated Studies. (Disease Atlas only accepts genes, sequence regions, SNPs, biogroups, or biosets as query terms.)
Querying Curated Studies will return a ranked list of all studies that are tagged with your disease or other phenotype.
Possibly. A disease may be absent from results because either a) Correlation Engine has no studies tagged with the disease, or b) studies tagged with the disease are not significantly correlated with your query term.
To find data for a disease not listed in Disease Atlas results, browse Curated Studies:
To see the statistics (e.g., p-value, fold change, copy number change, score) for a specific disease:
To expand each study:
To see statistics and scores from the expanded bioset:
When viewing by category, click the name of a disease category to jump to that category in the results.
When viewing by rank, click the name of a disease category to highlight all diseases belonging to that category in the results.
There are two possibilities. Both tabs show all of the public and private studies tagged with a disease that contain a significant correlation with your query term.
The Correlated Diseases view, however, shows your results categorized by disease.
Studies that appear in the Studies For... tab might be excluded from the Correlated Diseases tab if they've failed—when grouped by disease—to meet additional scoring and ranking significance criteria (e.g., correction for multiple hypothesis testing).
Differences between study counts can also occur if private studies—studies you imported or those that others have imported and shared with you—contain significant correlations with your query term. Private studies with correlations are shown in the Studies For... tab, but are not included in the categorization process.
(If you are a Correlation Engine Enterprise user, studies that belong to Enterprise projects are included in categorized results. However, studies you have created or imported on your own are not.)
Pharmaco Atlas is a Correlation Engine application ("app") that finds compounds and treatments significantly correlated to a gene, sequence region, biogroup, or bioset. Results are ranked in order of statistical significance.
To use Pharmaco Atlas:
Querying Pharmaco Atlas with a bioset works a little differently.
To query Pharmaco Atlas with a bioset:
You can query Pharmaco Atlas with a gene, sequence region, biogroup, or bioset. The Correlated Compounds tab (the first tab of the results page) will show a list of compounds and treatments that are correlated with your query term, ranked by statistical significance.
To see a list of studies that support the correlations, click a compound name.
To find all of the curated studies that are correlated with a compound, query Curated Studies. (Pharmaco Atlas only accepts genes, sequence regions, biogroups, or biosets as query terms.)
Querying Curated Studies will return a ranked list of all studies that are tagged with your compound.
Possibly. A compound may be absent from results because either a) Correlation Engine has no studies tagged with the compound, or b) studies tagged with the compound are not significantly correlated with your query term.
To find data for a compound not listed in Pharmaco Atlas results, browse Curated Studies:
To see the statistics (e.g., p-value, fold change, copy number change, score) for a specific compound:
To expand a study for more detail:
To see statistics and scores from a bioset:
When viewing by category, click the name of a compound category to jump to that category in the results.
When viewing by rank, click the name of a compound category to highlight all compounds belonging to that category in the results.
There are two possibilities. Both tabs show all of the public and private studies tagged with a compound that contain a significant correlation with your query term.
The Correlated Compounds tab, however, shows you results categorized by compound and treatment.
Studies that appear in the Studies For... tab might be excluded from the Correlated Compounds tab if they've failed—when grouped by compound—to meet additional scoring and ranking significance criteria (e.g., correction for multiple hypothesis testing).
Differences between study counts can also occur if private studies—studies you imported, or those that others have imported and shared with you—contain significant correlations with your query term. Private studies with correlations are shown in the Studies For... tab, but are not included in the categorization process.
(If you are a Correlation Engine Enterprise user, studies that belong to Enterprise projects are included in categorized results. However, studies that you have created or imported on your own are not.)
Knockdown Atlas is a Correlation Engine application ("app") that finds genes whose perturbation affects your query term. Querying Knockdown Atlas is like performing a knockdown, knockout, or overexpression experiment in reverse: You can see which genetic perturbations affect a gene, and how.
To query Knockdown Atlas:
Querying Knockdown Atlas with a bioset works a little differently.
To query Knockdown Atlas with a bioset:
You can query Knockdown Atlas with a gene, sequence region, biogroup, or bioset. The Perturbed Genes tab (the first tab of the results page) will show a list of genes whose perturbation is correlated with your query term, ranked by statistical significance.
To see a list of studies that support the correlations, click the name of a perturbed gene.
Knockdown Atlas shows results from any genetic perturbation experiment, including knockout, gene silencing, and overexpression experiments. However, knockdowns and knockouts are the predominant type of experiment covered.
To get a list of studies in which a specific gene is perturbed, query Curated Studies.
To query Curated Studies for a specific perturbed gene:
(Note: Currently, Correlation Engine does not group these studies by gene.)
Possibly. A perturbed gene may be absent from results because either a) Correlation Engine has no studies in which the gene was perturbed, or b) studies in which the gene was perturbed are not significantly correlated with your query term.
To find data for a perturbed gene not listed in Knockdown Atlas results, browse Curated Studies:
To see the statistics (e.g., p-value, fold change, copy number change, score) for a specific genetic perturbation:
To expand each study for more details:
To see statistics and scores from a bioset:
There are two possibilities. Both tabs show all of the public and private studies in which a genetic perturbation significantly affected your query term.
The Perturbed Genes tab, however, shows your results categorized by perturbed gene.
Studies that appear in the Studies For... tab might be excluded from the Perturbed Genes tab if, when grouped by perturbed gene, they fail to meet additional scoring and ranking significance criteria (e.g., correction for multiple hypothesis testing).
Differences between study counts can also occur if private studies—studies you imported or those that others have imported and shared with you—contain significant correlations with your query term. Private studies with correlations are shown in the Studies For... tab, but are not included in the categorization process.
(If you are a Correlation Engine Enterprise user, studies that belong to Enterprise projects are included in categorized results. However, studies that you have created or imported on your own are not.)
Biogroups is a Correlation Engine application (“app”) that shows biogroups for which your queried bioset, phenotype or compound is highly enriched. When you query Biogroups with a bioset, you'll receive a ranked list of biogroups that highly overlap with the bioset.
When you query Biogroups with a phenotype or compound, you'll receive a ranked list of biogroups that highly overlap with biosets tagged with your query term.
To query Biogroups with a phenotype or compound:
Querying Biogroups with a bioset works a little differently.
To query Biogroups with a bioset:
A biogroup is a collection of genes that are associated with a specific biological function, pathway, or similar criteria. No numerical information is directly associated with a biogroup.
Gene lists represented as biogroups in Correlation Engine come from the following sources:
To see the statistics for a specific biogroup:
To expand each study:
To see statistics and scores for a bioset-biogroup correlation:
To find all of the genes contained in a biogroup, query QuickView and click General Info on the results page. (Biogroups only accepts phenotypes, compounds, and biosets as query terms.)
To find out which biogroups a gene belongs to, query QuickView.
To query QuickView:
When you see the QuickView icon next to a gene name, you can also click the icon to query QuickView with that gene.
If you queried with a bioset, the Venn diagram shows you the following information:
If you queried with a phenotype or compound, the diagram shows the overlap between the correlated biogroup and a bioset from a study tagged with your query term.
The p-value is calculated as the probability that such an overlap would occur by chance under the assumption that there is no biological link between your bioset and the biogroup.
Click the Export Results button at the top of a Biogroups results page. This will download a list of all correlated biogroups.
To download a list of the genes common to your queried bioset and a specific biogroup:
To download the Venn diagram and associated statistics, click the Export Image button.
Biogroups performs enrichment analysis using canonical gene lists that represent not just pathways, but also protein families, molecular functions, and biological processes.
In addition, Correlation Engine has developed advanced gene set enrichment analysis algorithms that take into account the direction of each gene within a bioset (e.g., up-/down-regulation or amplification), as well as its rank. For more details, please see our paper, "Ontology-Based Meta-Analysis of Global Collections of High-Throughput Public Data" (Kupershmidt et al., 2010) in PLoS ONE.
Literature is a Correlation Engine application ("app") that shows a list of PubMed publications that match your query term. Results are listed in order of relevance. To sort the listed articles by publication date, select Date from the drop-down menu above the results.
Correlation Engine provides innovative filtering options by extracting key biomedical terms from the abstracts (and full text, when available) and displaying them as a tag cloud. To further filter and refine your results, click any term in the tag cloud. To specify which kinds of tags are shown in the tag cloud, click any of the filter categories to the right of the blue "filter terms" arrow (e.g., phenotype, tissue).
You can also filter by keyword. To do this, enter a term into the field below the tag cloud and click Filter. Click the "Clear all" button to return to your original, unfiltered results.
Literature differs from other Correlation Engine genomic apps (such as Disease Atlas) in that its results come from text-based searches, rather than data correlations.
The News tab shows news articles related to your search term, sourced from hundreds of publicly available biology- and health-related news publications.
As with Literature and Clinical Trials results, news articles are listed by relevance. The News page, however, allows you to click the Date link to see the list ordered by date. The filter terms option bar (located just above the tag cloud) lets you view only certain subcategories of terms, such as phenotypes.
Correlation Engine indexes over 19 million abstracts from PubMed and over 130,000 full-text publications from PubMed Central. For Literature searches, Correlation Engine uses a number of heuristics, including:
A tag cloud is a list of relevant terms ("tags") that have been extracted from the text results of your search. These tags are terms that appear throughout the abstracts and article text in your search results. Seeing tags displayed in a tag cloud can help you discover associations you might not have thought of before.
Tags are listed alphabetically within the tag cloud; tags that appear in a larger typeface are more strongly associated with your search term.
By default, Correlation Engine uses only the top 50 results to construct the tag cloud. To see a more informative tag cloud, select 200 or 1000 from the drop-down menu to the far right of the filter options bar. The tag cloud will then be built from that number of results. (Note: Including more results will also increase computing time.)
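A tag cloud with these properties (built from the top N results, listed alphabetically, with stronger tags drawn larger) can be sketched as follows. How Correlation Engine measures the strength of association is not stated. Counting how many results mention a tag, the font-size range, and the function name `build_tag_cloud` are all assumptions made for this illustration.

```python
from collections import Counter


def build_tag_cloud(results, top_n=50, min_size=10, max_size=30):
    """Build (tag, font_size) pairs from the first `top_n` results.

    results: list of tag lists, one per search result, in relevance order.
    Font size grows linearly with the number of results mentioning a tag.
    """
    counts = Counter()
    for tags in results[:top_n]:
        counts.update(set(tags))  # count each tag once per result
    if not counts:
        return []
    low, high = min(counts.values()), max(counts.values())
    span = (high - low) or 1  # avoid dividing by zero when all counts match
    cloud = []
    for tag in sorted(counts, key=str.lower):  # alphabetical order
        size = min_size + (max_size - min_size) * (counts[tag] - low) / span
        cloud.append((tag, round(size)))
    return cloud


results = [["asthma", "lung"], ["asthma", "IL13"], ["asthma", "lung"]]
cloud = build_tag_cloud(results)
small_cloud = build_tag_cloud(results, top_n=2)
```

Raising `top_n` changes both which tags appear and how large they are drawn, at the cost of processing more results.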
Clinical Trials is a Correlation Engine application ("app") that shows all clinical trials from ClinicalTrials.gov that match your query term. Results are listed in order of relevance. To sort the listed trials by date of last update, select Date from the drop-down menu above the results.
Correlation Engine provides innovative filtering options by extracting key biomedical terms from the trial descriptions and displaying them in a tag cloud. To further filter and refine your results, click any term in the tag cloud. To specify which kinds of tags are shown in the tag cloud, click any of the filter categories to the right of the blue "filter terms" arrow (e.g., phenotype, tissue).
You can also filter by keyword. To do this, enter a term into the field below the tag cloud and click Filter. Click the "Clear all" button to return to your original, unfiltered results.
Clinical Trials differs from other Correlation Engine genomic apps (such as Disease Atlas) in that its results come from text-based searches, rather than data correlations.
Correlation Engine indexes over 19 million abstracts from PubMed and over 130,000 full-text publications from PubMed Central. For its literature search, Correlation Engine uses a number of heuristics, including:
The system looks for required key column labels to identify the data section of an uploaded file. The required column(s) must come first in the dataset. For analysis of gene, miRNA, or protein expression, the column label can be 'gene', 'accession', 'symbol', 'protein', and more. For analyses where DNA coordinates are provided, the first three columns are 'chrid', 'start', and 'stop'. For SNP/GWAS data, the header is 'snp' or one of its alternatives.
For a complete list of required, recommended, and optional custom columns, download the full listing in either PDF or Excel format.
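The header detection described above can be pictured with a short Python sketch. It scans a file's lines for the first one whose leading column label is a recognized key label, or whose first three labels are chrid, start, and stop. The function name find_data_header, the tab delimiter, and the case-insensitive matching are assumptions for illustration, and the label set below contains only the examples named in this document, not the complete list accepted by Correlation Engine.

```python
# Partial label set: only the examples mentioned in the text above.
ID_LABELS = {"gene", "accession", "symbol", "protein", "snp"}
COORD_LABELS = ("chrid", "start", "stop")


def find_data_header(lines, delimiter="\t"):
    """Return (line_index, kind) for the first line that looks like a header.

    kind is "coordinates" when the first three labels are chrid/start/stop,
    otherwise "identifier" when the first label is a known key label.
    Returns None when no such line is found.
    """
    for index, line in enumerate(lines):
        labels = [cell.strip().lower() for cell in line.split(delimiter)]
        if tuple(labels[:3]) == COORD_LABELS:
            return index, "coordinates"
        if labels[0] in ID_LABELS:
            return index, "identifier"
    return None


# Hypothetical file contents: a comment line followed by the data section.
expression_file = ["# exported 2020", "Gene\tfold change\tp-value", "TP53\t2.1\t0.001"]
region_file = ["chrid\tstart\tstop\tscore", "chr1\t100\t200\t5"]

expression_header = find_data_header(expression_file)
region_header = find_data_header(region_file)
```

Everything before the detected header line would be treated as preamble, and everything after it as data rows.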
Correlation Engine uses the standard fields described above to rank features in your gene/protein set. If more than one standard statistical column is present, Correlation Engine automatically picks one of the following columns (in order) for ranking:
You can upload data files to the Correlation Engine platform as processed data: the results of a statistical analysis, consisting of genes/proteins or custom IDs and their associated statistics (in text, CSV, or Excel file formats). Correlation Engine lets you import standard statistical columns (fold change/log2 fold change/0-N fold change, p-value, score, rank, correlation) as well as custom columns containing numbers, with any user-defined titles (a maximum of 5 custom columns).
The gene identifier column should be the left-most column or should have the header "Gene name" to be recognized (see the Sample Import files on the left of the import page). The minimum requirement for upload is that your file contains a list of recognizable identifiers (e.g., a set of genes). For experimental data, we strongly recommend including associated statistics in order to improve the quality of the correlation with other data within Correlation Engine. You can import individual files by adding them one by one, or you can zip them into a single file for easier upload. Acceptable formats include text, .csv, and Excel (both .xls and .xlsx files).
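To make the file layout concrete, here is a minimal Python sketch that writes an import file with the identifier column first, followed by statistical and custom columns, and then bundles one or more files into a zip archive. The helper names make_import_csv and zip_imports, the file name, and the sample values are hypothetical; the exact header spellings that Correlation Engine recognizes should be taken from the Sample Import files rather than from this example.

```python
import csv
import io
import zipfile

MAX_CUSTOM_COLUMNS = 5  # limit stated in the text above


def make_import_csv(rows, stat_columns, custom_columns=()):
    """Build CSV text with the identifier column first.

    rows: list of dicts keyed by "Gene name" plus the given column names.
    """
    if len(custom_columns) > MAX_CUSTOM_COLUMNS:
        raise ValueError("at most 5 custom columns are allowed")
    header = ["Gene name", *stat_columns, *custom_columns]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=header, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def zip_imports(files):
    """Bundle {filename: csv_text} into one in-memory zip archive (bytes)."""
    archive = io.BytesIO()
    with zipfile.ZipFile(archive, "w") as bundle:
        for name, text in files.items():
            bundle.writestr(name, text)
    return archive.getvalue()


# Hypothetical analysis results for two genes.
rows = [
    {"Gene name": "TP53", "fold change": 2.1, "p-value": 0.001},
    {"Gene name": "EGFR", "fold change": -1.8, "p-value": 0.02},
]
csv_text = make_import_csv(rows, ["fold change", "p-value"])
payload = zip_imports({"treated_vs_control.csv": csv_text})
```

The bytes in payload could be saved with a .zip extension and uploaded as a single file; the sketch builds everything in memory only to stay self-contained.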
How to use BaseSpace Sequence Hub Apps for getting RNA-seq data into Correlation Engine:
For details on uploading the filtered table file from Cufflinks Assembly & DE click here.
For details on uploading the *.deseq.res.csv file from RNAExpress click here.
For details on uploading the Reference FPKM gene values file from RNA-Seq Alignment click here.
For details on uploading the *_ChIP-Seq_peaks.narrowPeak or *_ChIP-Seq_peaks.xls file from ChIP-Seq click here.
Find biosets to add to Meta-Analysis by clicking on the name of a study of interest. Click the icon that appears to the left of any bioset. The icon's appearance will update to show you that the bioset has been added, and the Meta-Analysis icon at the top of the page will update to display the number of biosets that have been added so far. You can also browse to other pages to find more biosets to add; Meta-Analysis will remember all the biosets you've added.
When you're ready to run Meta-Analysis, click the large Meta-Analysis icon to go to the Meta-Analysis Setup tab. From the Setup tab, you can remove any or all of the biosets from the query. You can also drag and drop individual biosets to reorder how they appear in results.
To view Meta-Analysis results, click a results tab. Results can be viewed as correlated genes, biogroups, biosets, or SNPs (if appropriate).
The SNP Results tab is only available if your query includes at least one bioset containing SNP or mutation data. Biosets of the following data types enable the SNP Results tab:
Each correlated bioset has an associated score matrix. The height of each vertical bar in the score matrix represents the score of the correlation between the queried bioset and the correlated bioset.
Absence of a colored bar means that the correlation is insignificant.
The color and direction of each colored bar depend on whether the biosets involved in the correlation are both directional (e.g., RNA expression or CNVs), both non-directional (e.g., SNPs or mutations), or one of each.
Each correlated biogroup has an associated score matrix. The height of each vertical bar in the score matrix represents the score of the correlation between the queried bioset and the biogroup.
Absence of a colored bar means that the correlation is insignificant.
The color and direction of the colored bars depend on whether the queried bioset is directional (e.g., RNA expression or CNVs) or non-directional (e.g., SNPs or mutations).
Each correlated gene has an associated score matrix. The height of each vertical bar in the score matrix represents the score of the correlation between the queried bioset and the gene.
Absence of a colored bar means that the correlation is insignificant.
(Only the top 5,000 gene features in a queried bioset are considered in order to decrease potential noise.)
The color and direction of the colored bars depend on whether the queried bioset is directional (e.g., RNA expression or CNVs) or non-directional (e.g., SNPs or mutations).
Each correlated SNP has an associated score matrix. The height of each vertical bar in the score matrix represents the score of the correlation between the queried bioset and the SNP. (For non-SNP biosets, bar height represents the score of the correlation with the gene(s) associated with the SNP.)
Absence of a colored bar means that the correlation is insignificant.
The color and direction of the colored bars depend on whether the queried bioset is directional (e.g., RNA expression or CNVs) or non-directional (e.g., SNPs or mutations).
Meta-Analysis Visualizations display heatmaps of the information presented in the main Gene Results or Biogroup Results tabs. The images are rendered in R (R Core Team (2023). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/) using the heatmaply package (Galili, Tal, O'Callaghan, Alan, Sidi, Jonathan, Sievert, Carson (2017). "heatmaply: an R package for creating interactive cluster heatmaps for online publishing." Bioinformatics. doi:10.1093/bioinformatics/btx657) and supporting packages. The data are clustered and rendered using the default distance and clustering functions.
Visualization supports a minimum of 2 and a maximum of 5 biosets. Supported data types are gene expression, protein expression, DNA methylation, and ATAC-Seq. The displayed results are limited to the top 500 consensus-ranked genes or biogroups returned. The full data can be obtained as usual from the respective results tabs using the export button.
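The limits above can be summarized in a short Python sketch that checks a visualization request and trims the rows to be drawn. It is only a model of the stated constraints (2 to 5 biosets, four supported data types, top 500 rows); the function name select_heatmap_rows and the input shapes are assumptions, and the actual rendering is done in R with heatmaply as noted above.

```python
SUPPORTED_TYPES = {
    "gene expression", "protein expression", "DNA methylation", "ATAC-Seq",
}
MIN_BIOSETS, MAX_BIOSETS, TOP_ROWS = 2, 5, 500


def select_heatmap_rows(biosets, ranked_features):
    """Validate a visualization request and return the rows to draw.

    biosets: list of (name, data_type) tuples.
    ranked_features: feature names already in consensus-rank order.
    """
    if not MIN_BIOSETS <= len(biosets) <= MAX_BIOSETS:
        raise ValueError("visualization needs between 2 and 5 biosets")
    unsupported = [name for name, kind in biosets if kind not in SUPPORTED_TYPES]
    if unsupported:
        raise ValueError(f"unsupported data type for: {', '.join(unsupported)}")
    return ranked_features[:TOP_ROWS]


# Hypothetical request: two expression biosets and 1,200 ranked genes.
biosets = [("Tumor vs normal", "gene expression"),
           ("Treated vs control", "gene expression")]
ranked = [f"GENE{i}" for i in range(1, 1201)]
rows_to_draw = select_heatmap_rows(biosets, ranked)
```

Rows beyond the first 500 are simply not drawn in this sketch, which mirrors the note that the full data remain available through the export button on the results tabs.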