Overview of ISMARA results

Introduction

For this guide we use ISMARA results for the Illumina Human Body Map 2.0 project. The original data are available from the GEO database (accession GSE30611). The dataset consists of RNA-Seq profiles of 16 human tissues. Here are the ISMARA results for this dataset. The images in this guide are linked to the corresponding ISMARA pages.

Main page

The main results page contains three main elements. First, a table of all regulatory motifs, sorted by the significance that ISMARA assigns to each motif, so that the most important regulatory motifs are at the top. Each motif in the list links to a results page for that motif. Below the motif table there is a table of samples, which links to results for individual samples; this table is also used to group samples for the sample averaging procedure. On the left of the page there are links to results in flat-file format and a button for performing sample averaging. Finally, the left-hand menu also has a search box for finding a gene of interest.

Regulatory motif table

The regulatory motif table lists regulatory motifs ordered by their significance by default. For each regulatory motif we provide, from left to right, its name, its significance quantified as a Z-value, the list of transcription factor genes that can bind to the sites of the motif, a thumbnail image of the inferred activities of the motif across the samples, and a sequence logo of the motif. Clicking on the motif name links to a separate page with detailed ISMARA results for this motif. The gene names are linked to the corresponding pages in the NCBI database. Similarly, the thumbnail images are linked to high resolution pictures.
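As a reference point, the motif Z-value can be related to the per-sample activities A_{ms} of motif m and their standard deviations δ_{ms} (the quantities in the activity and activity delta downloads). As we read the ISMARA paper, the significance is the root-mean-square of the per-sample z-scores over all S samples; treat the formula below as our reading, not as an authoritative definition, and check it against the paper.

```latex
Z_m = \sqrt{\frac{1}{S}\sum_{s=1}^{S}\left(\frac{A_{ms}}{\delta_{ms}}\right)^{2}}
```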

There is a search box at the top of the table, which you can use to find regulators of particular interest. For example, if you type the name of a transcription factor, the table is automatically reduced to the entries that match the keyword. A drop-down menu lets you choose how many entries are shown on the page, and navigation buttons at the bottom of the table let you page through it. You can also sort the table by Z-value or by motif name by clicking the header of the corresponding column.

Some regulatory motifs are "grouped" motifs that represent multiple transcription factors, such as "HNF1A_HNF1B" in the picture above. To avoid cluttering a single table cell with several motif logos, we show only one logo per group. To see the logos of all transcription factor motifs in a group, click the motif name to open its regulatory motif page.

Sample table

The sample table lists all samples alphabetically. Clicking a sample name opens a sample page with more detailed results for that sample.

If you select the "Perform sample averaging" option in the navigation menu on the left, this table is used for grouping samples. Sample averaging is explained below.

Downloads

Some of ISMARA's results can be downloaded in plain-text format from the links on the left-hand side of the main page. Each link in the "Downloads" section points to a file containing one or more tab-separated tables.

Activity table - a table of the fitted motif activities. The first row contains the names of the regulatory motifs and the first column contains the sample names. It can be read directly with the R function read.table() or the pandas function read_table().
Activity delta table - a table with the standard deviations of the fitted activities in the activity table. It has the same format as the activity table (motif names in the first row, sample names in the first column) and can be read in the same way.
Regulatory interactions - a tar archive containing a directory with one file per regulatory motif, named after the motif. Each file contains a table of the motif's targets: the first column is the promoter name, the second the log-likelihood score of the target, the third the regulator name, and all remaining columns hold the transcripts associated with the promoter. Each transcript record consists of the transcript ID, gene name, gene ID and gene description, separated by the | symbol. A promoter can be associated with multiple transcripts.
Motifs sorted by significance - a file with two columns: the regulatory motif names and their Z-values (significance), sorted by Z-value. These are the same as the first two columns of the motif table.
Download the whole report - a tar archive with all HTML pages, images and data files listed above. We strongly recommend downloading this archive for offline browsing, since the online report is removed from the ISMARA server after two weeks.
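The activity and activity delta tables can be loaded side by side with pandas. The sketch below assumes the layout described above: tab-separated, with row labels in the first column (index_col=0 turns that column into the index). If your copy turns out to be oriented the other way round, transpose the result with .T. The function name and the in-memory example values are hypothetical; in practice you would pass the paths of the downloaded files.

```python
import io

import pandas as pd


def load_activity_tables(activity_path, delta_path):
    """Load the activity table and the activity delta table as labelled DataFrames.

    Both files are read as tab-separated tables whose first column holds the row labels.
    """
    activities = pd.read_table(activity_path, index_col=0)
    deltas = pd.read_table(delta_path, index_col=0)
    # The two tables should share one layout; stop early if their labels differ.
    if not (activities.index.equals(deltas.index) and activities.columns.equals(deltas.columns)):
        raise ValueError("activity and delta tables have different row/column labels")
    return activities, deltas


# Tiny in-memory stand-in for the downloaded files (made-up values).
example = "\tMOTIF_A\tMOTIF_B\nsample1\t0.10\t-0.20\nsample2\t-0.10\t0.20\n"
acts, errs = load_activity_tables(io.StringIO(example), io.StringIO(example))
```

Because both tables carry the same labels, element-wise operations such as acts / errs stay aligned by motif and sample.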

The log-likelihood target score measures the relative decrease in the squared deviation between the predicted and observed signal when regulation of the target by the regulatory motif is taken into account. A detailed explanation is given in section 1.9 of the supplementary material of the ISMARA paper.
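To build intuition for why a relative decrease in squared deviation behaves like a log-likelihood ratio, here is a simplified illustration. It is not ISMARA's exact computation (see the supplementary material for that): it assumes Gaussian noise whose variance is set, for each model, to its maximum-likelihood value SSE/n. Under that assumption the log-likelihood ratio reduces to (n/2) * log(SSE_without / SSE_with). The function and argument names are hypothetical.

```python
import math


def target_llr(observed, predicted_without, predicted_with):
    """Log-likelihood ratio of a fit with vs. without one motif -> target link.

    Simplifying assumption: Gaussian noise, variance estimated per model as SSE/n.
    """
    n = len(observed)
    sse_without = sum((o - p) ** 2 for o, p in zip(observed, predicted_without))
    sse_with = sum((o - p) ** 2 for o, p in zip(observed, predicted_with))
    # Only the ratio of the squared deviations matters, i.e. their relative decrease.
    return 0.5 * n * math.log(sse_without / sse_with)


score = target_llr([1, 2, 3, 4], [2.5] * 4, [1, 2, 3, 4.5])
```

Here the squared deviation drops from 5 to 0.25, so the score is 2 * log(20).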

Sample averaging

Contrasts between sample groups

Experiments are often performed in multiple replicates and one would typically be specifically interested in those motifs that behave reproducibly across the replicates. To this end the ISMARA results page offers a special interface where users can provide replicate annotation for their samples, which then enables ISMARA to calculate motif activity profiles that are averaged over replicates using a rigorous Bayesian procedure.

Note that this approach can easily be extended beyond replicates: the user can divide the samples into arbitrary groups, and ISMARA will calculate the average motif activities and associated standard deviations for each group. For example, one might contrast cancer and non-cancer samples, as we have done for the GNF SymAtlas + NCI-60 cancer cell line dataset: we combined all non-cancer samples into one group and all cancer samples into another. Here are the results of this cancer vs non-cancer averaging.
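The sketch below shows one standard Bayesian way of combining per-sample activities with error bars into a group average. It assumes independent Gaussian errors and a flat prior, in which case the posterior for the group's mean activity is Gaussian with an inverse-variance weighted mean. This is a common textbook procedure and not necessarily the exact one ISMARA uses; the sample and condition names are hypothetical.

```python
import math
from collections import defaultdict


def average_by_condition(activity, delta, condition):
    """Combine one motif's per-sample activities into per-condition averages.

    activity, delta: dict sample -> fitted activity / its standard deviation
    condition: dict sample -> condition label
    Returns dict condition -> (posterior mean, posterior standard deviation).
    """
    groups = defaultdict(list)
    for sample, label in condition.items():
        groups[label].append(sample)
    result = {}
    for label, samples in groups.items():
        weights = [1.0 / delta[s] ** 2 for s in samples]  # precision of each sample
        total = sum(weights)
        mean = sum(w * activity[s] for w, s in zip(weights, samples)) / total
        result[label] = (mean, 1.0 / math.sqrt(total))
    return result


avg = average_by_condition(
    {"kidney_1": 1.0, "kidney_2": 3.0, "liver_1": -1.0},
    {"kidney_1": 1.0, "kidney_2": 1.0, "liver_1": 1.0},
    {"kidney_1": "kidney", "kidney_2": "kidney", "liver_1": "liver"},
)
```

Samples with smaller error bars pull the average more strongly, and the group's error bar shrinks as replicates are added.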

To start the averaging procedure, click the "Perform sample averaging" button to activate the averaging interface. The button is in the navigation menu, just above the "Downloads" section.

After you click the "Perform sample averaging" button in the left sidebar of the main page, a few new interface elements appear in the sample table (see figure). Using the drop-down menus, you can assign every sample to a condition, and you can give each condition an appropriate name (time point, tissue, treatment, etc.) by typing in the box next to the drop-down menu. Note that the drop-down menu grows automatically: at first it offers only the "condition1" option, and once you assign a sample to "condition1", a "condition2" option is added, and so on. When you have finished annotating the samples, check that the email and project name fields are correct. By default, the email field contains the address used to submit the original dataset, and the project name is the original project name with the prefix "avrg:". Then click the "Submit data for averaging" button to run the averaging analysis. A new page opens that displays the status of the task and, eventually, the averaging results. If you provided an email address, you will receive a notification email when the results are ready.

The sample averaging interface provides one more advanced option for removing "batch effects". If the replicates came in clearly defined batches, for example when a time course was performed several times, and there is a clear systematic shift between the batches, you can also indicate the batch of each sample. This allows for a more advanced normalization across the replicates. To use it, click the "Advanced options" button and assign batches to samples in the same way as conditions. The batch effect correction requires that all batches contain exactly the same set of conditions!
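The requirement that all batches contain the same conditions becomes clearer with a simple model of a batch effect: a constant offset per batch. The sketch below removes such an offset by centering each batch; this is only an illustration and not ISMARA's normalization, which this guide does not specify. It assumes one sample per condition in each batch (e.g. one repeat of a time course). If one batch lacked a condition, its mean would be taken over different conditions and the offsets would no longer be comparable, which is why the function refuses that case.

```python
from collections import defaultdict


def center_batches(activity, condition, batch):
    """Subtract each batch's mean activity from its samples (illustration only).

    activity: dict sample -> activity of one motif
    condition, batch: dict sample -> condition label / batch label
    """
    conditions = defaultdict(set)
    members = defaultdict(list)
    for sample, b in batch.items():
        conditions[b].add(condition[sample])
        members[b].append(sample)
    # Batch means are only comparable if every batch covers the same conditions.
    if len({frozenset(c) for c in conditions.values()}) > 1:
        raise ValueError("all batches must contain exactly the same set of conditions")
    corrected = {}
    for b, samples in members.items():
        offset = sum(activity[s] for s in samples) / len(samples)
        for s in samples:
            corrected[s] = activity[s] - offset
    return corrected


fixed = center_batches(
    {"t1_r1": 1.0, "t2_r1": 3.0, "t1_r2": 11.0, "t2_r2": 13.0},
    {"t1_r1": "t1", "t2_r1": "t2", "t1_r2": "t1", "t2_r2": "t2"},
    {"t1_r1": "r1", "t2_r1": "r1", "t1_r2": "r2", "t2_r2": "r2"},
)
```

In this example the second repeat is shifted by +10, and after centering both repeats give the same profile.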

Motif page

For each motif we provide a separate page with extensive information: the activity profile and Z-values across conditions, the list of the motif's targets, the first-level regulatory network with other regulators, the activity-expression correlation of the motif, gene category enrichment among the motif's targets, etc.

Motif information

At the top of the page, the motif's name, Z-value, and its sequence logo are shown. For grouped motifs all logos in a group are shown.

Next is a list of all transcription factor genes thought to bind to the sites of the motif, each with links to the corresponding pages in databases such as NCBI and Ensembl.

Activity-expression correlation

For many regulatory motifs in the ISMARA analysis, more than one TF can potentially bind to sites of the motif. To help determine which TFs are most likely responsible for the activity of a given motif in the dataset in question, ISMARA provides a simple correlation analysis between the expression of each TF gene and the activity of the associated motif.

The "Activity-expression correlation" table shows the Pearson correlation between the motif's activity profile and the mRNA expression profiles of each of the TFs that can bind to the sites of the motif. The TFs in the list are sorted by the p-value of the correlation. For each correlation, a link is also provided to a simple scatter plot of the mRNA expression levels against the motif activities across the samples. You can preview the plot by moving the mouse over the "Click!" link, or open a high-resolution image by clicking it. A transcription factor can have multiple promoters; the table shows only the promoter with the best correlation coefficient.

Regulatory motif activity profile

This figure shows the inferred activities of the regulatory motif (including error bars) across all samples, where the samples are ordered alphanumerically. The order of the samples in this graph is thus determined by the naming of the files provided by the user and this can be used to ensure the samples are ordered in an appropriate way (e.g. if samples come from a time course, numbering the samples by time will result in the graph showing motif activity across time).
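When naming samples by number, keep in mind how string sorting treats digits. If the ordering is a plain character-by-character sort (an assumption; ISMARA may use a smarter "natural" sort), "t_10" comes before "t_2". Zero-padding the numbers avoids this. The helper below is hypothetical and only illustrates the naming convention.

```python
def zero_padded_names(prefix, time_points, width=3):
    """Build sample names whose plain string order matches numeric order."""
    return [f"{prefix}_{t:0{width}d}" for t in time_points]


# Plain string sorting puts "t_10" before "t_2"; zero-padding avoids this.
unpadded = sorted(f"t_{t}" for t in (1, 2, 10))
padded = sorted(zero_padded_names("t", (1, 2, 10)))
```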

The activity profile shows how the expression of the motif's targets changes, on average, across conditions. For example, the plot below for the targets of the HNF1A and HNF1B transcription factors shows that, on average, the expression of their targets is increased in kidney and liver compared with other tissues.

Z-values bar chart

In many cases there may be no preferred natural ordering of the samples. In those cases it is more natural to present the regulatory motif activities with samples sorted from those in which the motif is most significantly upregulated, to those where it is most significantly downregulated. ISMARA provides such a list of motif z-values, with samples sorted from largest to smallest z-value. For example, from the bar chart of sorted HNF1A_HNF1B activities, the cell types in which HNF1A_HNF1B activity is highly upregulated or highly downregulated can be seen at a glance.
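A sorted list like this can also be rebuilt from the downloaded activity and activity delta tables. The sketch assumes that a sample's z-value is the motif activity divided by its standard deviation; that definition is our assumption, and the tissue names and numbers are made up.

```python
def sorted_sample_z(activity, delta):
    """Samples sorted from largest to smallest z-value for one motif.

    activity, delta: dict sample -> activity / its standard deviation.
    Assumes z = activity / delta for each sample.
    """
    z = {s: activity[s] / delta[s] for s in activity}
    return sorted(z.items(), key=lambda item: item[1], reverse=True)


ranking = sorted_sample_z(
    {"kidney": 0.9, "liver": 0.6, "brain": -0.4},
    {"kidney": 0.1, "liver": 0.2, "brain": 0.1},
)
```

The first entries are the samples where the motif is most significantly upregulated and the last ones where it is most significantly downregulated.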

Protein-protein interaction network of a regulatory motif targets according to the STRING database

It is always desirable to gain some intuition about the pathways and biological processes targeted by a particular regulatory motif. One way of visualizing the functional structure of the predicted targets of a motif is to represent them as a network, with links between pairs of genes that are known to be functionally related. The STRING database maintains a curated collection of functional links between proteins, where a "functional link" can range from direct physical interaction to over-representation of the protein pair in abstracts of scientific articles. ISMARA provides, for each motif, a STRING network picture of the set of predicted targets of the motif (for visibility, at most the top 100 targets are shown). The picture is linked to the interactive STRING page for this network, which offers more information and functions. For the targets of the HNF1A_HNF1B motif shown here, we see a highly connected cluster of genes that belong to various metabolic processes, transport and stress response.

STRING: protein-protein interaction network of top targets

First level regulatory network

One of our aims is to understand the causal structure of the transcription regulatory network, and a first step in that direction is the prediction of direct regulatory interactions between the regulatory motifs. For each motif, we check its list of predicted targets for promoters of transcription factors that are associated with other regulatory motifs. Using this, we build a regulatory network in which nodes correspond to motifs and a directed edge from motif m to motif m′ exists whenever a promoter of at least one of the transcription factors associated with motif m′ is a predicted target of motif m. On the results page for a given motif, we show only the part of the interaction network centered on that motif. The network picture is partly interactive. You can hide or show edges using the +/- buttons or by moving the slider on the left of the picture. Placing the mouse cursor over a motif node displays its Z-value. Placing the cursor over an edge displays the names of the target genes involved in the edge, together with the associated target scores. The latter are log-likelihood ratios of the model including and excluding the particular target edge. The edge color indicates the type of edge: red for edges from the central motif to another motif, blue for a motif regulating the central motif, and violet for interactions between the other motifs. The color intensity is proportional to the target score, i.e. more intense means a higher score.
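The edge rule described above translates directly into code. This sketch is a hypothetical re-implementation with assumed inputs: the predicted targets of each motif and the promoters of the TFs associated with each motif. It also records which promoters create each edge, which is the information shown when hovering over an edge. Note that it keeps self-loops (a motif targeting a promoter of one of its own TFs).

```python
def first_level_network(targets, tf_promoters):
    """Directed motif network: edge m -> m2 if m targets a promoter of a TF of m2.

    targets: dict motif -> set of predicted target promoter ids
    tf_promoters: dict motif -> set of promoter ids of the TFs associated with it
    Returns dict (m, m2) -> set of promoters that create the edge.
    """
    edges = {}
    for m, m_targets in targets.items():
        for m2, promoters in tf_promoters.items():
            shared = m_targets & promoters
            if shared:
                edges[(m, m2)] = shared
    return edges


net = first_level_network(
    {"M1": {"p_tfB", "p_x"}, "M2": {"p_y"}},
    {"M1": {"p_tfA"}, "M2": {"p_tfB"}},
)
```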

first level interaction network of HNF1A_HNF1B

Regulatory motif top targets

ISMARA provides a list of the top 200 target promoters for the motif, sorted by their target score. In addition, every promoter is annotated with the transcripts and genes associated with it. Like the motif list on the main page, this table is interactive, allowing quick search and sorting by any column. By default the table shows only the top 10 targets, but you can change the number of targets shown. The promoters are linked to the SwissRegulon database, which shows a graphical representation of a one-kilobase region around the promoter, displaying TF binding sites, transcripts, and other genomic features.
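A similar top-target list can be produced offline from a motif file in the "Regulatory interactions" download. The parser below follows the column layout described in the Downloads section (promoter, score, regulator, then transcript records of the form "transcript id|gene name|gene id|description"). Whether the files have a header line is not stated, so the sketch simply skips any line whose score is not a number. The function and field names are hypothetical.

```python
FIELDS = ("transcript_id", "gene_name", "gene_id", "description")


def top_targets(lines, top=200):
    """Return the `top` highest-scoring targets from one motif's interaction file."""
    rows = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # blank or truncated line
        try:
            score = float(fields[1])
        except ValueError:
            continue  # e.g. a header line, if the file has one
        # maxsplit=3 keeps any "|" inside the description intact.
        transcripts = [dict(zip(FIELDS, record.split("|", 3))) for record in fields[3:] if record]
        rows.append({"promoter": fields[0], "score": score,
                     "regulator": fields[2], "transcripts": transcripts})
    rows.sort(key=lambda row: row["score"], reverse=True)
    return rows[:top]


sample_lines = [
    "promoter\tscore\tregulator\ttranscripts\n",
    "p1\t2.5\tM1\tENST1|GENE1|ENSG1|desc one\n",
    "p2\t7.0\tM1\tENST2|GENE2|ENSG2|desc|with bar\tENST3|GENE3|ENSG3|x\n",
]
rows = top_targets(sample_lines)
```

With a real file, pass the open file object instead of the list, e.g. top_targets(open(path)) with a path of your choosing.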

Gene category enrichment analysis

ISMARA also provides lists of Gene Ontology categories, and of two sets of categories from the MSIG database, that are enriched among the predicted targets of a regulatory motif. Lists are provided for the "biological process", "cellular component", "molecular function", "canonical pathways" and "reactome pathways" category sets. Enrichment is quantified by two scores:

Total log-likelihood - the sum of the log-likelihood scores of all targets of the regulatory motif in the given category.
Log-likelihood per target - the total log-likelihood divided by the number of genes in the category.
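The two scores can be computed as follows. This is a sketch with hypothetical inputs (a gene -> target score mapping and a category -> gene set mapping). Following the definition above literally, the per-target score divides by the number of genes in the category; genes that are not targets contribute a score of zero.

```python
def category_enrichment(target_scores, categories, by_per_target=False):
    """Score gene categories by the log-likelihoods of the motif's targets.

    target_scores: dict gene -> log-likelihood score of that gene as a motif target
    categories: dict category name -> set of genes in the category
    Returns (name, total, per_target) tuples, best first.
    """
    rows = []
    for name, genes in categories.items():
        total = sum(target_scores.get(g, 0.0) for g in genes)
        rows.append((name, total, total / len(genes)))
    key = 2 if by_per_target else 1
    rows.sort(key=lambda row: row[key], reverse=True)
    return rows


scores = {"g1": 3.0, "g2": 1.0, "g3": 0.8}
cats = {"big": {"g1", "g2", "g4", "g5"}, "small": {"g3"}}
by_total = category_enrichment(scores, cats)
by_ratio = category_enrichment(scores, cats, by_per_target=True)
```

In this toy example the large category wins on total log-likelihood, while the small, specific category wins per target, mirroring the behaviour of the two sort orders described below.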

By default, categories are sorted by total log-likelihood. This sorting usually brings general categories to the top. You can also sort the table by log-likelihood per target, which brings up more specific gene categories. Like the motif table on the main page, the GOA table also supports quick search.

Sample page

For every sample we provide a bar plot of the top 10 and bottom 10 Z-values (significances) of regulatory motifs in that sample. The motifs shown correspond to regulators whose targets are most significantly upregulated or downregulated, respectively, in this sample.

Z-values (significances) for all motifs are listed in a table below the bar plot. The table can be sorted by Z-value and searched using the search box.

Mean activities page

Although, typically, users are most interested in explaining expression changes across the samples, in some cases users might also be interested in knowing to what extent the absolute average levels of the promoters across the samples can be fit in terms of ‘mean activities’ of the regulatory motifs, i.e. to learn which regulatory motifs are most predictive of consistently high or low absolute expression across the conditions.

Important! Mean activity of a regulatory motif is not mean of the motif activities across the samples!
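To make the distinction concrete: a mean activity is a separately fitted parameter that explains promoters' average expression levels from their binding sites, rather than an average of the per-sample activities. The sketch below illustrates this idea with plain ridge regression of promoter mean expression on site counts; it is a simplified stand-in for ISMARA's actual fit (described in the ISMARA paper), and the function name and ridge value are assumptions.

```python
import numpy as np


def fit_mean_activities(site_counts, promoter_means, ridge=1.0):
    """Ridge-regression sketch: explain each promoter's mean expression by its sites.

    site_counts: (promoters x motifs) array N of site counts
    promoter_means: length-P array of each promoter's mean expression over samples
    Centering both sides absorbs a global intercept; then
    a = (Nc^T Nc + ridge * I)^{-1} Nc^T ec gives one mean activity per motif.
    """
    N = np.asarray(site_counts, dtype=float)
    e = np.asarray(promoter_means, dtype=float)
    Nc = N - N.mean(axis=0)
    ec = e - e.mean()
    lhs = Nc.T @ Nc + ridge * np.eye(N.shape[1])
    return np.linalg.solve(lhs, Nc.T @ ec)


mean_acts = fit_mean_activities([[1, 0], [0, 1], [1, 1]], [2.0, -1.0, 1.0], ridge=1e-9)
```

With noise-free toy data and a tiny ridge penalty, the fit recovers the mean activities used to generate the promoter means (2 and -1).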

First we provide a bar plot of the 10 motifs with the highest and the 10 motifs with the lowest mean activities. Here the regulatory motifs are sorted only by the mean activity value, ignoring the corresponding error bars, which are nevertheless shown in the plot.

Barchart: Top 10 and bottom 10 mean activities.

Next we provide a barplot of top 10 and bottom 10 mean activity significances (Z-values) of regulatory motifs.

Barchart: Top 10 and bottom 10 mean activity significances.

Fitted mean activities and their Z-values are listed in the table. As in the other tables, you can search for a transcription factor of interest using the search box at the top. Regulatory motif names are linked to the corresponding motif pages.

Mean activity table of regulatory motifs.

Gene search functionality

In the left side menu just above the downloads section there is a search box with button "Search gene".

Type a gene name or gene ID of interest and click the "Search gene" button. The search box offers autocompletion for gene names: type at least 2 letters and a white box appears with all available completions for your search term. Note that the list of gene names is limited, first, to genes present in the gene annotation used and, second, to genes expressed in the given dataset (i.e. non-expressed genes are not available).

After you click the button, ISMARA needs some time to generate the corresponding page. Please wait (typically 5-30 seconds); you will then be redirected to the results page.

Gene search page

The gene search results page shows first the expression profiles of all promoters associated with the gene of interest. The promoter names are shown in the legend on the right of the plot.

Below the promoter expression plot there are separate sections for every promoter associated with the gene. The promoter section starts with information about the promoter.

Fraction of promoter variance explained by the MARA model.
Link to the SwissRegulon genome browser showing the genomic region around the promoter, with the surrounding transcripts and predicted transcription factor binding sites.
A list of genes and transcripts associated with the promoter. The gene and transcript names are linked to the corresponding databases.

The last part of the promoter section contains a plot of the original and the fitted expression profiles of the promoter. Next to the plot is a table listing the regulatory motifs that have predicted transcription factor binding sites in the promoter region. Each motif name has a checkbox; ticking it shows the contribution of that motif to the total fitted expression profile. By default, the predicted expression profile is the baseline given by the promoter's mean expression across samples and each sample's mean expression across all promoters.

Original and fitted expression profiles of the promoter

If you move your cursor over the profile you can see the corresponding expression values and full name of the sample.

The regulatory motif table lists motifs that have a predicted transcription factor binding site in the promoter region. For every motif, the table shows "ChiSq" (the log-likelihood score), the site count (the sum of the probabilities of all motif binding sites within the promoter region), and the regulatory motif significance (Z-value). If you tick the checkbox in front of a motif name, the fitted expression profile changes to include the contribution of that motif; if you untick it, the contribution is removed again. Below the table, the two buttons "All on" and "All off" select and deselect all motif checkboxes, respectively.
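The effect of the checkboxes can be mimicked with a linear model in which each selected motif adds site count times activity on top of a baseline. The sketch assumes a model of the form e_ps ≈ c_p + c_s + sum over selected motifs of N_pm * A_ms, where c_p is the promoter's mean level and c_s a per-sample offset; this mirrors our reading of ISMARA's linear model rather than its exact code, and all names are hypothetical.

```python
def fitted_profile(promoter_mean, sample_offsets, sitecounts, activities, selected):
    """Fitted expression of one promoter with only the `selected` motifs switched on.

    promoter_mean: baseline level of the promoter (c_p)
    sample_offsets: dict sample -> per-sample offset (c_s)
    sitecounts: dict motif -> site count N_pm in this promoter
    activities: dict motif -> {sample: activity A_ms}
    selected: motifs whose checkbox is ticked
    """
    profile = {}
    for s, offset in sample_offsets.items():
        value = promoter_mean + offset
        for m in selected:
            value += sitecounts[m] * activities[m][s]  # this motif's contribution
        profile[s] = value
    return profile


args = (5.0, {"s1": 0.5, "s2": -0.5}, {"M1": 2.0, "M2": 1.0},
        {"M1": {"s1": 0.25, "s2": -0.25}, "M2": {"s1": 1.0, "s2": 1.0}})
all_off = fitted_profile(*args, selected=[])
only_m1 = fitted_profile(*args, selected=["M1"])
all_on = fitted_profile(*args, selected=["M1", "M2"])
```

"All off" corresponds to the baseline alone, and each ticked motif shifts the profile by its own contribution.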

Promoter FOV table

In the navigation menu on the left of the main page there is a link "All promoters sorted by FOV". It leads to a page with a table of statistics for every promoter expressed in the dataset, where FOV is the fraction of variance explained. Because of its size, this table usually takes a few seconds to load.

First column - promoter ID, linked to the gene search page for this single promoter.
Second column - mean expression of the promoter across all conditions.
Third column - standard deviation of the promoter's expression across conditions, which shows how much its expression changes.
Fourth column - FOV, the fraction of the variance of the promoter's expression explained by the model.
Fifth column - the gene associated with the promoter: the gene name followed by the gene description. The gene name is linked to the corresponding gene search page.
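For reference, the usual definition of a fraction of variance explained compares the residual sum of squares of the fit with the total variation of the observed profile around its mean. The sketch below uses that standard definition; ISMARA's exact computation may differ in details (for example in how measurement noise is handled), and the function name is hypothetical.

```python
def fraction_of_variance_explained(observed, fitted):
    """FOV = 1 - (residual sum of squares) / (total sum of squares around the mean)."""
    mean = sum(observed) / len(observed)
    ss_total = sum((o - mean) ** 2 for o in observed)
    ss_resid = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    return 1.0 - ss_resid / ss_total


perfect = fraction_of_variance_explained([1, 2, 3, 4], [1, 2, 3, 4])
flat = fraction_of_variance_explained([1, 2, 3, 4], [2.5] * 4)
partial = fraction_of_variance_explained([1, 2, 3, 4], [1, 2, 3, 5])
```

A perfect fit gives 1, a fit that only reproduces the mean gives 0, and the last example gives 0.8.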

There is a search box at the top of the table. You can search the table for a gene of interest, or use it to select only promoters on "chrX", etc. All columns are sortable. By default the table is sorted by the FOV column.

Overview of the ISMARA result pages.