To illustrate the results that ISMARA provides, we use a subset of the samples from the GSE26386 dataset (Ernst, 2011) as an example. In this dataset, Affymetrix microarrays were used to measure gene expression across eight human cell types. This example dataset is available here, and ISMARA's results are available here.
Note that for most of the images used in the explanation below, clicking on an image links to the corresponding section of the ISMARA results pages.
The main results page contains three elements. First, it provides a list of all regulatory motifs, sorted by the significance that ISMARA assigns to each motif, i.e. the most important regulatory motifs are at the top. Each motif in the list links to a results page for that individual motif. Below the list of motifs is a list of samples, which links to the results for individual samples. On the left of the page is a table with links to results in downloadable format, and a link to the sample averaging interface (described below).
The central element on the main page is the list of motifs. For each motif we provide, from left to right, its name, its significance quantified as a Z-value, the list of transcription factor genes that can bind to the sites of the motif, a thumbnail image of the inferred activities of the motif across the samples, and a sequence logo of the motif. Clicking on the motif name links to a separate page with detailed ISMARA results for this motif. The gene names are linked to the corresponding pages in the NCBI database. Similarly, the thumbnail images are linked to high-resolution pictures.
To give users quick access to the motifs that interest them most, a "Search" box at the top of the list searches through the contents of the table. For example, if you type the name of a transcription factor, the table is automatically reduced to only those entries that match the keyword. The motifs can also be sorted by name instead of by Z-value by clicking on the corresponding column header.
Below the table of motifs is a table listing all samples, sorted alphabetically. Clicking on a sample name links to a page with more extensive results for that sample. For ChIP-Seq data there is an extra column with links to the ChIP-Cor and ChIP-Peak services, where you can further analyse the sample with ChIP-Seq tools provided by another website.
Most of ISMARA's results can be downloaded in plain text format from the links on the left-hand side of the main page. Each link under the "Downloads" section points to a file containing one or more tab-separated tables. From top to bottom, the links correspond to a table with inferred motif activities, a table with error bars on the inferred motif activities, a gzipped tar-archive containing all predicted targets for each motif (one file per motif), a table of motifs and their Z-values, and a gzipped tar-archive with all results.
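As an illustration of working with the downloaded tables, the sketch below parses a tab-separated activity table into a nested dictionary. It assumes a hypothetical layout (a header row of sample names, then one row per motif with the motif name in the first column); check the actual column layout of the downloaded file before relying on it. The motif and sample names in the example are placeholders.

```python
import csv
import io


def read_activity_table(text):
    """Parse a tab-separated motif-by-sample table into {motif: {sample: value}}.

    Assumed layout (hypothetical): the first row holds sample names (after an
    initial cell for the motif column), and every following row starts with
    the motif name followed by one numeric value per sample.
    """
    rows = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(rows)
    samples = header[1:]
    table = {}
    for row in rows:
        if not row:
            continue  # skip blank lines
        motif, values = row[0], row[1:]
        table[motif] = {s: float(v) for s, v in zip(samples, values)}
    return table


# Placeholder content; for a real download use open(path).read() instead.
example = "motif\tsampleA\tsampleB\nmotif_X\t0.42\t-0.17\n"
activities = read_activity_table(example)
print(activities["motif_X"]["sampleB"])  # -0.17
```

The same function should work for the error-bar table if it shares the assumed layout, giving matching dictionaries of activities and errors.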
Experiments are often performed in multiple replicates, and one is typically most interested in the motifs that behave reproducibly across the replicates. To this end, the ISMARA results page offers a special interface where users can provide replicate annotation for their samples, which enables ISMARA to calculate motif activity profiles averaged over replicates using a rigorous Bayesian procedure. A link to the sample averaging interface is provided on the left of the main page, just above the "Downloads" section.
After clicking the "Perform sample averaging" button on the left sidebar of the main page, a few new interface elements appear in the sample table (see figure). Using the drop-down menus, the user can assign every sample to a condition and give each condition an appropriate name (time point, tissue, treatment, etc.) by typing in the box next to the drop-down menu. When all samples are annotated, click the "Submit data for averaging" button to perform the averaging analysis. A new page opens that displays the task's status and, eventually, the averaging results. The user can again specify an email address to receive a notification email when the results are ready.
Note that this approach easily extends beyond replicates: the user can divide the samples into arbitrary groups, and ISMARA will automatically calculate the average motif activities and associated standard deviations for each group of samples.
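To give some intuition for how activities with error bars can be averaged per group, the sketch below combines samples by inverse-variance weighting, which is the posterior mean and standard deviation obtained when independent Gaussian posteriors are multiplied together. This is an assumption made for illustration; it is not necessarily the exact Bayesian procedure ISMARA uses. The group, sample, and function names are hypothetical.

```python
import math


def average_activity(activities, errors):
    """Combine activities with Gaussian error bars by inverse-variance weighting.

    Assumes independent Gaussian posteriors per sample (illustrative only).
    Returns (mean, standard deviation) of the combined estimate.
    """
    weights = [1.0 / (e * e) for e in errors]
    total = sum(weights)
    mean = sum(w * a for w, a in zip(weights, activities)) / total
    sd = math.sqrt(1.0 / total)
    return mean, sd


def average_by_group(groups, activity, error):
    """groups: {condition: [sample, ...]}; activity, error: {sample: float}."""
    return {
        name: average_activity([activity[s] for s in samples],
                               [error[s] for s in samples])
        for name, samples in groups.items()
    }


activity = {"rep1": 1.0, "rep2": 3.0}
error = {"rep1": 1.0, "rep2": 1.0}
result = average_by_group({"condA": ["rep1", "rep2"]}, activity, error)
# mean 2.0, standard deviation sqrt(1/2), about 0.707
```

With equal error bars the weights are equal, so the combined mean is the plain average and the combined uncertainty shrinks by a factor of sqrt(2).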
The sample averaging interface provides one more advanced option, for eliminating "batch effects". If the replicates came in clearly defined batches, for example when a time course was performed multiple times, the user can also indicate the batch of each sample. This allows for more advanced normalization across the replicates. To use this option, click the "Advanced options" button and then assign batches to samples in the same way as conditions.
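The idea behind batch correction can be illustrated with the simplest generic approach, per-batch mean centering, in which each batch's mean activity is subtracted from its samples so that a constant offset between runs disappears. This is a swapped-in technique shown only for intuition; ISMARA's own batch normalization is not described here and may differ. Sample and batch names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def center_by_batch(activity, batch):
    """Subtract each batch's mean activity from the samples in that batch.

    Generic mean-centering sketch, not ISMARA's documented method.
    activity: {sample: value}; batch: {sample: batch_id}.
    """
    members = defaultdict(list)
    for sample, b in batch.items():
        members[b].append(activity[sample])
    batch_mean = {b: mean(values) for b, values in members.items()}
    return {s: activity[s] - batch_mean[batch[s]] for s in activity}


# Two runs of the same time course; run2 is shifted up by a constant offset.
activity = {"t0_run1": 1.0, "t1_run1": 3.0, "t0_run2": 5.0, "t1_run2": 7.0}
batch = {"t0_run1": "run1", "t1_run1": "run1", "t0_run2": "run2", "t1_run2": "run2"}
centered = center_by_batch(activity, batch)
print(centered)
# {'t0_run1': -1.0, 't1_run1': 1.0, 't0_run2': -1.0, 't1_run2': 1.0}
```

After centering, the two runs agree, so differences between time points are no longer confounded with the run-level offset.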
For our sample dataset you can access the results obtained by averaging over the two replicates here.
For each motif, a separate page is provided containing extensive results and information on that motif.
At the top of the page, the motif's name, Z-value, and its sequence logo are shown. Below that is a list with all transcription factor genes thought to bind to the sites of the motif, each with links to corresponding pages in the NCBI database.
For many motifs incorporated into the ISMARA analysis, more than one TF can potentially bind to the motif's sites. To help determine which TFs are most likely responsible for the activity of a given motif in the dataset in question, ISMARA provides a simple correlation analysis between the expression of each associated TF gene and the activity of the motif.
The "Activity-expression correlation" table shows the Pearson correlation between the motif's activity profile and the mRNA expression profile of each TF that can bind to the sites of the motif. The TFs in the list are sorted by the p-value of the correlation. For each correlation, a link is also provided to a simple scatter plot showing the mRNA expression levels and motif activities across the samples. You can preview the plot by hovering the mouse over the "Click!" link, or see a high-resolution image by clicking the link.
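The correlation ranking can be sketched as follows. Assuming a two-sided test and the same set of samples for every TF, the p-value of a Pearson correlation decreases monotonically with |r|, so sorting by |r| yields the same order as sorting by p-value; this sketch therefore avoids computing p-values explicitly. The TF names and values are hypothetical, and the function fails for a constant profile (zero variance).

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # undefined if either profile is constant


def rank_tfs(motif_activity, tf_expression):
    """Return (tf, r) pairs, most strongly correlated first.

    With equal sample counts, ordering by |r| matches ordering by the
    two-sided p-value (assumed test).
    """
    scores = [(tf, pearson(motif_activity, expr))
              for tf, expr in tf_expression.items()]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


activity = [0.1, 0.5, 0.9, 1.2]
tf_expression = {"TF_A": [1.0, 2.0, 3.0, 4.0], "TF_B": [4.0, 1.0, 3.0, 2.0]}
ranked = rank_tfs(activity, tf_expression)
print(ranked[0][0])  # TF_A
```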
This figure shows the inferred activities of the motif (including error bars) across all samples, with the samples ordered alphanumerically. The order of the samples in this graph is thus determined by the sample names provided by the user, which can be used to order the samples appropriately (e.g. if samples come from a time course, numbering the samples by time will make the graph show motif activity over time).
In many cases there is no natural ordering of the samples. It is then more useful to present the motif activities with the samples sorted from those in which the motif is most significantly upregulated to those in which it is most significantly downregulated. ISMARA provides such a list of motif z-values, with the samples sorted from largest to smallest z-value. For example, the bar chart of sorted E2F activities shows at a glance the cell types in which E2F activity is strongly upregulated or strongly downregulated.
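If the ordering is a plain character-by-character sort, as we assume here, numbers embedded in sample names need care: "time_10h" sorts before "time_1h". The sketch below shows the problem and one remedy, zero-padding every run of digits before naming the samples. The sample names and the zero_pad helper are hypothetical.

```python
import re


def zero_pad(name, width=3):
    """Left-pad every run of digits in a sample name with zeros."""
    return re.sub(r"\d+", lambda m: m.group().zfill(width), name)


samples = ["time_2h", "time_10h", "time_1h"]
print(sorted(samples))                       # ['time_10h', 'time_1h', 'time_2h']
print(sorted(zero_pad(s) for s in samples))  # ['time_001h', 'time_002h', 'time_010h']
```

Choose a width at least as large as the longest number in your names so that all padded numbers have the same length.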
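A sorted z-value list can be reproduced from the downloaded activity and error tables. The sketch below assumes that a sample's z-value is its activity divided by its error bar, and, for the motif as a whole, that the overall Z-value is the root-mean-square of the per-sample z-values. Both definitions are assumptions made for illustration. Sample names and values are placeholders.

```python
import math


def sample_z_values(activity, error):
    """Per-sample z-value: activity divided by its error bar (assumed definition)."""
    return {s: activity[s] / error[s] for s in activity}


def motif_z(z_values):
    """Overall motif Z as the root-mean-square of per-sample z-values (assumed)."""
    zs = list(z_values.values())
    return math.sqrt(sum(z * z for z in zs) / len(zs))


activity = {"cellA": 0.5, "cellB": -1.5, "cellC": 0.25}
error = {"cellA": 0.25, "cellB": 0.5, "cellC": 0.25}
z = sample_z_values(activity, error)      # cellA 2.0, cellB -3.0, cellC 1.0
ranked = sorted(z, key=z.get, reverse=True)
print(ranked)                             # ['cellA', 'cellC', 'cellB']
```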
It is generally desirable to gain some intuition about the pathways and biological processes targeted by a particular motif. One way to visualize the functional structure of a motif's predicted targets is to represent them as a network, with links between pairs of genes that are known to be functionally related. The STRING database maintains a curated collection of functional links between proteins, where a "functional link" can range from direct physical interaction to over-representation of the protein pair in abstracts of scientific articles. For each motif, ISMARA provides a STRING network picture of the motif's predicted targets (for legibility, at most the top 200 targets are shown). The network picture links to the interactive STRING page for this network, which offers more information and functions. For the targets of the E2F motif shown here, we see a highly connected cluster of genes involved in the regulation of the cell cycle (specifically the G1/S transition).
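Restricting the picture to the top targets amounts to taking the subgraph induced by the highest-scoring genes. The sketch below keeps at most max_nodes targets by score and only the functional links between kept genes. The input format (a score per gene and a list of gene pairs, e.g. exported from STRING) is a hypothetical simplification, as are the gene names.

```python
def target_subnetwork(target_scores, links, max_nodes=200):
    """Keep the highest-scoring targets and the functional links among them.

    target_scores: {gene: target score}; links: iterable of (gene1, gene2)
    pairs (hypothetical input format).
    """
    top = sorted(target_scores, key=target_scores.get, reverse=True)[:max_nodes]
    keep = set(top)
    edges = [(a, b) for a, b in links if a in keep and b in keep]
    return top, edges


scores = {"G1": 5.0, "G2": 4.0, "G3": 1.0}
links = [("G1", "G2"), ("G2", "G3")]
nodes, edges = target_subnetwork(scores, links, max_nodes=2)
print(nodes, edges)  # ['G1', 'G2'] [('G1', 'G2')]
```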
One of our aims is to understand the causal structure of the transcription regulatory network, and a first step in that direction is predicting direct regulatory interactions between motifs. For each motif, we check its list of predicted targets for promoters of TFs that are associated with other motifs. From this we build a regulatory network in which nodes correspond to motifs, and a directed edge from motif m to motif m′ exists whenever a promoter of at least one of the TFs associated with motif m′ is a predicted target of motif m. On the results page for a given motif, we show only the part of the interaction network centered on that motif. This network picture is interactive. The user can hide or show edges using the +/- buttons or by moving the slider on the left of the picture. Placing the mouse cursor over a motif node displays its Z-value. Placing the mouse cursor over an edge displays the names of the target genes involved in the edge, as well as the associated target scores. The target scores are log-likelihood ratios of the model including and excluding the particular target edge. The edge color indicates the type of the edge: red for edges from the central motif to another motif, blue for another motif regulating the central motif, and violet for interactions between the other motifs. The color intensity of an edge is proportional to its target score, i.e. more intense means a higher score.
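The edge rule described above can be written down directly. In the sketch below, promoters are simplified to gene names, and the input dictionaries (predicted targets per motif and TFs per motif) use hypothetical names; the edge colors follow the convention described for the motif page.

```python
def motif_network(targets, tfs_of_motif):
    """Directed edge m -> m2 whenever motif m targets a promoter of a TF of m2.

    targets: {motif: set of predicted target genes};
    tfs_of_motif: {motif: set of TF genes associated with that motif}.
    Promoters are simplified to gene names (hypothetical input).
    Returns {(m, m2): sorted list of the TF genes linking them}.
    """
    edges = {}
    for m, targeted in targets.items():
        for m2, tfs in tfs_of_motif.items():
            if m2 == m:
                continue  # only TFs associated with *other* motifs
            hit = targeted & tfs
            if hit:
                edges[(m, m2)] = sorted(hit)
    return edges


def edge_color(edge, center):
    """Color convention of the motif page relative to the central motif."""
    src, dst = edge
    if src == center:
        return "red"     # central motif regulates another motif
    if dst == center:
        return "blue"    # another motif regulates the central motif
    return "violet"      # interaction between two other motifs


targets = {"M1": {"TF_b", "TF_c"}, "M2": {"TF_a"}, "M3": {"TF_c"}}
tfs_of_motif = {"M1": {"TF_a"}, "M2": {"TF_b"}, "M3": {"TF_c"}}
edges = motif_network(targets, tfs_of_motif)
# {('M1', 'M2'): ['TF_b'], ('M1', 'M3'): ['TF_c'], ('M2', 'M1'): ['TF_a']}
```

Motif M3 targets the promoter of its own TF, but that self-edge is skipped because the rule only considers TFs of other motifs.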
ISMARA provides a list of the motif's target promoters sorted by target score. In addition, every promoter is annotated with information about the transcripts and genes associated with it. Like the motif list on the main page, this table is interactive, allowing quick search through the table and sorting by any column. By default the table shows only the top 20 targets, but the user can interactively change the number of targets shown. The promoters are linked to the SwissRegulon database, where the user is provided with a graphical representation of a one-kilobase region around the corresponding promoter, displaying TF binding sites, transcripts, and other genomic features.
ISMARA also provides a list of Gene Ontology categories that are enriched among the predicted targets of a motif. Lists are provided for the "biological process", "cellular component", and "molecular function" hierarchies. A p-value for enrichment is calculated using a simple hypergeometric test, and only categories with a p-value below 0.05 are shown. The categories are sorted by the fold enrichment of targets relative to what would be expected by chance. Like the motif table on the main page, the GOA table supports quick search and sorting by different columns.
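The same top-20 view can be recreated offline from the gzipped tar-archive of predicted targets. The sketch below assumes, hypothetically, that each member file is tab-separated with the promoter ID in the first column and the target score in the second; the member file name must be looked up in the real archive. For a self-contained example, a tiny archive is built in memory.

```python
import io
import tarfile


def top_targets_from_archive(fileobj, motif_file, n=20):
    """Return the n highest-scoring (promoter, score) rows for one motif.

    Assumed (hypothetical) member format: tab-separated, promoter ID in
    column 1 and target score in column 2; other lines are skipped.
    """
    rows = []
    with tarfile.open(fileobj=fileobj, mode="r:gz") as tar:
        handle = tar.extractfile(motif_file)
        for line in io.TextIOWrapper(handle, encoding="utf-8"):
            fields = line.rstrip("\n").split("\t")
            try:
                rows.append((fields[0], float(fields[1])))
            except (IndexError, ValueError):
                continue  # header or malformed line
    return sorted(rows, key=lambda r: r[1], reverse=True)[:n]


# Build a small placeholder archive in memory for demonstration.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    data = b"promoter\tscore\np1\t1.5\np2\t3.2\np3\t0.4\n"
    info = tarfile.TarInfo("motif_X.txt")
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))
buf.seek(0)

top = top_targets_from_archive(buf, "motif_X.txt", n=2)
print(top)  # [('p2', 3.2), ('p1', 1.5)]
```

For a downloaded archive, pass open(path, "rb") as fileobj.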
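The enrichment statistics can be sketched with the standard one-sided hypergeometric test: the p-value is the probability of drawing at least k category members among n targets when the category has K members out of N genes in total, and fold enrichment is (k/n)/(K/N). The choice of background gene set N is an assumption here, and the category names and counts are hypothetical.

```python
from math import comb


def go_enrichment(N, K, n, k):
    """One-sided hypergeometric enrichment of one GO category.

    N: genes in the background; K: background genes in the category;
    n: predicted targets; k: targets in the category.
    Returns (p_value, fold_enrichment) with p = P(X >= k).
    """
    upper = min(K, n)
    p = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)
    fold = (k / n) / (K / N)
    return p, fold


# Hypothetical counts: (N, K, n, k) per category.
categories = {"cat_A": (20, 5, 4, 3), "cat_B": (20, 10, 4, 2)}
results = {name: go_enrichment(*counts) for name, counts in categories.items()}

# Keep categories with p < 0.05, sorted by fold enrichment (largest first).
significant = sorted((name for name, (p, _) in results.items() if p < 0.05),
                     key=lambda name: results[name][1], reverse=True)
print(significant)  # ['cat_A']
```

Here cat_A has p = 155/4845 (about 0.032) and a fold enrichment of 3.0, while cat_B (p about 0.71) is filtered out.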