When a motif is assigned a positive activity in a given sample, it means that the occurrence of sites for the motif is predicted to lead to upregulation of the promoter relative to its average expression/chromatin state signal. ISMARA cannot distinguish whether this is due to increased activity of an activator or decreased activity of a repressor. Thus, if you are sure that the TFs which bind your motif of interest are repressors, then high activity in one sample means that these repressors are 'less repressing' in that sample compared to other samples. Note that ISMARA also presents information on the RNA expression of the regulators associated with a motif. Often (but not always) this RNA expression profile will show positive correlation with the activities for activators and negative correlation with the activities for repressors.
ISMARA uses a curated list of high quality regulatory motifs that combines data from motif databases, e.g. JASPAR, with our own motif finding methods applied to different data-sets. Our motif set is periodically updated to take new experimental data into account. If you believe you have a high quality position-specific weight matrix for a TF currently not represented in ISMARA's collection, please contact us as we would be interested to include it in a future release of ISMARA.
Samples are ordered in alphabetical order by filename. If you want to make sure they are listed in a particular order (e.g. ordered in time), give the files appropriate names, e.g. 00_control.CEL, 01_perturbation_1h.CEL, 02_perturbation_4h.CEL, ...
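As a minimal sketch of this naming scheme, the snippet below adds a zero-padded numeric prefix to each sample label so that alphabetical order matches the intended order. The labels and the helper name `ordered_names` are hypothetical, and the sketch assumes a plain character-by-character alphabetical sort.

```python
# Hypothetical sample labels, listed in the order they should appear.
labels = ["control", "perturbation_1h", "perturbation_4h", "perturbation_12h"]


def ordered_names(labels, ext=".CEL"):
    """Prefix each label with a zero-padded index so that sorting the
    file names alphabetically reproduces the order of `labels`."""
    width = max(2, len(str(len(labels) - 1)))
    return [f"{i:0{width}d}_{label}{ext}" for i, label in enumerate(labels)]


names = ordered_names(labels)
for name in names:
    print(name)

# With the prefixes, alphabetical order equals the intended order.
assert sorted(names) == names

# Without prefixes, "perturbation_12h" would sort before "perturbation_1h".
unprefixed = [label + ".CEL" for label in labels]
assert sorted(unprefixed) != unprefixed
```

Zero-padding matters once there are ten or more samples, since "10_..." would otherwise sort before "2_...".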
ISMARA automatically rescales plots and their labels to fit the file names specified by the user. When file names are very long, they will be truncated. In general, to improve viewability, it is advisable to keep the sample names relatively short.
On ISMARA's results page there is a button "Perform sample averaging", which opens options that allow users to specify which samples should be considered replicates of the same condition and which samples belong to common batches. For example, if each of your conditions was measured in triplicate (cond_1_rep1, cond_1_rep2, cond_1_rep3, ..., cond_N_rep1, cond_N_rep2, cond_N_rep3), then you should use the drop-down menus to create N conditions, each containing 3 samples, and then click "Submit". If the replicates were done in batches, then before submitting you should click "Advanced options" and assign your samples to the corresponding batches.
Note that the replicate/batch averaging only affects the inferred motif activities and their significance levels, i.e. the targets of each motif are not affected by the sample averaging.
This should never happen. Please contact us if you see such an error.
ISMARA: Automated modeling of genomic signals as a democracy of regulatory motifs.
Piotr J. Balwierz, Mikhail Pachkov, Phil Arnold, Andreas J. Gruber, Mihaela Zavolan & Erik van Nimwegen. Genome Research, 2014.
For each motif ISMARA provides a list of associated TFs and a table with Pearson correlation coefficients and scatter plots of the motif activity against the TFs' expression (mRNA) levels. You need to click on the links to see the plots. If the table is missing, it means that the expression of the TF is unknown: either there are no probes complementary to it on the microarray or it is not expressed. Note that we do not have expression information for miRNAs.
It is always a matter of choice where to put a cutoff on significance level. The z-statistic roughly corresponds to the number of standard deviations that the motif's activity is away from zero. Thus, for a motif with a z-value of 2.0, the motif activity is typically 2 standard deviations away from zero, which is a substantial indication of its significance. As a general guide, the motif activity profile shows error bars on the motif activity for each sample. If these error bars overlap the zero-activity axis for all samples, then the motif is likely not significant.
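The error-bar rule of thumb above can be sketched as follows. This is only an illustration under stated assumptions: the activities and error bars are made-up numbers, the function names are hypothetical, and the per-sample ratio shown here is not claimed to be the exact formula ISMARA uses for its z-value.

```python
def per_sample_ratio(activities, errors):
    """Activity divided by its error bar for each sample, i.e. how many
    error-bar widths the activity lies away from zero."""
    return [a / e for a, e in zip(activities, errors)]


def error_bars_all_overlap_zero(activities, errors):
    """True if, in every sample, the interval [activity - error,
    activity + error] contains zero."""
    return all(abs(a) <= e for a, e in zip(activities, errors))


# Hypothetical motif activities and error bars for three samples.
activities = [0.010, -0.020, 0.015]
errors = [0.030, 0.030, 0.030]

print(per_sample_ratio(activities, errors))
print(error_bars_all_overlap_zero(activities, errors))
```

A motif for which the second function returns True would, by the guideline above, likely not be significant.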
NCBI does not enforce a standard data format. Even if the platform is the same for two data sets, the actual files might contain differently processed expression levels and be written in a different format. ISMARA accepts unprocessed microarray files and reads aligned to the mouse (mm9) and human (hg19, hg18) genomes. These can be compressed with zip, bzip2, gzip and tar file compressors. Please do not create any subdirectories in the archives.
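One way to build an archive without subdirectories is sketched below, using Python's standard `tarfile` module. The function name `make_flat_archive` and the example paths are hypothetical; the only point illustrated is that each file is stored under its bare file name, whatever directory it came from.

```python
import os
import tarfile


def make_flat_archive(files, archive_path):
    """Write `files` into a gzip-compressed tar archive, storing each one
    under its base name only, so the archive has no subdirectories."""
    seen = set()
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in files:
            name = os.path.basename(path)
            if name in seen:
                # Two inputs with the same base name would collide once
                # the directory part is dropped.
                raise ValueError(f"duplicate file name: {name}")
            seen.add(name)
            tar.add(path, arcname=name)
    return archive_path


# Example call (paths are placeholders):
# make_flat_archive(["raw/00_control.CEL", "raw/01_perturbation_1h.CEL"],
#                   "upload.tar.gz")
```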
We keep each processed data set for 14 days. After this time it is deleted to save space. During this time you can download the report from the 'download' section. If you want to keep the results visible for longer, please let us know.
We support the most popular microarrays and regularly extend the list of supported microarrays. Please contact us to indicate which microarrays you would like to see supported.
Processing times depend on the number of data-sets that were submitted and on the size of the data-sets. NGS data-sets typically take longer to process than microarray data-sets. We appreciate your patience. If you have not received results after 12 hours, please contact us as this may indicate that something went wrong.
Accuracy is affected by many factors beyond which microarray was used, e.g. the purity of the cell populations and the quality of the experimental preparation of the samples likely play a much larger role. We advise the use of RNA-seq over the use of microarrays.
Some general TFs target a substantial fraction of all genes in the genome, and including all of their targets in the web page would create unwieldy HTML files. The full target lists can be downloaded from the download menu (on the left). These target files are compressed, and their format is tab-delimited with the fields: promoter, z-value, motif, and target RefSeq transcript list (if associated with the promoter). Each element of the RefSeq list is a "|"-separated list with the fields: transcript, gene symbol, GenBank gene ID, and gene name.
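A rough sketch of how one line of such a target file could be parsed is given below. Several things are assumptions and not part of the description above: the type and function names are hypothetical, the sample line in the comment is invented, and the separator between elements of the RefSeq list is not stated here, so it is exposed as the parameter `element_sep` with a placeholder default that should be checked against a real file.

```python
from typing import List, NamedTuple


class Transcript(NamedTuple):
    transcript: str
    gene_symbol: str
    gene_id: str
    gene_name: str


class TargetRow(NamedTuple):
    promoter: str
    z_value: float
    motif: str
    transcripts: List[Transcript]


def parse_transcript(element):
    """Split one RefSeq list element into its four "|"-separated fields.
    maxsplit=3 keeps any further "|" inside the gene name."""
    parts = element.split("|", 3)
    if len(parts) != 4:
        raise ValueError(f"expected 4 fields, got {len(parts)}: {element!r}")
    return Transcript(*parts)


def parse_target_line(line, element_sep=","):
    """Parse one tab-delimited line: promoter, z-value, motif and an
    optional RefSeq list. `element_sep` is an ASSUMED separator between
    list elements; verify it against your downloaded file."""
    fields = line.rstrip("\n").split("\t")
    promoter, z_value, motif = fields[:3]
    raw_list = fields[3] if len(fields) > 3 else ""
    transcripts = [parse_transcript(e) for e in raw_list.split(element_sep) if e]
    return TargetRow(promoter, float(z_value), motif, transcripts)


# Invented example line, for illustration only:
# parse_target_line("promoter_1\t3.2\tMOTIF_A\tNM_0001|GENE1|123|example gene")
```

If gene names in your file can contain the list separator, a simple split like this will not be sufficient.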
It is challenging to visualize such large networks in a way that is useful. However, once we have worked out a robust way to usefully visualize the entire network (i.e. not just dumping a hairball) we intend to include this in ISMARA's results.
In principle not. ISMARA was designed to process raw microarray or sequencing data and perform its own uniform normalization and processing procedures. Thus, whenever you have the original CEL files, or read mappings, it is always preferable to upload these directly. However, if you have only processed data, please contact us for help with processing these.
When multiple TFs can bind to the same regulatory sites, the only way to tell for certain which of these TFs is binding to the sites in the particular model system under study is by follow-up experiments. However, to help identify which TF from the family is likely responsible, we provide information on the mRNA expression levels of all TFs associated with each motif. TFs that are not expressed clearly cannot be responsible for the motif activity. Moreover, if one of the TFs shows a strong correlation in its mRNA expression with the inferred motif activities, then this TF is the prime candidate for being responsible for the motif activity in your system. Note that, since our annotation of regulatory motifs is necessarily incomplete, in some cases the responsible TF may not be in the list of associated TFs.
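To illustrate the idea of ranking candidate TFs by how strongly their mRNA expression tracks the motif activity, here is a minimal sketch using a plain Pearson correlation. The TF names and all numbers are hypothetical, and ranking by absolute correlation is a choice made here so that repressors, whose expression would correlate negatively with activity, are not pushed to the bottom.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # Correlation is undefined for a constant profile; report 0.0 here.
        return 0.0
    return sxy / sqrt(sxx * syy)


def rank_candidates(activity, expression_by_tf):
    """Return (TF, correlation) pairs sorted by absolute correlation,
    strongest first."""
    scores = {tf: pearson(activity, expr) for tf, expr in expression_by_tf.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical motif activities and TF expression levels across 4 samples.
activity = [1.0, 2.0, 3.0, 4.0]
expression_by_tf = {
    "TF_A": [2.0, 4.0, 6.0, 8.0],
    "TF_B": [1.0, 1.0, 1.0, 1.0],
    "TF_C": [4.0, 3.0, 2.0, 1.5],
}
for tf, r in rank_candidates(activity, expression_by_tf):
    print(tf, round(r, 3))
```

A high correlation only nominates a candidate; as noted above, follow-up experiments are needed to confirm which TF is responsible.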
The NGS data should be presented as alignment files containing the alignments of the sequencing reads to the hg19, hg18, or mm9 genome assemblies. It should typically be possible to provide such read alignments in the standard BED, BAM, or SAM formats. If this is not possible for your data, please contact us.
Yes, it should not be a problem. We have successfully tested ISMARA with 20 GB of uploaded data.
A TSC is a set of neighboring, co-expressed Transcription Start Sites (TSSs). For detailed background information please see: http://genomebiology.com/content/10/7/R79. Our promoters consist of clusters containing TSCs and known starts of mRNA and RefSeq transcripts. We define the proximal promoter region as the genomic region running from 500 base pairs upstream of the first TSS to 500 base pairs downstream of the last TSS in the cluster. Our transcription factor binding site predictions use these proximal promoter regions.
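The definition of the proximal promoter region can be written out as a small calculation. This is a sketch under assumptions: the function name is hypothetical, TSS positions are taken to be plain integer genome coordinates, "first" and "last" are interpreted in the direction of transcription, and the start is clamped at 0 for clusters near the beginning of a chromosome.

```python
def proximal_promoter(tss_positions, strand="+", upstream=500, downstream=500):
    """Return (start, end) of the region from `upstream` bp upstream of
    the first TSS to `downstream` bp downstream of the last TSS."""
    lowest, highest = min(tss_positions), max(tss_positions)
    if strand == "+":
        # Upstream is towards lower coordinates on the plus strand.
        start, end = lowest - upstream, highest + downstream
    elif strand == "-":
        # On the minus strand the first TSS has the highest coordinate,
        # and upstream is towards higher coordinates.
        start, end = lowest - downstream, highest + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    return max(start, 0), end


# Hypothetical cluster with TSSs at three positions.
print(proximal_promoter([1000, 1040, 1100], "+"))
print(proximal_promoter([1000, 1040, 1100], "-"))
```

Because the flanks are both 500 base pairs, the resulting interval is the same on either strand; the strand only matters if the two flanks differ.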
Currently human, mouse, and Saccharomyces cerevisiae are supported. We plan to provide support for Drosophila and E. coli in the near future. If you are particularly interested in running ISMARA for worm or another organism, please contact us.
All the true replicates which we have looked at so far look very similar in terms of activity profiles. If this is not the case for you, there might be something wrong, and you might want to check that there is no mix-up of the samples or other error. For example, you could check whether the expression levels in the replicates are close.
Potential strategies include knocking down a TF of interest, overexpressing it, or doing a ChIP experiment with an antibody for the TF, but many other possibilities exist. You could take a look at the following papers that have used ISMARA and, in many cases, performed follow-up experimentation to validate its predictions.
Yes! ISMARA now supports FASTQ files. Your data set can be either a set of single-end read files or a set of paired-end read files. Paired-end data must be submitted as two files per sample, with the suffix "_R1" for the first end and "_R2" for the second end, for example "sample1_R1.fastq.gz" and "sample1_R2.fastq.gz". If your data is single-end, please avoid these suffixes ("_R1" and "_R2") in the file names.
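Before uploading, the naming convention can be checked locally with a short script such as the sketch below. The function name and the file names are hypothetical, and the accepted extensions (".fastq" and ".fastq.gz") are an assumption based on the example above; this is not the check that ISMARA itself performs.

```python
import re
from collections import defaultdict

# Matches e.g. "sample1_R1.fastq.gz"; the extension pattern is an assumption.
PAIRED_NAME = re.compile(r"^(?P<sample>.+)_(?P<read>R[12])\.fastq(?:\.gz)?$")


def classify_fastq(filenames):
    """Group file names into complete pairs, single-end files, and samples
    for which only one of the "_R1"/"_R2" files is present."""
    reads_by_sample = defaultdict(dict)
    single_end = []
    for name in filenames:
        match = PAIRED_NAME.match(name)
        if match:
            reads_by_sample[match["sample"]][match["read"]] = name
        else:
            single_end.append(name)
    incomplete = sorted(
        sample for sample, reads in reads_by_sample.items()
        if set(reads) != {"R1", "R2"}
    )
    pairs = {
        sample: (reads["R1"], reads["R2"])
        for sample, reads in reads_by_sample.items()
        if sample not in incomplete
    }
    return pairs, single_end, incomplete


# Hypothetical upload list.
files = [
    "sample1_R1.fastq.gz",
    "sample1_R2.fastq.gz",
    "sample2.fastq.gz",
    "sample3_R1.fastq.gz",
]
pairs, single_end, incomplete = classify_fastq(files)
print(pairs)
print(single_end)
print(incomplete)
```

Any sample reported as incomplete is either missing its mate file or is single-end data whose name should be changed to avoid the "_R1"/"_R2" suffix.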
Please write a letter to: firstname.lastname@example.org