When a regulatory motif is assigned a positive activity in a given sample, it means that the occurrence of sites for the motif is predicted to lead to an upregulation of the promoter relative to its average expression/chromatin state signal. ISMARA cannot distinguish whether this is due to an increased activity of an activator or a decreased activity of a repressor. Thus, if you are sure that the TFs which bind a motif of interest are repressors, then high activity in one sample means that these repressors are "less repressing" in that sample compared to other samples. On the page of a regulatory motif there is a section "Activity-expression correlation" where the user can see how the expression of a transcription factor is associated with the activity of its motif. Often (but not always) this RNA expression profile will show positive correlation with the activities for activators and negative correlation with the activities for repressors.
ISMARA uses a curated list of high-quality regulatory motifs that combines data from motif databases (e.g. JASPAR, SwissRegulon, etc.) with our own motif-finding methods applied to different data-sets. Our motif set is periodically updated to take new experimental data into account. If you believe you have a high-quality position-specific weight matrix for a TF currently not represented in ISMARA's collection, please contact us, as we would be interested in including it in a future release of ISMARA.
Samples are ordered in alphabetical order by filename. If you
want to make sure they are listed in a particular order
(e.g. ordered in time), give the files appropriate names, e.g.:
00_control.CEL,
01_perturbation_1h.CEL,
02_perturbation_4h.CEL, ...
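
The naming scheme above can be generated automatically. The following is a minimal sketch, not part of ISMARA: the function name `ordered_names` and the default `.CEL` extension are assumptions made for illustration. It prefixes each label with a zero-padded index so that alphabetical order matches the intended order, even with ten or more samples.

```python
def ordered_names(labels, extension=".CEL"):
    """Prefix each label with a zero-padded index (hypothetical helper).

    Zero padding matters: without it, "10_x" sorts before "2_x".
    """
    # Use at least two digits, more if there are over 100 samples.
    width = max(2, len(str(len(labels) - 1)))
    return [f"{i:0{width}d}_{label}{extension}" for i, label in enumerate(labels)]


names = ordered_names(["control", "perturbation_1h", "perturbation_4h"])
print(names)
# ['00_control.CEL', '01_perturbation_1h.CEL', '02_perturbation_4h.CEL']
```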
ISMARA automatically rescales plots and their labels to fit the file names specified by the user. Very long file names are truncated. However, if you move your cursor to a point of interest on the activity plot (or almost any other plot), a popup box shows the full sample name and the activity value with its error bar.
On ISMARA's results page there is a button "Perform sample
averaging" which opens the sample averaging options. These allow
users to specify which samples should be considered replicates of
the same condition and which samples belong to common batches. For
example, if each of your conditions was measured in triplicate:
cond_1_rep1, cond_1_rep2, cond_1_rep3, ..., cond_N_rep1,
cond_N_rep2, cond_N_rep3, then you should use the drop-down menus
to create N conditions, each containing 3 samples, and then click
"Submit". If the replicates were done in batches, then before
submitting you should click "Advanced options" and assign your
samples to the corresponding batches.
Note that the replicate/batch averaging only affects the inferred
motif activities and their significance levels, i.e. the targets of each
motif are not affected by the sample averaging.
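
Before filling in the drop-down menus it can help to list which samples belong to which condition. The sketch below only groups sample names. It does not reproduce ISMARA's averaging. The function `group_replicates` and the `_rep<number>` naming pattern are assumptions taken from the example names above.

```python
import re
from collections import defaultdict


def group_replicates(sample_names, pattern=r"^(?P<condition>.+)_rep\d+$"):
    """Group sample names by condition (hypothetical helper).

    Names that do not match the assumed pattern form their own condition.
    """
    groups = defaultdict(list)
    for name in sample_names:
        match = re.match(pattern, name)
        key = match.group("condition") if match else name
        groups[key].append(name)
    return dict(groups)


samples = ["cond_1_rep1", "cond_1_rep2", "cond_1_rep3", "cond_2_rep1"]
for condition, members in group_replicates(samples).items():
    print(condition, members)
```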
This should never happen. Please contact us if you see such an error.
For each motif ISMARA provides a list of associated TFs and a table with Pearson correlation coefficients and scatter plots of the motif activity against the TFs' expression (mRNA) levels. You need to click on the links to see the plots. If the table is missing, it means that the expression of the TF is unknown: either there are no probes complementary to it on the microarray or it is not expressed. Note that we do not have expression information for miRNAs.
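
For readers who want to recompute such a coefficient from downloaded values, here is a minimal sketch of the standard Pearson correlation between a motif activity profile and a TF expression profile across samples. The function name `pearson` and the example numbers are illustrative and are not ISMARA output.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


# Made-up values, one entry per sample.
activity = [-0.10, 0.02, 0.08, 0.15]
expression = [5.1, 6.0, 6.4, 7.2]
print(round(pearson(activity, expression), 3))
```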
It is always a matter of choice where to put a cutoff on the significance level. The z-statistic roughly corresponds to the number of standard deviations that the motif's activity is away from zero. Thus, for a motif with a z-value of 2.0, the motif activity is typically 2 standard deviations away from zero, which is a substantial indication of its significance. As a general guide, the motif activity profile shows error bars on the motif activity for each sample. If these error bars overlap the zero-activity axis for all samples, then the motif is likely not significant.
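
The error-bar rule of thumb can be written down as a small check. This is only a sketch of that visual guide, not ISMARA's z-value computation. It assumes each error is the half-width of a symmetric error bar, and the function names are hypothetical.

```python
def overlaps_zero(activity, error):
    """True if the interval [activity - error, activity + error] contains zero."""
    return activity - error <= 0.0 <= activity + error


def all_error_bars_overlap_zero(activities, errors):
    """True if every sample's error bar crosses the zero-activity axis."""
    return all(overlaps_zero(a, e) for a, e in zip(activities, errors))


# Made-up values: the second motif has one sample clearly away from zero.
print(all_error_bars_overlap_zero([0.01, -0.02], [0.05, 0.03]))  # True
print(all_error_bars_overlap_zero([0.20, 0.00], [0.05, 0.10]))   # False
```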
NCBI does not enforce a standard data format. Even if the platform is the same for two data sets, the actual files might contain differently processed expression levels and be written in a different format. ISMARA accepts unprocessed microarray files in Affymetrix .CEL format, unaligned reads in .FASTQ format, and reads aligned to the mouse (mm9, mm10, mm39), human (hg38, hg19, hg18), rat (rn6), zebrafish (dr11), Arabidopsis, yeast or E. coli genomes in .BAM/.SAM/.BED format. These can be compressed with zip, bzip2, gzip and tar. Please do not create any subdirectories in the archives.
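
One way to make sure an archive contains no subdirectories is to store every file under its base name only. The sketch below uses Python's standard `tarfile` module. The function name `make_flat_archive` and the example paths are assumptions for illustration.

```python
import os
import tarfile


def make_flat_archive(file_paths, archive_path):
    """Write a gzip-compressed tar archive with all files at the top level."""
    names = [os.path.basename(path) for path in file_paths]
    if len(set(names)) != len(names):
        # Two files with the same base name would collide once flattened.
        raise ValueError("file names must be unique after removing directories")
    with tarfile.open(archive_path, "w:gz") as archive:
        for path, name in zip(file_paths, names):
            # arcname drops the directory part of the stored path.
            archive.add(path, arcname=name)
    return names


# Example call (paths are hypothetical):
# make_flat_archive(["data/run1/00_control.CEL"], "upload.tar.gz")
```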
We keep each processed data set for 14 days on the ISMARA web server. After this time it is deleted to save space. During this time you can download the report from the "download" section. If you want to keep the results visible for longer, please let us know.
We support the most popular microarrays and regularly extend the list of supported microarrays. Please contact us to indicate which microarrays you would like to see supported by the ISMARA tool.
Processing times depend on the number of data-sets that were submitted and on the size of the data-sets. NGS data-sets typically take longer to process than microarray datasets. We appreciate your patience. If you have not received results after 12 hours, please contact us as this may indicate that something went wrong.
Accuracy is affected by many factors beyond which microarray was used, e.g. the purity of the cell populations and the quality of the experimental preparation of the samples likely play a much larger role. We advise the use of RNA-seq over the use of microarrays.
Some general transcription factors target a substantial fraction of all genes in the genome, and including all of their targets in the web page would create unwieldy HTML files. The full target lists can be downloaded from the download menu (on the left) via the link "Regulatory interactions". These target files are compressed, and their format is tab-delimited with the fields: promoter, z-value, motif and target transcript list (if associated with the promoter). Each element of the associated transcript list is a "|"-separated list with the fields: transcript, gene symbol, gene ID and gene name.
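
A line of such a target file could be parsed roughly as follows. This is a sketch under stated assumptions: the text above does not say which character separates the elements of the transcript list, so the caller must pass it as `element_separator` after inspecting a real file. The type names and the example line are hypothetical, not real ISMARA output.

```python
from typing import NamedTuple


class Transcript(NamedTuple):
    transcript: str
    gene_symbol: str
    gene_id: str
    gene_name: str


class Target(NamedTuple):
    promoter: str
    z_value: float
    motif: str
    transcripts: list


def parse_target_line(line, element_separator):
    """Parse one tab-delimited line: promoter, z-value, motif, transcript list."""
    fields = line.rstrip("\n").split("\t")
    promoter, z_value, motif = fields[:3]
    transcripts = []
    # The transcript list is present only if the promoter has transcripts.
    if len(fields) > 3 and fields[3]:
        for element in fields[3].split(element_separator):
            parts = element.split("|")
            if len(parts) != 4:
                raise ValueError(f"expected 4 '|'-separated fields: {element!r}")
            transcripts.append(Transcript(*parts))
    return Target(promoter, float(z_value), motif, transcripts)


# Made-up line; ";" as list separator is an assumption for this example only.
example = "prom_1\t3.5\tMOTIF_A\ttx1|GENE1|101|gene one;tx2|GENE1|101|gene one"
print(parse_target_line(example, element_separator=";"))
```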
It is challenging to visualize such large networks in a way that is useful. However, once we have worked out a robust way to usefully visualize the entire network (i.e. not just dumping a hairball) we intend to include this in ISMARA's results.
In principle not. ISMARA was designed to process raw microarray or sequencing data and perform its own uniform normalization and processing procedures. Thus, whenever you have the original CEL files or unmapped or mapped reads, it is always preferable to upload these directly. However, if you have only processed data, please contact us for help with processing these.
When multiple transcription factors can bind to the same regulatory sites, the only way to tell for certain which of these TFs is binding to the sites in the particular model system under study is by follow-up experiments. However, to help identify which TF from the family is likely responsible, we provide information on the correlation of the transcription factor gene expression levels with the corresponding regulatory motif activity. TFs that are not expressed clearly cannot be responsible for the motif activity. Moreover, if one of the TFs shows a strong correlation in its mRNA expression with the inferred motif activities, then this TF is the prime candidate for being responsible for the motif activity in your system. Note that, since our annotation of regulatory motifs is necessarily incomplete, in some cases the responsible TF may not be in the list of associated TFs.
The NGS data should be presented as alignment files containing the alignments of the sequencing reads to a supported version of the genome assembly. It should typically be possible to provide such read alignments in the standard BED, BAM, or SAM formats. If this is not possible for your data, please contact us.
Yes, it should not be a problem. We have successfully tested ISMARA with an uploaded dataset of 1Tb.
A TSC is a set of neighboring, co-expressed Transcription Start Sites (TSSs). For detailed background information please look at: http://genomebiology.com/content/10/7/R79. Our promoters consist of clusters containing TSCs and known starts of mRNAs and annotated transcripts. We define the proximal promoter region as the genomic region running from 500 base pairs upstream of the first TSS to 500 base pairs downstream of the last TSS in the cluster. Our transcription factor binding site predictions use these proximal promoter regions.
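
As an illustration of this definition, the sketch below computes the proximal promoter interval from the TSS positions of one cluster. The function name `proximal_promoter` is hypothetical, coordinates are assumed to be positions on a single chromosome, and no clamping at chromosome ends is attempted. Because the flank is the same on both sides, the resulting interval is identical for either strand.

```python
def proximal_promoter(tss_positions, flank=500):
    """Return (start, end) spanning all TSSs of a cluster plus a flank on each side."""
    if not tss_positions:
        raise ValueError("a promoter cluster needs at least one TSS")
    # "First" and "last" TSS are the extremes of the cluster along the genome.
    return min(tss_positions) - flank, max(tss_positions) + flank


# Made-up TSS positions for one cluster.
print(proximal_promoter([1000, 1040, 1100]))  # (500, 1600)
```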
Currently human, mouse, rat, zebrafish, yeast, Arabidopsis and E. coli are supported. We plan to provide support for more species in the near future. If you are particularly interested in running ISMARA for worm or another organism, please contact us.
All true replicates that we have examined so far look very similar in terms of activity profiles. If this is not the case for you, something might be wrong, and you might want to check that there is no mix-up of the samples or other error. For example, you could check whether the expression levels in the replicates are close.
Potential strategies include knocking down a transcription factor of interest, overexpressing it, or doing a ChIP experiment with an antibody for the transcription factor, but many other possibilities exist. You could take a look at the following papers that have used ISMARA and, in many cases, performed follow-up experimentation to validate its predictions.
Yes! ISMARA supports FASTQ files. Your dataset can be either a set of single-end read files or a set of paired-end read files. Paired-end data must be submitted as two files per sample, with the suffix "_R1" for the first end and "_R2" for the second end, for example "sample1_R1.fastq.gz" and "sample1_R2.fastq.gz". If your data is single-end, please avoid these suffixes ("_R1" and "_R2") in the file names.
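
The naming requirement can be checked before uploading. The following sketch is not ISMARA's own validation. The function name `pair_fastq_files` is hypothetical, and it assumes that the sample name contains no dots, so that the suffix sits directly before the first dot of the file name.

```python
def pair_fastq_files(file_names):
    """Split file names into complete _R1/_R2 pairs and single-end files."""
    pairs = {}
    single = []
    for name in sorted(file_names):
        stem = name.split(".", 1)[0]  # "sample1_R1.fastq.gz" -> "sample1_R1"
        if stem.endswith("_R1") or stem.endswith("_R2"):
            sample, end = stem[:-3], stem[-2:]
            pairs.setdefault(sample, {})[end] = name
        else:
            single.append(name)
    incomplete = sorted(s for s, ends in pairs.items() if set(ends) != {"R1", "R2"})
    if incomplete:
        raise ValueError(f"samples missing one end: {incomplete}")
    return pairs, single


pairs, single = pair_fastq_files(["sample1_R2.fastq.gz", "sample1_R1.fastq.gz"])
print(pairs)
# {'sample1': {'R1': 'sample1_R1.fastq.gz', 'R2': 'sample1_R2.fastq.gz'}}
```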
Please send an email to: swissregulon@gmail.com