ISMARA Frequently Asked Questions:

Q: My favorite transcription factor (TF) is a repressor, what does it mean when ISMARA says it is active?
Q: Why is my favorite TF not listed?
Q: In which order will the samples be shown?
Q: I cannot read the sample labels in the activity plot. They look truncated.
Q: I have replicate experiments. How exactly should I average them?
Q: I am getting an error while uploading expression data: `connection was reset'
Q: I have found ISMARA useful in my research. How should ISMARA be cited?
Q: Can I correlate TF expression with motif activity?
Q: Which z-value is significant enough to state that a TF is changing its activity?
Q: I downloaded a sample from NCBI's Gene Expression Omnibus. Why doesn't it work with ISMARA?
Q: I cannot access my results anymore. It says: `page not found'
Q: A microarray of my choice is not currently supported. Will it be in the future?
Q: I uploaded my expression data a couple of hours ago and I still haven't received any results. How long do I need to wait?
Q: Which microarrays would you recommend if I want my ISMARA results to be as accurate as possible?
Q: Why are only the top 200 target promoters listed? How do I get the full list?
Q: Why is no single global transcription regulatory network plotted?
Q: I have an Excel table which lists genes and their fold-changes. Can ISMARA read it?
Q: A motif is associated with a family of TFs. How can I know which one of the family is responsible?
Q: Can I provide NGS reads in a format other than BED/BAM/SAM?
Q: My BED/BAM/SAM files are really big. Is it OK to upload them?
Q: What is the definition of a promoter, and how does it relate to Transcription Start Clusters (TSCs)?
Q: Is there ISMARA for worm, fly, etc.?
Q: Replicated experiments don't get similar predicted motif activities in the ISMARA output. Why?
Q: How do I validate ISMARA's predictions in a follow-up experiment?
Q: Can I upload FASTQ files?
Q: I have some other questions which are not listed here.

Q: My favorite transcription factor (TF) is a repressor, what does it mean when ISMARA says it is active?

When a regulatory motif is assigned a positive activity in a given sample, it means that occurrences of sites for the motif are predicted to lead to an upregulation of the promoter relative to its average expression/chromatin state signal. ISMARA cannot distinguish whether this is due to increased activity of an activator or decreased activity of a repressor. Thus, if you are sure that the TFs which bind your motif of interest are repressors, then a high activity in one sample means that these repressors are `less repressing' in that sample than in other samples. Each motif's page has a section "Activity-expression correlation" where you can see how the expression of a transcription factor is associated with the activity of its motif. Often (but not always), this mRNA expression profile correlates positively with the motif activities for activators and negatively for repressors.
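To make the sign convention concrete, here is a minimal sketch assuming the linear model described in the ISMARA paper, in which the signal of promoter p in sample s is a promoter offset plus a sample offset plus, summed over motifs, the number of sites N_pm times the motif activity A_ms. The helper name and all numbers are hypothetical. The point is only that the same sites push the promoter up when the activity is positive and down when it is negative, whatever kind of TF binds them.

```python
def predicted_signal(c_p, c_s, site_counts, activities):
    """Signal of one promoter in one sample under an assumed linear model:
    e_ps = c_p + c_s + sum_m N_pm * A_ms (hypothetical helper)."""
    return c_p + c_s + sum(n * a for n, a in zip(site_counts, activities))

# One promoter with 2 sites for motif 0 and none for motif 1 (made-up values).
sites = [2.0, 0.0]
baseline = predicted_signal(5.0, 0.0, sites, [0.0, 0.0])
# Positive activity: the sites raise the promoter above its baseline, which is
# also what a repressor that is "less repressing" in this sample looks like.
up = predicted_signal(5.0, 0.0, sites, [0.5, 0.0])
# Negative activity: the same sites pull the promoter below its baseline.
down = predicted_signal(5.0, 0.0, sites, [-0.5, 0.0])
print(baseline, up, down)  # prints: 5.0 6.0 4.0
```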

Q: Why is my favorite TF not listed?

ISMARA uses a curated list of high-quality regulatory motifs that combines data from motif databases (e.g. JASPAR, SwissRegulon) with our own motif-finding methods applied to different data sets. Our motif set is periodically updated to take new experimental data into account. If you believe you have a high-quality position-specific weight matrix for a TF not currently represented in ISMARA's collection, please contact us, as we would be interested in including it in a future release of ISMARA.

Q: In which order will the samples be shown?

Samples are shown in alphabetical order of their file names. If you want to make sure they are listed in a particular order (e.g. in time order), give the files appropriate names, e.g.:
00_control.CEL,
01_perturbation_1h.CEL,
02_perturbation_4h.CEL, ...
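Alphabetical ordering compares names character by character, so the numeric prefixes need a fixed width. A minimal sketch, assuming ISMARA's ordering behaves like Python's plain string sort (case handling or locale rules on the server might differ); `padded_name` and the sample labels are hypothetical.

```python
def padded_name(index, label, width=2):
    """Prefix a label with a zero-padded sample index (hypothetical helper)."""
    return f"{index:0{width}d}_{label}"

samples = [(0, "control.CEL"), (1, "perturbation_1h.CEL"),
           (2, "perturbation_4h.CEL"), (10, "perturbation_48h.CEL")]

unpadded = [f"{i}_{label}" for i, label in samples]
padded = [padded_name(i, label) for i, label in samples]

# Without padding, "10_..." sorts before "1_..." because "0" comes before "_".
print(sorted(unpadded))
# With padding, the sorted order equals the intended order.
print(sorted(padded) == padded)
```

Use a width large enough for your largest index, e.g. `width=3` for more than 100 samples.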

Q: I cannot read the sample labels in the activity plot. They look truncated.

ISMARA automatically rescales plots and their labels to fit the file names you specified. When file names are very long, they are truncated. However, if you hover your cursor over a point of interest in the activity plot (or almost any other plot), a pop-up box shows the full sample name and the activity value with its error bar.

Q: I have replicate experiments. How exactly should I average them?

On ISMARA's results page there is a button "Perform sample averaging", which opens a form where you can specify which samples are replicates of the same condition and which samples belong to common batches. For example, if each condition was measured in triplicate (cond_1_rep1, cond_1_rep2, cond_1_rep3, ..., cond_N_rep1, cond_N_rep2, cond_N_rep3), use the drop-down menus to create N conditions, each containing 3 samples, and then click "Submit". If the replicates were done in batches, click "Advanced options" before submitting and assign your samples to the corresponding batches.
Note that the replicate/batch averaging only affects the inferred motif activities and their significance levels, i.e. the targets of each motif are not affected by the sample averaging.
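Before filling in the drop-down menus, it can help to check that your file names split cleanly into conditions. A minimal sketch, assuming (hypothetically) that replicate files are named `<condition>_rep<k>` as in the example above; `group_replicates` only plans the grouping you would then enter by hand and does not talk to ISMARA.

```python
import re
from collections import defaultdict

# Assumed naming scheme: <condition>_rep<k>[.ext...], e.g. cond_1_rep2.CEL
REPLICATE_PATTERN = re.compile(r"^(?P<condition>.+)_rep(?P<rep>\d+)(?:\.[^.]+)*$")

def group_replicates(filenames):
    """Map each condition to its sorted list of replicate files."""
    groups = defaultdict(list)
    for name in filenames:
        match = REPLICATE_PATTERN.match(name)
        if match is None:
            raise ValueError(f"{name!r} does not follow the <condition>_rep<k> scheme")
        groups[match.group("condition")].append(name)
    return {cond: sorted(names) for cond, names in sorted(groups.items())}

files = [f"cond_{c}_rep{r}.CEL" for c in (1, 2) for r in (1, 2, 3)]
for condition, replicates in group_replicates(files).items():
    print(condition, replicates)
```

Each printed line corresponds to one condition to create in the averaging form.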

Q: I am getting an error while uploading expression data: `connection was reset'

This should never happen. Please contact us if you see such an error.

Q: How should ISMARA be cited?

ISMARA: Automated modeling of genomic signals as a democracy of regulatory motifs.
Piotr J. Balwierz, Mikhail Pachkov, Phil Arnold, Andreas J. Gruber, Mihaela Zavolan & Erik van Nimwegen. Genome Research, 2014.

Q: Can I correlate TF expression with motif activity?

For each motif, ISMARA provides a list of associated TFs and a table with the Pearson correlation coefficients between the motif activity and the TFs' expression (mRNA) levels, together with scatter plots. Click on the links to see the plots. If the table is missing, the expression of the TF is unknown: either there are no probes complementary to it on the microarray or it is not expressed. Note that we do not have expression information for miRNAs.
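For reference, the Pearson coefficient in such a table can be recomputed from a motif's activity profile and a TF's expression profile across the same samples. A minimal sketch with made-up values for four samples; using log2 expression here is our assumption, not a statement about how ISMARA scales expression.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (sx * sy)

# Made-up profiles: motif activity and log2 mRNA level of one TF per sample.
activity = [-0.4, -0.1, 0.2, 0.5]
tf_log2_expression = [6.0, 6.3, 7.1, 7.4]
print(round(pearson(activity, tf_log2_expression), 3))
```

A value near +1, as here, is the pattern expected for an activator; a repressor would more often give a negative value.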

Q: Which z-value is significant enough to state that a TF is changing its activity?

Where to put a significance cutoff is always a matter of choice. The z-statistic roughly corresponds to the number of standard deviations that the motif's activity is away from zero. Thus, for a motif with a z-value of 2.0, the motif activity is typically 2 standard deviations away from zero, which is a substantial indication of its significance. As a general guide, the motif activity profile shows error bars on the motif activity for each sample. If these error bars overlap the zero-activity axis for all samples, the motif is likely not significant.
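The visual check above (do all error bars overlap zero?) is easy to reproduce from the activities and error bars. A minimal sketch with made-up numbers; `combined_z` assumes the overall z-value is the root-mean-square of the per-sample ratios activity/error, which is our reading of the ISMARA paper and should be checked against it before you rely on it.

```python
from math import sqrt

def error_bars_cross_zero(activities, errors):
    """True if, in every sample, the interval [A - err, A + err] contains zero."""
    return all(abs(a) <= e for a, e in zip(activities, errors))

def combined_z(activities, errors):
    """Root-mean-square of per-sample ratios A / err (assumed definition)."""
    ratios = [a / e for a, e in zip(activities, errors)]
    return sqrt(sum(r * r for r in ratios) / len(ratios))

quiet = ([0.1, -0.05, 0.15], [0.2, 0.2, 0.2])
active = ([0.3, -0.1, 0.6], [0.2, 0.2, 0.2])
print(error_bars_cross_zero(*quiet), round(combined_z(*quiet), 2))
print(error_bars_cross_zero(*active), round(combined_z(*active), 2))
```

The "active" profile has two samples whose error bars clear zero and a combined value close to 2, while the "quiet" one stays well below 1.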

Q: I downloaded a sample from NCBI's Gene Expression Omnibus. Why doesn't it work with ISMARA?

NCBI's GEO does not enforce a standard data format. Even if two data sets use the same platform, the actual files might contain differently processed expression levels and be written in different formats. ISMARA accepts unprocessed microarray files in Affymetrix .CEL format, unaligned reads in .FASTQ format, and reads aligned to the mouse (mm9, mm10, mm39), human (hg38, hg19, hg18), rat (rn6), zebrafish (dr11), Arabidopsis, yeast, and E. coli genomes in .BAM/.SAM/.BED format. Files can be compressed or archived with zip, bzip2, gzip, or tar. Please do not create any subdirectories in the archives.
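If your files live in nested folders, the archive can still be kept flat by storing each file under its base name only. A minimal sketch using Python's standard `tarfile` module with gzip compression; `make_flat_archive` and the example paths are hypothetical.

```python
import os
import tarfile

def make_flat_archive(archive_path, files):
    """Pack files into a .tar.gz archive with no subdirectories inside."""
    names = [os.path.basename(f) for f in files]
    if len(set(names)) != len(names):
        # Flattening would make two inputs collide under the same name.
        raise ValueError("two input files share the same base name")
    with tarfile.open(archive_path, "w:gz") as tar:
        for path, name in zip(files, names):
            # arcname drops the directory part, so each member sits at top level.
            tar.add(path, arcname=name)
    return names
```

Usage, with hypothetical paths: `make_flat_archive("upload.tar.gz", ["run1/raw/00_control.CEL", "run1/raw/01_perturbation_1h.CEL"])`.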

Q: I cannot access my results anymore. It says: `page not found'

We keep each processed data set for 14 days on the ISMARA web server. After this time it is deleted to save space. During this time you can download the report from the `download' section. If you want to keep the results visible for longer, please let us know.

Q: My microarray is currently not supported. Will it be in the future?

We support the most popular microarrays and regularly extend the list of supported microarrays. Please contact us to let us know which microarrays you would like ISMARA to support.

Q: I uploaded my expression data a couple of hours ago and I still haven't received any results. How long do I need to wait?

Processing times depend on the number of data-sets that were submitted and on the size of the data-sets. NGS data-sets typically take longer to process than microarray datasets. We appreciate your patience. If you have not received results after 12 hours, please contact us as this may indicate that something went wrong.

Q: Which microarrays would you recommend if I want my ISMARA results to be as accurate as possible?

Accuracy is affected by many factors beyond the choice of microarray; for example, the purity of the cell populations and the quality of the experimental sample preparation likely play a much larger role. We advise using RNA-seq rather than microarrays.

Q: Why are only the top 200 target promoters listed? How do I get the full list?

Some general transcription factors target a substantial fraction of all genes in the genome, and including all targets in the web page would create unwieldy HTML files. The full target lists can be downloaded from the download menu (on the left) via the link "Regulatory interactions". These target files are compressed; each line contains the tab-delimited fields promoter, z-value, motif, and target transcript list (if transcripts are associated with the promoter). Each element of the transcript list is a "|"-separated list with the fields transcript, gene symbol, gene ID, and gene name.
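A minimal parser for one line of such a file, following the field order given above. The separator between elements of the transcript list is not specified here, so `item_sep="," ` is an assumption you should check against a real file (a comma would also clash with gene names that contain commas). The example line and identifiers are made up. If the file turns out to be gzip-compressed, it can be read line by line with `gzip.open(path, "rt")`.

```python
from dataclasses import dataclass

@dataclass
class Target:
    transcript: str
    gene_symbol: str
    gene_id: str
    gene_name: str

def parse_interaction_line(line, item_sep=","):
    """Split one line into (promoter, z-value, motif, list of Target).

    item_sep separates transcripts within the list; "," is an assumption.
    """
    fields = line.rstrip("\n").split("\t")
    promoter, z_value, motif = fields[0], float(fields[1]), fields[2]
    targets = []
    if len(fields) > 3 and fields[3]:
        for item in fields[3].split(item_sep):
            # maxsplit=3 keeps any "|" inside the gene name in the last field
            transcript, symbol, gene_id, gene_name = item.split("|", 3)
            targets.append(Target(transcript, symbol, gene_id, gene_name))
    return promoter, z_value, motif, targets

line = "chr1_1000_1200_+\t3.2\tMOTIF_X\tENST01|GENEA|101|gene a,ENST02|GENEB|102|gene b\n"
print(parse_interaction_line(line))
```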

Q: Why is no single global transcription regulatory network plotted?

It is challenging to visualize such large networks in a way that is useful. However, once we have worked out a robust way to usefully visualize the entire network (i.e. not just dumping a hairball) we intend to include this in ISMARA's results.

Q: I have an Excel table which lists genes and their fold-changes. Can ISMARA read it?

Not directly. ISMARA was designed to process raw microarray or sequencing data and to apply its own uniform normalization and processing procedures. Thus, whenever you have the original CEL files or unmapped or mapped reads, it is always preferable to upload these directly. However, if you have only processed data, please contact us for help with processing them.

Q: A motif is associated with a family of transcription factors. How can I know which one of the family is responsible?

When multiple transcription factors can bind the same regulatory sites, the only way to tell for certain which of them binds the sites in your particular model system is through follow-up experiments. However, to help identify which TF from the family is likely responsible, we provide the correlation between the expression levels of the transcription factor genes and the corresponding motif activity. TFs that are not expressed clearly cannot be responsible for the motif activity. Moreover, if one of the TFs shows a strong correlation between its mRNA expression and the inferred motif activities, this TF is the prime candidate for being responsible for the motif activity in your system. Note that, since our annotation of regulatory motifs is necessarily incomplete, in some cases the responsible TF may not be in the list of associated TFs.
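The two rules of thumb above (discard TFs that are not expressed, then favour the strongest correlation) can be combined into a simple ranking. A minimal sketch: the TF names, expression values, correlations, and the `min_expression` threshold are all hypothetical and up to you. Ranking by the absolute correlation is our choice, because a repressor may correlate negatively with its motif's activity.

```python
def rank_candidates(tf_stats, min_expression):
    """Rank the TFs of a motif family as candidates for its activity.

    tf_stats maps TF name -> (expression level, correlation with motif activity).
    TFs below min_expression are treated as not expressed and dropped; the rest
    are ordered by the strength |r| of their correlation, strongest first.
    """
    expressed = {tf: s for tf, s in tf_stats.items() if s[0] >= min_expression}
    return sorted(expressed, key=lambda tf: abs(expressed[tf][1]), reverse=True)

stats = {"TF_A": (8.0, 0.85), "TF_B": (1.0, 0.95), "TF_C": (7.5, -0.40)}
print(rank_candidates(stats, min_expression=4.0))
```

TF_B is excluded despite its high correlation because it falls below the expression threshold.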

Q: Can I provide NGS reads in a format other than BED/BAM/SAM?

NGS data can be provided either as raw reads in FASTQ format or as alignment files containing the alignments of the sequencing reads to one of the supported genome assemblies. It should typically be possible to provide such read alignments in the standard BED, BAM, or SAM formats. If this is not possible for your data, please contact us.

Q: My BED/BAM/SAM files are really big. Is it OK to upload them?

Yes, this should not be a problem. We have successfully tested ISMARA with an upload of a 1 TB data set.

Q: What is the definition of a promoter, and how does it relate to Transcription Start Clusters (TSCs)?

A TSC is a set of neighboring, co-expressed Transcription Start Sites (TSSs). For detailed background information, see: http://genomebiology.com/content/10/7/R79. Our promoters are clusters containing TSCs together with the known start sites of mRNAs and annotated transcripts. We define the proximal promoter region as the genomic region running from 500 base pairs upstream of the first TSS to 500 base pairs downstream of the last TSS in the cluster. Our transcription factor binding site predictions use these proximal promoter regions.
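The definition above translates into a short interval computation. Because the same 500 bp flank is added on both sides, the region is the same on either strand: it spans the outermost TSS coordinates, extended by 500 bp in each direction. A minimal sketch; the coordinate convention and the clamping at position 0 are our assumptions, and `proximal_promoter` is a hypothetical helper.

```python
def proximal_promoter(tss_positions, flank=500):
    """Return (start, end) of the proximal promoter region for one cluster.

    From `flank` bp upstream of the first TSS to `flank` bp downstream of the
    last TSS. On the minus strand "first" and "last" swap ends, but with equal
    flanks the resulting interval is identical on both strands.
    """
    if not tss_positions:
        raise ValueError("cluster has no TSS")
    start = max(0, min(tss_positions) - flank)  # assumed clamp at chromosome start
    end = max(tss_positions) + flank
    return start, end

print(proximal_promoter([1000, 1030, 1075]))  # prints: (500, 1575)
```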

Q: Is there ISMARA for worm, fly, etc.?

Currently, human, mouse, rat, zebrafish, yeast, Arabidopsis, and E. coli are supported. We plan to support more species in the near future. If you are particularly interested in running ISMARA for worm or another organism, please contact us.

Q: Replicated experiments don't get similar predicted motif activities in the ISMARA output. Why?

All true replicates we have examined so far have very similar activity profiles. If this is not the case for your data, something may have gone wrong, and you may want to check for a mix-up of samples or other errors. For example, you could check whether the expression levels in the replicates are close.
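One quick way to check whether two replicates have close expression levels is the median absolute log2 ratio across genes. A minimal sketch: the metric, the pseudocount of 1, and the counts are our own illustrative choices, not something ISMARA computes or prescribes.

```python
from math import log2
from statistics import median

def median_abs_log2_ratio(rep_a, rep_b, pseudocount=1.0):
    """Median over genes of |log2((a + pc) / (b + pc))| for two samples."""
    if len(rep_a) != len(rep_b):
        raise ValueError("samples must list the same genes in the same order")
    return median([abs(log2((a + pseudocount) / (b + pseudocount)))
                   for a, b in zip(rep_a, rep_b)])

rep1 = [100, 50, 10, 0]    # made-up expression values for four genes
rep2 = [110, 45, 12, 0]    # a plausible true replicate
other = [10, 400, 100, 30] # a sample that does not look like a replicate
print(round(median_abs_log2_ratio(rep1, rep2), 2))
print(round(median_abs_log2_ratio(rep1, other), 2))
```

In this made-up example the true replicate differs by well under a factor of 2 for a typical gene, while the other sample differs by roughly a factor of 9.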

Q: How do I validate ISMARA's predictions in a follow-up experiment?

Potential strategies include knocking down the transcription factor of interest, overexpressing it, or performing a ChIP experiment with an antibody against the transcription factor, but many other possibilities exist. Published studies that have used ISMARA can also serve as examples; many of them performed follow-up experiments to validate its predictions.

Q: Can I upload FASTQ files?

Yes! ISMARA supports FASTQ files. Your data set can be either a set of single-end read files or a set of paired-end read files. Paired-end reads must be submitted as two files per sample, with the suffix "_R1" for the first end and "_R2" for the second end, e.g. "sample1_R1.fastq.gz" and "sample1_R2.fastq.gz". If your data are single-end, please avoid these suffixes ("_R1" and "_R2") in the file names.
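Before uploading, you can check that every paired-end sample has both mates and that no single-end file accidentally carries an "_R1"/"_R2" suffix. A minimal sketch: the suffix rule is taken from the answer above, while the accepted extensions (`.fastq`, `.fq`, optionally `.gz`) and the helper name are our assumptions.

```python
import re
from collections import defaultdict

# sample1_R1.fastq.gz -> sample "sample1", mate "1" (extensions are assumed)
MATE_PATTERN = re.compile(r"^(?P<sample>.+)_R(?P<mate>[12])\.(?:fastq|fq)(?:\.gz)?$")

def pair_fastq_files(filenames):
    """Split FASTQ names into paired-end samples and single-end files.

    Raises ValueError if a paired-end sample lacks its _R1 or _R2 mate.
    """
    mates = defaultdict(dict)
    single = []
    for name in filenames:
        match = MATE_PATTERN.match(name)
        if match:
            mates[match.group("sample")][match.group("mate")] = name
        else:
            single.append(name)
    pairs = {}
    for sample, found in mates.items():
        if set(found) != {"1", "2"}:
            raise ValueError(f"sample {sample!r} is missing its _R1 or _R2 file")
        pairs[sample] = (found["1"], found["2"])
    return pairs, single

print(pair_fastq_files(["sample1_R1.fastq.gz", "sample1_R2.fastq.gz"]))
```

Since a data set is described as either all single-end or all paired-end, getting non-empty results in both returned collections is worth double-checking.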

Q: I have some other questions which are not listed here.

Please write to us at: swissregulon@gmail.com
