
In this directory are 4 input files for phylogibbs:

lexA.fna
YBR093C_al.fna
GAT1_regions.fna
synthetic.fna

Output files obtained by running on these input files have names:
lexA.out lexA.track
YBR093C.1.out YBR093C.1.track 
YBR093C.2.out YBR093C.2.track
YBR093C.3.out YBR093C.3.track
GAT1_regions.out GAT1_regions.track
synthetic.out synthetic.track

Each output file lists the precise command-line options that were
used. Below we describe the options used in each of these runs.

In some of the runs we use a file called "backgroundfile" that
contains non-coding sequences from yeast (to build a background
model). We do not include this file here because of its size. A
publicly available file of intergenic yeast sequences such as the
following will work nicely, and may be used instead of
"backgroundfile" in the command lines below:
ftp://genome-ftp.stanford.edu/pub/yeast/sequence/genomic_sequence/orf_dna/archive/utr5_sc_500.20040206.fasta.gz


INPUT FILES AND COMMAND-LINE DESCRIPTIONS

1) lexA.fna
The input file contains the upstream regions from E. coli genes that
are known to be regulated by the E. coli transcription factor lexA.

Command-line:
../src/phylogibbs -D 0 -m 18 -y 14 -z 1 -f lexA.fna -o lexA.out -t lexA.track

-D 0: Since the sequences are not phylogenetically related we turn
phylogenetic scoring off.
-m 18: We look for a motif of width 18.
-y 14: We expect 14 sites in total.
-z 1:  We look for a single motif only.
-f lexA.fna: Specifies the input file.
-o lexA.out: Specifies the output file for the annealing results.
-t lexA.track: Specifies the output file for the tracking results
(posterior probabilities).

2) YBR093C_al.fna 
The input file contains the dialign alignment of the upstream regions of
the yeast gene YBR093C from 5 species:
S. cerevisiae
S. paradoxus
S. mikatae
S. kudriavzevii
S. bayanus

Command-line of the first run:
../src/phylogibbs -D 1 -L "(cer:0.8,par:0.8,mik:0.0.58,kud:0.5,bay:0.45)" -m 10 -T 0.5 -y 5 -z 1 -N 3 -F backgroundfile -f YBR093C_al.fna -o YBR093C.1.out -t YBR093C.1.track

-D 1: We now use phylogenetic scoring and treat the alignment flexibly
      (aligned blocks containing gaps are split into smaller blocks without
      gaps)
-L "(cer:0.8,par:0.8,mik:0.0.58,kud:0.5,bay:0.45)": Phylogenetic tree
   relating the species in Newick format. Here we have simplified the
   tree by assuming a star-topology. The number after the colon for
   each species gives the "proximity" to the common ancestor which is
   the probability that a neutrally evolving nucleotide has not been
   substituted since divergence from the common ancestor. 
-m 10: We look for a motif of width 10
-y 5: We search for a total of 5 sites.
-z 1: We search for one motif only in this run.
-S 200: We run for 200 cycles as opposed to the default of 100 cycles.
-N 3: We use a Markov-model of order 3 for the background model. That
      is, the background probability of a base depends on the 3
      neighboring nucleotides to its left.
-F backgroundfile: We use the file "backgroundfile" (a multi-fasta
                   file with intergenic sequences from cerevisiae) to
                   set the background model. The input data would not
                   suffice to fit a model with so many (3*4^3 = 192)
                   parameters.
-f YBR093C_al.fna: Specifies the input file.
-o YBR093C.1.out: Specifies the output file for the annealing result.
-t YBR093C.1.track: Specifies the output file for the tracking result.
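
The following Python sketch illustrates what an order-3 background model
involves: it counts the free parameters (3 per context, 4^3 contexts)
and estimates P(base | 3 bases to its left) from example sequences. The
function names, the toy sequences and the pseudocount are assumptions of
this sketch; it is not the code PhyloGibbs itself uses.

```python
from collections import defaultdict
from itertools import product

BASES = "ACGT"

def free_parameters(order):
    """Free parameters of a Markov model of the given order over 4 bases."""
    # 4**order contexts, each with 4 probabilities that sum to 1 (3 free).
    return 3 * 4 ** order

def fit_background(sequences, order, pseudocount=1.0):
    """Estimate P(base | previous `order` bases) from example sequences.

    The pseudocount is an assumption of this sketch, not a documented
    PhyloGibbs setting.
    """
    counts = defaultdict(lambda: dict.fromkeys(BASES, 0.0))
    for seq in sequences:
        seq = seq.upper()
        for i in range(order, len(seq)):
            context, base = seq[i - order:i], seq[i]
            # Skip windows that contain anything other than A, C, G, T.
            if base in BASES and all(c in BASES for c in context):
                counts[context][base] += 1.0
    model = {}
    for context in ("".join(p) for p in product(BASES, repeat=order)):
        row = counts[context]
        total = sum(row.values()) + 4 * pseudocount
        model[context] = {b: (row[b] + pseudocount) / total for b in BASES}
    return model

n_params = free_parameters(3)
# Toy sequences, made up for illustration only.
toy_model = fit_background(["ACGTACGTTTGACA", "TTGACGTACG"], order=3)
print(n_params, len(toy_model))
```

With so few bases most of the 64 contexts are never observed and their
probabilities come from the pseudocount alone, which is why a large
external file is used to fit the model.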

Command-line of the second run:
../src/phylogibbs -D 1 -L "(cer:0.8,par:0.8,mik:0.0.58,kud:0.5,bay:0.45)" -m 10 -y 9 -z 3 -T 0.5 -R -N 3 -F backgroundfile -f YBR093C_al.fna -o YBR093C.2.out -t YBR093C.2.track

-D 1: see previous run
-L "(cer:0.8,par:0.8,mik:0.0.58,kud:0.5,bay:0.45)": see previous run
-m 10: See previous run.
-y 9: We now look for 9 sites in total.
-z 3: We now allow up to 3 different motifs.
-T 0.5: Instead of using a uniform prior over the space of WMs
        (default) we use the so-called
        Jeffreys/Fisher/information-geometry  prior that assigns a
        probability proportional to 1/sqrt(wa*wc*wg*wt) to a WM column
        with probabilities wa, wc, wg, and wt.
-R: Print the locations of the sites counting from the end of each
    sequence. That is, a site at -x means the site starts x bases
    'upstream'.
-N 3: See previous run.
-F backgroundfile: See previous run.
-f YBR093C_al.fna: See previous run.
-o YBR093C.2.out: See previous run.
-t YBR093C.2.track: See previous run.
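
To see what the prior selected by -T 0.5 favours, the unnormalised
weight 1/sqrt(wa*wc*wg*wt) can be evaluated for a flat and for a sharply
peaked WM column. This is a minimal Python sketch; the function name and
the two example columns are made up for illustration.

```python
import math

def jeffreys_weight(column):
    """Unnormalised prior weight 1/sqrt(wa*wc*wg*wt) of one WM column."""
    wa, wc, wg, wt = column
    return 1.0 / math.sqrt(wa * wc * wg * wt)

flat = jeffreys_weight((0.25, 0.25, 0.25, 0.25))    # uninformative column
sharp = jeffreys_weight((0.97, 0.01, 0.01, 0.01))   # strongly peaked column
print(flat, sharp)
```

The peaked column receives a much larger weight than the flat one,
whereas a uniform prior would weight both equally.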

Command-line of the third run:
../src/phylogibbs -D 1 -L "(cer:0.8,par:0.8,mik:0.0.58,kud:0.5,bay:0.45)" -m 10 -y 9 -z 3 -N 3 -F backgroundfile -c -1 -f YBR093C_al.fna -o YBR093C.3.out -t YBR093C.3.track

The only new option is:
-c -1: This turns colour-changing moves on. In each cycle the number of
colour-change moves is chosen automatically. This allows the total
number of sites and motifs to fluctuate. A maximum entropy prior will
be used that has the number specified by -y as the expected total
number of sites, and the number specified by -z as the expected number
of motifs.


3) GAT1_regions.fna
This file contains dialign alignments of upstream regions, from the 5
yeast species mentioned above, of 7 different genes that (according to
the annotation in Harbison et al., Nature, 2004) contain a total of 15
sites for GAT1.

Command-line:
../src/phylogibbs -D 1 -L "(cer:0.8,par:0.8,mik:0.64,kud:0.58,bay:0.5)" -T 0.5 -m 15 -N 3 -F backgroundfile -y 45 -z 3 -f GAT1_regions.fna -o GAT1_regions.out -t GAT1_regions.track


-D 1: Using phylogeny with flexible treatment of the alignment.
-L "(cer:0.8,par:0.8,mik:0.64,kud:0.58,bay:0.5)": The approximate 
   phylogeny of the yeast species.
-T 0.5: The Jeffreys/Fisher/Information-geometry prior on WM-space.
-m 15: We look for largish sites of length 15.
-N 3: Markov model of order 3 for background (see above).
-F backgroundfile: file with sequences to construct background model
   (see above).
-y 45: We search for a total of 45 sites.
-z 3: We allow up to 3 different motifs.
-f GAT1_regions.fna: input file.
-o GAT1_regions.out: output of the anneal.
-t GAT1_regions.track: output from tracking.

4) synthetic.fna
This file contains an alignment of 5 synthetic intergenic regions
that are each at a proximity 0.7 from the common ancestor (all bases
are aligned without gaps). There are 4 sites from a randomly generated
WM that are embedded in these sequences. 

Command-line:
../src/phylogibbs -D 1 -L "(seq0:0.7,seq1:0.7,seq2:0.7,seq3:0.7,seq4:0.7)" -m 10 -y 4 -z 1 -S 250 -N -1 -q -f synthetic.fna -o synthetic.out -t synthetic.track

-D 1: Use phylogeny flexibly.
-L "(seq0:0.7,seq1:0.7,seq2:0.7,seq3:0.7,seq4:0.7)": all sequences are
   at proximity 0.7 from the common ancestor.
-m 10: The motif we are looking for has width 10.
-y 4: We are looking for 4 sites.
-z 1: We are looking for one motif only.
-S 250: Do 250 cycles as opposed to the default of 100 cycles.
-N -1: Use a background model where each base has probability 1/4 of
       occurring.
-q: Don't print any information to the screen (except for error
    messages).
-f synthetic.fna: Specifies input file.
-o synthetic.out: Specifies the output file for the anneal results.
-t synthetic.track: Specifies the output file for the tracking results.
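
The star-topology string passed to -L can be assembled from species
names and proximities. The Python sketch below does this for the
synthetic data, taking the proximity as 1 minus the 'distance' of 0.3
that appears in the synthetic sequence names. The helper name star_tree
is hypothetical.

```python
def star_tree(proximities):
    """Format {name: proximity} as the parenthesised string passed to -L."""
    for name, q in proximities.items():
        if not 0.0 <= q <= 1.0:
            raise ValueError(f"proximity of {name} must lie in [0, 1], got {q}")
    return "(" + ",".join(f"{name}:{q}" for name, q in proximities.items()) + ")"

mutrate = 0.3  # 'distance' to the common ancestor in the sequence names
tree = star_tree({f"seq{i}": round(1 - mutrate, 2) for i in range(5)})
print(tree)
```

On a shell command line the resulting string has to be quoted, since
unquoted parentheses are interpreted by the shell.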


DESCRIPTIONS OF THE OUTPUT FILES

We now describe the format of the output files by going over the
examples. We will display lines from the output files and then add
comments to explain their meaning.

1) lexA.out and lexA.track

lexA.out:
--------
From file:
Command-line arguments: -D 0 -m 18 -y 14 -z 1 -f lexA.fna -o lexA.out
-t lexA.track
Comment:
The command-line arguments are listed.

From file:
Seq   0: b1741 Length 229
Seq   1: ftsK Length 134
Seq   2: dinD Length 208
Seq   3: uvrD Length 83
Seq   4: yebB Length 175
Seq   5: lexA Length 109
Seq   6: uvrB Length 300
Seq   7: umuD Length 300
Seq   8: recN Length 85
Comment:
The input sequences are numbered and their names listed. The
length of each of the input sequences is printed as well.

From file:
GSL Random number seed: 983
Comment:
Reports the random number seed that was used. This is mainly useful
for debugging and for reproducing the output of a particular
PhyloGibbs run.

From file:
No. of moves: colour 0, single window 1582, shift 226, total 1808
Comment:
Lists the total number of moves of each type that were performed during
the anneal. Since colour-change moves are turned off (default) there
were only single window-switch moves and global shift moves. 

From file:
Log-posterior probability of the reference state: 123.388688
Comment:
The number reported here is the difference between the log-posterior
probability of the reference configuration and the log-posterior
probability of the empty configuration in which no windows are
colored.



From file:
Motif 1.
Number of windows = 14  Top window score= 2.06335e-07
Comment:
The .out file specifies the site configuration that is obtained at the
end of annealing, which we call the reference configuration.  These
lines indicate that the first motif in the reference configuration
contains 14 windows (=sites). The window with the highest score in
this motif (color) has a score of 2.06335e-07. The score reported for
each window in the output file is the difference between the
log-posterior probability of the configuration that is obtained when
the reference state is perturbed by uncoloring the window in question
and the log-posterior probability of the reference state. The smaller
the score the 'better': a very small score thus indicates that the
posterior probability of the configuration drops a lot when the window
is uncolored.

In file:
cagtataaaCTGGTTTTATATACAGTAaagaggctg -- [rev]  seq    8        recN
pos    17 score 2.063e-07
Comment:
This line indicates a single site from motif 1. First a segment from
the input sequence is listed with the site appearing in capitals. Then
the orientation of the site is displayed. Next the number and name of
the sequence from which the site derives are shown. Then the position
at which the site starts is shown. This site thus runs from positions
17 through 34 in sequence 8. Finally the score of the site is shown.
Note that the sites are ordered according to score.
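
A site line of this form can be picked apart with a regular expression.
The Python sketch below is one possible parser, assuming each site
occupies a single line in the actual file (as in the YBR093C examples
further down). The pattern and the field names are choices of this
sketch, not a format specification.

```python
import re

SITE_LINE = re.compile(
    r"^(?P<segment>[A-Za-z-]+)\s+(?P<link>--|\|-|`-)\s+\[(?P<strand>fwd|rev)\]"
    r"\s+seq\s+(?P<seq_index>\d+)\s+(?P<seq_name>\S+)\s+pos\s+(?P<pos>-?\d+)"
    r"(?:\s+(?P<kind>score|prob)\s+(?P<value>\S+))?\s*$"
)

def parse_site_line(line):
    """Return a dict describing one site line, or None if it does not match."""
    m = SITE_LINE.match(line.strip())
    if m is None:
        return None
    # The site itself is the capitalised part of the displayed segment.
    site = "".join(ch for ch in m["segment"] if ch.isupper())
    start = int(m["pos"])
    return {
        "site": site,
        "strand": m["strand"],
        "seq_index": int(m["seq_index"]),
        "seq_name": m["seq_name"],
        "start": start,
        "end": start + len(site) - 1,
        "kind": m["kind"],  # 'score', 'prob', or None on continuation lines
        "value": float(m["value"]) if m["value"] else None,
    }

line = ("cagtataaaCTGGTTTTATATACAGTAaagaggctg -- [rev]  seq    8        recN "
        "pos    17 score 2.063e-07")
parsed = parse_site_line(line)
print(parsed["site"], parsed["start"], parsed["end"], parsed["value"])
```

For the recN line this gives a site of width 18 running from position 17
through 34, in agreement with the description above.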

From file:
-------- Weight matrix for this motif (absolute base counts)---------
Comment:
Next follows the weight matrix for this motif, inferred from the
alignment of sites in the motif. The WM format is the same as the
format used by TRANSFAC and MEME. 

From file:
//
NA Motif_1
PO         A          C          G          T       cons         inf
Comment:
The "//" indicates the start of the WM. The line starting with "NA"
indicates the name of this motif. The next line specifies the line
format. Each line shows first the position, then the number of times A
occurs at that position, then the number of times C occurs at that
position, the number of times G occurs at that position, the number of
times T occurs at that position, a consensus base, and the information
score of the base-distribution at this position.

From file:
01      2.00      12.00       0.00       0.00          C        1.41
Comment:
Base counts at position 1 of the site. Nucleotide A appears twice and
C appears 12 times. The consensus nucleotide is C and the information
score (which is 2 minus the entropy of the distribution in bits) is
1.41.
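
The information score can be recomputed from the counts. The Python
sketch below assumes that the score is taken from the raw counts without
pseudocounts; under that assumption it matches the printed value for
this row (1.41) and for the first WM row of lexA.track (1.32). The
function name is made up for this sketch.

```python
import math

def information_score(counts):
    """2 minus the entropy (in bits) of the base distribution of a column."""
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 2.0 - entropy

out_score = information_score([2.00, 12.00, 0.00, 0.00])    # lexA.out, position 01
track_score = information_score([1.88, 11.37, 0.00, 0.16])  # lexA.track, position 01
print(round(out_score, 2), round(track_score, 2))
```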

From file:
//
Comment:
Indicates the end of the weight matrix.


lexA.track
----------
From file:
Average log-posterior probability of sampled configurations:
107.218401
Comment:
Shows the average difference of the log-posterior probabilities of the
site configurations and the empty configuration during the
sampling/tracking phase. 

From file:
Tracking stats motif 1
Comment:
The file now shows posterior probabilities for different sites to
belong to the motifs that occurred in the reference state. These
posterior probabilities are obtained by collecting statistics during a
sampling run.

From file:
attacactcCTGTTAATCCATACAGCAacagtactg -- [fwd]  seq    1        ftsK
pos    40  prob 0.99
Comment:
The line shows a single site and its posterior probability to belong
to motif 1. A segment from sequence 1 is shown with the actual site
shown in capitals. The strand from which the segment derives is shown,
and the number and name of the sequence. Then the position at which the
site starts is shown (this site runs from position 40 through 57 in
sequence 1). Finally, the posterior probability of the site belonging
to the motif is shown (0.99 in this case). Note that the sites are
listed in order of decreasing posterior probability.

From file:
catcaccatAATATTTCTGATACAGCGtaaactccg -- [rev]  seq    6        uvrB
pos   179  prob 0.06
Comment:
The last site in this motif that is reported has posterior probability
0.06. By default sites with posterior probability less than 0.05 are
not shown. This can be overridden by using option -E. For instance -E
0.01 would show all sites with posterior probability 0.01 and higher.

From file:
-------- Weight matrix for this motif (absolute base counts)---------
//
NA Motif_1
PO         A          C          G          T       cons         inf
01      1.88      11.37       0.00       0.16          C        1.32
Comment:
Note that in the construction of the WMs in the tracking output file
each site is weighted by its posterior probability. This generally
leads to non-integer counts for the number of bases of each type. Here
there are 11.37 'occurrences' of base C.
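
The weighting can be illustrated with a few lines of Python. The sketch
below covers only the simple case without phylogeny (-D 0): each site
adds its posterior probability, rather than 1, to the count of the base
it carries in each column. The sites and probabilities are invented for
illustration and the function name is hypothetical.

```python
def weighted_counts(sites):
    """sites: list of (site_string, posterior_probability), all of equal width."""
    width = len(sites[0][0])
    matrix = [dict.fromkeys("ACGT", 0.0) for _ in range(width)]
    for site, prob in sites:
        for column, base in zip(matrix, site.upper()):
            column[base] += prob  # weight by posterior probability
    return matrix

# Made-up three-base sites with made-up posterior probabilities.
toy_sites = [("CTG", 0.99), ("CTG", 0.75), ("ATG", 0.06)]
matrix = weighted_counts(toy_sites)
print(matrix[0])
```

In this toy case the first column holds 1.74 'occurrences' of C and 0.06
of A.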


YBR093C.1.out
-------------

From file:
aagagATCGCACATGccaaa -- [fwd]  seq    0 YBR093C_cer  pos   801 score 5.148e-10
aagagATCCCACATGtcata |- [fwd]  seq    1 YBR093C_par  pos   801
aagagATTCCACATGtcagc |- [fwd]  seq    2 YBR093C_mik  pos   805
gagatATCCCACATGccaga |- [fwd]  seq    3 YBR093C_kud  pos   807
gacgcATTCTACATGccaga `- [fwd]  seq    4 YBR093C_bay  pos   807
Comment:
Since the input file now contains a dialign alignment of 5 sequences,
a single window may now span sequences of up to 5 species. The 'site'
displayed here is an alignment of 5 putative sites from an aligned
block of all 5 yeast species. The positions indicate where in each of
the sequences the sites start. Note also that there is a single score
for the entire window.
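
When reading such output programmatically, the lines of one window can
be grouped by the marker in the second column: "--" opens a window and
"|-" or "`-" continue it. The Python sketch below relies on that reading
of the excerpt above and on whitespace-separated fields; it is an
illustration, not a format specification, and the function name is
hypothetical.

```python
def group_windows(lines):
    """Group site lines into windows; '--' opens one, '|-' and '`-' extend it."""
    windows = []
    for line in lines:
        fields = line.split()
        if len(fields) < 8 or fields[1] not in ("--", "|-", "`-"):
            continue  # not a site line
        if fields[1] == "--":
            windows.append({"value": None, "sites": []})
        elif not windows:
            continue  # continuation line without an opening line
        window = windows[-1]
        window["sites"].append((fields[5], int(fields[7])))  # (name, start)
        if len(fields) >= 10:  # trailing 'score x' or 'prob x'
            window["value"] = float(fields[9])
    return windows

lines = [
    "aagagATCGCACATGccaaa -- [fwd]  seq    0 YBR093C_cer  pos   801 score 5.148e-10",
    "aagagATCCCACATGtcata |- [fwd]  seq    1 YBR093C_par  pos   801",
    "aagagATTCCACATGtcagc |- [fwd]  seq    2 YBR093C_mik  pos   805",
    "gagatATCCCACATGccaga |- [fwd]  seq    3 YBR093C_kud  pos   807",
    "gacgcATTCTACATGccaga `- [fwd]  seq    4 YBR093C_bay  pos   807",
]
windows = group_windows(lines)
print(len(windows), windows[0]["value"], windows[0]["sites"])
```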

From file:
-------- Weight matrix for this motif (absolute base counts)---------
//
NA Motif_1
PO         A          C          G          T       cons         inf
01     11.18       0.00       2.28       0.00          A        1.34
Comment:
When using phylogenetic scoring on aligned sequences (-D 1 or -D 2)
the base counts in the weight matrix are generally
non-integer. Roughly speaking, for each group of phylogenetically
related sequences, the observed (evolutionarily correlated) bases are
approximated by a non-integer number of independent bases. For
example, 3 Cs and 2 Gs in an aligned column of bases from the 5
species might correspond to 1.8 Cs and 1.2 Gs.  

YBR093C.2.out
-------------

From file:
Motif 1.
Number of windows = 5  Top window score= 2.20653e-10

Motif 2.
Number of windows = 2  Top window score= 5.79989e-09

Motif 3.
Number of windows = 2  Top window score= 9.91445e-09

Comment:
In this run we set -y 9 and -z 3. We thus asked for up to three motifs
and a total of 9 sites. Notice that 5 sites have gone in the first
motif, 2 in the second, and 2 in the third. The motifs are ordered by
the score of the top scoring window in the motif.


YBR093C.2.track
---------------

From file:
aaattAGCACGTTTTcgcat -- [fwd]  seq    0 YBR093C_cer  pos  -364  prob 0.38
aaattAGCACGTTTTcgcat |- [fwd]  seq    1 YBR093C_par  pos  -365
gatttAGCACGTTTTcgcat |- [fwd]  seq    2 YBR093C_mik  pos  -362
aaattAGCACGTTTTtcaca |- [fwd]  seq    3 YBR093C_kud  pos  -352
aaattGGCACGTTTTctcat `- [fwd]  seq    4 YBR093C_bay  pos  -359
Comment:
Note first that the location of the site is now given with respect to
the end of the sequence. That is, in cerevisiae the site runs from 364
to 355 bases upstream of translation start of the gene YBR093C. 
Note also that this site from the tracking results of motif 1 does not
occur in this motif in the reference state but apparently associates
with the motif during tracking almost 40% of the time. 

YBR093C.3.out
-------------

From file:
Motif 1.
Number of windows = 5  Top window score= 5.14843e-10

Motif 2.
Number of windows = 5  Top window score= 1.01266e-08

Motif 3.
Number of windows = 3  Top window score= 3.34384e-08

Comment:
Because we allow colour-changing moves (-c -1) the number of sites
specified by -y 9 and the number of motifs -z 3 are only treated as
guesses. At the end of the annealing the reference state indeed has 3
motifs but there is a total of 13 sites.


GAT1_regions.out
----------------

From file:
Seq   0: >Scer_YDL237W      Length 178
Seq   1: Spar_2881    Length 178
Seq   2: Sbay_Contig5 Length 203
Seq   3: Smik_Contig2 Length 107
Seq   4: Skud_Contig1 Length 170
Seq   5: >Scer_YEL062W      Length 999
Seq   6: Spar_5973    Length 999
Seq   7: Sbay_Contig6 Length 999
Seq   8: Smik_Contig2 Length 844
Seq   9: Skud_Contig1 Length 999
Comment:
In this input file the sequences come in groups of aligned
sequences. The sequence at the start of each group is indicated by the
">" at the start of the sequence's name. 

From file:
Motif 1.
Number of windows = 22  Top window score= 1.11454e-10

Motif 2.
Number of windows = 12  Top window score= 6.5889e-10

Motif 3.
Number of windows = 11  Top window score= 2.85084e-09

Comment:
The 45 sites (-y 45) to be assigned to at most three motifs (-z 3)
have been distributed 22 to the first motif, 12 to the second motif,
and 11 to the third motif.
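
The motif headers can be collected with a short script to check how the
sites were distributed. The Python sketch below parses the header lines
shown above. The regular expression is based only on these excerpts and
the function name is hypothetical.

```python
import re

HEADER = re.compile(
    r"Motif\s+(\d+)\.\s*Number of windows\s*=\s*(\d+)\s+Top window score\s*=\s*(\S+)"
)

def motif_summary(text):
    """Return (motif number, number of windows, top window score) per motif."""
    return [(int(m), int(n), float(s)) for m, n, s in HEADER.findall(text)]

text = """Motif 1.
Number of windows = 22  Top window score= 1.11454e-10

Motif 2.
Number of windows = 12  Top window score= 6.5889e-10

Motif 3.
Number of windows = 11  Top window score= 2.85084e-09
"""
summary = motif_summary(text)
total_sites = sum(n for _, n, _ in summary)
print(summary, total_sites)
```

The window counts add up to the 45 sites requested with -y 45.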

From file:
-------- Weight matrix for this motif (absolute base counts)---------
//
NA Motif_2
PO         A          C          G          T       cons         inf
01      5.90       5.31       3.29      11.86          H        0.16
02      5.97       0.97       5.00      15.87          T        0.45
03      4.86       2.89       9.54      11.73          D        0.18
04      6.49       3.76       9.25       7.31          N        0.07
05      9.32       0.00       7.39       8.91          D        0.42
06      1.13      10.60       0.00      15.72          T        0.82
07      0.00      17.67       3.84       5.50          C        0.73
08      0.00       0.97       0.00      23.93          T        1.76
09      0.00       0.98       0.00      24.16          T        1.76
10     24.53       0.00       0.00       0.00          A        2.00
11      0.00       0.00       0.00      24.53          T        2.00
12      0.00      24.34       1.15       0.00          C        1.73
13      0.00       0.00       2.64      21.89          T        1.51
14      4.00       5.49       1.93      16.27          T        0.42
15      7.53       3.38       8.83       9.43          D        0.09
//

Comment:
Note that the reverse-complement of this weight matrix contains the
consensus sequence GATAAG which matches the known binding motif of
GAT1. 
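
This can be checked directly from the counts. The Python sketch below
takes the most frequent base in columns 07 through 12 of the matrix
above (the choice of these six columns is made here for illustration)
and reverse-complements the result.

```python
BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def consensus(rows):
    """Most frequent base in each (A, C, G, T) count row."""
    return "".join(BASES[max(range(4), key=row.__getitem__)] for row in rows)

def reverse_complement(seq):
    """Complement every base and reverse the order."""
    return seq.translate(COMPLEMENT)[::-1]

core_rows = [
    (0.00, 17.67, 3.84, 5.50),   # position 07
    (0.00, 0.97, 0.00, 23.93),   # position 08
    (0.00, 0.98, 0.00, 24.16),   # position 09
    (24.53, 0.00, 0.00, 0.00),   # position 10
    (0.00, 0.00, 0.00, 24.53),   # position 11
    (0.00, 24.34, 1.15, 0.00),   # position 12
]
core = consensus(core_rows)
print(core, reverse_complement(core))
```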

GAT1_regions.track
------------------

From file:
== Posterior probabilities obtained through tracking the reference state. ==

Tracking stats motif 2
--------------
ctgttttAAAATCCTTATCTTGtctcctt -- [fwd]  seq    0 Scer_YDL237  pos 49  prob 1.00
ctgttttGGGTTCCTTATCTTGgctcttt |- [fwd]  seq    1  Spar_2881   pos 48
acattttAGATTTCTTATCTTTctccctt |- [fwd]  seq    2 Sbay_Contig  pos 69
ttgcttcACGTGTCTTATCTCGcttcttt `- [fwd]  seq    4 Skud_Contig  pos 37
ccgctgaTGTACTTATCTGTGAttggtct -- [rev]  seq    1  Spar_2881   pos 147  prob 1.00
gttacttATTTCTTATCTTGGTttgatct |- [rev]  seq    2 Sbay_Contig  pos 172
ccaccaaAGTCCTTATCTTGGTttggcct |- [rev]  seq    3 Smik_Contig  pos 76
acgctgaAGTTCTTATCTAGATttgacct `- [rev]  seq    4 Skud_Contig  pos 139

Comment:
Note that this motif was only the second motif in the output file
(because another motif had a better scoring top window). It appears
first in the track file because its highest posterior probability is
higher than that of the other motifs.

From file:
-------- Weight matrix for this motif (absolute base counts)---------
//
NA Motif_2
PO         A          C          G          T       cons         inf
01      7.96       4.91       3.15       6.77          N        0.08
02      5.58       4.67       6.08       7.87          N        0.03
03      5.24       2.96       5.95       9.64          D        0.12
04      6.46       5.49       2.83       9.04          H        0.11
05      6.52       6.03       2.32       7.10          H        0.10
06      0.65       4.34       2.19      14.80          T        0.67
07      0.18      13.74       0.45       7.43          C        0.88
08      6.66       0.40       0.92      13.01          T        0.74
09      0.00       1.10       0.00      20.15          T        1.71
10     15.07       4.80       0.42       0.91          A        0.86
11      1.23       0.11       0.33      19.21          T        1.51
12      2.51      15.44       2.07       2.32          C        0.62
13      0.82       0.37       7.56      12.97          T        0.75
14      5.59       5.06       3.14       8.71          N        0.09
15      7.18       3.93       2.41       9.47          H        0.17
//
Comment:
From the information score profile of the weight matrix one would guess
that the true binding site might only be 8 basepairs wide and have the
consensus T(C/t)(T/a)TATC(T/g) or (A/c)GATA(A/t)(G/a)A.


synthetic.out
-------------

From file:
Seq   0: seq0 Nseqs 5 Lseq 500 nwm 1 wwidth 10 polr 3.07710921155829
mutrate 0.3 consensus ['aAT*tcACgc'] wmpos [[92,194,296,398]] Length 500
Seq   1: seq1 Nseqs 5 Lseq 500 nwm 1 wwidth 10 polr 3.07710921155829
mutrate 0.3 consensus ['aAT*tcACgc'] wmpos [[92,194,296,398]] Length
500
Seq   2: seq2 Nseqs 5 Lseq 500 nwm 1 wwidth 10 polr 3.07710921155829
mutrate 0.3 consensus ['aAT*tcACgc'] wmpos [[92,194,296,398]] Length
500
Seq   3: seq3 Nseqs 5 Lseq 500 nwm 1 wwidth 10 polr 3.07710921155829
mutrate 0.3 consensus ['aAT*tcACgc'] wmpos [[92,194,296,398]] Length
500
Seq   4: seq4 Nseqs 5 Lseq 500 nwm 1 wwidth 10 polr 3.07710921155829
mutrate 0.3 consensus ['aAT*tcACgc'] wmpos [[92,194,296,398]] Length 500

Comment:
The sequences in this file were generated synthetically and the names
indicate some of the parameters. Each sequence is length 500. All
sequences have a 'distance' 0.3 to the common ancestor (meaning that
the proximity is 0.7). A random weight matrix was generated of length 10. This
weight matrix is quite fuzzy and has consensus aAT*tcACgc. Four sites
were sampled from this WM and were embedded in the sequences starting
at positions 92, 194, 296 and 398.

From file:
taataCCCTTGGATCtccta -- [rev]  seq    0       seq0   pos   398 score 0.005156
tacccCAGAGAAAATtgcca |- [rev]  seq    1       seq1   pos   398
atacaCAGTGAGATTgtcct |- [rev]  seq    2       seq2   pos   398
aattgCGGTCAGATCgaata |- [rev]  seq    3       seq3   pos   398
aagcaGGGTGACCACcgcct `- [rev]  seq    4       seq4   pos   398
gctacGAGTTTGATTtttaa -- [fwd]  seq    0       seq0   pos   432 score 0.009175
tatgtACTTTTCCTGttggt |- [fwd]  seq    1       seq1   pos   432
agacgCCTCTATAAGaccgg |- [fwd]  seq    2       seq2   pos   432
gtcctAAGACACAACctaaa |- [fwd]  seq    3       seq3   pos   432
gcctgCTTTGAACTGatgat `- [fwd]  seq    4       seq4   pos   432
aacccCGTTCAAATGatgag -- [rev]  seq    0       seq0   pos   194 score 0.01101
ctccaACGTCATATCgttta |- [rev]  seq    1       seq1   pos   194
tacaaCAGATAGATCtgtag |- [rev]  seq    2       seq2   pos   194
gcttcAATTGTCATCcttgg |- [rev]  seq    3       seq3   pos   194
gcatcTCGTCAGATTgttgg `- [rev]  seq    4       seq4   pos   194
catccCCTATAGATCagtta -- [rev]  seq    0       seq0   pos    92 score 0.02226
tatcaGGTTTTTTGCtggta |- [rev]  seq    1       seq1   pos    92
ggtgaGAGTGGGATTcacta |- [rev]  seq    2       seq2   pos    92
acgtcGATTTCTCATcggtt |- [rev]  seq    3       seq3   pos    92
cagtaCAGCGCTCATtactt `- [rev]  seq    4       seq4   pos    92

Comment: 
The motif that the algorithm found contains 3 of the 4 'true' sites.
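
The comparison can be made explicit with set operations. This Python
sketch assumes that the wmpos values in the sequence names and the
reported pos values use the same coordinates, as the matching numbers
suggest.

```python
true_starts = {92, 194, 296, 398}    # wmpos in the sequence names
found_starts = {398, 432, 194, 92}   # windows reported in synthetic.out

recovered = sorted(true_starts & found_starts)
missed = sorted(true_starts - found_starts)
spurious = sorted(found_starts - true_starts)
print(recovered, missed, spurious)
```

The embedded site at position 296 was missed, and the window at position
432 does not correspond to an embedded site.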


synthetic.track
---------------

From file:
Tracking stats motif 1
--------------
aacccCGTTCAAATGatgag -- [rev]  seq    0       seq0   pos   194  prob 0.21
ctccaACGTCATATCgttta |- [rev]  seq    1       seq1   pos   194
tacaaCAGATAGATCtgtag |- [rev]  seq    2       seq2   pos   194
gcttcAATTGTCATCcttgg |- [rev]  seq    3       seq3   pos   194
gcatcTCGTCAGATTgttgg `- [rev]  seq    4       seq4   pos   194
catccCCTATAGATCagtta -- [rev]  seq    0       seq0   pos    92  prob 0.18
tatcaGGTTTTTTGCtggta |- [rev]  seq    1       seq1   pos    92
ggtgaGAGTGGGATTcacta |- [rev]  seq    2       seq2   pos    92
acgtcGATTTCTCATcggtt |- [rev]  seq    3       seq3   pos    92
cagtaCAGCGCTCATtactt `- [rev]  seq    4       seq4   pos    92
taataCCCTTGGATCtccta -- [rev]  seq    0       seq0   pos   398  prob 0.16
tacccCAGAGAAAATtgcca |- [rev]  seq    1       seq1   pos   398
atacaCAGTGAGATTgtcct |- [rev]  seq    2       seq2   pos   398
aattgCGGTCAGATCgaata |- [rev]  seq    3       seq3   pos   398
aagcaGGGTGACCACcgcct `- [rev]  seq    4       seq4   pos   398
gctacGAGTTTGATTtttaa -- [fwd]  seq    0       seq0   pos   432  prob 0.10
tatgtACTTTTCCTGttggt |- [fwd]  seq    1       seq1   pos   432
agacgCCTCTATAAGaccgg |- [fwd]  seq    2       seq2   pos   432
gtcctAAGACACAACctaaa |- [fwd]  seq    3       seq3   pos   432
gcctgCTTTGAACTGatgat `- [fwd]  seq    4       seq4   pos   432

Comment:
The 3 true sites have the highest posterior probabilities. Note,
however, that the posterior probabilities are rather low. Even the
highest scoring site has an estimated probability of only 21% to be a
'true' site. 
