We have carried out a comprehensive analysis of the occurrence of
pseudogenes (disabled copies of genes) in a diverse selection of 64
prokaryote genomes. We find a total of ~7000 candidate prokaryotic
pseudogenes. Moreover, in all the genomes surveyed, pseudogenes account
for at least 1 to 5% of all gene-like sequences, with some genomes
showing a considerably higher proportion. The relevant data and texts
can be found here.
Downloadable Files
- Complete list of prokaryote pseudogenes
This is a simple, tab-delimited data file. The fields are as follows:
- Kingdom
- Organism
- Chromosome ID
- Starting coordinate of pseudogene
- Ending coordinate of pseudogene
- Strand
- Swiss-Prot ID of closest homologue
- E-value
- Percent identity
- Matching length
- First residue of the matching region in the closest homologue
- Last residue of the matching region in the closest homologue
- Translated sequence of pseudogenes
- Matched region of the closest homologue
- DNA sequence of pseudogene
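As a rough illustration of how the fifteen fields above might be read, here is a minimal Python sketch. The key names in `FIELDS` and the function `read_pseudogenes` are hypothetical labels chosen for this example, not names used by the file itself. The sketch also assumes that the coordinate fields are integers, that the E-value and percent identity are plain numbers, and that any line not matching this layout (for example a header line) can be skipped.

```python
import io

# Hypothetical key names, one per field, in the order listed above.
FIELDS = [
    "kingdom", "organism", "chromosome_id", "start", "end", "strand",
    "swissprot_id", "e_value", "percent_identity", "match_length",
    "homolog_first", "homolog_last", "translated_seq", "homolog_region",
    "dna_seq",
]
INT_FIELDS = ("start", "end", "match_length", "homolog_first", "homolog_last")
FLOAT_FIELDS = ("e_value", "percent_identity")


def read_pseudogenes(handle):
    """Yield one dict per pseudogene from an open tab-delimited text handle."""
    for line in handle:
        row = line.rstrip("\r\n").split("\t")
        if len(row) != len(FIELDS):
            continue  # blank or malformed line
        record = dict(zip(FIELDS, row))
        try:
            for key in INT_FIELDS:
                record[key] = int(record[key])
            for key in FLOAT_FIELDS:
                record[key] = float(record[key])
        except ValueError:
            continue  # assumed to be a header or non-data line
        yield record


if __name__ == "__main__":
    # Made-up example line, for demonstration only.
    demo = io.StringIO(
        "Bacteria\tExample organism\tchr1\t100\t250\t+\tP00000\t1e-20\t45.5"
        "\t50\t3\t52\tMKV*L\tMKVAL\tATGAAAGTT\n"
    )
    for rec in read_pseudogenes(demo):
        print(rec["organism"], rec["start"], rec["end"], rec["e_value"])
```

With the real file, the handle would come from `open(path, newline="")` on the downloaded list; the values in the demo line are invented and do not come from the data set.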
- Directory of associated chromosome sequences
These are the original genome sequences used for the analysis. The
references for them are given in the paper. The coordinates in the
above pseudogene list (e.g. in fields 4 and 5) should match these files
exactly. The files are stored as simple gzipped text files
using a naming convention based on the organism name in field 2 of the
above file, with all lowercase letters and with spaces and punctuation
changed to dashes. For instance, the file for "Escherichia coli
O157:H7" is called Escherichia_coli_O157:H7_EDL933 __complete_genome.fasta.
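To check that coordinates line up with a chromosome file, one could extract the corresponding region and compare it with the DNA sequence given in the pseudogene list. The sketch below is only an illustration under several assumptions not stated on this page: that each gzipped file holds a single FASTA record, that coordinates are 1-based and inclusive, and that the caller decides from the strand field whether to reverse-complement. The function names `read_fasta_sequence` and `extract_region` are hypothetical.

```python
import gzip

# Complement table for the common nucleotide letters (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta_sequence(path):
    """Return the sequence of a gzipped FASTA file as one string.

    Assumes a single record per file; header lines start with '>'.
    """
    parts = []
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if not line.startswith(">"):
                parts.append(line.strip())
    return "".join(parts)


def extract_region(genome, start, end, reverse=False):
    """Return genome[start..end], assuming 1-based inclusive coordinates.

    If reverse is True, the reverse complement of the region is returned.
    """
    low, high = sorted((start, end))
    region = genome[low - 1:high]
    if reverse:
        region = region.translate(_COMPLEMENT)[::-1]
    return region
```

If the extracted region does not equal the DNA sequence in the list, the coordinate convention or the strand handling assumed here is the first thing to revisit.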
Associated Publications
- Comprehensive analysis of pseudogenes in prokaryotes reveals widespread evidence of gene decay and failed horizontal-transfer events
Liu et al. Genome Biol (2004)
- A "polyORFomic" analysis of prokaryote genomes using disabled-homology filtering reveals conserved but undiscovered short ORFs.
Harrison et al. JMB (2003)