Yale Gerstein Lab




Prokaryote Pseudogenes

We have carried out a comprehensive analysis of the occurrence of pseudogenes (disabled copies of genes) in a diverse selection of 64 prokaryote genomes. We find a total of ~7000 candidate prokaryotic pseudogenes. In every genome surveyed, pseudogenes account for at least 1 to 5% of all gene-like sequences, and some genomes show a considerably higher proportion. The relevant data and texts can be found here.


Downloadable Files

  • Complete list of prokaryote pseudogenes
    This is a simple, tab-delimited data file. The fields are as follows:
    1. Kingdom
    2. Organism
    3. Chromosome ID
    4. Starting coordinate of pseudogene
    5. Ending coordinate of pseudogene
    6. Strand
    7. Swiss-Prot ID of closest homologue
    8. E-value
    9. Percent identity
    10. Matching length
    11. First residue of the matching region in the closest homologue
    12. Last residue of the matching region in the closest homologue
    13. Translated sequence of pseudogene
    14. Matched region of the closest homologue
    15. DNA sequence of pseudogene
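
The field layout above can be read with a few lines of Python. This is a minimal sketch, not the lab's own tooling: the `Pseudogene` type and `parse_pseudogenes` function are hypothetical names, and the code assumes the file has no header row, that numeric fields are plain numbers, and that rows without exactly 15 fields can be skipped.

```python
import csv
from typing import Iterable, Iterator, NamedTuple


class Pseudogene(NamedTuple):
    """One row of the pseudogene list; attribute order follows fields 1-15."""
    kingdom: str
    organism: str
    chromosome_id: str
    start: int
    end: int
    strand: str
    swissprot_id: str
    e_value: float
    percent_identity: float
    match_length: int
    homologue_start: int
    homologue_end: int
    translation: str
    homologue_match: str
    dna: str


def parse_pseudogenes(lines: Iterable[str]) -> Iterator[Pseudogene]:
    """Yield one Pseudogene per well-formed tab-delimited line."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 15:
            continue  # assumption: blank or malformed rows are ignored
        yield Pseudogene(
            row[0], row[1], row[2],
            int(row[3]), int(row[4]), row[5],
            row[6], float(row[7]), float(row[8]), int(row[9]),
            int(row[10]), int(row[11]),
            row[12], row[13], row[14],
        )
```

To use it, open the downloaded list in text mode and pass the file object to `parse_pseudogenes`; the records can then be filtered by organism, E-value, or percent identity.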

  • Directory of associated chromosome sequences
    These are the original genome sequences used for the analysis. The references for them are given in the paper. The coordinates in the above pseudogene list (e.g. in fields 4 and 5) should correspond exactly to positions in these files. The files are stored as simple gzipped text files using a naming convention based on the organism name in field 2 of the above file, with all lowercase letters and with spaces and punctuation changed to dashes. For instance, the file for "Escherichia coli O157:H7" is called Escherichia_coli_O157:H7_EDL933 __complete_genome.fasta.
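
One way to check that the coordinates line up with a chromosome file is to cut the region out of the sequence and compare it with the DNA sequence in field 15. The sketch below is illustrative only. The function names are hypothetical, and it assumes that coordinates are 1-based and inclusive, that the strand field is "+" or "-", and that each file holds a single FASTA record; none of these details are stated on this page, so a mismatch with field 15 would suggest that one of the assumptions needs adjusting.

```python
import gzip
from typing import Iterable

# Translation table used to complement unambiguous bases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def read_fasta_sequence(lines: Iterable[str]) -> str:
    """Return the sequence of the first FASTA record as one string."""
    parts = []
    seen_header = False
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if seen_header:
                break  # stop at the start of a second record
            seen_header = True
            continue
        if line:
            parts.append(line)
    return "".join(parts)


def load_genome(path: str) -> str:
    """Read a gzipped FASTA file from disk."""
    with gzip.open(path, "rt") as handle:
        return read_fasta_sequence(handle)


def extract_region(genome: str, start: int, end: int, strand: str) -> str:
    """Cut out start..end (assumed 1-based, inclusive).

    The reverse complement is returned when strand is "-" (assumed encoding).
    """
    lo, hi = sorted((start, end))
    region = genome[lo - 1:hi]
    if strand == "-":
        region = region.translate(_COMPLEMENT)[::-1]
    return region
```

For a given pseudogene, `extract_region(load_genome(path), start, end, strand)` should then equal the field 15 sequence, ignoring case, if the assumptions hold.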





Associated Publications

  • Comprehensive analysis of pseudogenes in prokaryotes reveals widespread evidence of gene decay and failed horizontal-transfer events. Liu et al. Genome Biol (2004)
  • A "polyORFomic" analysis of prokaryote genomes using disabled-homology filtering reveals conserved but undiscovered short ORFs. Harrison et al. JMB (2003)