ID: unique identifier for each processed pseudogene, in the format chr$a_$b.$c, where $a is the chromosome name, $b is the Swissprot/TrEMBL protein accession number, and $c is the sequential number of the pseudogene among those matching protein $b on chromosome $a. Example: chr1_P02404.1.
Short_ID: short version of the pseudogene ID, in the format $a_$b, where $a is the protein accession number (the same as $b in ID) and $b is the sequential number of the pseudogene among those matching protein $a in the whole genome. Example: P02404_1.
Chr: chromosome name.
Chrom_start: start coordinate of the pseudogene on the chromosome, based on Build 28 of the Golden Path assembly.
Chrom_end: end coordinate of the pseudogene on the chromosome.
Chrom_strand: strand of the chromosome on which the pseudogene lies, "+" or "-".
Cytogenic_band: chromosomal band as predicted by Ensembl. Example: "1p36.33".
Query_protein: accession number of the closest-matching protein in Swissprot/TrEMBL.
Query_start: first amino acid position on the query protein matched by the pseudogene.
Query_end: last amino acid position on the query protein matched by the pseudogene.
Query_len: length, in amino acids, of the query protein given in column 8 (Query_protein).
Completeness: sequence completeness of the pseudogene compared with the query protein.
E-value: Expect value of the pseudogene in the TBLASTX search.
AA_ident: amino acid sequence identity between the pseudogene and query protein.
DNA_ident: nucleotide sequence identity between the pseudogene and the coding sequence of the query protein (coding region only). Some query proteins have no coding sequence available, so this value cannot be computed for them.
Polya: "0", "1", "2" or "3".
- "0": no polyA tail (more than 30 A in a 50 bp window) detected for the pseudogene.
- "1": has a polyA tail and a polyadenylation signal within 50 bp of the beginning of the tail.
- "2": has a polyA tail and a polyadenylation signal within 50-100 bp of the beginning of the tail.
- "3": has a polyA tail but no polyadenylation signal detected.
Disable: "0", "d" or "D".
- "0": no disablement (only for RP pseudogenes).
- "d": disablement in a region of low sequence identity.
- "D": disablement in a region of high sequence identity.
GC_Pgene: GC content of the pseudogene sequence.
GC_Isochore: GC content of the 100 kb chromosomal window containing the pseudogene.
Isochore_class: isochore class of the region in which the pseudogene resides: L1, L2, H1, H2 or H3.
Kimura_Distance: Kimura evolutionary distance of the pseudogene sequence from the present-day sequence.
Class: "PSSD1" indicates "true" processed pseudogenes. "PSSD2" indicates putative processed pseudogenes.
Comment: cytoplasmic ribosomal protein pseudogenes are labeled as "RP".
Protein_name: "Protein name" field of the query protein in Swissprot/TrEMBL.
Gene_name: "Gene name" field of the query protein in Swissprot/TrEMBL.
MIM: Entry of the query protein in the MIM database (Mendelian Inheritance in Man).
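
A row of the annotation table can be read with a few lines of Python. This is a minimal sketch, not the distribution's own tooling: it assumes the columns are tab-separated and appear in the order listed above (the delimiter is not stated), and the names FIELDS, parse_id, parse_short_id and parse_row are hypothetical. The regular expressions follow the chr$a_$b.$c and $a_$b patterns described for ID and Short_ID.

```python
import re
from typing import Dict, Tuple

# Column order as listed in this field description.
# Assumption: columns are tab-separated (the delimiter is not documented).
FIELDS = [
    "ID", "Short_ID", "Chr", "Chrom_start", "Chrom_end", "Chrom_strand",
    "Cytogenic_band", "Query_protein", "Query_start", "Query_end", "Query_len",
    "Completeness", "E-value", "AA_ident", "DNA_ident", "Polya", "Disable",
    "GC_Pgene", "GC_Isochore", "Isochore_class", "Kimura_Distance", "Class",
    "Comment", "Protein_name", "Gene_name", "MIM",
]

# chr$a_$b.$c -> chromosome $a, protein accession $b, ordinal $c
ID_RE = re.compile(r"^chr(?P<chrom>[^_]+)_(?P<acc>[A-Za-z0-9]+)\.(?P<n>\d+)$")
# $a_$b -> protein accession $a, genome-wide ordinal $b
SHORT_ID_RE = re.compile(r"^(?P<acc>[A-Za-z0-9]+)_(?P<n>\d+)$")


def parse_id(pid: str) -> Tuple[str, str, int]:
    """Split a pseudogene ID into (chromosome, accession, ordinal)."""
    m = ID_RE.match(pid)
    if m is None:
        raise ValueError(f"not a pseudogene ID: {pid!r}")
    return m["chrom"], m["acc"], int(m["n"])


def parse_short_id(sid: str) -> Tuple[str, int]:
    """Split a Short_ID into (accession, genome-wide ordinal)."""
    m = SHORT_ID_RE.match(sid)
    if m is None:
        raise ValueError(f"not a short ID: {sid!r}")
    return m["acc"], int(m["n"])


def parse_row(line: str) -> Dict[str, str]:
    """Map one tab-separated annotation line onto the field names above."""
    values = line.rstrip("\n").split("\t")
    if len(values) > len(FIELDS):
        raise ValueError(f"expected at most {len(FIELDS)} columns, got {len(values)}")
    # Pad missing trailing columns (e.g. an empty Comment or MIM) with "".
    values += [""] * (len(FIELDS) - len(values))
    return dict(zip(FIELDS, values))
```

For the example ID above, parse_id("chr1_P02404.1") returns ("1", "P02404", 1). All values are kept as strings; converting coordinates or identities to numbers is left to the caller.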
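
The Polya coding rule can be sketched as two small functions. This is an illustration under stated assumptions, not the original pipeline: the names has_polya_tail and polya_code are hypothetical, the tail test simply applies the "more than 30 A in a 50 bp window" criterion to whatever sequence is passed in (which flanking region was scanned is not stated), and a signal farther than 100 bp from the tail is treated as "not detected" (code "3").

```python
from typing import Optional


def has_polya_tail(seq: str, window: int = 50, min_a: int = 30) -> bool:
    """True if some `window`-bp stretch holds more than `min_a` A's.

    Sequences shorter than the window are tested as a whole.
    """
    s = seq.upper()
    starts = range(max(1, len(s) - window + 1))
    return any(s[i:i + window].count("A") > min_a for i in starts)


def polya_code(tail_found: bool, signal_distance: Optional[int]) -> str:
    """Map detection results onto the Polya code "0" to "3".

    signal_distance: bp between the polyadenylation signal and the
    beginning of the tail, or None if no signal was found
    (hypothetical input format).
    """
    if not tail_found:
        return "0"
    if signal_distance is None:
        return "3"
    if signal_distance <= 50:
        return "1"
    if signal_distance <= 100:
        return "2"
    # Assumption: a signal more than 100 bp away counts as "not detected".
    return "3"
```

For example, polya_code(has_polya_tail("C" * 20 + "A" * 31), 42) returns "1".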
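
GC_Pgene, GC_Isochore and Isochore_class can be illustrated with a GC-content function and a threshold lookup. This is a sketch only: the function names are hypothetical, and the class boundaries below (L1 < 37%, L2 < 41%, H1 < 46%, H2 < 53%, H3 otherwise) are commonly cited isochore family limits assumed for illustration, not cut-offs taken from this annotation.

```python
# Hypothetical isochore boundaries in percent GC: commonly cited family
# limits, NOT values documented for this annotation's pipeline.
ISOCHORE_UPPER_BOUNDS = [(37.0, "L1"), (41.0, "L2"), (46.0, "H1"), (53.0, "H2")]


def gc_content(seq: str) -> float:
    """Percent G+C among A/C/G/T bases; N and other symbols are ignored."""
    s = seq.upper()
    acgt = sum(s.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (s.count("G") + s.count("C")) / acgt


def isochore_class(gc_percent: float) -> str:
    """Assign L1, L2, H1, H2 or H3 from a window's GC percentage."""
    for upper, label in ISOCHORE_UPPER_BOUNDS:
        if gc_percent < upper:
            return label
    return "H3"
```

Applied to the pseudogene itself, gc_content corresponds to GC_Pgene; applied to the surrounding 100 kb window, it corresponds to GC_Isochore, which isochore_class then maps to a class label.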
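
The Kimura_Distance field does not state which Kimura model or alignment was used. The sketch below assumes the two-parameter model, d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the proportions of transitions and transversions over comparable sites of two aligned sequences. The function name kimura_2p is hypothetical.

```python
import math

PURINES = set("AG")
PYRIMIDINES = set("CT")


def kimura_2p(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned DNA sequences.

    Columns with a gap or a non-ACGT symbol in either sequence are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; everything else is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    w1 = 1 - 2 * p - q
    w2 = 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("distance undefined: sequences too divergent")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)
```

Identical sequences give a distance of 0; the value grows as the pseudogene accumulates substitutions relative to the sequence it is compared with.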

These files contain the nucleotide sequences of the annotated processed pseudogenes in multi-sequence FASTA format.
Each pseudogene entry has 2 lines. The first is the header line, which begins with ">" followed by the unique pseudogene ID (field 1 in the corresponding .gff annotation file). The header line also carries other attributes of the pseudogene, including "Chrom", "Chrom-start", "Chrom_end", "Strand", "band", "Query_protein", "Query_start", "Query_end", "Query_len", "Class_new", "Comment" and "Short_ID"; these attributes are defined above. The second line is the nucleotide sequence of the pseudogene.
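
Because every entry is exactly two lines, the nucleotide files can be read without a general FASTA parser. A minimal sketch, assuming the ID is the first whitespace-delimited token after ">" (the separator between the ID and the other header attributes is not documented, so the attributes are returned unparsed); read_two_line_fasta is a hypothetical name.

```python
from typing import Iterable, Iterator, Tuple


def read_two_line_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (pseudogene_id, header_attributes, nucleotide_sequence) per entry.

    Assumes each entry is one '>' header line followed by one sequence line,
    and that the ID is the first whitespace-delimited token of the header.
    Blank lines are skipped.
    """
    rows = [ln.strip() for ln in lines if ln.strip()]
    if len(rows) % 2:
        raise ValueError("expected an even number of non-blank lines")
    for header, seq in zip(rows[0::2], rows[1::2]):
        if not header.startswith(">"):
            raise ValueError(f"expected a header line, got {header[:30]!r}")
        parts = header[1:].split(maxsplit=1)
        if not parts:
            raise ValueError("header line has no pseudogene ID")
        attrs = parts[1] if len(parts) > 1 else ""
        yield parts[0], attrs, seq
```

An open file object can be passed directly, since iterating over it yields lines.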

These files contain the predicted amino acid sequences of the annotated processed pseudogenes in multi-sequence FASTA format.
Each pseudogene entry has 3 lines. The first is the header line, which begins with ">" followed by the unique pseudogene ID (field 1 in the corresponding .gff annotation file). The header line also carries other attributes of the pseudogene, including "Chrom", "Chrom-start", "Chrom_end", "Strand", "band", "Query_protein", "Query_start", "Query_end", "Query_len", "Class_new", "Comment" and "Short_ID"; these attributes are defined above.
The second line is the amino acid sequence of the query protein, and the third line is the predicted amino acid sequence of the pseudogene. Frameshifts are indicated by "\" or "/", stop codons by "X", and gaps by "-".
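
The three-line layout and the symbol conventions above lend themselves to a simple reader plus a counter for frameshift, stop-codon and gap symbols. This is a sketch under assumptions: the function names are hypothetical, the ID is taken as the first whitespace-delimited token after ">", and "X" is counted as a stop codon following this file's convention (elsewhere "X" often denotes an unknown residue).

```python
from typing import Dict, Iterable, Iterator, Tuple


def read_three_line_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (pseudogene_id, query_protein_seq, predicted_pseudogene_seq).

    Assumes exactly three non-blank lines per entry and that the ID is the
    first whitespace-delimited token after '>'.
    """
    rows = [ln.strip() for ln in lines if ln.strip()]
    if len(rows) % 3:
        raise ValueError("expected a multiple of three non-blank lines")
    for i in range(0, len(rows), 3):
        header, query_seq, pgene_seq = rows[i:i + 3]
        parts = header[1:].split() if header.startswith(">") else []
        if not parts:
            raise ValueError(f"bad header line: {header[:30]!r}")
        yield parts[0], query_seq, pgene_seq


def count_marks(pgene_seq: str) -> Dict[str, int]:
    """Count frameshift ('\\' or '/'), stop ('X') and gap ('-') symbols."""
    return {
        "frameshifts": pgene_seq.count("\\") + pgene_seq.count("/"),
        "stop_codons": pgene_seq.count("X"),
        "gap_positions": pgene_seq.count("-"),
    }
```

Counting these symbols gives a quick per-pseudogene tally of frameshifts and stop codons; it does not reproduce the "d"/"D" distinction in the Disable field, which also depends on local sequence identity.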