ENCODE Pseudogenes
Status on 6-Oct
We have pseudogene lists from four research groups (GIS, HAVANA, UCSC, and Yale). These lists are our current starting point; we have agreed to add other pseudogenes later.
| | GIS | HAVANA | UCSC | Yale |
| GIS | 46 | 42 | 45 | 39 |
| HAVANA | 42 | 165 | 104 | 132 |
| UCSC | 45 | 106 | 163 | 105 |
| Yale | 39 | 135 | 104 | 167 |
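A pairwise table like the one above can be sketched as follows. This is only an illustration: how the counts on this page were actually produced is not stated here. The sketch assumes half-open intervals and counts, for each row group, how many of its pseudogenes overlap at least one pseudogene of the column group. Under that assumption the diagonal is each group's list size and the table need not be symmetric. The names `Interval`, `overlaps`, and `overlap_matrix` are hypothetical.

```python
from typing import Dict, List, Tuple

# (region, start, end); half-open coordinates are an assumption of this sketch.
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if a and b lie in the same region and share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def overlap_matrix(groups: Dict[str, List[Interval]]) -> Dict[str, Dict[str, int]]:
    """For each (row, column) pair, count row pseudogenes that overlap any column pseudogene."""
    matrix: Dict[str, Dict[str, int]] = {}
    for row, row_items in groups.items():
        matrix[row] = {}
        for col, col_items in groups.items():
            matrix[row][col] = sum(
                1 for a in row_items if any(overlaps(a, b) for b in col_items)
            )
    return matrix


# Toy input, not real annotation data.
toy = {
    "A": [("r1", 0, 10), ("r1", 20, 30)],
    "B": [("r1", 5, 25)],
}
toy_matrix = overlap_matrix(toy)
```

In the toy input, both "A" intervals overlap the single "B" interval, so the A/B cell is 2 while the B/A cell is 1.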
Roadmap for generating a consensus pseudogene annotation for the ENCODE regions
Step I -- filter the above lists to remove pseudogenes overlapping with current GENCODE coding exons/loci. Pseudogenes overlapping with introns or noncoding genes will be kept.
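The Step I filter can be sketched as below, assuming pseudogenes and coding exons are given as half-open `(region, start, end)` tuples. Introns and noncoding genes are simply left out of the exon list, so pseudogenes overlapping them are kept. The function names are hypothetical and strand is ignored.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (region, start, end), half-open (assumed)


def overlaps(a: Interval, b: Interval) -> bool:
    """True if a and b lie in the same region and share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def filter_coding_overlaps(
    pseudogenes: List[Interval], coding_exons: List[Interval]
) -> List[Interval]:
    """Keep only pseudogenes that overlap no coding exon."""
    return [
        p for p in pseudogenes
        if not any(overlaps(p, exon) for exon in coding_exons)
    ]


# Toy input, not real annotation data.
toy_pseudogenes = [("r1", 0, 10), ("r1", 20, 30)]
toy_exons = [("r1", 5, 8)]
kept = filter_coding_overlaps(toy_pseudogenes, toy_exons)
```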
The filtered pseudogenes follow -- i.e., those overlapping with exons of Known_genes have been removed (except for the HAVANA list):
| | GIS | HAVANA | UCSC | Yale |
| GIS | 45 | 44 | 42 | 38 |
| HAVANA | 44 | 185 | 113 | 144 |
| UCSC | 42 | 113 | 138 | 97 |
| Yale | 38 | 144 | 95 | 156 |
Step II -- take a union of the above pseudogenes. Where a pseudogenic region is annotated by more than one group, the union boundary runs from the smallest start to the largest end.
222 union pseudogenes.
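The union rule in Step II (smallest start, largest end) amounts to merging overlapping intervals. A minimal sketch, assuming half-open `(region, start, end)` tuples pooled from all groups and ignoring strand; `union_intervals` is a hypothetical name:

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (region, start, end), half-open (assumed)


def union_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping intervals; each merged span runs from the
    smallest start to the largest end of its members."""
    merged: List[Interval] = []
    # Sorting by (region, start) means the first member of each cluster
    # already carries the smallest start.
    for region, start, end in sorted(intervals):
        if merged and merged[-1][0] == region and start < merged[-1][2]:
            prev_region, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_region, prev_start, max(prev_end, end))
        else:
            merged.append((region, start, end))
    return merged


# Toy input, not real annotation data.
pooled = [("r1", 5, 25), ("r1", 0, 10), ("r1", 40, 50), ("r2", 0, 10)]
union = union_intervals(pooled)
```

Intervals that merely touch (one ends where the next starts) are kept separate here; whether they should be joined is a design choice this page does not specify.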
Step III -- assign a parent protein to each pseudogene in the union using a protein set from UniProt. Pseudogenes without a matching protein are excluded.
The protein set -- SPROT and TREMBL.
198 updated pseudogenes with their parent proteins identified.
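The assignment in Step III can be sketched as below. The page does not say how a match is scored or chosen, so this sketch assumes that each pseudogene comes with a list of `(protein_id, score)` hits from some similarity search and that the highest-scoring hit becomes the parent. Pseudogenes with no hits are dropped, as the step describes. All names and identifiers are hypothetical.

```python
from typing import Dict, List, Tuple

Hit = Tuple[str, float]  # (protein_id, similarity score); scoring scheme is assumed


def assign_parents(
    pseudogene_ids: List[str], hits: Dict[str, List[Hit]]
) -> Dict[str, str]:
    """Map each pseudogene to its best-scoring protein; skip those with no hit."""
    parents: Dict[str, str] = {}
    for pseudogene_id in pseudogene_ids:
        candidates = hits.get(pseudogene_id, [])
        if not candidates:
            continue  # no matching protein: excluded from the updated list
        best_protein, _best_score = max(candidates, key=lambda hit: hit[1])
        parents[pseudogene_id] = best_protein
    return parents


# Toy input with made-up identifiers.
toy_hits = {
    "pg1": [("protA", 120.0), ("protB", 310.5)],
    "pg2": [],
}
toy_parents = assign_parents(["pg1", "pg2", "pg3"], toy_hits)
```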
Step IV -- re-align each pseudogene to its parent protein.
Resulting Alignments
Step V -- update consensus list of pseudogenes with boundaries derived from the alignment in Step IV.
The 198 consensus pseudogenes (these differ from the above in their boundaries)
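One way to derive the Step V boundaries, assuming each re-alignment from Step IV is available as a list of aligned genomic blocks `(start, end)`: take the smallest block start and the largest block end. This is a guess at the mechanics, not a description of the pipeline actually used; `boundaries_from_alignment` is a hypothetical name.

```python
from typing import List, Tuple

Block = Tuple[int, int]  # (genomic start, genomic end) of one aligned segment


def boundaries_from_alignment(region: str, blocks: List[Block]) -> Tuple[str, int, int]:
    """Return (region, start, end) spanning all aligned blocks."""
    if not blocks:
        raise ValueError("alignment has no aligned blocks")
    start = min(block_start for block_start, _ in blocks)
    end = max(block_end for _, block_end in blocks)
    return (region, start, end)


# Toy alignment, not real data: the union boundary might have been wider,
# but only the aligned span is kept.
new_boundary = boundaries_from_alignment("r1", [(120, 180), (200, 260), (90, 110)])
```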
A table showing that these consensus pseudogenes intersect with 43 GIS, 177 HAVANA, 128 UCSC-retro, 146 UCSC-duplicated, 152 YALE pseudogenes (and 19 GENCODE exons).
Step VI -- the updated consensus list of pseudogenes with their assigned parent proteins and new classification (processed or non-processed).
Consensus pseudogenes with classification in ENCODE coordinates.
Consensus pseudogenes with classification in hg17 chromosome coordinates.
Alignments of consensus pseudogenes with their parent proteins.
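The two coordinate systems listed above differ by a per-region offset. A minimal sketch of the conversion, assuming ENCODE coordinates are 1-based positions within a region and that each region's 1-based start on its hg17 chromosome is known. The region name and origin below are placeholders, not real ENCODE values, and `encode_to_chrom` is a hypothetical helper.

```python
from typing import Dict, Tuple

# Placeholder lookup: region name -> (chromosome, 1-based start of the region).
# These values are invented for illustration only.
REGION_ORIGINS: Dict[str, Tuple[str, int]] = {
    "EXAMPLE_REGION": ("chrN", 1_000_001),
}


def encode_to_chrom(
    region: str,
    start: int,
    end: int,
    origins: Dict[str, Tuple[str, int]] = REGION_ORIGINS,
) -> Tuple[str, int, int]:
    """Shift region-relative 1-based coordinates onto the chromosome."""
    chrom, region_start = origins[region]
    offset = region_start - 1  # position 1 in the region maps to region_start
    return (chrom, start + offset, end + offset)


converted = encode_to_chrom("EXAMPLE_REGION", 1, 100)
```

If the files use a different convention (for example 0-based, half-open starts), the offset needs adjusting accordingly.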