Zheng D., Frankish A. et al., "Pseudogenes in the ENCODE Regions: Consensus Annotation, Analysis of Transcription and Evolution"
All 201 consensus pseudogenes in the ENCODE regions, presented in GTF format.
A table of the 201 consensus pseudogenes with links to all ENCODE data in the UCSC Genome Browser. All data described in this manuscript are available in the UCSC browser, including the annotations from the five individual methods and the transcription, ChIP-chip, MSA and variation data.
A list of pseudogenes identified only through manual annotation by the HAVANA team, with the reasons.
FigS1, supplement of Fig 4. Sequence preservation of human genomic components in other species. The number of human pseudogenes (or genes, exons, introns) with orthologous sequences in each species was computed, normalized by the total number in human, and then plotted. NPS and PS stand for non-processed and processed pseudogenes, respectively.
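The normalization described for FigS1 can be illustrated with a minimal sketch. The function name, the input layout (element ID mapped to the set of species with an orthologous sequence) and the toy data are assumptions made for illustration; they are not taken from the manuscript or its data files.

```python
from collections import Counter

def preservation_fractions(orthologs_by_element):
    """Fraction of human elements with an orthologous sequence in each species.

    orthologs_by_element: dict mapping a human element ID (hypothetical) to
    the collection of species in which an orthologous sequence was found.
    """
    total = len(orthologs_by_element)
    if total == 0:
        return {}
    counts = Counter()
    for species in orthologs_by_element.values():
        counts.update(set(species))  # count each species once per element
    # Normalize by the total number of human elements of this type.
    return {sp: n / total for sp, n in counts.items()}

# Toy input, not real data: three pseudogenes and two species.
toy = {"ps1": {"chimp", "mouse"}, "ps2": {"chimp"}, "ps3": set()}
fractions = preservation_fractions(toy)
```

Running the same function separately on pseudogenes, genes, exons and introns would give one curve per component type, which is the kind of comparison the figure plots.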
FigS2, supplement of Fig 5A. Sequence identities between human genomic components and their orthologs. The orthologous sequences of each human genomic component (pseudogenes, genes, introns, exons, and the upstream (Up) and downstream (Down) 1 kb sequences) were retrieved from the MSA data, and the pairwise nucleotide sequence identity was then calculated. Shown here are the means for each type of component. A line representing the data expected from neutral evolution is also shown.
FigS3, a screenshot of the ENCODE pseudogene track in the UCSC browser.
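As a rough illustration of the identity calculation behind FigS2, the sketch below computes pairwise nucleotide identity from two rows of an alignment. The legend does not state how gaps were handled, so skipping gapped columns is an assumption here, as are the function names and the toy sequences.

```python
def pairwise_identity(human, other, gap="-"):
    """Fraction of identical bases over columns where neither row has a gap.

    Both arguments are rows of equal length taken from one alignment.
    Returns None when no column can be compared.
    """
    if len(human) != len(other):
        raise ValueError("aligned sequences must have the same length")
    compared = matches = 0
    for a, b in zip(human.upper(), other.upper()):
        if a == gap or b == gap:
            continue  # assumption: gapped columns are excluded
        compared += 1
        if a == b:
            matches += 1
    return matches / compared if compared else None

def mean_identity(pairs):
    """Mean identity over (human, ortholog) aligned pairs of one component type."""
    values = [v for v in (pairwise_identity(h, o) for h, o in pairs) if v is not None]
    return sum(values) / len(values) if values else None

# Toy alignment rows, not real data.
identity = pairwise_identity("ACGT-A", "ACCTTA")
```

Counting gaps as mismatches instead would lower the values, so the choice should be stated whenever such numbers are compared across component types.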
Table S1. Detection of pseudogene orthologs and assessment of their protein-coding potential. For each pseudogene, its orthologous sequences were retrieved and compared to the parent protein using the sequence alignment programs GeneWise or FASTA. The resulting alignments were examined for disablements (nonsense and frameshift mutations). 0, orthologous sequence is missing; 1, orthologous sequence is detected but without disablements; 2, disabled orthologous sequence is detected.
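The 0/1/2 coding used in Table S1 can be expressed as a small function. This is only a sketch: the `OrthologAlignment` record and its fields are hypothetical stand-ins for whatever summary is extracted from a GeneWise or FASTA alignment, and no parsing of those programs' output is attempted.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class OrthologAlignment:
    """Hypothetical summary of one ortholog aligned to the parent protein."""
    nonsense: int     # number of premature stop codons observed
    frameshift: int   # number of frameshift mutations observed

def ortholog_status(alignment: Optional[OrthologAlignment]) -> int:
    """Return the Table S1 code for one pseudogene in one species."""
    if alignment is None:
        return 0  # orthologous sequence is missing
    if alignment.nonsense > 0 or alignment.frameshift > 0:
        return 2  # disabled orthologous sequence is detected
    return 1      # orthologous sequence detected, no disablements

# Toy cases, not real data.
codes = [
    ortholog_status(None),
    ortholog_status(OrthologAlignment(nonsense=0, frameshift=0)),
    ortholog_status(OrthologAlignment(nonsense=1, frameshift=0)),
]
```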