Date: Fri, 30 Dec 2005 05:50:32 -0500
Organization: Yale Bioinformatics (http://bioinfo.mbb.yale.edu)
Subject: Summary of pseudogene call on Wed. 21-Dec at 11 AM EDT [PGENE]

Hi,

I very much enjoyed the call before the holidays. I hope everyone is enjoying their holiday time. Below I have attached a summary of the last call.

cheers,
marK

##
## Summary of 21-Dec-05 Call
##

* Main topics of discussion

1. Understanding Alex's RACE results.
2. Deyou's MSA analysis and how it can be integrated into either the G&T or the MSA paper.
3. What is the list of pgenes that will be presented to the world as a freeze on a certain date, and a time framework to keep to a scheduled paper publication date in March 06.

** RACE

* Results

Alex completed RACE experiments of the four pools A, B, C and D. (See attached email below.) Pool A covers pseudogenes for which specific primers could be designed, so analysis of Pool A gives the cleanest interpretable results, with no gene overlap and cross hyb issues. Pool C is done with primers that could potentially prime with the parental gene if the ENCODE array has the parent gene.

From the initial analysis, it looks like a very small number of pgenes are transcribed. For starters, the analysis of pool A will provide the minimal set of transcribed pgenes, which will then have to be studied carefully. So in a nutshell we have to do the following:

a. Generate a list of pgene RACE frags; are there probes in the region that are highly likely to be transcribed?
b. In addition to pool A, pools B and D have to be analyzed. Processing steps will include looking for extensions, say 5 kb away from the primer, and seeing if there are probes in the region unique to a pseudogene.
c. Clone the RACE products and sequence them to unambiguously verify the transcribed pgene.

* What do we need from Alex/Phil?

1. The exact list of pgenes that were picked for the RACE experiments.
2. File containing the pseudomedians needed from Phil.

* ToDos

France will do an analysis of the RACE frags similar to her analysis on genes. Deyou will look in some detail at RACE frag results after we can correctly interpret them.

** snoRNAs

Tom introduced the topic of snoRNAs and pgenes. He suggested that we look to see if there were snoRNAs in non-processed pgenes. Potentially the role of pgenes could be to replicate elements such as snoRNA (in introns) rather than protein coding diversity. See http://lowelab.ucsc.edu/snoRNAdb/ .

** Deyou's MSA analysis

Indicates that most pgenes have been created after the mouse/human split. (See pdf in ppt directory.) Not clear if this would go in the G&T or MSA paper or in a future paper...

** Future Paper

It was agreed that while it may be included in one of the ENCODE papers, an expanded detailed paper would be written later.

** List of pgenes to be presented to the outside world

It was agreed that we stick with the list of pgenes that have been looked over by Adam (wherever there was no consensus). The only remaining bit is for Adam to look at the pgenes that are specific to Yontao's analysis and include them after manual inspection. This will be the list that will be put out for the general public.

* So Adam has the following chores:

1. Include Yontao's pgenes in the current set after careful inspection.
2. Associate each pgene with its parent gene and the location of the parent gene on the genome.
3. Tag each pgene as either processed or non-processed.

** Timeline

1. A final set of pgenes that does not differentiate between GENCODE and ENCODE pgenes should be ready by Jan. 7.
2. France will redo the analysis of transfrags and TARs shortly thereafter.
3. A rough draft of the paper should be in the works by Jan. 7.
4. This gives one month, the month of Feb, to polish up the paper.
5. Paper ready to go out by March.

-------- Original Message --------
Subject: FW: pooling pseudogenes for Affy

Subject: pooling pseudogenes for Affy

Dear Philipp, Tom,

We are at the finishing stage with the 5'pseudoRACE in 12 tissues. I will bring the samples with me next week and drop them on Wednesday at your lab before moving to USCS, thus it is time to think about the best way to pool the pseudoRACE fragments. This is what I propose:

We have tested 49 non-processed pseudogenes, 26 with specific primers (5 to 14 mismatches/25mer, between pseudogene and parental gene) and 23 with "common" primers (0 to 3 mismatches/25mer, between pseudogene and parental gene).

Similarly, we have tested 109 processed pseudogenes (in fact we tested 125, but sometimes we are dealing with primers recognizing multiple pseudogenes), 25 with specific primers (7 to 14 mismatches/25mer, between pseudogene and parental gene) and 84 with "common" primers (0 to 3 mismatches/25mer, between pseudogene and parental gene).

I suggest:

1. to pool all the specific reactions (26 non-processed and 25 processed)
2. to pool the "common" non-processed (23 reactions; 8 with no mismatch, 7 with 1 mismatch, 7 with 2 mismatches and 1 with 3 mismatches)
3. to separate in two pools the "common" processed (total of 84 reactions: in the first pool, the 48 reactions done with primers with no mismatches; in the second pool, the 36 reactions with 1 to 3 mismatches)

What do you think of this conservative stance? We are looking at a total of 48 ENCODE Affymetrix chips.

Alexandre
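
As a rough cross-check of the pooling proposal above, here is a minimal Python sketch that assigns a reaction to a pool from its primer mismatch count and verifies that the stated reaction counts add up. The pool labels, the function names and the constants are hypothetical and not part of the original message; in particular the labels are not meant to map onto the A-D letters used on the call. The cutoffs (0 to 3 mismatches for "common", 5 or more for "specific") are taken from the ranges quoted in the message, and the reading that 48 chips means 4 pools times 12 tissues is an assumption that happens to match the stated total.

```python
# Hypothetical sketch of the proposed pooling scheme; names are illustrative only.

PRIMER_LENGTH = 25   # primers are 25mers, per the message
COMMON_MAX = 3       # "common" primers: 0 to 3 mismatches against the parental gene
SPECIFIC_MIN = 5     # "specific" primers: 5 or more mismatches (lowest value quoted)


def assign_pool(processed: bool, mismatches: int) -> str:
    """Return a hypothetical pool label for one RACE reaction."""
    if not 0 <= mismatches <= PRIMER_LENGTH:
        raise ValueError("mismatch count must be between 0 and the primer length")
    if mismatches >= SPECIFIC_MIN:
        # Suggestion 1: all specific reactions go together, processed or not.
        return "specific"
    if mismatches > COMMON_MAX:
        # 4 mismatches falls between the two ranges quoted in the message.
        raise ValueError("mismatch count is outside both quoted ranges")
    if not processed:
        # Suggestion 2: one pool for the common non-processed reactions.
        return "common_nonprocessed"
    # Suggestion 3: common processed reactions are split by exact match or not.
    return "common_processed_exact" if mismatches == 0 else "common_processed_mismatch"


# Reaction counts exactly as stated in the message.
POOL_SIZES = {
    "specific": 26 + 25,
    "common_nonprocessed": 8 + 7 + 7 + 1,
    "common_processed_exact": 48,
    "common_processed_mismatch": 36,
}


def chips_needed(n_pools: int, n_tissues: int) -> int:
    """Chips required if each pool is hybridized once per tissue (an assumption)."""
    return n_pools * n_tissues


if __name__ == "__main__":
    # The stated sub-counts should agree with the stated totals.
    assert POOL_SIZES["common_nonprocessed"] == 23
    assert POOL_SIZES["common_processed_exact"] + POOL_SIZES["common_processed_mismatch"] == 84
    assert sum(POOL_SIZES.values()) == 49 + 109
    print(chips_needed(len(POOL_SIZES), 12))
```

Keeping the cutoffs as named constants makes it easy to re-run the check if the primer classification is revised.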