
Analysis of Gencode sequences using MisPred tools



Gencode is a sub-project of Encode. Its overall goal is to identify all protein-coding genes in the regions of the human genome selected within the Encode project.

Conflict 1

AC110015.1-002 contains an extracellular Cadherin domain but has neither a signal peptide nor a transmembrane domain. It is a short, N-terminally and C-terminally truncated fragment of CADH2_human, which explains why it lacks both the N-terminal secretory signal peptide and the transmembrane domain of the full-length protein.
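The rule behind this conflict can be written as a simple predicate. The following is only a minimal sketch: the function name and its boolean inputs are hypothetical, and it assumes that the presence of an extracellular domain, a signal peptide and a transmembrane segment has already been predicted by separate tools.

```python
def violates_conflict_1(has_extracellular_domain: bool,
                        has_signal_peptide: bool,
                        has_transmembrane_segment: bool) -> bool:
    """Return True when a protein carries an extracellular domain but has
    neither a signal peptide nor a transmembrane segment.

    Hypothetical helper: the three flags are assumed to come from
    upstream domain, signal-peptide and transmembrane predictions.
    """
    return has_extracellular_domain and not (
        has_signal_peptide or has_transmembrane_segment
    )


# AC110015.1-002 as described above: Cadherin domain, no signal peptide,
# no transmembrane domain.
flagged = violates_conflict_1(True, False, False)
```

A protein with an extracellular domain and at least one of the two features would not be flagged by this predicate.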

Conflict 2

We identified no Gencode sequences affected by conflict 2.

Conflict 3

We identified no Gencode sequences affected by conflict 3.

Conflict 4

To identify suspicious Gencode sequences (those with domains deviating from the ‘normal’ size), we focused on the Pfam-A domains present in human Swiss-Prot proteins (29,561 domains belonging to 2,752 Pfam-A families, in 11,384 proteins altogether), as we deemed Swiss-Prot the most reliable protein database. First, we determined the "normal size range" of complete Pfam-A domains predicted in these well-curated Swiss-Prot entries, and then used this range as the criterion for finding domains/proteins that deviate significantly from it. To this end we removed incomplete domains, i.e. domains whose beginning or end was missing in the first or last 5 positions of the multiple alignment provided by Pfam for that domain, as well as domains whose length differs considerably (by at least 2 standard deviations) from the Pfam-A seed average.
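The length filter described above can be illustrated with a short sketch. It assumes that the lengths of the seed-alignment members of one Pfam-A family are available as a list of integers; the function names, the use of the population standard deviation and the toy numbers are all assumptions made for illustration, not details taken from the analysis itself.

```python
from statistics import mean, pstdev


def within_normal_size(length, seed_lengths, n_sd=2.0):
    """Return True if `length` differs from the seed average by less
    than `n_sd` standard deviations.

    Assumption: the population standard deviation (pstdev) is used;
    the original analysis does not say which estimator was applied.
    """
    avg = mean(seed_lengths)
    sd = pstdev(seed_lengths)
    return abs(length - avg) < n_sd * sd


def keep_normal_sized(domain_lengths, seed_lengths, n_sd=2.0):
    """Keep only the domains whose length passes the size filter."""
    return {
        name: length
        for name, length in domain_lengths.items()
        if within_normal_size(length, seed_lengths, n_sd)
    }


# Toy data (hypothetical): seed members of one family and three
# candidate domains from that family.
seed = [98, 100, 102, 100, 100]
candidates = {"dom_a": 101, "dom_b": 60, "dom_c": 99}
kept = keep_normal_sized(candidates, seed)
```

In this toy example `dom_b` is discarded because its length lies far more than 2 standard deviations below the seed average, while `dom_a` and `dom_c` are retained.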

In the next step we ran a Blastp search with the Gencode sequences as queries against the 16,465 human Swiss-Prot Pfam-A domains. If the domain match in a query protein was considerably shorter than the actual domain (in this conservative approach the cutoff was set to 40% of the actual domain length) or longer than it, the Gencode protein was considered suspicious. With this approach we identified 67 Gencode proteins with domains of suspicious length. The results of these analyses are summarized in Table 10.
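The length check applied to each Blastp match might look like the sketch below. It reads the 40% cutoff as "the match covers less than 40% of the reference domain length", which is one possible interpretation of the text. The upper bound (`max_fraction`) is a hypothetical parameter, since no numerical threshold is given for "longer" matches, and the function name is likewise an assumption.

```python
def is_suspicious(match_length, domain_length,
                  min_fraction=0.40, max_fraction=1.0):
    """Flag a domain match whose length deviates from the reference domain.

    match_length  -- length of the matched region in the query protein
    domain_length -- length of the reference Swiss-Prot Pfam-A domain
    min_fraction  -- assumed reading of the 40% cutoff
    max_fraction  -- hypothetical bound for "longer than the actual domain"
    """
    if domain_length <= 0:
        raise ValueError("domain_length must be positive")
    coverage = match_length / domain_length
    return coverage < min_fraction or coverage > max_fraction


# A 30-residue match to a 100-residue domain falls below the cutoff.
short_match = is_suspicious(30, 100)
# An 80-residue match is within the accepted range under these assumptions.
normal_match = is_suspicious(80, 100)
```

Keeping both bounds as parameters makes it easy to test how sensitive the number of flagged proteins is to the chosen cutoff.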

It should be noted that most of the Gencode sequences identified by conflict 4 are known to have no start and/or no stop codon (cf. Table 10, 4th column), a reflection of the fact that the corresponding transcripts are not full length. In most cases the deviant domains are N-terminally and/or C-terminally truncated as a consequence of the incompleteness of the transcripts (cf. Table 10, 5th column). In the present set of sequences we identified only three cases where the domain deviates from normal size as a result of internal deletion. Further analysis of these cases indicates that the deletions are incompatible with the viability of the protein, suggesting that the transcripts arise through aberrant splicing.
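The distinction between terminal truncation and internal deletion can be sketched as follows. This is an illustrative classifier only: it assumes that the aligned regions are given as inclusive (start, end) coordinates on the reference domain, and the `tolerance` of 5 residues is a hypothetical value, not one stated in the analysis.

```python
def classify_deviation(domain_start, domain_end, aligned_segments,
                       tolerance=5):
    """Label where a domain match is incomplete.

    aligned_segments -- list of inclusive (start, end) pairs giving the
                        parts of the reference domain covered by the query
    tolerance        -- hypothetical number of uncovered residues that is
                        still treated as complete
    """
    if not aligned_segments:
        raise ValueError("at least one aligned segment is required")
    segments = sorted(aligned_segments)
    missing_n = segments[0][0] - domain_start > tolerance
    missing_c = domain_end - segments[-1][1] > tolerance
    # An internal deletion shows up as a gap between consecutive segments.
    internal_gap = any(
        nxt[0] - prev[1] - 1 > tolerance
        for prev, nxt in zip(segments, segments[1:])
    )
    if missing_n and missing_c:
        return "N- and C-terminal"
    if missing_n:
        return "N-terminal"
    if missing_c:
        return "C-terminal"
    if internal_gap:
        return "internal"
    return "none"


# Toy example: the middle of a 100-residue domain is not covered.
label = classify_deviation(1, 100, [(1, 40), (71, 100)])
```

In this sketch terminal truncation takes precedence over internal gaps, which is a design choice made for simplicity rather than a rule taken from the text.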

Table 10.  Domain size deviation in Gencode sequences

| Gencode sequence | Swiss-Prot entry containing best matching Pfam domain | Deviant domain | Comment in Gencode | Deviation |
| --- | --- | --- | --- | --- |
| AC004041.1-003 | RAD50_HUMAN/3-1298 | PF02463 | [no_stop] | N-terminal |
| AC004080.4-005 | HXA9_HUMAN/1-193 | PF04617 | | N-terminal |
| AC004500.4-006 | AFF1_HUMAN/8-1208 | PF05110 | [no_start] [no_stop] | N- and C-terminal |
| AC006985.9-003 | UD16_HUMAN/27-524 | PF00201 | | N-terminal |
| AC008440.4-004 | MYADM_HUMAN/168-319 | PF01284 | [no_stop] | C-terminal |
| AC008440.4-005 | MYADM_HUMAN/168-319 | PF01284 | [no_stop] | C-terminal |
| AC008440.4-006 | MYADM_HUMAN/168-319 | PF01284 | [no_stop] | C-terminal |
| AC009404.1-003 | DDX18_HUMAN/203-374 | PF00270 | [no_start] [no_stop] | N-terminal |
| AC011330.7-003 | KCRU_HUMAN/148-414 | PF00217 | [no_stop] | C-terminal |
| AC012314.13-005 | SEN34_HUMAN/217-308 | PF01974 | [no_stop] | C-terminal |
| AC012314.13-006 | SEN34_HUMAN/217-308 | PF01974 | [no_stop] | C-terminal |
| AC012314.9-004 | CNOT3_HUMAN/580-748 | PF04153 | | C-terminal |
| AC012314.9-008 | CNOT3_HUMAN/2-242 | PF04065 | [no_start] [no_stop] | N-terminal |
| AC015691.9-002 | TRIM6_HUMAN/92-133 | PF00643 | | internal |
| AC051649.4-011 | TNNT3_HUMAN/72-214 | PF00992 | [no_stop] | N- and C-terminal |
| AC051649.4-012 | TNNT3_HUMAN/72-214 | PF00992 | [no_stop] | N- and C-terminal |
| AC053503.5-009 | DESM_HUMAN/106-414 | PF00038 | [no_start] | N- and C-terminal |
| AC068580.4-002 | CATD_HUMAN/78-409 | PF00026 | [no_stop] | C-terminal |
| AC104389.18-002 | HBD_HUMAN/7-141 | PF00042 | [no_stop] | C-terminal |
| AC121336.1-002 | MYCT_HUMAN/65-590 | PF00083 | | C-terminal |
| AC132217.7-005 | TY3H_HUMAN/195-526 | PF00351 | [no_start] [no_stop] | N- and C-terminal |
| AF121897.3-002 | SH3BG_HUMAN/64-161 | PF04908 | [no_start] [no_stop] | N-terminal |
| AF121897.3-006 | SH3BG_HUMAN/64-161 | PF04908 | [no_start] [no_stop] | N-terminal |
| AF121897.3-007 | SH3BG_HUMAN/64-161 | PF04908 | [no_start] [no_stop] | N-terminal |
| AF277315.16-002 | G6PD_HUMAN/211-505 | PF02781 | [no_stop] | C-terminal |
| AF277315.16-005 | G6PD_HUMAN/211-505 | PF02781 | [no_stop] | C-terminal |
| AF277315.16-006 | G6PD_HUMAN/211-505 | PF02781 | | C-terminal |
| AF277315.16-007 | G6PD_HUMAN/211-505 | PF02781 | [no_stop] | C-terminal |
| AP000275.63-002 | SYNJ1_HUMAN/535-866 | PF03372 | [no_stop] | C-terminal |
| AP000279.68-004 | GCFC_HUMAN/389-892 | PF07842 | | C-terminal |
| AP000302.60-008 | PUR2_HUMAN/105-298 | PF01071 | [no_stop] | C-terminal |
| AP000302.60-010 | PUR2_HUMAN/105-298 | PF01071 | [no_stop] | C-terminal |
| AP000304.10-013 | QORL_HUMAN/144-309 | PF00107 | [no_stop] | C-terminal |
| AP000313.6-005 | ITSN1_HUMAN/1241-1422 | PF00621 | [no_start] [no_stop] | N-terminal |
| AP001462.1-004 | MEN1_HUMAN/2-615 | PF05053 | [no_stop] | C-terminal |
| AP001462.1-005 | MEN1_HUMAN/2-615 | PF05053 | [no_stop] | C-terminal |
| AP001462.1-006 | MEN1_HUMAN/2-615 | PF05053 | [no_stop] | C-terminal |
| AP001462.1-009 | MEN1_HUMAN/2-615 | PF05053 | [no_stop] | C-terminal |
| AP001462.1-010 | MEN1_HUMAN/2-615 | PF05053 | [no_stop] | C-terminal |
| AP001462.3-016 | GRP3_HUMAN/146-332 | PF00617 | [no_stop] | C-terminal |
| AP006216.3-003 | ZPR1_HUMAN/48-207 | PF03367 | [no_start] [no_stop] | N-terminal |
| AP006216.7-002 | APOC3_HUMAN/1-91 | PF05778 | [no_stop] | C-terminal |
| LA16c-OS12.1-016 | LUC7L_HUMAN/4-271 | PF03194 | [no_start] | N-terminal |
| LL22NC03-28H9.1-008 | SYN3_HUMAN/86-191 | PF02078 | [no_stop] | C-terminal |
| RP11-115M6.1-002 | EM55_HUMAN/162-226 | PF07653 | [no_stop] | C-terminal |
| RP11-126K1.5-010 | RFX5_HUMAN/76-158 | PF02257 | [no_start] | N-terminal |
| RP11-247A12.4-008 | PTPA_HUMAN/25-358 | PF03095 | [no_stop] | C-terminal |
| RP11-247A12.5-001 | CACP_HUMAN/34-616 | PF00755 | | internal |
| RP11-247A12.5-007 | CACP_HUMAN/34-616 | PF00755 | [no_stop] | N-terminal |
| RP11-298J23.1-003 | PEPC_HUMAN/72-387 | PF00026 | [no_stop] | N- and C-terminal |
| RP1-149A16.10-011 | BPIL2_HUMAN/38-217 | PF01273 | | C-terminal |
| RP11-505P4.2-015 | EF1A1_HUMAN/5-239 | PF00009 | | C-terminal |
| RP11-517O1.1-013 | STAG2_HUMAN/154-273 | PF08514 | [no_stop] | C-terminal |
| RP11-517O1.1-014 | STAG2_HUMAN/154-273 | PF08514 | [no_stop] | C-terminal |
| RP1-309K20.1-006 | NFS1_HUMAN/59-422 | PF00266 | | C-terminal |
| RP1-309K20.2-014 | CPNE1_HUMAN/304-451 | PF07002 | [no_stop] | C-terminal |
| RP1-309K20.2-015 | CPNE1_HUMAN/304-451 | PF07002 | [no_start] [no_stop] | N-terminal |
| RP4-614O4.1-008 | IF6_HUMAN/3-204 | PF01912 | [no_stop] | C-terminal |
| RP4-614O4.9-002 | CT128_HUMAN/9-293 | PF07894 | | N-terminal |
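
The second column of Table 10 gives the reference domain as a Swiss-Prot entry name followed by a residue range, for example RAD50_HUMAN/3-1298. The sketch below shows one way to split such a value and derive the reference domain length. It assumes that the range is 1-based and inclusive, as is usual for Pfam coordinates; the function name and the regular expression are illustrative choices.

```python
import re

# ENTRY_NAME/start-end, e.g. "RAD50_HUMAN/3-1298"
DOMAIN_REF = re.compile(
    r"^(?P<entry>[A-Z0-9]+_[A-Z0-9]+)/(?P<start>\d+)-(?P<end>\d+)$"
)


def parse_domain_ref(text):
    """Split a table value into (entry, start, end, length).

    Assumption: start and end are inclusive residue positions, so the
    domain length is end - start + 1.
    """
    match = DOMAIN_REF.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised domain reference: {text!r}")
    start = int(match["start"])
    end = int(match["end"])
    return match["entry"], start, end, end - start + 1


entry, start, end, length = parse_domain_ref("RAD50_HUMAN/3-1298")
```

The derived length can then serve as the reference value against which the length of a Blastp match is compared.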