Analysis of Gencode sequences using MisPred tools
Gencode is a sub-project of Encode.
Its overall goal is to identify all protein-coding genes in the regions of the human genome selected within the Encode project.
Conflict 1
AC110015.1-002 contains an extracellular cadherin domain but lacks both a signal peptide and transmembrane domains. It is a short,
N-terminally and C-terminally truncated fragment of CADH2_HUMAN, which explains why it lacks both the N-terminal secretory signal peptide and
the transmembrane domain of the protein.
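The rule behind conflict 1 can be illustrated with a minimal Python sketch. The `Annotation` record and its field names are hypothetical; in practice the three flags would come from domain, signal-peptide and transmembrane predictions, which are not shown here.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    """Hypothetical per-protein summary of prediction results."""
    name: str
    has_extracellular_domain: bool
    has_signal_peptide: bool
    has_transmembrane_segment: bool


def violates_conflict_1(a: Annotation) -> bool:
    """Extracellular domain present, but no signal peptide and no transmembrane segment."""
    return a.has_extracellular_domain and not (
        a.has_signal_peptide or a.has_transmembrane_segment
    )


# Flags as described in the text for the truncated CADH2_HUMAN fragment.
fragment = Annotation("AC110015.1-002", True, False, False)
print(violates_conflict_1(fragment))  # True
```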
Conflict 2
We identified no Gencode sequence affected by conflict 2.
Conflict 3
We identified no Gencode sequence affected by conflict 3.
Conflict 4
To identify suspicious Gencode sequences (those with domains deviating from the ‘normal’ size), we focused on the Pfam-A domains
(29,561 domains belonging to 2,752 Pfam-A families) that are present in human Swiss-Prot proteins (11,384 proteins altogether), as we deemed Swiss-Prot
the most reliable protein database. We first determined the "normal size range" of complete Pfam-A domains predicted in these well-curated Swiss-Prot
entries and then used this range to find domains/proteins that deviate significantly from it. To define the range we removed incomplete domains,
i.e. domains whose beginning or end is missing within the first or last 5 positions of the multiple alignment provided by Pfam for the family,
as well as domains whose length differs considerably (by at least 2 standard deviations) from the average length of the Pfam-A seed sequences.
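The two filters described above can be sketched in Python as follows. This is only an illustration under stated assumptions: alignment rows are taken to be plain strings with `-` or `.` as gap symbols, "missing" is read as a gap in any of the first or last 5 alignment columns, and the sample standard deviation of the seed lengths is used. All function names are hypothetical.

```python
from statistics import mean, stdev

GAP_CHARS = "-."  # assumed gap symbols in an alignment row


def has_complete_ends(aligned_row: str, edge: int = 5) -> bool:
    """True if the row has a residue in each of the first and last `edge` columns."""
    ends = aligned_row[:edge] + aligned_row[-edge:]
    return not any(c in GAP_CHARS for c in ends)


def ungapped_length(aligned_row: str) -> int:
    """Number of residues in the row, ignoring gap symbols."""
    return sum(1 for c in aligned_row if c not in GAP_CHARS)


def within_seed_range(length: int, seed_lengths: list[int], n_sd: float = 2.0) -> bool:
    """True if `length` is less than `n_sd` standard deviations from the seed mean."""
    mu = mean(seed_lengths)
    sd = stdev(seed_lengths)  # sample SD; needs at least two seed lengths
    diff = abs(length - mu)
    return diff == 0 or diff < n_sd * sd


def complete_domains(rows: dict[str, str], seed_lengths: list[int]) -> list[str]:
    """Names of domains that pass both the end-completeness and the length filter."""
    return [
        name
        for name, row in rows.items()
        if has_complete_ends(row)
        and within_seed_range(ungapped_length(row), seed_lengths)
    ]


# Toy alignment rows (invented for illustration only).
rows = {
    "dom_a": "MKTAYIAKQRQI",  # complete ends, typical length
    "dom_b": "--TAYIAKQRQI",  # beginning missing
    "dom_c": "MKTAY--KQRQI",  # complete ends, but too short
}
print(complete_domains(rows, seed_lengths=[12, 12, 11, 12]))  # ['dom_a']
```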
In the next step we ran a Blastp search with the Gencode sequences as queries against the 16,465 human Swiss-Prot Pfam-A domains.
If the domain match with a Gencode protein was considerably shorter (in this conservative approach the cutoff was set to 40% of the actual domain length) or
longer than the actual domain, the Gencode protein was considered suspicious. With this approach we identified 67 Gencode proteins with domains of suspicious
length. The results of these analyses are summarized in Table 10.
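A minimal Python sketch of the length test is given below. It assumes one particular reading of the cutoff, namely that a match is suspicious when its length differs from the domain length by at least 40% of the domain length in either direction; the text does not state the formula explicitly. The domain length is derived from the `ENTRY/start-end` identifiers used in Table 10, and the match lengths in the usage lines are invented for illustration.

```python
import re

# Identifiers such as "HBD_HUMAN/7-141" encode the domain boundaries.
_DOMAIN_ID = re.compile(r"^(?P<entry>[^/]+)/(?P<start>\d+)-(?P<end>\d+)$")


def domain_length(domain_id: str) -> int:
    """Length of the domain encoded in an 'ENTRY/start-end' identifier."""
    m = _DOMAIN_ID.match(domain_id)
    if m is None:
        raise ValueError(f"unexpected domain identifier: {domain_id!r}")
    return int(m["end"]) - int(m["start"]) + 1


def is_suspicious(match_len: int, domain_len: int, cutoff: float = 0.40) -> bool:
    """Assumed rule: relative length deviation of at least `cutoff`."""
    return abs(match_len - domain_len) / domain_len >= cutoff


hbd_len = domain_length("HBD_HUMAN/7-141")
print(hbd_len)                      # 135
print(is_suspicious(60, hbd_len))   # True  (hypothetical truncated match)
print(is_suspicious(130, hbd_len))  # False (hypothetical near-complete match)
```

If the intended rule is instead "match shorter than 40% of the domain length", only the comparison in `is_suspicious` would need to change.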
Note that most of the Gencode sequences identified by conflict 4 are annotated as having no start and/or no stop codon
(cf. Table 10, 4th column), reflecting the fact that the corresponding transcripts are not full length.
In most cases the deviant domains are N-terminally and/or C-terminally truncated as a consequence of the incompleteness of the transcripts
(cf. Table 10, 5th column). In the present set of sequences we identified only three cases where the domain deviates from the normal size
as a result of an internal deletion. Further analysis of these cases indicates that the deletions are incompatible with the viability of the protein,
suggesting that the transcripts arise through aberrant splicing.
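The deviation categories used in the 5th column of Table 10 could be assigned along the following lines. This is a hypothetical sketch, not the procedure used in the analysis: it assumes that each match is described by the first and last domain position it covers and by the number of domain residues actually aligned, and it uses an arbitrary tolerance of 5 residues.

```python
def classify_deviation(
    first_pos: int,
    last_pos: int,
    domain_len: int,
    aligned_residues: int,
    tol: int = 5,
) -> str:
    """Classify which part of the domain is missing from a match.

    first_pos / last_pos: 1-based domain positions covered by the match.
    aligned_residues: number of domain residues aligned (gaps excluded).
    tol: number of missing residues tolerated before calling a truncation.
    """
    n_missing = first_pos - 1 > tol
    c_missing = domain_len - last_pos > tol
    if n_missing and c_missing:
        return "N- and C-terminal"
    if n_missing:
        return "N-terminal"
    if c_missing:
        return "C-terminal"
    if domain_len - aligned_residues > tol:
        return "internal"  # both ends present, residues missing in between
    return "none"


# Hypothetical matches against a 135-residue domain.
print(classify_deviation(1, 60, 135, 60))     # C-terminal
print(classify_deviation(50, 135, 135, 86))   # N-terminal
print(classify_deviation(1, 135, 135, 100))   # internal
```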
Table 10. Domain size deviation in Gencode sequences
| Gencode sequence | Swiss-Prot entry containing best matching Pfam domain | Deviant domain | Comment in Gencode | Deviation |
| --- | --- | --- | --- | --- |
| AC004041.1-003 | RAD50_HUMAN/3-1298 | PF02463 | [no_stop] | N-terminal |
| AC004080.4-005 | HXA9_HUMAN/1-193 | PF04617 |  | N-terminal |
| AC004500.4-006 | AFF1_HUMAN/8-1208 | PF05110 | [no_start] [no_stop] | N- and C-terminal |
| AC006985.9-003 | UD16_HUMAN/27-524 | PF00201 |  | N-terminal |
| AC008440.4-004 | MYADM_HUMAN/168-319 | PF01284 | [no_stop] | C-terminal |
| AC008440.4-005 | MYADM_HUMAN/168-319 | PF01284 | [no_stop] | C-terminal |
| AC008440.4-006 | MYADM_HUMAN/168-319 | PF01284 | [no_stop] | C-terminal |
| AC009404.1-003 | DDX18_HUMAN/203-374 | PF00270 | [no_start] [no_stop] | N-terminal |
| AC011330.7-003 | KCRU_HUMAN/148-414 | PF00217 | [no_stop] | C-terminal |
| AC012314.13-005 | SEN34_HUMAN/217-308 | PF01974 | [no_stop] | C-terminal |
| AC012314.13-006 | SEN34_HUMAN/217-308 | PF01974 | [no_stop] | C-terminal |
| AC012314.9-004 | CNOT3_HUMAN/580-748 | PF04153 |  | C-terminal |
| AC012314.9-008 | CNOT3_HUMAN/2-242 | PF04065 | [no_start] [no_stop] | N-terminal |
| AC015691.9-002 | TRIM6_HUMAN/92-133 | PF00643 |  | internal |
| AC051649.4-011 | TNNT3_HUMAN/72-214 | PF00992 | [no_stop] | N- and C-terminal |
| AC051649.4-012 | TNNT3_HUMAN/72-214 | PF00992 | [no_stop] | N- and C-terminal |
| AC053503.5-009 | DESM_HUMAN/106-414 | PF00038 | [no_start] | N- and C-terminal |
| AC068580.4-002 | CATD_HUMAN/78-409 | PF00026 | [no_stop] | C-terminal |
| AC104389.18-002 | HBD_HUMAN/7-141 | PF00042 | [no_stop] | C-terminal |
| AC121336.1-002 | MYCT_HUMAN/65-590 | PF00083 |  | C-terminal |
| AC132217.7-005 | TY3H_HUMAN/195-526 | PF00351 | [no_start] [no_stop] | N- and C-terminal |
| AF121897.3-002 | SH3BG_HUMAN/64-161 | PF04908 | [no_start] [no_stop] | N-terminal |
| AF121897.3-006 | SH3BG_HUMAN/64-161 | PF04908 | [no_start] [no_stop] | N-terminal |
| AF121897.3-007 | SH3BG_HUMAN/64-161 | PF04908 | [no_start] [no_stop] | N-terminal |
| AF277315.16-002 | G6PD_HUMAN/211-505 | PF02781 | [no_stop] | C-terminal |
| AF277315.16-005 | G6PD_HUMAN/211-505 | PF02781 | [no_stop] | C-terminal |
| AF277315.16-006 | G6PD_HUMAN/211-505 | PF02781 |  | C-terminal |
| AF277315.16-007 | G6PD_HUMAN/211-505 | PF02781 | [no_stop] | C-terminal |
| AP000275.63-002 | SYNJ1_HUMAN/535-866 | PF03372 | [no_stop] | C-terminal |
| AP000279.68-004 | GCFC_HUMAN/389-892 | PF07842 |  | C-terminal |
| AP000302.60-008 | PUR2_HUMAN/105-298 | PF01071 | [no_stop] | C-terminal |
| AP000302.60-010 | PUR2_HUMAN/105-298 | PF01071 | [no_stop] | C-terminal |
| AP000304.10-013 | QORL_HUMAN/144-309 | PF00107 | [no_stop] | C-terminal |
| AP000313.6-005 | ITSN1_HUMAN/1241-1422 | PF00621 | [no_start] [no_stop] | N-terminal |
| AP001462.1-004 | MEN1_HUMAN/2-615 | PF05053 | [no_stop] | C-terminal |
| AP001462.1-005 | MEN1_HUMAN/2-615 | PF05053 | [no_stop] | C-terminal |
| AP001462.1-006 | MEN1_HUMAN/2-615 | PF05053 | [no_stop] | C-terminal |
| AP001462.1-009 | MEN1_HUMAN/2-615 | PF05053 | [no_stop] | C-terminal |
| AP001462.1-010 | MEN1_HUMAN/2-615 | PF05053 | [no_stop] | C-terminal |
| AP001462.3-016 | GRP3_HUMAN/146-332 | PF00617 | [no_stop] | C-terminal |
| AP006216.3-003 | ZPR1_HUMAN/48-207 | PF03367 | [no_start] [no_stop] | N-terminal |
| AP006216.7-002 | APOC3_HUMAN/1-91 | PF05778 | [no_stop] | C-terminal |
| LA16c-OS12.1-016 | LUC7L_HUMAN/4-271 | PF03194 | [no_start] | N-terminal |
| LL22NC03-28H9.1-008 | SYN3_HUMAN/86-191 | PF02078 | [no_stop] | C-terminal |
| RP11-115M6.1-002 | EM55_HUMAN/162-226 | PF07653 | [no_stop] | C-terminal |
| RP11-126K1.5-010 | RFX5_HUMAN/76-158 | PF02257 | [no_start] | N-terminal |
| RP11-247A12.4-008 | PTPA_HUMAN/25-358 | PF03095 | [no_stop] | C-terminal |
| RP11-247A12.5-001 | CACP_HUMAN/34-616 | PF00755 |  | internal |
| RP11-247A12.5-007 | CACP_HUMAN/34-616 | PF00755 | [no_stop] | N-terminal |
| RP11-298J23.1-003 | PEPC_HUMAN/72-387 | PF00026 | [no_stop] | N- and C-terminal |
| RP1-149A16.10-011 | BPIL2_HUMAN/38-217 | PF01273 |  | C-terminal |
| RP11-505P4.2-015 | EF1A1_HUMAN/5-239 | PF00009 |  | C-terminal |
| RP11-517O1.1-013 | STAG2_HUMAN/154-273 | PF08514 | [no_stop] | C-terminal |
| RP11-517O1.1-014 | STAG2_HUMAN/154-273 | PF08514 | [no_stop] | C-terminal |
| RP1-309K20.1-006 | NFS1_HUMAN/59-422 | PF00266 |  | C-terminal |
| RP1-309K20.2-014 | CPNE1_HUMAN/304-451 | PF07002 | [no_stop] | C-terminal |
| RP1-309K20.2-015 | CPNE1_HUMAN/304-451 | PF07002 | [no_start] [no_stop] | N-terminal |
| RP4-614O4.1-008 | IF6_HUMAN/3-204 | PF01912 | [no_stop] | C-terminal |
| RP4-614O4.9-002 | CT128_HUMAN/9-293 | PF07894 |  | N-terminal |