Bachelor's thesis, 2015
69 pages
1. Introduction
1.1. Cellular compartmentalization
1.2. Nuclear localization signal (NLS)
1.2.1. Monopartite NLS
1.2.2. Bipartite NLS
1.2.3. PY-NLS
1.3. Nuclear export signal (NES)
1.4. NLSdb - Database of nuclear localization signals 1.0
1.5. Motivation
2. Materials and Methods
2.1. Collection of experimentally verified nuclear transport signals
2.1.1. NLSs
2.1.1.1. The database NLSdb1.0
2.1.1.2. Publication of Lange et al.
2.1.1.3. Prediction tool SeqNLS
2.1.1.4. The Swiss-Prot database
2.1.1.5. PY-NLS sources
2.1.1.6. Others
2.1.2. NESs
2.1.2.1. The database ValidNESs
2.1.2.2. The NESdb
2.1.2.3. NESbase database
2.1.2.4. The Swiss-Prot database
2.1.2.5. The prediction tool NESMapper
2.1.2.6. Others
2.1.3. Test set – unannotated Swiss-Prot proteins
2.2. In silico mutagenesis
2.2.1. Sets of nuclear and non-nuclear proteins
2.2.2. Mutagenesis approach
2.3. Data analysis
2.3.1. Data pre-processing tools
2.3.2. Protein function and NLS prediction tools
3. Results and Discussion
3.1. Experimental development dataset
3.2. Sequence properties of nuclear localization signals and their proteins
3.2.1. Signal length
3.2.2. Organism of origin
3.2.3. Sequence similarity
3.2.4. Subcellular localization
3.2.5. Clustering of signals
3.3. 4301 novel potential NLSs through mutagenesis
3.3.1. Characterization of potential NLSs
3.3.2. Increasing coverage from 9% to 43%
3.4. Benchmark - NLSdb1.0 vs. NLSdb2.0
3.4.1. 38% of proteins with novel potential NLSs in NLSdb1.0
3.4.2. 100% overlap between NLSdb1.0 and NLSdb2.0
4. Conclusion
5. Outlook
This thesis aims to update the NLSdb database by integrating newly collected experimental data and generating novel potential nuclear localization signals (NLSs) through in silico mutagenesis, thereby improving the coverage and predictive capability for identifying nuclear proteins. Furthermore, the study performs an extensive analysis of sequence properties and sub-group classification of transport signals to provide biological insights.
2.2.2. Mutagenesis approach
The development set of 2452 experimentally verified NLSs was used as the training set for the iterative in silico mutagenesis approach. The algorithm was divided into three main steps:
Firstly, the development set was reduced to the experimental NLSs that occur in proteins annotated as nuclear in Swiss-Prot. Of these, only signals that did not occur in any protein sequence of the non-nuclear set were kept. These signals were then tested for occurrence in the protein sequences of the nuclear dataset.
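The filtering in this first step can be illustrated with a minimal Python sketch. It assumes that "occurring in" a protein means an exact substring match and that both protein sets are plain lists of sequence strings; the function names and toy sequences are hypothetical, and the check for a Swiss-Prot nuclear annotation is folded into the requirement that a signal must occur in the nuclear set.

```python
def occurs_in(signal, sequences):
    """Return True if `signal` is an exact substring of any sequence (assumed matching rule)."""
    return any(signal in seq for seq in sequences)


def reduce_development_set(signals, nuclear_seqs, non_nuclear_seqs):
    """Hypothetical helper: keep signals absent from every non-nuclear
    sequence and present in at least one nuclear sequence."""
    return [
        s for s in signals
        if not occurs_in(s, non_nuclear_seqs) and occurs_in(s, nuclear_seqs)
    ]


# Toy sequences for illustration only, not data from the thesis
nuclear = ["MSPKKKRKVEDA", "MAKRPRSPSS"]
non_nuclear = ["MKRPRLLAGG"]
signals = ["PKKKRKV", "KRPR", "GGGG"]
print(reduce_development_set(signals, nuclear, non_nuclear))  # → ['PKKKRKV']
```

Here "KRPR" is discarded because it also occurs in a non-nuclear sequence, and "GGGG" because it occurs in no nuclear sequence.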
Secondly, we performed a mutational step, using the signals of the reduced development set as input. Figure 2 visualizes the in silico mutation with an example. Every signal was mutated at each position into each of the 20 amino acids. All resulting mutants of every signal were again tested for occurrence in the protein sequences of the non-nuclear and the nuclear datasets.
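A minimal Python sketch of the mutation step, under the same assumption of exact substring matching. The function names and toy sequences are hypothetical. One detail is an assumption of the sketch: the substitution that reproduces the original residue is skipped, since the unmutated signal was already tested in the first step, so a signal of length L yields 19 × L mutants.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def point_mutants(signal):
    """Yield every single-position substitution of `signal`.

    Each position is replaced by each of the 20 amino acids; the substitution
    that reproduces the original residue is skipped, so a signal of length L
    yields 19 * L mutants.
    """
    for i, original in enumerate(signal):
        for aa in AMINO_ACIDS:
            if aa != original:
                yield signal[:i] + aa + signal[i + 1:]


def occurrence(seq, nuclear_seqs, non_nuclear_seqs):
    """Return (found in nuclear set, found in non-nuclear set) using exact substring matching."""
    return (any(seq in s for s in nuclear_seqs),
            any(seq in s for s in non_nuclear_seqs))


nuclear = ["MSPKKRRKVEDA"]      # toy sequences, not thesis data
non_nuclear = ["MAPKKKRKAGG"]
mutants = list(point_mutants("PKKKRKV"))
print(len(mutants))                                  # → 133
print(occurrence("PKKRRKV", nuclear, non_nuclear))   # → (True, False)
print(occurrence("PKKKRKA", nuclear, non_nuclear))   # → (False, True)
```

Only mutants of the first kind (found in the nuclear set but not in the non-nuclear set) are carried forward into the iteration step.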
The last step iterated over the mutated signals. Only mutated signals that matched in the nuclear proteins, but not in the non-nuclear proteins, were added to the result set and shortened by one position at the end of the signal. Shortened signals that still matched exclusively in the sequences of the nuclear protein set were shortened further. This was repeated until the resulting sequence matched either neither of the two protein sets or both. All resulting signals formed the set of potential NLSs.
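The shortening loop can be sketched in Python as follows. This is an illustration under stated assumptions, not the thesis implementation: matching is exact substring search, every nuclear-exclusive sequence encountered along the way (the full mutant and each shortened version) is assumed to enter the result set, and the function names and toy sequences are hypothetical.

```python
def nuclear_exclusive(seq, nuclear_seqs, non_nuclear_seqs):
    """True if `seq` occurs in some nuclear sequence and in no non-nuclear sequence."""
    return (any(seq in s for s in nuclear_seqs)
            and not any(seq in s for s in non_nuclear_seqs))


def shorten_iteratively(mutants, nuclear_seqs, non_nuclear_seqs):
    """Hypothetical helper: collect nuclear-exclusive mutants and their
    end-trimmed versions, stopping once a candidate matches neither set or both."""
    results = set()
    for mutant in mutants:
        candidate = mutant
        # Trim one residue from the end while the candidate stays nuclear-exclusive
        while candidate and nuclear_exclusive(candidate, nuclear_seqs, non_nuclear_seqs):
            results.add(candidate)
            candidate = candidate[:-1]
    return results


nuclear = ["MSPKKRRKVEDA"]   # toy sequences, not thesis data
non_nuclear = ["MAPKKAGG"]
potential = shorten_iteratively(["PKKRRKV", "AKKRRKV"], nuclear, non_nuclear)
print(sorted(potential, key=len, reverse=True))
# → ['PKKRRKV', 'PKKRRK', 'PKKRR', 'PKKR']
```

"AKKRRKV" never enters the result set because it does not occur in the nuclear set, and trimming "PKKRRKV" stops at "PKK", which also occurs in the non-nuclear sequence.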
1. Introduction: This chapter covers fundamental cellular concepts, defining nuclear localization and export signals (NLS/NES) and presenting the motivation for updating the NLSdb database.
2. Materials and Methods: This section details the data collection from literature and databases, the criteria for reliable evidence, the in silico mutagenesis algorithm, and the bioinformatics tools used for sequence analysis and clustering.
3. Results and Discussion: This central chapter presents the comprehensive sequence analysis of transport signals, the generation of 4301 new potential NLSs, and benchmarking results showing the improved coverage of the updated NLSdb2.0.
4. Conclusion: The conclusion summarizes the successful update of the database and highlights the utility of the generated data for predicting nuclear localization and understanding protein transport mechanisms.
5. Outlook: This section discusses future directions, including the planned analysis of nuclear export signals (NESs) and potential improvements to the database user interface.
Nuclear transport, Protein localization, NLSdb, Bioinformatics, Monopartite NLS, Bipartite NLS, PY-NLS, In silico mutagenesis, Sequence analysis, Consensus sequence, Protein sequences, Subcellular localization, Karyopherins, Swiss-Prot, Database update
The research focuses on the bioinformatics analysis of nuclear transport signals and the update of the NLSdb database to improve the identification of proteins imported into the nucleus.
The central themes include the categorization of nuclear localization signals (monopartite, bipartite, and PY-NLS), in silico signal discovery, and the statistical analysis of protein sequences containing these signals.
The primary objective is to update the 2003 version of NLSdb to incorporate recent research, resulting in a more comprehensive database that increases the coverage for detecting nuclear proteins.
The study utilizes in silico mutagenesis, sequence clustering via UPGMA, pattern matching, PSI-BLAST for homology inference, and redundancy reduction using CD-HIT and UniqueProt.
The main section covers the collection of experimental data, the algorithmic discovery of 4301 new potential NLSs, the analysis of signal properties, and the benchmarking of the new database version against the old one.
Key terms include bioinformatics, NLSdb, nuclear localization signal, in silico mutagenesis, sequence clustering, and protein transport.
Potential signals are generated by mutating experimental signals at every position and iteratively shortening them until they match exclusively within the sequences of a verified nuclear protein dataset.
The validation confirms that proteins predicted to have an NLS via the updated database often exhibit known nuclear functions or nuclear localization in the literature, providing support for the predictive accuracy of NLSdb2.0.
The analysis revealed a high frequency of NLS-containing proteins of viral origin, which relates to the biological necessity of viruses to enter the host cell nucleus for replication.