Analysis of Nuclear Transport Signals

Bachelor's thesis, 2015
69 pages

Computer Science - Bioinformatics

Excerpt

1. Introduction

1.1. Cellular compartmentalization

1.2. Nuclear localization signal (NLS)

1.2.1. Monopartite NLS

1.2.2. Bipartite NLS

1.2.3. PY-NLS

1.3. Nuclear export signal (NES)

1.4. NLSdb - Database of nuclear localization signals 1.0

1.5. Motivation

2. Materials and Methods

2.1. Collection of experimentally verified nuclear transport signals

2.1.1. NLSs

2.1.1.1. The database NLSdb1.0

2.1.1.2. Publication of Lange et al.

2.1.1.3. Prediction tool SeqNLS

2.1.1.4. The Swiss-Prot database

2.1.1.5. PY-NLS sources

2.1.1.6. Others

2.1.2. NESs

2.1.2.1. The database ValidNESs

2.1.2.2. The NESdb

2.1.2.3. NESbase database

2.1.2.4. The Swiss-Prot database

2.1.2.5. The prediction tool NESMapper

2.1.2.6. Others

2.1.3. Test set – unannotated Swiss-Prot proteins

2.2. In silico mutagenesis

2.2.1. Sets of nuclear and non-nuclear proteins

2.2.2. Mutagenesis approach

2.3. Data analysis

2.3.1. Data pre-processing tools

2.3.2. Protein function and NLS prediction tools

3. Results and Discussion

3.1. Experimental development dataset

3.2. Sequence properties of nuclear localization signals and their proteins

3.2.1. Signal length

3.2.2. Organism of origin

3.2.3. Sequence similarity

3.2.4. Subcellular localization

3.2.5. Clustering of signals

3.3. 4301 novel potential NLSs through mutagenesis

3.3.1. Characterization of potential NLSs

3.3.2. Increasing coverage from 9% to 43%

3.4. Benchmark - NLSdb1.0 vs. NLSdb2.0

3.4.1. 38% of proteins with novel potential NLSs in NLSdb1.0

3.4.2. 100% overlap between NLSdb1.0 and NLSdb2.0

4. Conclusion

5. Outlook

Objectives & Topics

This thesis aims to update the NLSdb database by integrating newly collected experimental data and generating novel potential nuclear localization signals (NLSs) through in silico mutagenesis, thereby improving the coverage and predictive capability for identifying nuclear proteins. Furthermore, the study performs an extensive analysis of sequence properties and sub-group classification of transport signals to provide biological insights.

Updating the NLSdb database with current experimental data
Application of in silico mutagenesis to discover novel potential NLSs
Analysis of sequence properties, signal length, and organism distribution of transport signals
Refinement of consensus sequences through clustering and alignment of signal sub-groups
Benchmarking the updated database (NLSdb2.0) against the previous version and evaluating predictive coverage

Excerpt from the book

2.2.2. Mutagenesis approach

The development set of 2452 experimentally verified NLSs was used as the training set for the iterative in silico mutagenesis approach. The algorithm was divided into three main steps:

First, the development set was reduced to those experimental NLSs that occur in proteins annotated as nuclear in Swiss-Prot. Signals occurring in any protein sequence of the non-nuclear set were discarded, and the remaining signals were then checked for occurrence in the protein sequences of the nuclear dataset.
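The filtering step can be illustrated with a minimal Python sketch. It assumes that signals are plain amino-acid strings matched as exact substrings; the function names and the toy sequences are hypothetical and not taken from the thesis.

```python
def occurs_in(signal, sequences):
    """Return True if the signal is a substring of at least one sequence."""
    return any(signal in seq for seq in sequences)


def reduce_development_set(signals, nuclear, non_nuclear):
    """Keep signals found in the nuclear set and absent from the non-nuclear set."""
    return [
        s for s in signals
        if not occurs_in(s, non_nuclear) and occurs_in(s, nuclear)
    ]


# Toy data (hypothetical): two nuclear and two non-nuclear sequences.
nuclear = ["MAPKKKRKVEDLT", "GSRKRPRPAA"]
non_nuclear = ["MLLAVGGKRPRTT", "AAGGSSTT"]
signals = ["PKKKRKV", "KRPR", "QQQQ"]

reduced = reduce_development_set(signals, nuclear, non_nuclear)
print(reduced)
```

In this toy example, "KRPR" is dropped because it also occurs in a non-nuclear sequence, and "QQQQ" is dropped because it occurs in no nuclear sequence.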

Second, a mutational step was performed, using the signals of the reduced development set as input. Figure 2 visualizes the in silico mutation with an example. Every signal was mutated at each position into each of the 20 amino acids. All resulting mutants were then tested for their occurrence in the protein sequences of the non-nuclear and the nuclear dataset.
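A single-position mutation of this kind might be sketched in Python as follows. The sketch assumes the 20 standard amino acids in one-letter code and skips the substitution that would reproduce the original signal; `point_mutants` is a hypothetical helper name, not the author's implementation.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard amino acids, one-letter code


def point_mutants(signal):
    """Return all variants of the signal that differ at exactly one position."""
    mutants = set()
    for i, original in enumerate(signal):
        for aa in AMINO_ACIDS:
            if aa != original:  # substituting the same residue changes nothing
                mutants.add(signal[:i] + aa + signal[i + 1:])
    return mutants


mutants = point_mutants("KRK")
print(len(mutants))  # 3 positions x 19 substitutions
```

Each mutant would then be checked against both protein sets in the same way as the unmutated signals.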

The last step iterated over the mutated signals. Only mutated signals that matched in the nuclear proteins, but not in the non-nuclear proteins, were sorted into the result set and then shortened by one position at the end. Shortened signals that still matched exclusively in the sequences of the nuclear protein set were shortened further. This was repeated until the resulting sequence matched either neither or both of the two protein sets. All resulting signals formed the set of potential NLSs.
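The iterative shortening can be sketched in Python under a few assumptions: signals are matched as exact substrings, every variant that matches exclusively in the nuclear set is kept as a result, and a hypothetical `min_length` parameter guards against empty strings. None of these details is confirmed by the excerpt.

```python
def occurs_in(signal, sequences):
    """Return True if the signal is a substring of at least one sequence."""
    return any(signal in seq for seq in sequences)


def exclusive_to_nuclear(signal, nuclear, non_nuclear):
    """True if the signal matches nuclear sequences and no non-nuclear one."""
    return occurs_in(signal, nuclear) and not occurs_in(signal, non_nuclear)


def shorten_iteratively(mutant, nuclear, non_nuclear, min_length=1):
    """Collect the mutant and its C-terminally shortened variants for as long
    as they match exclusively in the nuclear set (min_length is an assumption)."""
    results = []
    candidate = mutant
    while len(candidate) >= min_length and exclusive_to_nuclear(
        candidate, nuclear, non_nuclear
    ):
        results.append(candidate)
        candidate = candidate[:-1]  # drop the last residue
    return results


# Toy data (hypothetical).
nuclear = ["MAPKKKRKVEDLT"]
non_nuclear = ["GGPKKKTT"]

potential = shorten_iteratively("PKKKRKV", nuclear, non_nuclear)
print(potential)
```

In this toy example the loop stops at "PKKK", which matches both sets and is therefore not added to the results.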

Summary of Chapters

1. Introduction: This chapter covers fundamental cellular concepts, defining nuclear localization and export signals (NLS/NES) and presenting the motivation for updating the NLSdb database.

2. Materials and Methods: This section details the data collection from literature and databases, the criteria for reliable evidence, the in silico mutagenesis algorithm, and the bioinformatics tools used for sequence analysis and clustering.

3. Results and Discussion: This central chapter presents the comprehensive sequence analysis of transport signals, the generation of 4301 new potential NLSs, and benchmarking results showing the improved coverage of the updated NLSdb2.0.

4. Conclusion: The conclusion summarizes the successful update of the database and highlights the utility of the generated data for predicting nuclear localization and understanding protein transport mechanisms.

5. Outlook: This section discusses future directions, including the planned analysis of nuclear export signals (NESs) and potential improvements to the database user interface.

Keywords

Nuclear transport, Protein localization, NLSdb, Bioinformatics, Monopartite NLS, Bipartite NLS, PY-NLS, In silico mutagenesis, Sequence analysis, Consensus sequence, Protein sequences, Subcellular localization, Karyopherins, Swiss-Prot, Database update

Frequently Asked Questions

What is the primary focus of this research?

The research focuses on the bioinformatics analysis of nuclear transport signals and the update of the NLSdb database to improve the identification of proteins imported into the nucleus.

What are the central themes of this work?

The central themes include the categorization of nuclear localization signals (monopartite, bipartite, and PY-NLS), in silico signal discovery, and the statistical analysis of protein sequences containing these signals.

What is the main objective of the thesis?

The primary objective is to update the 2003 version of NLSdb to incorporate recent research, resulting in a more comprehensive database that increases the coverage for detecting nuclear proteins.

Which computational methods are employed in this study?

The study uses in silico mutagenesis, sequence clustering via UPGMA, pattern matching, PSI-BLAST for homology inference, and redundancy reduction using CD-HIT and UniqueProt.

What is covered in the main section of the work?

The main section covers the collection of experimental data, the algorithmic discovery of 4301 new potential NLSs, the analysis of signal properties, and the benchmarking of the new database version against the old one.

Which keywords best characterize this work?

Key terms include bioinformatics, NLSdb, nuclear localization signal, in silico mutagenesis, sequence clustering, and protein transport.

How does the in silico mutagenesis process define potential signals?

Potential signals are generated by mutating experimental signals at every position and iteratively shortening them until they match exclusively within the sequences of a verified nuclear protein dataset.

What does the validation using randomly chosen proteins demonstrate?

The validation shows that proteins predicted to have an NLS via the updated database often have known nuclear functions or localization reported in the literature, which supports the predictive accuracy of NLSdb2.0.

Why are viruses a significant topic in the context of protein transport signals?

The analysis revealed a high frequency of NLS-containing proteins of viral origin, which relates to the biological necessity of viruses to enter the host cell nucleus for replication.

End of excerpt from 69 pages

Details

Title: Analysis of Nuclear Transport Signals
University: Technische Universität München
Author: Silvana Wolf (Author)
Year of publication: 2015
Pages: 69
Catalog number: V365482
ISBN (eBook): 9783668443815
ISBN (Book): 9783668443822
File size: 3191 KB
Language: English
Keywords: analysis nuclear transport signals
Product safety: GRIN Publishing GmbH
Price (eBook): US$ 34.99
Price (Book): US$ 49.99

Cite this work: Silvana Wolf (Author), 2015, Analysis of Nuclear Transport Signals, Munich, GRIN Verlag, https://www.diplomarbeiten24.de/document/365482