Masterarbeit, 2022
39 Seiten, Note: 7.50
1 Chapter 1
1.1 Introduction
1.2 Purpose Statement
1.3 Approach
1.3.1 Natural Language Processing (NLP)
1.3.2 Computer Vision (CV)
2 Chapter 2
2.1 Transformer
2.2 Transformer - Building Blocks
2.2.1 Transformer - Workflow
2.2.2 Transformers - Digest
2.3 Vision Transformer (ViT)
2.3.1 Key Ideas
2.4 ViT in CNN Realm
2.4.1 ViT - State of the Art (SOTA)
2.4.2 ViT and CNN: A Shared Vision?
3 Chapter 3
3.1 Perspectives for Transformers and ViTs
3.2 Selected Learning Paradigms
3.2.1 Model Soups - Ensemble Learning
3.2.2 Multimodal Learning
3.2.3 Self-Supervised Learning
3.2.4 Other Approaches and Open Question
3.3 Beyond Transformers?
3.4 Personal Path Of Exploration
3.5 Conclusion
The primary objective of this thesis is to identify, analyze, and extract the key elements that enable Transformer-based architectures to transition successfully from natural language processing to the domain of computer vision, and to evaluate how Vision Transformers (ViTs) compete with traditional convolutional neural networks.
2.4.2 ViT and CNN: A Shared Vision?
As mentioned earlier in this work, CNN-based models are the state of the art for computer vision tasks. In this section, we explore the commonalities, or the lack thereof, between CNNs and ViTs.
Tuli et al. (2021) ask which neural architecture, CNN-based or Transformer-based, is closer to the human vision process.
Their rationale is the following: in a classification context, there is one way to be right but many ways to be wrong. On this basis, they measure the error consistency of two categories of models: Transformer-based, i.e. ViT (ViT-B/16, ViT-B/32, ViT-L/16 and ViT-L/32), and CNN-based, i.e. BiT-M-R50x1. They run their tests (Cohen's κ and Jensen-Shannon distance) on the Stylized ImageNet (SIN) dataset (Geirhos et al. 2019).
The results indicate that the errors ViTs make are more consistent with human errors than those of the CNN-based approach.
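To make the error-consistency idea concrete, the following is a minimal sketch of Cohen's κ computed over two models' binary error vectors. This is an illustration of the general measure, not the exact evaluation protocol of Tuli et al.; the toy error vectors are invented for demonstration.

```python
import numpy as np

def error_consistency_kappa(errors_a, errors_b):
    """Cohen's kappa between two binary error vectors (1 = misclassified).

    Measures how much two models agree on *which* samples they get wrong,
    beyond the agreement expected by chance from their error rates alone.
    """
    a = np.asarray(errors_a, dtype=bool)
    b = np.asarray(errors_b, dtype=bool)
    # Observed agreement: fraction of samples where both are right or both wrong.
    p_obs = np.mean(a == b)
    # Expected agreement under independence, from the two marginal error rates.
    ea, eb = a.mean(), b.mean()
    p_exp = ea * eb + (1 - ea) * (1 - eb)
    return (p_obs - p_exp) / (1 - p_exp)

# Toy example: two models evaluated on the same 8 samples.
model_1 = [1, 1, 0, 0, 0, 1, 0, 0]  # errors of model 1
model_2 = [1, 1, 0, 0, 0, 0, 0, 0]  # errors of model 2
print(round(error_consistency_kappa(model_1, model_2), 3))  # → 0.714
```

A κ of 1 means the two error patterns coincide exactly; a κ near 0 means the overlap is what chance alone would predict, which is the baseline against which human-vs-model consistency is judged.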
Another possible point of comparison concerns model robustness in adversarial settings.
In a related work, Benz et al. (2021) conduct an empirical study of image classification robustness across three architecture families: CNNs, ViTs and MLP-Mixer. We are concerned with the first two. The attacks considered are:
Robustness against white-box attacks
Robustness against black-box attacks
– Query-based black-box attacks
– Transfer-based black-box attacks
Robustness against common corruptions
Robustness against Universal Adversarial Perturbations (UAPs)
By applying high-pass and low-pass filtering, they propose a frequency analysis to explain why CNNs are less robust: CNN models rely more on high-frequency features, whereas ViTs depend on low-frequency ones. This reliance on low-frequency content is one element behind the ViTs' increased robustness.
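A minimal sketch of the kind of filtering such an analysis relies on: a circular mask in the centred Fourier domain keeps either the low or the high frequencies of a grayscale image. This is an assumed, simplified setup for illustration, not the specific pipeline used by Benz et al.

```python
import numpy as np

def frequency_filter(img, radius, mode="low"):
    """Keep only low- or high-frequency content of a grayscale image.

    A circular mask of the given radius is applied in the centred Fourier
    domain: "low" keeps frequencies inside the circle, "high" those outside.
    """
    f = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h / 2) ** 2 + (xx - w / 2) ** 2)
    mask = dist <= radius if mode == "low" else dist > radius
    return np.real(np.fft.ifft2(np.fft.ifftshift(f * mask)))

# Sanity check: a constant image is pure low frequency (DC only), so a
# low-pass keeps it intact while a high-pass removes almost everything.
img = np.ones((32, 32))
low = frequency_filter(img, radius=4, mode="low")
high = frequency_filter(img, radius=4, mode="high")
```

Feeding low-passed versus high-passed inputs to a trained classifier and comparing the accuracy drop is the basic probe for which frequency band a model depends on.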
Chapter 1: Provides an introduction to the research context, establishing the role of Transformers in both NLP and the evolving landscape of computer vision.
Chapter 2: Details the core Transformer architecture, its building blocks like self-attention, and specifically examines the adaptation of Vision Transformers (ViTs) and their performance relative to CNNs.
Chapter 3: Offers reflections on future extensions, discusses machine learning paradigms like self-supervised learning, and explores the theoretical limits and potential of Transformer-based designs.
Transformer, Computer Vision, Vision Transformer, ViT, Self-Attention, Convolutional Neural Networks, CNN, Deep Learning, Natural Language Processing, Inductive Bias, Multi-Head Attention, Foundational Models, Artificial Intelligence, Model Robustness, Self-Supervised Learning
The thesis investigates the underlying mechanisms that allow the Transformer neural architecture to effectively transfer its success from natural language processing to computer vision.
The main themes include the mechanics of self-attention, the architectural transition from sequence data to image patches, the comparative performance of ViTs versus CNNs, and the broader implications of inductive bias.
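The self-attention mechanics mentioned above can be sketched in a few lines: single-head scaled dot-product attention over a token sequence, which is what gives the Transformer its global receptive field. The matrix shapes and random inputs here are illustrative assumptions, not values from the thesis.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x: (seq_len, d_model) input tokens; w_q/w_k/w_v: projection matrices.
    Every position attends to every other position in one step.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])   # (seq_len, seq_len)
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ v

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                   # 4 tokens, d_model = 8
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # → (4, 8)
```

Multi-head attention simply runs several such heads in parallel on lower-dimensional projections and concatenates their outputs.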
The goal is to identify and extract the "fuel"—the specific components or architectural features—that enables Vision Transformers to compete with or exceed established convolutional neural networks.
The study utilizes a backward analysis approach, examining the origins of Transformer architectures in NLP and systematically reviewing influential research papers to assess their application and effectiveness in vision tasks.
The main body deconstructs the Transformer architecture, details the workflow of Vision Transformers (including, e.g., patch division), and analyzes empirical benchmarks comparing them against traditional CNN-based paradigms.
The research is best characterized by terms such as Transformer, Vision Transformer, self-attention, computer vision, CNN, inductive bias, and model robustness.
Unlike CNNs, which use convolutional filters with strong inductive spatial bias, ViTs divide images into non-overlapping patches, flattening them into sequences to leverage the Transformer's self-attention mechanism for capturing global dependencies.
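The patch-and-flatten step described above can be sketched directly; the example below assumes ViT-B/16-style tokenisation (224×224 RGB input, 16×16 patches) and stops before the learned linear projection and positional embeddings.

```python
import numpy as np

def image_to_patch_sequence(img, patch_size):
    """Split an image into non-overlapping patches and flatten each one.

    img: (H, W, C) array; returns (num_patches, patch_size**2 * C), the
    raw token sequence a ViT feeds to the Transformer encoder.
    """
    h, w, c = img.shape
    p = patch_size
    assert h % p == 0 and w % p == 0, "image must divide evenly into patches"
    # (H/p, p, W/p, p, C) -> (H/p, W/p, p, p, C) -> (num_patches, p*p*C)
    patches = img.reshape(h // p, p, w // p, p, c).swapaxes(1, 2)
    return patches.reshape(-1, p * p * c)

img = np.zeros((224, 224, 3))
seq = image_to_patch_sequence(img, patch_size=16)
print(seq.shape)  # → (196, 768)
```

Each 16×16×3 patch becomes one 768-dimensional token, so a 224×224 image yields a sequence of 14×14 = 196 tokens, to which the class token and positional embeddings are then added.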
The author discusses Turing-completeness to argue that the flexibility of the attention mechanism allows Transformers to serve as a general-purpose computational primitive capable of expressing any desired algorithm.
The thesis highlights that ViTs show higher robustness against certain adversarial attacks than CNNs, likely because ViTs are more receptive to low-frequency features, whereas CNNs rely more heavily on high-frequency information.
The author explores a novel visualization approach using binary analysis to interpret the internal structure of models like GPT-2, suggesting that such representations could offer new insights into how these architectures embody themselves in high-dimensional space.

