Master's Thesis, 2014
101 pages, Grade: First
1 Introduction
1.1 Computation in Electromagnetism
1.1.1 Maxwell’s Equations
1.1.2 Finite-Difference Time-Domain (FDTD)
1.2 Computational Parallelization Techniques & GPGPU
1.2.1 Parallel Computer Architecture
1.2.2 Parallel Algorithms & Programs
1.2.3 Emerging Parallelization Techniques: GPGPU
1.3 The Problem and The Objective
1.4 Thesis Overview
1.5 Original Contribution
2 Electromagnetism & Finite-Difference Time-Domain - Overview
2.1 Maxwell’s Equations
2.2 Finite-Difference Time-Domain (FDTD)
2.2.1 Frequency Dependent Material Parameters & Frequency Dependent FDTD
2.2.2 Boundary Condition
2.3 Summary of Maxwell’s Equations and FDTD Method
2.4 Computer Implementation of FDTD Method
2.4.1 Basics of FORTRAN 90 Programming
2.4.2 Implementation of FDTD Method
2.5 Advantages and Limitations of FDTD Computation
2.6 Concluding Remarks
3 Computation of FDTD on GPGPU using CUDA Programming
3.1 GPGPU - The Parallelization Monster and Computation Techniques
3.2 CUDA and CUDA Fortran
3.3 CUDA Implementation of FDTD Method for GPGPU Computation
3.4 Computation on Nvidia’s General Purpose GPU
3.4.1 GPU Hardware and support for FDTD Computation
3.4.2 Memory Coalescing
3.5 Execution of FDTD Method on GPU Hardware
3.6 Concluding Remarks
4 The Solution to The Problem
4.1 The Problem - Revisited
4.2 The Solution
4.3 Programmatic Implementation of the Solution
4.3.1 Implementation
4.3.2 Invoking Buffer Kernel
4.4 Possible Limitations and their Solutions
4.5 Concluding Remarks
5 Evaluation and Validation of The Solution
5.1 Testing of the Implemented Solution
5.1.1 Input Parameters for FDTD Computation
5.1.2 Hardware Environment
5.1.3 Test Results
5.2 Critical Analysis & Evaluation of Test Results
5.2.1 Speed-Up Analysis
5.2.2 Evaluation and Comments
6 Conclusion and Future Scope
6.1 Future Scope
6.2 Conclusion
A Survey Questions posted
A.1 Survey Questions posted via email and Researchgate OSN platform
The primary objective of this thesis is to address the performance bottleneck in FDTD computations on GPGPU architectures, which is caused by the high latency of data transfers between the host CPU and the device GPU. By implementing a software-based buffer mechanism, the work aims to optimize memory access patterns and improve the efficiency of data input/output (I/O) during the simulation process.
4.2 The Solution
The approach proposed in this thesis optimizes memory access between the GPU and main memory. This is achieved by introducing the concept of a 'buffer' into the FDTD computation.
In computing, a data buffer [44] is a region of physical memory used to temporarily hold data while it is moved from one place to another. In high-performance computing, when one process produces data that another process must consume, a buffer is commonly placed between the sender and the receiver to hold the data in transit. Buffers are widely used in the input/output (I/O) paths of hardware devices and typically release their contents in first-in, first-out (FIFO) order. A related technique for holding data for later use is the 'cache'. The essential difference is that a cache retains data that is likely to be accessed again in the near future, whereas a buffer holds data only temporarily while it is being transferred or communicated.
Since FDTD computation on a general-purpose GPU can be viewed as a continuous sender-receiver exchange, this simple solution of introducing a buffer for the data input/output (I/O) of the FDTD computation improves performance and optimizes memory access by compensating for the latency gap between host memory and the GPU. When a program runs on the CPU (host) and a GPU kernel is launched to compute on the GPU, the CPU acts as the sender and the GPU as the receiver. In the same way, when the GPU completes its computation and sends the computed data (output) back to the CPU for further analysis or other processing, the GPU acts as the sender and the CPU as the receiver.
1 Introduction: Provides an overview of electromagnetic computation, the necessity for FDTD, and the role of GPGPU parallelization in addressing modern computational demands.
2 Electromagnetism & Finite-Difference Time-Domain - Overview: Details the theoretical basis of Maxwell’s Equations and the FDTD method, including discretization and implementation strategies.
3 Computation of FDTD on GPGPU using CUDA Programming: Explores the hardware architecture of NVIDIA GPUs and the use of CUDA kernels to parallelize FDTD simulations.
4 The Solution to The Problem: Introduces the buffer-based memory optimization strategy to mitigate latency issues between the host CPU and GPU.
5 Evaluation and Validation of The Solution: Presents the experimental setup and results, analyzing the speed-up achieved by the buffer technique.
6 Conclusion and Future Scope: Summarizes the thesis findings and suggests potential future improvements to the implemented methodology.
data I/O, buffer, Finite difference methods, FDTD, time domain analysis, hardware, acceleration, high performance computing, parallel programming, parallel architectures, GPU, graphics processing unit, parallel computing, CUDA, multi-core computing
The work focuses on improving the efficiency of data input/output (I/O) operations for FDTD simulations running on GPGPU systems by addressing the high latency gap between the CPU and the GPU.
The key themes include parallel computing architectures, the Finite-Difference Time-Domain method, GPU memory management, and the implementation of software buffers to optimize data transfers.
The objective is to develop and implement a buffer-based technique to minimize performance bottlenecks caused by excessive data movement between the host and the GPU during FDTD computations.
The author uses Fortran and CUDA Fortran to implement the FDTD algorithm and the proposed buffer kernels, running tests on a system equipped with an NVIDIA Tesla K20Xm GPU.
The main body examines the theoretical background of FDTD, explains the GPGPU parallelization model (CUDA), details the design of the buffer-based solution, and validates its performance through experimental benchmarks.
The study is characterized by keywords such as GPGPU, FDTD, CUDA, data I/O, parallel computing, memory latency, and performance acceleration.
The buffer reduces the need for constant, high-latency memory transfers between the GPU and the main memory by temporarily storing intermediate field data on the GPU device itself.
The thesis discusses potential issues such as buffer overflow and under-run, explaining that the software-implemented approach in this study manages these through dynamic sizing.
The evaluation demonstrates that the implemented buffer technique results in a speed-up (approximately 1.03x to 1.09x) compared to versions without the buffer, though performance may vary for random excitation points.

