An implementation of a numerical advection equation solver on modern graphics cards using compute unified device architecture
dc.contributor.author | Dang, Wei | |
dc.date.accessioned | 2022-03-16T20:31:56Z | |
dc.date.available | 2022-03-16T20:31:56Z | |
dc.date.issued | 2010-12 | |
dc.identifier.uri | http://hdl.handle.net/11122/12771 | |
dc.description | Thesis (M.S.) University of Alaska Fairbanks, 2010 | en_US |
dc.description.abstract | "In the past decade, the Graphics Processing Unit (GPU) is reported to have become a powerful general-purpose computation platform for various application areas. The Arctic Region Supercomputing Center (ARSC) intends to assess the capability of this emerging computing tool so that they may enlist it as component of supercomputing systems, but at a lower cost. This thesis reports on parallelization, on both GPU and CPU, of a numerical algorithm named the Total Variation Diminishing (TVD) scheme, which is used in the Eulerian Polar Parallel Ionospheric Model (EPPIM) developed at UAF's Geophysical Institute (GI) and ARSC. The GPU (single NVIDIA Tesla® C2050) and CPU (dual Intel Xeon x5560) implementations were parallelized using the Compute Unified Device Architecture (CUDA) language and OpenMP with the C language respectively. A speedup of up to 175x was observed when comparing the CUDA/GPU implementation to the non-parallelized CPU version, and of almost 40x when comparing to the parallelized CPU version. Results also demonstrated an average floating-point-operation rate of 107 GFLOPs, 351 times more than that the CPU version can offer. However, there is still space for improvement as only one tenth of the peak theoretical performance of the C2050 was achieved"--Leaf iii. | en_US |
dc.description.tableofcontents | 1. Introduction -- 1.1. Motivation -- 1.2. Similar work -- 1.3. Contribution -- 1.4. Thesis outline -- 2. Background -- 2.1. Evolution of GPU computing -- 2.2. Compute Unified Device Architecture -- 2.2.1. Hardware architecture -- 2.2.2. Software architecture -- 2.2.3. Terminology -- 2.2.4. Compilation workflow -- 2.2.5. CUDA memory model -- 2.2.6. Programming methodology -- 2.2.7. Performance considerations for scientific computing -- 2.3. Mathematical background -- 2.3.1. Continuity equation -- 2.3.2. Numerical schemes -- 2.3.3. The corner transport upwind scheme -- 2.3.4. The Lax-Wendroff scheme -- 2.3.5. The TVD scheme -- 3. Algorithms -- 3.1. Introduction -- 3.2. The serial algorithm -- 3.3. The parallel algorithms -- 4. Performance test and analysis -- 4.1. Hardware configuration -- 4.2. Methodology -- 4.2.1. Testing approach -- 4.2.2. Testing environment -- 4.2.3. Validation -- 4.3. Results and analysis -- 4.3.1. Serial implementation -- 4.3.2. The single-kernel parallel implementation -- 4.3.3. The multi-kernel parallel implementation -- 5. Conclusions and future work -- 5.1. Conclusions -- 5.2. Future work -- References -- Appendix. | en_US |
dc.language.iso | en_US | en_US |
dc.subject | Graphics processing units | en_US |
dc.subject | Computer graphics | en_US |
dc.title | An implementation of a numerical advection equation solver on modern graphics cards using compute unified device architecture | en_US |
dc.type | Thesis | en_US |
dc.type.degree | ms | en_US |
dc.identifier.department | Department of Electrical and Computer Engineering | en_US |
refterms.dateFOA | 2022-03-16T20:31:56Z |