dc.contributor.author: Dang, Wei
dc.date.accessioned: 2022-03-16T20:31:56Z
dc.date.available: 2022-03-16T20:31:56Z
dc.date.issued: 2010-12
dc.identifier.uri: http://hdl.handle.net/11122/12771
dc.description: Thesis (M.S.) University of Alaska Fairbanks, 2010 [en_US]
dc.description.abstract: "In the past decade, the Graphics Processing Unit (GPU) is reported to have become a powerful general-purpose computation platform for various application areas. The Arctic Region Supercomputing Center (ARSC) intends to assess the capability of this emerging computing tool so that they may enlist it as component of supercomputing systems, but at a lower cost. This thesis reports on parallelization, on both GPU and CPU, of a numerical algorithm named the Total Variation Diminishing (TVD) scheme, which is used in the Eulerian Polar Parallel Ionospheric Model (EPPIM) developed at UAF's Geophysical Institute (GI) and ARSC. The GPU (single NVIDIA Tesla® C2050) and CPU (dual Intel Xeon x5560) implementations were parallelized using the Compute Unified Device Architecture (CUDA) language and OpenMP with the C language respectively. A speedup of up to 175x was observed when comparing the CUDA/GPU implementation to the non-parallelized CPU version, and of almost 40x when comparing to the parallelized CPU version. Results also demonstrated an average floating-point-operation rate of 107 GFLOPs, 351 times more than that the CPU version can offer. However, there is still space for improvement as only one tenth of the peak theoretical performance of the C2050 was achieved"--Leaf iii. [en_US]
dc.description.tableofcontents: 1. Introduction -- 1.1. Motivation -- 1.2. Similar work -- 1.3. Contribution -- 1.4. Thesis outline -- 2. Background -- 2.1. Evolution of GPU computing -- 2.2. Compute Unified Device Architecture -- 2.2.1. Hardware architecture -- 2.2.2. Software architecture -- 2.2.3. Terminology -- 2.2.4. Compilation workflow -- 2.2.5. CUDA memory model -- 2.2.6. Programming methodology -- 2.2.7. Performance considerations for scientific computing -- 2.3. Mathematical background -- 2.3.1. Continuity equation -- 2.3.2. Numerical schemes -- 2.3.3. The corner transport upwind scheme -- 2.3.4. The Lax-Wendroff scheme -- 2.3.5. The TVD scheme -- 3. Algorithms -- 3.1. Introduction -- 3.2. The serial algorithm -- 3.3. The parallel algorithms -- 4. Performance test and analysis -- 4.1. Hardware configuration -- 4.2. Methodology -- 4.2.1. Testing approach -- 4.2.2. Testing environment -- 4.2.3. Validation -- 4.3. Results and analysis -- 4.3.1. Serial implementation -- 4.3.2. The single-kernel parallel implementation -- 4.3.3. The multi-kernel parallel implementation -- 5. Conclusions and future work -- 5.1. Conclusions -- 5.2. Future work -- References -- Appendix. [en_US]
dc.language.iso: en_US [en_US]
dc.subject: Graphics processing units [en_US]
dc.subject: Computer graphics [en_US]
dc.title: An implementation of a numerical advection equation solver on modern graphics cards using compute unified device architecture [en_US]
dc.type: Thesis [en_US]
dc.type.degree: ms [en_US]
dc.identifier.department: Department of Electrical and Computer Engineering [en_US]
refterms.dateFOA: 2022-03-16T20:31:56Z


Files in this item

Name: Dang_W_2010.pdf
Size: 11.38Mb
Format: PDF
