Tips for #GPU Performance! Maximizing Unified Memory Performance in NVIDIA #CUDA
#HPC #DataScience http://bit.ly/2zoHdck
ThunderSVM: A Fast SVM Library on GPUs and CPUs
(Zeyi Wen, Jiashuai Shi, Bingsheng He, Qinbin Li, Jian Chen)
#GPU #CUDA #OpenMP #MachineLearning #ML #DataMining #SVM #Package
Support Vector Machines (SVMs) are classic supervised learning models for classification, regression and distribution estimation. A survey conducted by Kaggle in 2017 shows that 26% of data mining and machine learning practitioners use SVMs. However, SVM training and prediction are computationally very expensive for large and complex problems. This paper presents an efficient, open-source SVM software toolkit called ThunderSVM, which exploits the high performance of Graphics Processing Units (GPUs) and multi-core CPUs. ThunderSVM supports all the functionalities of LibSVM (including classification (SVC), regression (SVR) and one-class SVMs) and uses identical command-line options, so existing LibSVM users can easily adopt the toolkit. ThunderSVM can be used through multiple language interfaces, including C/C++, Python, R and MATLAB. Experimental results show that ThunderSVM is generally an order of magnitude faster than LibSVM while producing identical SVMs. Beyond its high efficiency, the convex optimization solver is designed generically, so that SVC, SVR and one-class SVMs share the same solver for ease of maintenance. Documentation, examples, and more about ThunderSVM are available.
https://hgpu.org/?p=17912
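The abstract above highlights that ThunderSVM exposes a scikit-learn-style Python interface and mirrors LibSVM's options. As a hedged illustration of that fit/predict workflow (using scikit-learn's `SVC`, which itself wraps LibSVM — the baseline ThunderSVM is compared against — since ThunderSVM may not be installed in every environment):

```python
# Sketch of the LibSVM-style workflow that ThunderSVM accelerates.
# sklearn.svm.SVC wraps LibSVM; ThunderSVM's Python class of the same
# name follows the same fit/predict pattern (check the project docs).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Small synthetic binary classification problem.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = SVC(kernel="rbf", C=1.0)  # corresponds to LibSVM options -t 2 -c 1
clf.fit(X_train, y_train)
acc = clf.score(X_test, y_test)
print(f"test accuracy: {acc:.2f}")
```

On large datasets the paper reports roughly an order-of-magnitude speed-up from swapping the LibSVM backend for ThunderSVM's GPU solver, with the calling code unchanged.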
GPU Acceleration of a High-Order Discontinuous Galerkin Incompressible Flow Solver
(Ali Karakus, Noel Chalmers, Kasia Swirydowicz, Timothy Warburton)
#GPU #CUDA #CFD #FluidDynamics #NSE
We present a GPU-accelerated version of a high-order discontinuous Galerkin discretization of the unsteady incompressible Navier-Stokes equations. The equations are discretized in time using a semi-implicit scheme with explicit treatment of the nonlinear term and implicit treatment of the split Stokes operators. The pressure system is solved with a conjugate gradient method together with a fully GPU-accelerated multigrid preconditioner designed to minimize memory requirements and increase overall performance. A semi-Lagrangian subcycling advection algorithm shifts the computational load per timestep away from the pressure Poisson solve by allowing larger timestep sizes in exchange for an increased number of advection steps. Numerical results confirm that the design order of accuracy is achieved in both time and space. The most time-consuming kernels are optimized by tuning fine-grain parallelism, improving memory utilization, and maximizing bandwidth. To assess overall performance, we present an empirically calibrated roofline performance model for a target GPU to explain the achieved efficiency. We demonstrate that, in most cases, the kernels used in the solver are close to their empirically predicted roofline performance.
https://hgpu.org/?p=17910
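The roofline model mentioned in the abstract bounds a kernel's attainable throughput by the smaller of the device's peak compute rate and its memory bandwidth multiplied by the kernel's arithmetic intensity (FLOPs per byte moved). A minimal sketch, with illustrative numbers for a hypothetical GPU (not measurements from the paper):

```python
def roofline(peak_gflops, peak_bw_gb_s, arithmetic_intensity):
    """Attainable GFLOP/s for a kernel under the roofline model.

    arithmetic_intensity: FLOPs performed per byte of memory traffic.
    """
    return min(peak_gflops, peak_bw_gb_s * arithmetic_intensity)

# Illustrative device numbers (hypothetical, for demonstration only):
peak_gflops = 7000.0  # peak floating-point rate, GFLOP/s
peak_bw = 900.0       # peak memory bandwidth, GB/s

# A low-intensity kernel is bandwidth-bound; a high-intensity one hits
# the compute ceiling.
low = roofline(peak_gflops, peak_bw, 0.5)    # 900 * 0.5 = 450 GFLOP/s
high = roofline(peak_gflops, peak_bw, 10.0)  # capped at 7000 GFLOP/s
print(low, high)
```

The "empirically calibrated" variant in the paper replaces the vendor peak numbers with measured ceilings, which tightens the bound each kernel is compared against.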
Analysing the Performance of GPU Hash Tables for State Space Exploration
(Nathan Cassee, Anton Wijs)
#GPU #CUDA #Hash #DSE #Performance
In the past few years, General-Purpose Graphics Processing Units (GPUs) have been used to significantly speed up numerous applications. One area in which GPUs have recently led to a significant speed-up is model checking. In model checking, state spaces, i.e., large directed graphs, are explored to verify whether models satisfy desirable properties. GPUexplore is a GPU-based model checker that uses a hash table to efficiently keep track of already explored states. As a large number of states is discovered and stored during such an exploration, the hash table must be able to handle many concurrent inserts and queries quickly. In this paper, we experimentally compare two different hash tables optimised for the GPU: the GPUexplore hash table and one based on Cuckoo hashing. We compare the performance of both hash tables on random data and on non-random data obtained from model checking experiments, to analyse their applicability to state space exploration. We conclude that Cuckoo hashing is three times faster than GPUexplore hashing for random data, and five to nine times faster for non-random data. This suggests great potential to further speed up GPUexplore in the near future.
https://hgpu.org/?p=17909
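Cuckoo hashing, the alternative evaluated above, resolves collisions by giving each key two candidate slots (one per table) and evicting an occupant to its other slot when both are taken. A minimal serial sketch of the insertion/eviction logic (the GPU implementations in the paper additionally handle concurrent access, which this sketch does not attempt):

```python
class CuckooHash:
    """Minimal two-table cuckoo hash set (serial sketch only)."""

    def __init__(self, size=1024, max_kicks=32):
        self.size = size
        self.max_kicks = max_kicks  # eviction-chain limit before giving up
        self.tables = [[None] * size, [None] * size]

    def _slots(self, key):
        # Two independent hash functions; salted tuples as a simple stand-in.
        return (hash((0, key)) % self.size, hash((1, key)) % self.size)

    def insert(self, key):
        t = 0
        for _ in range(self.max_kicks):
            i = self._slots(key)[t]
            if self.tables[t][i] is None or self.tables[t][i] == key:
                self.tables[t][i] = key
                return True
            # Slot occupied: evict its occupant, place the new key, and try
            # to re-insert the evicted key into the other table next round.
            key, self.tables[t][i] = self.tables[t][i], key
            t = 1 - t
        return False  # eviction cycle; a full implementation would rehash

    def contains(self, key):
        i0, i1 = self._slots(key)
        return self.tables[0][i0] == key or self.tables[1][i1] == key
```

Lookups touch at most two fixed slots, which is what makes cuckoo hashing attractive for the query-heavy access pattern of state space exploration.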
Baidu Apollo 2.0 Continues to Evolve with Neousys Technology's Nuvo-6108GC
This year, the Chinese IT heavyweight #Baidu will launch the long-awaited #Apollo2.0 project, a comprehensive leap for China's #autonomous driving technologies. More than 200 partners support the project. The recommended reference hardware is based on the #Nuvo6108GC computer by #Neousys Technology (Taiwan), which uses Nvidia's #GTX1080Ti graphics card and a 6th-generation Intel processor. For communication with #invehicle CAN networks, the computer provides PCI card slots; ESD's CAN-PCIe/402 interface boards are recommended for Baidu Apollo development partners.
▶ Find out what Neousys GC series can do for you:
https://www.neousys-tech.com/en/discover/gpu-embedded-computing/?utm_source=Googleplus&utm_medium=social&utm_campaign=BaiduApollo2
▶ Learn more about Neousys Nuvo-6108GC #GPU Computer:
https://www.neousys-tech.com/en/product/application/rugged-embedded/nuvo-6108gc-gpu-computing/?utm_source=Googleplus&utm_medium=social&utm_campaign=BaiduApollo2
#CES #Autonomous #Autocar #Automation