PyTorch: An open source deep learning platform that provides a seamless path from research prototyping to production deployment.
#openSource #github #pytorch #neuralNetwork #autograd #gpu #numpy #deepLearning #tensor #python
BigDL: Distributed Deep Learning Library for Apache Spark
#DeepLearning #Library #ApacheSpark #github #intelAnalytics #BigDL #bigData #hadoop #python #scala #keras #ai
Caffe2: A New Lightweight, Modular, and Scalable Deep Learning Framework
#caffe2 #DeepLearning #Framework #machineLearning #ai #artificialintelligence #neuralNetworks #fbOpenSource #facebookarchive #pytorch #Tensor
Analyzing GPU Tensor Core Potential for Fast Reductions
(Roberto Carrasco, Raimundo Vega, Cristóbal A. Navarro)
#CUDA #DeepLearning #DL #Performance
The Nvidia GPU architecture has introduced new computing elements such as tensor cores, which are special processing units dedicated to performing fast matrix-multiply-accumulate (MMA) operations and accelerating Deep Learning applications. In this work we present the idea of using tensor cores for a different purpose, the parallel arithmetic reduction problem, propose a new GPU tensor-core based algorithm, and analyze its potential performance benefits in comparison to a traditional GPU-based one. The proposed method encodes the reduction of n numbers as a set of m×m MMA tensor-core operations (for Nvidia’s Volta architecture m=16) and takes advantage of the fact that each MMA operation takes just one GPU cycle. When analyzing the cost under a simplified GPU computing model, the result is that the new algorithm manages to reduce a problem of n numbers in T(n) = 5 log_{m^2}(n) steps with a speedup of S = (4/5) log_2(m^2).
https://hgpu.org/?p=18796
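To get a feel for the numbers in the abstract above, here is a minimal Python sketch that only evaluates the two quoted formulas. The function names are hypothetical, and the baseline cost of 4 log_2(n) steps is not stated in the abstract: it is inferred by multiplying S by T(n), so treat it as an assumption about the simplified model rather than the authors' exact accounting.

```python
import math


def tensor_core_steps(n: int, m: int = 16) -> float:
    """Modelled cost T(n) = 5 * log_{m^2}(n), as quoted in the abstract."""
    return 5 * math.log2(n) / math.log2(m * m)


def modelled_speedup(m: int = 16) -> float:
    """Modelled speedup S = (4/5) * log_2(m^2); note it does not depend on n."""
    return 0.8 * math.log2(m * m)


def implied_baseline_steps(n: int) -> float:
    """Baseline cost implied by S * T(n) = 4 * log_2(n).

    Assumption: derived from the two quoted formulas, not stated in the abstract.
    """
    return 4 * math.log2(n)


if __name__ == "__main__":
    n = 2 ** 24  # about 16.8 million numbers to reduce
    print(tensor_core_steps(n))       # 15.0 modelled steps for m = 16
    print(implied_baseline_steps(n))  # 96.0 modelled steps
    print(modelled_speedup())         # 6.4
```

For m=16 the model predicts a constant speedup of 6.4, since log_2(256) = 8. These are step counts under the paper's simplified model, not measured runtimes.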
The “Demystify Deep Learning for Executives” video part 2, “The ROI of Deep Learning,” is released. #deeplearning #video #artificialintelligence Enjoy. https://youtu.be/iJed_4h_XDA
TensorFlow Doing HPC
(Steven W. D. Chien, Stefano Markidis, Vyacheslav Olshevsky, Yaroslav Bulatov, Erwin Laure, Jeffrey S. Vetter)
#CUDA #TensorFlow #HPC #DeepLearning #DL #Package
TensorFlow is a popular emerging open-source programming framework supporting the execution of distributed applications on heterogeneous hardware. While TensorFlow was initially designed for developing Machine Learning (ML) applications, it aims to support a much broader range of applications outside the ML domain, possibly including HPC applications. However, very few experiments have been conducted to evaluate TensorFlow performance when running HPC workloads on supercomputers. This work addresses this gap by designing four traditional HPC benchmark applications: STREAM, matrix-matrix multiply, Conjugate Gradient (CG) solver and Fast Fourier Transform (FFT). We analyze their performance on two supercomputers with accelerators and evaluate the potential of TensorFlow for developing HPC applications. Our tests show that TensorFlow can fully take advantage of high performance networks and accelerators on supercomputers. Running our TensorFlow STREAM benchmark, we obtain over 50% of theoretical communication bandwidth on our testing platform. We find an approximately 2x, 1.7x and 1.8x performance improvement when increasing the number of GPUs from two to four in the matrix-matrix multiply, CG and FFT applications respectively. All our performance results demonstrate that TensorFlow has high potential to also emerge as an HPC programming framework for heterogeneous supercomputers.
https://hgpu.org/?p=18795
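The scaling figures quoted in the abstract above are easier to compare as parallel efficiency, meaning the measured speedup divided by the ideal speedup for the added hardware. The sketch below is only that arithmetic. The function name is hypothetical, and the inputs are the approximate 2x, 1.7x and 1.8x values from the abstract, so the results are approximate as well.

```python
def scaling_efficiency(speedup: float, gpus_before: int, gpus_after: int) -> float:
    """Measured speedup divided by the ideal speedup (gpus_after / gpus_before)."""
    ideal = gpus_after / gpus_before
    return speedup / ideal


# Approximate speedups reported in the abstract when going from two to four GPUs.
reported = {"matrix-matrix multiply": 2.0, "CG": 1.7, "FFT": 1.8}

for name, speedup in reported.items():
    efficiency = scaling_efficiency(speedup, gpus_before=2, gpus_after=4)
    print(f"{name}: {efficiency:.0%}")
# matrix-matrix multiply: 100%
# CG: 85%
# FFT: 90%
```

Doubling the GPU count gives an ideal speedup of 2x, so matrix-matrix multiply scales roughly linearly in this range, while CG and FFT lose roughly 15% and 10% to overheads.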