
The Stanford Computer Vision Lab has added an Nvidia DGX-1 machine to its compute cluster. Currently, the machine is only accessible from within the Stanford network. If you have any questions regarding its use, please file a help request at http://support.cs.stanford.edu.
Specification
| Hostname | visionlab-dgx1.stanford.edu |
|---|---|
| CPU | 2x Intel Xeon E5-2698 v4 (20 cores @ 2.2 GHz) |
| RAM | 512GB |
| GPU | 8x Tesla P100 |
| Networking | 10GbE |
| Storage | 4x 2TB SSD in RAID 0, plus NFS-shared storage |
Getting started
- Request access to visionlab-dgx1.stanford.edu by filing a support request at https://support.cs.stanford.edu. Please state which sponsoring faculty member you are working with.
- SSH into visionlab-dgx1.stanford.edu from the campus network, or from off campus via the Stanford VPN service (a full, non-split tunnel is required).
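Once access is granted, log in over SSH; `<sunetid>` below is a placeholder for your own Stanford username:

```shell
# From the campus network (or over the Stanford VPN), connect with:
ssh <sunetid>@visionlab-dgx1.stanford.edu
```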
Nvidia-Docker
Nvidia suggests using Nvidia-Docker and their provided containers for optimized performance and convenience. The following table lists the containers they officially support that are available on the DGX as of this writing.
| REPOSITORY | TAG |
|---|---|
| nvidia/cuda | latest |
| nvcr.io/nvidia/digits | 17.04 |
| nvcr.io/nvidia/caffe | 17.04 |
| nvcr.io/nvidia/tensorflow | 17.04 |
| nvcr.io/nvidia/pytorch | 17.04 |
| nvcr.io/nvidia/caffe2 | 17.04 |
| nvcr.io/nvidia/theano | 17.04 |
| nvcr.io/nvidia/mxnet | 17.04 |
| nvcr.io/nvidia/cntk | 17.04 |
| nvcr.io/nvidia/torch | 17.04 |
```
# Check the currently loaded containers; the Nvidia containers should already
# be loaded. Note the TAG column: you'll need it when running the docker command.
docker images
REPOSITORY                  TAG     IMAGE ID      CREATED       SIZE
nvidia/cuda                 latest  569f547756e0  8 days ago    1.671 GB
nvcr.io/nvidia/digits       17.04   3736f3fe071f  4 weeks ago   4.171 GB
nvcr.io/nvidia/caffe        17.04   87c288427f2d  4 weeks ago   2.794 GB
nvcr.io/nvidia/tensorflow   17.04   121558cb5849  6 weeks ago   3.028 GB
nvcr.io/nvidia/pytorch      17.04   2f0834174e65  6 weeks ago   3.793 GB
nvcr.io/nvidia/caffe2       17.04   e5b67a4f6726  6 weeks ago   2.633 GB
nvcr.io/nvidia/theano       17.04   24943feafc9b  6 weeks ago   2.386 GB
nvcr.io/nvidia/mxnet        17.04   24afec0cd359  7 weeks ago   2.338 GB
nvcr.io/nvidia/cntk         17.04   61e61de9fa43  7 weeks ago   5.741 GB
nvcr.io/nvidia/torch        17.04   a337ffb42c8e  7 weeks ago   2.9 GB
nvidia/cuda                 7.5     cf43500d0050  5 months ago  1.232 GB

# If you want, you can load your own container
docker load --input /raid/scratch/u/<framework>.tar
```
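The tar file consumed by `docker load` can be produced on another machine with `docker save`; the image name below is an illustrative placeholder, and the destination directory mirrors the load example above.

```shell
# On the machine holding your custom image: export it to a tarball, then
# copy it to the DGX. "my-framework:latest" is an example image name.
docker save --output my-framework.tar my-framework:latest
scp my-framework.tar visionlab-dgx1.stanford.edu:/raid/scratch/u/
```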
```
# Test nvidia-smi
jimmyw@visionlab-dgx1:~$ nvidia-docker run --rm nvidia/cuda nvidia-smi
Wed May 24 23:59:21 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.66                 Driver Version: 375.66                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-SXM2...  Off  | 0000:06:00.0     Off |                    0 |
| N/A   35C    P0    31W / 300W |      0MiB / 16276MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P100-SXM2...  Off  | 0000:07:00.0     Off |                    0 |
| N/A   37C    P0    34W / 300W |      0MiB / 16276MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla P100-SXM2...  Off  | 0000:0A:00.0     Off |                    0 |
| N/A   36C    P0    35W / 300W |      0MiB / 16276MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla P100-SXM2...  Off  | 0000:0B:00.0     Off |                    0 |
| N/A   37C    P0    33W / 300W |      0MiB / 16276MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   4  Tesla P100-SXM2...  Off  | 0000:85:00.0     Off |                    0 |
| N/A   38C    P0    32W / 300W |      0MiB / 16276MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   5  Tesla P100-SXM2...  Off  | 0000:86:00.0     Off |                    0 |
| N/A   35C    P0    32W / 300W |      0MiB / 16276MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   6  Tesla P100-SXM2...  Off  | 0000:89:00.0     Off |                    0 |
| N/A   37C    P0    33W / 300W |      0MiB / 16276MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   7  Tesla P100-SXM2...  Off  | 0000:8A:00.0     Off |                    0 |
| N/A   38C    P0    32W / 300W |      0MiB / 16276MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```
```
# Launch a framework container in interactive mode. Note the tag: the image is
# always referenced as REPOSITORY:TAG when the tag isn't "latest".
jimmyw@visionlab-dgx1:~$ nvidia-docker run --rm -ti nvcr.io/nvidia/torch:17.04

  ______             __   |  Torch7
 /_  __/__  ________/ /   |  Scientific computing for Lua.
  / / / _ \/ __/ __/ _ \  |
 /_/  \___/_/  \__/_//_/  |  https://github.com/torch
                          |  http://torch.ch

NVIDIA Release 17.04 (build 17724)

Container image Copyright (c) 2017, NVIDIA CORPORATION. All rights reserved.
Copyright (c) 2016, Soumith Chintala, Ronan Collobert, Koray Kavukcuoglu, Clement Farabet
All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION. All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying project or file.

root@0a24ff58cde5:/workspace# th

  ______             __   |  Torch7
 /_  __/__  ________/ /   |  Scientific computing for Lua.
  / / / _ \/ __/ __/ _ \  |  Type ? for help
 /_/  \___/_/  \__/_//_/  |  https://github.com/torch
                          |  http://torch.ch

th>
```
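Because `--rm` discards the container's filesystem on exit, keep datasets and results on the host and bind-mount them into the container. The host path below is an assumed example; substitute your own scratch directory:

```shell
# Mount a host directory at /workspace/data inside the container so files
# written there survive the container's removal.
nvidia-docker run --rm -ti \
    -v /raid/scratch/$USER:/workspace/data \
    nvcr.io/nvidia/pytorch:17.04
```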
```
# If the container needs access to the host server's network, just add --net=host
jimmyw@visionlab-dgx1:~$ nvidia-docker run --rm --net=host -ti nvcr.io/nvidia/tensorflow:17.04

================
== TensorFlow ==
================

NVIDIA Release 17.04 (build 21630)

Container image Copyright (c) 2017, NVIDIA CORPORATION. All rights reserved.
Copyright 2017 The TensorFlow Authors. All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION. All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying project or file.

NOTE: The SHMEM allocation limit is set to the default of 64MB. This may be
insufficient for TensorFlow. NVIDIA recommends the use of the following flags:
   nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 ...

root@visionlab-dgx1:/workspace# apt-get update
Get:1 http://archive.ubuntu.com/ubuntu xenial InRelease [247 kB]
Get:2 http://ppa.launchpad.net/openjdk-r/ppa/ubuntu xenial InRelease [17.5 kB]
Get:3 http://ppa.launchpad.net/openjdk-r/ppa/ubuntu xenial/main amd64 Packages [7096 B]
Get:4 http://archive.ubuntu.com/ubuntu xenial-updates InRelease [102 kB]
Get:5 http://archive.ubuntu.com/ubuntu xenial-security InRelease [102 kB]
Get:6 http://archive.ubuntu.com/ubuntu xenial/main Sources [1103 kB]
Get:7 http://archive.ubuntu.com/ubuntu xenial/restricted Sources [5179 B]
Get:8 http://archive.ubuntu.com/ubuntu xenial/universe Sources [9802 kB]
Get:9 http://archive.ubuntu.com/ubuntu xenial/main amd64 Packages [1558 kB]
Get:10 http://archive.ubuntu.com/ubuntu xenial/restricted amd64 Packages [14.1 kB]
Get:11 http://archive.ubuntu.com/ubuntu xenial/universe amd64 Packages [9827 kB]
Get:12 http://archive.ubuntu.com/ubuntu xenial-updates/main Sources [315 kB]
Get:13 http://archive.ubuntu.com/ubuntu xenial-updates/restricted Sources [3202 B]
Get:14 http://archive.ubuntu.com/ubuntu xenial-updates/universe Sources [193 kB]
Get:15 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages [693 kB]
Get:16 http://archive.ubuntu.com/ubuntu xenial-updates/restricted amd64 Packages [13.2 kB]
Get:17 http://archive.ubuntu.com/ubuntu xenial-updates/universe amd64 Packages [593 kB]
Get:18 http://archive.ubuntu.com/ubuntu xenial-security/main Sources [86.8 kB]
Get:19 http://archive.ubuntu.com/ubuntu xenial-security/restricted Sources [2779 B]
Get:20 http://archive.ubuntu.com/ubuntu xenial-security/universe Sources [31.7 kB]
Get:21 http://archive.ubuntu.com/ubuntu xenial-security/main amd64 Packages [334 kB]
Get:22 http://archive.ubuntu.com/ubuntu xenial-security/restricted amd64 Packages [12.8 kB]
Get:23 http://archive.ubuntu.com/ubuntu xenial-security/universe amd64 Packages [142 kB]
Fetched 25.2 MB in 3s (6808 kB/s)
Reading package lists... Done
```
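The TensorFlow container's startup note above recommends raising the shared-memory and ulimit settings. Spelled out as a full invocation (using exactly the flags the container prints), that looks like:

```shell
# Larger /dev/shm plus unlocked memlock/stack limits, as recommended by the
# container's startup banner for TensorFlow workloads.
nvidia-docker run --rm -ti \
    --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 \
    nvcr.io/nvidia/tensorflow:17.04
```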

