Novatech DL-DT3

This 4U system offers increased scalability and a no-compromise approach to performance. NVLink provides up to 5-10x the bandwidth of a traditional PCI-E bus, increasing GPU-to-GPU and GPU-to-CPU communication and eliminating transfer bottlenecks. Greater GPU compute performance via NVLink increases demand on RAM, and inference machines typically benefit from more memory; server-grade ECC memory improves reliability, while redundant PSUs add further resilience. The system scales up to a maximum of 8 GPUs (each V100 offering up to 125 TFLOPS of Tensor Core performance), allowing for real-time inference deployment over large, fast networks. The system incorporates 2x 10GbE ports, improving connectivity to other nodes and to network resources, in particular storage, which can often be a limiting factor. The 4U chassis allows not only more GPUs but also more local storage, and further expansion is possible via RAID controllers, HBAs, or very high-speed NICs (Mellanox, InfiniBand). Based on customer feedback, Novatech recommends giving special consideration to the storage network in order to complete the ideal ecosystem (IBM, Pure Logic).
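As a rough illustration of the headline figures above, the aggregate peak throughput of a fully populated 8-GPU configuration can be tallied from the per-GPU numbers in the specification table below (a back-of-envelope sketch, not a benchmark):

```python
# Back-of-envelope aggregate peak throughput for the 8x V100 SXM2 configuration.
# Per-GPU figures are taken from the GPU specification table.
NUM_GPUS = 8
TENSOR_TFLOPS_PER_GPU = 125.0   # mixed-precision Tensor Core peak
FP32_TFLOPS_PER_GPU = 15.7      # single-precision peak

tensor_total = NUM_GPUS * TENSOR_TFLOPS_PER_GPU  # 1000 TFLOPS, i.e. 1 PFLOPS
fp32_total = NUM_GPUS * FP32_TFLOPS_PER_GPU      # 125.6 TFLOPS

print(f"Tensor peak: {tensor_total:.0f} TFLOPS, FP32 peak: {fp32_total:.1f} TFLOPS")
```

Real-world throughput will of course be lower and depends heavily on the workload, but the arithmetic shows where the "over 1 PFLOPS" class of claim for 8x V100 systems comes from.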

Novatech Deep Learning DL-DT3 Workstation
Available within 10 working days

Specification

GPU

8x NVIDIA Tesla V100 SXM2 (15.7 TFLOPS FP32 / 125 TFLOPS Tensor per GPU)

CPU

2x Intel Xeon E5-2698 v4 2.2GHz 20 Cores

STORAGE

OS: Samsung PM863A 480GB SATA
Data: 4x Samsung PM863A 1.92TB SATA
Fully configurable

Memory

512GB (8x64GB) 2400MHz DDR4 Quad Channel ECC Registered

GPU Specification

GPUs Installed 8x NVIDIA Tesla V100 SXM2 NVLink 2.0
Maximum Number of GPUs 8x
CUDA Cores Per GPU 5120
Peak Tensor Core (mixed-precision FP16) Performance Per GPU 125 TFLOPS
Peak Single Precision FP32 Performance Per GPU 15.7 TFLOPS
Peak Double Precision FP64 Performance Per GPU 7.8 TFLOPS
GPU Memory Per GPU 16 GB HBM2
Memory Interface Per GPU 4096-bit
Memory Bandwidth Per GPU 900 GB/s
System Interface SXM2
Maximum Power Consumption Per GPU 300 W

CPU (Two installed in default configuration)

Description Intel® Xeon® E5-2698V4 Processor
# of Cores 20
# of Threads 40
Processor Base Frequency 2.20 GHz
Max Turbo Frequency 3.60 GHz
Cache 50 MB SmartCache
TDP 135 W

Memory

Description 512GB (8x64GB) 2400MHz DDR4 ECC Registered
Maximum Capacity 2TB

Storage

Drive 1 1x Samsung PM863A 480GB 2.5" SSD 6Gb/s
Drive 2 2x Seagate Enterprise Performance 2TB HDD 2.5" 7200RPM

Chassis

Description 4U Rackmountable
Colour Black
Dimensions 447(W) x 178(H) x 805(D)mm
2.5" Hotswap Drive Bays 16x Hotswap (3x Occupied in base configuration)
5.25" Drive Bays x0
System Cooling Configuration 8x 92mm Cooling Fans

Power Supply

Description 2200W Redundant Power Supplies with PMBus, 80 PLUS Titanium

Motherboard

CPU Intel® Xeon® processor E5-2600 v4 / v3 family (up to 145W TDP)
Dual Socket R3 (LGA 2011)
Chipset Intel® C612
Memory 24 DIMM slots (8 Occupied in base configuration)
Up to 2.0TB RDIMM
Expansion Slots 4 PCI-E 3.0 x16 (low-profile) slots
2 PCI-E 3.0 x8 slots
Storage Controller Intel C612
Speed 6.0 Gb/s
RAID RAID 0/1/10/5
LAN Controller Intel X540
(2) 10GbE ports / (1) Dedicated IPMI LAN Port
I/O Ports USB (2) USB3.0 ports (at rear)
COM (1) Serial header (internal)
VGA (1) D-Sub 15-pin port (at rear)
RJ-45 (2) 10GbE ports, (1) GbE dedicated for IPMI

Additional Network Interface Card

Description Mellanox ConnectX-5 100Gb/s Dual Port
Performance up to 100Gb/s connectivity per port
Physical Ports 2x QSFP28

Hardware Raid Controller

Description 1x LSI 3108 8-port PCI-E SAS-3 controller
Ports 8-port (8 internal) 12Gb/s per port
Supported Raid Levels 0, 1, 5, 6, 10, 50, 60
On-card cache 2GB 1866MHz DDR3
Physical Ports 2x SFF-8643

Operating System

Description Ubuntu 16.04.3 LTS

NVIDIA V100 SXM2

From recognising speech to training virtual personal assistants and teaching autonomous cars to drive, data scientists are taking on increasingly complex challenges with AI. Solving these kinds of problems requires training deep learning models of exponentially growing complexity in a practical amount of time.

With 640 Tensor Cores, Tesla V100 is the world’s first GPU to break the 100 teraflops (TFLOPS) barrier of deep learning performance.

The GPUs connect to the motherboard via an SXM2 socket, which offers between 5 and 10 times the speed of traditional PCI-E 3.0, decreasing latency and increasing bandwidth from GPU to GPU as well as GPU to CPU.

NVLink versus PCI-E

Unleash ultra-fast communication between the GPU and CPU with NVIDIA® NVLink, a high-bandwidth, energy-efficient interconnect that allows data sharing at rates 5 to 10 times faster than the traditional PCIe Gen 3 interconnect. The resulting speed-ups in application performance enable a new breed of high-density, flexible servers for accelerated computing.
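The 5-10x claim can be sanity-checked against commonly quoted peak bandwidth figures. The numbers below are assumed theoretical peaks for illustration; real transfer rates depend on topology and workload:

```python
# Commonly quoted peak bidirectional bandwidths (assumed figures, not measurements).
PCIE3_X16_GBS = 32.0   # PCIe 3.0 x16: roughly 16 GB/s in each direction
NVLINK2_GBS = 300.0    # six NVLink 2.0 links per V100 SXM2, ~25 GB/s each way per link

speedup = NVLINK2_GBS / PCIE3_X16_GBS
print(f"NVLink 2.0 vs PCIe 3.0 x16: ~{speedup:.1f}x")  # falls within the quoted 5-10x range
```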

Ready to go

All Novatech Deep Learning systems can come with the Ubuntu 16.04 Server LTS operating system, and the following additional platforms are available: CUDA, DIGITS, Caffe, Caffe2, CNTK, PyTorch, TensorFlow, Theano, and Torch.
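As a quick sanity check after delivery, a sketch along these lines (import names assumed; adjust to match the frameworks actually requested) reports which of the listed frameworks are importable in the system Python:

```python
import importlib.util

# Assumed top-level import names for a few of the frameworks listed above.
frameworks = ["torch", "tensorflow", "caffe2", "theano"]

# find_spec() locates a module without importing it, so this is safe to run
# even on machines where none of these packages are present.
available = {name: importlib.util.find_spec(name) is not None for name in frameworks}
for name, ok in sorted(available.items()):
    print(f"{name}: {'installed' if ok else 'not found'}")
```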

If you require a framework not listed, simply speak to our team and make them aware of your needs.

Custom Engineering

We are ISO 9001:2008 certified and can manage your design, build, and configuration of compute, network, and storage solutions, specific to your needs and applications.

We have invested heavily in our in-house production facilities to ensure that all of our customers' compliance, documentation, and regulation needs are met.

Request a price

All of our systems are built to order to meet our customers' needs, and as such pricing varies depending on requirements.

Contact our dedicated Deep Learning team today for a tailored quotation.
