Novatech DL-DT2

This 4U system offers increased scalability and a no-compromise approach to performance. NVLink delivers 5 to 10 times the bandwidth of a traditional PCI-E bus, increasing communication speed between GPU and GPU and between GPU and CPU, eliminating transfer bottlenecks and raising GPU compute performance. Faster GPUs place greater demand on system memory, and inference machines typically benefit from more of it; server-grade ECC memory is fitted for reliability, alongside redundant PSUs. The system is scalable up to a maximum of 8 GPUs (each V100 offering up to 125 TFLOPS of deep learning performance), allowing for real-time inference deployment over large, fast networks. The system incorporates 2x 10GbE ports, improving connectivity to multiple nodes and other resources on the network, in particular storage, which can often be a limiting factor. The 4U chassis not only allows for more GPUs but also more local storage, and further expansion is possible to incorporate RAID controllers, HBAs, or very high-speed NICs (Mellanox, InfiniBand). Based on customer feedback, Novatech recommends giving special consideration to the storage network to complete the ideal ecosystem (IBM, Pure Logic).

Novatech Deep Learning DL-DT2 Workstation
Available within 10 working days



4x NVIDIA Tesla V100 SXM2 60 TFLOPS FP32 - Upgradeable to 8 GPUs


2x Intel Xeon E5-2698 v4 2.2GHz, 20 Cores each


OS: 1x Samsung PM863A 480GB 2.5" SATA SSD
Data: 4x Samsung PM863A 1.92TB 2.5" SATA SSD
Fully configurable


512GB 2400MHz Quad Channel (8x 64GB) ECC Registered

GPU Specification

GPUs Installed 4x NVIDIA Tesla V100 SXM2 NVLink 2.0
Maximum Number of GPUs 8x
CUDA Cores Per GPU 5120
Peak Tensor (Deep Learning) Performance Per GPU 125 TFLOPS
Peak Single Precision FP32 Performance Per GPU 15.7 TFLOPS
Peak Double Precision FP64 Performance Per GPU 7.8 TFLOPS
GPU Memory Per GPU 16 GB HBM2
Memory Interface Per GPU 4096-bit
Memory Bandwidth Per GPU 900 GB/s
System Interface SXM2
Maximum Power Consumption Per GPU 300 W
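The peak figures in the table above follow directly from the V100's unit counts and clock speed. The short sketch below reproduces them, assuming the published SXM2 boost clock of 1530 MHz; actual sustained throughput depends on workload and thermals.

```python
# Peak-throughput arithmetic for the Tesla V100 SXM2 spec figures above.
# Assumes the published boost clock of 1530 MHz.
BOOST_CLOCK_HZ = 1530e6

def peak_tflops(units, flops_per_unit_per_clock, clock_hz=BOOST_CLOCK_HZ):
    """Peak throughput in TFLOPS: units x FLOPs/unit/clock x clock."""
    return units * flops_per_unit_per_clock * clock_hz / 1e12

# 5120 CUDA cores, each doing one fused multiply-add (2 FLOPs) per clock.
fp32 = peak_tflops(5120, 2)     # ~15.7 TFLOPS
# FP64 runs at half the FP32 rate on V100.
fp64 = fp32 / 2                 # ~7.8 TFLOPS
# 640 Tensor Cores, each performing a 4x4x4 matrix FMA = 128 FLOPs per clock.
tensor = peak_tflops(640, 128)  # ~125 TFLOPS

print(f"FP32:   {fp32:.1f} TFLOPS")
print(f"FP64:   {fp64:.1f} TFLOPS")
print(f"Tensor: {tensor:.1f} TFLOPS")
```

Multiplying the tensor figure by the four installed GPUs gives the roughly 500 TFLOPS of deep learning performance in the base configuration, doubling again at the full eight-GPU build.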

CPU (Two installed in default configuration)

Description Intel® Xeon® E5-2698V4 Processor
# of Cores 20
# of Threads 40
Processor Base Frequency 2.20 GHz
Max Turbo Frequency 3.60 GHz
Cache 50 MB SmartCache
TDP 135 W


Description 512GB (8x64GB) 2400MHz DDR4 ECC Registered
Maximum Capacity 2TB


Drive 1 1x Samsung PM863A 480GB 2.5" SSD 6Gb/s
Drive 2 2x Seagate Enterprise Performance 2TB HDD 2.5" 7200RPM


Description 4U Rackmountable
Colour Black
Dimensions 447(W) x 178(H) x 805(D)mm
2.5" Hotswap Drive Bays 16x Hotswap (3x Occupied in base configuration)
5.25" Drive Bays x0
System Cooling Configuration 8x 92mm Cooling Fans

Power Supply

Description 2200W Redundant Power Supplies with PMBus, 80 PLUS Titanium


CPU Intel® Xeon® processor E5-2600 v4 / v3 family (up to 145W TDP)
Dual Socket R3 (LGA 2011)
Chipset Intel® C612
Memory 24 DIMM slots (8 Occupied in base configuration)
Up to 2.0TB RDIMM
Expansion Slots 4 PCI-E 3.0 x16 (low-profile) slots
2 PCI-E 3.0 x8 slots
Storage Controller
Intel C612
6.0 Gb/s
RAID 0/1/10/5
LAN Controller

Intel X540
(2) 10GbE ports / (1) Dedicated IPMI LAN Port
I/O Ports USB
(2) USB3.0 ports (at rear)
(1) Serial header (internal)
(1) D-Sub 15-pin port (at rear)
(2) 10GbE ports, (1) GbE dedicated for IPMI

Operating System

Description Ubuntu 16.04.3 LTS


From recognising speech to training virtual personal assistants and teaching autonomous cars to drive, data scientists are taking on increasingly complex challenges with AI. Solving these kinds of problems requires training deep learning models that are exponentially growing in complexity, in a practical amount of time.

With 640 Tensor Cores, Tesla V100 is the world’s first GPU to break the 100 teraflops (TFLOPS) barrier of deep learning performance.

The GPUs are plugged into the motherboard via an SXM2 connection, which offers between 5 and 10 times the speed of traditional PCI-E 3.0, decreasing latency and increasing bandwidth from GPU to GPU as well as from GPU to CPU.

NVLink versus PCI-E

Unleash ultra-fast communication between the GPU and CPU with NVIDIA® NVLink, a high-bandwidth, energy-efficient interconnect that allows data sharing at rates 5 to 10 times faster than the traditional PCIe Gen3 interconnect, resulting in dramatic speed-ups in application performance that create a new breed of high-density, flexible servers for accelerated computing.
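To put that 5-10x figure in concrete terms, the sketch below compares the time to move a model's weights over each interconnect. The bandwidth figures are assumptions: roughly 16 GB/s effective for a PCIe 3.0 x16 link and 150 GB/s per direction for a V100's six NVLink links combined; real throughput depends on topology and workload.

```python
# Rough transfer-time comparison: PCIe 3.0 x16 vs NVLink on V100.
# Bandwidth figures are approximate, per-direction assumptions.
PCIE3_X16_GBPS = 16     # ~effective PCIe 3.0 x16 throughput
NVLINK_V100_GBPS = 150  # six NVLink 2.0 links, per direction

def transfer_ms(size_gb, bandwidth_gbps):
    """Milliseconds to move size_gb at bandwidth_gbps."""
    return size_gb / bandwidth_gbps * 1000

model_gb = 2.0  # e.g. the FP16 weights of a 1-billion-parameter model
pcie_ms = transfer_ms(model_gb, PCIE3_X16_GBPS)
nvlink_ms = transfer_ms(model_gb, NVLINK_V100_GBPS)

print(f"PCIe 3.0 x16: {pcie_ms:.1f} ms, NVLink: {nvlink_ms:.1f} ms "
      f"({pcie_ms / nvlink_ms:.1f}x faster)")
```

Under these assumptions the same transfer takes around 125 ms over PCIe versus around 13 ms over NVLink, in line with the 5-10x range quoted above.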

Ready to go

All Novatech Deep Learning systems can come with Ubuntu 16.04 server LTS operating system, and the following additional platforms are available: CUDA, DIGITS, Caffe, Caffe2, CNTK, Pytorch, Tensorflow, Theano, and Torch.
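A quick way to confirm which of these platforms are present on a delivered system is to probe for their import names, as in this minimal sketch. The module names below are the usual Python import names and are an assumption; they may differ from what is actually preinstalled.

```python
# Minimal sketch: check which deep learning frameworks are importable.
# Names are the conventional import names, not guaranteed package names.
import importlib.util

FRAMEWORKS = ["caffe", "caffe2", "cntk", "torch", "tensorflow", "theano"]

def available(module_name):
    """Return True if module_name can be imported in this environment."""
    return importlib.util.find_spec(module_name) is not None

for name in FRAMEWORKS:
    status = "found" if available(name) else "missing"
    print(f"{name:12s} {status}")
```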

If you require a Framework not listed, simply speak to our team and make them aware of your need.

Custom Engineering

We are ISO 9001:2008 certified and can manage the design, build, and configuration of compute, network, and storage solutions specific to your needs and applications.

We have invested heavily in our in-house production facilities to ensure that all of our customers' compliance, documentation, and regulation needs are met.

Request a price

All of our systems are built to order to meet our customers' needs, and as such pricing varies depending on requirements.

Contact our dedicated Deep Learning team today for a tailored quotation.
