Skip to content

Data Comparison of NVIDIA Tesla P4 and T4

15. October 2018

During NVIDIA GTC 2018 in Europe NVIDIA announced the new Turing T4 Graphics card. On Twitter this card got a lot of love as it looked like a good evolution of the (now) mainly suggested Tesla P4 (you remember last year it didn’t even appear on NVIDIA’s slides – now it’s their primary suggestion – thanks for listening NVIDIA). I saw the first details about the card in John Fannelis Presentation on the first day. They looked really promising. Here is the picture I with the shown card details (sorry for the head in front of it – I didn’t expect I need it for a blog post…):

tesla_t4_data_jf

You can find the P4 data in this PDF.

image

Let us compare the data of both cards that we have until now.

P4

T4

GPU

Pascal

Turing

CUDA Cores

2560

2560

Frame Buffer (Memory)

8 GB

16 GB

vGPU Profiles

1 GB, 2 GB, 4 GB, 8GB

1 GB, 2 GB, 4 GB, 8GB, 16 GB

Form Factor

PCIe 3.0 single slot

PCIe 3.0 single slot

Max Power

75 W

70 W

Thermal

Passive

Passive

As you can see both cards are really similar. The just need a single slot. Thus, you can put up to six (or sometimes eight -yes there are a few servers that support eight(!) cards – just check the HCL) cards in one server and have a limited power consumption. The only difference is that the T4 uses the new Turing chip and has doubled Frame Buffer (16 GB). That means you can run 16 VMs each with a 1 GB Frame Buffer on this card. Although I don’t know if the touring chip offers enough performance for that (as I have no testcard until now @NVIDIA) it might be a good option to have 8 VMs with 2 GB Frame Buffer on one card. That would help in many situations where 1 GB Frame Buffer is not enough. In this situation you could only put 4 VMs on one card.

So, at this point I also thought that the T4 is a good evolution of the P4. The only main point I was missing was a price as this could be an argue against the price. But then I attended another session. Here they also showed some details about the T4. There was one detail I noticed quite late and thus I was too late to take a picture. Luckily my friend Tobias Zurstegen made this picture showing the technical information’s:

image

Here are two details shown I didn’t find somewhere else. First there are the Max Users – wouldn’t be to hard to calculate when you know that the smallest profile is a 1B profile (= 1GB Frame Buffer). But next too that there is the number of H.264 1080p 30 Streams. So, let’s add these points to our list.

P4

T4

GPU

Pascal

Turing

CUDA Cores

2560

2560

Frame Buffer (Memory)

8 GB

16 GB

vGPU Profiles

1 GB, 2 GB, 4 GB, 8GB

1 GB, 2 GB, 4 GB, 8GB, 16 GB

Form Factor

PCIe 3.0 single slot

PCIe 3.0 single slot

Max Power

75 W

70 W

Thermal

Passive

Passive

Max Users (VMs)

8

16

H.264 1080p 30 Streams

24

16

What’s that? The number of H.264 1080p 30 Streams is lower on a T4 (16) compared to a P4 which has 24 Streams. The T4 has 8 (!) Streams less than a P4?!? If you keep in mind that e.g. in a HDX 3D Pro environment each monitor of a user with activity requires one stream that means that with 8 users having a dual monitor all available Streams can already be used. If you put more users on the same card it might happen that this leads to a performance issue for the users as there not enough H.264 Streams available. Unfortunately, I haven’t found anything on NVIDIAs website that proofs that this number of H.264 Streams is correct and was not just a typo in the slide.

But if it’s trues I am wondering why that happened? What did NVIDIA change thus the number of streams went down and not up. I would have expected at least 32 Streams (compared to the P4). If it was a card design change that would be contra productive. Let us hope it was just a Typo on the slide.

If someone found some other official document which (not) proofs this number please let me know.

Should the number be correct I hope NVIDIA listens again and changes this before the card is released. I see this as a big bottleneck for many environments. Especially as many don’t know about the number of streams and then wonder why they have a bad performance as the graphics chip itself is not under heavy load and they don’t know that to many required H.264 streams can also lead to a poor performance. Next to that keep in mind that it’s now also possible to use H.265 in some environments – but using that leads possible to less streams as it’s encoding is more resource intensive.

From → Citrix, HDX 3D Pro, Nvidia

Leave a Comment

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: