Amazon EC2, GetDeviceCount(eGPUType.Cuda) = 0

Sep 4, 2014 at 5:11 PM
Edited Sep 4, 2014 at 5:28 PM
Hi, I am using an Amazon g2.2xlarge EC2 server, which contains an NVIDIA GRID K520 GPU.

I am having problems detecting the GPU in the following scenarios:
  • Via RDP: 0 devices found. This is understandable, because Microsoft disables the GPU in RDP sessions.
  • Via a Windows service that starts automatically: 0 devices found. This is after installing Cudafy.NET.dll to the GAC. This is my ideal setup, but I don't understand why the GPU won't show up.
  • Via the same Windows service, AFTER logging in via TeamViewer: still 0 devices found.
I do have success when I connect via TeamViewer, log in, and run the program manually from the desktop: 1 device is found, and it runs a test kernel without problems.

Why does the GPU not show up when running as a service under the Local System account?

On another note, the GRID K520, which is supposed to be a compute GPU, does not support TCC (Tesla Compute Cluster mode), which would have solved all these issues. WHY oh why does it not support TCC? Why is this stuff still bleeding edge?
Sep 4, 2014 at 5:57 PM
I believe you can't access display devices in session 0, which is where all services run. So you have to trigger an interactive login first, which creates session 1, and then execute your GPU code in session 1.

I've not solved it yet, but this is the general gist of what is described here: https://devtalk.nvidia.com/default/topic/408076/cuda-programming-and-performance/running-cuda-in-a-service-example-of-a-cuda-service-in-vista-server-2008-and-windows-7/
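To confirm the session 0 explanation on a given machine, a process can log which Windows session it is running in before it enumerates devices. The sketch below is Python using ctypes and kernel32's ProcessIdToSessionId; the helper names `current_session_id` and `explain_zero_devices` are hypothetical, and the hint strings only restate the reasoning in this thread, not anything Cudafy or NVIDIA reports.

```python
import ctypes
import os
import sys


def current_session_id():
    """Return the Windows session ID of this process, or None off Windows."""
    if sys.platform != "win32":
        return None
    session_id = ctypes.c_ulong(0)  # DWORD out-parameter
    # BOOL ProcessIdToSessionId(DWORD dwProcessId, DWORD *pSessionId) in kernel32
    ok = ctypes.windll.kernel32.ProcessIdToSessionId(
        ctypes.c_ulong(os.getpid()), ctypes.byref(session_id)
    )
    if not ok:
        raise ctypes.WinError()
    return session_id.value


def explain_zero_devices(session_id):
    """Hypothetical helper: a hint to log when the CUDA device count is 0."""
    if session_id is None:
        return "not running on Windows; session isolation does not apply"
    if session_id == 0:
        # Services run in session 0, which (per this thread) cannot reach
        # WDDM display devices; only TCC-mode devices would be visible.
        return ("running in session 0 (service context): display devices are "
                "not reachable here; only TCC-mode devices would be visible")
    return ("running in an interactive session; check the driver, and note "
            "that RDP sessions also hide the GPU")
```

A service would call `explain_zero_devices(current_session_id())` and write the result to its log whenever the device count comes back as 0.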
Sep 4, 2014 at 7:43 PM
After a long battle with this, my solution was to scrap the service altogether, set up auto-login on the admin account, and create a scheduled task that runs the GPU code at login.
http://techsultan.com/autologin-on-windows-server-2012/
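The auto-login plus scheduled-task setup can be scripted. The sketch below only builds the command lines as strings (it runs nothing), assuming the standard Winlogon registry values AutoAdminLogon, DefaultUserName and DefaultPassword and the built-in `reg` and `schtasks` tools; the function names, task name and paths are hypothetical. The `/it` flag asks Task Scheduler to run the task only while the user is logged on, i.e. inside the interactive session where the GPU is visible.

```python
WINLOGON_KEY = r"HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Winlogon"


def autologon_commands(user, password):
    """Build `reg add` commands that enable Winlogon auto-logon for one account."""
    values = {
        "AutoAdminLogon": "1",
        "DefaultUserName": user,
        "DefaultPassword": password,  # stored in plain text in the registry
    }
    # No escaping is done: a sketch only, values must not contain quotes.
    return [
        f'reg add "{WINLOGON_KEY}" /v {name} /t REG_SZ /d "{data}" /f'
        for name, data in values.items()
    ]


def logon_task_command(task_name, exe_path, user):
    """Build a `schtasks` command that starts exe_path when `user` logs on."""
    # /sc onlogon: trigger at logon; /it: run only in the user's interactive
    # session; /rl highest: elevated run level; /f: overwrite an existing task.
    return (f'schtasks /create /tn "{task_name}" /tr "{exe_path}" '
            f'/sc onlogon /ru {user} /it /rl highest /f')
```

Note that DefaultPassword sits in the registry in plain text, which is part of why this is a workaround rather than a clean solution.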
Sep 6, 2014 at 11:54 AM
Hi
You're right: Windows services can only run in session 0, which by design has no access to display drivers. AFAIK, the only workaround on NVIDIA hardware is a Tesla device with the Tesla Compute Cluster (TCC) driver installed, together with another graphics card installed as the primary display. Or, as you said, force auto-login into an interactive session, but that's not really a good option in a cluster environment.
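To check which driver model each GPU is actually using, nvidia-smi exposes a `driver_model.current` query field on Windows. The sketch below shells out to nvidia-smi and parses its CSV output; `visible_from_service` encodes the rule of thumb from this thread (only TCC devices are reachable from session 0) and is an assumption, not an NVIDIA API.

```python
import subprocess


def parse_driver_models(csv_text):
    """Parse `name, driver_model.current` CSV rows into (name, model) pairs."""
    rows = []
    for line in csv_text.strip().splitlines():
        name, _, model = line.rpartition(",")  # split on the last comma
        rows.append((name.strip(), model.strip()))
    return rows


def query_driver_models():
    """Ask nvidia-smi for each GPU's current driver model (Windows-only field)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,driver_model.current",
         "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_driver_models(out)


def visible_from_service(model):
    """Rule of thumb from this thread: only TCC devices work from session 0."""
    return model.upper() == "TCC"
```

On boards that support it, `nvidia-smi -g <index> -dm 1` switches a GPU to TCC (administrator rights required, typically followed by a reboot); per this thread, the GRID K520 does not offer that mode.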
Sep 7, 2014 at 11:07 PM
Well, the auto-logon trick is the only solution for an EC2 cluster. Amazon bought GRID GPUs for their compute servers, yet NVIDIA does not support them with its compute (TCC) driver. How hopeless are the Amazon hardware guys that they did not know this?

It is satisfactory, though: I can ramp up a few dozen GPU nodes within minutes with one line of code.
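The poster doesn't say which tool provides that one line. As one possibility, with the AWS SDK for Python (boto3) a single `run_instances` call can launch a fleet of identical nodes. In the sketch below the AMI ID is a hypothetical image assumed to have the auto-logon and scheduled task already baked in, and the client is passed in (for example `boto3.client("ec2")`) so the launch logic can be exercised without AWS credentials.

```python
def gpu_fleet_params(ami_id, count, instance_type="g2.2xlarge"):
    """Keyword arguments for EC2 RunInstances launching `count` identical nodes."""
    return {
        "ImageId": ami_id,  # hypothetical AMI with auto-logon + logon task
        "InstanceType": instance_type,
        "MinCount": count,  # fail rather than launch a partial fleet
        "MaxCount": count,
    }


def launch_gpu_fleet(ec2_client, ami_id, count):
    """Launch the nodes and return their instance IDs."""
    response = ec2_client.run_instances(**gpu_fleet_params(ami_id, count))
    return [inst["InstanceId"] for inst in response["Instances"]]
```

Setting MinCount equal to MaxCount makes the request all-or-nothing, which keeps the fleet size predictable.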