Wednesday, October 17, 2007

SQL server down, detectives continue the search for answers

I woke up this morning to find our database server down (which took VirtualCenter with it). This is a production SQL server, and our accounting software depends on it.

I logged in directly to the ESX host using the VirtualCenter Client. The VM was still running, but I couldn't do anything on the console. I saw the screensaver but it wasn't moving, so I reset the VM, and it started up fine.

The server went down at 6:21 and was down for about 45 min until I caught it.

The ESX performance logs showed normal (i.e. little or no) disk and network activity. However, memory usage dropped to zero, and the two SMP processors sat at ~2.66 GHz (max) and ~1.7 GHz for those 45 min (total average was ~65%).

Windows event logs show nothing, except that the shutdown was unexpected.

No other VM was affected, so it was either an error in the ESX process hosting the VM or in the guest OS itself.

Unique facts about this VM:

  • It was switched from 4 to 2 processors at one point due to SQL licensing (over a month ago). I researched this and assumed that the NT multiprocessor kernel wouldn't be affected unless we dropped to 1 processor.
  • SQL full backups run hourly (the relevant one here started at 6:00 am). Each creates about 2.5 GB of data and takes about 2 min, so it should have finished long before the server crashed.
  • I have VCplus (http://www.run-virtual.com/?page_id=184) running on our VirtualCenter server. It was using a domain admin login for DB access during testing, so it had rights to break something if it wanted to, I suppose.
  • The SQL services are running under different logins than default. We were trying to get SCE 2007 to work with our SQL server about a month ago, and they suggested trying this.
      • SQL Server Integration Services: NT AUTHORITY\NetworkService
      • SQL Server FullText Search: LocalSystem
      • SQL Server: LocalSystem
      • SQL Server Analysis Services: LocalSystem
      • SQL Server Reporting Services: LocalSystem
      • SQL Server Browser: LocalSystem
      • SQL Server Agent: LocalSystem

I'm going to rebuild it this weekend just to be safe, considering the CPU went crazy, and we changed the processor count after the kernel was installed...

I've posted a question on the VMTN forums, so hopefully someone has some insight.

Tuesday, October 09, 2007

Thin clients and virtual desktops

We started testing virtual desktops for our company. It all originally started as a joke: we already had ESX Server, so why not throw all of our desktops on it as well? Then one of the co-founders of the company asked my boss about it, saying he had read about it somewhere and it seemed neat. After that we were planning on virtualizing all of our desktops, but a lot of users need CD drives. So as of this writing, we are planning on using virtual desktops for any contractors or new users that do not require a CD drive.

So, for the thin client, we are going to reuse old PCs where possible, or buy an HP t5135 for new hires. I couldn't get my hands on any version of VMware's VDM 2 Beta from their VDI initiative, and I'm not sure if that will include a thin client when it's released towards the end of this year. So for now, we are using 2X's ThinClientServer, along with their ThinClientOS.

We first installed ThinClientServer on one of our Windows VMs. It does a couple of things. First, when a thin client boots up, it acts as a DHCP helper: it listens for DHCP requests and returns a TFTP server and boot image filename. It also includes the TFTP server that hosts the ThinClientOS. Then, after the OS boots and the user tries to log in, it decides which machine to redirect the user to based on the login.
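To make those two jobs concrete, here is a rough sketch in Python. It is not how 2X actually implements ThinClientServer (I haven't seen their code); it only illustrates the idea. The option numbers 66 (TFTP server name) and 67 (bootfile name) are the standard DHCP options from RFC 2132. Everything else, including the function names, the group names, the hostnames, and the "first matching rule wins" logic, is made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set

# Standard DHCP option codes (RFC 2132).
OPT_TFTP_SERVER_NAME = 66
OPT_BOOTFILE_NAME = 67


def pxe_boot_options(tftp_server: str, boot_image: str) -> Dict[int, str]:
    """Build the two DHCP options that tell a booting client where its image is.

    A real DHCP helper would pack these into a DHCP reply packet;
    here they are just returned as a dict keyed by option code.
    """
    return {
        OPT_TFTP_SERVER_NAME: tftp_server,
        OPT_BOOTFILE_NAME: boot_image,
    }


@dataclass(frozen=True)
class Rule:
    """Hypothetical broker rule: members of `group` are sent to `desktop`."""
    group: str
    desktop: str


def pick_desktop(user_groups: Set[str], rules: List[Rule],
                 default: Optional[str] = None) -> Optional[str]:
    """Return the desktop of the first rule matching one of the user's groups.

    Falls back to `default` when no rule matches.
    """
    for rule in rules:
        if rule.group in user_groups:
            return rule.desktop
    return default


if __name__ == "__main__":
    # Hypothetical addresses, file names, and groups.
    print(pxe_boot_options("10.0.0.5", "thinclient.img"))
    rules = [Rule("contractors", "vdesk-01"), Rule("staff", "vdesk-02")]
    print(pick_desktop({"staff"}, rules))
```

Rules are checked in order, so a user who belongs to several groups lands on the desktop of the first rule that matches. That keeps the sketch simple; a real connection broker would presumably also check whether the target VM is powered on and free.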

I had no problem getting the ThinClientOS up and running on our test HP t5135. But getting it to work on our PCs is a different story. I first tried it on my PC, a relatively new HP dc5700, and it didn't recognize my NIC (after it had successfully downloaded the boot image over that same NIC). I then tried it on a much older PC that one of our scanners is hooked up to, and it couldn't detect the video card properly (while I was looking at the error message). So I'll spend a lot of time troubleshooting these driver issues if we go live using ThinClientOS.

I'm looking forward to seeing exactly how VMware's VDM 2 is going to fit into all of this when it gets released in December. I know that it is going to do the same things as ThinClientServer (connection broker), but I'm not sure about a PXE boot image, though I would imagine they would have to include one in order to compete.