Archive for the ‘IT’ Category

SQL 2008 on Win2k8 Cluster doesn’t start on failover

**Update – another cause of this problem at the bottom of this article**

Here’s the problem:

Windows 2008 Failover cluster, running SQL 2008, both Enterprise, x64. I have two separate clusters that were having exactly the same problem.

Intermittently, when moving the SQL Server application group to another node, the SQL Server (Instance) resource wouldn’t come up. I concluded the issue wasn’t related to permissions, for several reasons. Sometimes the problem would occur only on one node, then only on the other, and sometimes there would be no issues at all. In all cases, I was able to start the SQL Server service from services.msc, but then Windows authentication failed and SA was the only account I could log in with. Furthermore, since I have multiple instances on the same cluster, not all instances failed at once, but every instance failed at one point or another.

There really was no consistency in what failed when and where, so it was quite a bitch to troubleshoot. However, I did get some [not so] helpful event log entries.

In System logs:

Event 1069 from FailoverClustering: Cluster resource ‘SQL Server (MOSS)’ in clustered service or application ‘SQL Server (MOSS)’ failed.

Event 1205 from FailoverClustering: The Cluster service failed to bring clustered service or application ‘SQL Server (MOSS)’ completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered service or application.

App logs:

All errors had the same EventID, 19019 from Failover, but different error text:

[sqsrvres] ODBC sqldriverconnect failed

[sqsrvres] checkODBCConnectError: sqlstate = 28000; native error = 4818; message = [Microsoft][SQL Server Native Client 10.0][SQL Server]Login failed for user ‘NT AUTHORITY\ANONYMOUS LOGON’.

[sqsrvres] OnlineThread: Error connecting to SQL Server.

[sqsrvres] CheckServiceAlive: Service is dead

Most times it was also accompanied by an “Information” log entry, 18456 from Logon: Login failed for user ‘NT AUTHORITY\ANONYMOUS LOGON’. Reason: Token-based server access validation failed with an infrastructure error. Check for previous errors. [CLIENT: 192.168.16.76]

I spent numerous hours reading documentation, forums, and TechNet articles. Thankfully, we have a Microsoft support subscription, and after an hour or two of talking to different agents, I finally got to talk to a tech. The guy seemed to know quite a bit, but by the sounds of it, he’d never touched SQL 2008. It’s got quite a few differences from 2005, so I was a bit worried. He pulled up an article from their internal knowledge base (I asked to see it but was denied), which suggested making a simple registry change on both nodes.

In HKLM\SYSTEM\CurrentControlSet\Control\Lsa\ create a REG_DWORD entry called DisableLoopbackCheck and give it a value of 1.

He couldn’t really tell me what this registry entry does, so I did some hunting on my own. This site has a pretty good explanation, even though it doesn’t directly relate to my problem.

It’s been two weeks now since I did it on our Dev cluster, and the problem hasn’t recurred since. Today I’m making the change on the production cluster to see if the problem goes away there too.

Update: It’s been a month now since I fixed our prod clusters, and still no sign of the issue.

**Update 2** January 2011

Ran into this issue again, googled the error code and found my own article. This time the problem was slightly different, as the instance wouldn’t start up even after failover to another node, but the event log errors were all the same. The SQL service could be started manually, but not by the cluster.

Turned out that one of the developers (they wouldn’t say who, as they’re denying they did it) had deleted a few crucial logins from SQL Security. I started the service manually and added the following logins:

NT SERVICE\ClusSvc

NT SERVICE\MSSQL$SQLSERVER (Where SQLSERVER is the name of the instance)

Problem solved :)

**Update 3**

DisableLoopbackCheck can be avoided by not building cluster nodes from the same image. Essentially, just build both nodes by installing Windows from scratch.

XBMC killed by kernel update on Acer Revo

So I recently installed XBMC on my new media PC. I used the XBMC Live CD to install it to the hard drive, and after setting everything up and making sure everything was working fine, I decided to run the updates. That’s when the fun started.

First, XBMC wouldn’t start. Startx still worked, but launching XBMC produced an error: Fatal: glXCreateContext failed

Some reading suggested adding some ModulePath lines to Section “Files” of xorg.conf, but that didn’t fix the problem. So I figured, since the kernel was updated, maybe the nvidia driver had patched the old kernel, and the new kernel got rid of the patch. So I downloaded the nvidia ion driver, v190.53, and installed it. Reboot – now X won’t start at all. Awesome.

But the solution was easy. Looking through the Xorg log, I found that X couldn’t find /lib/modules/2.6.31-16-generic/updates/dkms/nvidia.ko. Searching for the file turned it up elsewhere, and a simple symlink fixed the problem. So…

cd /lib/modules/2.6.31-16-generic/updates/dkms/

ln -s /lib/modules/2.6.31-16-generic/kernel/drivers/video/nvidia.ko nvidia.ko

Fixed!!!

Edit: 2.6.31-16 should obviously be replaced with the current version of the kernel.

Furthermore, it looks like this happens every time the kernel is updated, which means reinstalling the nvidia driver each time so that its kernel module is rebuilt for the new kernel.
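
Since the link may need redoing after a kernel update, the two commands above can be wrapped in a small script. This is only a rough sketch: link_nvidia_module is a made-up helper name, the optional second argument (a path prefix for trying it against a scratch directory) is an addition for illustration, and it assumes the driver installs nvidia.ko under kernel/drivers/video as it did here.

```shell
#!/bin/sh
# Sketch: re-create the nvidia.ko symlink for a given kernel version.
# link_nvidia_module is a hypothetical helper; the paths mirror the ones in this post.

link_nvidia_module() {
    kver="$1"          # kernel version, e.g. the output of `uname -r`
    root="${2:-}"      # optional path prefix (assumption, for dry runs only)
    src="$root/lib/modules/$kver/kernel/drivers/video/nvidia.ko"
    dst_dir="$root/lib/modules/$kver/updates/dkms"

    # Do not create a dangling link if the driver is not there for this kernel.
    if [ ! -f "$src" ]; then
        echo "nvidia.ko not found at $src" >&2
        return 1
    fi

    mkdir -p "$dst_dir"
    # -f replaces a stale link left over from an earlier attempt.
    ln -sf "$src" "$dst_dir/nvidia.ko"
}

# Usage (as root, after reinstalling the driver):
#   link_nvidia_module "$(uname -r)"
```

If the function reports that nvidia.ko is missing, the driver has not been installed for the running kernel yet, so the reinstall has to come first.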

Citrix Web Interface 4.6 install on 2k3

This thing has really been a bitch. First, Access Management Console wasn’t actually modifying the configuration of the web interface, so I had to manually edit the \Inetpub\wwwroot\Citrix\AccessPlatform\conf\WebInterface.conf file.

Secondly, even before I got to that point, I found that ASP.NET was causing some issues. Mainly, I got a 503 error when trying to access the site. Here’s the solution that worked for me:

Of course, make sure you have .NET 2.0 installed. If you have, or at any point had, v1 and/or v3 installed, you will probably have to do this:

In cmd,

cd %windir%\Microsoft.NET\Framework\v2.0.50727\  (version might be slightly different)

De-register all versions of .NET from IIS:

aspnet_regiis -ua

Register this version (v2) with IIS:

aspnet_regiis -r

then, iisreset

That’s it.

Now, for the third issue. Everything seemed to work fine: the configuration was picked up and the farm was being contacted, but none of the applications would run. First, I got an “An error has occurred while connecting to the requested resource” error. After changing PooledSockets from On to Off, the error message disappeared, but the apps still wouldn’t open. After some troubleshooting, I found the solution to be, as usual, pretty simple, which left me wondering why the Citrix installer couldn’t do it on its own. I compared the IIS config to our existing server and found that I needed to add one entry:

In IIS config, go to Properties of Web Sites > Default Web Site (or whatever you’re using) > Citrix > Access Platform. In the Virtual Directory tab, click Configuration and look for the .ica extension. For me, it wasn’t there. I had to add it with C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\aspnet_isapi.dll as the executable, and GET,HEAD,POST,DEBUG under “Limit to”.

And, we’re done!

