365DBA Blog

Database Stuff

SQL Server Monitoring Scripts - 4 - Missing Transaction Log Backups

We don't want any databases in full recovery going without their log backups so here is a script to let us know when we have one. You will need to adjust the @Threshold variable to your specific needs (which is true for all of the scripts in this series) and then you are set. This script takes into account whether the database is mirrored, it's status, and if it's a system database.

Set NoCount On
Set ANSI_Warnings Off
Select '---------------Log Backup Check---------------'
Declare @MissingLogBackups Table(ID Int Identity, DatabaseName VarChar(100))
Declare @Threshold Int
Select @Threshold = 4
Insert Into @MissingLogBackups (DatabaseName)
Select d.[name] From sys.databases d 
Left Join msdb.dbo.backupset b On d.[name] = b.database_name
And b.[type] = 'L' 
Inner Join sys.database_mirroring m On d.[database_id] = m.[database_id]
Where DatabasePropertyEx(d.[name], 'Recovery') <> 'SIMPLE'
and DatabasePropertyEx(d.[name], 'Status') = 'ONLINE'
And d.[name] Not In ('master', 'msdb', 'tempdb', 'model')
And ((m.mirroring_guid Is Null)
Or (m.mirroring_guid Is Not Null And mirroring_role = 1))
Group By d.[name]
Having IsNull(Max(b.backup_finish_date), '1/1/2000') < DateAdd(hh, (-1 * @Threshold), GetDate())
If (Select Count(*) From @MissingLogBackups) > 0
Declare @Count Int, @Counter Int
Select @Count = Max(ID) From @MissingLogBackups
Select @Counter = Min(ID) From @MissingLogBackups
While @Counter <= @Count
Select 'Server: ' + Cast(ServerProperty('MachineName') As sysname) + IsNull('\' + Cast(ServerProperty('InstanceName') As sysname), '') + 
'. Missing log backups for DB: ' + DatabaseName From @MissingLogBackups Where ID = @Counter
Set @Counter = @Counter + 1

SQL Server Monitoring Scripts - 3 - Login Failures

In this script, I gather information on failed login attempts so that if there are any users trying to access data which they are not authorized for or if there is a default database in a connection string that might not exist, etc. we can address the issue. You can change the @LastCheck variable to whatever interval you run the checks.

Set NoCount On
Set ANSI_Warnings Off
Select '---------------Login Failure Check---------------'
Declare @LastCheck DateTime
Select @LastCheck = DateAdd(dd, -2, GetDate())
If (@LastCheck Is Null)
Set @LastCheck = '1-1-2000'
If Object_ID('tempdb.dbo.#ErrLog1') Is Not Null
Drop Table #ErrLog1;
Create Table #ErrLog1 (LogDate DateTime, ProcessInfo VarChar(50), LogText VarChar(2000))
Insert Into #ErrLog1 (LogDate, ProcessInfo, LogText) 
Exec xp_readerrorlog 0, 1, N'Login', N'failed'
Delete From #ErrLog1 Where LogDate <= @LastCheck
Select Cast(LogDate As VarChar(25)), Left(LogText, 500) From #ErrLog1
If Object_ID('tempdb.dbo.#ErrLog1') Is Not Null
Drop Table #ErrLog1;

SQL Server Monitoring Scripts - 2 - Low Disk Space

The second monitoring script in this series is a simple script for obtaining disk space information and alerting if it is below a given threshold. Once again, the formatting is specific to the purpose of using it with the remote PowerShell script which is the reason for the formatting.

Set NoCount On
Set ANSI_Warnings Off
Select '---------------Disk Space Check---------------'
Declare @Threshold Int
Set @Threshold = 30720 --30 GB
Declare @AllDrives Table (drive Char(1), [MB free] Int)
Declare @LowDrives Table (ID Int Identity, Drive Char(1), MBFree Int)
Insert Into @AllDrives
Exec master.dbo.xp_fixeddrives
Insert Into @LowDrives (Drive, MBFree)
Select drive, [MB free] From @AllDrives Where [MB free] <= @Threshold
If (Select Count(*) From @LowDrives) > 0
Declare @Count Int, @Counter Int
Select @Count = Max(ID) From @LowDrives
Select @Counter = Min(ID) From @LowDrives
While @Counter <= @Count
Select 'Server: ' + Cast(ServerProperty('MachineName') As sysname) + IsNull('\' + Cast(ServerProperty('InstanceName') As sysname), '') + 
' low disk space: ' + Drive + ' = ' + Cast(MBFree As VarChar(10)) + ' is below threshold of ' + Cast(@Threshold As VarChar(10))
From @LowDrives Where ID = @Counter
Set @Counter = @Counter + 1

SQL Server Monitoring Scripts - 1 - High Log Usage

I was recently in a position where I needed to monitor several SQL Server instances on disparate machines and I didn't have a central server instance that I could make use of for data collection, job scheduling, etc. and I was not able to create any custom scheduled tasks in Windows (the scope of "administration" in this case is very narrow) so I ported my standard monitoring scripts to a format that could be used from a remote location and then created a PowerShell script to run all of my scripts against all of the specified servers and pipe the output to a single file per server. This method allows you to easily add servers and scripts to your process and modify input and output locations as necessary. I will first be providing the scripts and then the PowerShell script to wrap it all together.

The first script is for detecting any databases which are currently utilizing a high percentage of the log so that you can catch possible issues such as replication, log backup issues, long-running transactions, etc. Again, these particular scripts are for use with the PowerShell script which is the reason for the options specified. For this script, a minimum size is specified so that we don't get alerted to a 20 MB transaction log that has a usage which exceeds our threshold of 30%. Also, in these scripts, I use the ServerProperty function to guarantee a value is returned in case the global server name variable is returning a null value or previous machine name.

Set NoCount On
Set ANSI_Warnings Off
Select '---------------Log Usage Check---------------'
Declare @PercentThreshold Int, @SizeThreshold Int, @MinSize Int, @SQL VarChar(100)
Set @SQL = 'DBCC SQLPerf(LogSpace)'
Set @PercentThreshold = 30
Set @SizeThreshold = 10240 --10 GB
Set @MinSize = 512
Declare @LogSpace Table (DatabaseName VarChar(100), LogSize Real, LogSpaceUsed Real, [Status] Int)
Declare @LargeLogs Table (ID Int Identity, DatabaseName VarChar(100), LogSpaceUsed Real, LogSize Real)
Insert Into @LogSpace
Insert Into @LargeLogs (DatabaseName, LogSpaceUsed, LogSize)
Select DatabaseName, LogSpaceUsed, LogSize From @LogSpace Where (LogSpaceUsed >= @PercentThreshold) And LogSize >= @MinSize
If (Select Count(*) From @LargeLogs) > 0
Declare @Count Int, @Counter Int
Select @Count = Max(ID) From @LargeLogs
Select @Counter = Min(ID) From @LargeLogs
While @Counter <= @Count
Select 'Log threshold violation for DB: ' + DatabaseName + ' on server: ' + 
Cast(ServerProperty('MachineName') As sysname) + IsNull('\' + Cast(ServerProperty('InstanceName') As sysname), '') + 
'. Percent used: ' + Cast(LogSpaceUsed As VarChar(15)) + ', size: ' + Cast(LogSize As VarChar(15)) 
From @LargeLogs Where ID = @Counter
Set @Counter = @Counter + 1

Process for Changing IP Address on Cassandra Nodes in Windows Environment

I needed to perform an IP address change for Cassandra nodes in a Windows environment and couldn't find anything documented so I created the below procedure for performing the task. This process was for a Cassandra v1.2 cluster. 

1. Cleanly shut-down the node (only 1 node should be brought down at a time)
a. Perform drain
nodetool -h <TargetHost> drain
b. Disable Gossip
nodetool -h <TargetHost> disablegossip
c. Disable Thrift
nodetool -h <TargetHost> disablethrift
d. Disable binary (native protocol)
nodetool -h <TargetHost> disablebinary
e. Stop Cassandra service
2. Set Cassandra service to disabled
3. Configure purge gossip state information:
Edit cassandra-env.sh file adding: JVM_OPTS="$JVM_OPTS -Dcassandra.load_ring_state=false"
4. Change yaml file setting start_rpc = false and start_native_transport = false (this is only for the node involved in the IP change)
5. Change hosts files on all nodes to the new IP address for the node involved in the change
6. Perform IP change
7. Reboot machine
8. Set Cassandra service to automatic
9. Start service
10. Verify node state, ring, load, activity, etc
11. Run repair on current column families that may have missed writes to re-IP'd node
12. Confirm that data is in sync
13. Edit cassandra-env.sh file removing the line to purge gossip state: JVM_OPTS="$JVM_OPTS -Dcassandra.load_ring_state=false"
14. Change yaml file for the node involved in the IP change setting start_rpc = true and start_native_transport = true
15. Enable binary
nodetool -h <TargetHost> enablebinary
16. Enable thrift
nodetool -h <TargetHost> enablethrift

Database Corruption/ Torn Page Repair

Sometimes bad things happen to good databases and corruption can be one of those things. This post goes through a few different methods of recovering from the corruption with minimal or no data loss when all other standard methods have been exhausted. If you find that the corruption exists in a non-clustered index then you can fix the issue by simply scripting out the create index syntax from Management Studio, dropping the index, and using the script to recreate it (you cannot just run the create script while specifying the Drop_Existing option, it doesn’t work). If the non-clustered index index happens to be a primary key then be sure to add a unique constraint on the table with the same definition to help maintain integrity, although this will not help any foreign keys referencing the table so it will take much more work to take care of those and is beyond the scope of this post

Basic Information

A torn page error is a severity 24 error that will display similar to the following:

SQL Server Alert System: 'Severity 24 - Fatal Error: Hardware Error' occurred on <ServerName>

DESCRIPTION: SQL Server detected a logical consistency-based I/O error: torn page (expected signature: 0xaaaaaaaa; actual signature: 0x55555556). It occurred during a read of page (1:30857339) in database ID 8 at offset 0x00003adb0f6000 in file 'F:\MSSQL\Data\DBfile_data.mdf'. Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.



The best way to recover from data corruption is to restore the database from a known good backup and then restore any subsequent differential and transaction log backups. However, this method is for situations in which that is not an option for one reason or another. Also, this process will take the entire database off line so be sure to involve all stakeholders and take appropriate measures to accommodate. This process will remove data and should only be performed when all other courses of action have been considered.

For Page-Level Restore Instead of Repair

Note: For an article on how to do a repair with a page-level restore instead of DBCC CheckTable ('<table_name>', REPAIR_ALLOW_DATA_LOSS)) see the link below on MSSQLTips to the article: "Using Page Level Restore as a Disaster Recovery Procedure in SQL Server 2005". I have actually used this method on a production machine and it worked perfectly.


Troubleshooting and repair

1) ****The very first thing you should do is start a restore on another server of the last known good database for later use and/ or possibly restoring transaction logs if available****

2) The error message contains the information that you will need to start the repair process, specifically, the database ID and the page number. You will first want to get the name of the database via the database ID and filename. Next, use the numbers in parenthesis which specify the file number and page number to determine the ObjectID, previous page number and next page number with the following command (of course, replacing <Database Name> with the correct name):

DBCC TraceOn (3604)
DBCC Page (<Database_Name>, 1, 30857339)
DBCC TraceOff (3604)

Note that this is the corrupt page so you will receive an error, but you will also receive command output in which you will see a row that looks like this:

Metadata: ObjectId = 1108303108 m_prevPage = (1:30857338) m_nextPage = (1:30857340)

3) Make sure you are using the correct database and determine the table name from the ObjectID and note the previous and next pages for later use:

Select Object_Name(1108303108)

4) Get information on the primary key, clustered index (if different), computed columns, identity columns, guid columns, and triggers:

Exec sp_help <table_name>
Exec sp_helptrigger <table_name>

5) Next, determine the last value of the previous page and the first value of the next page in order to determine the contents of the bad page:

DBCC Page(<Database_Name>,1,30857338,3) With TableResults

This will display the contents of the previous page in tabular format and in clustered index order so note the last value in the clustered index/ primary key, then do the same for the next page, but noting the first value instead:

DBCC Page(<Database_Name>,1,30857340,3) With TableResults

6) Once available you should run the DBCC Page command against the restored good database to confirm your results then write a query for the values:

DBCC Page(<Database_Name>, 1, 30857339, 3) With TableResults

7) Once you are confident that you know the values in the corrupt page and have a way to transfer them back in to the table (e.g. restored good database or another data center), you need to repair the table:

***At this point it would be a good idea to make a full backup of the database should something not go as expected with the repair operation***

Set the database in restricted mode to help mitigate the potential for another connection intercepting yours once in single user mode:

Alter Database <Database_Name> Set restricted_user With RollBack Immediate
Use <Database_Name>

Set the database to single user in order to perform the repair (make sure you do this in a single batch as below):

Alter Database <Database_Name> Set Single_User With RollBack Immediate
Use <Database_Name>

Get the row count before the repair:

Exec sp_spaceused <table_name>

Then run the repair statement:
**Please note that this will result in loss of data and should only be used if other alternatives (restoring from backup, etc.) are not feasible due to age of backup etc. !!!**

DBCC CheckTable ('<table_name>', REPAIR_ALLOW_DATA_LOSS) With All_ErrorMsgs

Once complete, get the row count after the repair to validate your findings and query for re-insertion:

Exec sp_spaceused <table_name>

Then set the database back to multi user:

Alter Database <Database_Name> Set Multi_User With RollBack Immediate
Use <Database_Name>

8) Re-insert the rows, disabling any triggers and accounting for identity, computed, and guid columns.