SQL Server security best practice

Security! This is the word comes in mind of every concerned person when it come...

Change the Collation Settings in MS SQL Server

This post will show you how to change the collation settings in MS SQL Server for specific database...

Resolve collation conflict

In this post I will show you how you can resolve collation conflict error...

Book: SQL Server 2008 High Availability

In this book I have tried to cover every single piece of information that might requires for installing and configuring SQL Server HA option like Clustering, Replication, Log Shipping and Database Mirroring...

Why to recompile Stored Procedure

Generally, we create views and stored procedures (proc here after) ...


Common mistakes in SQL Server - Part-1

Since last 10 years I’ve worked on many projects where in the significant amount of work comes for Performance and Tuning, the first approach is to identify the culprit and quick fix it and the next course of action would be to fix it permanently. In most of the cases what I’ve found is the size of the database grows beyond the expectation (estimation!!), sometimes 300%. Sometimes it has a problem with the data type or index etc. There are many things we should have consider when we design a database and when we develop an application which uses MS SQL Server as backend.

There are some points I think we have to be careful when we design a database or an application go live, this will save our lots of effort in future!!

So, what are the Common Mistakes done in SQL Server?

Well, as I said they are so common that sometimes we just overlooked them and we came to know the significance of these pointers when we got trapped. Based on my experience I’ve pen down 10 points below; this is the part 1 of the series so keep visit this place.

I hope this would have answer some common questions like:

  • What data type to be used?
  • Difference between varchar and nvarchar!! and varchar vs nvarchar

The first thing which I’ve noticed is length of the data type to be used. One have to be very careful here, one should not use fixed length of data type when one is not sure about the length of the data which is being inserted either from website or from an application; if one fails to choose the correct length it will unnecessarily occupy the valuable disk resource. For example when data which is to be inserted is “Address” and we are not sure of the length we should use variable length data type such as varchar and when we are sure of the length of the data type we should use fixed length data type such as char for “Gender”.

There are cases when I’ve seen where Unicode data are not stored but people have used nvarchar data type for their websites and applications . This will give you a pain when database size grows over a period of time and one is trying to figure out what are the areas we’ve to consider!! nvarchar or nchar should be used when application or website requires to store data in Multilingual format. Both, nvarchar and nchar occupies double the space of stored data the reason is, ncahr or nvarchar uses UTF-16 (16 bit per character ) where as char or varchar uses UTF-8 (8 bit per character) .

Let’s see an example, create a table with 2 column (FirstName and LastName) which uses varchar and nvarchar data type respectively to store data. Now, insert record and then use function len() to count number of characters inserted and datalength() to know how much space it has occupied. Download the script

Also, we should be very careful about the trimming the white space this should be taken care either at front end or at back end.

Here, what I am advising is to use nvarchar or nchar whenever you required to add multilingual support and try to limit it to where it is required the most, say a particular column.

As I said earlier this is the part 1 of the series so keep visiting this place!!
Data Types: It requires careful attention when we design a table and choose type of data we are going to store here and what Data Type we are going to have for particular column of a table! And, that is why choosing right data type is most crucial part of database design, because, if one failed to choose the right data type there could be several pitfall he/she would encounter as the database size grows.


SQL Server Cluster Best Practices

What is the best practices for SQL Server Cluster? What are the steps to be followed when we install SP on SQL Server Cluster? How to quick review SQL Server Cluster? Where I can find SQL Server Cluster checklists!!!  These are the FAQ on SQL Server Cluster, and, here are some links you can refer for the same.

and a list of articles on SQL Server Cluster on my fav http://www.sql-server-performance.com/   for your ready reference http://www.sql-server-performance.com/articles/clustering/index.aspx  
and http://technet.microsoft.com/en-us/library/cc917693.aspx


Download SQL Server 2008 R2 November CTP

Microsoft has releases the new CTP of the MS SQL Server 2008 R2, here is the download page to Download November CTP of the MS SQL Server 2008 R2 http://www.microsoft.com/sqlserver/2008/en/us/R2Downloads.aspx

Find the information on the System Requirement here >http://msdn.microsoft.com/en-us/library/ms143506(SQL.105).aspx

More information about the SQL Server 2008 November CTP >http://www.microsoft.com/sqlserver/2008/en/us/R2.aspx


SQL Server High Availability Options Preview

As it is named, all the SQL Servers and databases should be highly available (HA) to cater the business requirement, and, the business application should be up and running all the time. To achieve this Microsoft has introduced SQL Server Cluster which is one of the highly recommended and used solutions for SQL Server high availability which is server specific that means that all the databases on the SQL Server Cluster are available in case of failover. Database Mirroring (DM) and Log Shipping (LS) along with the Replication (Repl.) are the alternate solution for the high availability but all this options are database specific that means that you can set it up for specific database(s).

Here are the brief about each (I'll post detailed article on each in near future):

Replication: As it is named, replicates the data/database to one or more than one location. Replication was first introduced in SQL Server 6 with minor support and was enhanced in SQL Server 7.0; with launch of SQL Server 2000 it has added some more features to it. And, with SQL Server 2005 Microsoft has introduced Peer-to-Peer replication which is capable of doing load balancing.

The Application:  Based on the business requirement i.e. for reporting purpose, data from more than one location needs to be merged and or if wish to have transaction level replication of data.
Component of the Replication:
Publisher:  Is a server which published the data
Distributor: Is a server which is responsible to distribute data to the subscriber(s)
Subscribers: are servers which has the subscription, the distributor sends the data which is being published by the publisher here.
Subscription Type: Subscription has two method, pull and push. With Pull subscription, subscriber connects to the publisher periodically and pulls the data where as with Push subscription publisher connects with the subscribers and push the changes over subscriber.

 There are 4 type of Replication available: 
1.       Snapshot:  Snapshot replication is the very basic and simplest type of replication available. As name implies, it took snapshot of the published data, connect to the subscriber and then applied the snapshot on subscriber.  This is most suitable when data is required for reporting purpose and are not updated on the subscriber end, and high latency is acceptable.

Application:  Where higher latency is allowed and data is used for the reporting purpose only.

2.       Transactional:  In this mode of replication, it is replicating data transaction level by continuous monitoring.  To achieve this Log Reader Agent is being created when Transactional Replication is configured, it monitors every database which is configured for Transactional Replication and then copies the transactions which are marked for replication from T-Log to distribution database. Distribution database then send (either pull/push subscription) it to the subscriber.
       Application:  Where higher latency is not allowed and data is used for the reporting
       purpose, and   sometimes subscriber is allowed to updates.
3.       Merge:  Merge replication uses Triggers to track the changes in subscriber to publisher.  And, as it named it merges the records, irrespective of who updates/inserts records.  This is the most complex topology of replication.  As it is a merge replication wherein updates/modifications are allowed from more than one subscriber at a time, it has in-built facility to resolved conflict.  There are some pre-requisites which you need to have before Merge Replication can be configured, here is my article on Merge Replication Tips here
a.       Multiple Subscribers might update the same data at various times and propagate those changes to the Publisher and to other Subscribers.
b.      It might have a chance of occurring conflict, and if they do, there should be the mechanism to resolve it.
c.       One time modification is done at subscriber end and the same needs to update at publisher and other subscriber.

4.       Peer-to-Peer:  This is the best topology of Replication introduced up till. This has been introduced in with launch of SQL Server 2005. In, Peer-to-Peer replication the participants have the complete schema and all the data modifications (DML operations) are made at all nodes.  Now, you would be thinking, hey what is the catch here, what is new then merge replication? Well, the simplest answer is Peer-to-peer Replication is designed for Load Balancing by spreading out write activity across all available nodes and then based on the sync latency it synchronizes database, thus read and write operations become faster.  And, additionally if any of the server is not available it will continue to perform as it is with another node and write operation for this node is routed to another node.
Application: Best suited for online shopping application

Database Mirroring: Database mirror is the best and cost effective solution for automatic failover. Database mirroring is introduced with launch of SQL Server 2005. A database mirroring replicate the change to secondary server at transaction level and provides two modes: High Performance (Asynchronous) and High Safety (Synchronous). The difference between two is, with High Safety mode it will not send and write any data to the secondary node until it is committed on the primary node whereas with High Performance mode it will ship the transaction and write the transaction on to secondary server no matter it is committed or roll back on the primary server.

Log Shipping:   This is the cost effective solution, as it is named, it ships T-Log backup to the secondary node(s) on regular interval and then restore job performs T-Log restoration at secondary server to keep it updated with the primary server. The disadvantage is, it don’t have automatic failover feature and hence manual intervention for role change would take 2~3 minutes time (I guess this is allowable downtime).

SQL Server Cluster:  This is the best HA feature in SQL Server, which provides automatic failover for Entire Server, in comparison to other options which provides supports to particular database(s). But, this solution is costlier than other 3 options because it requires identical hardware.