Sunday, February 9, 2014

Disk types explained: RAID, SATA, IDE, ATA




http://technet.microsoft.com/library/Cc966414#_Alignment_and_Storage_Alignment




Appendix A: The System Components

This appendix defines and briefly describes system components that are relevant to this paper.

Bus Bandwidth

Two factors govern bus bandwidth: the bus type and the number of buses in the server.

Bus Types

Three types of buses are commonly used in the industry: PCI, PCI-X (for eXtended), and PCI-E (for Express).

PCI (Peripheral Component Interconnect): A computer bus that transfers data between the motherboard and peripheral devices in a computer. The PCI bus is clocked at 33.33 MHz with synchronous 32-bit transfers and peaks at 133 MB/sec.

PCI-X (Peripheral Component Interconnect – Extended): The protocol for PCI-X has a faster clock rate than PCI, clocking up to 133 MHz with 32- or 64-bit transfers and peaking at 1066 MB/sec.

PCI-E (Peripheral Component Interconnect – Express): This serial bus uses the same PCI concepts but with a different pin layout, resulting in smaller slot lengths. It supports one or more serial signaling lanes, each transmitting up to 250 MB/sec.
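The peak figures quoted above follow from clock rate multiplied by transfer width. The sketch below reproduces that arithmetic; the function names, the 133.33 MHz value used for PCI-X, and the x8 lane count are assumptions chosen for illustration, not values taken from any particular server.

```python
def parallel_bus_peak_mb_per_sec(clock_mhz: float, width_bits: int) -> float:
    """Peak rate of a parallel bus that moves width_bits once per clock cycle."""
    return clock_mhz * width_bits / 8  # bits -> bytes


def pcie_peak_mb_per_sec(lanes: int, per_lane_mb_per_sec: float = 250.0) -> float:
    """Peak rate of a PCI-E link, using the 250 MB/sec per-lane figure above."""
    return lanes * per_lane_mb_per_sec


# PCI: 33.33 MHz, 32-bit transfers -> roughly 133 MB/sec
pci = parallel_bus_peak_mb_per_sec(33.33, 32)

# PCI-X: assumed 133.33 MHz, 64-bit transfers -> roughly 1066 MB/sec
pcix = parallel_bus_peak_mb_per_sec(133.33, 64)

# PCI-E: hypothetical 8-lane slot -> 2000 MB/sec
pcie_x8 = pcie_peak_mb_per_sec(8)
```

These are theoretical peaks; the sustained throughput of a real adapter is lower.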

Number of Buses on Server

For the total bus bandwidth, the number of slots filled is not necessarily the same as the number of buses in the server. This depends on the make and model of the server box. For more information, consult the specifications for the server.

Disk | RAID | SAN Interfaces

Interfaces

IDE ATA (Integrated Drive Electronics Advanced Technology Attachment): IDE is a disk drive technology that has an integrated controller and is based on parallel data processing. Data travels asynchronously from the drive to the data bus.

SATA (Serial Advanced Technology Attachment): SATA is a physical storage interface with serial data processing. Data travels synchronously on a clock edge. Transfer rates on SATA start at 150 MBps.

The main design advantage over IDE is the fact that SATA II supports command queuing known as Tagged Command Queuing (TCQ). Command queuing is good bec

Hardware vs. Software RAID:

There are two ways RAID is implemented: through hardware, and through software. The hardware implementation refers to the traditional method implemented with a collection of disk drives working together for fault tolerance through striping or mirroring of data, as described previously in this paper. The software version refers to the RAID implementation provided by the operating system.

Software RAID is usually the cheaper of the two solutions. However, there are a few drawbacks to using software RAID. The most significant drawback is that, because the RAID engine runs in kernel mode, it shares CPU time quanta with other kernel mode components and overlying applications. In contrast, the hardware RAID solution does not face this problem because the RAID firmware executes on a separate dedicated chip. This also allows for asymmetric multiprocessing between the system processor and the RAID controller.

Disk Controller Caching

Caches temporarily hold data that has been recently read, or sometimes pre-fetched data that is likely to be accessed in the near future. This is a way to improve the time it takes to read from or write to the disk.

There are two types of cache stores:

Cache on the disk
Cache stored on the disk controllers
    Disk controller on the server
    Disk controller on the disk system, such as a SAN

In this section, the focus is on cache stored on disk controllers.

Cache Size

Although disk performance is commonly attributed to the disk seek time and rotational speed, the amount of cache also plays a role. However, the performance attributed to cache size is only valid as long as the cache is not full. If the cache is full, there is a significant drop in performance. Ideally, the cache size is large enough so that it never fills for the particular I/O workload.

Disk controller caches on servers range in size up to hundreds of MBs. The disk controller on a SAN disk system can have up to tens of GBs.

Battery Backup

Some higher-end disk controllers offer battery backup protection for the cache memory. Because cache is volatile, an unexpected power failure might result in loss of data that was written to cache but not yet written to disk. The battery backup unit provides temporary power to the cache store so that when the system recovers, the data in the cache is flushed to disk before the system attempts to access the data. For specific timing, see the disk controller manual.
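The Cache Size point, that the benefit lasts only while the cache is not full, can be made concrete with a rough estimate. This is a simplified model that assumes constant incoming and drain rates; the function name and the example numbers are hypothetical, and real controllers behave less uniformly.

```python
from typing import Optional


def seconds_until_cache_full(cache_mb: float,
                             incoming_mb_per_sec: float,
                             drain_mb_per_sec: float) -> Optional[float]:
    """Estimate how long a write burst can run before the cache fills.

    Returns None when the disks drain the cache at least as fast as
    writes arrive, in which case the cache never fills in this model.
    """
    net_fill_rate = incoming_mb_per_sec - drain_mb_per_sec
    if net_fill_rate <= 0:
        return None
    return cache_mb / net_fill_rate


# Hypothetical 256 MB controller cache, 300 MB/sec of incoming writes,
# disks able to absorb 200 MB/sec: the cache fills after about 2.56 seconds.
burst_seconds = seconds_until_cache_full(256, 300, 200)

# Same cache with a workload the disks can keep up with: never fills.
steady = seconds_until_cache_full(256, 150, 200)
```

A workload whose bursts are shorter than this estimate stays within the cache; longer bursts fall back to the speed of the underlying disks.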

SAN – Storage Area Network

A SAN is a centralized network of storage devices. SAN technology transfers raw disk blocks directly onto the network. A centralized storage network allows server power and utilization to focus on the business application, with I/O processing offloaded to the storage network. Other advantages of SAN architecture include the following:

Consolidation of resources and better scalability with growing business needs
Cost-effective management and operational efficiency, with possible cost savings
Increased availability of data resources

Storage virtualization allows a network of heterogeneous devices connected together to seem homogeneous to any server connecting to the SAN.

A useful advanced feature in SANs is remote disk mirroring and backups. Remote disk mirroring is a storage extension solution that holds a copy of the business-critical data at a geographically remote data center.

NAS – Network Attached Storage

A NAS device is a dedicated server that provides file sharing. The NAS can exist anywhere in the network. The major difference between NAS and SAN is that NAS accesses the data by file name and byte offsets, whereas SAN accesses data by raw data blocks. The consolidation advantages with NAS are very similar to the ones noted previously for SAN. However, some NAS systems cannot guarantee the write ordering and write-through required by SQL Server 2005. The systems that are Windows Hardware Quality Lab (WHQL) certified meet this requirement. For more information, see the KB article Description of support for network database files in SQL Server.

Disk Drives

Capacity

Capacity refers to the storage size of the disk drive. It is important to note that the marketed disk capacity differs from the capacity reported by the operating system. Hard disk manufacturers use metric prefixes such as kilo and giga in their decimal sense (powers of 10), whereas operating systems have historically called 2^10 (1024) bytes a kilobyte because it was a close enough value. This way of labeling remained as disk drive capacities grew to gigabyte and terabyte sizes. Therefore, a user will find that the size reported by the OS is less than the one advertised by the manufacturer.
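The gap between advertised and reported capacity is simple arithmetic. The sketch below assumes the manufacturer counts a gigabyte as 10^9 bytes and the operating system counts it as 2^30 bytes; the function name and the 500 GB example drive are hypothetical.

```python
def advertised_gb_to_reported_gb(advertised_gb: float) -> float:
    """Convert decimal gigabytes (10^9 bytes) to binary gigabytes (2^30 bytes)."""
    total_bytes = advertised_gb * 10**9
    return total_bytes / 2**30


# A drive sold as 500 GB shows up as roughly 465.66 GB in the OS.
reported = advertised_gb_to_reported_gb(500)
shortfall_percent = (1 - reported / 500) * 100  # roughly 6.9 percent
```

The shortfall grows with each prefix step: about 2.4 percent at the kilobyte level, about 6.9 percent at gigabytes, and about 9 percent at terabytes.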

Rotational Speed

The rotational speed is the speed at which the disk spins. The higher the rotational speed, the more data the drive can read/write in a fixed time. At high rotational speeds, the drives produce heat as a byproduct. This can negatively impact the performance of the disk, so good ventilation in a drive is something to keep in mind. Most home computer hard drives now run at 7200 rpm. Some servers have rotational speeds at 15,000 rpm or faster.
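One common way to quantify the effect of rotational speed is average rotational latency: the time for half a revolution, which is how long the head waits on average for the requested sector to come around. The sketch below is an illustration under that standard definition; the function name is an assumption.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency: the time for half a revolution, in ms."""
    ms_per_revolution = 60_000 / rpm  # 60,000 ms in one minute
    return ms_per_revolution / 2


latency_7200 = avg_rotational_latency_ms(7200)    # about 4.17 ms
latency_15000 = avg_rotational_latency_ms(15000)  # 2.0 ms
```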

Seek Time

Seek time refers to the time it takes for the head of the drive to move to the correct position. The seek time is one of the largest factors in the performance of the drive. However, the absolute seek time depends on the distance between the head's destination and its position at the time of the read/write instruction, so drive specifications usually quote an "average" seek time. A typical seek time for a hard disk is in the single-digit milliseconds.

Cache Size

Cache size is sometimes referred to as internal buffer size. The disk cache size is the size of volatile memory integrated on the disk drive which holds any recently requested, written, or pre-fetched data. Disk cache size ranges from 2 MB to 16 MB.

Interface

Of all the factors in buying hard disks, the interface might be the key factor in hard disk performance. While the time taken to access the data is important, the bulk of the time is spent on data transfer instead of moving the heads around. The different types of interfaces include SATA, IDE, SCSI, FC, and iSCSI. For more information, see the Interfaces section earlier in this paper.

Appendix B: SQL Server 2005 Disk Usage

This appendix lists the major disk usages for SQL Server 2005. This might not be an exhaustive list for some specialized systems.

OS/Base Software

Any server box requires hard disk space to store the operating system, SQL Server, and any other tool binaries and library files. The space required for a full SQL Server installation depends on the SKU, computer type (32-bit vs. 64-bit), add-on server tools, etc.

Tempdb Database

Tempdb files are the data and log files associated with the tempdb database. By default, the temporary data and log files are tempdb.mdf and templog.ldf, respectively. There is only one tempdb database per instance of SQL Server. The tempdb database is used to hold temporary user objects such as tables, stored procedures, variables, and cursors. It also holds work tables, versions of the tables for snapshot isolation, and temporary sorted rowsets when rebuilding indexes with SORT_IN_TEMPDB. The total size of tempdb can have an effect on the performance of your system.  For more information about tempdb, see SQL Server Books Online.

Model Database

The model database is used as a template database on which all other user-created databases are modeled. By default, the model database has 3 MB and 1 MB for its data file size and log file size, respectively, for most editions of SQL Server 2005. The size of the model database might increase if the user adds tables, stored procedures, or functions, or other objects to the model database. It can also be increased manually by using the ALTER DATABASE commands. For more information about the model database, see SQL Server Books Online.

Master Database

The master database holds all the system-level information for a SQL Server system. This includes system configurations, account information, and user database information. The initial configuration for the master database is approximately 6 MB and 1.25 MB for the data and log files, respectively. These files grow as the database system becomes more complicated and contains more databases.

User Data Files

User data files store actual data for a user-created database. The data files are created by copying the model database.

User Log Files

User log files store logging information for the specific database to which the log file belongs. By default, the log files are created as copies of the model database log file.

Backup Files

Backup files are created during backup. The location of the backup files is user-determined at backup time. The size of the backup depends on the size of the database, the type of backup (full, full differential, differential, log, file, filegroup, etc.), and the amount of DML since the last backup (for differential backups). The size of a full backup (the largest type of backup) is at most the size of the database itself, because any space not allocated in the database does not get backed up.

Others (not covered)

Other files or data that consume hard disk space in a SQL Server system might be Full-text Search catalogs, OLAP cubes, and others.


Command queuing is good because it allows for servicing of out-of-order I/O requests. TCQ is an intelligent mechanism built into the host adapter that reorders requests to minimize the movements of the disk head assembly. The disk can take into account rotation and seek distances and serve the commands in a more efficient order, and then return the data to the operating system in the requested order. Requests are serviced according to their tagged modes:

Ordered: I/O commands are executed in the same order as the requested order.
Head of Queue: This tagged command gets serviced immediately after the current I/O command.
Simple: Allows the hard disk to control the ordering for optimized I/O activity.

Other advantages of SATA include its thin cable design and smaller form factor and length. These all help to allow for better airflow and heat dissipation. Also, SATA cables can extend a longer distance (one meter) without data corruption compared to the 40 cm limit of IDE.

SCSI (Small Computer System Interface): SCSI is a parallel interface used for attaching peripherals such as hard disks. In general, SCSI provides a faster data transmission rate of up to 320 MB/sec. There is another version of SCSI called Serial Attached SCSI (SAS). SAS combines the benefits of SCSI with SATA's physical advantages listed above.

FC DAS (Fiber Channel Direct-Attached Storage): Fiber channel is a physical layer protocol enabling serial duplex interfacing to allow communications between high performance storage systems, performing at up to 2 GBps. Fiber channels in a DAS topology are directly attached to a server and are not openly accessible to other servers. FC DAS is not commonly used.

FC SAN (Fiber Channel Storage Area Network): A dedicated network that allows access between storage devices and servers on that network using the fiber channel technology (described above).

iSCSI (Internet Small Computer System Interface): An IP (Internet Protocol)-based storage network protocol used for linking data storage to servers. The iSCSI protocol transmits SCSI packets as IP packets. Because it uses IP, iSCSI is routable and can leverage any existing TCP/IP network. There are two terms to take note of in the iSCSI industry:

iSCSI Target: refers to the actual physical disks.
iSCSI Initiator: refers to the client that performs the I/O request to that particular iSCSI Target.

There are hardware and software versions of both iSCSI Initiator and iSCSI Target. The Microsoft iSCSI Software Initiator Package includes both the iSCSI Initiator service and the iSCSI Initiator software driver.
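The benefit of letting the disk reorder queued commands (the Simple tagged mode described above) can be illustrated with a toy model. This is not how any real TCQ firmware is implemented: it ignores rotation, treats the disk as a line of track numbers, and uses a simple nearest-track-first rule. All names and numbers are hypothetical.

```python
def head_travel(start: int, tracks: list[int]) -> int:
    """Total distance the head moves when servicing tracks in the given order."""
    travel, position = 0, start
    for track in tracks:
        travel += abs(track - position)
        position = track
    return travel


def nearest_first(start: int, tracks: list[int]) -> list[int]:
    """Reorder requests so the closest pending track is always served next."""
    pending = list(tracks)
    order, position = [], start
    while pending:
        closest = min(pending, key=lambda track: abs(track - position))
        pending.remove(closest)
        order.append(closest)
        position = closest
    return order


requests = [90, 10, 80, 20]            # tracks in the order they were requested
fifo_travel = head_travel(50, requests)            # Ordered mode: 250 tracks
reordered = nearest_first(50, requests)            # [80, 90, 20, 10]
reordered_travel = head_travel(50, reordered)      # 120 tracks
```

In this example the reordered schedule moves the head less than half as far as the requested order, which is the effect the tagged queuing modes trade off against strict ordering.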

RAID

RAID stands for Redundant Array of Inexpensive (or Independent) Disks. It is a collection of disk drives working together to optimize fault tolerance and performance. There are various RAID levels, but only the RAID levels significant to SQL Server are described here.

RAID0 (simple striping): The simplest configuration of disks that stripe the data. RAID0 does not provide any redundancy or fault tolerance. Data striping refers to sequentially writing data in a round-robin style up to a certain stripe size, a multiple of a disk sector (usually 512 bytes). Data striping yields good performance because multiple disks are concurrently servicing the I/O requests. The positive points for RAID0 are the cost, performance, and storage efficiency. The negative impact of no redundancy can outweigh its positive points.

RAID1 (simple mirroring): This configuration creates an exact copy or mirror of all the data on two or more disks. This RAID level gives good redundancy and fault tolerance, but poor storage efficiency. To take full advantage of RAID1 redundancy, it is recommended to use independent disk controllers (referred to as duplexing or splitting). Duplexing or splitting removes single points of failure and allows multiple redundant paths.

RAID5 (striping with parity): RAID5 uses block-level striping where parity is distributed among the disks. RAID5 is the most popular RAID level used in the industry. This RAID level gives fault tolerance and storage efficiency. However, RAID5 has a larger negative performance impact on all write operations, especially sequential writes.

RAID10 (stripe of mirrors): RAID10 is essentially many sets of RAID1 or mirrored drives in a RAID0 configuration. This configuration combines the best attributes of striping and mirroring: high performance and good fault tolerance. For these reasons, we recommend using this RAID level. However, the high performance and reliability come at the cost of storage capacity.

Note that in all levels of RAID configuration, the storage efficiency
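The round-robin striping described for RAID0, and the capacity cost of each level, can be sketched as follows. The function names and example array sizes are hypothetical, the capacity fractions are the standard textbook figures rather than values from this paper, and RAID10 is assumed to use two-way mirrors.

```python
def stripe_location(logical_block: int, disks: int,
                    blocks_per_stripe_unit: int) -> tuple[int, int]:
    """Map a logical block to (disk index, block offset on that disk) in RAID0."""
    unit = logical_block // blocks_per_stripe_unit  # which stripe unit overall
    disk = unit % disks                             # round-robin across disks
    row = unit // disks                             # full stripes before this one
    offset = row * blocks_per_stripe_unit + logical_block % blocks_per_stripe_unit
    return disk, offset


def usable_fraction(level: str, disks: int) -> float:
    """Fraction of raw capacity available for data at each RAID level."""
    if level == "RAID0":
        return 1.0                    # no redundancy
    if level == "RAID1":
        return 1.0 / disks            # every disk holds the same data
    if level == "RAID5":
        return (disks - 1) / disks    # one disk's worth of parity
    if level == "RAID10":
        return 0.5                    # assumes two-way mirrors
    raise ValueError(f"unsupported RAID level: {level}")


# Four disks, 128 blocks (64 KB of 512-byte sectors) per stripe unit.
first = stripe_location(0, 4, 128)      # (0, 0)
second = stripe_location(130, 4, 128)   # (1, 2): second disk, third block
wrapped = stripe_location(512, 4, 128)  # (0, 128): back on the first disk

raid5_fraction = usable_fraction("RAID5", 4)    # 0.75
raid10_fraction = usable_fraction("RAID10", 4)  # 0.5
```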


In Windows Vista as well as Windows Server 2008, partition alignment is usually performed by default. The default for disks larger than 4 GB is 1 MB; the setting is configurable and is found in the registry at the following location:

HKLM\SYSTEM\CurrentControlSet\Services\VDS\Alignment
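Whether a given partition offset is aligned can be checked with a remainder test: the starting offset should be an exact multiple of the stripe unit size. The sketch below assumes a 64 KB stripe unit and compares the 1 MB default mentioned above with an illustrative offset of 63 sectors of 512 bytes; the names and the stripe unit size are hypothetical.

```python
def is_aligned(partition_offset_bytes: int, stripe_unit_bytes: int) -> bool:
    """True when the partition starts on a stripe unit boundary."""
    return partition_offset_bytes % stripe_unit_bytes == 0


STRIPE_UNIT = 64 * 1024        # hypothetical 64 KB stripe unit
DEFAULT_OFFSET = 1024 * 1024   # the 1 MB default described above
SECTOR_63_OFFSET = 63 * 512    # illustrative offset of 32,256 bytes

default_ok = is_aligned(DEFAULT_OFFSET, STRIPE_UNIT)      # True
sector_63_ok = is_aligned(SECTOR_63_OFFSET, STRIPE_UNIT)  # False
```

A 1 MB offset is a multiple of every power-of-two stripe unit up to 1 MB, which is why it works as a general-purpose default.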


