
- Physical and Logical database Design A logical database design is a road map of all of the data structures in a system (or organization), with the relationships among objects clearly defined. A physical database design is the implementation of a database on a specific database engine.
- Database Design Issues There are three primary design issues to consider when developing a physical database design. 1. Data Integrity 2. Ease of use 3. Performance
- Logical Design Logical database design is the process of defining end users data needs and grouping elements into logical units (for example, tables). This design should be independent of the final physical implementation. The actual physical layout of the tables, access paths, and indexes is provided at physical design time. One stage of logical design is the reduction (or, if possible, elimination) of redundant data. This usually results in many tables, each with small, compact rows. Sometimes, reducing redundancy in this way ends up increasing the number of joins required to pull related data elements back together for output.
- Normalization Conditions To understand how to change the database in the physical design to suit your performance needs, you first need to understand relational database normalization, which is the end result of logical database design.
- Normal Forms The relational database model includes a definition of six levels of normalization, or six normal forms: 1. First Normal Form 2. Second Normal Form 3. Third Normal Form 4. Boyce-Codd Normal Form 5. Fourth Normal Form 6. Fifth Normal Form
- Benefits of Normalization A normalized database reduces redundancy and therefore storage requirements in the database. Data integrity also is easier to maintain in an environment where you have to look in only one place for the data, where each entity is represented only once. In addition, when the database is normalized, the rows tend to get narrower. This fact enables more rows per data page within SQL Server, which can speed up table scanning and queries that return more than one row, improving query performance for single tables.
- Drawbacks of Normalization The drawbacks to a normalized database design are mostly related to performance: 1. Typically, in a normalized database, more joins are required to pull together information from multiple tables. 2. Joins require additional I/O to process, and are therefore more expensive from a performance standpoint than single-table lookups. 3. Additionally, a normalized database often incurs additional CPU processing. CPU resources are required to perform join logic and to maintain data and referential integrity.
- Normalization and the Database Design Picture Normalization provides a good place to start a physical database design because data is logically grouped and consistent. Third normal form should always be applied to all database design, at least initially.
- Denormalizatizing the Database Denormalizing is the process of taking a normalized logical design and intentionally disobeying the rules to increase performance. To denormalize effectively, you must understand both the bias of the data to be into the database and how the data will be accessed.
- Advantages of Denormalization Denormalization can help minimize joins and foreign keys and help resolve aggregates. By storing values that would otherwise need to be retrieved (repeatedly), you may be able to reduce the number of indexes and even tables required to process queries.
- Guidelines for Denormalization Following are some basic guidelines to help you determine whether it's time to denormalize your database design: 1. Balance the frequency of use of the data items in question, the cost of additional storage to duplicate the data, and the acquisition time of the join. 2. Understand how much data is involved in the typical query; the amount of data affects the amount of redundancy and additional storage requirements. 3. Remember that redundant data is a performance benefit at query time, but is a performance liability at update time because each copy of the data needs to be kept up to date. You typically write triggers to maintain the integrity of the duplicated data.
- Basic Denormalization Techniques Be aware that denormalization is a technique for tuning a database for a specific application, and as such tends to be a last resort when tuning performance. Adding indexes to a table is a tuning method that's transparent to your end users and applications (as long as all are writing queries properly); modifying the database schema isn't transparent. If you change the database design, application code that accesses that database needs to be modified as well. A variety of denormalization methods can be used to modify the physical database design in an effort to improve performance: 1. Adding redundant data by duplicating columns or defining summary data 2. Changing your column definitions by combining columns or shortening existing columns 3. Redefining your tables by combining tables, duplicating entire tables or portions of a table, and partitioning tables into multiple tables
- Redundant Data Redundant data helps performance by reducing joins or computations, which in turn reduces I/O and CPU processing, respectively. It tends to be either an exact copy of the data or summary data. Duplicate data should be of exactly the same name, type, and domain, and have the same integrity checking as the original data. Triggers should be used to maintain integrity between the original and the duplicate data. Note that the more volatile the data, the more often you incur overhead to maintain the duplicate data. You need to balance frequency of use and the cost of the acquisition join against the frequency of modification and the cost of the extra update.
- Derived or Summary Columns Derived columns contain data duplicated for the purpose of avoiding repeated calculations. This is called summary data. In any environment, application designers have recognized that certain summary values need to be maintained and available constantly to speed their retrieval. If you store frequently accessed sums, averages, and running tallies, reports and queries can run substantially faster, but data modification is slower somewhat.
- Redefining Tables Creating duplicate tables tends not to be the only solution, and frequently isn't the best solution. It is, however, a viable method of reducing I/O and search speed for specific types of queries. There are two basic methods of duplicating tables: subsets and partitioning. Subsets are duplicates of the original data-by row, column, or both.
- Data Partitioning Data partitioning takes on two basic variations: 1. Vertically, by separating infrequently used columns 2. Horizontally, by separating infrequently used rows
- Database Devices and Performance A database can span many devices. Although some device allocations can be reserved for use by the transaction log, any other devices can be reserved for use by tables and indexes. Create database students On device-1 = 100, On device_2 = 100, On device_3 = 100 * The create database statement here allocates 100 MB on each device for use by the database students. What is at stake here is the ability to improve server performance by spreading out data so that it can be read more quickly, in parallel, from different disk drives using different controllers.
- RAID Technology RAID, or Redundant Array of Inexpensive Disks, is a relatively new technology, having debuted near the end of the 1980s. The basic concept is simple: Instead of using a single, fixed disk to store data, use multiple disks simultaneously and in parallel. When you do this, I/O operations are much faster because the read and write operations happen in parallel across multiple read-write heads on multiple fixed disks, rather than across a single read-write head on a single disk. Simple parallelization of I/O in this way has demonstrated immediate-and drastic-performance improvements. SQL Server has traditionally used segments to achieve this, by breaking up a table across multiple disks.
Last update March 7, 1998.