Databases

Posted on Sep 26, 2023

Introductory Concepts

A database (DB) is defined as a collection of related data; A database management system (DBMS) is the software that manages and controls access to the DB.

A database application is a program which interacts with the database at some point in its execution; Database system is a term used for a collection of application programs that interact with the DB along with the DBMS.

File-Based systems

Before the database system there was the file-based system which is now obsolete.

File-based system

A collection of application programs that perform services for the end-users. Each program defines and manages its own data.

The database system emerged due to problems inherent in file-based systems; Understanding these problems is critical and may prevent us from repeating them in database systems.

  1. Separation and isolation of data

    • When data is isolated in separate files, it is harder to access data that should be available.
  2. Duplication of data

    • File-based systems encouraged and needed the uncontrolled duplication of data.
    • Duplication costs time and money to enter the data more than once.
    • It takes up additional storage space to have duplicated data.
    • More importantly, duplication leads to loss of data integrity i.e. data is no longer consistent.
  3. Data dependence

    • The physical structure and storage of the data files and records are defined in the application code; Meaning that changing an already existing structure difficult.
  4. Incompatible file formats

    • The structure of files is embedded in the application programs, making them dependent on the application programming language.
  5. Fixed queries/proliferation of application programs

    • File-based systems are very dependent upon the application developer, who has to write any queries that are required; There was no facility for asking unplanned or ad hoc queries about the data.

The Database Approach

These limitations of the file-based approach can be attributed to two reasons:

  1. The definition of data is embedded into the application programs; It should be stored separately and independently.

  2. There is no control over the access and manipulation of data beyond that dictated by the application programs.

A new, more efficient approach was required. Then the Database and the DBMS emerged.

Database

A shared collection of logically related data and its description, designed to meet the information needs of an organization.

The database is no longer owned by one department but is a shared corporate resource. It is a single repository of data that can be used simultaneously by many departments and users.

A database is also defined as a self-describing collection of integrated records because it contains the description of the data, known as system catalog (or metadata); This nature of the database is what provides program-data independence.

Another very important aspect in the definition of a database is the concept of logically related. When analyzing the information needs of an organization, we need to identify entities, attributes and relationships; Attributes are the properties of entities and a relationship is an association between entities.

DBMS

A software system that enables users to define, create, maintain, and control access to the database.

A Database Management System provides the following facilities:

  1. Allow users to define the database, through a Data Definition Language (DDL); Specifying data types and structures and the restrictions on the data to be stored.

  2. Allow users to insert, update, delete and retrieve data through a Data Manipulation Language (DML); The primary language for performing DML operations is Structured Query Language (SQL) which provides specific commands like INSERT, UPDATE, DELETE and SELECT which perform the corresponding DML actions on the data stored in a database.

  3. Provide controlled access to the database; For example a security system which prevents unauthorized access to the DB, an integrity system to maintain the consistency of stored data, a concurrency control system, a recovery control system and a user-accessible catalog, which contain descriptions of the data in the database.

Application Programs

A computer program that interacts with the database by issuing an appropriate request (typically an SQL statement) to the DBMS.

Users may interact with the database through application programs that are used to create and maintain the database and to generate information.

Database Design

Database Design

The process of creating a design that will support the enterprise’s mission statement and mission objectives for the required database system.

There are three models of database design, from top to down:

  1. Conceptual Model

    • Represents the domain model of the data used in the organization, independent of all physical considerations.
    • Defines entity relationships.
  2. Logical Model

    • Utilizes the Relational Model (common in SQL databases).
    • Specifies entity associations.
    • Bridge between the high-level conceptual model and the physical implementation of the database.
  3. Physical Model

    • Describes the implementation of the database on secondary storage, including relations, file organizations and indexes to efficiently access data.

Normalization

When designing a database for an enterprise, the main goal is to create an accurate representation of the data; Relationships between the data and restrictions on the data.

Normalization is a database design technique that focuses on organizing data by examining the relationships (functional dependencies) between attributes; It involves a series of steps called normal forms to ensure the data is structured efficiently while minimizing redundancy and maintaining data integrity.

Normalization

A technique for producing a set of relations with desirable properties, given the data requirements of an enterprise.

Functional Dependencies

A functional dependency is a very important concept associated with normalization which describes the relationship between attributes.

A --> B

This representation of a functional dependency means: B is functionally dependent on A (or A functionally determines B). In this case, A is called the determinant.

Functional Dependency

Describes the relationship between attributes in a relation. If A and B are attributes of a relation, B is functionally dependent on A, if each value of A is associated with exactly one value of B. (A and B may each consist of one o more attributes.)

Additionally, if the determinants have the minimal number of attributes to maintain the functional dependency with the attribute(s) on the righthand side, it is called a full functional dependency.

A partial dependency means that there is some attribute that can be removed from A and yet the dependency still holds.

Now a transitive dependency is another condition where A, B and C are attributes of a relation such that if A –> B and B –> C, then C is transitively dependent on A via B (if A is not functionally dependent on B or C).

Functional Dependency Examples

Dependency Type Description
Functional Dependency In a relation “Employees” with attributes (EmployeeID, FirstName, LastName, Department), “Department” is functionally dependent on “EmployeeID” if each EmployeeID corresponds to exactly one Department. In this case, EmployeeID is the determinant, and Department is the dependent attribute.
Full Functional Dependency In the “Employees” relation, if both “FirstName” and “LastName” depend on “EmployeeID,” it’s a full functional dependency. For instance, if EmployeeID 101 corresponds to FirstName “John” and LastName “Smith,” and no other EmployeeID has “John” as the FirstName, then FirstName and LastName are fully functionally dependent on EmployeeID.
Partial Dependency Imagine that in the “Employees” relation, “Department” depends on “EmployeeID,” but “PhoneNumber” depends only on “Department.” In this case, “PhoneNumber” partially depends on the primary key “EmployeeID” because it indirectly depends on it through “Department.”
Transitive Dependency Suppose you have a “Courses” relation with attributes (CourseID, Professor, Department). If “Professor” depends on “Department,” and “Department” depends on “CourseID,” then “Professor” is transitively dependent on “CourseID” via “Department.”

First Normal Form (1NF)

  1. Contains only atomic (indivisible) values
  2. Each row must have a unique identifier (primary-key)
  3. The order of data storage is irrelevant

1NF ensures that each column contains indivisible values, and it requires a unique identifier (primary-key) for each row to distinguish them from one another.

Second Normal Form (2NF)

  1. Satisfies 1NF
  2. Every non-primary-key attribute is fully functionally dependent on the entire primary-key

2NF eliminates partial dependencies, ensuring that each non-primary-key attribute depends on the entire composite primary-key, if applicable.

Third Normal Form (3NF)

  1. Satisfies 2NF
  2. No non-primary-key attribute is transitively dependent on the primary-key

3NF eliminates transitive dependencies, ensuring that each non-primary-key attribute depends directly on the primary key. Furthermore, it is worth noting that we do not remove candidate keys.

Forth and Fifth Normal Forms (4NF and 5NF)

While the concepts of Fourth Normal Form (4NF) and Fifth Normal Form (5NF) exist, their practical application is quite limited, and they are rarely used in the industry. Typically, database designs are considered to be adequately normalized when they achieve the Third Normal Form (3NF).


Normalization Example

After comprehending these abstract concepts, it is essential to grasp how normalization translates into the conceptual and logical models.

In the conceptual model, we focus on defining the essential entities and their attributes, as well as the relationships between these entities; The goal is to create a high-level representation of the organization data requirements.

In contrast, in the logical model, we delve deeper into the specifics of how data will be organized and stored within the database; Creating detailed representations of tables, columns, primary-keys, foreign-keys and relationships between tables.

Normalization contributes to these models by guiding us on how to best structure entities, attributes, and relationships; Eliminating anomalies of data, such as redundancy and ensuring a more organized data structure.

Now, let us go ahead and look at some examples.

Initial Database structure

Student table:

StudentID{PK} StudentName Course Instructor
101 Alice Math Dr. Williams
102 Bob Physics Dr. Louis
103 John Math Dr. Williams

NOTE: “PK” means primary-key and “FK” is foreign-key.

  • We have a “Students” table with attributes: StudentID, StudentName, Course and Instructor
  • The table does not satisfy 1NF because the “Course” and “Instructor” columns are not atomic (indivisible) values.

1NF

In 1NF we ensure that each column contains only atomic values, we create separate tables for “Course” and “Instructor”. Now, the “Student” table contains a foreign-key reference to these tables.

Course table:

CourseID{PK} CourseName CourseCredits
1 Math 6
2 Physics 4

Instructor table:

InstructorID{PK} IntructorName Acronym
1 Dr. Williams drw
2 Dr. Louis drl

Student table (after 1NF):

StudentID{PK} StudentName CourseID{FK} InstructorID{FK}
101 Alice 1 1
102 Bob 2 2
103 John 1 1

2NF

In 2NF, we ensure that non-primary-key attributes are fully functionally dependent on the primary-key. To help demonstrate 2NF, I will introduce a new relationship: “CourseRegistration” to handle the registration of students for specific courses.

NOTE: In cases where you have a many-to-many association between tables, it is often necessary to create a new entity, known as a relationship entity in order to properly manage this association. This intermediate entity helps resolve many-to-many relationships in a relational database.

CourseRegistration table:

StudentID{FK} CourseID{FK} RegistrationDate
101 1 2023-01-13
102 2 2023-04-20
103 1 2022-9-19

3NF

In 3NF, we ensure that no non-primary-key attribute is transitively dependent on the primary key. To demonstrate this, let’s introduce a new concept: “Department” to handle the department of each instructor.

Department table:

DepartmentID{PK} DepartmentName
1 Mathematics
2 Physics

Instructor table (after 3NF):

InstructorID{PK} InstructorName Acronym DepartmentID{FK}
1 Dr. Williams drw 1
2 Dr. Louis drl 2

This arrangement satisfies 3NF as there are no transitive dependencies; each non-primary-key attribute (InstructorName, Acronym) depends directly on the primary key (InstructorID), and the department information is now separate.

With this 3NF structure, our database is well-organized, and data integrity is maintained.

Conclusion

We have explored the evolution of information management, from outdated file-based systems to databases. The introduction of DBMS has revolutionized how we create, maintain and control access to data.

Furthermore, we have also explored Database Design, covering Conceptual, Logical and Physical Models. However, the true star of the show was Normalization – a vital technique that ensures data efficiency and integrity by resolving functional dependencies and adhering to various normal forms.

References

Database Systems: A Practical Approach to Design, Implementation, and Management