Sunday, August 25, 2013

What are Transactions ?


What are Transactions ?
In an RDBMS, when several people access the same data or if a server dies in the middle of an update, there has to be a mechanism to protect the integrity of the data. Such a mechanism is called a Transaction. A transaction groups a set of database actions into a single instantaneous event. This event can either succeed or fail. i.e .either get the job done or fail.
The definition of a transaction can be provided by an Acronym called 'ACID'.
(A)tomicity: If an action consists of multiple steps - it's still considered as one operation.
(C) Consistency: The database exists in a valid and accurate operating state before and after a transaction.
(I) Isolation: Processes within one transaction are independent and cannot interfere with that in others.
(D) Durability: Changes affected by a transaction are permanent.
To enable transactions a mechanism called 'Logging' needs to be introduced. Logging involves a DBMS writing details on the tables, columns and results of a particular transaction, both before and after, onto a log file. This log file is used in the process of recovery. Now to protect a certain database resource (ex. a table) from being used and written onto simulatneously several techniques are used. One of them is 'Locking' another is to put a 'time stamp' onto an action. In the case of Locking, to complete an action, the DBMS would need to acquire locks on all resources needed to complete the action. The locks are released only when the transaction is completed.
Now if there were say a large numbers of tables involved in a particular action, say 50, all 50 tables would be locked till a transaction is completed.
To improve things a bit, there is another technique used called 2 Phase Locking or 2PL. In this method of locking, locks are acquired only when needed but are released only when the transaction is completed.
This is done to make sure that that altered data can be safely restored if the transaction fails for any reason.
This technique can also result in problems such as "deadlocks".
In this case - 2 processes requiring the same resources lock each other up by preventing the other to complete an action. Options here are to abort one, or let the programmer handle it.
MySQL implements transactions by implementing the Berkeley DB libraries into its own code. So it's the source version you'd want here for MySQL installation. Read the MySQL manual on implementing this.

Beyond MySQL

What are Views ?
A view allows you to assign the result of a query to a new private table. This table is given the name used in your VIEW query.
Although MySQL does not support views yet a sample SQL VIEW construct statement would look like:
CREATE VIEW TESTVIEW AS SELECT * FROM names;

What are Triggers ?
A trigger is a pre-programmed notification that performs a set of actions that may be commonly required. Triggers can be programmed to execute certain actions before or after an event occurs. Triggers are very useful as they they increase efficiency and accuracy in performing operations on databases and also are increase productivity by reducing the time for application development. Triggers however do carry a price in terms of processing overhead.

What are Procedures ?
Like triggers, Procedures or 'Stored' Procedures are productivity enhancers. Suppose you needed to perform an action using a programming interface to the database in say PERL and ASP. If a programmed action could be stored at the database level, it's obvious that it has to be written only once and cam be called by any programming language interacting with the database.
Procedures are executed using triggers.

Beyond RDBMS

Distributed Databases (DDB)
A distributed database is a collection of several, logically interrelated database located at multiple locations of a computer network. A distributed database management system permits the management of such a database and makes the operation transparent to the user. Good examples of distributed databases would be those utilized by banks, multinational firms with several office locations where each distributed data system works only with the data that is relevant to it's operations. DDBs have have full functionality of any DBMS. It's also important to know that the distributed databases are considered to be actually one database rather than discrete files and data within distributed databases are logically interrelated.

Object Database Management Systems or ODBMS
When the capabilities of a database are integrated with object programming language capababilities, the resulting product is an ODBMS. Database objects appear as programming objects in an ODBMS. Using an ODBMS offers several advantages. The ones that can be most readily appreciated are:
1. EfficiencyWhen you use an ODBMS, you're using data the way you store it. You will use less code as you're not dependent on an intermediary like SQL or ODBC. When this happens you can create highly complex data structures through your programming language.
2. SpeedWhen data is stored the way you'd like it to be stored (i.e. natively) there is a massive performance increase as no to-and-fro translation is required.

A Quick Tutorial on Database Normalization
Let's start off by taking some data represented in a Table.
Table Name: College Table
StudentNameCourseID1CourseTitle1CourseProfessor1CourseID2CourseTitle2CourseProfessor2StudentAdvisorStudentID
Tia CarreraCS123Perl Regular ExpressionsDon CorleoneCS003Object Oriented Programming 1Daffy DuckFred Flintstone400
John WayneCS456Socket ProgrammingDJ TiestoCS004AlgorithmsHomer SimpsonBarney Rubble401
Lara CroftCS789OpenGLBill ClintonCS001Data StructuresPapa SmurfSeven of Nine402
(text size has been shrunk to aid printability on one page)
The First Normal Form: (Each Column Type is Unique and there are no repeating groups [types] of data)
This essentially means that you indentify data that can exist as a separate table and therefore reduce repetition and will reduce the width of the original table.
We can see that for every student, Course Information is repeated for each course. So if a student has three course, you'll need to add another set of columns for Course Title, Course Professor and CourseID. So Student information and Course Information can be considered to be two broad groups.
Table Name: Student Information
StudentID (Primary Key)
StudentName
AdvisorName

Table Name: Course Information
CourseID (Primary Key)
CourseTitle
CourseDescription
CourseProfessor
It's obvious that we have here a Many to Many relationship between Students and Courses.
Note: In a Many to Many relationship we need something called a relating table which basically contains information exclusively on which relatioships exist between two tables. In a One to Many relationship we use a foreign key.
So in this case we need another little table called: Students and Courses
Table Name: Students and Courses
SnCStudentID
SnCCourseID

The Second Normal Form: (All attributes within the entity should depend solely on the entity's unique identifier)
The AdvisorName under Student Information does not depend on the StudentID. Therefore it can be moved to it's own table.
Table Name: Student Information
StudentID (Primary Key)
StudentName

Table Name: Advisor Information
AdvisorID
AdvisorName

Table Name: Course Information
CourseID (Primary Key)
CourseTitle
CourseDescription
CourseProfessor

Table Name: Students and Courses
SnCStudentID
SnCCourseID
Note: Relating Tables can be created as required.

The Third Normal Form:(no column entry should be dependent on any other entry (value) other than the key for the table)
In simple terms - a table should contain information about only one thing.
In Course Information, we can pull CourseProfessor information out and store it in another table.
Table Name: Student Information
StudentID (Primary Key)
StudentName

Table Name: Advisor Information
AdvisorID
AdvisorName

Table Name: Course Information
CourseID (Primary Key)
CourseTitle
CourseDescription

Table Name: Professor Information
ProfessorID
CourseProfessor

Table Name: Students and Courses
SnCStudentID
SnCCourseID
Note: Relating Tables can be created as required.
Well that's it. One you are done with 3NF the database is considered Normalized.
Now lets consider some cases where normalization would have to avoided for practical purposes.
Suppose we needed to store a students home address along with State and Zip Code information. Would you create a separate table for every zipcode in your country along with one for cities and one for states ? It actually depends on you. I would prefer just using a non-normalized address table and stick everything in there. So exceptions crop up often and it's up to your better judgement.

0 comments:

Post a Comment