A text file is the simplest database in the world. If you want to store your friend's contact numbers, all you need is a simple file - contacts.txt, where you can store a number per line, and use the space character as a delimiter.
Joe Smith 4014
Jack Brown 7864
July Dustin 8768
This is fine, until you decide to create a multiuser application which will allow your entire family to read and write phone numbers. Because this is a multi-user application, it is possible that 2 or more people may write numbers to the file at the same time. Because we do not have concurrency control in files, we might get into a scenario where if two writes happen simultaneously, one write may override the other write, or they both might even get garbled.
It is also possible that your power gets tripped while your process is writing data, and only half the data gets written. Now you have a database which could be corrupt in a variety of ways.
If you choose a Relational Database Management System, it will give you
ACID,
Indexes (also
here is a nice introduction to indexes), and the ability to use
SQL to manage (retrieve and insert) the data. These properties will be immensely useful as your software and it's requirements grow.
Databases keep their information in tables, where each table can be thought of as a spreadsheet, which has rows and columns. The columns describe one aspect of the data being described, and each row describes one object of data. We create these tables with SQL, and very specifically the DDL (data definition language) part of SQL.
Once we have defined the database structure, we insert and retrieve data from it. This is also done using SQL, specifically the DML (data manipulation language) part of it.
The following SQL statement is the simplest join:
select * from mailing_list, phone_number;
This returns the Cartesian product of the rows in both the tables. In most practical cases, we will restrict the query with a where clause.
select * from mailing_list, phone_number where mailing_list.email == phone_number.email;
The above query will only return those rows together if the email fields match.
Things to understand:
- What are the ACID properties of databases and what are their advantages?
- What is concurrency and how is it handled?
- Why is it a huge benefit to have SQL as a declarative language?
- What is the advantage of using Indexes?
- What is a table?
- What is a primary key, and why do we need one?
- What relationships can exist between tables, and what do they mean?