Pages

Monday, June 27, 2011

Complex SQL queries

In simple SQL queries we usually query one or multiple tables, and possibly with different types of joins.

The notion of a complex query is rather arbitrary, some people would consider any query with a join to be complex. However for this discussion we consider a complex query, one in which we need to group results, perform some aggregations, etc.

Imagine we have a database containing tables for books and authors. We want to find the number of authors for each book. We can write this query with a GroupBy clause and the COUNT aggregate function.

SELECT Book.title,         COUNT(*) AS Authors     FROM  Book JOIN Book_author        ON Book.isbn = Book_author.isbn     GROUP BY Book.title;
Now let's look at another query which is a bit more complex than the previous query.

select users.user_id,         users.email,         count(*) as how_many,        max(classified_ads.posted) as how_recent from users, classified_ads where users.user_id = classified_ads.user_id group by users.user_id, users.email order by how_recent desc, how_many desc;
The query above joins the users table with the classified_ads table. It selects user_id and email from users, while joining the user_id field in users and classified_ads tables. So basically the result is going to contain the user_id and email of all users who are in the classified_ads table (basically those users who have published classified ads). The group by users.user_id, users.email clause specifies that we want to bundle together all rows which have the same values in the users.user_id and users.email column. What this means is that all these rows will be represented as one row. Representing multiple rows as one row would in make cases make sense only if we run some aggregate function on these rows. In the above case, we use two aggregate functions count, and max. The count will compute the number of rows clumped by due to the group by clause. The max aggregate has been specified on the classified_ads.posted field. What this will do is compute the row which has the maximum value in the classified_ads.posted field. The order by clause specifies the order in which the rows will be listed. The above sql specifies that we want to order by how_recent desc, which will display the most recent classified ad posting at the top. For those rows which have the same how_recent value, the rows will be ordered by the how_many computed value. One thing to remember here is that most databases will record timestamps with a subsecond precision. So two dates showing the same data (upto day level precision) may not be the same in reality.
Having:
The where clause works on rows, whereas the having clause works on groups.

No comments:

Post a Comment