Introduction to SQL

Overview

These notes cover only the basics of SQL programming. They do not cover the whole language. They do not cover database design.

There is an international standard for SQL, the most recent being SQL:2003. A brief history (taken from Wikipedia):

Although there's an official standard, it's likely no one implements it 100%; all vendors have tons of extensions and tend to omit quite a bit of the standard they don't like. This situation might change as time goes on.

Tables

SQL is a programming language for relational databases. The most important, if not the only, data structure, is the table. Here is how you make tables in SQL:

create table Employee (
    id integer not null primary key,
    name varchar(40) not null,
    departmentName varchar(20),
    floor integer,
    managerId integer,
    salary decimal);

create table Department (
    name varchar(20) not null primary key,
    managerId integer,
    foreign key (managerId) references Employee(id));

alter table Employee
    add foreign key (departmentName) references Department(name);

create table Project (
    id integer not null primary key,
    name varchar(30),
    budget decimal);

create table Assignment (
    employeeId integer not null,
    projectId integer not null,
    role varchar(35),
    primary key (employeeId, projectId),
    foreign key (employeeId) references Employee(id),
    foreign key (projectId) references Project(id));

create table Report (
    id integer not null primary key,
    title varchar(128),
    departmentName varchar(20),
    foreign key (departmentName) references Department(name));

you can visualize the tables as two-dimensional data structures, like

Project
idnamebudget
 


  

Here PROJECT is the name of the table; id, name, and budget are called the attributes of the table. Each row in the table represents an instance of some particular project.

Inserting into Tables

You can put data tables with an "insert" command:

insert into Project
values (10, 'Longhorn', 250000.00);

Sometimes a particular attribute may be unknown or unnecessary for a particular row:

insert into Department
values ('Planning', null);

insert into Employee
values (36, 'Milner', 'Planning', 2, 16, 43911.26);

insert into Project
values (42299, 'Surface Analysis', null);

insert into Employee (name, floor, id, salary)
values ('Coquand', 10, 33, 92188.00);

Null values often make things pretty ugly. A thorough treatement of nulls is beyond the scope of this introduction. But we will study them later.

Queries

SQL is a declarative language, as opposed to an imperative language like C++, Java, Perl, Fortran and Ada. That is, the programmer states what she wants, not how to get it. Here are some sample queries:

select id, name
from Employee
where departmentName = 'Planning' and floor = 1;

select name
from Project
where budget between 100000.00 and 5311882.88;

select *
from Assignment
where employeeId = 21;

Sometimes you need to nest queries because information is contained in more than one table. Here is how you find the names of all employees that work on project #121:

select name
from Employee
where id in (
    select employeeId
    from Assignment
    where projectId = 121);

Often the same query can be phrased in more than one way. A different way to write the previous query is:

select name
from Employee, Assignment
where Employee.id = Assignment.employeeId and projectId = 121;

or, perhaps better,

select name
from Employee
join Assignment
on Employee.id = Assignment.employeeId
and projectId = 121;

Sometimes it is nice to be able to abbreviate the table names, as in

select p.budget
from Project p, Assignment a
where p.id = a.projectId and (a.Role = 'Electrician' or A.employeeId != 24);

This is actually necessary when phrasing a query that refers to distinct occurence of a single table. Which employees make more than their own manager?

select id, name, salary
from Employee e, Employee m
where e.managerId = m.id and e.salary > m.salary;

You can have your own result sorted on any column.

select *
from Project
where budget < 3200000.22
order by name;

Sometimes when an English query has words like "every" and "all" in it, the SQL command messy. To find out which employees work on every project you write:

select id, name
from Employee e
where not exists (
    select *
    from Project p
    where not exists (
        select *
        from Assignment a
        where e.id = a.employeeId and a.projectId = P.id));

There are five built-in functions in SQL that operate on columns. They are min, max, avg, sum and count. Here is how to find the minimum, maximum and average budget of all projects:

select min(budget), avg(budget), max(budget)
from Project;

How many managers are there?

select count(distinct managerId)
from Employee;

What is the highest floor that an employee of the 'Mars' project works on?

select max(Floor)
from Employee e, Project p, Assignment a
where e.id = a.employeeId and p.id = a.projectId and p.name = 'Mars';

One nice feature of SQL is that it is easy to partition the rows of a table into "groups" and apply these functions to a column within a group. Let's get the average salary of all workers for each department:

select departmentName, avg(Salary)
from Employee e
group by departmentName;

Suppose you wanted the number of reports produced by each department:

select departmentName, count(*)
from Report
group by departmentName;

Remember that the "where" clause filters out rows. Suppose we wanted to get the average salary of employees for each department except the Sales department. We write

select departmentName, avg(salary)
from Employee e
where departmentName != 'Sales'
group by departmentName;

The "having" clause filters groups. Lets do the previous query again but only print those departments with an average salary above 50000.

select departmentName, avg(salary)
from Employee e
where departmentName != 'Sales'
group by departmentName
having avg(Salary) > 50000.00;

Don't forget: the where clause filters rows and the having clause filters groups.

Here is something you should memorize. The order in which the clauses of a query are executed is

  1. FROM — chooses the desired tables
  2. WHERE — filters the rows you want
  3. GROUP BY — partitions the rows
  4. HAVING — filters the groups you want
  5. SELECT — filters the columns you want
  6. ORDER BY — sorts

Only the FROM and SELECT clauses are required. You can only have a HAVING if you have a GROUP BY (that one should be obvious!). The ORDER BY can not be used in a subquery.

Here is a query that uses all six cluases. Find for each department the number of employees that work in that department and the sum of their salaries. Do not consider the planning department nor any department managed by Wirth nor any department whose total salary expense is below 100000. Print the results in reverse alphabetical order by department name.

select departmentName, count(*), sum(salary)
from Employee
where departmentName != 'Planning' and departmentName not in (
    select departmentName
    from Department d, Employee e
    where d.name = E.departmentName and e.name = 'Wirth')
group by departmentName
having sum(Salary) > 100000.00
order by departmentName desc;

If we wanted to sort the previous result by decreasing salary totals then the last clause would be

order by 3 desc

where the 3 refers to the third column mentioned in the select clause.

Updating and Deleting

Here are the basics of updating and deleting:

delete
from Employee
where Floor = 3;

update Project
set budget = budget * 1.05
where name = 'Engineering';

That last command effectively gave the Engineering department a 5% budget increase.

More...

The SQL commands not covered in these notes include the following:

    drop table 			create view
    create index		drop view
    drop index			grant
    create cluster		revoke
    alter cluster               lock table
    drop cluster