These notes cover only the basics of SQL programming. They do not cover the whole language. They do not cover database design.
There is an international standard for SQL, the most recent being SQL:2003. A brief history (taken from Wikipedia):
Although there's an official standard, it's likely no one implements it 100%; all vendors have tons of extensions and tend to omit quite a bit of the standard they don't like. This situation might change as time goes on.
SQL is a programming language for relational databases. The most important, if not the only, data structure, is the table. Here is how you make tables in SQL:
create table Employee (
id integer not null primary key,
name varchar(40) not null,
departmentName varchar(20),
floor integer,
managerId integer,
salary decimal);
create table Department (
name varchar(20) not null primary key,
managerId integer,
foreign key (managerId) references Employee(id));
alter table Employee
add foreign key (departmentName) references Department(name);
create table Project (
id integer not null primary key,
name varchar(30),
budget decimal);
create table Assignment (
employeeId integer not null,
projectId integer not null,
role varchar(35),
primary key (employeeId, projectId),
foreign key (employeeId) references Employee(id),
foreign key (projectId) references Project(id));
create table Report (
id integer not null primary key,
title varchar(128),
departmentName varchar(20),
foreign key (departmentName) references Department(name));
you can visualize the tables as two-dimensional data structures, like
| Project | ||
|---|---|---|
| id | name | budget |
| | ||
Here PROJECT is the name of the table; id, name, and budget are called the attributes of the table. Each row in the table represents an instance of some particular project.
You can put data tables with an "insert" command:
insert into Project values (10, 'Longhorn', 250000.00);
Sometimes a particular attribute may be unknown or unnecessary for a particular row:
insert into Department
values ('Planning', null);
insert into Employee
values (36, 'Milner', 'Planning', 2, 16, 43911.26);
insert into Project
values (42299, 'Surface Analysis', null);
insert into Employee (name, floor, id, salary)
values ('Coquand', 10, 33, 92188.00);
Null values often make things pretty ugly. A thorough treatement of nulls is beyond the scope of this introduction. But we will study them later.
SQL is a declarative language, as opposed to an imperative language like C++, Java, Perl, Fortran and Ada. That is, the programmer states what she wants, not how to get it. Here are some sample queries:
select id, name from Employee where departmentName = 'Planning' and floor = 1; select name from Project where budget between 100000.00 and 5311882.88; select * from Assignment where employeeId = 21;
Sometimes you need to nest queries because information is contained in more than one table. Here is how you find the names of all employees that work on project #121:
select name
from Employee
where id in (
select employeeId
from Assignment
where projectId = 121);
Often the same query can be phrased in more than one way. A different way to write the previous query is:
select name from Employee, Assignment where Employee.id = Assignment.employeeId and projectId = 121;
or, perhaps better,
select name from Employee join Assignment on Employee.id = Assignment.employeeId and projectId = 121;
Sometimes it is nice to be able to abbreviate the table names, as in
select p.budget from Project p, Assignment a where p.id = a.projectId and (a.Role = 'Electrician' or A.employeeId != 24);
This is actually necessary when phrasing a query that refers to distinct occurence of a single table. Which employees make more than their own manager?
select id, name, salary from Employee e, Employee m where e.managerId = m.id and e.salary > m.salary;
You can have your own result sorted on any column.
select * from Project where budget < 3200000.22 order by name;
Sometimes when an English query has words like "every" and "all" in it, the SQL command messy. To find out which employees work on every project you write:
select id, name
from Employee e
where not exists (
select *
from Project p
where not exists (
select *
from Assignment a
where e.id = a.employeeId and a.projectId = P.id));
There are five built-in functions in SQL that operate on columns. They are min, max, avg, sum and count. Here is how to find the minimum, maximum and average budget of all projects:
select min(budget), avg(budget), max(budget) from Project;
How many managers are there?
select count(distinct managerId) from Employee;
What is the highest floor that an employee of the 'Mars' project works on?
select max(Floor) from Employee e, Project p, Assignment a where e.id = a.employeeId and p.id = a.projectId and p.name = 'Mars';
One nice feature of SQL is that it is easy to partition the rows of a table into "groups" and apply these functions to a column within a group. Let's get the average salary of all workers for each department:
select departmentName, avg(Salary) from Employee e group by departmentName;
Suppose you wanted the number of reports produced by each department:
select departmentName, count(*) from Report group by departmentName;
Remember that the "where" clause filters out rows. Suppose we wanted to get the average salary of employees for each department except the Sales department. We write
select departmentName, avg(salary) from Employee e where departmentName != 'Sales' group by departmentName;
The "having" clause filters groups. Lets do the previous query again but only print those departments with an average salary above 50000.
select departmentName, avg(salary) from Employee e where departmentName != 'Sales' group by departmentName having avg(Salary) > 50000.00;
Don't forget: the where clause filters rows and the having clause filters groups.
Here is something you should memorize. The order in which the clauses of a query are executed is
Only the FROM and SELECT clauses are required. You can only have a HAVING if you have a GROUP BY (that one should be obvious!). The ORDER BY can not be used in a subquery.
Here is a query that uses all six cluases. Find for each department the number of employees that work in that department and the sum of their salaries. Do not consider the planning department nor any department managed by Wirth nor any department whose total salary expense is below 100000. Print the results in reverse alphabetical order by department name.
select departmentName, count(*), sum(salary)
from Employee
where departmentName != 'Planning' and departmentName not in (
select departmentName
from Department d, Employee e
where d.name = E.departmentName and e.name = 'Wirth')
group by departmentName
having sum(Salary) > 100000.00
order by departmentName desc;
If we wanted to sort the previous result by decreasing salary totals then the last clause would be
order by 3 desc
where the 3 refers to the third column mentioned in the select clause.
Here are the basics of updating and deleting:
delete from Employee where Floor = 3; update Project set budget = budget * 1.05 where name = 'Engineering';
That last command effectively gave the Engineering department a 5% budget increase.
The SQL commands not covered in these notes include the following:
drop table create view
create index drop view
drop index grant
create cluster revoke
alter cluster lock table
drop cluster