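Chapter 3 SQL

3.1 Overview of the SQL Query Language

SQL stands for Structured Query Language. The textbook these notes follow treats SQL:2008 as the most recent standard; later revisions (SQL:2011, SQL:2016, SQL:2023) have been published since.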

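The SQL language has several parts:

Data-definition language (DDL): provides commands for defining relation schemas, deleting relations, and modifying relation schemas.
Data-manipulation language (DML): provides the ability to query information from the database and to insert, delete, and modify tuples.
Integrity: the DDL includes commands for specifying integrity constraints that the data stored in the database must satisfy. Updates that violate integrity constraints are disallowed.
View definition: the DDL includes commands for defining views.
Transaction control: SQL includes commands for specifying the beginning and end of transactions.
Embedded SQL and dynamic SQL: define how SQL statements can be embedded within general-purpose programming languages such as C and C++.
Authorization: the DDL includes commands for specifying access rights to relations.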

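3.2 SQL Data Definition

The SQL DDL specifies not only a set of relations but also information about each relation, including:

The schema of each relation
The type of values associated with each attribute
The integrity constraints
The set of indices to be maintained for each relation
The security and authorization information for each relation
The physical storage structure of each relation on disk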

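3.2.1 Basic Types

char(n): a fixed-length character string. A stored value shorter than n characters is padded with trailing spaces.
varchar(n): a variable-length character string with a user-specified maximum length n.
int: an integer (a finite, machine-dependent subset of the integers).
smallint: a small integer.
numeric(p, d): a fixed-point number with user-specified precision. The number has p digits in total, d of which are to the right of the decimal point.
real, double precision: floating-point and double-precision floating-point numbers with machine-dependent precision.
float(n): a floating-point number with a precision of at least n digits.

Every type may include a special value called null.

Note: when a char value is compared with a varchar value, whether the varchar value is padded with spaces before the comparison depends on the database implementation.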

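3.2.2 Basic Schema Definition

Creating a table:

create table R ( A1 D1 not null, A2 D2, ..., An Dn, <integrity-constraint 1>, ..., <integrity-constraint k>);

Here each Ai is an attribute name and Di is the domain of attribute Ai.
- Common integrity constraints:
  - primary key (Aj1, Aj2, ..., Ajm): declares attributes Aj1, Aj2, ..., Ajm to be the primary key of the relation. Primary-key attributes must be non-null and unique.
  - foreign key (Ak1, Ak2, ..., Akn) references table_B: declares that the values of Ak1, Ak2, ..., Akn in any tuple of the relation must match the primary-key values of some tuple in table_B.
  - not null: a not null constraint on an attribute means that null values are not allowed for that attribute.

Note: SQL rejects any database update that would violate an integrity constraint.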

Inserting tuples:

insert into R (A1, A2, ..., An) values (V1, V2, ..., Vn);
insert into R values (V1, V2, ..., Vn);

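Deleting tuples and dropping a relation:

delete from R; -- delete all tuples from R; the relation itself remains
drop table R; -- remove relation R (tuples and schema) from the database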

Modifying the attributes of a relation:

alter table R add A D; -- add attribute A with domain D to R
alter table R drop A; -- drop attribute A from R
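
The statements above can be tried out with a minimal sketch like the following. It assumes Python's built-in sqlite3 module as the database (the notes do not name a system), and the department and instructor tables and their rows are illustrative. SQLite only enforces foreign keys after `pragma foreign_keys = on`, which is a SQLite-specific detail rather than standard SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("pragma foreign_keys = on")  # SQLite-specific: enable foreign-key checks

conn.execute("""
    create table department (
        dept_name varchar(20),
        budget    numeric(12, 2),
        primary key (dept_name))""")
conn.execute("""
    create table instructor (
        ID        varchar(5),
        name      varchar(20) not null,
        dept_name varchar(20),
        primary key (ID),
        foreign key (dept_name) references department)""")

conn.execute("insert into department values ('Biology', 90000)")
conn.execute("insert into instructor (ID, name, dept_name) values ('76766', 'Crick', 'Biology')")

def is_rejected(sql):
    """Return True if the statement violates an integrity constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

duplicate_key = is_rejected("insert into instructor values ('76766', 'Watson', 'Biology')")
missing_name = is_rejected("insert into instructor values ('10101', null, 'Biology')")
bad_reference = is_rejected("insert into instructor values ('10101', 'Katz', 'Physics')")

# alter table adds the attribute; existing tuples get null for it.
conn.execute("alter table instructor add salary numeric(8, 2)")
salary_after_alter = conn.execute("select salary from instructor").fetchall()
```

All three offending inserts are refused, so the instructor relation still holds only the original tuple.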

3.3 Basic Structure of SQL Queries

The basic structure of an SQL query consists of three clauses: select, from, and where.

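3.3.1 Queries on a Single Relation

-- the result may contain duplicates
select A1, A2, ..., An from R;
-- eliminate duplicates
select distinct A1, A2, ..., An from R;
-- explicitly keep duplicates
select all A1, A2, ..., An from R;
-- the select clause may contain arithmetic expressions using +, -, *, and / on constants or attributes of tuples
select A1, A2, A3 * 1.1 from R;
-- the where clause may use the logical connectives and, or, and not
-- their operands may be expressions involving the comparison operators <, <=, >, >=, =, and <>
select A1, A2, ..., An from R where A1 = V1 and A2 = V2;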
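
As a rough illustration of these single-relation queries, the sketch below runs them with Python's built-in sqlite3 module (an assumption made only so the example is runnable); the instructor rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table instructor (name varchar(20), dept_name varchar(20), salary numeric(8, 2))")
conn.executemany("insert into instructor values (?, ?, ?)",
                 [("Crick", "Biology", 72000),
                  ("Katz", "Comp. Sci.", 75000),
                  ("Brandt", "Comp. Sci.", 92000)])

# Plain select keeps duplicates; select distinct removes them.
with_duplicates = sorted(conn.execute("select dept_name from instructor"))
without_duplicates = sorted(conn.execute("select distinct dept_name from instructor"))

# Arithmetic in the select clause and a compound predicate in the where clause.
raised = conn.execute(
    "select name, salary * 1.1 from instructor "
    "where dept_name = 'Comp. Sci.' and salary > 80000").fetchall()
```

The arithmetic expression changes only the query result; the stored salary is untouched.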

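3.3.2 Queries on Multiple Relations

A query over several relations uses the same three clauses:

select clause: lists the attributes wanted in the query result
from clause: lists the relations to be accessed when evaluating the query
where clause: a predicate over the attributes of the relations in the from clause

Conceptual order of evaluation: from -> where -> select.

A real SQL implementation does not evaluate a query this way; it optimizes by generating, as far as possible, only those elements of the Cartesian product that satisfy the where predicate. If the where clause is omitted, the result is the full Cartesian product.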
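
A small sketch of the difference between the bare Cartesian product and a filtered one, assuming Python's built-in sqlite3 module and made-up instructor and teaches rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table instructor (ID varchar(5), name varchar(20))")
conn.execute("create table teaches (ID varchar(5), course_id varchar(8))")
conn.executemany("insert into instructor values (?, ?)",
                 [("10101", "Srinivasan"), ("76766", "Crick")])
conn.executemany("insert into teaches values (?, ?)",
                 [("10101", "CS-101"), ("10101", "CS-347"), ("76766", "BIO-101")])

# No where clause: every instructor tuple is paired with every teaches tuple.
product_size = conn.execute("select count(*) from instructor, teaches").fetchone()[0]

# The where predicate keeps only the pairs that agree on ID.
matched = conn.execute(
    "select name, course_id from instructor, teaches "
    "where instructor.ID = teaches.ID order by name, course_id").fetchall()
```

With 2 instructors and 3 teaches tuples the product has 6 rows, of which 3 survive the predicate.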

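3.3.3 Natural Join

Cartesian product: concatenates each tuple of the first relation with every tuple of the second relation.

Natural join: considers only those pairs of tuples that have the same value on every attribute that appears in both relations.

-- join on all attributes that appear in both R1 and R2
select A1, A2, ..., An from R1 natural join R2;

-- join R1 and R2 on attributes A1 and A2 only (a form of natural join)
-- selects the same tuples as: select A1, A2, ..., An from R1, R2 where R1.A1 = R2.A1 and R1.A2 = R2.A2;
select A1, A2, ..., An from R1 join R2 using (A1, A2);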

For example:

select name, course_id from instructor, teaches where instructor.ID = teaches.ID;
-- can be written more concisely as:
select name, course_id from instructor natural join teaches;
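
The equivalence of the two forms can be checked with a short sketch, again assuming Python's built-in sqlite3 module and illustrative rows. The only attribute the two relations share here is ID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table instructor (ID varchar(5), name varchar(20))")
conn.execute("create table teaches (ID varchar(5), course_id varchar(8))")
conn.executemany("insert into instructor values (?, ?)",
                 [("10101", "Srinivasan"), ("76766", "Crick"), ("33456", "Gold")])
conn.executemany("insert into teaches values (?, ?)",
                 [("10101", "CS-101"), ("76766", "BIO-101")])

explicit = sorted(conn.execute(
    "select name, course_id from instructor, teaches where instructor.ID = teaches.ID"))
natural = sorted(conn.execute(
    "select name, course_id from instructor natural join teaches"))
using = sorted(conn.execute(
    "select name, course_id from instructor join teaches using (ID)"))

# In the natural join the shared attribute ID appears only once in the result.
full_rows = conn.execute("select * from instructor natural join teaches").fetchall()
```

Gold teaches nothing, so that instructor appears in none of the three results.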

3.4 Additional Basic Operations

3.4.1 The Rename Operation: as

oldname as newname
-- find the names of all instructors whose salary is greater than that of at least one instructor in the Biology department
select distinct T.name from instructor as T, instructor as S where T.salary > S.salary and S.dept_name = 'Biology';
/* T and S are called correlation names, table aliases, correlation variables, or tuple variables */
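
A minimal sketch of this self-join, assuming Python's built-in sqlite3 module and made-up salaries; the two aliases let the same relation play two roles in one query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table instructor (name varchar(20), dept_name varchar(20), salary numeric(8, 2))")
conn.executemany("insert into instructor values (?, ?, ?)",
                 [("Crick", "Biology", 72000),
                  ("Katz", "Comp. Sci.", 75000),
                  ("Mozart", "Music", 40000)])

# T ranges over all instructors, S over the Biology instructors they are compared with.
higher_paid = conn.execute(
    "select distinct T.name from instructor as T, instructor as S "
    "where T.salary > S.salary and S.dept_name = 'Biology'").fetchall()
```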

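3.4.2 String Operations

Common functions on strings:

upper(s): converts string s to uppercase
lower(s): converts string s to lowercase
trim(s): removes spaces from the ends of string s (standard SQL strips both leading and trailing spaces by default)

Pattern matching on strings uses the like operator:

Percent (%): matches any substring, including the empty string
Underscore (_): matches any single character

Patterns are case-sensitive in standard SQL; some systems compare case-insensitively by default.

'Intro%' - matches any string beginning with "Intro"
'%Comp%' - matches any string containing "Comp" as a substring
'___' - matches any string of exactly three characters
'___%' - matches any string of at least three characters

-- the escape keyword defines an escape character
like 'ab\%cd%' escape '\' -- matches any string beginning with "ab%cd"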
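
The patterns above can be exercised with a sketch like this one. It assumes Python's built-in sqlite3 module, and the course titles and the `titles_like` helper are hypothetical. Note that SQLite's like is case-insensitive for ASCII letters by default, so the example avoids cases where that would matter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table course (title varchar(50))")
conn.executemany("insert into course values (?)",
                 [("Intro. to Biology",), ("Computational Biology",), ("SQL",),
                  ("Go",), ("ab%cdef",), ("abXcdef",)])

def titles_like(condition):
    """Return the sorted titles that satisfy the given like condition (hypothetical helper)."""
    sql = "select title from course where title " + condition + " order by title"
    return [row[0] for row in conn.execute(sql)]

starts_with_intro = titles_like("like 'Intro%'")
contains_comp = titles_like("like '%Comp%'")
exactly_three = titles_like("like '___'")
at_least_three = titles_like("like '___%'")

# Without an escape character the inner % is a wildcard; with it, % is a literal.
unescaped = titles_like("like 'ab%cd%'")
escaped = titles_like(r"like 'ab\%cd%' escape '\'")
```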

3.4.3 Attribute Specification in the select Clause

-- the asterisk selects all attributes of R
select R.* from R;

3.4.4 Ordering the Display of Tuples: order by

order by sorts in ascending order by default; asc requests ascending order and desc descending order.

select * from R order by A1;
select * from instructor order by salary desc, name asc;
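
A brief sketch of a two-key sort, assuming Python's built-in sqlite3 module and made-up rows: ties on salary are broken by name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table instructor (name varchar(20), salary numeric(8, 2))")
conn.executemany("insert into instructor values (?, ?)",
                 [("Katz", 75000), ("Kim", 92000), ("Brandt", 92000)])

# Highest salary first; instructors with equal salary are listed alphabetically.
ordered = conn.execute(
    "select name, salary from instructor order by salary desc, name asc").fetchall()
```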

3.4.5 where-Clause Predicates

The between and not between keywords:

select name from instructor where salary between 90000 and 100000;

Comparison operators can also be applied to tuples, which are then compared lexicographically:

select name, course_id from instructor, teaches where instructor.ID = teaches.ID and dept_name = 'Biology';
-- can be rewritten as:
select name, course_id from instructor, teaches where (instructor.ID, dept_name) = (teaches.ID, 'Biology');
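
Both predicates can be sketched as follows, assuming Python's built-in sqlite3 module with a SQLite version that supports row values (3.15 or later); the rows are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table instructor (ID varchar(5), name varchar(20), dept_name varchar(20), salary numeric(8, 2))")
conn.execute("create table teaches (ID varchar(5), course_id varchar(8))")
conn.executemany("insert into instructor values (?, ?, ?, ?)",
                 [("10101", "Srinivasan", "Comp. Sci.", 65000),
                  ("76766", "Crick", "Biology", 72000),
                  ("83821", "Brandt", "Comp. Sci.", 92000),
                  ("22222", "Einstein", "Physics", 100000)])
conn.executemany("insert into teaches values (?, ?)",
                 [("76766", "BIO-101"), ("10101", "CS-101")])

# between includes both endpoints.
in_range = conn.execute(
    "select name from instructor where salary between 90000 and 100000 order by name").fetchall()

# The conjunction of equalities and the tuple comparison select the same rows.
conjunction = conn.execute(
    "select name, course_id from instructor, teaches "
    "where instructor.ID = teaches.ID and dept_name = 'Biology'").fetchall()
tuple_form = conn.execute(
    "select name, course_id from instructor, teaches "
    "where (instructor.ID, dept_name) = (teaches.ID, 'Biology')").fetchall()

# Tuples are ordered lexicographically: the first differing component decides.
lexicographic = conn.execute("select (1, 2) < (1, 3), (2, 0) < (1, 9)").fetchone()
```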

3.5 Set Operations

Union (union, union all): unlike the select clause, union eliminates duplicates automatically; use union all to retain them.

(select course_id from section where semester = 'Fall' and year = 2019) union (select course_id from section where semester = 'Spring' and year = 2020)

Intersection (intersect, intersect all): intersect also eliminates duplicates automatically.

(select course_id from section where semester = 'Fall' and year = 2019) intersect (select course_id from section where semester = 'Spring' and year = 2020)

Difference (except, except all): except also eliminates duplicates automatically.

(select course_id from section where semester = 'Fall' and year = 2019) except (select course_id from section where semester = 'Spring' and year = 2020)
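
A sketch of the three set operations, assuming Python's built-in sqlite3 module and made-up section rows. Two SQLite-specific deviations are worth flagging: it does not accept parentheses around the operands of a compound select, and it does not implement intersect all or except all, so only union all is shown for the duplicate-retaining variants.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table section (course_id varchar(8), semester varchar(6), year int)")
conn.executemany("insert into section values (?, ?, ?)",
                 [("CS-101", "Fall", 2019), ("CS-101", "Fall", 2019),   # two sections of one course
                  ("CS-347", "Fall", 2019),
                  ("CS-101", "Spring", 2020), ("FIN-201", "Spring", 2020)])

fall = "select course_id from section where semester = 'Fall' and year = 2019"
spring = "select course_id from section where semester = 'Spring' and year = 2020"

union_rows = sorted(conn.execute(fall + " union " + spring))
union_all_rows = sorted(conn.execute(fall + " union all " + spring))
intersect_rows = sorted(conn.execute(fall + " intersect " + spring))
except_rows = sorted(conn.execute(fall + " except " + spring))
```

CS-101 appears three times under union all but only once under union, intersect keeps the course offered in both terms, and except keeps the course offered only in the fall.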

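3.6 Null Values

Rules for operations on null:

If any input of an arithmetic expression (+, -, *, /) is null, the result of the expression is null.
The result of any comparison involving a null value is unknown, a third logical value in addition to true and false. The predicates is null and is not null are the exception: they always evaluate to true or false.

Boolean operations extended to unknown: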

and:
- true and unknown ===> unknown
- false and unknown ===> false
- unknown and unknown ===> unknown
or:
- true or unknown ===> true
- false or unknown ===> unknown
- unknown or unknown ===> unknown
not:
- not unknown ===> unknown

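Note:

select distinct eliminates duplicate tuples by comparing the corresponding attribute values of two tuples. Two values are treated as identical if both are non-null and equal, or if both are null.

This differs from how predicates treat nulls: in a predicate, null = null evaluates to unknown rather than true.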
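
The null rules can be observed directly with a short sketch, assuming Python's built-in sqlite3 module. SQLite reports unknown as null (Python `None`) and true/false as 1/0; the table t is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Arithmetic and comparison with null yield null / unknown.
arithmetic, comparison = conn.execute("select 5 + null, null = null").fetchone()

# Three-valued logic: false dominates and, true dominates or.
false_and, true_or, negated = conn.execute(
    "select null and 0, null or 1, not null").fetchone()

conn.execute("create table t (a int)")
conn.executemany("insert into t values (?)", [(1,), (None,), (None,)])

# In a where clause, unknown is treated like false: null rows fail a = a.
rows_equal_to_self = conn.execute("select count(*) from t where a = a").fetchone()[0]
# is null is a proper two-valued test.
null_rows = conn.execute("select count(*) from t where a is null").fetchone()[0]
# distinct, in contrast, treats the two nulls as duplicates of each other.
distinct_count = len(conn.execute("select distinct a from t").fetchall())
```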

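3.7 Aggregate Functions

An aggregate function takes a collection of values as input and returns a single value.

Common aggregate functions:
- Average: avg
- Minimum: min
- Maximum: max
- Total: sum
- Count: count
Aggregation with grouping: group by
having clause: states a condition that applies to groups rather than to individual tuples.

Order of evaluation for a query containing group by and having clauses:

First, the from clause is evaluated to produce a relation.
If a where clause is present, its predicate is applied to the result of the from clause.
If a group by clause is present, the tuples that satisfy the where predicate are placed into groups.
If a having clause is present, it is applied to each group, and groups that do not satisfy it are removed.
Finally, the select clause is applied to the remaining groups.

Aggregation over null values: all aggregate functions except count(*) ignore null values in their input collection.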
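
Grouping, having, and the treatment of nulls can be seen together in a sketch like this, which assumes Python's built-in sqlite3 module and made-up instructor rows (one salary is deliberately null).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table instructor (name varchar(20), dept_name varchar(20), salary numeric(8, 2))")
conn.executemany("insert into instructor values (?, ?, ?)",
                 [("Crick", "Biology", 72000), ("Watson", "Biology", None),
                  ("Katz", "Comp. Sci.", 70000), ("Brandt", "Comp. Sci.", 90000)])

# count(*) counts tuples; count(salary) and avg(salary) skip the null salary.
per_dept = conn.execute(
    "select dept_name, count(*), count(salary), avg(salary) "
    "from instructor group by dept_name order by dept_name").fetchall()

# having filters whole groups after they have been formed.
well_paid_depts = conn.execute(
    "select dept_name, avg(salary) from instructor "
    "group by dept_name having avg(salary) > 75000").fetchall()
```

Biology's average is 72000 rather than 36000 because the null salary is left out of the computation, not counted as zero.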

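3.8 Nested Subqueries

SQL provides a mechanism for nesting subqueries. A subquery is a select-from-where expression nested within another query.

3.8.1 Set Membership: in, not in

in: tests whether a tuple is a member of a set

not in: tests whether a tuple is not a member of a set

select distinct course_id from section where semester = 'Fall' and year = 2009 and course_id in
(select course_id from section where semester = 'Spring' and year = 2010);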
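
A minimal sketch of in and not in with a subquery, assuming Python's built-in sqlite3 module and made-up section rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table section (course_id varchar(8), semester varchar(6), year int)")
conn.executemany("insert into section values (?, ?, ?)",
                 [("CS-101", "Fall", 2009), ("CS-347", "Fall", 2009),
                  ("CS-101", "Spring", 2010), ("FIN-201", "Spring", 2010)])

spring_2010 = "(select course_id from section where semester = 'Spring' and year = 2010)"
fall_2009 = "select distinct course_id from section where semester = 'Fall' and year = 2009"

# Offered in Fall 2009 and also in Spring 2010.
in_both = conn.execute(fall_2009 + " and course_id in " + spring_2010).fetchall()
# Offered in Fall 2009 but not in Spring 2010.
fall_only = conn.execute(fall_2009 + " and course_id not in " + spring_2010).fetchall()
```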

3.8.2 Set Comparison

some:
- < some, <= some, > some, >= some, = some, <> some

Note: = some is equivalent to in, but <> some is not equivalent to not in.

select name from instructor where salary > some (select salary from instructor where dept_name = 'Biology');

all:
- < all, <= all, > all, >= all, = all, <> all

Note: <> all is equivalent to not in, but = all is not equivalent to in.

select name from instructor where salary > all (select salary from instructor where dept_name = 'Biology');
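
SQLite, used here through Python's built-in sqlite3 module only because it needs no setup, does not implement some and all. The sketch below therefore substitutes a different technique: comparing against min and max, which gives the same answer as > some and > all when the subquery is non-empty and contains no nulls. The rows are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table instructor (name varchar(20), dept_name varchar(20), salary numeric(8, 2))")
conn.executemany("insert into instructor values (?, ?, ?)",
                 [("Crick", "Biology", 72000), ("Watson", "Biology", 80000),
                  ("Katz", "Comp. Sci.", 75000), ("Brandt", "Comp. Sci.", 92000),
                  ("Mozart", "Music", 40000)])

# Stand-in for "> some": greater than at least one Biology salary, i.e. greater than the minimum.
greater_than_some = conn.execute(
    "select name from instructor where salary > "
    "(select min(salary) from instructor where dept_name = 'Biology') order by name").fetchall()

# Stand-in for "> all": greater than every Biology salary, i.e. greater than the maximum.
greater_than_all = conn.execute(
    "select name from instructor where salary > "
    "(select max(salary) from instructor where dept_name = 'Biology') order by name").fetchall()
```

On a system that supports the standard syntax, the two queries from the notes can be run as written instead.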

3.8.3 Test for Empty Relations: exists, not exists

exists: true if the result of the subquery contains at least one tuple

not exists: true if the result of the subquery is empty
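
A sketch of exists and not exists with a correlated subquery, assuming Python's built-in sqlite3 module and made-up section rows. The inner query refers to the outer tuple variable S, so it is re-evaluated for each candidate tuple.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table section (course_id varchar(8), semester varchar(6), year int)")
conn.executemany("insert into section values (?, ?, ?)",
                 [("CS-101", "Fall", 2009), ("CS-347", "Fall", 2009),
                  ("CS-101", "Spring", 2010)])

template = (
    "select distinct S.course_id from section as S "
    "where S.semester = 'Fall' and S.year = 2009 and {test} "
    "(select * from section as T "
    " where T.semester = 'Spring' and T.year = 2010 and T.course_id = S.course_id)")

# Fall 2009 courses for which a matching Spring 2010 section exists.
also_in_spring = conn.execute(template.format(test="exists")).fetchall()
# Fall 2009 courses for which no such section exists.
not_in_spring = conn.execute(template.format(test="not exists")).fetchall()
```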

3.8.4 Test for the Absence of Duplicate Tuples: unique

unique: true if the result of the subquery contains no duplicate tuples

not unique: true if the result of the subquery contains duplicate tuples

-- find all courses that were offered at most once in 2009
select T.course_id from course as T where unique (select R.course_id from section as R where T.course_id = R.course_id and R.year = 2009);
-- find all courses that were offered at least twice in 2009
select T.course_id from course as T where not unique (select R.course_id from section as R where T.course_id = R.course_id and R.year = 2009);
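
The unique predicate is not available in SQLite, so the following sketch (Python's built-in sqlite3 module, made-up rows) swaps in a count-based formulation: a course was offered at most once when the count of its 2009 sections is at most 1, and at least twice when the count is at least 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table course (course_id varchar(8))")
conn.execute("create table section (course_id varchar(8), sec_id varchar(8), year int)")
conn.executemany("insert into course values (?)",
                 [("BIO-101",), ("CS-101",), ("CS-347",)])
conn.executemany("insert into section values (?, ?, ?)",
                 [("CS-101", "1", 2009), ("CS-101", "2", 2009), ("CS-347", "1", 2009)])

count_2009 = ("(select count(R.course_id) from section as R "
              " where T.course_id = R.course_id and R.year = 2009)")

# Count-based stand-in for "where unique (...)".
at_most_once = conn.execute(
    "select T.course_id from course as T where 1 >= " + count_2009 +
    " order by T.course_id").fetchall()
# Count-based stand-in for "where not unique (...)".
at_least_twice = conn.execute(
    "select T.course_id from course as T where 2 <= " + count_2009).fetchall()
```

BIO-101 has no 2009 section at all and still satisfies the at-most-once test, just as unique is true on an empty subquery result.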

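3.8.5 Subqueries in the from Clause

Note: some databases allow the result relation of a subquery in the from clause, and its attributes, to be renamed; Oracle does not.

select dept_name, avg_salary from (select dept_name, avg(salary) from instructor group by dept_name) as dept_avg(dept_name, avg_salary) where avg_salary > 42000;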
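
A sketch of a subquery in the from clause, assuming Python's built-in sqlite3 module and made-up rows. SQLite accepts the table alias but not the attribute list after it, so the average is named with a column alias inside the subquery instead; that is a workaround for this system, not the form shown in the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table instructor (name varchar(20), dept_name varchar(20), salary numeric(8, 2))")
conn.executemany("insert into instructor values (?, ?, ?)",
                 [("Mozart", "Music", 40000),
                  ("Katz", "Comp. Sci.", 75000), ("Brandt", "Comp. Sci.", 92000)])

# The subquery computes one row per department; the outer query filters those rows.
high_avg = conn.execute(
    "select dept_name, avg_salary "
    "from (select dept_name, avg(salary) as avg_salary "
    "      from instructor group by dept_name) as dept_avg "
    "where avg_salary > 42000").fetchall()
```

The outer where clause plays the role a having clause would play in a single-level query.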

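A subquery nested in the from clause cannot use correlation variables from other relations in the same from clause. SQL:2003, however, allows such a subquery to be prefixed with the keyword lateral, which gives it access to the attributes of tables and subqueries that precede it in the from clause.

-- print each instructor's name and salary together with the average salary of their department
select name, salary, avg_salary from instructor I1, lateral (select avg(salary) as avg_salary from instructor I2 where I2.dept_name = I1.dept_name);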

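3.8.6 The with Clause

The with clause defines a temporary relation whose definition is available only to the query in which the with clause occurs.

-- find the departments with the maximum budget
with max_budget(value) as (select max(budget) from department) select dept_name from department, max_budget where department.budget = max_budget.value;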
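
A minimal sketch of the with clause, assuming Python's built-in sqlite3 module and made-up budgets. The query selects dept_name so that the result names the departments rather than repeating the budget.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table department (dept_name varchar(20), budget numeric(12, 2))")
conn.executemany("insert into department values (?, ?)",
                 [("Biology", 90000), ("Finance", 120000), ("Physics", 70000)])

# max_budget exists only for the duration of this one query.
richest = conn.execute(
    "with max_budget(value) as (select max(budget) from department) "
    "select dept_name from department, max_budget "
    "where department.budget = max_budget.value").fetchall()
```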

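3.8.7 Scalar Subqueries

A subquery that returns a single tuple with a single attribute is called a scalar subquery. A scalar subquery can appear wherever an expression returning a single value is allowed, for example in the select, where, and having clauses.

-- list all departments together with the number of instructors in each
select dept_name, (select count(*) from instructor where department.dept_name = instructor.dept_name) as num_instructors from department;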
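
The query above can be run with a sketch like the following, assuming Python's built-in sqlite3 module and made-up rows. A department with no instructors still appears, with a count of 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table department (dept_name varchar(20))")
conn.execute("create table instructor (name varchar(20), dept_name varchar(20))")
conn.executemany("insert into department values (?)",
                 [("Biology",), ("Comp. Sci.",), ("Music",)])
conn.executemany("insert into instructor values (?, ?)",
                 [("Crick", "Biology"), ("Katz", "Comp. Sci."), ("Brandt", "Comp. Sci.")])

# The scalar subquery is evaluated once per department tuple.
head_counts = conn.execute(
    "select dept_name, "
    "       (select count(*) from instructor "
    "        where department.dept_name = instructor.dept_name) as num_instructors "
    "from department order by dept_name").fetchall()
```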

3.9 Modification of the Database

Deletion

delete from r where P;

Insertion

insert into r values (v1, v2, ..., vn);
-- the select statement is evaluated completely before any tuple is inserted
insert into r select A1, A2, ..., An from r where P;

Update

update r set A1 = v1 where P;
-- the case construct
case when pred1 then res1 when pred2 then res2 ... when predn then resn else res0 end
/* set the tot_cred attribute of each student to the total credits of the courses the student has successfully completed; a course counts as successfully completed when the grade is neither 'F' nor null */
update student S set tot_cred = ( select sum(credits) from takes natural join course where S.ID = takes.ID and takes.grade <> 'F' and takes.grade is not null );
/* if a student has not successfully completed any course, tot_cred is set to null; setting it to 0 instead would take a second update statement */
-- alternatively, the statement above can be rewritten with a case expression:
update student S set tot_cred = ( select case when sum(credits) is not null then sum(credits) else 0 end from takes natural join course
where S.ID = takes.ID and takes.grade <> 'F' and takes.grade is not null );
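
The difference between the two update statements shows up in a small sketch, assuming Python's built-in sqlite3 module and made-up student, course, and takes rows. The table alias S is dropped in favor of the table name, purely to keep the example portable across SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table student (ID varchar(5) primary key, tot_cred int);
    create table course  (course_id varchar(8) primary key, credits int);
    create table takes   (ID varchar(5), course_id varchar(8), grade varchar(2));
    insert into student values ('00128', null), ('12345', null);
    insert into course  values ('CS-101', 4), ('BIO-101', 3);
    insert into takes   values ('00128', 'CS-101', 'A'), ('00128', 'BIO-101', 'F'),
                               ('12345', 'CS-101', null);
""")

passed = ("from takes natural join course "
          "where student.ID = takes.ID and takes.grade <> 'F' and takes.grade is not null")

# Plain sum: a student with no completed course gets null.
conn.execute("update student set tot_cred = (select sum(credits) " + passed + ")")
plain = conn.execute("select ID, tot_cred from student order by ID").fetchall()

# case expression: the same student gets 0 instead.
conn.execute(
    "update student set tot_cred = (select case when sum(credits) is not null "
    "then sum(credits) else 0 end " + passed + ")")
with_case = conn.execute("select ID, tot_cred from student order by ID").fetchall()
```

Student 00128 gets 4 credits in both versions, since the failed BIO-101 attempt is excluded by the grade test.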

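Summary:

The SQL language includes several parts:
- Data-definition language (DDL): provides commands for defining relation schemas, deleting relations, and modifying relation schemas
- Data-manipulation language (DML): includes a query language and commands to insert tuples into, delete tuples from, and modify tuples in the database
The SQL data-definition language is used to create relations with specified schemas. Besides declaring the names and types of the attributes of a relation, SQL also allows integrity constraints such as primary-key and foreign-key constraints to be declared.
SQL provides a variety of language constructs for querying data, including the select, from, and where clauses, and it supports the natural join operation.
SQL provides mechanisms to rename attributes and relations and to order query results by specific attributes.
SQL supports the basic set operations on relations: union, intersection, and difference.
SQL handles queries on relations containing null values by using the truth values true, false, and unknown.
SQL supports aggregate functions and can divide a relation into groups, applying aggregation separately to each group. SQL also supports set operations on groups.
SQL supports nested subqueries in the where and from clauses of an outer query. It also supports scalar subqueries wherever an expression returning a single value is permitted.
SQL provides constructs for updating, inserting, and deleting information.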

Study notes on Database System Concepts: Chapter 3, SQL