Friday, August 28, 2009

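MySQL character sets

The MySQL server (version 4.1 and later) applies a character set at six key places: client, connection, database, results, server, and system. Two related concepts are involved: the character set itself and the collation. The character set determines how data is handled in transit and in storage, while the collation determines how strings are compared and therefore how ORDER BY and GROUP BY behave.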

Storage-related settings
Server character set (@@character_set_server)
Database character set (@@character_set_database)
Table character set
Column character set

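character_set_server: the server's default character set, chosen when the server is installed.
character_set_database: the character set of a particular database on the server; if none is given when the database is created, the server's default character set is used.
character_set_system: the character set the database system itself uses internally (for identifiers and other metadata).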

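When a table is created, every column that is not binary gets a character set. If the column's character set is not specified explicitly, SHOW CREATE TABLE does not display one for that column.
At table creation time, the column character set is chosen as follows:
[text]
if the column specifies a character set: use it
else if the table specifies a character set: use it
else if @@character_set_database is set: use it
else: use @@character_set_server (latin1 if it was never configured)
[/text]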
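
The fallback chain above can be sketched in a few lines of Python. This is only a model of the rule as described here, not MySQL's actual code; the function name resolve_column_charset and its parameters are made up for illustration.

```python
from typing import Optional


def resolve_column_charset(
    column: Optional[str] = None,
    table: Optional[str] = None,
    database: Optional[str] = None,
    server: str = "latin1",  # assumed fallback when nothing was configured
) -> str:
    """Return the first explicitly set character set, most specific level first."""
    for candidate in (column, table, database):
        if candidate:
            return candidate
    return server


print(resolve_column_charset(column="gbk", table="utf8"))
print(resolve_column_charset(table="utf8", database="latin1"))
print(resolve_column_charset())
```

The most specific setting wins, so a column-level character set overrides everything else and the server default is used only when no other level specifies one.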

Transmission-related settings
@@character_set_connection
@@character_set_results
@@character_set_client

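character_set_connection: the character set used for the connection to the database; if PHP does not specify a character set when connecting, the server-side default is used.
character_set_results: the character set in which the server returns results to the client; if it is not specified, the server default is used.
character_set_client: the character set the client uses, comparable to the charset declared for a web page.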

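Collations
Each collation setting is paired with the corresponding character set variable above: character_set_connection, character_set_database, and character_set_server.

collation_connection: the collation of the connection character set.
collation_database: the collation used by the default database. The server sets this variable whenever the default database changes; if there is no default database, it has the same value as collation_server.
collation_server: the server's default collation.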
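
To see why the collation matters for ORDER BY and GROUP BY, here is a small Python sketch. It does not use MySQL at all: comparing encoded bytes only resembles a binary collation and comparing case-folded text only resembles a case-insensitive (_ci) collation, so treat both as analogies. The sample words are made up.

```python
words = ["banana", "Apple", "cherry", "apple", "Banana"]

# Binary-style ordering: compare the encoded bytes, so every uppercase
# ASCII letter sorts before every lowercase one.
binary_order = sorted(words, key=lambda w: w.encode("utf-8"))

# Case-insensitive ordering: compare case-folded text instead.
ci_order = sorted(words, key=str.casefold)

# Under a case-insensitive rule "Apple" and "apple" compare equal,
# so a GROUP BY on this column would put them in the same group.
groups = {}
for w in words:
    groups.setdefault(w.casefold(), []).append(w)

print(binary_order)
print(ci_order)
print(len(groups), "groups")
```

The same five strings come out in a different order and form a different number of groups depending on the comparison rule, which is exactly the part of the behavior that the collation_* variables control.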

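Of the variables above, character_set_client, character_set_connection, and character_set_results follow the client's default character set. When PHP's mysql module is compiled, its default character set comes from the MySQL client library it is linked against, which in turn determines PHP's character_set_connection and character_set_client settings. If the default character set is not utf8, set it in my.cnf:
[bash]
[mysqld]
default-character-set=utf8
default-collation=utf8_general_ci
[/bash]
Here default-character-set changes only the storage-layer settings (server, database, table, column, system); it has no effect on the communication layer between client and server. Note that MySQL 5.5 and later no longer accept these two option names under [mysqld]; the equivalents there are character-set-server and collation-server.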

Viewing the default character set (by default, MySQL's character set is latin1, which is close to ISO 8859-1)
The character set and collation settings can usually be inspected with the following two commands:
[sql]
mysql> SHOW VARIABLES LIKE 'character_set%';
+--------------------------+----------------------------------------+
| Variable_name            | Value                                  |
+--------------------------+----------------------------------------+
| character_set_client     | latin1                                 |
| character_set_connection | latin1                                 |
| character_set_database   | latin1                                 |
| character_set_filesystem | binary                                 |
| character_set_results    | latin1                                 |
| character_set_server     | latin1                                 |
| character_set_system     | utf8                                   |
| character_sets_dir       | /usr/local/mysql/share/mysql/charsets/ |
+--------------------------+----------------------------------------+
8 rows in set (0.00 sec)

mysql> SHOW VARIABLES LIKE 'collation%';
+----------------------+-----------------+
| Variable_name        | Value           |
+----------------------+-----------------+
| collation_connection | utf8_unicode_ci |
| collation_database   | utf8_unicode_ci |
| collation_server     | utf8_unicode_ci |
+----------------------+-----------------+
3 rows in set (0.00 sec)
[/sql]

Changing the default character set
The simplest approach is to change the character set keys in MySQL's my.cnf file:
[bash]
[mysqld]
default-character-set=utf8
default-collation=utf8_general_ci
[/bash]

After the change, restart the MySQL service with service mysql restart.
Then run SHOW VARIABLES LIKE 'character%'; to check which settings now report utf8. In the output captured below, the client, connection, and results character sets are utf8, while character_set_database and character_set_server still show latin1:
[sql]
mysql> SHOW VARIABLES LIKE 'character_set%';
+--------------------------+----------------------------------------+
| Variable_name            | Value                                  |
+--------------------------+----------------------------------------+
| character_set_client     | utf8                                   |
| character_set_connection | utf8                                   |
| character_set_database   | latin1                                 |
| character_set_filesystem | binary                                 |
| character_set_results    | utf8                                   |
| character_set_server     | latin1                                 |
| character_set_system     | utf8                                   |
| character_sets_dir       | /usr/local/mysql/share/mysql/charsets/ |
+--------------------------+----------------------------------------+
8 rows in set (0.00 sec)
[/sql]

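The character sets can also be changed with MySQL statements (without the GLOBAL keyword they apply to the current session only). Note that the collation variables take a collation name such as utf8_general_ci, not a character set name:
[sql]
mysql> SET character_set_client = utf8 ;
mysql> SET character_set_connection = utf8 ;
mysql> SET character_set_database = utf8 ;
mysql> SET character_set_results = utf8 ;
mysql> SET character_set_server = utf8 ;

mysql> SET collation_connection = utf8_general_ci ;
mysql> SET collation_database = utf8_general_ci ;
mysql> SET collation_server = utf8_general_ci ;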
[/sql]

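Even when a table's default character set is utf8 and queries are sent in UTF-8, the data stored in the database can still turn out garbled. The problem lies in the connection layer. The fix is to execute the following statement before sending queries: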
[sql]
SET NAMES 'utf8';
[/sql]
It is equivalent to the following three statements:
[sql]
SET character_set_client = utf8;
SET character_set_results = utf8;
SET character_set_connection = utf8;
[/sql]
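
The garbling described above can be reproduced without a database. The Python sketch below is a simplified model that assumes the server treats incoming bytes as latin1 and converts them to utf8 for storage; Python's latin-1 codec stands in for MySQL's latin1, which differs in a few code points, so the exact stored bytes on a real server may not match.

```python
text = "中文"

# The client really sends UTF-8 bytes.
sent = text.encode("utf-8")

# The connection is still declared as latin1, so the server interprets
# each byte as one latin1 character...
misread = sent.decode("latin-1")

# ...and converts those characters to utf8 for the utf8 column.
stored = misread.encode("utf-8")

# With the connection declared as utf8 (the effect of SET NAMES 'utf8'),
# the bytes are interpreted correctly and no conversion damage occurs.
correct = sent.decode("utf-8")

# The damage is reversible as long as every step was lossless.
recovered = stored.decode("utf-8").encode("latin-1").decode("utf-8")

print(len(sent), len(stored))
print(correct == text, recovered == text)
```

Each byte of the original text is stored as a separate character, which is why the stored value is longer than what was sent and reads as garbage in a UTF-8 client.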

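Upgrading existing data
The example below upgrades data from the latin1 character set to utf8. The original table is old_table (default charset=latin1) and the new table is new_table (default charset=utf8).
A. Export the old data
[bash]
mysqldump --default-character-set=latin1 -hlocalhost -uroot -p123456 --opt -B olddatabase --tables old_table > olddatabase.sql
[/bash]

B. Convert the encoding
[bash]
iconv -t utf-8 -f gb2312 -c olddatabase.sql > newdatabase.sql
[/bash]
This assumes that the original data was actually gb2312-encoded text stored in the latin1 table.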
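
Where iconv is not available, step B can be approximated in Python. This is a minimal sketch under the same gb2312 assumption; convert_file is a hypothetical helper, and the sample INSERT statement is made up so the example can run on its own in a temporary directory.

```python
import os
import tempfile


def convert_file(src_path, dst_path, src_enc="gb2312", dst_enc="utf-8"):
    """Re-encode a text file line by line."""
    # errors="ignore" mirrors iconv's -c flag: undecodable bytes are dropped.
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", encoding=src_enc, errors="ignore", newline="") as src, \
         open(dst_path, "w", encoding=dst_enc, newline="") as dst:
        for line in src:
            dst.write(line)


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "olddatabase.sql")
    dst = os.path.join(tmp, "newdatabase.sql")
    sample = "INSERT INTO old_table VALUES (1, '中文');\n"
    with open(src, "wb") as f:
        f.write(sample.encode("gb2312"))
    convert_file(src, dst)
    with open(dst, "rb") as f:
        converted = f.read()

print(converted.decode("utf-8") == sample)
```

As with iconv -c, silently dropping bytes can hide data loss, so it is worth comparing row counts or spot-checking values after the conversion.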

Alternatively, the following command can take the place of the two steps above:
[bash]
mysqldump -uroot -p123456 --default-character-set=latin1 --set-charset=utf8 --opt olddatabase > newdatabase.sql
[/bash]
C. Import
Edit newdatabase.sql, add the SQL statement SET NAMES utf8; at the top, and save it.
[bash]
mysql -hlocalhost -uroot newdatabase < newdatabase.sql
[/bash]

Sunday, August 2, 2009

Commonly used SQL built-in functions

AVG() returns the average value
[sql]
mysql> SELECT * FROM tbl_1;
+----+------+------+
| id | a    | b    |
+----+------+------+
|  2 |    1 | a    |
|  4 |    2 | a    |
|  6 |    3 | b    |
|  8 |    4 | c    |
+----+------+------+
4 rows in set (0.00 sec)

mysql> SELECT AVG(a) FROM tbl_1;
+--------+
| AVG(a) |
+--------+
| 2.5000 |
+--------+
1 row in set (0.00 sec)
[/sql]


COUNT() returns the number of matching rows
[sql]
mysql> SELECT COUNT(*) FROM tbl_1 WHERE b = 'a';
+----------+
| COUNT(*) |
+----------+
|        2 |
+----------+
1 row in set (0.00 sec)
[/sql]


MAX() returns the largest value in the data set
[sql]
mysql> SELECT MAX(a) FROM tbl_1;
+--------+
| MAX(a) |
+--------+
|      4 |
+--------+
1 row in set (0.00 sec)
[/sql]


MIN() returns the smallest value in the data set
[sql]
mysql> SELECT MIN(a) FROM tbl_1;
+--------+
| MIN(a) |
+--------+
|      1 |
+--------+
1 row in set (0.00 sec)
[/sql]


SUM() returns the sum
[sql]
mysql> SELECT SUM(a) FROM tbl_1;
+--------+
| SUM(a) |
+--------+
|     10 |
+--------+
1 row in set (0.00 sec)
[/sql]
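
The aggregate functions above can be tried without a MySQL server. The sketch below uses Python's built-in sqlite3 module as a stand-in, which is an assumption made for convenience: SQLite supports the same five aggregates, but its output formatting differs (AVG comes back as the float 2.5 rather than 2.5000). The table contents are copied from tbl_1 above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_1 (id INTEGER, a INTEGER, b TEXT)")
conn.executemany(
    "INSERT INTO tbl_1 VALUES (?, ?, ?)",
    [(2, 1, "a"), (4, 2, "a"), (6, 3, "b"), (8, 4, "c")],
)

# All five aggregates over the whole table in a single query.
totals = conn.execute(
    "SELECT AVG(a), COUNT(*), MAX(a), MIN(a), SUM(a) FROM tbl_1"
).fetchone()

# COUNT(*) restricted by a WHERE clause, as in the COUNT() example.
count_a = conn.execute(
    "SELECT COUNT(*) FROM tbl_1 WHERE b = ?", ("a",)
).fetchone()[0]

print(totals)
print(count_a)
conn.close()
```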


ABS() computes the absolute value (ABSVAL() is the DB2 spelling; MySQL provides only ABS())
[sql]
mysql> SELECT ABS(a) FROM tbl_1;
+--------+
| ABS(a) |
+--------+
|      1 |
|      2 |
|      3 |
|      4 |
+--------+
4 rows in set (0.00 sec)
[/sql]

CEILING() returns the smallest integer that is not less than the argument
[sql]
mysql> SELECT CEILING(1.1), CEILING(1.5), CEILING(-1.1), CEILING(-1.5);
+--------------+--------------+---------------+---------------+
| CEILING(1.1) | CEILING(1.5) | CEILING(-1.1) | CEILING(-1.5) |
+--------------+--------------+---------------+---------------+
|            2 |            2 |            -1 |            -1 |
+--------------+--------------+---------------+---------------+
1 row in set (0.00 sec)
[/sql]
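
The same four inputs can be checked in Python, where math.ceil follows the same definition (the smallest integer not less than the argument). This is just a cross-check of the rule, not anything MySQL-specific.

```python
import math

values = [1.1, 1.5, -1.1, -1.5]

# Ceiling always moves toward positive infinity, so negative
# fractions move toward zero: -1.5 becomes -1, not -2.
ceilings = [math.ceil(v) for v in values]

print(ceilings)
```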


ROUND() rounds half up to the given number of decimal places
[sql]
mysql> SELECT ROUND(111.111, 1), ROUND(111.111, 2), ROUND(111.111, 3), ROUND(111.111, 4), ROUND(111.111, 5), ROUND(111.111, 0), ROUND(111.111, -1), ROUND(111.111, -2), ROUND(111.111, -3)\G
*************************** 1. row ***************************
ROUND(111.111, 1): 111.1
ROUND(111.111, 2): 111.11
ROUND(111.111, 3): 111.111
ROUND(111.111, 4): 111.1110
ROUND(111.111, 5): 111.11100
ROUND(111.111, 0): 111
ROUND(111.111, -1): 110
ROUND(111.111, -2): 100
ROUND(111.111, -3): 0
1 row in set (0.01 sec)

mysql> SELECT ROUND(111.116, 1), ROUND(111.116, 2), ROUND(111.116, 3), ROUND(111.116, 4), ROUND(111.116, 5), ROUND(111.116, 6), ROUND(111.116, 0), ROUND(111.116, -1), ROUND(111.116, -2), ROUND(11.116, -3)\G
*************************** 1. row ***************************
ROUND(111.116, 1): 111.1
ROUND(111.116, 2): 111.12
ROUND(111.116, 3): 111.116
ROUND(111.116, 4): 111.1160
ROUND(111.116, 5): 111.11600
ROUND(111.116, 6): 111.116000
ROUND(111.116, 0): 111
ROUND(111.116, -1): 110
ROUND(111.116, -2): 100
ROUND(11.116, -3): 0
1 row in set (0.00 sec)
[/sql]
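
The effect of a positive, zero, or negative second argument can be modeled with Python's decimal module. This is a sketch of the rounding rule only, assuming half-up rounding on exact decimal values as in the output above; round_half_up is a hypothetical helper, not a MySQL or Python built-in.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_half_up(value: str, digits: int) -> Decimal:
    """Round an exact decimal string to `digits` places, halves going up."""
    # scaleb(-digits) yields 10**-digits exactly: 0.01 for digits=2,
    # 1 for digits=0, 1E+1 for digits=-1.
    quantum = Decimal(1).scaleb(-digits)
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)


for digits in (4, 2, 1, 0, -1, -2, -3):
    result = round_half_up("111.116", digits)
    # The "f" format avoids exponent notation for negative digit counts.
    print(digits, format(result, "f"))
```

A negative digit count rounds to the left of the decimal point, so -1 rounds to tens, -2 to hundreds, and -3 to thousands, which is why 111.116 becomes 0 in the last case.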


CURTIME() returns the current system time
[sql]
mysql> SELECT CURTIME();
+-----------+
| CURTIME() |
+-----------+
| 13:40:30  |
+-----------+
1 row in set (0.00 sec)
[/sql]


CURDATE() returns the current system date
[sql]
mysql> SELECT CURDATE();
+------------+
| CURDATE()  |
+------------+
| 2009-08-03 |
+------------+
1 row in set (0.00 sec)
[/sql]


DATE() extracts the date part of a date or datetime expression
[sql]
mysql> SELECT DATE('2009-08-03');
+--------------------+
| DATE('2009-08-03') |
+--------------------+
| 2009-08-03         |
+--------------------+
1 row in set (0.00 sec)
[/sql]


DAY() returns the day-of-month part of a date
[sql]
mysql> SELECT * FROM tbl_2;
+---------------------+---------------------+
| a                   | b                   |
+---------------------+---------------------+
| 2009-08-03 00:00:00 | 2009-07-03 00:00:00 |
| 2009-08-03 00:00:00 | 2009-07-01 00:00:00 |
| 2009-08-08 00:00:00 | 2009-07-01 00:00:00 |
| 2009-08-09 00:00:00 | 2009-07-10 00:00:00 |
+---------------------+---------------------+
4 rows in set (0.00 sec)

mysql> SELECT DAY(a), DAY(b) FROM tbl_2;
+--------+--------+
| DAY(a) | DAY(b) |
+--------+--------+
|      3 |      3 |
|      3 |      1 |
|      8 |      1 |
|      9 |     10 |
+--------+--------+
4 rows in set (0.00 sec)
[/sql]


DAYOFMONTH() returns the day-of-month part of its argument
[sql]
mysql> SELECT DAYOFMONTH(a) FROM tbl_2;
+---------------+
| DAYOFMONTH(a) |
+---------------+
|             3 |
|             3 |
|             8 |
|             9 |
+---------------+
4 rows in set (0.00 sec)
[/sql]


DAYOFWEEK() returns the weekday index of its argument, from 1 to 7 (1 = Sunday, 7 = Saturday)
[sql]
mysql> SELECT DAYOFWEEK(a) FROM tbl_2;
+--------------+
| DAYOFWEEK(a) |
+--------------+
|            2 |
|            2 |
|            7 |
|            1 |
+--------------+
4 rows in set (0.00 sec)
[/sql]
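
The Sunday-first numbering is easy to get wrong when moving between systems, so here is a Python sketch that reproduces it from the ISO weekday (Monday = 1 through Sunday = 7). The dayofweek function is a hypothetical helper written for this example; the dates are the values of column a in tbl_2.

```python
from datetime import date


def dayofweek(d: date) -> int:
    """Map a date to 1 = Sunday ... 7 = Saturday."""
    # isoweekday() gives Monday=1 ... Sunday=7; the modulo sends
    # Sunday to 0 and the +1 shifts everything to the 1..7 range.
    return d.isoweekday() % 7 + 1


dates = [date(2009, 8, 3), date(2009, 8, 3), date(2009, 8, 8), date(2009, 8, 9)]
indexes = [dayofweek(d) for d in dates]

print(indexes)
```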


DAYOFYEAR() returns the day of the year, from 1 to 366
[sql]
mysql> SELECT DAYOFYEAR(a), DAYOFYEAR(b) FROM tbl_2;
+--------------+--------------+
| DAYOFYEAR(a) | DAYOFYEAR(b) |
+--------------+--------------+
|          215 |          184 |
|          215 |          182 |
|          220 |          182 |
|          221 |          191 |
+--------------+--------------+
4 rows in set (0.00 sec)
[/sql]
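
The day-of-year values can be cross-checked in Python, where the same number is available as tm_yday. The helper name dayofyear is made up for this example; the dates are the rows of tbl_2, plus one leap-year date to show where 366 comes from.

```python
from datetime import date


def dayofyear(d: date) -> int:
    """Return the 1-based day number within the year."""
    return d.timetuple().tm_yday


rows = [
    (date(2009, 8, 3), date(2009, 7, 3)),
    (date(2009, 8, 3), date(2009, 7, 1)),
    (date(2009, 8, 8), date(2009, 7, 1)),
    (date(2009, 8, 9), date(2009, 7, 10)),
]
results = [(dayofyear(a), dayofyear(b)) for a, b in rows]

# 366 is only reachable on December 31 of a leap year.
leap_last = dayofyear(date(2008, 12, 31))

print(results)
print(leap_last)
```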

HOUR() returns the hour part of its argument, which must be a time or timestamp value
[sql]
mysql> SELECT * FROM tbl_2;
+---------------------+---------------------+
| a                   | b                   |
+---------------------+---------------------+
| 2009-08-03 00:00:00 | 2009-07-03 00:00:00 |
| 2009-08-03 00:00:00 | 2009-07-01 00:00:00 |
| 2009-08-08 00:00:00 | 2009-07-01 00:00:00 |
| 2009-08-09 00:00:00 | 2009-07-10 00:00:00 |
| 2009-08-03 01:11:11 | 2009-08-03 02:21:12 |
+---------------------+---------------------+
5 rows in set (0.00 sec)

mysql> SELECT HOUR(a), HOUR(b) FROM tbl_2;
+---------+---------+
| HOUR(a) | HOUR(b) |
+---------+---------+
|       0 |       0 |
|       0 |       0 |
|       0 |       0 |
|       0 |       0 |
|       1 |       2 |
+---------+---------+
5 rows in set (0.00 sec)
[/sql]