MDL元数据锁
MetaData Lock即元数据锁,MetaData Lock主要为了保证元数据的一致性,用于处理不同线程操作同一数据对象的同步与互斥问题。MySQL 5.5版本开始,引入了MDL锁,但是因为MDL锁,会导致表级别的锁,无论是读或者写操作,都无法进行,导致SQL的阻塞。
[TOC]
MDL锁与实现
MySQL5.5版本引入了MDL锁(metadata lock),用于解决或者保证DDL操作与DML操作之间的一致性。例如下面的这种情形:
会话1 | 会话2 |
---|---|
BEGIN; | |
SELECT * FROM xxx; | |
DROP TABLE xxx; | |
SELECT * FROM xxx; |
若没有MDL锁的保护,则事务2可以直接执行DDL操作,并且导致事务1出错,5.1版本即是如此。5.5版本加入MDL锁就在于保护这种情况的发生,由于事务1开启了查询,那么获得了MDL锁,锁的模式为SHARED_READ,事务2要执行DDL,则需获得EXCLUSIVE锁,两者互斥,所以事务2需要等待。
目前MDL有如下锁模式:
锁模式 | 对应SQL |
---|---|
MDL_INTENTION_EXCLUSIVE | GLOBAL对象、SCHEMA对象操作会加此锁 |
MDL_SHARED | FLUSH TABLES with READ LOCK |
MDL_SHARED_HIGH_PRIO | 仅对MyISAM存储引擎有效 |
MDL_SHARED_READ | SELECT查询 |
MDL_SHARED_WRITE | DML语句 |
MDL_SHARED_WRITE_LOW_PRIO | 仅对MyISAM存储引擎有效 |
MDL_SHARED_UPGRADABLE | ALTER TABLE |
MDL_SHARED_READ_ONLY | LOCK xxx READ |
MDL_SHARED_NO_WRITE | FLUSH TABLES xxx,yyy,zzz READ |
MDL_SHARED_NO_READ_WRITE | FLUSH TABLE xxx WRITE |
MDL_EXCLUSIVE | ALTER TABLE xxx PARTITION BY … |
MDL锁的诊断
MySQL 5.7版本之前并没有提供一个方便的途径来查看MDL锁,而在MySQL 5.7中的 performance_schea 库下新增了一张表 metadata_locks,用其来查看MDL锁。
前提:开启 performance_schema(对性能影响有限),并重启实例生效。
[mysqld]
performance_schema=ON
不过默认 performance_schema 并没有打开此功能,需要手工将wait/lock/metadata/sql/mdl监控给打开:
show variables like '%performance_schema%';
SELECT * FROM performance_schema.setup_instruments;
UPDATE performance_schema.setup_consumers SET ENABLED = 'YES' WHERE NAME ='global_instrumentation';
UPDATE performance_schema.setup_instruments SET ENABLED = 'YES' WHERE NAME ='wait/lock/metadata/sql/mdl';
SELECT * FROM performance_schema.metadata_locks;
准备数据
drop table if exists test;
create table test(id int,name varchar(4));
insert into test values(1,'a'),(2,'b');
会话1:不提交
begin;
select * from test;
delete from test where id=2;
update test set name='c' where id=1;
SELECT * FROM performance_schema.metadata_locks;
主:重点关注 lock_status,"PENDING"代表线程在等待MDL,而"GRANTED"则代表线程持有MDL。
会话2:通过metadata_locks查看mdl锁
alter table test add c1 int;
SELECT * FROM performance_schema.metadata_locks;
会话3:可看到 Waiting for table metadata lock
show processlist;
SELECT * FROM performance_schema.metadata_locks;
如何找出引起阻塞的会话
select object_type,object_schema,object_name,lock_type,lock_duration,lock_status,owner_thread_id from performance_schema.metadata_locks;
结果如下:
object_type | object_schema | object_name | lock_type | lock_duration | lock_status | owner_thread_id |
---|---|---|---|---|---|---|
TABLE | slowtech | test | SHARED_WRITE | TRANSACTION | GRANTED | 27 |
GLOBAL | NULL | NULL | INTENTION_EXCLUSIVE | STATEMENT | GRANTED | 29 |
SCHEMA | slowtech | NULL | INTENTION_EXCLUSIVE | TRANSACTION | GRANTED | 29 |
TABLE | slowtech | test | SHARED_UPGRADABLE | TRANSACTION | GRANTED | 29 |
TABLE | slowtech | test | EXCLUSIVE | TRANSACTION | PENDING | 29 |
TABLE | performance_schema | metadata_locks | SHARED_READ | TRANSACTION | GRANTED | 28 |
结合owner_thread_id,可以可到,是29号线程在等待27号线程的MDL,此时,可kill掉xx号线程。
但需要注意的是,owner_thread_id给出的只是线程ID,并不是show processlist中的ID。如果要查找线程对应的processlist id,需查询performance_schema.threads表。
select * from performance_schema.threads where thread_id in (27,29);
将这两张表结合,借鉴sys.innodb_lock _waits的输出,实际上我们也可以直观地呈现MDL的等待关系。
SELECT
a.OBJECT_SCHEMA AS locked_schema,
a.OBJECT_NAME AS locked_table,
"Metadata Lock" AS locked_type,
c.PROCESSLIST_ID AS waiting_processlist_id,
c.PROCESSLIST_TIME AS waiting_age,
c.PROCESSLIST_INFO AS waiting_query,
c.PROCESSLIST_STATE AS waiting_state,
d.PROCESSLIST_ID AS blocking_processlist_id,
d.PROCESSLIST_TIME AS blocking_age,
d.PROCESSLIST_INFO AS blocking_query,
concat('KILL ', d.PROCESSLIST_ID) AS sql_kill_blocking_connection
FROM
performance_schema.metadata_locks a
JOIN performance_schema.metadata_locks b ON a.OBJECT_SCHEMA = b.OBJECT_SCHEMA
AND a.OBJECT_NAME = b.OBJECT_NAME
AND a.lock_status = 'PENDING'
AND b.lock_status = 'GRANTED'
AND a.OWNER_THREAD_ID <> b.OWNER_THREAD_ID
AND a.lock_type = 'EXCLUSIVE'
JOIN performance_schema.threads c ON a.OWNER_THREAD_ID = c.THREAD_ID
JOIN performance_schema.threads d ON b.OWNER_THREAD_ID = d.THREAD_ID;
输出结果:
*************************** 1. row ***************************
locked_schema: slowtech
locked_table: t1
locked_type: Metadata Lock
waiting_processlist_id: 4
waiting_age: 259
waiting_query: alter table slowtech.t1 add c1 int
waiting_state: Waiting for table metadata lock
blocking_processlist_id: 2
blocking_age: 301
blocking_query: NULL
sql_kill_blocking_connection: KILL 2
输出一目了然,DDL操作如果要获得MDL,执行kill 2即可。
遇到的问题
2020.07.14.dms工单执行ddl语句报错"无法获取到原表上的MDL锁"
工单里的ddl语句如下:
ALTER TABLE one_activity ADD collect_user_num int(10) DEFAULT 0 NOT NULL COMMENT '线索数,默认0' after collect_type
工单的调度详情里看到如下报错,重试依旧报错。
------[2020-07-14 15:03:14]------
任务执行失败:
iDB-OnlineDDL Execute Error:无法获取到原表上的MDL锁,已尝试3次,请排查原表上的大事务、大查询,或到业务低峰期时重试任务
TraceId : 0b736e9816262416409597186d428a
------[2020-07-14 15:03:14]------
jobFail callback End.
开发事后反馈如下报错日志:
responseBody error
org.springframework.dao.CannotAcquireLockException:
### Error updating database. Cause: com.mysql.cj.jdbc.exceptions.MySQLTransactionRollbackException: Lock wait timeout exceeded; try restarting transaction
### The error may exist in com/oneonline/soa/activity/dao/WechatUserDAO.java (best guess)
### The error may involve com.oneonline.soa.activity.dao.WechatUserDAO.updateById-Inline
### The error occurred while setting parameters
### SQL: UPDATE one_wechat_user SET openid=?, unionid=?, origin=?, session_key=?, phone_number=?, pure_phone_number=?, country_code=?, created_at=?, updated_at=? WHERE id=?
### Cause: com.mysql.cj.jdbc.exceptions.MySQLTransactionRollbackException: Lock wait timeout exceeded; try restarting transaction
; Lock wait timeout exceeded; try restarting transaction; nested exception is com.mysql.cj.jdbc.exceptions.MySQLTransactionRollbackException: Lock wait timeout exceeded; try restarting transaction
at org.springframework.jdbc.support.SQLErrorCodeSQLExceptionTranslator.doTranslate(SQLErrorCodeSQLExceptionTranslator.java:262)
at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:72)
at org.mybatis.spring.MyBatisExceptionTranslator.translateExceptionIfPossible(MyBatisExceptionTranslator.java:73)
at org.mybatis.spring.SqlSessionTemplate$SqlSessionInterceptor.invoke(SqlSessionTemplate.java:446)
at com.sun.proxy.$Proxy158.update(Unknown Source)
at org.mybatis.spring.SqlSessionTemplate.update(SqlSessionTemplate.java:294)
at com.baomidou.mybatisplus.core.override.MybatisMapperMethod.execute(MybatisMapperMethod.java:63)
at com.baomidou.mybatisplus.core.override.MybatisMapperProxy.invoke(MybatisMapperProxy.java:62)
at com.sun.proxy.$Proxy228.updateById(Unknown Source)
at sun.reflect.GeneratedMethodAccessor539.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:343)
at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:198)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
at org.springframework.dao.support.PersistenceExceptionTranslationInterceptor.invoke(PersistenceExceptionTranslationInterceptor.java:139)
验证过程
1.该ddl一直处于执行中,执行如下语句:
SELECT id, USER,db,SUBSTRING_INDEX(HOST,':',1),COMMAND,TIME,info,state FROM information_schema.processlist WHERE command !='Sleep';
发现如下越来越多的block语句:
UPDATE one_wechat_user SET openid=?, unionid=?, origin=?, session_key=?, phone_number=? WHERE id=?
2.执行如下语句:
show engine innodb status\G;
看到 TRANSACTIONS 有如下信息:
---TRANSACTION 421875086800592, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 955083244, ACTIVE 48 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 2 lock struct(s), heap size 1136, 1 row lock(s)
MySQL thread id 5195068, OS thread handle 140351464961792, query id 18076351285 10.200.131.51 jronline updating
UPDATE one_wechat_user SET openid='om3P64pq-jvLI-_mHrh82k83OXLg',
unionid='',
origin=1,
session_key='sSI9dQ14GZPKfiwsGgAAug==',
phone_number='18608774809',
pure_phone_number='18608774809',
country_code='86',
created_at='2020-06-02 16:23:18.0',
updated_at='2020-06-02 16:23:18.0' WHERE id=30684
------- TRX HAS BEEN WAITING 48 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 521 page no 261 n bits 208 index PRIMARY of table `jronline_activity`.`one_wechat_user` trx id 955083244 lock_mode X locks rec but not gap waiting
Record lock, heap no 136 PHYSICAL RECORD: n_fields 12; compact format; info bits 0
0: len 4; hex 000077dc; asc w ;;
1: len 6; hex 000038b20927; asc 8 ';;
2: len 7; hex 2d0000023739cf; asc - 79 ;;
3: len 28; hex 6f6d3350363470712d6a764c492d5f6d48726838326b38334f584c67; asc om3P64pq-jvLI-_mHrh82k83OXLg;;
4: len 0; hex ; asc ;;
5: len 1; hex 01; asc ;;
6: len 24; hex 49436550514539522b475a5a3371315a3843725355513d3d; asc ICePQE9R+GZZ3q1Z8CrSUQ==;;
7: len 11; hex 3138363038373734383039; asc 18608774809;;
8: len 11; hex 3138363038373734383039; asc 18608774809;;
9: len 2; hex 3836; asc 86;;
10: len 4; hex 60b73ff6; asc ` ? ;;
11: len 4; hex 60b73ff6; asc ` ? ;;
------------------