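Hive concurrency support is enabled in our production environment, so jobs compete with each other for locks.
When a lock conflict occurs, Hive reports a message like the following: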
conflicting lock present for table mode EXCLUSIVE
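In some cases a job finishes but does not release its lock automatically (the lock then has to be released manually with unlock, or by deleting the lock node in ZooKeeper). The locks therefore need to be monitored, mainly using the output of show locks.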
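As a rough illustration of what the monitor has to extract from `show locks`, the sketch below parses a single output line. It assumes each line has the form `db@table[@partition]`, followed by whitespace and the lock mode, which is the layout the monitoring script relies on. The names `HiveLock` and `parse_lock_line` are hypothetical and are not part of Hive or of the script.

```python
from typing import NamedTuple, Optional


class HiveLock(NamedTuple):
    """One lock entry as reported by `show locks` (hypothetical container)."""
    database: str
    table: str
    partition: Optional[str]  # None for table-level locks
    mode: str                 # e.g. SHARED or EXCLUSIVE


def parse_lock_line(line: str) -> Optional[HiveLock]:
    """Parse one line of `show locks` output.

    Assumed format: db@table[@partition]<whitespace>MODE.
    Returns None for lines that do not look like a lock entry.
    """
    fields = line.split()
    if len(fields) < 2:
        return None
    # Split into at most three parts: database, table, partition spec.
    parts = fields[0].split("@", 2)
    if len(parts) < 2:
        return None
    # Multi-level partitions use '/', e.g. dt=2014-01-01/hr=12;
    # turn them into a comma-separated spec.
    partition = parts[2].replace("/", ",") if len(parts) == 3 else None
    return HiveLock(parts[0], parts[1], partition, fields[-1])
```

Returning `None` for unrecognized lines lets the caller skip blank lines or status output instead of failing with an `IndexError`.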
The Python script is as follows:
import os
import subprocess
import util
import re
import sendmail
import time
import sys
import property

if __name__ == "__main__":
    allInfo = []
    now = time.time()
    sql = "show locks"
    lock_query_id = ""
    lock_create_time = ""
    lock_sql = ""
    allLock = util.hive_run_cmd(sql)
    for line in allLock:
        # Each line looks like: db@table[@partition] <tab or space> LOCK_TYPE
        fields = re.split('\t| ', line)
        parts = fields[0].split('@')
        if len(parts) < 2:
            # Not a lock entry; skip it instead of raising IndexError.
            continue
        dataBase = parts[0]
        dataTable = parts[1]
        lockType = fields[-1].strip()
        if len(parts) == 2:
            # Table-level lock
            print(dataBase + "===" + dataTable + "===" + lockType)
            util.get_lock_info(allInfo, database=dataBase, table=dataTable, keytype=lockType)
        else:
            # Partition-level lock
            dataPartition = parts[2].replace('/', ',')
            print(dataBase + "===" + dataTable + "===" + lockType + "====" + dataPartition)
            util.get_lock_info(allInfo, database=dataBase, table=dataTable, keytype=lockType, partition=dataPartition)
    print(allInfo)
    if len(allInfo) == 0:
        pass
        #sys.exit(0)
    else:
        mailfile = open("/home/hdfs/ericni/lock_monitor/mail/lock_table_" + str(now) + ".html", "w+")
        # Header row of the alert table
        mailcontent = """<table>
<tr><th>TABLE</th><th>LOCK_TYPE</th><th>LOCK_TIME</th><th>QUERY_ID</th><th>SQL</th></tr>
"""
        # ... (the loop over allInfo that sets re_table, re_type, re_time,
        # re_query and re_sql is missing here) ...
        mailcontent += """<tr><td>%s</td>""" % (re_table)
        mailcontent += """<td>%s</td>""" % (re_type)
        mailcontent += """<td>%s</td>""" % (round(float(re_time), 2))
        mailcontent += """<td>%s</td>""" % (re_query)
        mailcontent += """<td>%s</td>""" % (re_sql)
        mailcontent += "</tr>"
        # ... (the rest of the script is missing here) ...
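The table-building step of the script can be sketched in a self-contained way as follows. This is only an illustration under stated assumptions: each row is a tuple `(table, lock_type, lock_time, query_id, sql)`, and the function name `render_lock_table` is hypothetical. Unlike the script, it HTML-escapes the cell values, so that SQL text containing `<` or `&` does not break the mail body, and it formats the lock time with two decimals rather than calling `round`.

```python
from html import escape

# Column headers of the alert table, as used in the monitoring script.
HEADERS = ("TABLE", "LOCK_TYPE", "LOCK_TIME", "QUERY_ID", "SQL")


def render_lock_table(rows):
    """Render lock records as an HTML table.

    rows: iterable of (table, lock_type, lock_time, query_id, sql) tuples
    (an assumed record layout, not necessarily the script's own structure).
    """
    lines = ["<table>"]
    lines.append("<tr>" + "".join("<th>%s</th>" % h for h in HEADERS) + "</tr>")
    for table, lock_type, lock_time, query_id, sql in rows:
        cells = (table, lock_type, "%.2f" % float(lock_time), query_id, sql)
        # Escape every cell so special characters in SQL stay literal.
        lines.append(
            "<tr>" + "".join("<td>%s</td>" % escape(str(c)) for c in cells) + "</tr>"
        )
    lines.append("</table>")
    return "\n".join(lines)
```

The returned string can be written to the mail file and sent with whatever mail helper is in use.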
The resulting alert email looks like this: