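Hive concurrency support is enabled in our production environment, so jobs can contend with each other for locks.

When lock contention occurs, a message like the following is reported: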

conflicting lock present for table mode EXCLUSIVE

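In some cases a job finishes but does not release its lock automatically (the lock then has to be released manually with unlock, or deleted from ZooKeeper). Locks therefore need to be monitored, mainly by using the output of show locks.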
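
To make the monitoring idea concrete, here is a minimal sketch of how one line of show locks output can be split into its parts. It assumes each line has the form db@table or db@table@partition, followed by whitespace and the lock mode, which is the layout the monitoring script in this post relies on. The function name parse_lock_line and the sample lines are hypothetical, not taken from a real cluster.

```python
import re


def parse_lock_line(line):
    """Split one `show locks` line into (database, table, partition, lock_type).

    Returns None for lines that do not name a locked object (no '@').
    """
    fields = re.split(r"\t| ", line.strip())
    parts = fields[0].split("@")  # db@table or db@table@partition
    if len(parts) < 2:
        return None
    database, table = parts[0], parts[1]
    # The script in this post turns '/' into ',' inside the partition spec.
    partition = parts[2].replace("/", ",") if len(parts) > 2 else None
    return database, table, partition, fields[-1]


# Hypothetical sample lines, for illustration only.
samples = [
    "default@orders\tSHARED",
    "default@orders@dt=2014-01-01/hour=03\tEXCLUSIVE",
    "",
]
for sample in samples:
    print(parse_lock_line(sample))
```

Returning None for unrecognized lines lets the caller skip blank or header lines instead of failing with an IndexError.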

The Python script is as follows:

import os
import re
import sys
import time

# Local helper modules from the author's environment (not shown here).
import util
import sendmail
import property

ALERT_FLAG = "/tmp/alert.file"

if __name__ == "__main__":
    allInfo = []
    now = time.time()
    allLock = util.hive_run_cmd("show locks")

    for line in allLock:
        fields = re.split('\t| ', line)
        parts = fields[0].split('@')  # db@table or db@table@partition
        lockType = fields[-1].strip()
        if len(parts) < 2:
            continue  # not a lock line, skip it
        dataBase = parts[0]
        dataTable = parts[1]
        if len(parts) == 2:
            # Table-level lock
            print(dataBase + "===" + dataTable + "===" + lockType)
            util.get_lock_info(allInfo, database=dataBase, table=dataTable, keytype=lockType)
        else:
            # Partition-level lock
            dataPartition = parts[2].replace('/', ',')
            print(dataBase + "===" + dataTable + "===" + lockType + "====" + dataPartition)
            util.get_lock_info(allInfo, database=dataBase, table=dataTable, keytype=lockType, partition=dataPartition)
    print(allInfo)

    if allInfo:
        # NOTE: the HTML tags inside the strings below were stripped from the
        # original listing; the table markup here is a reconstruction.
        mailcontent = """<table>
<tr><th>TABLE</th><th>LOCK_TYPE</th><th>LOCK_TIME</th><th>QUERY_ID</th><th>SQL</th></tr>
"""
        for line in allInfo:
            if len(line) < 5:
                continue
            re_table = line[0]
            re_type = line[1]
            re_query = line[2]
            re_time = float(now) - float(line[3])  # seconds the lock has been held
            re_sql = line[4]
            print(re_time)
            # Alert on SHARED locks held >= 30 min or EXCLUSIVE locks held >= 10 min.
            if (re_time >= 1800 and str(re_type) == "SHARED") or (re_time >= 600 and str(re_type) == "EXCLUSIVE"):
                print("++++++++++++++++++++++++++++++++++++++++++++++")
                open(ALERT_FLAG, "a").close()  # same effect as /bin/touch
                mailcontent += """<tr><td>%s</td>""" % (re_table)
                mailcontent += """<td>%s</td>""" % (re_type)
                mailcontent += """<td>%s</td>""" % (round(float(re_time), 2))
                mailcontent += """<td>%s</td>""" % (re_query)
                mailcontent += """<td>%s</td>""" % (re_sql)
                mailcontent += "</tr>\n"
        mailcontent += "</table>"

        # Keep a copy of every generated report on disk.
        with open("/home/hdfs/ericni/lock_monitor/mail/lock_table_" + str(now) + ".html", "w+") as mailfile:
            mailfile.write(mailcontent)

        if not os.path.isfile(ALERT_FLAG):
            print("no need to alert")
            sys.exit(0)
        else:
            print("+++______++++")
            os.remove(ALERT_FLAG)
            sendmail.send_mail_withoutSSL("HIVE table lock alert", mailcontent.encode('utf-8'), property.mail_list_hdfs)

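The alert rule in the script (SHARED locks held for 1800 seconds or more, EXCLUSIVE locks held for 600 seconds or more) can be isolated into a small function, which makes the thresholds easier to test and adjust. This is only a sketch: the names LOCK_LIMITS and is_stale are hypothetical and do not exist in the original script.

```python
# Seconds a lock may be held before it is reported; values taken from the script.
LOCK_LIMITS = {"SHARED": 1800, "EXCLUSIVE": 600}


def is_stale(lock_type, created_at, now):
    """Return True if a lock of this type has been held long enough to alert on.

    created_at and now are Unix timestamps; created_at may be a string.
    Unknown lock types never trigger an alert.
    """
    limit = LOCK_LIMITS.get(str(lock_type))
    if limit is None:
        return False
    return (float(now) - float(created_at)) >= limit


# A SHARED lock created 2000 seconds ago exceeds its limit.
print(is_stale("SHARED", 0, 2000))
# An EXCLUSIVE lock created 300 seconds ago does not.
print(is_stale("EXCLUSIVE", 0, 300))
```

Keeping the limits in a dictionary means a new lock mode or a different threshold is a one-line change rather than another clause in a long condition.
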
The resulting alert email looks like this: