Checking File System Data Integrity with a Python Script

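Today is February 14 again, Valentine's Day in the West, so let me start with a few words about the day. Don't worry, this is not a public display of affection, so you can read on safely.

Seriously, if you are single, use your free time to settle down and study to improve yourself; if you are not single, then on top of that, spend more time with your wife and family. Nothing matters more than your loved ones and your own growth! Whether you are improving yourself or keeping your family company, don't do it superficially. It is just like celebrating Valentine's Day today: expressing love is not limited to this one day or a handful of days; it takes lasting commitment and tolerance from both partners.

To borrow a saying others have used before:

When your talent cannot support your ambition, you should settle down and study; when your money cannot keep up with your dreams, you should sit down and work hard; when your abilities cannot yet handle your goals, you should calm down and gain experience!

Now for the main text: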

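This article offers a simple, easy-to-implement approach to checking file system data integrity and detecting data tampering with a Python script (it really is simple: you only need Python basics, hashlib, and file operations).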
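As a rough illustration of the hashlib-plus-file-operations idea, here is a minimal sketch of hashing a file in fixed-size blocks. It is written for Python 3, whereas the script in this article targets Python 2.7, and the name `file_digest` is a hypothetical helper, not part of the original script.

```python
import hashlib
import os
import tempfile


def file_digest(path, algorithm="sha256", block_size=65536):
    """Return the hex digest of a file, reading it in fixed-size blocks."""
    # hashlib.new() raises ValueError for an unknown algorithm name.
    checksum = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # iter() with a sentinel stops once read() returns b"" at end of file.
        for block in iter(lambda: f.read(block_size), b""):
            checksum.update(block)
    return checksum.hexdigest()


# Demo on a throwaway file so the example does not depend on local paths.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
print(file_digest(tmp.name))
os.remove(tmp.name)
```

Reading in blocks keeps memory use flat even for very large files, which matters when a whole directory tree is being hashed.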

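Although there are already many more polished solutions for verifying data integrity, this is still a good exercise for Python beginners: it trains logical thinking and helps ward off "senility".

So far it has been tested successfully on Windows 10 and Ubuntu (Python 2.7). It should work on other platforms too; help with testing is welcome.

The design and the execution flow are, briefly, as follows: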

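1. Supply the path of the directory whose data integrity is to be checked (a single file is also supported) and the path of the check file in which the file hashes are saved. If a path does not exist, the script either raises an exception or creates it, depending on the case (a missing path to check raises an exception, while a missing directory for the check file is created);

Passing parameters (the latest version takes its parameters from the command line; the screenshot below shows how parameters were passed in the old version):

In the newly updated version, parameter passing and command help are implemented with the docopt module, which makes the script easier to use.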

 
    Python script to check data integrity on UNIX/Linux or Windows
    accept options using 'docopt' module, using 'docopt' to accept parameters and command switch

    Usage:
      checkDataIntegrity.py [-g FILE HASH_FILE]
      checkDataIntegrity.py [-c FILE HASH_FILE]
      checkDataIntegrity.py [-r HASH_FILE]
      checkDataIntegrity.py generate FILE HASH_FILE
      checkDataIntegrity.py validate FILE HASH_FILE
      checkDataIntegrity.py reset HASH_FILE
      checkDataIntegrity.py (--version | -v)
      checkDataIntegrity.py --help | -h | -?

    Arguments:
      FILE                  the path to single file or directory to data protect
      HASH_FILE             the path to hash data saved

    Options:
      -? -h --help          show this help message and exit
      -v --version          show version and exit

    Example, try:
      checkDataIntegrity.py generate /tmp /tmp/data.json
      checkDataIntegrity.py validate /tmp /tmp/data.json
      checkDataIntegrity.py reset /tmp/data.json
      checkDataIntegrity.py -g /tmp /tmp/data.json
      checkDataIntegrity.py -c /tmp /tmp/data.json
      checkDataIntegrity.py -r /tmp/data.json
      checkDataIntegrity.py --help
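The usage text above defines three actions: generate, validate, and reset. For readers who do not have docopt installed, the sketch below models the same three subcommands with the standard library's argparse instead. This is a swapped-in technique, not what the script itself does; `build_parser` is a hypothetical name, and the short switches (-g, -c, -r) are left out for brevity. It is written for Python 3.

```python
import argparse


def build_parser():
    """Build a parser with generate / validate / reset subcommands."""
    parser = argparse.ArgumentParser(prog="checkDataIntegrity.py")
    sub = parser.add_subparsers(dest="command")
    sub.required = True  # a subcommand must always be given

    # generate and validate take the same two positional arguments.
    for name in ("generate", "validate"):
        p = sub.add_parser(name)
        p.add_argument("FILE", help="file or directory to protect")
        p.add_argument("HASH_FILE", help="where the hash data is saved")

    # reset only needs the hash data file.
    p = sub.add_parser("reset")
    p.add_argument("HASH_FILE", help="hash data file to delete")
    return parser


# Parse an explicit argument list so the demo does not depend on sys.argv.
args = build_parser().parse_args(["generate", "/tmp", "/tmp/data.json"])
print(args.command, args.FILE, args.HASH_FILE)
```

docopt derives the parser from the help text, so the documentation and the accepted arguments cannot drift apart; argparse needs more code but ships with Python.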

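Valid parameters and paths:

An exception is raised when the path does not exist:

Other exception handling can be seen in the script itself.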
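The path handling from step 1 can be sketched roughly as follows, assuming Python 3. The function name `check_paths` is hypothetical and only illustrates the rule "raise for a missing path to check, create the directory for the check file"; the original script spreads these checks across several functions.

```python
import os
import tempfile


def check_paths(target, hash_file):
    """Validate the path to protect and prepare the hash file's directory."""
    # The data to protect must already exist; otherwise nothing can be hashed.
    if not os.path.exists(target):
        raise RuntimeError("cannot access '%s': No such file or directory" % target)
    # The hash file may not exist yet, so only its parent directory is created.
    parent = os.path.dirname(os.path.abspath(hash_file))
    os.makedirs(parent, exist_ok=True)
    return os.path.abspath(target), os.path.abspath(hash_file)


# Demo inside a temporary directory.
demo_dir = tempfile.mkdtemp()
print(check_paths(demo_dir, os.path.join(demo_dir, "hashes", "data.json")))
```

Taking `os.path.abspath()` of the hash file before calling `os.path.dirname()` avoids an empty directory name when the hash file is given as a bare file name.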

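2. On the first run, the script saves the hash values to be verified into the check file. On later runs, it reads that saved file and compares it with the hash values of the files currently in the directory being checked. If a hash differs, the path of that file is displayed; if all hashes match, a success message is printed.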
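The generate-then-compare flow of step 2 can be sketched as below, assuming Python 3. `build_manifest` and `compare_manifests` are hypothetical names. Note one deliberate difference: this sketch also reports files added since the first run, whereas the original script only walks the paths recorded in the saved check file.

```python
import hashlib
import json
import os
import tempfile


def build_manifest(root):
    """Map every regular file under root to its SHA-256 hex digest."""
    manifest = {}
    for top, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(top, name)
            # Whole-file read keeps the sketch short; use block reads for big files.
            with open(path, "rb") as f:
                manifest[path] = hashlib.sha256(f.read()).hexdigest()
    return manifest


def compare_manifests(old, new):
    """Return (changed, missing, added) path lists between two manifests."""
    changed = sorted(p for p in old if p in new and old[p] != new[p])
    missing = sorted(p for p in old if p not in new)
    added = sorted(p for p in new if p not in old)
    return changed, missing, added


# Demo: first run saves the manifest as JSON, second run compares against it.
root = tempfile.mkdtemp()
sample = os.path.join(root, "sample.txt")
with open(sample, "w") as f:
    f.write("original")
saved = json.dumps(build_manifest(root))   # what the first run would store
with open(sample, "w") as f:
    f.write("tampered")
changed, missing, added = compare_manifests(json.loads(saved), build_manifest(root))
print("changed:", changed, "missing:", missing, "added:", added)
```

If the check file itself lives inside the checked directory, it shows up under `added` on the second run, so in practice it is worth storing it elsewhere or excluding it.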

First run:

Second run (check passed):

Check failed:

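3. When files have changed and you want to update the data in the check file, you can use the remakeDataIntegrity() function to delete the saved check file, so that the next run generates a new one.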
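A minimal sketch of the reset step, assuming Python 3: the check file is deleted only after a yes/no confirmation. `reset_hash_file` and its `ask` parameter are hypothetical; `ask` exists only so the prompt can be replaced in tests. Unlike the script's own confirm() helper, which treats an empty answer as yes, this sketch treats it as no.

```python
import os
import tempfile


def reset_hash_file(hash_file, ask=input):
    """Delete hash_file after a yes/no confirmation; return True if deleted."""
    if not os.path.exists(hash_file):
        return False
    answer = ask("remove '%s'? [y/N] " % hash_file).strip().lower()
    if answer in ("y", "yes"):
        os.remove(hash_file)
        return True
    # Anything else, including an empty answer, keeps the file.
    return False


# Demo with a canned answer instead of a real prompt.
fd, demo_file = tempfile.mkstemp()
os.close(fd)
print(reset_hash_file(demo_file, ask=lambda prompt: "y"))
```

Defaulting to "no" is the more cautious choice for a destructive action, since deleting the check file discards the baseline that later validation depends on.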

Testing on Linux:

The latest code is available on GitHub: https://github.com/DingGuodong/LinuxBashShellScriptForOps/blob/master/functions/security/checkDataIntegrity.py

The code is as follows:

 
    #!/usr/bin/python
    # encoding: utf-8
    # -*- coding: utf8 -*-
    """
    Created by PyCharm.
    File:               LinuxBashShellScriptForOps:checkDataIntegrity.py
    User:               Guodong
    Create Date:        2017/2/14
    Create Time:        14:45

    Python script to check data integrity on UNIX/Linux or Windows
    accept options using 'docopt' module, using 'docopt' to accept parameters and command switch

    Usage:
      checkDataIntegrity.py [-g FILE HASH_FILE]
      checkDataIntegrity.py [-c FILE HASH_FILE]
      checkDataIntegrity.py [-r HASH_FILE]
      checkDataIntegrity.py generate FILE HASH_FILE
      checkDataIntegrity.py validate FILE HASH_FILE
      checkDataIntegrity.py reset HASH_FILE
      checkDataIntegrity.py (--version | -v)
      checkDataIntegrity.py --help | -h | -?

    Arguments:
      FILE                  the path to single file or directory to data protect
      HASH_FILE             the path to hash data saved

    Options:
      -? -h --help          show this help message and exit
      -v --version          show version and exit

    Example, try:
      checkDataIntegrity.py generate /tmp /tmp/data.json
      checkDataIntegrity.py validate /tmp /tmp/data.json
      checkDataIntegrity.py reset /tmp/data.json
      checkDataIntegrity.py -g /tmp /tmp/data.json
      checkDataIntegrity.py -c /tmp /tmp/data.json
      checkDataIntegrity.py -r /tmp/data.json
      checkDataIntegrity.py --help
    """
    # NOTE: this script targets Python 2.7 (print statement, unicode(), raw_input()).
    from docopt import docopt
    import os
    import sys
    import hashlib


    def get_hash_sum(filename, method="sha256", block_size=65536):
        """Return the hex digest of a regular file, read in blocks of block_size bytes."""
        if not os.path.exists(filename):
            raise RuntimeError("cannot open '%s' (No such file or directory)" % filename)
        if not os.path.isfile(filename):
            raise RuntimeError("'%s' :not a regular file" % filename)

        if "md5" in method:
            checksum = hashlib.md5()
        elif "sha1" in method:
            checksum = hashlib.sha1()
        elif "sha256" in method:
            checksum = hashlib.sha256()
        elif "sha384" in method:
            checksum = hashlib.sha384()
        elif "sha512" in method:
            checksum = hashlib.sha512()
        else:
            raise RuntimeError("unsupported method %s" % method)

        with open(filename, 'rb') as f:
            buf = f.read(block_size)
            while len(buf) > 0:
                checksum.update(buf)
                buf = f.read(block_size)
        return checksum.hexdigest()


    def makeDataIntegrity(path):
        """Build a dict mapping each path under `path` to its hash ("" for directories)."""
        path = unicode(path, 'utf8')  # For Chinese Non-ASCII character

        if not os.path.exists(path):
            raise RuntimeError("Error: cannot access %s: No such file or directory" % path)
        elif os.path.isfile(path):
            dict_all = dict()
            dict_all[os.path.abspath(path)] = get_hash_sum(path)
            return dict_all
        elif os.path.isdir(path):
            dict_nondirs = dict()
            dict_dirs = dict()
            for top, dirs, nondirs in os.walk(path, followlinks=True):
                for item in nondirs:
                    # Do NOT use os.path.abspath(item) here, else it will make a serious bug because of
                    # os.path.abspath(item) return "os.getcwd()" + "filename" in some case.
                    dict_nondirs[os.path.join(top, item)] = get_hash_sum(os.path.join(top, item))
                for item in dirs:
                    dict_dirs[os.path.join(top, item)] = r""
            # Merge with update() rather than dict(a, **b), which relies on keyword-argument keys.
            dict_all = dict(dict_dirs)
            dict_all.update(dict_nondirs)
            return dict_all


    def saveDataIntegrity(data, filename):
        """Serialize the hash dict to `filename` as JSON, creating its directory if needed."""
        import json
        data_to_save = json.dumps(data, encoding='utf-8')
        directory = os.path.dirname(filename)
        # dirname is '' for a bare file name; os.makedirs('') would raise an error.
        if directory and not os.path.exists(directory):
            os.makedirs(directory)
        with open(filename, 'wb') as f:
            f.write(data_to_save)


    def readDataIntegrity(filename):
        """Load the saved hash dict from `filename`."""
        import json
        if not os.path.exists(filename):
            raise RuntimeError("cannot open '%s' (No such file or directory)" % filename)
        with open(filename, 'rb') as f:
            data = json.loads(f.read())
        # Always return the data, so an empty saved dict does not turn into None.
        return data


    def remakeDataIntegrity(filename):
        """Delete the saved check file after confirmation, so the next run regenerates it."""

        def confirm(question, default=True):
            """
            Ask user a yes/no question and return their response as True or False.

            :parameter question:
            ``question`` should be a simple, grammatically complete question such as
            "Do you wish to continue?", and will have a string similar to " [Y/n] "
            appended automatically. This function will *not* append a question mark for
            you.
            The prompt string, if given,is printed without a trailing newline before reading.

            :parameter default:
            By default, when the user presses Enter without typing anything, "yes" is
            assumed. This can be changed by specifying ``default=False``.

            :return True or False
            """
            # Set up suffix
            if default:
                # suffix = "Y/n, default=True"
                suffix = "Y/n"
            else:
                # suffix = "y/N, default=False"
                suffix = "y/N"
            # Loop till we get something we like
            while True:
                response = raw_input("%s [%s] " % (question, suffix)).lower()
                # Default
                if not response:
                    return default
                # Yes
                if response in ['y', 'yes']:
                    return True
                # No
                if response in ['n', 'no']:
                    return False
                # Didn't get empty, yes or no, so complain and loop
                print("I didn't understand you. Please specify '(y)es' or '(n)o'.")

        if os.path.exists(filename):
            if confirm("[warning] remake data integrity file \'%s\'?" % filename):
                os.remove(filename)
                print "[successful] data integrity file \'%s\' has been remade." % filename
                sys.exit(0)
            else:
                print "[warning] data integrity file \'%s\' is not remade." % filename
                sys.exit(0)
        else:
            print >> sys.stderr, "[error] data integrity file \'%s\' does not exist." % filename


    def checkDataIntegrity(path_to_check, file_to_save):
        """Create the check file on the first run; compare against it on later runs."""
        from time import sleep

        if not os.path.exists(file_to_save):
            print "[info] data integrity file \'%s\' does not exist." % file_to_save
            print "[info] make a data integrity file to \'%s\'" % file_to_save
            data = makeDataIntegrity(path_to_check)
            saveDataIntegrity(data, file_to_save)
            print "[successful] make a data integrity file to \'%s\', finished!" % file_to_save,
            print "Now you can use this script later to check data integrity."
        else:
            old_data = readDataIntegrity(file_to_save)
            new_data = makeDataIntegrity(path_to_check)
            all_ok = True  # stays True only if every saved entry still matches
            # Only paths recorded in the saved file are checked; files added later are not reported.
            for item in old_data.keys():
                try:
                    if not old_data[item] == new_data[item]:
                        print >> sys.stderr, new_data[item], item
                        sleep(0.01)  # give stderr a moment so the two lines stay in order
                        print "\told hash data is %s" % old_data[item], item
                        all_ok = False
                except KeyError as e:
                    # The path was recorded earlier but no longer exists.
                    print >> sys.stderr, "[error]", e.message, "Not Exist!"
                    all_ok = False
            if all_ok:
                print "[ successful ] passed, All files integrity is ok!"


    if __name__ == '__main__':
        arguments = docopt(__doc__, version='1.0.0rc2')
        if arguments['-r'] or arguments['reset']:
            if arguments['HASH_FILE']:
                remakeDataIntegrity(arguments['HASH_FILE'])
        elif arguments['-g'] or arguments['generate']:
            if arguments['FILE'] and arguments['HASH_FILE']:
                checkDataIntegrity(arguments['FILE'], arguments['HASH_FILE'])
        elif arguments['-c'] or arguments['validate']:
            if arguments['FILE'] and arguments['HASH_FILE']:
                checkDataIntegrity(arguments['FILE'], arguments['HASH_FILE'])
        else:
            print >> sys.stderr, "bad parameters"
            sys.stderr.flush()
            print docopt(__doc__, argv="--help")

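tag: Python file integrity verification, file integrity, hash verification

This world belongs to the talented,
and to the diligent,
and even more to those
who work diligently in the field where their talent lies.

Keep going, together!

--end--

This article is reposted from urey_pp's 51CTO blog. Original link: http://blog.51cto.com/dgd2010/1897799 (to repost it, please contact the original author directly).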
