
Another Approach to File Monitoring

Monitoring Files Through the USN Journal

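An earlier post, on building your own Everything, described how to monitor and sync files with the ReadDirectoryChangesW function. That approach, however, seems to miss some files when it watches an entire disk.

This post introduces another approach: monitoring files by reading the USN journal.

The code is open source on GitHub, and the earlier version based on the ReadDirectoryChanges API is kept there as well.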

File-Engine/C++/fileMonitor at master · XUANXUQAQ/File-Engine (github.com)

File-Engine/C++/fileMonitorReadDirChanges at master · XUANXUQAQ/File-Engine (github.com)

The code and background material are adapted from:

windows - USN NFTS change notification event interrupt - Stack Overflow

c++ - How can I detect only deleted, changed, and created files on a volume? - Stack Overflow

How It Works

Obtaining Directory Change Notifications - Win32 apps | Microsoft Learn

This article on the Microsoft site explains in detail how to obtain directory change notifications.

Change Journals - Win32 apps | Microsoft Learn

Keeping an Eye on Your NTFS Drives: the Windows 2000 Change Journal Explained | Microsoft Learn

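These two articles explain in detail what the NTFS USN journal is, along with its data structures.

In short, every time a file changes, NTFS writes a record to the USN journal. By watching for newly written USN records, we can tell that a file has changed and so monitor the volume.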
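To make the idea concrete, here is a toy model of an append-only journal and a reader that keeps a cursor. ToyJournal, JournalEntry and PollChanges are names invented for this illustration; they are not part of the Win32 API or of File-Engine, and a real USN is an offset into the journal rather than a simple counter.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for one journal entry; the real layout is USN_RECORD_V2.
struct JournalEntry {
    std::int64_t usn;   // position of the entry in the journal
    std::string  name;  // file the entry refers to
};

// Toy append-only journal: every change gets a strictly larger USN.
class ToyJournal {
public:
    void Append(const std::string& name) {
        entries_.push_back({next_usn_, name});
        next_usn_ += 1;  // simplification: a real USN grows by the record size
    }

    std::int64_t NextUsn() const { return next_usn_; }

    // Return every entry with usn >= start_usn, like reading from a cursor.
    std::vector<JournalEntry> ReadFrom(std::int64_t start_usn) const {
        assert(start_usn >= 0);
        std::vector<JournalEntry> out;
        for (const auto& e : entries_) {
            if (e.usn >= start_usn) out.push_back(e);
        }
        return out;
    }

private:
    std::vector<JournalEntry> entries_;
    std::int64_t next_usn_ = 0;
};

// One polling step of a watcher: read what is new, then advance the cursor.
std::vector<std::string> PollChanges(const ToyJournal& journal, std::int64_t& cursor) {
    std::vector<std::string> changed;
    for (const auto& e : journal.ReadFrom(cursor)) changed.push_back(e.name);
    cursor = journal.NextUsn();
    return changed;
}
```

A watcher that starts its cursor at NextUsn() sees nothing until a new entry is appended, which is the same reason the real watcher below starts from the journal's NextUsn.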

Implementation

Defining the Watcher Class

First, define an NTFSChangesWatcher class.

#pragma once
#include <memory>
#include <string>
#include <Windows.h>
class NTFSChangesWatcher
{
public:
    NTFSChangesWatcher(char drive_letter);
    ~NTFSChangesWatcher() = default;

    // Method which runs an infinite loop and waits for new update sequence number in a journal.
    // The thread is blocked till the new USN record created in the journal.
    void WatchChanges(const bool* flag, void(*)(const std::u16string&), void(*)(const std::u16string&));

private:
    HANDLE OpenVolume(char drive_letter);

    bool CreateJournal(HANDLE volume);

    bool LoadJournal(HANDLE volume, USN_JOURNAL_DATA* journal_data);

    bool WaitForNextUsn(PREAD_USN_JOURNAL_DATA read_journal_data) const;

    std::unique_ptr<READ_USN_JOURNAL_DATA> GetWaitForNextUsnQuery(USN start_usn);

    bool ReadJournalRecords(PREAD_USN_JOURNAL_DATA journal_query, LPVOID buffer,
        DWORD& byte_count) const;

    USN ReadChangesAndNotify(USN low_usn, char* buffer, void(*)(const std::u16string&), void(*)(const std::u16string&));

    std::unique_ptr<READ_USN_JOURNAL_DATA> GetReadJournalQuery(USN low_usn);

    void showRecord(std::u16string& full_path, USN_RECORD* record);

    char drive_letter_;

    HANDLE volume_;

    std::unique_ptr<USN_JOURNAL_DATA> journal_;

    DWORDLONG journal_id_;

    USN last_usn_;

    USN max_usn_;

    // Flags, which indicate which types of changes you want to listen.
    static const int FILE_CHANGE_BITMASK;

    static const int kBufferSize;
};

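The public entry point is WatchChanges:

void WatchChanges(const bool* flag, void(*)(const std::u16string&), void(*)(const std::u16string&));

The function takes three parameters. The first is the stop flag: once the value it points to is set to false, the loop exits. The second is a pointer to the callback invoked when a file is added, and the third is a pointer to the callback invoked when a file is removed.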
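The callback contract can be tried out without a Windows volume. The sketch below uses MockWatchChanges, a hypothetical stand-in with the same parameter list that replays a fixed list of made-up paths instead of reading a journal; OnFileAdded, OnFileRemoved and the globals are likewise invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Collected events. Plain globals are used because the callbacks are bare
// function pointers and therefore cannot capture any state.
static std::vector<std::u16string> g_added;
static std::vector<std::u16string> g_removed;
static bool g_keep_running = true;

void OnFileAdded(const std::u16string& path) { g_added.push_back(path); }

void OnFileRemoved(const std::u16string& path) {
    g_removed.push_back(path);
    g_keep_running = false;  // ask the loop to stop after the first removal
}

// Hypothetical stand-in with the same parameters as WatchChanges.
// It replays three fixed events: add, remove, add.
void MockWatchChanges(const bool* flag,
                      void (*added)(const std::u16string&),
                      void (*removed)(const std::u16string&)) {
    assert(flag && added && removed);
    const std::u16string events[] = {u"C:\\a.txt", u"C:\\b.txt", u"C:\\c.txt"};
    std::size_t i = 0;
    while (*flag && i < 3) {
        if (i == 1) {
            removed(events[i]);
        } else {
            added(events[i]);
        }
        ++i;
    }
}
```

Calling MockWatchChanges(&g_keep_running, OnFileAdded, OnFileRemoved) stops after the second event because the removal callback clears the flag. Unlike this mock, the real WatchChanges in this post calls delete on flag when the loop ends, so it expects a pointer obtained from new; and if the flag is cleared from another thread, something like std::atomic<bool> would be the safer type, though that is a suggestion rather than what the project does.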

Initializing the USN Journal

const int NTFSChangesWatcher::kBufferSize = 1024 * 1024 / 2;

const int NTFSChangesWatcher::FILE_CHANGE_BITMASK = USN_REASON_RENAME_NEW_NAME | USN_REASON_RENAME_OLD_NAME;

NTFSChangesWatcher::NTFSChangesWatcher(char drive_letter) :
    drive_letter_(drive_letter),
    volume_(nullptr),
    journal_id_(0),
    last_usn_(0),
    max_usn_(0)  // zero-initialized so the members are defined even if loading fails
{
    volume_ = OpenVolume(drive_letter_);

    journal_ = std::make_unique<USN_JOURNAL_DATA>();

    if (const bool res = LoadJournal(volume_, journal_.get()); !res) {
        fprintf(stderr, "Failed to load journal");
        return;
    }
    max_usn_ = journal_->MaxUsn;
    journal_id_ = journal_->UsnJournalID;
    // Start watching from the next USN to be written, i.e. ignore existing records.
    last_usn_ = journal_->NextUsn;
}

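OpenVolume opens the volume and returns a HANDLE. The constructor then allocates a USN_JOURNAL_DATA structure to receive the journal's metadata and fills it in through LoadJournal.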

HANDLE NTFSChangesWatcher::OpenVolume(const char drive_letter)
{
    // Volume path of the form \\?\X: ; index 4 is the drive letter placeholder.
    wchar_t pattern[10] = L"\\\\?\\a:";

    pattern[4] = static_cast<wchar_t>(drive_letter);

    // pattern is a wide string, so call the W variant explicitly.
    const HANDLE volume = CreateFileW(
        pattern, // lpFileName
        // also could be | FILE_READ_DATA | FILE_READ_ATTRIBUTES | SYNCHRONIZE
        GENERIC_READ | GENERIC_WRITE | SYNCHRONIZE, // dwDesiredAccess
        FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE, // share mode
        nullptr, // default security attributes
        OPEN_EXISTING, // disposition
        // It is always set, no matter whether you explicitly specify it or not. This means, that access
        // must be aligned with sector size so we can only read a number of bytes that is a multiple of the sector size.
        FILE_FLAG_NO_BUFFERING, // flags and attributes
        nullptr // no template file
    );

    if (volume == INVALID_HANDLE_VALUE) {
        // An error occurred!
        fprintf(stderr, "Failed to open volume");
        return nullptr;
    }

    return volume;
}

With the HANDLE in hand, LoadJournal queries the USN journal. If the first query fails, it tries to create the journal and then queries again.

bool NTFSChangesWatcher::LoadJournal(HANDLE volume, USN_JOURNAL_DATA* journal_data)
{

    DWORD byte_count;

    // Try to open journal.
    if (!DeviceIoControl(volume,
        FSCTL_QUERY_USN_JOURNAL,
        nullptr,
        0,
        journal_data,
        sizeof(*journal_data),
        &byte_count,
        nullptr))
    {
        // If failed (for example, in case journaling is disabled), create journal and retry.

        if (CreateJournal(volume)) {
            return LoadJournal(volume, journal_data);
        }

        return false;
    }
    return true;
}


bool NTFSChangesWatcher::CreateJournal(HANDLE volume)
{

    DWORD byte_count;
    // Zero-initialized: MaximumSize and AllocationDelta are both left at 0.
    CREATE_USN_JOURNAL_DATA create_journal_data{};

    const bool ok = DeviceIoControl(volume, // handle to volume
        FSCTL_CREATE_USN_JOURNAL, // dwIoControlCode
        &create_journal_data, // input buffer
        sizeof(create_journal_data), // size of input buffer
        nullptr, // lpOutBuffer
        0, // nOutBufferSize
        &byte_count, // number of bytes returned
        nullptr) != 0; // OVERLAPPED structure

    if (!ok) {
        // An error occurred!
        fprintf(stderr, "Failed to create journal, error %lu\n", GetLastError());
    }

    return ok;
}

Starting to Monitor

Once initialization has finished, WatchChanges can be called.

void NTFSChangesWatcher::WatchChanges(const bool* flag,
    void(*file_added_callback_func)(const std::u16string&),
    void(*file_removed_callback_func)(const std::u16string&))
{
    const auto u_buffer = std::make_unique<char[]>(kBufferSize);

    const auto read_journal_query = GetWaitForNextUsnQuery(last_usn_);

    while (*flag)
    {
        // This function does not return until new USN record created.
        WaitForNextUsn(read_journal_query.get());
        last_usn_ = ReadChangesAndNotify(read_journal_query->StartUsn,
            u_buffer.get(),
            file_added_callback_func,
            file_removed_callback_func);
        // Continue from where the last read stopped.
        read_journal_query->StartUsn = last_usn_;
    }
    // Ownership of flag passes to this function: the caller must have allocated it with new.
    delete flag;
}

There are only two core methods: WaitForNextUsn and ReadChangesAndNotify.

Let's look at WaitForNextUsn first.

bool NTFSChangesWatcher::WaitForNextUsn(PREAD_USN_JOURNAL_DATA read_journal_data) const
{

    DWORD bytes_read;

    // This function does not return until new USN record created.
    // The output buffer is only large enough for the leading "next USN" value.
    const bool ok = DeviceIoControl(volume_,
        FSCTL_READ_USN_JOURNAL,
        read_journal_data,
        sizeof(*read_journal_data),
        &read_journal_data->StartUsn,
        sizeof(read_journal_data->StartUsn),
        &bytes_read,
        nullptr) != 0;
    return ok;
}

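It issues the FSCTL_READ_USN_JOURNAL control code through DeviceIoControl. Because the query was initialized to start at the journal's next USN, that is, just past the last existing record, the call blocks until some file operation makes NTFS write a new USN record.

The last parameter, lpOverlapped, is passed as NULL here. Monitoring file changes needs a blocking call, and an asynchronous call would only make things more awkward.

DeviceIoControl is already explained in many places online, so here is just the MSDN page.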

DeviceIoControl function (ioapiset.h) - Win32 apps | Microsoft Learn

and the page for FSCTL_READ_USN_JOURNAL:

FSCTL_READ_USN_JOURNAL - Win32 apps | Microsoft Learn
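The post does not show GetWaitForNextUsnQuery or GetReadJournalQuery, so the following is only a guess at how the two queries might differ. ReadUsnJournalDataV0 is a portable mirror of the documented READ_USN_JOURNAL_DATA_V0 fields so that the sketch builds without the Windows SDK; MakeWaitQuery, MakeReadQuery and the chosen field values are assumptions, not the project's code.

```cpp
#include <cassert>
#include <cstdint>

// Portable mirror of the READ_USN_JOURNAL_DATA_V0 fields.
// On Windows, use the real type from <winioctl.h> instead.
struct ReadUsnJournalDataV0 {
    std::int64_t  StartUsn;           // USN
    std::uint32_t ReasonMask;         // DWORD
    std::uint32_t ReturnOnlyOnClose;  // DWORD
    std::uint64_t Timeout;            // DWORDLONG
    std::uint64_t BytesToWaitFor;     // DWORDLONG
    std::uint64_t UsnJournalID;       // DWORDLONG
};

// Assumed shape of a "wait" query: a nonzero BytesToWaitFor asks the call to
// wait for new data instead of returning at once when nothing is available.
ReadUsnJournalDataV0 MakeWaitQuery(std::int64_t start_usn, std::uint64_t journal_id) {
    assert(start_usn >= 0);
    ReadUsnJournalDataV0 q{};
    q.StartUsn = start_usn;
    q.ReasonMask = 0xFFFFFFFFu;  // accept every reason
    q.ReturnOnlyOnClose = 0;
    q.Timeout = 0;
    q.BytesToWaitFor = 1;
    q.UsnJournalID = journal_id;
    return q;
}

// Assumed shape of a "read" query: BytesToWaitFor of zero means the call
// returns immediately with whatever records already exist.
ReadUsnJournalDataV0 MakeReadQuery(std::int64_t low_usn, std::uint64_t journal_id,
                                   std::uint32_t reason_mask) {
    assert(low_usn >= 0);
    ReadUsnJournalDataV0 q{};
    q.StartUsn = low_usn;
    q.ReasonMask = reason_mask;
    q.ReturnOnlyOnClose = 0;
    q.Timeout = 0;
    q.BytesToWaitFor = 0;
    q.UsnJournalID = journal_id;
    return q;
}
```

The exact interaction of Timeout and BytesToWaitFor is described on the READ_USN_JOURNAL_DATA_V0 page and is worth checking there before relying on this sketch.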

When this method returns, a new USN record has appeared on the volume, and execution moves on to the next function:

ReadChangesAndNotify

USN NTFSChangesWatcher::ReadChangesAndNotify(USN low_usn,
    char* buffer,
    void(*file_added_callback_func)(const std::u16string&),
    void(*file_removed_callback_func)(const std::u16string&))
{

    DWORD byte_count;

    const auto journal_query = GetReadJournalQuery(low_usn);
    memset(buffer, 0, kBufferSize);
    if (!ReadJournalRecords(journal_query.get(), buffer, byte_count))
    {
        // An error occurred.
        return low_usn;
    }

    // The buffer starts with the next USN, followed by variable-length records.
    auto record = reinterpret_cast<USN_RECORD*>(reinterpret_cast<USN*>(buffer) + 1);
    const auto record_end = reinterpret_cast<USN_RECORD*>(reinterpret_cast<BYTE*>(buffer) + byte_count);

    std::u16string full_path;
    for (; record < record_end;
        record = reinterpret_cast<USN_RECORD*>(reinterpret_cast<BYTE*>(record) + record->RecordLength))
    {
        const auto reason = record->Reason;
        full_path.clear();
        // It is really strange, but some system files creating and deleting at the same time.
        if ((reason & USN_REASON_FILE_CREATE) && (reason & USN_REASON_FILE_DELETE))
        {
            continue;
        }
        if ((reason & USN_REASON_FILE_CREATE) && (reason & USN_REASON_CLOSE))
        {
            showRecord(full_path, record);
            file_added_callback_func(full_path);
        }
        else if ((reason & USN_REASON_FILE_DELETE) && (reason & USN_REASON_CLOSE))
        {
            showRecord(full_path, record);
            file_removed_callback_func(full_path);
        }
        else if (reason & FILE_CHANGE_BITMASK)
        {
            // A rename is reported as "old name removed" plus "new name added".
            if (reason & USN_REASON_RENAME_OLD_NAME)
            {
                showRecord(full_path, record);
                file_removed_callback_func(full_path);
            }
            else if (reason & USN_REASON_RENAME_NEW_NAME)
            {
                showRecord(full_path, record);
                file_added_callback_func(full_path);
            }
        }
    }
    // The leading USN in the buffer is where the next read should start.
    return *reinterpret_cast<USN*>(buffer);
}

Here ReadJournalRecords calls DeviceIoControl with FSCTL_READ_USN_JOURNAL to read out the new USN records.
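The buffer returned by that call is an 8-byte next USN followed by variable-length records, which is why the loop in ReadChangesAndNotify skips one USN and then advances by RecordLength. The sketch below parses such a buffer portably. UsnRecordHeader mirrors the fixed fields of USN_RECORD_V2 as documented; ParsedRecord and ParseJournalBuffer are names made up for this example, and the bounds checks are an addition the article's loop does not have.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Portable mirror of the fixed part of USN_RECORD_V2 (same field order and sizes).
struct UsnRecordHeader {
    std::uint32_t RecordLength;
    std::uint16_t MajorVersion;
    std::uint16_t MinorVersion;
    std::uint64_t FileReferenceNumber;
    std::uint64_t ParentFileReferenceNumber;
    std::int64_t  Usn;
    std::int64_t  TimeStamp;
    std::uint32_t Reason;
    std::uint32_t SourceInfo;
    std::uint32_t SecurityId;
    std::uint32_t FileAttributes;
    std::uint16_t FileNameLength;  // in bytes, not characters
    std::uint16_t FileNameOffset;  // from the start of the record
};

// Bytes occupied by the fixed fields, excluding trailing struct padding.
constexpr std::size_t kFixedSize =
    offsetof(UsnRecordHeader, FileNameOffset) + sizeof(std::uint16_t);

struct ParsedRecord {
    std::uint64_t  file_ref;
    std::uint64_t  parent_ref;
    std::uint32_t  reason;
    std::u16string name;
};

// Parse "next USN + records" and return the records; next_usn receives the
// leading USN. Parsing stops at the first record that looks malformed.
std::vector<ParsedRecord> ParseJournalBuffer(const unsigned char* buffer,
                                             std::size_t byte_count,
                                             std::int64_t& next_usn) {
    std::vector<ParsedRecord> out;
    if (byte_count < sizeof(std::int64_t)) return out;
    std::memcpy(&next_usn, buffer, sizeof(next_usn));

    std::size_t pos = sizeof(std::int64_t);
    while (pos + kFixedSize <= byte_count) {
        UsnRecordHeader h{};
        std::memcpy(&h, buffer + pos, kFixedSize);  // memcpy avoids alignment issues
        if (h.RecordLength < kFixedSize || pos + h.RecordLength > byte_count) break;
        if (static_cast<std::size_t>(h.FileNameOffset) + h.FileNameLength > h.RecordLength) break;

        ParsedRecord r{h.FileReferenceNumber, h.ParentFileReferenceNumber, h.Reason, {}};
        const std::size_t chars = h.FileNameLength / sizeof(char16_t);
        r.name.resize(chars);  // the name is UTF-16 and not null-terminated
        std::memcpy(&r.name[0], buffer + pos + h.FileNameOffset, chars * sizeof(char16_t));
        out.push_back(std::move(r));

        pos += h.RecordLength;  // records are variable length
    }
    return out;
}
```

Copying each header out with memcpy is a design choice for portability; the article's code casts the buffer to USN_RECORD* directly, which is the usual pattern on Windows.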

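Once the records are read, the Reason field of each USN_RECORD tells us whether the file was created or deleted. There are many other USN_REASON values, but since only file additions and removals matter here, the code listens only for: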

  • USN_REASON_FILE_CREATE

  • USN_REASON_FILE_DELETE

  • USN_REASON_RENAME_OLD_NAME

  • USN_REASON_RENAME_NEW_NAME

The full list of reasons is documented here:

USN_RECORD_V2 - Win32 apps | Microsoft Learn
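The decision logic in ReadChangesAndNotify can be isolated as a pure function over the reason bits, which makes it easy to test. In this sketch the constant values are copied from winioctl.h so it builds without the Windows SDK; Change and Classify are hypothetical names introduced for the example.

```cpp
#include <cassert>
#include <cstdint>

// Values of the USN_REASON_* flags as defined in <winioctl.h>.
constexpr std::uint32_t kReasonFileCreate    = 0x00000100u;  // USN_REASON_FILE_CREATE
constexpr std::uint32_t kReasonFileDelete    = 0x00000200u;  // USN_REASON_FILE_DELETE
constexpr std::uint32_t kReasonRenameOldName = 0x00001000u;  // USN_REASON_RENAME_OLD_NAME
constexpr std::uint32_t kReasonRenameNewName = 0x00002000u;  // USN_REASON_RENAME_NEW_NAME
constexpr std::uint32_t kReasonClose         = 0x80000000u;  // USN_REASON_CLOSE

enum class Change { None, Added, Removed };

// Same branch order as ReadChangesAndNotify.
Change Classify(std::uint32_t reason) {
    const bool created  = (reason & kReasonFileCreate) != 0;
    const bool deleted  = (reason & kReasonFileDelete) != 0;
    const bool closed   = (reason & kReasonClose) != 0;
    const bool old_name = (reason & kReasonRenameOldName) != 0;
    const bool new_name = (reason & kReasonRenameNewName) != 0;

    if (created && deleted) return Change::None;  // created and deleted in one go
    if (created && closed)  return Change::Added;
    if (deleted && closed)  return Change::Removed;
    if (old_name)           return Change::Removed;  // rename: old path disappears
    if (new_name)           return Change::Added;    // rename: new path appears
    return Change::None;
}
```

For example, Classify(kReasonFileCreate | kReasonClose) yields Change::Added, while a create without a close yields Change::None, because create and delete are only reported once the handle has been closed.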

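Getting the Full File Path

A USN record holds only the file name and file reference numbers, so we have to query upward through the parent file reference number, one level at a time, and join the names into a full path.

That is the job of the showRecord function used above. It takes two parameters: full_path, and record, a USN_RECORD pointer to the file record whose full path is to be assembled.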

void NTFSChangesWatcher::showRecord(std::u16string& full_path, USN_RECORD* record)
{
    static std::wstring sep_wstr(L"\\");
    static std::u16string sep(sep_wstr.begin(), sep_wstr.end());

    // Prepend this record's name to the path built so far.
    const indexer_common::FileInfo file_info(*record, drive_letter_);
    if (full_path.empty())
    {
        full_path += file_info.GetName();
    }
    else
    {
        full_path = file_info.GetName() + sep + full_path;
    }
    DWORD byte_count = 1;
    auto buffer = std::make_unique<char[]>(kBufferSize);

    // Enumerate starting at the parent's file reference number.
    MFT_ENUM_DATA_V0 med;
    med.StartFileReferenceNumber = record->ParentFileReferenceNumber;
    med.LowUsn = 0;
    med.HighUsn = max_usn_;

    if (!DeviceIoControl(volume_,
        FSCTL_ENUM_USN_DATA,
        &med,
        sizeof(med),
        buffer.get(),
        kBufferSize,
        &byte_count,
        nullptr))
    {
        return;
    }

    auto* parent_record = reinterpret_cast<USN_RECORD*>(reinterpret_cast<USN*>(buffer.get()) + 1);

    // If the first record returned is not the parent, we are at the top: add the drive prefix.
    if (parent_record->FileReferenceNumber != record->ParentFileReferenceNumber)
    {
        static std::wstring colon_wstr(L":");
        static std::u16string colon(colon_wstr.begin(), colon_wstr.end());
        std::string drive;
        drive += drive_letter_;
        auto&& w_drive = string2wstring(drive);
        const std::u16string drive_u16(w_drive.begin(), w_drive.end());
        full_path = drive_u16 + colon + sep + full_path;
        return;
    }
    showRecord(full_path, parent_record);
}

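The function first takes the file name and the parent file reference number, then defines an MFT_ENUM_DATA. MFT_ENUM_DATA_V1 failed with Error 87, that is ERROR_INVALID_PARAMETER, so the code uses MFT_ENUM_DATA_V0 instead.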

System Error Codes (0-499) (WinError.h) - Win32 apps | Microsoft Learn

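StartFileReferenceNumber is set to record->ParentFileReferenceNumber, and the upper bound HighUsn to the max_usn_ saved during initialization.

DeviceIoControl is then called with the FSCTL_ENUM_USN_DATA control code, which reads out the USN record of record's parent folder.

The parent folder's record is then passed back in as record for a recursive query.

Querying upward level by level, each name is prepended to full_path, and the recursion ends when the top level is reached.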
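The upward walk itself does not depend on Windows and can be sketched against an in-memory table. This is a simplified, iterative variant for illustration only: FileIndex, Node and BuildFullPath are hypothetical, and the lookup in a map stands in for the FSCTL_ENUM_USN_DATA query that showRecord performs at each level.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

struct Node {
    std::uint64_t  parent_ref;  // parent file reference number
    std::u16string name;
};

// Hypothetical in-memory index: file reference number -> (parent, name).
using FileIndex = std::unordered_map<std::uint64_t, Node>;

// Walk parent references upward, prepending names, until a parent is not in
// the index (treated as the volume root). max_depth guards against cycles.
std::u16string BuildFullPath(const FileIndex& index, std::uint64_t file_ref,
                             char drive_letter, int max_depth = 256) {
    assert(max_depth > 0);
    std::u16string path;
    std::uint64_t current = file_ref;
    for (int depth = 0; depth < max_depth; ++depth) {
        const auto it = index.find(current);
        if (it == index.end()) break;  // reached the top level
        path = path.empty() ? it->second.name : it->second.name + u"\\" + path;
        current = it->second.parent_ref;
    }
    std::u16string prefix;
    prefix += static_cast<char16_t>(drive_letter);
    prefix += u":\\";
    return prefix + path;
}
```

With entries for Users, me and note.txt chained by parent reference, BuildFullPath returns C:\Users\me\note.txt. An iterative loop with a depth limit is used here instead of recursion so that malformed data cannot cause unbounded work; the article's version recurses and allocates a fresh buffer per level.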

With the full path in hand, the two callbacks can be called to handle it.