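This article looks at how to open, read and write files from inside the kernel.
It also introduces the following path- and file-related structures:
struct filename is one of the kernel's representations of a path. The kernel provides interfaces that convert a path string into this structure.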
struct filename *
getname(const char __user * filename)
{
	return getname_flags(filename, 0);
}

struct filename *
getname_kernel(const char * filename)
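getname() converts a user-space string pointer into a struct filename.
getname_kernel() converts a kernel-space string pointer into a struct filename.
Both interfaces are defined in fs/namei.c.
struct path represents a path inside the kernel and was covered earlier. It has only two members, a vfsmount and a dentry, which identify the mount and the position inside that mount's directory tree.
The kernel exports kern_path(), which converts a path string into a struct path: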
int kern_path(const char *name, unsigned int flags, struct path *path)
{
	struct filename *filename = getname_kernel(name);
	int ret = filename_lookup(AT_FDCWD, filename, flags, path, NULL);

	putname(filename);
	return ret;
}
EXPORT_SYMBOL(kern_path);
If the path comes from a user-space pointer, use user_path_at() instead:
int user_path_at(int dfd, const char __user *name, unsigned flags,
		 struct path *path)
{
	struct filename *filename = getname_flags(name, flags);
	int ret = filename_lookup(dfd, filename, flags, path, NULL);

	putname(filename);
	return ret;
}
EXPORT_SYMBOL(user_path_at);
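struct file represents an open file. It is defined in include/linux/fs.h and is a very large structure.
This section describes how the kernel opens, reads and writes files. The open interfaces are defined in fs/open.c, and the read/write interfaces in fs/read_write.c.
Reading these source files shows which interfaces the kernel exports.
Some material calls sys_open/sys_read directly:
int fd = sys_open(filename, O_RDONLY, 0);
This is the system call function itself. On arm64 kernels, however, CONFIG_ARCH_HAS_SYSCALL_WRAPPER=y is enabled and this symbol no longer exists. See reference 【4】.
In the header include/linux/syscalls.h: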
#ifndef CONFIG_ARCH_HAS_SYSCALL_WRAPPER
asmlinkage long sys_io_setup(unsigned nr_reqs, aio_context_t __user *ctx);
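These declarations are only included when CONFIG_ARCH_HAS_SYSCALL_WRAPPER is not defined. When the macro is defined, the architecture's own wrappers are used instead: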
#ifdef CONFIG_ARCH_HAS_SYSCALL_WRAPPER
#include <asm/syscall_wrapper.h>
#endif /* CONFIG_ARCH_HAS_SYSCALL_WRAPPER */
See arch/arm64/include/asm/syscall_wrapper.h.
So this approach does not work on arm64. Reading /proc/kallsyms shows the following symbols:
# cat /proc/kallsyms | grep sys_open
ffff80008003b754 W compat_sys_open_by_handle_at
ffff80008003bbdc W __arm64_sys_open_by_handle_at
ffff8000800c7374 t do_sys_openat2
ffff8000800c74dc T do_sys_open
ffff8000800c7514 T __arm64_sys_open
ffff8000800c7538 T __arm64_sys_openat
ffff8000800c755c T __arm64_sys_openat2
ffff8000800e74e0 T __arm64_sys_open_tree
ffff80008010b6b8 t proc_sys_open
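According to the reference material, these interfaces expect user-space pointers, so on older kernels a caller passing kernel buffers first had to lift the address limit (get_fs()/set_fs() have since been removed from recent kernels):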
mm_segment_t old_fs = get_fs();
set_fs(KERNEL_DS);
filp_open() is the proper interface for opening a file from the kernel. It is defined in fs/open.c:
/**
 * filp_open - open file and return file pointer
 *
 * @filename:	path to open
 * @flags:	open flags as per the open(2) second argument
 * @mode:	mode for the new file if O_CREAT is set, else ignored
 *
 * This is the helper to open a file from kernelspace if you really
 * have to.  But in generally you should not do this, so please move
 * along, nothing to see here..
 */
struct file *filp_open(const char *filename, int flags, umode_t mode)
{
	struct filename *name = getname_kernel(filename);
	struct file *file = ERR_CAST(name);

	if (!IS_ERR(name)) {
		file = file_open_name(name, flags, mode);
		putname(name);
	}
	return file;
}
EXPORT_SYMBOL(filp_open);
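filp_open() reports failure through the returned pointer itself, so callers must test the result with IS_ERR() instead of comparing it with NULL. The block below is a user-space sketch of that convention, not kernel code: every demo_* name is hypothetical, and the helpers only imitate what ERR_PTR(), PTR_ERR(), IS_ERR() and ERR_CAST() do in the kernel, where the top 4095 address values are reserved for negative errno codes.

```c
#include <errno.h>
#include <stdint.h>

/* User-space imitation of the kernel's error-pointer helpers. */
#define DEMO_MAX_ERRNO 4095

static inline void *demo_err_ptr(long error)
{
	return (void *)(intptr_t)error;      /* encode -errno as a pointer */
}

static inline long demo_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;          /* decode the pointer back to -errno */
}

static inline int demo_is_err(const void *ptr)
{
	/* Only the last DEMO_MAX_ERRNO address values count as errors. */
	return (uintptr_t)ptr >= (uintptr_t)-DEMO_MAX_ERRNO;
}

static inline void *demo_err_cast(const void *ptr)
{
	return (void *)(uintptr_t)ptr;       /* carry an error across pointer types */
}

/* Hypothetical stand-ins for struct filename and struct file. */
struct demo_name { const char *str; };
struct demo_file { const struct demo_name *name; };

static struct demo_file demo_the_file;

/* Same control flow as filp_open(): forward the error, or open the file. */
static inline struct demo_file *demo_open(struct demo_name *name)
{
	struct demo_file *file = demo_err_cast(name);

	if (!demo_is_err(name)) {
		demo_the_file.name = name;
		file = &demo_the_file;
	}
	return file;
}
```

A caller then checks demo_is_err(file) and, on failure, recovers the error code with demo_ptr_err(file), which is the same shape as the IS_ERR()/PTR_ERR() check after filp_open().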
For reading and writing from kernel code, the kernel provides:
ssize_t kernel_read(struct file *file, void *buf, size_t count, loff_t *pos)
ssize_t kernel_write(struct file *file, const void *buf, size_t count, loff_t *pos)
vfs_read()/vfs_write() serve the user-space facing paths; their buffer arguments are user pointers.
To close a struct file, use filp_close().
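Putting the three calls together, a minimal kernel-module sketch of the open/read/close sequence could look like the following. It has not been tested against any particular kernel version. The module name, the DEMO_PATH file and the buffer size are assumptions chosen for illustration, and the code relies on the kernel_read() signature shown above.

```c
#include <linux/module.h>
#include <linux/fs.h>
#include <linux/err.h>
#include <linux/slab.h>

/* Hypothetical path and size, chosen only for illustration. */
#define DEMO_PATH	"/etc/hostname"
#define DEMO_BUF_SIZE	128

static int __init readfile_demo_init(void)
{
	struct file *file;
	loff_t pos = 0;
	ssize_t n;
	char *buf;

	buf = kzalloc(DEMO_BUF_SIZE, GFP_KERNEL);
	if (!buf)
		return -ENOMEM;

	/* filp_open() returns an ERR_PTR() value on failure, never NULL. */
	file = filp_open(DEMO_PATH, O_RDONLY, 0);
	if (IS_ERR(file)) {
		kfree(buf);
		return PTR_ERR(file);
	}

	/* Read at most size - 1 bytes so the zeroed buffer stays NUL-terminated. */
	n = kernel_read(file, buf, DEMO_BUF_SIZE - 1, &pos);
	if (n >= 0)
		pr_info("readfile_demo: read %zd bytes: %s\n", n, buf);
	else
		pr_err("readfile_demo: kernel_read failed: %zd\n", n);

	filp_close(file, NULL);
	kfree(buf);
	return n < 0 ? n : 0;
}

static void __exit readfile_demo_exit(void)
{
}

module_init(readfile_demo_init);
module_exit(readfile_demo_exit);
MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("Sketch: read a file with filp_open/kernel_read");
```

kernel_read() advances pos by the number of bytes it returns, so calling it in a loop with the same pos reads the file sequentially.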
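kernel_read_file() and its variants, which read an entire file into a buffer, are defined in fs/kernel_read_file.c.
The kernel's firmware loader, for example, uses this interface:
drivers/base/firmware_loader/main.c
fw_get_filesystem_firmware()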
Running:
$ grep EXPORT_SYMBOL fs/*.c
lists the interfaces that fs/ exports.
For example, to read a file's attributes, fs/stat.c exports:
int vfs_getattr(const struct path *path, struct kstat *stat,
		u32 request_mask, unsigned int query_flags)
{
	int retval;

	if (WARN_ON_ONCE(query_flags & AT_GETATTR_NOSEC))
		return -EPERM;

	retval = security_inode_getattr(path);
	if (retval)
		return retval;
	return vfs_getattr_nosec(path, stat, request_mask, query_flags);
}
EXPORT_SYMBOL(vfs_getattr);
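As a sketch of how kern_path() and vfs_getattr() fit together, the helper below resolves a kernel-space path string and returns the file size. demo_file_size() is a hypothetical name, and the code assumes the four-argument vfs_getattr() signature shown above; it has not been built against a specific kernel tree.

```c
#include <linux/fs.h>
#include <linux/namei.h>
#include <linux/path.h>
#include <linux/stat.h>
#include <linux/fcntl.h>

/* Hypothetical helper: look up @name and store the file size in @size. */
static int demo_file_size(const char *name, loff_t *size)
{
	struct path path;
	struct kstat stat;
	int err;

	/* Resolve the string to a (vfsmount, dentry) pair, following symlinks. */
	err = kern_path(name, LOOKUP_FOLLOW, &path);
	if (err)
		return err;

	/* Ask only for the size; behave like stat(2) regarding syncing. */
	err = vfs_getattr(&path, &stat, STATX_SIZE, AT_STATX_SYNC_AS_STAT);
	if (!err)
		*size = stat.size;

	/* kern_path() took references on the mount and dentry; drop them. */
	path_put(&path);
	return err;
}
```

The path_put() call matters on both the success and the failure branch of vfs_getattr(), because the references taken by kern_path() are held either way.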
fs/namei.c exports the interface for creating a regular file:
/**
 * vfs_create - create new file
 * @idmap:	idmap of the mount the inode was found from
 * @dir:	inode of the parent directory
 * @dentry:	dentry of the child file
 * @mode:	mode of the child file
 * @want_excl:	whether the file must not yet exist
 *
 * Create a new file.
 *
 * If the inode has been found through an idmapped mount the idmap of
 * the vfsmount must be passed through @idmap. This function will then take
 * care to map the inode according to @idmap before checking permissions.
 * On non-idmapped mounts or if permission checking is to be performed on the
 * raw inode simply pass @nop_mnt_idmap.
 */
int vfs_create(struct mnt_idmap *idmap, struct inode *dir,
	       struct dentry *dentry, umode_t mode, bool want_excl)
{
	int error;

	error = may_create(idmap, dir, dentry);
	if (error)
		return error;

	if (!dir->i_op->create)
		return -EACCES;	/* shouldn't it be ENOSYS? */

	mode = vfs_prepare_mode(idmap, dir, mode, S_IALLUGO, S_IFREG);
	error = security_inode_create(dir, dentry, mode);
	if (error)
		return error;
	error = dir->i_op->create(idmap, dir, dentry, mode, want_excl);
	if (!error)
		fsnotify_create(dir, dentry);
	return error;
}
EXPORT_SYMBOL(vfs_create);
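The first parameter of vfs_create() is a struct mnt_idmap. This mechanism handles the case where the user IDs seen through a mount differ from the IDs the kernel actually uses, as with Docker containers. If the mount is not ID-mapped, pass &nop_mnt_idmap.
The second parameter is the inode of the parent directory.
The third parameter is the dentry of the child file. Because the child does not exist yet, this is a negative dentry.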
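To make the ID-mapping idea behind the first parameter more concrete, here is a deliberately simplified user-space model of a single mapping range. It is only loosely modelled on the notation in reference 【5】 and is not the kernel's implementation. All demo_* names are hypothetical, and real mappings may consist of several ranges.

```c
#include <stdint.h>

#define DEMO_INVALID_ID ((uint32_t)-1)

/* One contiguous range: IDs [upper_first, upper_first + count) in the
 * upper (e.g. container) view correspond to [lower_first, ...) below. */
struct demo_idmap_range {
	uint32_t upper_first;
	uint32_t lower_first;
	uint32_t count;
};

/* Translate an ID from the upper view to the lower view. */
static inline uint32_t demo_map_id_down(const struct demo_idmap_range *r,
					uint32_t id)
{
	if (id < r->upper_first || id - r->upper_first >= r->count)
		return DEMO_INVALID_ID;	/* not covered by the mapping */
	return r->lower_first + (id - r->upper_first);
}

/* Translate an ID from the lower view back to the upper view. */
static inline uint32_t demo_map_id_up(const struct demo_idmap_range *r,
				      uint32_t id)
{
	if (id < r->lower_first || id - r->lower_first >= r->count)
		return DEMO_INVALID_ID;
	return r->upper_first + (id - r->lower_first);
}
```

With a range of { 0, 100000, 65536 }, ID 0 in the upper view corresponds to ID 100000 below it, and any ID outside the range has no mapping at all.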
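The kernel provides a way to obtain such a dentry from a path. See how do_mknodat() calls filename_create():
dentry = filename_create(dfd, name, &path, lookup_flags);
filename_create() itself is static to fs/namei.c. Code outside that file would normally reach it through the exported wrapper kern_path_create() and release the result with done_path_create().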
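A hedged sketch of how the pieces could be combined to create an empty regular file from kernel code follows. It assumes a kernel whose vfs_create() matches the five-argument signature shown above, and it uses kern_path_create() and done_path_create(), the exported helpers around filename_create() in fs/namei.c. demo_create_file() is a hypothetical name, and the sketch omits the additional checks that do_mknodat() performs.

```c
#include <linux/fs.h>
#include <linux/namei.h>
#include <linux/mount.h>
#include <linux/dcache.h>
#include <linux/err.h>
#include <linux/fcntl.h>

/* Hypothetical helper: create an empty regular file at @pathname. */
static int demo_create_file(const char *pathname, umode_t mode)
{
	struct path parent;
	struct dentry *dentry;
	int err;

	/*
	 * Look up the parent directory and get a negative dentry for the
	 * last component. On success the parent inode is locked.
	 */
	dentry = kern_path_create(AT_FDCWD, pathname, &parent, 0);
	if (IS_ERR(dentry))
		return PTR_ERR(dentry);	/* e.g. -EEXIST if it already exists */

	/* Take the idmap from the parent's mount instead of assuming none. */
	err = vfs_create(mnt_idmap(parent.mnt), d_inode(parent.dentry),
			 dentry, mode, true);

	/* Drops the dentry, unlocks the parent and releases the path. */
	done_path_create(&parent, dentry);
	return err;
}
```

Using mnt_idmap(parent.mnt) yields &nop_mnt_idmap on a mount that is not ID-mapped, so the same call works in both cases.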
【1】Linux Journal. Driving Me Nuts - Things You Never Should Do in the Kernel.
https://www.linuxjournal.com/article/8110
【2】Chris. Writing to a file from the Kernel.
https://benninger.ca/posts/writing-to-a-file-from-the-kernel/
【3】Slavaim.
https://github.com/slavaim/Linux-kernel-modules/blob/master/readfile/readfile.c
【4】Dominik Brodowski. syscalls: introduce CONFIG_ARCH_HAS_SYSCALL_WRAPPER
https://lkml.org/lkml/2018/4/5/143
【5】idmappings
https://www.kernel.org/doc/html/latest/filesystems/idmappings.html
【6】Jake Edge. ID-mapped mounts
https://lwn.net/Articles/896255/