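hexdump prints data (from standard input or a file) in a chosen format: octal, decimal, or hexadecimal. It is most often used to inspect binary files.
It is also frequently used with a custom format for data processing, for example to convert 4-byte binary integers into readable numeric strings.
Predefined output formats: -b, -c, -C, -d, -o, -x
-b: one-byte octal display. Each line shows 16 bytes, preceded by the offset in hexadecimal. Each byte is printed as 3 digits, zero-padded. For example: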
$ hexdump -b -n 30 /dev/urandom
0000000 036 327 155 017 205 003 020 215 121 017 316 252 212 067 206 011
0000010 165 121 340 327 207 263 273 067 263 070 110 343 320 263
000001e
-c: one-byte character display. A non-printable character is printed as its octal code, or as an escape name such as \n.
$ hexdump -c -n 30 /dev/urandom
0000000 221 B 4 | ( 232 & \a ;
0000010 o 030 022 ] \t 237 006 ~
000001e
-C: the option used most often for inspecting binary files; it shows hex and ASCII side by side.
$ hexdump -C -n 30 /dev/urandom
00000000 b1 98 2f 62 86 57 9e e9 8f f4 11 dd 35 7b 36 b7 |../b.W......5{6.|
00000010 c7 7a 29 b7 3f 08 77 6b 84 89 14 da 1e ea |.z).?.wk......|
0000001e
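Because /dev/urandom gives different bytes on every run, a fixed input makes the layout easier to study. The following is a small sketch, assuming a util-linux or BSD hexdump is installed; the variable name out is only for illustration.

```shell
# Dump a short, known string in canonical hex+ASCII form.
out=$(printf 'hello, world\n' | hexdump -C)
printf '%s\n' "$out"
# The hex column starts with 68 65 6c 6c 6f ("hello"),
# and the ASCII column shows |hello, world.| with the newline rendered as "."
```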
-d, -o, -x
Two-byte display in decimal, octal, and hexadecimal respectively. Each line still covers 16 bytes, shown as 8 numbers.
# hexdump -x -n 30 /dev/urandom
0000000 be52 468a 34b2 1e22 975c 73b6 e886 cfdd
0000010 ef18 7194 0bae 567e ee80 ec13 c4bc
000001e
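Note that the first number above is 0xbe52, yet on x86 the first byte of the input is actually 0x52, because x86 is little-endian.
The output depends on the host byte order; it is equivalent to:
unsigned short *v = (unsigned short *)data;
printf("%04x", *v);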
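The byte-order effect can be reproduced with two known bytes. This is a minimal sketch that assumes a little-endian host (such as x86); the custom -e formats it uses are described further down, and the variable names are illustrative.

```shell
# Two input bytes in file order: 0x52 then 0xbe (octal escapes keep printf portable).
bytes_view=$(printf '\122\276' | hexdump -v -e '2/1 "%02x " "\n"')
word_view=$(printf '\122\276' | hexdump -v -e '1/2 "%04x" "\n"')

echo "byte by byte:   $bytes_view"   # 52 be
echo "as 16-bit word: $word_view"    # be52 on a little-endian host
```

On a big-endian machine the second line would be expected to read 52be instead.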
-n length
Interpret only the first length bytes of input.
-s offset
Skip the first offset bytes of input.
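The two options combine naturally to cut a slice out of a file. A small sketch, using a temporary file created only for this illustration and a custom format (explained later) that prints the selected bytes as characters:

```shell
# Create a 10-byte sample file (hypothetical, just for the demo).
tmp=$(mktemp)
printf '0123456789' > "$tmp"

# Skip 2 bytes, then interpret the next 4.
picked=$(hexdump -v -s 2 -n 4 -e '4/1 "%_p" "\n"' "$tmp")
echo "$picked"    # 2345

rm -f "$tmp"
```

Note that -n counts bytes after the skip, so the slice covers offsets 2 through 5.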
-v
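By default the output is abbreviated: when a run of output lines is identical, only the first one is printed, followed by a single line containing *.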
# hexdump -n 1024 /dev/zero
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
0000400
With the -v option, nothing is abbreviated.
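The difference is easy to see by counting output lines for 48 zero bytes (three 16-byte lines). This is only a sketch; the variable names are illustrative.

```shell
# Default: first data line, one "*" line, final offset line.
short_lines=$(hexdump -n 48 /dev/zero | wc -l | tr -d ' ')
# With -v: all three data lines, plus the final offset line.
full_lines=$(hexdump -v -n 48 /dev/zero | wc -l | tr -d ' ')

echo "default: $short_lines lines, with -v: $full_lines lines"   # 3 and 4
```

When hexdump output feeds another program, -v is usually the safer choice, since a stray * line would otherwise end up in the data.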
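The -e option specifies a format string, which consists of one or more format units. A single -e option may contain several format units, and -e may be given several times.
Each format unit consumes zero or more bytes. Its syntax is:
[ <iteration count> / <byte count> ] "<format string>"
iteration count: how many times the format is applied,
byte count: how many bytes are read per iteration,
format string: a format similar to the one taken by the C printf function, with support for the following escape sequences and special conversion strings: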
NUL \0
<alert character> \a
<backspace> \b
<form-feed> \f
<newline> \n
<carriage return> \r
<tab> \t
<vertical tab> \v
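%_a, followed by d, o, or x: prints the input offset in decimal, octal, or hexadecimal.
%_A: like _a, but the offset is printed only once, after all input has been processed; nothing is printed during the loop.
%_c: prints the character; a non-printable character is printed as an escape sequence or as its octal ASCII code.
%_p: prints the character; a non-printable character is printed as "."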
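As an illustration of %_a and %_p together, the sketch below prints a decimal offset followed by 4 characters per line. The input string and the variable name are made up for the example.

```shell
# 8 bytes of input; the third byte is a tab, which %_p shows as "."
out=$(printf 'ab\tdefgh' | hexdump -v -e '"%_ad: " 4/1 "%_p" "\n"')
printf '%s\n' "$out"
# 0: ab.d
# 4: efgh
```

The first unit, "%_ad: ", consumes no input; it only reports the offset of the next byte to be displayed.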
Each format unit consumes <iteration count> * <byte count> bytes, even if the format string does not display them.
$ echo -ne '\x1\x2\x3\x4\x5\x6\x7\x8' | hexdump -e '4/1 "%02x " 2/1 "\n"'
01 02 03 04
07 08
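In the example above, the first format unit reads 1 byte at a time, 4 times, so it prints 4 bytes in turn. The second format unit prints a newline and consumes 2 bytes (0x05 and 0x06) without displaying them.
If no byte count is given, it defaults to the size implied by the conversion:
%_c, %_p, %_u, %c: 1 byte
%d, %i, %o, %u, %X, %x: 4 bytes
%E, %e, %f, %G, %g: 8 bytes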
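The default sizes can be checked against explicit byte counts. The sketch below feeds two 32-bit integers (1 and 2, stored little-endian) and assumes a little-endian host; data, implicit, explicit, and halves are illustrative names.

```shell
# Two 32-bit integers, 1 and 2, in little-endian byte order.
data='\001\000\000\000\002\000\000\000'

implicit=$(printf "$data" | hexdump -v -e '"%d\n"')      # %d defaults to 4 bytes
explicit=$(printf "$data" | hexdump -v -e '1/4 "%d\n"')  # same thing, spelled out
halves=$(printf "$data" | hexdump -v -e '1/2 "%d\n"')    # read as 16-bit values

echo "implicit: $(echo $implicit)"   # 1 2
echo "explicit: $(echo $explicit)"   # 1 2
echo "halves:   $(echo $halves)"     # 1 0 2 0
```

This is the kind of conversion from 4-byte binary integers to readable numbers mentioned at the start.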
A format unit that specifies a byte count may contain only one conversion character; more than one is an error:
$ hexdump -e '4/1 "%02x %d"'
hexdump: byte count with multiple conversion characters
Each format unit consumes a fixed amount of input: byte_count * iteration_count. From the man page:
The amount of data interpreted by each format string is the sum of the data required by each
format unit, which is the iteration count times the byte count, or the iteration count times
the number of bytes required by the format if the byte count is not specified.
Input is read in blocks; the block size is the largest amount of data required by any format string.
The input is manipulated in ``blocks'', where a block is defined as the largest amount of
data specified by any format string. Format strings interpreting less than an input block's
worth of data, whose last format unit both interprets some number of bytes and does not have
a specified iteration count, have the iteration count incremented until the entire input
block has been processed or there is not enough data remaining in the block to satisfy the
format string.
If, either as a result of user specification or hexdump modifying the iteration count as
described above, an iteration count is greater than one, no trailing whitespace characters are
output during the last iteration.
It is an error to specify a byte count as well as multiple conversion characters or strings
unless all but one of the conversion characters or strings is _a or _A.
If, as a result of the specification of the -n option or end-of-file being reached, input
data only partially satisfies a format string, the input block is zero-padded sufficiently to
display all available data (i.e., any format units overlapping the end of data will display
some number of the zero bytes).
Further output by such format strings is replaced by an equivalent number of spaces. An
equivalent number of spaces is defined as the number of spaces output by an s conversion
character with the same field width and precision as the original conversion character or
conversion string but with any “+”, “ ”, “#” conversion flag characters removed, and
referencing a NULL string.
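Format units within the same -e option consume the block serially, one after another.
Format units from different -e options each consume the same block.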
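Both rules can be observed on a 4-byte input. In this sketch (variable names are illustrative), the first command uses two -e options, so each one formats the whole block; the second puts everything in one -e option, so the units split the block between them.

```shell
# Two -e options: both see the same 4-byte block.
parallel=$(printf 'ABCD' | hexdump -v -e '4/1 "%02x " "\n"' -e '4/1 "%_p" "\n"')
printf '%s\n' "$parallel"
# 41 42 43 44
# ABCD

# One -e option: the first unit takes bytes 1-2, the later unit takes bytes 3-4.
serial=$(printf 'ABCD' | hexdump -v -e '2/1 "%02x" " " 2/1 "%_p" "\n"')
printf '%s\n' "$serial"
# 4142 CD
```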
References
man hexdump