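flex is a tool for generating scanners. A scanner is a program that recognizes lexical patterns in text. flex reads a description of the scanner to generate, either from the given input file or from standard input. The description is a combination of regular expressions and C code. flex generates a C source file, lex.yy.c, which defines a function yylex(). This file can be compiled and linked with the flex runtime library to produce an executable.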

5 Format of the Input File

The flex input file consists of three sections, separated by lines containing only %%.

definitions

%%

rules

%%

user code
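To make the layout concrete, here is a minimal C sketch that locates the three sections by scanning for lines consisting only of %%. The struct flex_sections type and the split_sections function are hypothetical helpers invented for this illustration and are not part of flex. The sketch assumes '\n' line endings and deliberately ignores corner cases such as a %% line inside a %{ %} block.

```c
#include <string.h>

/* Hypothetical helper type for this sketch (not part of flex):
 * byte offsets at which each section's text begins; -1 means "absent". */
struct flex_sections {
    long definitions;   /* always 0: the file starts in the definitions section */
    long rules;         /* first byte after the first %% line */
    long user_code;     /* first byte after the second %% line */
};

/* Scan text line by line and record where the rules and user code
 * sections start. Assumes '\n' line endings. */
struct flex_sections split_sections(const char *text)
{
    struct flex_sections s = { 0, -1, -1 };
    const char *line = text;

    while (*line != '\0') {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);
        const char *next = eol ? eol + 1 : line + len;

        /* A separator is a line whose entire content is "%%". */
        if (len == 2 && line[0] == '%' && line[1] == '%') {
            if (s.rules < 0) {
                s.rules = (long)(next - text);
            } else {
                s.user_code = (long)(next - text);
                break;  /* everything after the second %% is user code */
            }
        }
        line = next;
    }
    return s;
}
```

If the input has no second %% line, user_code stays at -1, which corresponds to an input file without a user code section.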

5.1 Format of the definitions section

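The definitions section contains name definitions, which simplify the scanner specification and work much like macros. It also contains declarations of start conditions.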

A name definition has the form:

name definition

A name begins with a letter or an underscore, followed by zero or more letters, digits, underscores, or dashes (-).

The definition begins at the first non-whitespace character after the name and continues to the end of the line.

The name can later be referenced as {name}, which expands to (definition).
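The naming rule and the {name} expansion can be sketched in C as follows. This is a simplified illustration, not how flex itself is implemented: struct name_def, is_valid_name, and expand_names are hypothetical names, and the expansion is naive in that it does not treat quoted strings or character classes specially.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical table entry: one "name definition" line. */
struct name_def {
    const char *name;
    const char *definition;
};

/* Returns 1 if s[0..len) is a valid name: a letter or underscore,
 * then letters, digits, underscores, or dashes. */
int is_valid_name(const char *s, size_t len)
{
    if (len == 0 || !(isalpha((unsigned char)s[0]) || s[0] == '_'))
        return 0;
    for (size_t i = 1; i < len; i++) {
        unsigned char c = (unsigned char)s[i];
        if (!(isalnum(c) || c == '_' || c == '-'))
            return 0;
    }
    return 1;
}

/* Naively expands every known {name} in pattern into (definition).
 * Unknown names and non-name braces such as {2,5} are copied unchanged.
 * Returns 0 on success, -1 if out is too small. */
int expand_names(const char *pattern, const struct name_def *defs,
                 size_t ndefs, char *out, size_t outsz)
{
    size_t o = 0;
    const char *p = pattern;

    if (outsz == 0)
        return -1;
    while (*p != '\0') {
        const char *close = (*p == '{') ? strchr(p, '}') : NULL;
        const char *def = NULL;

        if (close && is_valid_name(p + 1, (size_t)(close - p - 1))) {
            size_t len = (size_t)(close - p - 1);
            for (size_t i = 0; i < ndefs; i++) {
                if (strlen(defs[i].name) == len &&
                    strncmp(defs[i].name, p + 1, len) == 0) {
                    def = defs[i].definition;
                    break;
                }
            }
        }
        if (def) {
            /* Wrap the definition in parentheses, as {name} does. */
            int n = snprintf(out + o, outsz - o, "(%s)", def);
            if (n < 0 || (size_t)n >= outsz - o)
                return -1;
            o += (size_t)n;
            p = close + 1;
        } else {
            if (o + 1 >= outsz)
                return -1;
            out[o++] = *p++;
        }
    }
    out[o] = '\0';
    return 0;
}
```

With the single definition DIGIT [0-9], the pattern {DIGIT}+"."{DIGIT}* expands to ([0-9])+"."([0-9])*. The added parentheses keep the definition grouped as one unit when an operator such as + or * follows the reference.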

An unindented comment (a line beginning with /*) is copied verbatim to the output, up to the closing */.

Indented text, and text enclosed between %{ and %}, is also copied verbatim to the output. The %{ and %} markers must not be indented.

A %top block is similar to a %{ %} block, except that its contents are placed at the top of the generated file.

%top{

/* This code goes at the "top" of the generated file. */

#include <stdint.h>

#include <inttypes.h>

}

5.2 Format of the rules section

The rules section contains a series of rules of the form:

pattern   action

The pattern must not be indented, and the action must begin on the same line.

Indented text, or text between %{ and %}, that appears before the first rule can be used to declare variables local to the scanning routine.

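Any other indented text or %{ %} text in the rules section is still copied verbatim to the output, although the flex manual notes that its meaning is not well-defined and that it may cause compile-time errors. Here too, %{ and %} must not be indented.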

Consider an example:

$ cat cal1.l 
%%
        int abch = 1;
"+"     { printf("PLUS\n"); }
%{
        {
                int a = 1;
                printf("a %d\n", a);
        }
%}
"-"     { printf("MINUS\n"); }
%%

abch will be declared at the start of the scanning routine. To check, run flex cal1.l and inspect the generated lex.yy.c:

/** The main scanner function which does all the work.
 */
YY_DECL
{
        yy_state_type yy_current_state;
        char *yy_cp, *yy_bp;
        int yy_act;
     
        if ( !(yy_init) )
                {    
                (yy_init) = 1; 
 
#ifdef YY_USER_INIT
                YY_USER_INIT;
#endif
 
                if ( ! (yy_start) )
                        (yy_start) = 1; /* first start state */
 
                if ( ! yyin )
                        yyin = stdin;
 
                if ( ! yyout )
                        yyout = stdout;
 
                if ( ! YY_CURRENT_BUFFER ) {
                        yyensure_buffer_stack ();
                        YY_CURRENT_BUFFER_LVALUE =
                                yy_create_buffer(yyin,YY_BUF_SIZE );
                }    
 
                yy_load_buffer_state( );
                }    
 
        {    
#line 1 "cal1.l"
 
        int abch = 1; 
#line 686 "lex.yy.c"
 
        while ( /*CONSTCOND*/1 )                /* loops until end-of-file is reached */
 
...
case 1:
YY_RULE_SETUP
#line 3 "cal1.l"
{ printf("PLUS\n"); }
        YY_BREAK
 
        {
                int a = 1;
                printf("a %d\n", a);
        }
 
case 2:
YY_RULE_SETUP
#line 10 "cal1.l"
{ printf("MINUS\n"); }
        YY_BREAK
case 3:
YY_RULE_SETUP
#line 11 "cal1.l"
ECHO;
        YY_BREAK
#line 764 "lex.yy.c"
case YY_STATE_EOF(INITIAL):
        yyterminate();

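As the output shows, the declaration of abch is placed at the start of the scanning routine, before the scanning loop. All rules are collected as case labels of a switch statement. A code block in the rules section that does not belong to any rule is emitted right after the preceding rule. Because it lands after that rule's YY_BREAK and before the next case label, it can never be reached, so it has no practical effect.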
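Why the stray block never runs can be reproduced in plain C. The sketch below is a hand-written miniature of the generated switch, not actual flex output. run_action and stray_block_ran are hypothetical names, and a plain break stands in for YY_BREAK, assuming its default definition.

```c
#include <stdio.h>

/* Hand-written miniature of the generated switch (not flex output).
 * Sets *stray_block_ran to 1 only if the label-less block executes,
 * and returns that flag. */
int run_action(int yy_act, int *stray_block_ran)
{
    *stray_block_ran = 0;

    switch (yy_act) {
    case 1:
        puts("PLUS");
        break;                      /* stands in for YY_BREAK */

        {                           /* copied block: no case label leads here */
            *stray_block_ran = 1;
        }

    case 2:
        puts("MINUS");
        break;

    default:
        break;
    }
    return *stray_block_ran;
}
```

Whichever action is selected, control enters the switch at a case label and leaves at the next break, so the block between the break and case 2 is skipped and the function returns 0.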

5.3 Format of the user code section

The user code section is copied verbatim to lex.yy.c. This section is optional; if it is omitted, the second %% may be omitted as well.

5.4 Comments in the Input

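flex supports C-style comments: anything between /* and */ is treated as a comment. Whenever flex encounters a comment, it copies it verbatim to the generated source file. Comments may appear almost anywhere, with the following exceptions: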

Comments may not appear in a rule at a point where flex expects a regular expression.

Comments may not appear on an %option line in the definitions section.

All of the comments in the following example are valid:

%{
/* code block */
%}
 
/* Definitions Section */
%x STATE_X
 
%%
    /* Rules Section */
ruleA    /* after regex */ { /* code block */ } /* after code block */
        /* Rules Section (indented) */
<STATE_X>{
ruleC    ECHO;
ruleD    ECHO;
%{
/* code block */
%}
}
%%
/* User Code Section */

References

flex manual
