一个GLSL Shader的格式化算法(LALR解析器)
在进行OpenGL程序开发时,我需要自行解析`string`类型的Shader代码,抽取出里面的某些变量名和subroutine名。
由于找不到可用的GLSL Shader解析器,就照着虎书(《现代编译原理-c语言描述》)自己写了个LALR Generator,实际上包含了(词法分析器+语法分析器+格式化框架)的(LR(0)、SLR(1)、LALR(1)、LR(1))全自动生成,支持yacc的优先级指令和Shift/Reduce、Reduce/Reduc冲突解决功能。
本文的GLSL Shader的格式化工具(下载链接在这里https://files.cnblogs.com/files/bitzhuwei/GLSL.Formatter.net8.0.rar?t=1735122490&download=true ),就是用我的LALR Generator,根据GLSL4.60.8的文法生成的,如下所述。
GLSL4.60.8的文法
#extractor <GLSL4.60.8.st.ext> translation_unit : external_declaration | translation_unit external_declaration ; external_declaration : function_definition | declaration | ';' ; function_definition : function_prototype compound_statement/* compound_statement_no_new_scope */ ; variable_identifier : 'identifier' ; primary_expression : variable_identifier | 'intConstant' | 'uintConstant' | 'floatConstant' | 'boolConstant' | 'doubleConstant' | '(' expression ')' ; postfix_expression : primary_expression | postfix_expression '[' integer_expression ']' | function_call | postfix_expression '.' 'identifier' // FIELD_SELECTION | postfix_expression '++' | postfix_expression '--' ; integer_expression : expression ; function_call : function_call_or_method ; function_call_or_method : function_call_generic ; function_call_generic : function_call_header_with_parameters ')' | function_call_header_no_parameters ')' ; function_call_header_no_parameters : function_call_header 'void' | function_call_header ; function_call_header_with_parameters : function_call_header assignment_expression | function_call_header_with_parameters ',' assignment_expression ; function_call_header : function_identifier '(' ; function_identifier : type_specifier | postfix_expression ; unary_expression : postfix_expression | '++' unary_expression | '--' unary_expression | unary_operator unary_expression ; unary_operator : '+' | '-' | '!' | '~' ; multiplicative_expression : unary_expression | multiplicative_expression '*' unary_expression | multiplicative_expression '/' unary_expression | multiplicative_expression '%' unary_expression ; additive_expression : multiplicative_expression | additive_expression '+' multiplicative_expression | additive_expression '-' multiplicative_expression ; shift_expression : additive_expression | shift_expression '<<' additive_expression | shift_expression '>>' additive_expression ; relational_expression : shift_expression | relational_expression '<' shift_expression | relational_expression '>' shift_expression | relational_expression '<=' shift_expression | relational_expression '>=' shift_expression ; equality_expression : relational_expression | equality_expression '==' relational_expression | equality_expression '!=' relational_expression ; and_expression : equality_expression | and_expression '&' equality_expression ; exclusive_or_expression : and_expression | exclusive_or_expression '^' and_expression ; inclusive_or_expression : exclusive_or_expression | inclusive_or_expression '|' exclusive_or_expression ; logical_and_expression : inclusive_or_expression | logical_and_expression '&&' inclusive_or_expression ; logical_xor_expression : logical_and_expression | logical_xor_expression '^^' logical_and_expression ; logical_or_expression : logical_xor_expression | logical_or_expression '||' logical_xor_expression ; conditional_expression : logical_or_expression | logical_or_expression '?' expression ':' assignment_expression ; assignment_expression : conditional_expression | unary_expression assignment_operator assignment_expression ; assignment_operator : '=' | '*=' | '/=' | '%=' | '+=' | '-=' | '<<=' | '>>=' | '&=' | '^=' | '|=' ; expression : assignment_expression | expression ',' assignment_expression ; constant_expression : conditional_expression ; declaration : function_prototype ';' | init_declarator_list ';' | 'precision' precision_qualifier type_specifier ';' | type_qualifier 'identifier' '{' struct_declaration_list '}' ';' | type_qualifier 'identifier' '{' struct_declaration_list '}' 'identifier' ';' | type_qualifier 'identifier' '{' struct_declaration_list '}' 'identifier' array_specifier ';' | type_qualifier ';' | type_qualifier 'identifier' ';' | type_qualifier 'identifier' identifier_list ';' ; identifier_list : ',' 'identifier' | identifier_list ',' 'identifier' ; function_prototype : function_declarator ')' ; function_declarator : function_header | function_header_with_parameters ; function_header_with_parameters : function_header parameter_declaration | function_header_with_parameters ',' parameter_declaration ; function_header : fully_specified_type 'identifier' '(' ; parameter_declarator : type_specifier 'identifier' | type_specifier 'identifier' array_specifier ; parameter_declaration : type_qualifier parameter_declarator | parameter_declarator | type_qualifier parameter_type_specifier | parameter_type_specifier ; parameter_type_specifier : type_specifier ; init_declarator_list : single_declaration | init_declarator_list ',' 'identifier' | init_declarator_list ',' 'identifier' array_specifier | init_declarator_list ',' 'identifier' array_specifier '=' initializer | init_declarator_list ',' 'identifier' '=' initializer ; single_declaration : fully_specified_type | fully_specified_type 'identifier' | fully_specified_type 'identifier' array_specifier | fully_specified_type 'identifier' array_specifier '=' initializer | fully_specified_type 'identifier' '=' initializer ; fully_specified_type : type_specifier | type_qualifier type_specifier ; invariant_qualifier : 'invariant' ; interpolation_qualifier : 'smooth' | 'flat' | 'noperspective' ; layout_qualifier : 'layout' '(' layout_qualifier_id_list ')' ; layout_qualifier_id_list : layout_qualifier_id | layout_qualifier_id_list ',' layout_qualifier_id ; layout_qualifier_id : 'identifier' | 'identifier' '=' constant_expression | 'shared' ; precise_qualifier : 'precise' ; type_qualifier : single_type_qualifier | type_qualifier single_type_qualifier ; single_type_qualifier : storage_qualifier | layout_qualifier | precision_qualifier | interpolation_qualifier | invariant_qualifier | precise_qualifier ; storage_qualifier : 'const' | 'in' | 'out' | 'inout' | 'centroid' | 'patch' | 'sample' | 'uniform' | 'buffer' | 'shared' | 'coherent' | 'volatile' | 'restrict' | 'readonly' | 'writeonly' | 'subroutine' | 'subroutine' '(' type_name_list ')' ; type_name_list : 'type_name' | type_name_list ',' 'type_name' ; type_specifier : type_specifier_nonarray | type_specifier_nonarray array_specifier ; array_specifier : '[' ']' | '[' conditional_expression ']' | array_specifier '[' ']' | array_specifier '[' conditional_expression ']' ; type_specifier_nonarray : 'void' | 'float' | 'double' | 'int' | 'uint' | 'bool' | 'vec2' | 'vec3' | 'vec4' | 'dvec2' | 'dvec3' | 'dvec4' | 'bvec2' | 'bvec3' | 'bvec4' | 'ivec2' | 'ivec3' | 'ivec4' | 'uvec2' | 'uvec3' | 'uvec4' | 'mat2' | 'mat3' | 'mat4' | 'mat2x2' | 'mat2x3' | 'mat2x4' | 'mat3x2' | 'mat3x3' | 'mat3x4' | 'mat4x2' | 'mat4x3' | 'mat4x4' | 'dmat2' | 'dmat3' | 'dmat4' | 'dmat2x2' | 'dmat2x3' | 'dmat2x4' | 'dmat3x2' | 'dmat3x3' | 'dmat3x4' | 'dmat4x2' | 'dmat4x3' | 'dmat4x4' | 'atomic_uint' | 'sampler2D' | 'sampler3D' | 'samplerCube' | 'sampler2DShadow' | 'samplerCubeShadow' | 'sampler2DArray' | 'sampler2DArrayShadow' | 'samplerCubeArray' | 'samplerCubeArrayShadow' | 'isampler2D' | 'isampler3D' | 'isamplerCube' | 'isampler2DArray' | 'isamplerCubeArray' | 'usampler2D' | 'usampler3D' | 'usamplerCube' | 'usampler2DArray' | 'usamplerCubeArray' | 'sampler1D' | 'sampler1DShadow' | 'sampler1DArray' | 'sampler1DArrayShadow' | 'isampler1D' | 'isampler1DArray' | 'usampler1D' | 'usampler1DArray' | 'sampler2DRect' | 'sampler2DRectShadow' | 'isampler2DRect' | 'usampler2DRect' | 'samplerBuffer' | 'isamplerBuffer' | 'usamplerBuffer' | 'sampler2DMS' | 'isampler2DMS' | 'usampler2DMS' | 'sampler2DMSArray' | 'isampler2DMSArray' | 'usampler2DMSArray' | 'image2D' | 'iimage2D' | 'uimage2D' | 'image3D' | 'iimage3D' | 'uimage3D' | 'imageCube' | 'iimageCube' | 'uimageCube' | 'imageBuffer' | 'iimageBuffer' | 'uimageBuffer' | 'image1D' | 'iimage1D' | 'uimage1D' | 'image1DArray' | 'iimage1DArray' | 'uimage1DArray' | 'image2DRect' | 'iimage2DRect' | 'uimage2DRect' | 'image2DArray' | 'iimage2DArray' | 'uimage2DArray' | 'imageCubeArray' | 'iimageCubeArray' | 'uimageCubeArray' | 'image2DMS' | 'iimage2DMS' | 'uimage2DMS' | 'image2DMSArray' | 'iimage2DMSArray' | 'uimage2DMSArray' | struct_specifier | 'type_name' ; precision_qualifier : 'highp' | 'mediump' | 'lowp' ; struct_specifier : 'struct' 'type_name'/* 'identifier' */ '{' struct_declaration_list '}' | 'struct' '{' struct_declaration_list '}' ; struct_declaration_list : struct_declaration | struct_declaration_list struct_declaration ; struct_declaration : type_specifier struct_declarator_list ';' | type_qualifier type_specifier struct_declarator_list ';' ; struct_declarator_list : struct_declarator | struct_declarator_list ',' struct_declarator ; struct_declarator : 'identifier' | 'identifier' array_specifier ; initializer : assignment_expression | '{' initializer_list '}' | '{' initializer_list ',' '}' ; initializer_list : initializer | initializer_list ',' initializer ; declaration_statement : declaration ; statement : compound_statement | simple_statement ; simple_statement : declaration_statement | expression_statement | selection_statement | switch_statement | case_label | iteration_statement | jump_statement ; compound_statement : '{' '}' | '{' statement_list '}' ; /* merge into statement statement_no_new_scope : compound_statement_no_new_scope | simple_statement ; */ /* merge into compound_statement compound_statement_no_new_scope : '{' '}' | '{' statement_list '}' ; */ statement_list : statement | statement_list statement ; expression_statement : ';' | expression ';' ; selection_statement : 'if' '(' expression ')' selection_rest_statement ; selection_rest_statement : statement 'else' statement | statement ; condition : expression | fully_specified_type 'identifier' '=' initializer ; switch_statement : 'switch' '(' expression ')' '{' switch_statement_list '}' ; switch_statement_list : empty | statement_list ; case_label : 'case' expression ':' | 'default' ':' ; iteration_statement : 'while' '(' condition ')' statement/* statement_no_new_scope */ | 'do' statement 'while' '(' expression ')' ';' | 'for' '(' for_init_statement for_rest_statement ')' statement/* statement_no_new_scope */ ; for_init_statement : expression_statement | declaration_statement ; conditionopt : condition | empty ; for_rest_statement : conditionopt ';' | conditionopt ';' expression ; jump_statement : 'continue' ';' | 'break' ';' | 'return' ';' | 'return' expression ';' | 'discard' ';' ; // lexical statements // no need : 'struct' 'identifier' '{' struct_declaration_list '}' // now I changed it into 'struct' 'type_name' '{' struct_declaration_list '}' // only identifier next to 'struct' is a user-defined type and should be a 'type_name' token. 'type_name' %%<'struct'>[a-zA-Z_][a-zA-Z0-9_]*%% 'intConstant' %%[-+]?[0-9]+%% 'intConstant' %%0x[0-9A-Fa-f]+%% 'uintConstant' %%[-+]?[0-9]+[uU]%% 'uintConstant' %%0x[0-9A-Fa-f]+[uU]%% 'floatConstant' %%[-+]?[0-9]+([.][0-9]+)?([Ee][-+]?[0-9]+)?[fF]%% 'boolConstant' %%true/[^a-zA-Z0-9_]%% 'boolConstant' %%false/[^a-zA-Z0-9_]%% 'doubleConstant' %%[-+]?[0-9]+([.][0-9]+)?([Ee][-+]?[0-9]+)?%% 'identifier' %%[a-zA-Z_][a-zA-Z0-9_]*%% %grammarName GLSL %blockComment on %inlineComment on
有了解析器,就有了单词流List<Token>
和语法树Node
。面对GLSL代码的语法树,我们如何对源代码进行格式化呢?
格式化的目的是让源代码从书写上更适宜人类阅读。具体来说,格式化要做的事,只是增减某些空白符(空格、tab符、换行符)而已,不会修改具有语法意义的内容。
在C#中,
if (x >0) {t = x;x = 0; }
会被格式化为
if (x > 0) { t = x; x = 0; }
而
if (x > 0) {t = x; x= 0; }
会被格式化为
if (x > 0) { t = x; x = 0; }
同样意义的源代码,格式化结果却不同。这有些复杂。
我们将问题拆开,一步一步来。
首先,我们忽略多行注释blockComment、单行注释inlineComment和源代码中原有的空白符。
此时,一个if(x>0){t=x;x=0;}
就应当被格式化为:
if (x > 0) { t = x; x = 0; }
也就是说,语法树中的每一部分,都可以自己判断出应当如何格式化。用递归的方式遍历语法树即可。
现在来考虑注释。注释可能出现在List<Token>
(词法分析器的分析结果)里的任意位置上,属实调皮。于是就有这样一个想法,利用C#的yield return
语法糖,我们执行下述算法:
遍历`List<Token>`: 如果下一个要进行格式化的Token是注释,那么就在当前位置输出它并换行。 如果下一个要进行格式化的Token不是注释,那么就用`yield return`的方式,在语法树上格式化地输出它(输出它+某些空白符)。
只要在语法树每输出一个Token
时,都进行一次yield return
(return什么无所谓),那么yield return
就能和遍历List<Token>
的过程同步完成。
现在能够带着注释格式化了,但是无法解决这样的情况:上一行和下一行代码之间有好几个空行(即换行符),上述格式化过程会无视之。
完全的格式化算法要继承上述方法的思路,并加强如下:
语法树的各类语法块`Node`分别实现自己内部的格式化办法。 语法块`Node`要根据上一级`Node`传递来的要求,在输出自己内部的第一个`Token`之前,先输出多少个换行符和空格。 语法块`Node`要告知自己的每个下一级`Node`,它希望这些子级先输出多少个换行符和空格。 每个`Token`根据(上级传递来的要求+它与前一个`Token`之间的位置关系(间隔多少空格和换行符))输出自己前面的空白符,而后输出自己的内容。
每个语法块`Node`都服从直接上级的要求,并对各个直接下级发出要求。通过这样接力的方式,就可以通过遍历一次语法树来输出格式化的代码了。
同学们可以下载链接中的工具查看效果,里面还有一些示例shader。