一个GLSL Shader的格式化算法(LALR解析器)

一个GLSL Shader的格式化算法(LALR解析器)

在进行OpenGL程序开发时,我需要自行解析`string`类型的Shader代码,抽取出里面的某些变量名和subroutine名。

由于找不到可用的GLSL Shader解析器,就照着虎书(《现代编译原理-c语言描述》)自己写了个LALR Generator,实际上包含了(词法分析器+语法分析器+格式化框架)的(LR(0)、SLR(1)、LALR(1)、LR(1))全自动生成,支持yacc的优先级指令和Shift/Reduce、Reduce/Reduc冲突解决功能。

本文的GLSL Shader的格式化工具(下载链接在这里https://files.cnblogs.com/files/bitzhuwei/GLSL.Formatter.net8.0.rar?t=1735122490&download=true ),就是用我的LALR Generator,根据GLSL4.60.8的文法生成的,如下所述。

GLSL4.60.8的文法
#extractor <GLSL4.60.8.st.ext>  translation_unit :       external_declaration     | translation_unit external_declaration  ; external_declaration :       function_definition     | declaration     | ';'  ; function_definition :       function_prototype compound_statement/* compound_statement_no_new_scope */  ; variable_identifier :       'identifier'  ; primary_expression :       variable_identifier     | 'intConstant'     | 'uintConstant'     | 'floatConstant'     | 'boolConstant'     | 'doubleConstant'     | '(' expression ')'  ; postfix_expression :       primary_expression     | postfix_expression '[' integer_expression ']'     | function_call     | postfix_expression '.' 'identifier' // FIELD_SELECTION     | postfix_expression '++'     | postfix_expression '--'  ; integer_expression :       expression  ; function_call :       function_call_or_method  ; function_call_or_method :       function_call_generic  ; function_call_generic :       function_call_header_with_parameters ')'     | function_call_header_no_parameters ')'  ; function_call_header_no_parameters :       function_call_header 'void'     | function_call_header  ; function_call_header_with_parameters :       function_call_header assignment_expression     | function_call_header_with_parameters ',' assignment_expression  ; function_call_header :       function_identifier '('  ; function_identifier :       type_specifier     | postfix_expression  ; unary_expression :       postfix_expression     | '++' unary_expression     | '--' unary_expression     | unary_operator unary_expression  ; unary_operator :       '+'     | '-'     | '!'     | '~'  ; multiplicative_expression :       unary_expression     | multiplicative_expression '*' unary_expression     | multiplicative_expression '/' unary_expression     | multiplicative_expression '%' unary_expression  ; additive_expression :       multiplicative_expression     | additive_expression '+' multiplicative_expression     | additive_expression '-' multiplicative_expression  ; shift_expression :       additive_expression     | shift_expression '<<' additive_expression     | shift_expression '>>' additive_expression  ; relational_expression :       shift_expression     | relational_expression '<' shift_expression     | relational_expression '>' shift_expression     | relational_expression '<=' shift_expression     | relational_expression '>=' shift_expression  ; equality_expression :       relational_expression     | equality_expression '==' relational_expression     | equality_expression '!=' relational_expression  ; and_expression :       equality_expression     | and_expression '&' equality_expression  ; exclusive_or_expression :       and_expression     | exclusive_or_expression '^' and_expression  ; inclusive_or_expression :       exclusive_or_expression     | inclusive_or_expression '|' exclusive_or_expression  ; logical_and_expression :       inclusive_or_expression     | logical_and_expression '&&' inclusive_or_expression  ; logical_xor_expression :       logical_and_expression     | logical_xor_expression '^^' logical_and_expression  ; logical_or_expression :       logical_xor_expression     | logical_or_expression '||' logical_xor_expression  ; conditional_expression :       logical_or_expression     | logical_or_expression '?' expression ':' assignment_expression  ; assignment_expression :       conditional_expression     | unary_expression assignment_operator assignment_expression  ; assignment_operator :       '='     | '*='     | '/='     | '%='     | '+='     | '-='     | '<<='     | '>>='     | '&='     | '^='     | '|='  ; expression :       assignment_expression     | expression ',' assignment_expression  ; constant_expression :       conditional_expression  ; declaration :       function_prototype ';'     | init_declarator_list ';'     | 'precision' precision_qualifier type_specifier ';'     | type_qualifier 'identifier' '{' struct_declaration_list '}' ';'     | type_qualifier 'identifier' '{' struct_declaration_list '}' 'identifier' ';'     | type_qualifier 'identifier' '{' struct_declaration_list '}' 'identifier' array_specifier ';'     | type_qualifier ';'     | type_qualifier 'identifier' ';'     | type_qualifier 'identifier' identifier_list ';'  ; identifier_list :       ',' 'identifier'     | identifier_list ',' 'identifier'  ; function_prototype :       function_declarator ')'  ; function_declarator :       function_header     | function_header_with_parameters  ; function_header_with_parameters :       function_header parameter_declaration     | function_header_with_parameters ',' parameter_declaration  ; function_header :       fully_specified_type 'identifier' '('  ; parameter_declarator :       type_specifier 'identifier'     | type_specifier 'identifier' array_specifier  ; parameter_declaration :       type_qualifier parameter_declarator     | parameter_declarator     | type_qualifier parameter_type_specifier     | parameter_type_specifier  ; parameter_type_specifier :       type_specifier  ; init_declarator_list :       single_declaration     | init_declarator_list ',' 'identifier'     | init_declarator_list ',' 'identifier' array_specifier     | init_declarator_list ',' 'identifier' array_specifier '=' initializer     | init_declarator_list ',' 'identifier' '=' initializer  ; single_declaration :       fully_specified_type     | fully_specified_type 'identifier'     | fully_specified_type 'identifier' array_specifier     | fully_specified_type 'identifier' array_specifier '=' initializer     | fully_specified_type 'identifier' '=' initializer  ; fully_specified_type :       type_specifier     | type_qualifier type_specifier  ; invariant_qualifier :       'invariant'  ; interpolation_qualifier :       'smooth'     | 'flat'     | 'noperspective'  ; layout_qualifier :       'layout' '(' layout_qualifier_id_list ')'  ; layout_qualifier_id_list :       layout_qualifier_id     | layout_qualifier_id_list ',' layout_qualifier_id  ; layout_qualifier_id :       'identifier'     | 'identifier' '=' constant_expression     | 'shared'  ; precise_qualifier :       'precise'  ; type_qualifier :       single_type_qualifier     | type_qualifier single_type_qualifier  ; single_type_qualifier :       storage_qualifier     | layout_qualifier     | precision_qualifier     | interpolation_qualifier     | invariant_qualifier     | precise_qualifier  ; storage_qualifier :       'const'     | 'in'     | 'out'     | 'inout'     | 'centroid'     | 'patch'     | 'sample'     | 'uniform'     | 'buffer'     | 'shared'     | 'coherent'     | 'volatile'     | 'restrict'     | 'readonly'     | 'writeonly'     | 'subroutine'     | 'subroutine' '(' type_name_list ')'  ; type_name_list :       'type_name'     | type_name_list ',' 'type_name'  ; type_specifier :       type_specifier_nonarray     | type_specifier_nonarray array_specifier  ; array_specifier :       '[' ']'     | '[' conditional_expression ']'     | array_specifier '[' ']'     | array_specifier '[' conditional_expression ']'  ; type_specifier_nonarray :       'void'     | 'float'     | 'double'     | 'int'     | 'uint'     | 'bool'     | 'vec2'     | 'vec3'     | 'vec4'     | 'dvec2'     | 'dvec3'     | 'dvec4'     | 'bvec2'     | 'bvec3'     | 'bvec4'     | 'ivec2'     | 'ivec3'     | 'ivec4'     | 'uvec2'     | 'uvec3'     | 'uvec4'     | 'mat2'     | 'mat3'     | 'mat4'     | 'mat2x2'     | 'mat2x3'     | 'mat2x4'     | 'mat3x2'     | 'mat3x3'     | 'mat3x4'     | 'mat4x2'     | 'mat4x3'     | 'mat4x4'     | 'dmat2'     | 'dmat3'     | 'dmat4'     | 'dmat2x2'     | 'dmat2x3'     | 'dmat2x4'     | 'dmat3x2'     | 'dmat3x3'     | 'dmat3x4'     | 'dmat4x2'     | 'dmat4x3'     | 'dmat4x4'     | 'atomic_uint'     | 'sampler2D'     | 'sampler3D'     | 'samplerCube'     | 'sampler2DShadow'     | 'samplerCubeShadow'     | 'sampler2DArray'     | 'sampler2DArrayShadow'     | 'samplerCubeArray'     | 'samplerCubeArrayShadow'     | 'isampler2D'     | 'isampler3D'     | 'isamplerCube'     | 'isampler2DArray'     | 'isamplerCubeArray'     | 'usampler2D'     | 'usampler3D'     | 'usamplerCube'     | 'usampler2DArray'     | 'usamplerCubeArray'     | 'sampler1D'     | 'sampler1DShadow'     | 'sampler1DArray'     | 'sampler1DArrayShadow'     | 'isampler1D'     | 'isampler1DArray'     | 'usampler1D'     | 'usampler1DArray'     | 'sampler2DRect'     | 'sampler2DRectShadow'     | 'isampler2DRect'     | 'usampler2DRect'     | 'samplerBuffer'     | 'isamplerBuffer'     | 'usamplerBuffer'     | 'sampler2DMS'     | 'isampler2DMS'     | 'usampler2DMS'     | 'sampler2DMSArray'     | 'isampler2DMSArray'     | 'usampler2DMSArray'     | 'image2D'     | 'iimage2D'     | 'uimage2D'     | 'image3D'     | 'iimage3D'     | 'uimage3D'     | 'imageCube'     | 'iimageCube'     | 'uimageCube'     | 'imageBuffer'     | 'iimageBuffer'     | 'uimageBuffer'     | 'image1D'     | 'iimage1D'     | 'uimage1D'     | 'image1DArray'     | 'iimage1DArray'     | 'uimage1DArray'     | 'image2DRect'     | 'iimage2DRect'     | 'uimage2DRect'     | 'image2DArray'     | 'iimage2DArray'     | 'uimage2DArray'     | 'imageCubeArray'     | 'iimageCubeArray'     | 'uimageCubeArray'     | 'image2DMS'     | 'iimage2DMS'     | 'uimage2DMS'     | 'image2DMSArray'     | 'iimage2DMSArray'     | 'uimage2DMSArray'     | struct_specifier     | 'type_name'  ; precision_qualifier :       'highp'     | 'mediump'     | 'lowp'  ; struct_specifier :       'struct' 'type_name'/* 'identifier' */ '{' struct_declaration_list '}'     | 'struct' '{' struct_declaration_list '}'  ; struct_declaration_list :       struct_declaration     | struct_declaration_list struct_declaration  ; struct_declaration :       type_specifier struct_declarator_list ';'     | type_qualifier type_specifier struct_declarator_list ';'  ; struct_declarator_list :       struct_declarator     | struct_declarator_list ',' struct_declarator  ; struct_declarator :       'identifier'     | 'identifier' array_specifier  ; initializer :       assignment_expression     | '{' initializer_list '}'     | '{' initializer_list ',' '}'  ; initializer_list :       initializer     | initializer_list ',' initializer  ; declaration_statement :       declaration  ; statement :       compound_statement     | simple_statement  ; simple_statement :       declaration_statement     | expression_statement     | selection_statement     | switch_statement     | case_label     | iteration_statement     | jump_statement  ; compound_statement :       '{' '}'     | '{' statement_list '}'  ;  /* merge into statement statement_no_new_scope :       compound_statement_no_new_scope     | simple_statement  ;  */  /* merge into compound_statement compound_statement_no_new_scope :       '{' '}'     | '{' statement_list '}'  ;  */ statement_list :       statement     | statement_list statement  ; expression_statement :       ';'     | expression ';'  ; selection_statement :       'if' '(' expression ')' selection_rest_statement  ; selection_rest_statement :       statement 'else' statement     | statement  ; condition :       expression     | fully_specified_type 'identifier' '=' initializer  ; switch_statement :       'switch' '(' expression ')' '{' switch_statement_list '}'  ; switch_statement_list :       empty     | statement_list  ; case_label :       'case' expression ':'     | 'default' ':'  ; iteration_statement :       'while' '(' condition ')' statement/* statement_no_new_scope */     | 'do' statement 'while' '(' expression ')' ';'     | 'for' '(' for_init_statement for_rest_statement ')' statement/* statement_no_new_scope */  ; for_init_statement :       expression_statement     | declaration_statement  ; conditionopt :       condition     | empty  ; for_rest_statement :       conditionopt ';'     | conditionopt ';' expression  ; jump_statement :       'continue' ';'     | 'break' ';'     | 'return' ';'     | 'return' expression ';'     | 'discard' ';'  ;   // lexical statements  // no need : 'struct' 'identifier' '{' struct_declaration_list '}' // now I changed it into 'struct' 'type_name' '{' struct_declaration_list '}' // only identifier next to 'struct' is a user-defined type and should be a 'type_name' token. 'type_name' %%<'struct'>[a-zA-Z_][a-zA-Z0-9_]*%%   'intConstant' %%[-+]?[0-9]+%% 'intConstant' %%0x[0-9A-Fa-f]+%% 'uintConstant' %%[-+]?[0-9]+[uU]%% 'uintConstant' %%0x[0-9A-Fa-f]+[uU]%% 'floatConstant' %%[-+]?[0-9]+([.][0-9]+)?([Ee][-+]?[0-9]+)?[fF]%% 'boolConstant' %%true/[^a-zA-Z0-9_]%% 'boolConstant' %%false/[^a-zA-Z0-9_]%% 'doubleConstant' %%[-+]?[0-9]+([.][0-9]+)?([Ee][-+]?[0-9]+)?%%   'identifier' %%[a-zA-Z_][a-zA-Z0-9_]*%%   %grammarName   GLSL %blockComment  on %inlineComment on

有了解析器,就有了单词流List<Token>和语法树Node。面对GLSL代码的语法树,我们如何对源代码进行格式化呢?

格式化的目的是让源代码从书写上更适宜人类阅读。具体来说,格式化要做的事,只是增减某些空白符(空格、tab符、换行符)而已,不会修改具有语法意义的内容。

在C#中,

if (x >0) {t = x;x = 0; }

会被格式化为

if (x > 0) {     t = x; x = 0; }

if (x > 0) {t = x; x= 0; }

会被格式化为

if (x > 0) {     t = x;     x = 0; }

同样意义的源代码,格式化结果却不同。这有些复杂。

我们将问题拆开,一步一步来。

首先,我们忽略多行注释blockComment、单行注释inlineComment和源代码中原有的空白符。

此时,一个if(x>0){t=x;x=0;}就应当被格式化为:

if (x > 0) {     t = x;     x = 0; }

也就是说,语法树中的每一部分,都可以自己判断出应当如何格式化。用递归的方式遍历语法树即可。

现在来考虑注释。注释可能出现在List<Token>(词法分析器的分析结果)里的任意位置上,属实调皮。于是就有这样一个想法,利用C#的yield return语法糖,我们执行下述算法:

遍历`List<Token>`:     如果下一个要进行格式化的Token是注释,那么就在当前位置输出它并换行。     如果下一个要进行格式化的Token不是注释,那么就用`yield return`的方式,在语法树上格式化地输出它(输出它+某些空白符)。

只要在语法树每输出一个Token时,都进行一次yield return(return什么无所谓),那么yield return就能和遍历List<Token>的过程同步完成。

现在能够带着注释格式化了,但是无法解决这样的情况:上一行和下一行代码之间有好几个空行(即换行符),上述格式化过程会无视之。

完全的格式化算法要继承上述方法的思路,并加强如下:

语法树的各类语法块`Node`分别实现自己内部的格式化办法。 语法块`Node`要根据上一级`Node`传递来的要求,在输出自己内部的第一个`Token`之前,先输出多少个换行符和空格。 语法块`Node`要告知自己的每个下一级`Node`,它希望这些子级先输出多少个换行符和空格。 每个`Token`根据(上级传递来的要求+它与前一个`Token`之间的位置关系(间隔多少空格和换行符))输出自己前面的空白符,而后输出自己的内容。

每个语法块`Node`都服从直接上级的要求,并对各个直接下级发出要求。通过这样接力的方式,就可以通过遍历一次语法树来输出格式化的代码了。

同学们可以下载链接中的工具查看效果,里面还有一些示例shader。

 

发表评论

相关文章