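An ABNF Parser Based on Predictive Parsing (Part 2): Class Definitions for the ABNF Grammar Elements

Following the ABNF grammar definition, we define a class for each ABNF grammar element, one rule at a time:

(1) First, rulelist: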

rulelist       =  1*( rule / (*c-wsp c-nl) )
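rulelist (the rule list) is the top-level symbol of the ABNF grammar; in other words, any grammar that conforms to ABNF is a rulelist. A rulelist is a sequence of one or more rules, possibly interleaved with blank or comment-only lines, so in Java we can simply model it as a List, for example: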
List<Rule> ruleList;

(2) Definition of rule

A rule represents one context-free grammar rule. Its definition is:

rule           =  rulename defined-as elements c-nl
                               ; continues if next line starts
                               ;  with white space
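That is, a rule consists of a rulename (the rule's name), defined-as (the "defined as" operator), elements (the rule body) and c-nl (a comment or line break). For example, Rule1 = "this is a rule" is a rule.

For the Rule class, rulename, defined-as and elements carry actual content, so we make them members of Rule. c-nl is only a separator, and its exact content (for example, whether it contains one space or two) has no effect on the Rule, so the Rule class is defined as: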

/*
    This file is one of the component a Context-free Grammar Parser Generator,
    which accept a piece of text as the input, and generates a parser
    for the inputted context-free grammar.
    Copyright (C) 2013, Junbiao Pan (Email: [email protected])

    This program is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    (at your option) any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program.  If not, see <http://www.gnu.org/licenses/>.
 */

public class Rule {
    private RuleName ruleName;
    public RuleName getRuleName() { return ruleName; }

    private String definedAs;
    public String getDefinedAs() { return definedAs; }
    public void setDefinedAs(String definedAs) { this.definedAs = definedAs; }

    private Elements elements;
    public Elements getElements() { return elements; }

    public Rule(RuleName ruleName, String definedAs, Elements elements) {
        this.ruleName = ruleName;
        this.definedAs = definedAs;
        this.elements = elements;
    }
}
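The defined-as information has to be kept because a rule can be defined in two ways, as a basic definition ("=") or as an incremental one ("=/"), and the Rule class needs to record which one was used.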

        defined-as     =  *c-wsp ("=" / "=/") *c-wsp
                               ; basic rules definition and
                               ;  incremental alternatives
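
To illustrate why the defined-as value matters, here is a minimal sketch of how a later stage might merge an incremental definition into an existing rule. It is not taken from the article's code: RuleTable, define and alternativesOf are hypothetical names, alternatives are reduced to plain strings, and treating "=" as "start the rule from scratch" is an assumed policy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of the article's classes.
class RuleTable {
    // Rule name -> alternatives, kept as plain strings to keep the sketch short.
    private final Map<String, List<String>> rules = new LinkedHashMap<>();

    // definedAs is "=" for a basic definition and "=/" for an incremental one.
    void define(String ruleName, String definedAs, List<String> alternatives) {
        if ("=".equals(definedAs)) {
            // Assumed policy: a basic definition starts the rule from scratch.
            rules.put(ruleName, new ArrayList<>(alternatives));
        } else if ("=/".equals(definedAs)) {
            List<String> existing = rules.get(ruleName);
            if (existing == null) {
                throw new IllegalStateException("\"=/\" used before " + ruleName + " was defined");
            }
            // An incremental definition appends alternatives to the existing rule.
            existing.addAll(alternatives);
        } else {
            throw new IllegalArgumentException("unknown defined-as: " + definedAs);
        }
    }

    List<String> alternativesOf(String ruleName) {
        return rules.get(ruleName);
    }

    public static void main(String[] args) {
        RuleTable table = new RuleTable();
        table.define("ruleset", "=", Arrays.asList("alt1", "alt2"));
        table.define("ruleset", "=/", Arrays.asList("alt3"));
        System.out.println(table.alternativesOf("ruleset"));
    }
}
```

The sketch compares rule names as exact strings; ABNF rule names are case-insensitive, so a real table would normalize the key first.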


(3) Definition of rulename

Without further ado, here is the code:

public class RuleName implements Element {
	private String prefix;
	private String rulename;
	public String toString() { return prefix + rulename; }
	
    public RuleName(String rulename) {
        this.prefix = "";
        this.rulename = rulename;
    }

	public RuleName(String prefix, String rulename) {
		this.prefix = prefix;
		this.rulename = rulename;
	}
}
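Please forgive the copyright notice being longer than the code itself; it will look less glaring once the code fills out. RuleName defines a prefix. Why? Because when we process ABNF grammars later there will be many dependencies: the SIP specification (RFC3261), for example, depends on several other specifications, including RFC1035, RFC2234, RFC2396, RFC2616 and RFC2806. The prefix works like a namespace, qualifying each rule name with the name of the specification it comes from.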
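
A small sketch of the namespace idea follows. It assumes the prefix carries its own separator (the toString above simply concatenates prefix and rulename); "SpecA", "SpecB" and "token" are made-up names, and PrefixDemo is only a wrapper for the example.

```java
import java.util.HashMap;
import java.util.Map;

class PrefixDemo {
    // Reduced copy of the article's RuleName: prefix + rulename.
    static final class RuleName {
        private final String prefix;
        private final String rulename;

        RuleName(String prefix, String rulename) {
            this.prefix = prefix;
            this.rulename = rulename;
        }

        @Override
        public String toString() { return prefix + rulename; }
    }

    public static void main(String[] args) {
        // Assumption: the prefix includes a separator ("SpecA." here).
        RuleName a = new RuleName("SpecA.", "token");
        RuleName b = new RuleName("SpecB.", "token");

        Map<String, String> bodies = new HashMap<>();
        bodies.put(a.toString(), "body defined in SpecA");
        bodies.put(b.toString(), "body defined in SpecB");

        // Both rules survive because their qualified names differ.
        System.out.println(bodies.size());
        System.out.println(bodies.get("SpecB.token"));
    }
}
```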

(4) Definition of elements

// elements       =  alternation *c-wsp
public class Elements {
	private Alternation alternation;
	public Alternation getAlternation() { return alternation; }
	public Elements(Alternation alternation) {
		this.alternation = alternation;
	}
}

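elements is essentially just an alternation. I made alternation a member of Elements (rather than making one a subclass of the other) mainly because elements has the trailing *c-wsp that alternation lacks.

(5) Definition of Alternation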

import java.util.HashSet;
import java.util.Set;

//  alternation    =  concatenation
//                          *(*c-wsp "/" *c-wsp concatenation)
public class Alternation {
	// The alternatives; a Set is used because they carry no order.
	private Set<Concatenation> concatenations = new HashSet<Concatenation>();
	public void addConcatenation(Concatenation concatenation) {
		concatenations.add(concatenation);
	}
	public Set<Concatenation> getConcatenations() {
		return concatenations;
	}
}
An alternation derives one or more concatenations, separated by "/".

The concatenations are stored in a Set rather than a List because the alternatives of an alternation have no inherent order.
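
One practical consequence of using a HashSet: Java decides set membership with equals and hashCode, and a class that does not override them is compared by object identity. The sketch below shows what structural equality would give, under the assumption that it is wanted. Concat is a hypothetical stand-in for Concatenation (a list of element names), not the article's class.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class AlternationSetDemo {
    // Hypothetical stand-in for Concatenation: an ordered list of element names.
    static final class Concat {
        final List<String> parts;

        Concat(String... parts) { this.parts = Arrays.asList(parts); }

        // Structural equality: two Concat objects are equal if their parts match in order.
        @Override
        public boolean equals(Object o) {
            return o instanceof Concat && parts.equals(((Concat) o).parts);
        }

        @Override
        public int hashCode() { return parts.hashCode(); }
    }

    // Builds an alternation as an unordered set of concatenations.
    static Set<Concat> alternation(Concat... concats) {
        return new HashSet<>(Arrays.asList(concats));
    }

    public static void main(String[] args) {
        Set<Concat> ab = alternation(new Concat("a"), new Concat("b"));
        Set<Concat> ba = alternation(new Concat("b"), new Concat("a"));
        // The order in which alternatives were added does not matter.
        System.out.println(ab.equals(ba));
        // Structurally identical alternatives collapse into one entry.
        System.out.println(alternation(new Concat("a"), new Concat("a")).size());
    }
}
```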

(6) Definition of Concatenation

import java.util.ArrayList;
import java.util.List;

// concatenation  =  repetition *(1*c-wsp repetition)
public class Concatenation {
	// Repetitions in the order they appear; the order is significant.
	private List<Repetition> repetitions = new ArrayList<Repetition>();
	public void addRepetition(Repetition repetition) {
		repetitions.add(repetition);
	}
	public List<Repetition> getRepetitions() { return repetitions; }
}

A concatenation consists of one or more repetitions, and the order of those repetitions is significant.

(7) Definition of Repetition

// repetition     =  [repeat] element
public class Repetition {
	private Repeat repeat;
	private Element element;
	
	public Repetition(Repeat repeat, Element element) {
		this.repeat = repeat;
		this.element = element;
	}
	public Repetition(Element element) {
		this(null, element);
	}
}
(8) Definitions of repeat and element

// repeat         =  1*DIGIT / (*DIGIT "*" *DIGIT)
public class Repeat { 
    private int min = 0, max = 0; 
    public int getMin() { return this.min; } 
    public int getMax() { return this.max; } 
    public Repeat(int min, int max) {this.min = min;this.max = max;}
}
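A repeat is either a single number, meaning an exact count, or a minimum and a maximum separated by an asterisk "*"; either bound (or both) may be omitted.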
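
As a rough sketch of how the text of a repeat could be turned into the min and max values held by Repeat: in ABNF a missing minimum defaults to 0 and a missing maximum means "unbounded". RepeatParser, parse and the choice of Integer.MAX_VALUE to stand for "unbounded" are assumptions made for this example, not part of the article's code.

```java
class RepeatParser {
    // Assumed convention for "no upper bound".
    static final int UNBOUNDED = Integer.MAX_VALUE;

    // Parses the text of a repeat, e.g. "3", "1*", "*5", "2*4" or "*",
    // and returns a two-element array {min, max}.
    static int[] parse(String text) {
        int star = text.indexOf('*');
        if (star < 0) {
            // A bare number n means exactly n occurrences.
            int n = Integer.parseInt(text);
            return new int[] { n, n };
        }
        String lower = text.substring(0, star);
        String upper = text.substring(star + 1);
        int min = lower.isEmpty() ? 0 : Integer.parseInt(lower);
        int max = upper.isEmpty() ? UNBOUNDED : Integer.parseInt(upper);
        return new int[] { min, max };
    }

    public static void main(String[] args) {
        for (String s : new String[] { "3", "1*", "*5", "2*4", "*" }) {
            int[] r = parse(s);
            String max = r[1] == UNBOUNDED ? "unbounded" : String.valueOf(r[1]);
            System.out.println(s + " -> min=" + r[0] + ", max=" + max);
        }
    }
}
```

The resulting pair could then be passed to new Repeat(min, max).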

Now the definition of element:

//  element        =  rulename / group / option /
//                            char-val / num-val / prose-val
public interface Element  {
}
An element derives nonterminals such as rulename, group and option, so it can be defined as an interface (I have not thought of a better approach for now).
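
With an empty marker interface, code that consumes an Element has to find out which concrete class it is holding. The following is one possible way to do that with instanceof, shown only as a sketch: the nested classes are empty stand-ins for the article's classes, and ElementKindDemo and kindOf are hypothetical names.

```java
class ElementKindDemo {
    // Marker interface, as in the article.
    interface Element {}

    // Empty stand-ins for three of the implementing classes.
    static final class RuleName implements Element {}
    static final class CharVal implements Element {}
    static final class Group implements Element {}

    // Maps an element to the ABNF name of its kind.
    static String kindOf(Element element) {
        if (element instanceof RuleName) return "rulename";
        if (element instanceof CharVal) return "char-val";
        if (element instanceof Group) return "group";
        return "unknown";
    }

    public static void main(String[] args) {
        System.out.println(kindOf(new CharVal()));
        System.out.println(kindOf(new Group()));
    }
}
```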

(9) Definition of group

//  group          =  "(" *c-wsp alternation *c-wsp ")"
public class Group implements Element {
    private Alternation alternation;
    public Group(Alternation alternation) {this.alternation = alternation;}
}
A group is an alternation enclosed in a pair of parentheses.

(10) Definition of option

//  option         =  "[" *c-wsp alternation *c-wsp "]"

public class Option implements Element {
    private Alternation alternation; 
    public Alternation getAlternation() { return alternation; } 
    public Option(Alternation alternation) {this.alternation = alternation;}
}
An option is an alternation enclosed in a pair of square brackets.

(11) Definition of num-val

A num-val can be written in binary, decimal or hexadecimal form.

/*
        num-val        =  "%" (bin-val / dec-val / hex-val)

        bin-val        =  "b" 1*BIT
                          [ 1*("." 1*BIT) / ("-" 1*BIT) ]
                               ; series of concatenated bit values
                               ; or single ONEOF range

        dec-val        =  "d" 1*DIGIT
                          [ 1*("." 1*DIGIT) / ("-" 1*DIGIT) ]

        hex-val        =  "x" 1*HEXDIG
                          [ 1*("." 1*HEXDIG) / ("-" 1*HEXDIG) ]
*/
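import java.util.ArrayList;
import java.util.List;

public class NumVal implements Element {//, Terminal {
	// The base indicator: "b", "d" or "x".
	private String base;
	// The values of the series, kept as the original digit strings.
	private List<String> values = new ArrayList<String>();
	public List<String> getValues() { return values; }
	public NumVal(String base) {
		this.base = base;
	}
}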

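Whether binary, decimal or hexadecimal, a num-val comes in two forms, a series and a range. For example, %d11.22.33.44 denotes the four decimal values 11, 22, 33 and 44 in sequence, while %x00-ff denotes any value from 0x00 to 0xff.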
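
The two forms can be told apart by the separator that follows the first number. Below is a minimal sketch of decoding a num-val string into integers. NumValDemo, radix, isRange and decode are hypothetical helpers, and the sketch assumes well-formed input and values that fit in an int.

```java
import java.util.ArrayList;
import java.util.List;

class NumValDemo {
    // Radix for the base letter of a num-val: b, d or x (case-insensitive).
    static int radix(char base) {
        switch (Character.toLowerCase(base)) {
            case 'b': return 2;
            case 'd': return 10;
            case 'x': return 16;
            default: throw new IllegalArgumentException("unknown base: " + base);
        }
    }

    // True for the range form such as "%x00-ff".
    static boolean isRange(String numVal) {
        return numVal.indexOf('-') >= 0;
    }

    // Decodes "%d11.22.33.44" into [11, 22, 33, 44] and
    // "%x00-ff" into the two range bounds [0, 255].
    static List<Integer> decode(String numVal) {
        if (numVal.length() < 3 || numVal.charAt(0) != '%') {
            throw new IllegalArgumentException("not a num-val: " + numVal);
        }
        int radix = radix(numVal.charAt(1));
        String body = numVal.substring(2);
        String[] parts = body.split(isRange(numVal) ? "-" : "\\.");
        List<Integer> result = new ArrayList<>();
        for (String part : parts) {
            result.add(Integer.parseInt(part, radix));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(decode("%d11.22.33.44"));
        System.out.println(isRange("%x00-ff") + " " + decode("%x00-ff"));
    }
}
```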

The range form is defined as a separate class:

public class RangedNumVal implements Element {
	private String base, from ,to;
	
	public RangedNumVal(String base, String from, String to) {
		this.base = base;
		this.from = from;
		this.to = to;
	}
}

(12) char-val and prose-val

The ABNF definitions of char-val and prose-val:

        char-val       =  DQUOTE *(%x20-21 / %x23-7E) DQUOTE
                               ; quoted string of SP and VCHAR
                               ;  without DQUOTE

        prose-val      =  "<" *(%x20-3D / %x3F-7E) ">"
                               ; bracketed string of SP and VCHAR
                               ;  without angles
                               ; prose description, to be used as
                               ;  last resort

The Java definition of char-val:

public class CharVal implements Element {
	private String value;
	public CharVal(String value) {
		this.value = value;
	}

}
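
The char-val rule restricts what may appear between the quotes to %x20-21 and %x23-7E, that is, space and the visible ASCII characters except the double quote. A small sketch of that check follows; CharValCheck and isValidContent are hypothetical names, and the method is meant for the text between the quotes, not including them.

```java
class CharValCheck {
    // True if every character is allowed inside a char-val: %x20-21 / %x23-7E.
    static boolean isValidContent(String content) {
        for (int i = 0; i < content.length(); i++) {
            char c = content.charAt(i);
            boolean allowed = (c >= 0x20 && c <= 0x21) || (c >= 0x23 && c <= 0x7E);
            if (!allowed) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidContent("this is a rule"));
        System.out.println(isValidContent("say \"hi\""));
    }
}
```
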
The definition of prose-val:

public class ProseVal implements Element {
	private String value;
	public ProseVal(String value) {
		this.value = value;
	}

}


Other ABNF grammar elements, such as ALPHA, are simple enough to be handled directly as strings, so they do not get classes of their own.
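
To see how the classes fit together, here is a sketch that assembles by hand the tree for the rule Rule1 = "this is a rule". The nested classes are trimmed stand-ins with the same nesting as the classes in this article (fields are exposed directly and Repeat is omitted to keep it short); AstDemo and buildRule1 are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class AstDemo {
    interface Element {}

    static class RuleName implements Element {
        final String name;
        RuleName(String name) { this.name = name; }
    }

    static class CharVal implements Element {
        final String value;
        CharVal(String value) { this.value = value; }
    }

    static class Repetition {
        final Element element;
        Repetition(Element element) { this.element = element; }
    }

    static class Concatenation {
        final List<Repetition> repetitions = new ArrayList<>();
    }

    static class Alternation {
        final Set<Concatenation> concatenations = new HashSet<>();
    }

    static class Elements {
        final Alternation alternation;
        Elements(Alternation alternation) { this.alternation = alternation; }
    }

    static class Rule {
        final RuleName ruleName;
        final String definedAs;
        final Elements elements;
        Rule(RuleName ruleName, String definedAs, Elements elements) {
            this.ruleName = ruleName;
            this.definedAs = definedAs;
            this.elements = elements;
        }
    }

    // Builds the tree for: Rule1 = "this is a rule"
    static Rule buildRule1() {
        Concatenation concatenation = new Concatenation();
        concatenation.repetitions.add(new Repetition(new CharVal("this is a rule")));
        Alternation alternation = new Alternation();
        alternation.concatenations.add(concatenation);
        return new Rule(new RuleName("Rule1"), "=", new Elements(alternation));
    }

    public static void main(String[] args) {
        Rule rule = buildRule1();
        Concatenation only = rule.elements.alternation.concatenations.iterator().next();
        CharVal literal = (CharVal) only.repetitions.get(0).element;
        System.out.println(rule.ruleName.name + " " + rule.definedAs + " \"" + literal.value + "\"");
    }
}
```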

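With the classes for these elements in place, the next step is to start writing the predictive parser itself.

The copyright notices above are rather verbose, so please bear with them; the code will be filled in gradually as it matures.

Off for a lunch break :)

Index of this series: A Predictive ABNF Grammar Parser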
