java.lang.Object
  extended by lp.parse.LpLexer
public class LpLexer
Class that tokenizes a textual input. The nextToken() method reads tokens in the order in which they appear in the input. The get* methods (getTokenType(), getLexem(), getLineNumber(), getPosition() and getToken()) return information about the last token read. All whitespace (as defined by Character.isWhitespace(char)) and line comments (everything from a '%' character up to the next line break or the end of input) are ignored, i.e. they do not generate any tokens. Nine types of tokens are recognized:
- LpTokenType.LEFT_PAREN -- a left parenthesis '('
- LpTokenType.RIGHT_PAREN -- a right parenthesis ')'
- LpTokenType.COMMA -- a comma ','
- LpTokenType.DOT -- a dot '.'
- LpTokenType.RULE_ARROW -- the string "<-" or the string ":-"
- LpTokenType.LOWERCASE_WORD -- a string of characters from the set {'_', 'a', 'b', ..., 'z', 'A', 'B', ..., 'Z', '0', '1', ..., '9'} not beginning with an uppercase letter; in other words ([_a-z0-9][_a-zA-Z0-9]*). The token is parsed greedily: it ends only when the next character does not belong to the set above (for example whitespace or the beginning of a comment).
- LpTokenType.UPPERCASE_WORD -- a string of characters from the same set beginning with an uppercase letter; in other words ([A-Z][_a-zA-Z0-9]*).
- LpTokenType.EOF -- returned when the end of input is reached, and on every call after that.
- LpTokenType.UNKNOWN_CHAR -- returned if a character occurs that cannot be matched against any other token. To be precise, it is none of the following: whitespace, part of a line comment, '(', ')', ',', '.', a '<' or ':' followed by a '-', '_', a digit, or a lower- or uppercase letter. After this token is returned by nextToken(), getLexem() returns a string of length 1 containing the unrecognized character.

For example, if you run the following code:

    LpLexer l = new LpLexer();
    l.setInput("Simple, short sentence.");
    l.nextToken();
    LpTokenType t = l.getTokenType();
    while (t != LpTokenType.EOF) {
        System.out.println("token: " + t.toString() + "; lexem: " + l.getLexem()
                + "; line number: " + l.getLineNumber() + "; position: " + l.getPosition());
        l.nextToken();
        t = l.getTokenType();
    }
    l.close();

you should get the following output:

    token: UPPERCASE_WORD; lexem: Simple; line number: 1; position: 1
    token: COMMA; lexem: ,; line number: 1; position: 7
    token: LOWERCASE_WORD; lexem: short; line number: 1; position: 9
    token: LOWERCASE_WORD; lexem: sentence; line number: 1; position: 15
    token: DOT; lexem: .; line number: 1; position: 23
See Also:
LpTokenType, LpToken
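The token rules above can be cross-checked with a small, self-contained sketch. This is a hypothetical re-implementation over a String, not LpLexer's source; LpLexerSketch, Type and Token are illustrative names. It assumes that '\n' is the line separator, that a '<' or ':' not followed by '-' yields an UNKNOWN_CHAR token on its own, and that the EOF token is positioned just after the last character. Tokenizing the class example reproduces the documented types, lexems and positions.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical string-based sketch of the documented token rules; not LpLexer's source. */
final class LpLexerSketch {

    /** The nine documented token types. */
    enum Type { LEFT_PAREN, RIGHT_PAREN, COMMA, DOT, RULE_ARROW,
                LOWERCASE_WORD, UPPERCASE_WORD, EOF, UNKNOWN_CHAR }

    /** A token with its lexem, 1-based line number and 1-based position in the line. */
    static final class Token {
        final Type type;
        final String lexem;
        final int line;
        final int position;

        Token(Type type, String lexem, int line, int position) {
            this.type = type;
            this.lexem = lexem;
            this.line = line;
            this.position = position;
        }

        @Override
        public String toString() {
            return type + " '" + lexem + "' " + line + ":" + position;
        }
    }

    /** '_', ASCII letters and ASCII digits. */
    static boolean isWordLetter(char c) {
        return c == '_' || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
    }

    /** Tokenizes the whole input; the last token in the list is always EOF. */
    static List<Token> tokenize(String in) {
        List<Token> out = new ArrayList<>();
        int i = 0;          // index of the next unread character
        int line = 1;       // current line number
        int lineStart = 0;  // index of the first character of the current line
        while (true) {
            // Skip whitespace and '%' comments; '\n' is assumed to be the line separator.
            while (i < in.length()) {
                char c = in.charAt(i);
                if (c == '%') {
                    while (i < in.length() && in.charAt(i) != '\n') i++;  // stop before '\n'
                } else if (Character.isWhitespace(c)) {
                    i++;
                    if (c == '\n') { line++; lineStart = i; }
                } else {
                    break;
                }
            }
            int pos = i - lineStart + 1;
            if (i >= in.length()) {                 // EOF has an empty lexem
                out.add(new Token(Type.EOF, "", line, pos));
                return out;
            }
            char c = in.charAt(i);
            char next = i + 1 < in.length() ? in.charAt(i + 1) : '\0';
            int end = i + 1;                        // exclusive end of the lexem
            Type t;
            switch (c) {
                case '(': t = Type.LEFT_PAREN; break;
                case ')': t = Type.RIGHT_PAREN; break;
                case ',': t = Type.COMMA; break;
                case '.': t = Type.DOT; break;
                default:
                    if ((c == '<' || c == ':') && next == '-') {
                        t = Type.RULE_ARROW;
                        end = i + 2;
                    } else if (isWordLetter(c)) {
                        // Greedy: extend while the next character is still a word letter.
                        while (end < in.length() && isWordLetter(in.charAt(end))) end++;
                        t = (c >= 'A' && c <= 'Z') ? Type.UPPERCASE_WORD : Type.LOWERCASE_WORD;
                    } else {
                        t = Type.UNKNOWN_CHAR;      // lexem is the single unrecognized character
                    }
            }
            out.add(new Token(t, in.substring(i, end), line, pos));
            i = end;
        }
    }
}
```

For instance, tokenizing "p(X) :- q. % comment\nr <- s." in this sketch yields a RULE_ARROW ":-" at line 1, position 6 and a RULE_ARROW "<-" at line 2, position 3, with the comment producing no tokens.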
Field Summary
Modifier and Type | Field | Description |
---|---|---|
private int | la | The lookahead character. |
private StringBuilder | lexem | A StringBuilder in which the lexem corresponding to the last token read is kept. |
private int | lineNumber | The number of the line on which the last token occurred. |
private int | position | The position of the last token's beginning within its line. |
private Reader | reader | The reader used to read the input. |
private LpTokenType | type | Type of the last token read. |
Constructor Summary
Constructor | Description |
---|---|
LpLexer() | Creates a new instance of LpLexer. |
Method Summary
Modifier and Type | Method | Description |
---|---|---|
private void | appendOne() | Appends the current lookahead character to lexem and reads a new one. |
void | close() | Closes the underlying reader. |
String | getLexem() | Returns the lexem corresponding to the last token read. |
int | getLineNumber() | Returns the number of the input line on which the last token occurred. |
int | getPosition() | Returns the position of the last token's beginning within its line of input. |
LpToken | getToken() | Returns an LpToken instance containing information about the last token read. |
LpTokenType | getTokenType() | Returns the type of the last token read. |
protected void | initialize() | Reinitializes members and reads the first lookahead character. |
private boolean | isWordLetter(char c) | Determines whether a character belongs to the set {'_', 'a', 'b', ..., 'z', 'A', 'B', ..., 'Z', '0', '1', ..., '9'}. |
void | nextToken() | Reads the next token occurring in the input. |
private void | readNewLA() | Reads one character from the input and stores it in the lookahead container la. |
void | setInput(File file) | Uses the contents of the given file as the input of this LpLexer. |
void | setInput(CharSequence input) | Sets the character input of this LpLexer. |
void | setInput(Reader reader) | Uses the given character reader as the input of this LpLexer. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
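private Reader reader
The reader used to read the input.

private int la
The lookahead character. See readNewLA().

private LpTokenType type
Type of the last token read. See getTokenType().

private final StringBuilder lexem
A StringBuilder in which the lexem corresponding to the last token read is kept. See getLexem().

private int lineNumber
The number of the line on which the last token occurred. See getLineNumber() for more information on how lines are numbered.

private int position
The position of the last token's beginning within its line. See getPosition().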
Constructor Detail |
---|
public LpLexer()
Creates a new instance of LpLexer.
Method Detail |
---|
public void setInput(CharSequence input)
Sets the character input of this LpLexer. A StringReader is used to read the input character by character. Also resets the information about the previously read token to the default values (as if no token had been read before).
Parameters:
input - string with input for the LpLexer
Throws:
IllegalArgumentException - if input is null
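Reading a CharSequence through a StringReader and keeping one character of lookahead (which initialize() is described as priming) can be sketched as follows. LookaheadReader, peek() and consume() are hypothetical names; -1 marks the end of input, as Reader.read() does, and UncheckedIOException stands in for the documented ExceptionAdapter.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical one-character lookahead over a StringReader (not LpLexer's source). */
final class LookaheadReader {
    private final Reader reader;
    private int la;  // current lookahead character, or -1 once the input is exhausted

    LookaheadReader(CharSequence input) {
        if (input == null) throw new IllegalArgumentException("input is null");
        reader = new StringReader(input.toString());
        readNewLA();  // prime the first lookahead character
    }

    /** Returns the lookahead character without consuming it. */
    int peek() {
        return la;
    }

    /** Returns the lookahead character and reads the next one. */
    int consume() {
        int c = la;
        readNewLA();
        return c;
    }

    private void readNewLA() {
        try {
            la = reader.read();  // StringReader.read() keeps returning -1 at end of input
        } catch (IOException e) {
            throw new UncheckedIOException(e);  // stand-in for ExceptionAdapter
        }
    }
}
```

Keeping exactly one character of lookahead is what lets a lexer decide, for example, whether a '<' starts a RULE_ARROW without reading further ahead.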
public void setInput(File file)
Uses the contents of the given file as the input of this LpLexer. The platform's default encoding is used to read the contents of the file. Also resets the information about the previously read token to the default values (as if no token had been read before).
Parameters:
file - the file with input for this LpLexer
Throws:
IOException - (wrapped in an ExceptionAdapter) if an I/O exception occurs while opening or reading the file
IllegalArgumentException - if file is null
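Opening a file with the platform's default encoding and rethrowing the checked exception unchecked might look like the sketch below. FileSource is a hypothetical helper, and UncheckedIOException (Java 8+) is swapped in for the documented ExceptionAdapter. Note that on JDK 18 and later Charset.defaultCharset() is UTF-8 unless overridden, so the result of this call depends on the JDK in use.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

/** Hypothetical helper mirroring the described setInput(File) behaviour. */
final class FileSource {
    static Reader open(File file) {
        if (file == null) throw new IllegalArgumentException("file is null");
        try {
            // Decode with the platform default charset, as the documentation describes.
            return new BufferedReader(
                    new InputStreamReader(new FileInputStream(file), Charset.defaultCharset()));
        } catch (FileNotFoundException e) {
            throw new UncheckedIOException(e);  // stand-in for ExceptionAdapter
        }
    }
}
```

Passing an explicit charset such as StandardCharsets.UTF_8 instead would make tokenizing independent of the platform, at the cost of diverging from the behaviour documented here.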
public void setInput(Reader reader)
Uses the given character reader as the input of this LpLexer. Also resets the information about the previously read token to the default values (as if no token had been read before).
Parameters:
reader - a reader with input for the LpLexer
Throws:
IOException - (wrapped in an ExceptionAdapter) if an I/O exception occurs while reading from the Reader
IllegalArgumentException - if reader is null
protected void initialize()
Reinitializes members and reads the first lookahead character.
Throws:
IOException - (wrapped in an ExceptionAdapter) if an I/O error occurs while reading the first lookahead character

public void close()
Closes the underlying reader. If setInput(CharSequence) or setInput(File) was used to set the current character source, this method should be called when no more tokens are required from the source. In other cases it is up to the programmer whether to close the Reader given to setInput(Reader) herself or to call this method.
Specified by:
close in interface Closeable
Throws:
IOException - (wrapped in an ExceptionAdapter) if an I/O exception occurs while closing the underlying Reader
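Since close() is specified by Closeable, an LpLexer can presumably be managed with try-with-resources (Java 7+), which runs close() even if tokenizing fails. The sketch demonstrates the pattern on a hypothetical ClosingSource class rather than LpLexer itself. It assumes, as the wrapped-exception note suggests, that close() throws no checked exception; UncheckedIOException stands in for ExceptionAdapter, and TrackingReader is only a test helper.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical Closeable source: close() closes the current Reader and rethrows unchecked. */
final class ClosingSource implements Closeable {
    private Reader reader;

    void setInput(Reader reader) {
        if (reader == null) throw new IllegalArgumentException("reader is null");
        this.reader = reader;
    }

    @Override
    public void close() {             // no checked exception, so callers need no catch block
        if (reader == null) return;   // nothing to close
        try {
            reader.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);  // stand-in for ExceptionAdapter
        } finally {
            reader = null;            // later close() calls are no-ops
        }
    }
}

/** Test helper: a StringReader that remembers whether it was closed. */
final class TrackingReader extends StringReader {
    boolean closed;

    TrackingReader(String s) {
        super(s);
    }

    @Override
    public void close() {
        closed = true;
        super.close();
    }
}
```

Under the same assumption, the real class would follow the same shape: try (LpLexer l = new LpLexer()) { l.setInput(...); ... }.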
public void nextToken()
Reads the next token occurring in the input.
Throws:
IOException - (wrapped in an ExceptionAdapter) if an I/O exception occurs while reading the input

public LpTokenType getTokenType()
Returns the type of the last token read. This method is not meant to be called before nextToken() has been called at least once after the last setInput() call. If it is, null is returned. Similarly, if close() has already been called, null is returned.
public String getLexem()
Returns the lexem corresponding to the last token read. For an LpTokenType.EOF token, an empty string is returned. This method is not meant to be called before nextToken() has been called at least once after the last setInput() call. If it is, null is returned. Similarly, if close() has already been called, null is returned.
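public int getLineNumber()
Returns the number of the input line on which the last token occurred. Lines are numbered starting from 1. This method is not meant to be called before nextToken() has been called at least once after the last setInput() call. If it is, -1 is returned. Similarly, if close() has already been called, -1 is returned.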
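public int getPosition()
Returns the position of the last token's beginning within the line of input it is on. Positions are counted from 1 (the first character of a line has position 1). This method is not meant to be called before nextToken() has been called at least once after the last setInput() call. If it is, -1 is returned. Similarly, if close() has already been called, -1 is returned.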
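readNewLA() is described as updating lineNumber and position for every character it reads, and a token's line number and position are those of its first character. A minimal sketch of such per-character bookkeeping, under two assumptions not stated in this documentation: '\n' is the only line separator, and a line break advances the line count only when the following character is read. PositionTracker and onRead are hypothetical names.

```java
/** Hypothetical per-character line/position bookkeeping (not LpLexer's source). */
final class PositionTracker {
    private int line = 1;                   // line of the character most recently read
    private int position = 0;               // 1-based position of that character (0 before any read)
    private boolean pendingNewline = false; // true if the previous character was '\n'

    /** Records that character c has just been read from the input. */
    void onRead(int c) {
        if (pendingNewline) {   // the previous character ended a line
            line++;
            position = 0;
            pendingNewline = false;
        }
        position++;
        if (c == '\n') pendingNewline = true;
    }

    int line() {
        return line;
    }

    int position() {
        return position;
    }
}
```

A lexer built this way would copy line() and position() into the token when it reads the token's first character; for "Simple, short sentence." the 's' of "short" is the ninth character read, giving position 9 as in the class example.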
public LpToken getToken()
Returns an LpToken instance containing information about the last token read. The information is obtained using the getTokenType(), getLexem(), getPosition() and getLineNumber() methods.
Returns:
an LpToken instance containing information about the last token read

private void readNewLA()
Reads one character from the input and stores it in the lookahead container la. Updates lineNumber and position.
Throws:
IOException - (wrapped in an ExceptionAdapter) if an I/O exception occurs while reading the character

private void appendOne()
Appends the current lookahead character to lexem and reads a new one.
Throws:
IOException - (wrapped in an ExceptionAdapter) if an I/O exception occurs while reading the new lookahead character

private boolean isWordLetter(char c)
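Determines whether a character belongs to the set {'_', 'a', 'b', ..., 'z', 'A', 'B', ..., 'Z', '0', '1', ..., '9'}.
Parameters:
c - the character in question
Returns:
true if it belongs to the set mentioned above, false otherwise.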
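The check can be sketched with explicit range comparisons; WordLetters is a hypothetical name and this is not the library's source. The explicit ranges matter in this sketch: Character.isLetterOrDigit(char) looks like a shortcut, but it also accepts non-ASCII letters such as 'é', which the documented set excludes.

```java
/** Hypothetical sketch of the documented isWordLetter(char) check. */
final class WordLetters {
    /** True only for '_', the ASCII letters and the ASCII digits. */
    static boolean isWordLetter(char c) {
        return c == '_'
                || (c >= 'a' && c <= 'z')
                || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9');
    }
}
```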