DUBUISSON Olivier wrote:
> N.B.: My answers are only based on ITU-T Rec. X.680 and I haven't
> checked if there are differences in how regexps are defined in TTCN-3.
There are a few differences. Enough to be frustrating. Enough that many
of your answers are invalid for TTCN-3...
> Carl Cerecke wrote:
>
>>Some questions regarding section B.1.5, "Matching character pattern".
>>
>>What should the behaviour be if the character following a \ is not a
>>meta-character e.g. \a
>
>
> X.680, A.2.8:
> The regular expression "\a" matches the character "a".
> NOTE - The fourth example shows that the backslash is allowed to precede
> characters that are not metacharacters, but this use is deprecated
> (because other metacharacters could be allowed in future versions of
> this Recommendation | International Standard).
This is the behaviour I would expect, but it is not mentioned in the
TTCN-3 standard.
>>Does "abc*xyz" match as many characters as it can before xyz, or as few
>>as it must? i.e. for the string "abcxyzxyz" will the * match "" or "xyz"?
>
>
> A character string does match or doesn't match a pattern; it is a
> boolean result. Your question only matters when it comes to referencing
> subexpressions with \1, \2… Since these notations can only be used in
> English text and comments, I'd recommend that the comment clarifies
> whether it matches the shortest or longest subexpression (IMHO it
> usually matches the longest). We'll discuss this within the ASN.1 group
> ASAP.
I think longest match is more often what is expected. It is not
specified in the TTCN-3 standard.
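
To make the two readings concrete, here is a small sketch using Python's
re module. This is not a TTCN-3 engine: I have assumed the TTCN-3 "*" can
be written as ".*" (longest) or ".*?" (shortest), with parentheses added
only so the matched text can be inspected. It shows that for whole-string
matching the result is the same either way, and that the choice only
becomes visible once you look at what the wildcard actually consumed.

```python
import re

subject = "abcxyzxyz"

# Assumed stand-ins for the TTCN-3 pattern "abc*xyz".
greedy = re.compile(r"abc(.*)xyz")   # wildcard takes as much as it can
lazy = re.compile(r"abc(.*?)xyz")    # wildcard takes as little as it must

# Whole-string matching is a boolean question: both variants accept.
whole_greedy = greedy.fullmatch(subject) is not None
whole_lazy = lazy.fullmatch(subject) is not None

# Prefix matching exposes the difference in what the wildcard consumed.
prefix_greedy = greedy.match(subject).group(1)
prefix_lazy = lazy.match(subject).group(1)

print(whole_greedy, whole_lazy)
print(repr(prefix_greedy), repr(prefix_lazy))
```

So the ambiguity does not change whether a template matches, but it would
matter to any implementation that reports or reuses the matched parts.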
>>Do the ? and * meta-characters match line-terminators?
>
>
> These operators apply to the character or subexpression which precedes
> them (see X.680, A.2.10). As a consequence, a pattern such as "\n?"
> matches zero or one newline character.
X.680 uses these characters the same way as virtually every other
regular expression notation. In TTCN-3 they behave like globbing in a
Unix shell. See the comment at the end of this email.
>>What is the behaviour of unescaped, but "illegal" meta-characters? e.g.
>>does pattern "ab]" match a-b-right-square-bracket?
>
>
> It is not a valid regexp. But X.680, Annex A, says nothing about this.
>
>
>>How do reference expressions behave? i.e. are they treated as a group,
>>surrounded by implicit parentheses? e.g. If foo := "ab" does pattern
>>"{foo}#(2)" match abb or abab?
>
>
> Your example is illegal. The curly brackets can only be used to denote a
> character by giving its {group,plane,row,cell} quadruple as defined in
> ISO/IEC 10646-1.
> I guess you meant to write "\N{Foo}#(2)" where foo is defined as
> follows:
> Foo IA5String ::= {a|b}
> In that case, yes, implicit parentheses surround the subexpression and
> the pattern is equivalent to "[ab]#2".
Curly brackets have a different meaning in TTCN-3. The string inside the
curly brackets is the name of a string variable/constant containing a
regular expression. A 10646-1 char in TTCN-3 regexp is denoted by
\q{group,plane,row,cell}
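
The grouping question still stands for TTCN-3 reference expressions. As a
sketch of the two possible readings, again in Python's re module rather
than TTCN-3: I write the TTCN-3 repetition "#(2)" as "{2}", and "foo" is
just the constant from my example above. The first pattern pastes the
referenced text in as-is; the second wraps it in implicit parentheses.

```python
import re

foo = "ab"  # content of the referenced constant from the example

# Reading 1: plain textual substitution, so the repetition binds
# only to the last character of the referenced expression.
textual = re.compile(foo + "{2}")

# Reading 2: the referenced expression is treated as a group.
grouped = re.compile("(?:" + foo + "){2}")

textual_abb = textual.fullmatch("abb") is not None
textual_abab = textual.fullmatch("abab") is not None
grouped_abb = grouped.fullmatch("abb") is not None
grouped_abab = grouped.fullmatch("abab") is not None

print(textual_abb, textual_abab)   # True False
print(grouped_abb, grouped_abab)   # False True
```

The two readings accept disjoint strings here, so two implementations
that choose differently would disagree on the same test case.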
>>What is the behaviour of ^ not at the start of the set? A normal
>>character or an error?
>
>
> The caret has a special meaning within square brackets only.
> X.680, A.2.2 says:
> If the first character of the list is the caret "^", then it matches any
> character which is not in the list. To include a literal CIRCUMFLEX
> ACCENT (94) "^", place it anywhere except in the first position or
> precede it with a backslash.
>
> If a caret appears in a regexp outside square brackets, it matches a
> CIRCUMFLEX ACCENT.
I agree that this makes sense, but there is no mention of it in the
TTCN-3 standard.
>>Are meta-characters allowed within a set when preceded by a \? (The
>>standard specifies only character literals and - and ^ inside a set.)
>
>
> Please give an example.
A poorly worded question from me. Sorry. An example:
Does "[a-g?w-z]" match any of the symbols abcdefg?wxyz or is it an
error? This is covered in X.680 A.2.2 (All metacharacter sequences,
except "]" and "\", lose their special meaning inside a list.) but is
not mentioned in the TTCN-3 standard.
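
For comparison, this is how a conventional engine treats that set. The
sketch below uses Python's re module, which follows the same rule as
X.680 A.2.2 on this point; it says nothing about what a TTCN-3 tool does,
which is exactly the gap.

```python
import re

charset = re.compile(r"[a-g?w-z]")

# Inside the set, "?" is an ordinary character, not a metacharacter.
accepted = [c for c in "abcdefg?wxyz" if charset.fullmatch(c)]

# Characters outside both ranges (and not "?") are rejected.
rejected = [c for c in "hv." if charset.fullmatch(c) is None]

print("".join(accepted))
print("".join(rejected))
```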
>>Can sets be nested?
>
>
> Please give an example.
What does "[a-m[n-z]]" mean? Is it interpreted as "[a-z]" or an error or
something else? The X.680 A.2.2 definition effectively defines a meaning
for this string.
>>Can a reference expression be used inside a set?
>
>
> Please give an example. You are using a vocabulary with which I am not
> familiar.
Reference expressions are those identifiers within curly brackets,
mentioned above.
>>Regular expression notation in other tools almost universally uses ? as
>>an optional match (0 or 1), and * as repetition (0 or more). The
>>equivalent expressions for ttcn3-style ? and * in "normal" regular
>>expressions are . for ? and .* for *. Is it a good idea to depart from
>>standard notation in this area?
>
>
> I guess the difference is due to the fact that ASN.1 regexps have been
> included with no change in TTCN-3.
The ASN.1 (and pretty much everything else) "." is equivalent to TTCN-3
(and unix-shell) "?"
The TTCN-3 (and unix-shell) "*" is equivalent to ASN.1 (and pretty much
everything else) ".*"
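
The correspondence can be sketched as a tiny translator. This is only an
illustration in Python: glob_style_to_regex is a hypothetical helper name,
and it handles nothing but the two wildcards (no sets, no backslash
escapes, no "#(n)" repetition, no reference expressions). Whether the
wildcards should match line terminators is one of the open questions
above; here they do not, because Python's "." excludes newlines by default.

```python
import re

def glob_style_to_regex(pattern: str) -> str:
    """Translate TTCN-3-style "?" and "*" into conventional regex syntax.

    Hypothetical helper: every other character is taken literally.
    """
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")            # exactly one character
        elif ch == "*":
            parts.append(".*")           # zero or more characters
        else:
            parts.append(re.escape(ch))  # literal character
    return "".join(parts)

translated = glob_style_to_regex("abc*xyz")
matches_long = re.fullmatch(translated, "abcxyzxyz") is not None
matches_one = re.fullmatch(glob_style_to_regex("a?c"), "abc") is not None
matches_none = re.fullmatch(glob_style_to_regex("a?c"), "ac") is not None

print(translated)
print(matches_long, matches_one, matches_none)
```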
Regular expressions are an important part of any testing system and
should be very well specified so all implementors can implement regular
expression engines that behave identically.
Cheers,
--
Carl Cerecke - Software Designer
Da Vinci Communications Ltd
Christchurch - New Zealand
TEL : +64 3 3838311
FAX : +64 3 3838310
www : www.davinci-communications.com