regular expression details 04 Nov 2002 01:00 #6256

Some questions regarding section B.1.5, "Matching character pattern".

What should the behaviour be if the character following a \ is not a
meta-character, e.g. \a?

Does "abc*xyz" match as many characters as it can before xyz, or as few
as it must? i.e. for the string "abcxyzxyz" will the * match "" or "xyz"?

Do the ? and * meta-characters match line-terminators?

What is the behaviour of unescaped, but "illegal" meta-characters? e.g.
does pattern "ab]" match a-b-right-square-bracket?

How do reference expressions behave? i.e. are they treated as a group,
surrounded by implicit parentheses? e.g. If foo := "ab" does pattern
"{foo}#(2)" match abb or abab?

What is the behaviour of ^ not at the start of the set? A normal
character or an error?

Are meta-characters allowed within a set if preceded by a \? (The
standard specifies only character literals and - and ^ inside a set.)

Can sets be nested?

Can a reference expression be used inside a set?

Regular expression notation in other tools almost universally uses ? as
an optional match (0 or 1), and * as repetition (0 or more). The
equivalent expressions for ttcn3-style ? and * in "normal" regular
expressions are . for ? and .* for *. Is it a good idea to depart from
standard notation in this area?

I'm sure there are some other issues regarding the specification of
patterns in ttcn-3 that will need resolving/clarifying. These issues are
just the ones that came to me first.

Cheers,
--
Carl Cerecke - Software Designer
Da Vinci Communications Ltd
Christchurch - New Zealand
TEL : +64 3 3838311
FAX : +64 3 3838310
www : www.davinci-communications.com

regular expression details 04 Nov 2002 10:33 #6257

Carl Cerecke wrote:
>
> Some questions regarding section B.1.5, "Matching character pattern".

My recollection is that TTCN-3 patterns are the same as ASN.1 patterns.
You might find more information in Annex A of ITU-T Rec. X.680 that
you can download at:
www.itu.int/ITU-T/studygroups/com17/languages/

Another way to have answers to your question is to put
your patterns in statements such as:
T ::= UTF8String (PATTERN "your-pattern")
and check them with the Asnp syntax checker at:
asn1.elibel.tm.fr/asnp

After that, if some of your questions still do not have answers,
please get back to me and I'll have a look.
--
Olivier DUBUISSON
france telecom R&D

DTL/TAL - 22307 Lannion Cedex - France
t: +33 2 96 05 38 50 - f: +33 2 96 05 39 45 - asn1.elibel.tm.fr/

regular expression details 12 Nov 2002 07:44 #6264

N.B.: My answers are only based on ITU-T Rec. X.680 and I haven't
checked if there are differences in how regexps are defined in TTCN-3.

Carl Cerecke wrote:
>
> Some questions regarding section B.1.5, "Matching character pattern".
>
> What should the behaviour be if the character following a \ is not a
> meta-character, e.g. \a?

X.680, A.2.8:
The regular expression "\a" matches the character "a".
NOTE - The fourth example shows that the backslash is allowed to precede
characters that are not metacharacters, but this use is deprecated
(because other metacharacters could be allowed in future versions of
this Recommendation | International Standard).

> Does "abc*xyz" match as many characters as it can before xyz, or as few
> as it must? i.e. for the string "abcxyzxyz" will the * match "" or "xyz"?

A character string does match or doesn't match a pattern; it is a
boolean result. Your question only matters when it comes to referencing
subexpressions with \1, \2… Since these notations can only be used in
English text and comments, I'd recommend that the comment clarifies
whether it matches the shortest or longest subexpression (IMHO it
usually matches the longest). We'll discuss this within the ASN.1 group
ASAP.

> Do the ? and * meta-characters match line-terminators?

These operators apply to the character or subexpression which precedes
them (see X.680, A.2.10). As a consequence, a pattern such as "\n?"
matches zero or one newline character.

> What is the behaviour of unescaped, but "illegal" meta-characters? e.g.
> does pattern "ab]" match a-b-right-square-bracket?

It is not a valid regexp. But X.680, Annex A, says nothing about this.

> How do reference expressions behave? i.e. are they treated as a group,
> surrounded by implicit parentheses? e.g. If foo := "ab" does pattern
> "{foo}#(2)" match abb or abab?

Your example is illegal. The curly brackets can only be used to denote a
character by giving its {group,plane,row,cell} quadruple as defined in
ISO/IEC 10646-1.
I guess you meant to write "\N{Foo}#(2)" where foo is defined as
follows:
Foo IA5String ::= {a|b}
In that case, yes, implicit parentheses surround the subexpression and
the pattern is equivalent to "[ab]#2".

> What is the behaviour of ^ not at the start of the set? A normal
> character or an error?

The caret has a special meaning within square brackets only.
X.680, A.2.2 says:
If the first character of the list is the caret "^", then it matches any
character which is not in the list. To include a literal CIRCUMFLEX
ACCENT (94) "^", place it anywhere except in the first position or
precede it with a backslash.

If a caret appears in a regexp outside square brackets, it matches a
CIRCUMFLEX ACCENT.

> Are meta-characters allowed within a set if preceded by a \? (The
> standard specifies only character literals and - and ^ inside a set.)

Please give an example.

> Can sets be nested?

Please give an example.

> Can a reference expression be used inside a set?

Please give an example. You are using a vocabulary with which I am not
familiar.

> Regular expression notation in other tools almost universally uses ? as
> an optional match (0 or 1), and * as repetition (0 or more). The
> equivalent expressions for ttcn3-style ? and * in "normal" regular
> expressions are . for ? and .* for *. Is it a good idea to depart from
> standard notation in this area?

I guess the difference is due to the fact that ASN.1 regexps have been
included with no change in TTCN-3.
--
Olivier DUBUISSON
france telecom R&D

DTL/TAL - 22307 Lannion Cedex - France
t: +33 2 96 05 38 50 - f: +33 2 96 05 39 45 - asn1.elibel.tm.fr/

regular expression details 13 Nov 2002 01:49 #6265

DUBUISSON Olivier wrote:
> N.B.: My answers are only based on ITU-T Rec. X.680 and I haven't
> checked if there are differences in how regexps are defined in TTCN-3.

There are a few differences. Enough to be frustrating. Enough that many
of your answers are invalid for TTCN-3...

> Carl Cerecke wrote:
>
>>Some questions regarding section B.1.5, "Matching character pattern".
>>
>>What should the behaviour be if the character following a \ is not a
>>meta-character, e.g. \a?
>
>
> X.680, A.2.8:
> The regular expression "\a" matches the character "a".
> NOTE - The fourth example shows that the backslash is allowed to precede
> characters that are not metacharacters, but this use is deprecated
> (because other metacharacters could be allowed in future versions of
> this Recommendation | International Standard).

This is the behaviour I would expect, but it is not mentioned in the
TTCN-3 standard.

>>Does "abc*xyz" match as many characters as it can before xyz, or as few
>>as it must? i.e. for the string "abcxyzxyz" will the * match "" or "xyz"?
>
>
> A character string does match or doesn't match a pattern; it is a
> boolean result. Your question only matters when it comes to referencing
> subexpressions with \1, \2… Since these notations can only be used in
> English text and comments, I'd recommend that the comment clarifies
> whether it matches the shortest or longest subexpression (IMHO it
> usually matches the longest). We'll discuss this within the ASN.1 group
> ASAP.

I think longest match is more often what is expected. It is not
specified in the TTCN-3 standard.
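
To make the longest-versus-shortest distinction concrete, here is a small sketch using Python's re module, not a TTCN-3 engine. It assumes the conventional spelling .* for the TTCN-3 *, and adds a capture group only to expose what the wildcard consumed. It says nothing about what either standard requires.

```python
import re

text = "abcxyzxyz"

# Greedy: the wildcard takes as much as it can, then backtracks.
greedy = re.match(r"abc(.*)xyz", text)
# Lazy: the wildcard takes as little as it must.
lazy = re.match(r"abc(.*?)xyz", text)

greedy_capture = greedy.group(1)  # "xyz"
lazy_capture = lazy.group(1)      # ""

# When the whole string has to match, both variants give the same
# yes/no answer; only the captured substring can differ.
greedy_full = re.fullmatch(r"abc(.*)xyz", text) is not None
lazy_full = re.fullmatch(r"abc(.*?)xyz", text) is not None
```

This is consistent with the point quoted above: as long as a pattern only yields a boolean result, the choice is invisible. It starts to matter as soon as the matched substring can be referenced or extracted.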

>>Do the ? and * meta-characters match line-terminators?
>
>
> These operators apply to the character or subexpression which precedes
> them (see X.680, A.2.10). As a consequence, a pattern such as "\n?"
> matches zero or one newline character.

X.680 uses these characters the way most other regular expression
notations do. In TTCN-3 they behave like globbing in a Unix shell: ?
stands for any single character and * for any sequence of characters.
See the comment at the end of this email.

>>What is the behaviour of unescaped, but "illegal" meta-characters? e.g.
>>does pattern "ab]" match a-b-right-square-bracket?
>
>
> It is not a valid regexp. But X.680, Annex A, says nothing about this.
>
>
>>How do reference expressions behave? i.e. are they treated as a group,
>>surrounded by implicit parentheses? e.g. If foo := "ab" does pattern
>>"{foo}#(2)" match abb or abab?
>
>
> Your example is illegal. The curly brackets can only be used to denote a
> character by giving its {group,plane,row,cell} quadruple as defined in
> ISO/IEC 10646-1.
> I guess you meant to write "\N{Foo}#(2)" where foo is defined as
> follows:
> Foo IA5String ::= {a|b}
> In that case, yes, implicit parentheses surround the subexpression and
> the pattern is equivalent to "[ab]#2".

Curly brackets have a different meaning in TTCN-3. The string inside the
curly brackets is the name of a string variable/constant containing a
regular expression. A 10646-1 char in TTCN-3 regexp is denoted by
\q{group,plane,row,cell}
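
The two possible readings of "{foo}#(2)" can be spelled out with Python's re module as a stand-in. This is only a sketch: it assumes #(2) corresponds to the conventional {2} repeat, and the names textual, grouped and accepts are made up for illustration.

```python
import re

foo = "ab"  # the referenced pattern from the question

# Reading 1: plain textual substitution, so the repeat binds to "b" only.
textual = foo + "{2}"            # "ab{2}"
# Reading 2: the reference acts as an implicit group, so the repeat
# binds to the whole of "ab".
grouped = "(?:" + foo + "){2}"   # "(?:ab){2}"


def accepts(regex: str, text: str) -> bool:
    """Return True if the whole of text matches regex."""
    return re.fullmatch(regex, text) is not None
```

With these definitions, textual accepts "abb" but not "abab", and grouped accepts "abab" but not "abb". Which reading TTCN-3 intends is exactly the open question.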

>>What is the behaviour of ^ not at the start of the set? A normal
>>character or an error?
>
>
> The caret has a special meaning within square brackets only.
> X.680, A.2.2 says:
> If the first character of the list is the caret "^", then it matches any
> character which is not in the list. To include a literal CIRCUMFLEX
> ACCENT (94) "^", place it anywhere except in the first position or
> precede it with a backslash.
>
> If a caret appears in a regexp outside square brackets, it matches a
> CIRCUMFLEX ACCENT.

I agree that this makes sense, but there is no mention of it in the
TTCN-3 standard.

>>Are meta-characters allowed within a set if preceded by a \? (The
>>standard specifies only character literals and - and ^ inside a set.)
>
>
> Please give an example.

A poorly worded question from me. Sorry. An example:
Does "[a-g?w-z]" match any of the symbols abcdefg?wxyz or is it an
error? This is covered in X.680 A.2.2 (All metacharacter sequences,
except "]" and "\", lose their special meaning inside a list.) but is
not mentioned in the TTCN-3 standard.
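
For comparison, Python's re module treats ? and a non-leading ^ inside a set as ordinary characters, which is in line with the X.680 A.2.2 wording quoted in this thread. The sketch below only shows what one conventional engine does; it is not evidence of what a TTCN-3 tool does or should do.

```python
import re


def accepts(regex: str, text: str) -> bool:
    """Return True if the whole of text matches regex."""
    return re.fullmatch(regex, text) is not None


# "?" loses its special meaning inside the set and matches itself.
question_mark_in_set = accepts(r"[a-g?w-z]", "?")
letter_in_range = accepts(r"[a-g?w-z]", "x")
letter_outside_range = accepts(r"[a-g?w-z]", "m")

# "^" negates the set only in first position; elsewhere it is a literal.
caret_is_literal = accepts(r"[a^b]", "^")
caret_negates = accepts(r"[^ab]", "c") and not accepts(r"[^ab]", "a")
```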

>>Can sets be nested?
>
>
> Please give an example.

What does "[a-m[n-z]]" mean? Is it interpreted as "[a-z]" or an error or
something else? The X.680 A.2.2 definition effectively defines a meaning
for this string.

>>Can a reference expression be used inside a set?
>
>
> Please give an example. You are using a vocabulary with which I am not
> familiar.

Reference expressions are those identifiers within curly brackets,
mentioned above.

>>Regular expression notation in other tools almost universally uses ? as
>>an optional match (0 or 1), and * as repetition (0 or more). The
>>equivalent expressions for ttcn3-style ? and * in "normal" regular
>>expressions are . for ? and .* for *. Is it a good idea to depart from
>>standard notation in this area?
>
>
> I guess the difference is due to the fact that ASN.1 regexps have been
> included with no change in TTCN-3.

The ASN.1 (and pretty much everything else) "." is equivalent to the TTCN-3
(and unix-shell) "?".
The TTCN-3 (and unix-shell) "*" is equivalent to the ASN.1 (and pretty much
everything else) ".*".
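
The correspondence above can be sketched as a small translator from TTCN-3-style wildcards to a conventional regular expression, here in Python. It is a deliberately minimal illustration: the function names are made up, only ? and * are translated, every backslash-escaped character is treated as a literal (a simplification), and sets, reference expressions and #(n) are not handled. Whether the wildcards match line terminators is left as a switch, since that is one of the open questions.

```python
import re


def ttcn3_wildcards_to_regex(pattern: str) -> str:
    """Translate glob-style ? and * into . and .* (sketch only)."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # Assumption: an escaped character stands for itself.
            out.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "?":
            out.append(".")       # any single character
        elif ch == "*":
            out.append(".*")      # any sequence of characters
        else:
            out.append(re.escape(ch))
        i += 1
    return "".join(out)


def matches(pattern: str, text: str, wildcards_span_newlines: bool = False) -> bool:
    """Whole-string match of text against a TTCN-3-style wildcard pattern."""
    flags = re.DOTALL if wildcards_span_newlines else 0
    regex = ttcn3_wildcards_to_regex(pattern)
    return re.fullmatch(regex, text, flags) is not None
```

For example, matches("abc*xyz", "abcxyzxyz") is true, and matches("a?c", "a\nc") is false unless wildcards_span_newlines is set. A precise specification would have to pin that choice down one way or the other.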

Regular expressions are an important part of any testing system and
should be very well specified so all implementors can implement regular
expression engines that behave identically.

Cheers,
--
Carl Cerecke - Software Designer
Da Vinci Communications Ltd
Christchurch - New Zealand
TEL : +64 3 3838311
FAX : +64 3 3838310
www : www.davinci-communications.com
