From Division of Nearshore Research

ScottDuff: Perl6Apocalypse5

Notes on Apocalypse 5 [1]

	(...) capturing brackets
	[...] grouping brackets (non-capturing)
	{...} closure (return value ignored unless assigned)
	<...> assertion (does this thing match?)
	:...  introduces a meta-syntactic token (a modifier)

$0 is the match object. Assigning to $0 affects the return value of the RE.

Built-in regular expression modifiers

Modifiers now come at the beginning of the RE statement, so the leading m or s is required. For example, s:i/foo/bar/ replaces "foo", "Foo", "FOO", etc. with "bar". The modifier :i means "ignore case".
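For readers without a Perl 6 implementation at hand, the effect of s:i/foo/bar/ can be approximated in Python. This is only an analogue, not Perl 6 code: the function name is made up for illustration, and count=1 assumes that, without a modifier such as :each, only the first match is replaced.

```python
import re

def replace_ignoring_case(text: str) -> str:
    """Rough Python analogue of s:i/foo/bar/ (hypothetical helper name)."""
    # flags=re.IGNORECASE plays the role of :i.
    # count=1 is an assumption: only the first match is replaced.
    return re.sub(r"foo", "bar", text, count=1, flags=re.IGNORECASE)

if __name__ == "__main__":
    print(replace_ignoring_case("FOO and foo"))
```

Note the positional difference: in Python the flag is passed alongside the call, whereas the Perl 6 modifier sits in front of the pattern.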

Long Modifier   Short Modifier   Meaning
:cont           :c               Continue from where the previous match left off
:words          :w               Match a sequence of "words". Whitespace in the pattern (normally ignored) is replaced by \s+ between identifiers and \s* anywhere else
:ignorecase     :i               Match all alphabetic characters in a case-insensitive manner
:any            :a               Return a list of every place the pattern matches within the string, regardless of overlap
:each           :e               Apply the pattern iteratively ("each" time we can)
:once           :o               Match succeeds exactly once. To allow the RE to match again, call the .reset() method on the RE object
:perl5          :p5              Perl 5 matching. Causes the RE to be interpreted using the Perl 5 rules
?               :u0              Level 0 Unicode support. A . (dot) matches bytes
?               :u1              Level 1 Unicode support. A . (dot) matches code points
?               :u2              Level 2 Unicode support. A . (dot) matches a grapheme
?               :u3              Level 3 Unicode support. What . (dot) matches is language dependent
:nth(1)         :1st             Only match the first occurrence of the pattern
:nth(2)         :2nd             Only match the second occurrence of the pattern
:nth(3)         :3rd             Only match the third occurrence of the pattern
:nth(4)         :4th             Only match the fourth occurrence of the pattern
:nth(5)         :5th             Only match the fifth occurrence of the pattern
...             ...              ...
:x(1)           :1x              Match one time
:x(2)           :2x              Match two times
:x(3)           :3x              Match three times
...             ...              ...
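The :nth and :each modifiers have no direct flag equivalents in Python's re module, but their behaviour as described in the table can be sketched with re.finditer. The helper names below are hypothetical, and the sketch assumes matches are counted left to right without overlap.

```python
import re
from itertools import islice

def nth_match(pattern: str, text: str, n: int):
    """Loosely like :nth(n): return the n-th (1-based) match object, or None."""
    # islice skips the first n - 1 matches; next() falls back to None.
    return next(islice(re.finditer(pattern, text), n - 1, None), None)

def each_match(pattern: str, text: str) -> list:
    """Loosely like :each: collect the text of every successive match."""
    return [m.group() for m in re.finditer(pattern, text)]

if __name__ == "__main__":
    second = nth_match(r"\d+", "a1 b22 c333", 2)
    print(second.group() if second else None)
    print(each_match(r"\d+", "a1 b22 c333"))
```

The :any modifier is deliberately left out: it is described as ignoring overlap, which re.finditer does not do.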

Built-in assertions

Assertion          Meaning
<[...]>            matches ... as a character class
<'...'>            matches ... as a literal string
<alpha>            matches any alphabetic character
<digit>            matches any numeric character
<sp>               matches a space character
<ws>               matches any sequence of whitespace (same as \s+)
<dot>              matches a literal . character (same as <'.'>)
<lt>               matches a literal < character (same as <'<'>)
<gt>               matches a literal > character (same as <'>'>)
<prior>            matches whatever the most recent successful match did
<after pattern>    matches only after pattern (zero-width)
<before pattern>   matches only before pattern (zero-width)
<commit>           fails the entire match if backtracked to
<cut>              fails the entire match if backtracked to and removes the portion of the string that matched to that point
<fail>             causes the match to fail if reached
<null>             matches nothing
<ident>            matches an "identifier" (same as [ [<alpha>|_] \w* ])
<self>             matches the same pattern as the current rule (useful for recursion)
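Some of these assertions have close counterparts in Python's re module, which may help in reading the table. The sketch below is an analogue, not Perl 6: <before pattern> is approximated by the lookahead (?=...), <after pattern> by the lookbehind (?<=...), and <ident> by a pattern that assumes ASCII letters only. The function names are hypothetical.

```python
import re

# <ident>, given above as [ [<alpha>|_] \w* ]; ASCII letters are an assumption.
IDENT = re.compile(r"[A-Za-z_]\w*")

# Digits that come after "$" and before ".", using zero-width lookaround
# in place of <after ...> and <before ...>.
DOLLARS = re.compile(r"(?<=\$)\d+(?=\.)")

def is_identifier(text: str) -> bool:
    """True if the whole string is an identifier in the <ident> sense."""
    return IDENT.fullmatch(text) is not None

def whole_dollars(text: str):
    """Return the digits between "$" and ".", or None if there are none."""
    m = DOLLARS.search(text)
    return m.group() if m else None

if __name__ == "__main__":
    print(is_identifier("_tmp1"), is_identifier("1abc"))
    print(whole_dollars("cost: $42.50"))
```

Because the lookarounds are zero-width, the "$" and "." are checked but not included in the matched text, which is the same property the table notes for <after> and <before>.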

Retrieved from http://lighthouse.tamucc.edu/ScottDuff/Perl6Apocalypse5
Page last modified on September 29, 2005, at 09:28 PM