docs/re_format.7

   1 .\"     $OpenBSD: re_format.7,v 1.14 2007/05/31 19:19:30 jmc Exp $
   2 .\"
   3 .\" Copyright (c) 1997, Phillip F Knaack. All rights reserved.
   4 .\"
   5 .\" Copyright (c) 1992, 1993, 1994 Henry Spencer.
   6 .\" Copyright (c) 1992, 1993, 1994
   7 .\"     The Regents of the University of California.  All rights reserved.
   8 .\"
   9 .\" This code is derived from software contributed to Berkeley by
  10 .\" Henry Spencer.
  11 .\"
  12 .\" Redistribution and use in source and binary forms, with or without
  13 .\" modification, are permitted provided that the following conditions
  14 .\" are met:
  15 .\" 1. Redistributions of source code must retain the above copyright
  16 .\"    notice, this list of conditions and the following disclaimer.
  17 .\" 2. Redistributions in binary form must reproduce the above copyright
  18 .\"    notice, this list of conditions and the following disclaimer in the
  19 .\"    documentation and/or other materials provided with the distribution.
  20 .\" 3. Neither the name of the University nor the names of its contributors
  21 .\"    may be used to endorse or promote products derived from this software
  22 .\"    without specific prior written permission.
  23 .\"
  24 .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
  25 .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  26 .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  27 .\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
  28 .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
  29 .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
  30 .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
  31 .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
  32 .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
  33 .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  34 .\" SUCH DAMAGE.
  35 .\"
  36 .\"     @(#)re_format.7 8.3 (Berkeley) 3/20/94
  37 .\"
  38 .Dd $Mdocdate: May 31 2007 $
  39 .Dt RE_FORMAT 7
  40 .Os
  41 .Sh NAME
  42 .Nm re_format
  43 .Nd POSIX regular expressions
  44 .Sh DESCRIPTION
  45 Regular expressions (REs),
  46 as defined in
  47 .St -p1003.1-2004 ,
  48 come in two forms:
  49 basic regular expressions
  50 (BREs)
  51 and extended regular expressions
  52 (EREs).
  53 Both forms of regular expressions are supported
  54 by the interfaces described in
  55 .Xr regex 3 .
  56 Applications dealing with regular expressions
  57 may use one or the other form
  58 (or indeed both).
  59 For example,
  60 .Xr ed 1
  61 uses BREs,
  62 whilst
  63 .Xr egrep 1
  64 talks EREs.
  65 Consult the manual page for the specific application to find out which
  66 it uses.
  67 .Pp
  68 POSIX leaves some aspects of RE syntax and semantics open;
  69 .Sq **
  70 marks decisions on these aspects that
  71 may not be fully portable to other POSIX implementations.
  72 .Pp
  73 This manual page first describes regular expressions in general,
  74 specifically extended regular expressions,
  75 and then discusses differences between them and basic regular expressions.
  76 .Sh EXTENDED REGULAR EXPRESSIONS
  77 An ERE is one** or more non-empty**
  78 .Em branches ,
  79 separated by
  80 .Sq \*(Ba .
  81 It matches anything that matches one of the branches.
  82 .Pp
  83 A branch is one** or more
  84 .Em pieces ,
  85 concatenated.
  86 It matches a match for the first, followed by a match for the second, etc.
  87 .Pp
  88 A piece is an
  89 .Em atom
  90 possibly followed by a single**
  91 .Sq * ,
  92 .Sq + ,
  93 .Sq ?\& ,
  94 or
  95 .Em bound .
  96 An atom followed by
  97 .Sq *
  98 matches a sequence of 0 or more matches of the atom.
  99 An atom followed by
 100 .Sq +
 101 matches a sequence of 1 or more matches of the atom.
 102 An atom followed by
 103 .Sq ?\&
 104 matches a sequence of 0 or 1 matches of the atom.
 105 .Pp
 106 A bound is
 107 .Sq {
 108 followed by an unsigned decimal integer,
 109 possibly followed by
 110 .Sq ,\&
 111 possibly followed by another unsigned decimal integer,
 112 always followed by
 113 .Sq } .
 114 The integers must lie between 0 and
 115 .Dv RE_DUP_MAX
 116 (255**) inclusive,
 117 and if there are two of them, the first may not exceed the second.
 118 An atom followed by a bound containing one integer
 119 .Ar i
 120 and no comma matches
 121 a sequence of exactly
 122 .Ar i
 123 matches of the atom.
 124 An atom followed by a bound
 125 containing one integer
 126 .Ar i
 127 and a comma matches
 128 a sequence of
 129 .Ar i
 130 or more matches of the atom.
 131 An atom followed by a bound
 132 containing two integers
 133 .Ar i
 134 and
 135 .Ar j
 136 matches a sequence of
 137 .Ar i
 138 through
 139 .Ar j
 140 (inclusive) matches of the atom.
 141 .Pp
 142 An atom is a regular expression enclosed in
 143 .Sq ()
 144 (matching a part of the regular expression),
 145 an empty set of
 146 .Sq ()
 147 (matching the null string)**,
 148 a
 149 .Em bracket expression
 150 (see below),
 151 .Sq .\&
 152 (matching any single character),
 153 .Sq ^
 154 (matching the null string at the beginning of a line),
 155 .Sq $
 156 (matching the null string at the end of a line),
 157 a
 158 .Sq \e
 159 followed by one of the characters
 160 .Sq ^.[$()|*+?{\e
 161 (matching that character taken as an ordinary character),
 162 a
 163 .Sq \e
 164 followed by any other character**
 165 (matching that character taken as an ordinary character,
 166 as if the
 167 .Sq \e
 168 had not been present**),
 169 or a single character with no other significance (matching that character).
 170 A
 171 .Sq {
 172 followed by a character other than a digit is an ordinary character,
 173 not the beginning of a bound**.
 174 It is illegal to end an RE with
 175 .Sq \e .
 176 .Pp
 177 A bracket expression is a list of characters enclosed in
 178 .Sq [] .
 179 It normally matches any single character from the list (but see below).
 180 If the list begins with
 181 .Sq ^ ,
 182 it matches any single character
 183 .Em not
 184 from the rest of the list
 185 (but see below).
 186 If two characters in the list are separated by
 187 .Sq - ,
 188 this is shorthand for the full
 189 .Em range
 190 of characters between those two (inclusive) in the
 191 collating sequence, e.g.\&
 192 .Sq [0-9]
 193 in ASCII matches any decimal digit.
 194 It is illegal** for two ranges to share an endpoint, e.g.\&
 195 .Sq a-c-e .
 196 Ranges are very collating-sequence-dependent,
 197 and portable programs should avoid relying on them.
 198 .Pp
 199 To include a literal
 200 .Sq ]\&
 201 in the list, make it the first character
 202 (following a possible
 203 .Sq ^ ) .
 204 To include a literal
 205 .Sq - ,
 206 make it the first or last character,
 207 or the second endpoint of a range.
 208 To use a literal
 209 .Sq -
 210 as the first endpoint of a range,
 211 enclose it in
 212 .Sq [.
 213 and
 214 .Sq .]
 215 to make it a collating element (see below).
 216 With the exception of these and some combinations using
 217 .Sq [
 218 (see next paragraphs),
 219 all other special characters, including
 220 .Sq \e ,
 221 lose their special significance within a bracket expression.
 222 .Pp
 223 Within a bracket expression, a collating element
 224 (a character,
 225 a multi-character sequence that collates as if it were a single character,
 226 or a collating-sequence name for either)
 227 enclosed in
 228 .Sq [.
 229 and
 230 .Sq .]
 231 stands for the sequence of characters of that collating element.
 232 The sequence is a single element of the bracket expression's list.
 233 A bracket expression containing a multi-character collating element
 234 can thus match more than one character,
 235 e.g. if the collating sequence includes a
 236 .Sq ch
 237 collating element,
 238 then the RE
 239 .Sq [[.ch.]]*c
 240 matches the first five characters of
 241 .Sq chchcc .
 242 .Pp
 243 Within a bracket expression, a collating element enclosed in
 244 .Sq [=
 245 and
 246 .Sq =]
 247 is an equivalence class, standing for the sequences of characters
 248 of all collating elements equivalent to that one, including itself.
 249 (If there are no other equivalent collating elements,
 250 the treatment is as if the enclosing delimiters were
 251 .Sq [.
 252 and
 253 .Sq .] . )
 254 For example, if
 255 .Sq x
 256 and
 257 .Sq y
 258 are the members of an equivalence class,
 259 then
 260 .Sq [[=x=]] ,
 261 .Sq [[=y=]] ,
 262 and
 263 .Sq [xy]
 264 are all synonymous.
 265 An equivalence class may not** be an endpoint of a range.
 266 .Pp
 267 Within a bracket expression, the name of a
 268 .Em character class
 269 enclosed
 270 in
 271 .Sq [:
 272 and
 273 .Sq :]
 274 stands for the list of all characters belonging to that class.
 275 Standard character class names are:
 276 .Bd -literal -offset indent
 277 alnum   digit   punct
 278 alpha   graph   space
 279 blank   lower   upper
 280 cntrl   print   xdigit
 281 .Ed
 282 .Pp
 283 These stand for the character classes defined in
 284 .Xr ctype 3 .
 285 A locale may provide others.
 286 A character class may not be used as an endpoint of a range.
 287 .Pp
 288 There are two special cases** of bracket expressions:
 289 the bracket expressions
 290 .Sq [[:<:]]
 291 and
 292 .Sq [[:>:]]
 293 match the null string at the beginning and end of a word, respectively.
 294 A word is defined as a sequence of
 295 characters starting and ending with a word character
 296 which is neither preceded nor followed by
 297 word characters.
 298 A word character is an
 299 .Em alnum
 300 character (as defined by
 301 .Xr ctype 3 )
 302 or an underscore.
 303 This is an extension,
 304 compatible with but not specified by POSIX,
 305 and should be used with
 306 caution in software intended to be portable to other systems.
 307 .Pp
 308 In the event that an RE could match more than one substring of a given
 309 string,
 310 the RE matches the one starting earliest in the string.
 311 If the RE could match more than one substring starting at that point,
 312 it matches the longest.
 313 Subexpressions also match the longest possible substrings, subject to
 314 the constraint that the whole match be as long as possible,
 315 with subexpressions starting earlier in the RE taking priority over
 316 ones starting later.
 317 Note that higher-level subexpressions thus take priority over
 318 their lower-level component subexpressions.
 319 .Pp
 320 Match lengths are measured in characters, not collating elements.
 321 A null string is considered longer than no match at all.
 322 For example,
 323 .Sq bb*
 324 matches the three middle characters of
 325 .Sq abbbc ;
 326 .Sq (wee|week)(knights|nights)
 327 matches all ten characters of
 328 .Sq weeknights ;
 329 when
 330 .Sq (.*).*
 331 is matched against
 332 .Sq abc ,
 333 the parenthesized subexpression matches all three characters;
 334 and when
 335 .Sq (a*)*
 336 is matched against
 337 .Sq bc ,
 338 both the whole RE and the parenthesized subexpression match the null string.
 339 .Pp
 340 If case-independent matching is specified,
 341 the effect is much as if all case distinctions had vanished from the
 342 alphabet.
 343 When an alphabetic that exists in multiple cases appears as an
 344 ordinary character outside a bracket expression, it is effectively
 345 transformed into a bracket expression containing both cases,
 346 e.g.\&
 347 .Sq x
 348 becomes
 349 .Sq [xX] .
 350 When it appears inside a bracket expression,
 351 all case counterparts of it are added to the bracket expression,
 352 so that, for example,
 353 .Sq [x]
 354 becomes
 355 .Sq [xX]
 356 and
 357 .Sq [^x]
 358 becomes
 359 .Sq [^xX] .
 360 .Pp
 361 No particular limit is imposed on the length of REs**.
 362 Programs intended to be portable should not employ REs longer
 363 than 256 bytes,
 364 as an implementation can refuse to accept such REs and remain
 365 POSIX-compliant.
 366 .Pp
 367 The following is a list of extended regular expressions:
 368 .Bl -tag -width Ds
 369 .It Ar c
 370 Any character
 371 .Ar c
 372 not listed below matches itself.
 373 .It \e Ns Ar c
 374 Any backslash-escaped character
 375 .Ar c
 376 matches itself.
 377 .It \&.
 378 Matches any single character that is not a newline
 379 .Pq Sq \en .
 380 .It Bq Ar char-class
 381 Matches any single character in
 382 .Ar char-class .
 383 To include a
 384 .Ql \&]
 385 in
 386 .Ar char-class ,
 387 it must be the first character.
 388 A range of characters may be specified by separating the end characters
 389 of the range with a
 390 .Ql - ;
 391 e.g.\&
 392 .Ar a-z
 393 specifies the lower case characters.
 394 The following literal expressions can also be used in
 395 .Ar char-class
 396 to specify sets of characters:
 397 .Bd -unfilled -offset indent
 398 [:alnum:] [:cntrl:] [:lower:] [:space:]
 399 [:alpha:] [:digit:] [:print:] [:upper:]
 400 [:blank:] [:graph:] [:punct:] [:xdigit:]
 401 .Ed
 402 .Pp
 403 If
 404 .Ql -
 405 appears as the first or last character of
 406 .Ar char-class ,
 407 then it matches itself.
 408 All other characters in
 409 .Ar char-class
 410 match themselves.
 411 .Pp
 412 Patterns in
 413 .Ar char-class
 414 of the form
 415 .Eo [.
 416 .Ar col-elm
 417 .Ec .]\&
 418 or
 419 .Eo [=
 420 .Ar col-elm
 421 .Ec =]\& ,
 422 where
 423 .Ar col-elm
 424 is a collating element, are interpreted according to
 425 .Xr setlocale 3
 426 .Pq not currently supported .
 427 .It Bq ^ Ns Ar char-class
 428 Matches any single character, other than newline, not in
 429 .Ar char-class .
 430 .Ar char-class
 431 is defined as above.
 432 .It ^
 433 If
 434 .Sq ^
 435 is the first character of a regular expression, then it
 436 anchors the regular expression to the beginning of a line.
 437 Otherwise, it matches itself.
 438 .It $
 439 If
 440 .Sq $
 441 is the last character of a regular expression,
 442 it anchors the regular expression to the end of a line.
 443 Otherwise, it matches itself.
 444 .It [[:<:]]
 445 Anchors the single character regular expression or subexpression
 446 immediately following it to the beginning of a word.
 447 .It [[:>:]]
 448 Anchors the single character regular expression or subexpression
 449 immediately following it to the end of a word.
 450 .It Pq Ar re
 451 Defines a subexpression
 452 .Ar re .
 453 Any set of characters enclosed in parentheses
 454 matches whatever the set of characters without parentheses matches
 455 (that is a long-winded way of saying the constructs
 456 .Sq (re)
 457 and
 458 .Sq re
 459 match identically).
 460 .It *
 461 Matches the single character regular expression or subexpression
 462 immediately preceding it zero or more times.
 463 If
 464 .Sq *
 465 is the first character of a regular expression or subexpression,
 466 then it matches itself.
 467 The
 468 .Sq *
 469 operator sometimes yields unexpected results.
 470 For example, the regular expression
 471 .Ar b*
 472 matches the beginning of the string
 473 .Qq abbb
 474 (as opposed to the substring
 475 .Qq bbb ) ,
 476 since a null match is the only leftmost match.
 477 .It +
 478 Matches the singular character regular expression
 479 or subexpression immediately preceding it
 480 one or more times.
 481 .It ?
 482 Matches the singular character regular expression
 483 or subexpression immediately preceding it
 484 0 or 1 times.
 485 .Sm off
 486 .It Xo
 487 .Pf { Ar n , m No }\ \&
 488 .Pf { Ar n , No }\ \&
 489 .Pf { Ar n No }
 490 .Xc
 491 .Sm on
 492 Matches the single character regular expression or subexpression
 493 immediately preceding it at least
 494 .Ar n
 495 and at most
 496 .Ar m
 497 times.
 498 If
 499 .Ar m
 500 is omitted, then it matches at least
 501 .Ar n
 502 times.
 503 If the comma is also omitted, then it matches exactly
 504 .Ar n
 505 times.
 506 .It \*(Ba
 507 Used to separate patterns.
 508 For example,
 509 the pattern
 510 .Sq cat\*(Badog
 511 matches either
 512 .Sq cat
 513 or
 514 .Sq dog .
 515 .El
 516 .Sh BASIC REGULAR EXPRESSIONS
 517 Basic regular expressions differ in several respects:
 518 .Bl -bullet -offset 3n
 519 .It
 520 .Sq \*(Ba ,
 521 .Sq + ,
 522 and
 523 .Sq ?\&
 524 are ordinary characters and there is no equivalent
 525 for their functionality.
 526 .It
 527 The delimiters for bounds are
 528 .Sq \e{
 529 and
 530 .Sq \e} ,
 531 with
 532 .Sq {
 533 and
 534 .Sq }
 535 by themselves ordinary characters.
 536 .It
 537 The parentheses for nested subexpressions are
 538 .Sq \e(
 539 and
 540 .Sq \e) ,
 541 with
 542 .Sq (
 543 and
 544 .Sq )\&
 545 by themselves ordinary characters.
 546 .It
 547 .Sq ^
 548 is an ordinary character except at the beginning of the
 549 RE or** the beginning of a parenthesized subexpression.
 550 .It
 551 .Sq $
 552 is an ordinary character except at the end of the
 553 RE or** the end of a parenthesized subexpression.
 554 .It
 555 .Sq *
 556 is an ordinary character if it appears at the beginning of the
 557 RE or the beginning of a parenthesized subexpression
 558 (after a possible leading
 559 .Sq ^ ) .
 560 .It
 561 Finally, there is one new type of atom, a
 562 .Em back-reference :
 563 .Sq \e
 564 followed by a non-zero decimal digit
 565 .Ar d
 566 matches the same sequence of characters matched by the
 567 .Ar d Ns th
 568 parenthesized subexpression
 569 (numbering subexpressions by the positions of their opening parentheses,
 570 left to right),
 571 so that, for example,
 572 .Sq \e([bc]\e)\e1
 573 matches
 574 .Sq bb\&
 575 or
 576 .Sq cc
 577 but not
 578 .Sq bc .
 579 .El
 580 .Pp
 581 The following is a list of basic regular expressions:
 582 .Bl -tag -width Ds
 583 .It Ar c
 584 Any character
 585 .Ar c
 586 not listed below matches itself.
 587 .It \e Ns Ar c
 588 Any backslash-escaped character
 589 .Ar c ,
 590 except for
 591 .Sq { ,
 592 .Sq } ,
 593 .Sq \&( ,
 594 and
 595 .Sq \&) ,
 596 matches itself.
 597 .It \&.
 598 Matches any single character that is not a newline
 599 .Pq Sq \en .
 600 .It Bq Ar char-class
 601 Matches any single character in
 602 .Ar char-class .
 603 To include a
 604 .Ql \&]
 605 in
 606 .Ar char-class ,
 607 it must be the first character.
 608 A range of characters may be specified by separating the end characters
 609 of the range with a
 610 .Ql - ;
 611 e.g.\&
 612 .Ar a-z
 613 specifies the lower case characters.
 614 The following literal expressions can also be used in
 615 .Ar char-class
 616 to specify sets of characters:
 617 .Bd -unfilled -offset indent
 618 [:alnum:] [:cntrl:] [:lower:] [:space:]
 619 [:alpha:] [:digit:] [:print:] [:upper:]
 620 [:blank:] [:graph:] [:punct:] [:xdigit:]
 621 .Ed
 622 .Pp
 623 If
 624 .Ql -
 625 appears as the first or last character of
 626 .Ar char-class ,
 627 then it matches itself.
 628 All other characters in
 629 .Ar char-class
 630 match themselves.
 631 .Pp
 632 Patterns in
 633 .Ar char-class
 634 of the form
 635 .Eo [.
 636 .Ar col-elm
 637 .Ec .]\&
 638 or
 639 .Eo [=
 640 .Ar col-elm
 641 .Ec =]\& ,
 642 where
 643 .Ar col-elm
 644 is a collating element, are interpreted according to
 645 .Xr setlocale 3
 646 .Pq not currently supported .
 647 .It Bq ^ Ns Ar char-class
 648 Matches any single character, other than newline, not in
 649 .Ar char-class .
 650 .Ar char-class
 651 is defined as above.
 652 .It ^
 653 If
 654 .Sq ^
 655 is the first character of a regular expression, then it
 656 anchors the regular expression to the beginning of a line.
 657 Otherwise, it matches itself.
 658 .It $
 659 If
 660 .Sq $
 661 is the last character of a regular expression,
 662 it anchors the regular expression to the end of a line.
 663 Otherwise, it matches itself.
 664 .It [[:<:]]
 665 Anchors the single character regular expression or subexpression
 666 immediately following it to the beginning of a word.
 667 .It [[:>:]]
 668 Anchors the single character regular expression or subexpression
 669 immediately following it to the end of a word.
 670 .It \e( Ns Ar re Ns \e)
 671 Defines a subexpression
 672 .Ar re .
 673 Subexpressions may be nested.
 674 A subsequent backreference of the form
 675 .Pf \e Ns Ar n ,
 676 where
 677 .Ar n
 678 is a number in the range [1,9], expands to the text matched by the
 679 .Ar n Ns th
 680 subexpression.
 681 For example, the regular expression
 682 .Ar \e(.*\e)\e1
 683 matches any string consisting of identical adjacent substrings.
 684 Subexpressions are ordered relative to their left delimiter.
 685 .It *
 686 Matches the single character regular expression or subexpression
 687 immediately preceding it zero or more times.
 688 If
 689 .Sq *
 690 is the first character of a regular expression or subexpression,
 691 then it matches itself.
 692 The
 693 .Sq *
 694 operator sometimes yields unexpected results.
 695 For example, the regular expression
 696 .Ar b*
 697 matches the beginning of the string
 698 .Qq abbb
 699 (as opposed to the substring
 700 .Qq bbb ) ,
 701 since a null match is the only leftmost match.
 702 .Sm off
 703 .It Xo
 704 .Pf \e{ Ar n , m No \e}\ \&
 705 .Pf \e{ Ar n , No \e}\ \&
 706 .Pf \e{ Ar n No \e}
 707 .Xc
 708 .Sm on
 709 Matches the single character regular expression or subexpression
 710 immediately preceding it at least
 711 .Ar n
 712 and at most
 713 .Ar m
 714 times.
 715 If
 716 .Ar m
 717 is omitted, then it matches at least
 718 .Ar n
 719 times.
 720 If the comma is also omitted, then it matches exactly
 721 .Ar n
 722 times.
 723 .El
 724 .Sh SEE ALSO
 725 .Xr ctype 3 ,
 726 .Xr regex 3
 727 .Sh STANDARDS
 728 .St -p1003.1-2004 :
 729 Base Definitions, Chapter 9 (Regular Expressions).
 730 .Sh BUGS
 731 Having two kinds of REs is a botch.
 732 .Pp
 733 The current POSIX spec says that
 734 .Sq )\&
 735 is an ordinary character in the absence of an unmatched
 736 .Sq ( ;
 737 this was an unintentional result of a wording error,
 738 and change is likely.
 739 Avoid relying on it.
 740 .Pp
 741 Back-references are a dreadful botch,
 742 posing major problems for efficient implementations.
 743 They are also somewhat vaguely defined
 744 (does
 745 .Sq a\e(\e(b\e)*\e2\e)*d
 746 match
 747 .Sq abbbd ? ) .
 748 Avoid using them.
 749 .Pp
 750 POSIX's specification of case-independent matching is vague.
 751 The
 752 .Dq one case implies all cases
 753 definition given above
 754 is the current consensus among implementors as to the right interpretation.
 755 .Pp
 756 The syntax for word boundaries is incredibly ugly.