Beyond POSIX


This section is not finished documentation, but rather a collection of pointers towards some of the interesting, non-standard features of Rx.

New Regexp Operators

Rx supports some unusual regexp syntax.

[[:cut N:]] sets pmatch[0].final_tag to N and stops matching immediately. If N is 0, the overall match fails; otherwise, it succeeds.

[[:(:]] ... [[:):]] groups just like \( ... \), except that it changes no pmatch entries and is not counted in the numbering of parenthesized subexpressions.

[[:(:]] ... [[:):]] can be used when you do not need to know where a subexpression matched and are using parentheses only to control how the regexp is parsed (for example, to apply * to a multi-character sequence).

There are two reasons to use [[:(:]] ... [[:):]]:

1. regexec will run faster.

2. Currently, only nine back-referenceable subexpressions are supported: \1 through \9. Using [[:(:]] ... [[:):]] conserves these names in an expression with many parentheses.

New POSIX Functions

regncomp and regnexec are non-standard generalizations of regcomp and regexec.

Tuning POSIX performance

Two mysterious parameters can be used to trade off performance against memory use.

At compile time, they are RX_DEFAULT_DFA_CACHE_SIZE and RX_DEFAULT_NFA_DELAY.

If you want to mess with these (I generally don't advise it), I suggest experimenting for your particular application and memory situation: frob these by powers of two and try out the results on what you expect to be typical regexp workloads.

You can also set those parameters at run time (before calling any regexp functions) by tweaking the corresponding variables:

rx_default_cache->bytes_allowed

and

rx_basic_unfaniverse_delay

POSIX stream-style interface

rx_make_solutions, rx_next_solution, and rx_free_solutions are a lower-level alternative to the POSIX functions. Using those functions, you can compare a compiled regexp to a string that is not contiguous in memory, or even to a string that is never entirely in memory at any one time.

The code in rxposix.c shows how those functions are used.

DFAs Directly

If you are only interested in pure regular expressions (no pmatch data, no backreferences, and no counted subexpressions), you can parse a regexp using rx_parse, convert it to an NFA using rx_unfa, and run the DFA using rx_init_system, rx_advance_to_final, and rx_terminate_system. The DFA Scheme primitives in `rgx.c' may serve as a guide.