Module Pcre

Perl Compatible Regular Expressions for OCaml

Version 8.0.1

Exceptions

type error =
  | Partial
    (* String only matched the pattern partially. *)
  | BadPartial
    (* Pattern contains items that cannot be used together with partial matching. *)
  | BadPattern of string * int
    (* BadPattern (msg, pos): the regular expression is malformed. The reason is in msg, the position of the error in the pattern in pos. *)
  | BadUTF8
    (* The UTF8 string being matched is invalid. *)
  | BadUTF8Offset
    (* Raised when a UTF8 string being matched with an offset is invalid. *)
  | MatchLimit
    (* Maximum allowed number of match attempts with backtracking or recursion was reached during matching. Any function that calls the matching engine may raise it. *)
  | RecursionLimit
  | WorkspaceSize
    (* Raised by pcre_dfa_exec when the provided workspace array is too small. See the documentation on pcre_dfa_exec for details on workspace array sizing. *)
  | InternalError of string
    (* InternalError msg: the C library exhibits unknown/undefined behaviour. The reason is in msg. *)
exception Error of error

Exception indicating PCRE errors.

exception Backtrack

Backtrack used in callout functions to force backtracking.

exception Regexp_or of string * error

Regexp_or (pat, error) gets raised for sub-pattern pat by regexp_or if it failed to compile.
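
As a minimal sketch of how these exceptions surface in practice, the snippet below wraps compilation in a helper that turns Error (BadPattern (msg, pos)) into a result value. The helper name compile is hypothetical and not part of Pcre; it assumes the pcre library is linked into the program.

```ocaml
(* Hypothetical helper: Ok rex on success, Error (msg, pos) if the
   pattern is malformed. Other Pcre errors are left to propagate. *)
let compile pat =
  try Ok (Pcre.regexp pat)
  with Pcre.Error (Pcre.BadPattern (msg, pos)) -> Error (msg, pos)

let () =
  match compile "(unclosed" with
  | Ok _ -> print_endline "compiled"
  | Error (msg, pos) -> Printf.printf "bad pattern at offset %d: %s\n" pos msg
```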

Compilation and runtime flags and their conversion functions

type icflag

Internal representation of compilation flags

and irflag

Internal representation of runtime flags

and cflag = [
  | `CASELESS
    (* Case insensitive matching *)
  | `MULTILINE
    (* '^' and '$' match before/after newlines, not just at the beginning/end of a string *)
  | `DOTALL
    (* '.' matches all characters (newlines, too) *)
  | `EXTENDED
    (* Ignores whitespace and PERL-comments. Behaves like the '/x'-option in PERL *)
  | `ANCHORED
    (* Pattern matches only at start of string *)
  | `DOLLAR_ENDONLY
    (* '$' in pattern matches only at end of string *)
  | `EXTRA
    (* Reserved for future extensions of PCRE *)
  | `UNGREEDY
    (* Quantifiers are not greedy by default; they become greedy only if followed by '?' *)
  | `UTF8
    (* Treats patterns and strings as UTF8 strings *)
  | `NO_UTF8_CHECK
    (* Turns off validity checks on UTF8 strings for efficiency reasons. WARNING: invalid UTF8 strings may then cause a crash! *)
  | `NO_AUTO_CAPTURE
    (* Disables the use of numbered capturing parentheses *)
  | `AUTO_CALLOUT
    (* Automatically inserts callouts with id 255 before each pattern item *)
  | `FIRSTLINE
    (* Unanchored patterns must match before/at first NL *)
]

Compilation flags

val cflags : cflag list -> icflag

cflags cflag_list converts a list of compilation flags to their internal representation.

val cflag_list : icflag -> cflag list

cflag_list cflags converts internal representation of compilation flags to a list.
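
The following sketch illustrates the two ways of passing compilation flags: as a cflag list, or converted once with cflags and reused as an icflag. All value names are illustrative only.

```ocaml
(* Flags given as a list at compile time. *)
let rex = Pcre.regexp ~flags:[`CASELESS; `MULTILINE] "^hello"
let second_line_matches = Pcre.pmatch ~rex "first\nHELLO world"

(* Flags converted once to the internal representation, then reused. *)
let icf = Pcre.cflags [`CASELESS]
let rex_i = Pcre.regexp ~iflags:icf "hello"

(* Converting back yields a list that contains the original flag. *)
let flags_back = Pcre.cflag_list icf
```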

type rflag = [
  | `ANCHORED
    (* Treats pattern as if it were anchored *)
  | `NOTBOL
    (* Beginning of string is not treated as beginning of line *)
  | `NOTEOL
    (* End of string is not treated as end of line *)
  | `NOTEMPTY
    (* Empty strings are not considered to be a valid match *)
  | `PARTIAL
    (* Turns on partial matching *)
  | `DFA_RESTART
    (* Causes matching to proceed presuming the subject string is further to one partially matched previously using the same int-array working set. May only be used with pcre_dfa_exec or unsafe_pcre_dfa_exec, and should always be paired with `PARTIAL. *)
]

Runtime flags

val rflags : rflag list -> irflag

rflags rflag_list converts a list of runtime flags to their internal representation.

val rflag_list : irflag -> rflag list

rflag_list rflags converts internal representation of runtime flags to a list.
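
Runtime flags are supplied per matching call rather than at compile time. A small sketch, with illustrative value names, showing the effect of `NOTBOL and `NOTEMPTY:

```ocaml
let rex = Pcre.regexp "^abc"

(* Without runtime flags, '^' matches at the start of the subject. *)
let at_start = Pcre.pmatch ~rex "abc"

(* With `NOTBOL, the start of the subject no longer counts as a line start. *)
let with_notbol = Pcre.pmatch ~rex ~flags:[`NOTBOL] "abc"

(* The same flag, pre-converted to its internal representation. *)
let irf = Pcre.rflags [`NOTBOL]
let with_iflags = Pcre.pmatch ~rex ~iflags:irf "abc"

(* "a*" can only match the empty string in "bbb"; `NOTEMPTY rejects that. *)
let empty_ok = Pcre.pmatch ~pat:"a*" "bbb"
let empty_rejected = Pcre.pmatch ~pat:"a*" ~flags:[`NOTEMPTY] "bbb"
```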

Information on the PCRE-configuration (build-time options)

val version : string

Version information

Version of the PCRE-C-library

val config_utf8 : bool

Indicates whether UTF8-support is enabled

val config_newline : char

Character used as newline

Number of bytes used for internal linkage of regular expressions

val config_match_limit : int

Default limit for calls to internal matching function

val config_match_limit_recursion : int

Default recursion limit for calls to the internal matching function

val config_stackrecurse : bool

Indicates use of stack recursion in matching function
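
One possible use of these build-time values is to enable UTF8 handling only when the linked PCRE library supports it. This is a sketch; utf8_flags is a hypothetical name.

```ocaml
(* Request `UTF8 only if the underlying C library was built with it. *)
let utf8_flags : Pcre.cflag list =
  if Pcre.config_utf8 then [`UTF8] else []

let rex = Pcre.regexp ~flags:utf8_flags "caf."

let () =
  Printf.printf "PCRE %s (UTF8 support: %b, default match limit: %d)\n"
    Pcre.version Pcre.config_utf8 Pcre.config_match_limit
```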

Information on patterns

type firstbyte_info = [
  | `Char of char
    (* Fixed first character *)
  | `Start_only
    (* Pattern matches only at the start of the subject or directly after a newline *)
  | `ANCHORED
    (* Pattern is anchored *)
]

Information on matching of "first chars" in patterns

type study_stat = [
  | `Not_studied
    (* Pattern has not yet been studied *)
  | `Studied
    (* Pattern has been studied successfully *)
  | `Optimal
    (* Pattern could not be improved by studying *)
]

Information on the study status of patterns

type regexp

Compiled regular expressions

val options : regexp -> icflag

options regexp

  • returns

    compilation flags of regexp.

val size : regexp -> int

size regexp

  • returns

    memory size of regexp.

val studysize : regexp -> int

studysize regexp

  • returns

    memory size of study information of regexp.

val capturecount : regexp -> int

capturecount regexp

  • returns

    number of capturing subpatterns in regexp.

val backrefmax : regexp -> int

backrefmax regexp

  • returns

    number of highest backreference in regexp.

val namecount : regexp -> int

namecount regexp

  • returns

    number of named subpatterns in regexp.

val nameentrysize : regexp -> int

nameentrysize regexp

  • returns

    size of longest name of named subpatterns in regexp + 3.

val names : regexp -> string array

names regexp

  • returns

    array of names of named substrings in regexp.

val firstbyte : regexp -> firstbyte_info

firstbyte regexp

  • returns

    firstbyte info on regexp.

val firsttable : regexp -> string option

firsttable regexp

  • returns

    some 256-bit (32-byte) fixed set table in form of a string for regexp if available, None otherwise.

val lastliteral : regexp -> char option

lastliteral regexp

  • returns

    some last matching character of regexp if available, None otherwise.

val study_stat : regexp -> study_stat

study_stat regexp

  • returns

    study status of regexp.

val get_stringnumber : regexp -> string -> int

get_stringnumber rex name

  • returns

    the index of the named substring name in regular expression rex. This index can then be used with get_substring.

  • raises Invalid_argument

    if there is no such named substring.

val get_match_limit : regexp -> int option

get_match_limit rex

  • returns

    some match limit of regular expression rex or None.

val get_match_limit_recursion : regexp -> int option

get_match_limit_recursion rex

  • returns

    some recursion match limit of regular expression rex or None.
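
A short sketch of querying a compiled pattern, assuming a pattern with two named groups written in PCRE's (?P<name>...) syntax. The value names are illustrative.

```ocaml
let rex = Pcre.regexp "(?P<year>\\d{4})-(?P<month>\\d{2})"

let groups = Pcre.capturecount rex       (* capturing subpatterns *)
let named = Pcre.namecount rex           (* named subpatterns *)
let group_names = Pcre.names rex         (* names as a string array *)
let year_idx = Pcre.get_stringnumber rex "year"

(* Looking up a name that does not exist raises Invalid_argument. *)
let missing =
  try Some (Pcre.get_stringnumber rex "day")
  with Invalid_argument _ -> None
```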

Compilation of patterns

type chtables

Alternative set of char tables for pattern matching

val maketables : unit -> chtables

Generates new set of char tables for the current locale.

val regexp : ?study:bool -> ?jit_compile:bool -> ?limit:int -> ?limit_recursion:int -> ?iflags:icflag -> ?flags:cflag list -> ?chtables:chtables -> string -> regexp

regexp ?study ?jit_compile ?limit ?limit_recursion ?iflags ?flags ?chtables pattern compiles pattern with flags when given, with iflags otherwise, and with char tables chtables. If study is true, then the resulting regular expression will be studied. If jit_compile is true, studying will also perform JIT-compilation of the pattern. If limit is specified, this sets a limit to the amount of recursion and backtracking (it only takes effect when lower than the builtin default). If this limit is exceeded, Error MatchLimit will be raised during matching.

  • parameter study

    default = true

  • parameter jit_compile

    default = false

  • parameter limit

    default = no extra limit other than default

  • parameter limit_recursion

    default = no extra limit_recursion other than default

  • parameter iflags

    default = no extra flags

  • parameter flags

    default = ignored

  • parameter chtables

    default = builtin char tables

  • returns

    the regular expression.

    For detailed documentation on how you can specify PERL-style regular expressions (= patterns), please consult the PCRE-documentation ("man pcrepattern") or PERL-manuals.
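
To illustrate the limit parameter, the sketch below compiles a pattern known for heavy backtracking with a deliberately small limit and checks whether matching is aborted with Error MatchLimit. The helper hits_limit and the chosen limit of 100 are assumptions made for the example.

```ocaml
(* Hypothetical helper: true if matching was aborted by the match limit. *)
let hits_limit rex subj =
  try
    ignore (Pcre.pmatch ~rex subj);
    false
  with Pcre.Error Pcre.MatchLimit -> true

(* Nested quantifiers backtrack heavily when the overall match fails. *)
let risky = Pcre.regexp ~limit:100 "^(a+)+$"

(* Thirty 'a's followed by a 'b' that prevents '$' from matching. *)
let subject = String.make 30 'a' ^ "b"
```

With these definitions, hits_limit risky subject is expected to be true, while a short subject such as "a" matches well within the limit.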

val regexp_or : ?study:bool -> ?jit_compile:bool -> ?limit:int -> ?limit_recursion:int -> ?iflags:icflag -> ?flags:cflag list -> ?chtables:chtables -> string list -> regexp

regexp_or ?study ?jit_compile ?limit ?limit_recursion ?iflags ?flags ?chtables patterns like regexp, but combines patterns as alternatives (or-patterns) into one regular expression.

val quote : string -> string

quote str

  • returns

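    the quoted string of str, that is, str with regular expression metacharacters escaped so that it can be used as a pattern matching str literally.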
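
A brief sketch of quote and regexp_or in use; the value names are illustrative.

```ocaml
(* Quoting makes the '.' literal instead of "any character". *)
let literal = Pcre.regexp (Pcre.quote "a.b")
let literal_hit = Pcre.pmatch ~rex:literal "a.b"
let literal_miss = Pcre.pmatch ~rex:literal "axb"

(* regexp_or builds one regular expression from several alternatives. *)
let pets = Pcre.regexp_or ["cat"; "dog"]
let pet_hit = Pcre.pmatch ~rex:pets "hotdog"
let pet_miss = Pcre.pmatch ~rex:pets "bird"
```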

Subpattern extraction

type substrings

Information on substrings after pattern matching

val get_subject : substrings -> string

get_subject substrings

  • returns

    the subject string of substrings.

val num_of_subs : substrings -> int

num_of_subs substrings

  • returns

    number of strings in substrings (whole match inclusive).

val get_substring : substrings -> int -> string

get_substring substrings n

  • returns

    the nth substring (0 is whole match) of substrings.

  • raises Invalid_argument

    if n is not in the range of the number of substrings.

  • raises Not_found

    if the corresponding subpattern did not capture a substring.

val get_substring_ofs : substrings -> int -> int * int

get_substring_ofs substrings n

  • returns

    the offset tuple of the nth substring of substrings (0 is whole match).

  • raises Invalid_argument

    if n is not in the range of the number of substrings.

  • raises Not_found

    if the corresponding subpattern did not capture a substring.

val get_substrings : ?full_match:bool -> substrings -> string array

get_substrings ?full_match substrings

  • returns

    the array of substrings in substrings. It includes the full match at index 0 when full_match is true, the captured substrings only when it is false. If a subpattern did not capture a substring, the empty string is returned in the corresponding position instead.

  • parameter full_match

    default = true

val get_opt_substrings : ?full_match:bool -> substrings -> string option array

get_opt_substrings ?full_match substrings

  • returns

    the array of optional substrings in substrings. It includes Some full_match_str at index 0 when full_match is true, Some captured_substrings only when it is false. If a subpattern did not capture a substring, None is returned in the corresponding position instead.

  • parameter full_match

    default = true

val get_named_substring : regexp -> string -> substrings -> string

get_named_substring rex name substrings

  • returns

    the named substring name in regular expression rex and substrings.

  • raises Invalid_argument

    if there is no such named substring.

  • raises Not_found

    if the corresponding subpattern did not capture a substring.

val get_named_substring_ofs : regexp -> string -> substrings -> int * int

get_named_substring_ofs rex name substrings

  • returns

    the offset tuple of the named substring name in regular expression rex and substrings.

  • raises Invalid_argument

    if there is no such named substring.

  • raises Not_found

    if the corresponding subpattern did not capture a substring.
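
The extraction functions above can be sketched as follows, using exec to obtain a substrings value. The pattern and subject are made up for the example.

```ocaml
let rex = Pcre.regexp "(?P<year>\\d{4})-(?P<month>\\d{2})"
let subs = Pcre.exec ~rex "on 2024-05 ok"

let whole = Pcre.get_substring subs 0                  (* whole match *)
let year = Pcre.get_substring subs 1                   (* first group *)
let month = Pcre.get_named_substring rex "month" subs  (* by name *)
let span = Pcre.get_substring_ofs subs 0               (* (start, end) *)
let count = Pcre.num_of_subs subs                      (* groups + whole match *)

(* A group that took no part in the match shows up as None. *)
let opt = Pcre.get_opt_substrings (Pcre.exec ~pat:"(a)|(b)" "b")
```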

Callouts

type callout_data = {
  callout_number : int;
    (* Callout number *)
  substrings : substrings;
    (* Substrings matched so far *)
  start_match : int;
    (* Subject start offset of current match attempt *)
  current_position : int;
    (* Subject offset of current match pointer *)
  capture_top : int;
    (* Number of the highest captured substring so far *)
  capture_last : int;
    (* Number of the most recently captured substring *)
  pattern_position : int;
    (* Offset of next match item in pattern string *)
  next_item_length : int;
    (* Length of next match item in pattern string *)
}

type callout = callout_data -> unit

Type of callout functions

Callouts are referred to in patterns as "(?Cn)" where "n" is a callout_number ranging from 0 to 255. Substrings captured so far are accessible as usual via substrings. You will have to consider capture_top and capture_last to know about the current state of valid substrings.

By raising exception Backtrack within a callout function, the user can force the pattern matching engine to backtrack to other possible solutions. Other exceptions will terminate matching immediately and return control to OCaml.
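
A minimal sketch of both behaviours described above: one callout merely counts how often it is invoked, the other raises Backtrack every time so that no match can succeed. The names calls, count and reject are hypothetical.

```ocaml
let calls = ref 0

(* Counts invocations of callout number 1 and lets matching continue. *)
let count (d : Pcre.callout_data) =
  if d.Pcre.callout_number = 1 then incr calls

let matched = Pcre.pmatch ~pat:"a(?C1)b" ~callout:count "ab"

(* Always forces the engine to backtrack at the callout point. *)
let reject (_ : Pcre.callout_data) = raise Pcre.Backtrack

let rejected = not (Pcre.pmatch ~pat:"a(?C1)b" ~callout:reject "ab")
```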

Matching of patterns and subpattern extraction

val pcre_exec : ?iflags:irflag -> ?flags:rflag list -> ?rex:regexp -> ?pat:string -> ?pos:int -> ?callout:callout -> string -> int array

pcre_exec ?iflags ?flags ?rex ?pat ?pos ?callout subj

  • returns

    an array of offsets that describe the position of matched subpatterns in the string subj starting at position pos with pattern pat when given, regular expression rex otherwise. The array also contains additional workspace needed by the match engine. Uses flags when given, the precompiled iflags otherwise. Callouts are handled by callout.

  • parameter iflags

    default = no extra flags

  • parameter flags

    default = ignored

  • parameter rex

    default = matches whitespace

  • parameter pat

    default = ignored

  • parameter pos

    default = 0

  • parameter callout

    default = ignore callouts

  • raises Not_found

    if pattern does not match.

val pcre_dfa_exec : ?iflags:irflag -> ?flags:rflag list -> ?rex:regexp -> ?pat:string -> ?pos:int -> ?callout:callout -> ?workspace:int array -> string -> int array

pcre_dfa_exec ?iflags ?flags ?rex ?pat ?pos ?callout ?workspace subj invokes the "alternative" DFA matching function.

  • returns

    an array of offsets that describe the position of matched subpatterns in the string subj starting at position pos with pattern pat when given, regular expression rex otherwise. The array also contains additional workspace needed by the match engine. Uses flags when given, the precompiled iflags otherwise. Requires a sufficiently-large workspace array. Callouts are handled by callout.

    Note that the returned array of offsets is quite different from the one returned by pcre_exec et al. The motivating use case for the DFA match function is to be able to restart a partial match with N additional input segments. Because the match function/workspace does not store segments seen previously, the offsets returned when a match completes will refer only to the matching portion of the last subject string provided. Thus, returned offsets from this function should not be used to support extracting captured submatches. If you need to capture submatches from a series of inputs incrementally matched with this function, you'll need to concatenate those inputs that yield a successful match here and re-run the same pattern against that single subject string.

    Aside from an absolute minimum of 20, PCRE does not provide any guidance regarding the size of workspace array needed by any given pattern. Therefore, it is wise to appropriately handle the possible WorkspaceSize error. If raised, you can allocate a new, larger workspace array and begin the DFA matching process again.

  • parameter iflags

    default = no extra flags

  • parameter flags

    default = ignored

  • parameter rex

    default = matches whitespace

  • parameter pat

    default = ignored

  • parameter pos

    default = 0

  • parameter callout

    default = ignore callouts

  • parameter workspace

    default = fresh array of length 20

  • raises Not_found

    if the pattern match has failed

  • raises Error

    Partial if the pattern has matched partially; a subsequent exec call with the same pattern and workspace (adding the DFA_RESTART flag) can be made to either further advance or complete the partial match.

  • raises Error

    WorkspaceSize if the workspace array is too small to accommodate the DFA state required by the supplied pattern
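
The advice on handling WorkspaceSize could be sketched as a retry loop that doubles the workspace until the match attempt goes through. The helper dfa_match and its max_size cap are assumptions for illustration, not part of Pcre.

```ocaml
(* Hypothetical helper: Some offsets on a match, None if there is none.
   Starts with the minimum workspace of 20 and doubles it whenever
   Error WorkspaceSize is raised, up to max_size elements. *)
let dfa_match ?(max_size = 1 lsl 16) rex subj =
  let rec attempt size =
    match Pcre.pcre_dfa_exec ~rex ~workspace:(Array.make size 0) subj with
    | offsets -> Some offsets
    | exception Not_found -> None
    | exception Pcre.Error Pcre.WorkspaceSize when size < max_size ->
        attempt (size * 2)
  in
  attempt 20

let found = dfa_match (Pcre.regexp "abc") "xabcx"
let not_found = dfa_match (Pcre.regexp "abc") "xyz"
```

Once the cap is reached, WorkspaceSize is no longer caught and propagates to the caller, as do Partial and the other errors.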

val exec : ?iflags:irflag -> ?flags:rflag list -> ?rex:regexp -> ?pat:string -> ?pos:int -> ?callout:callout -> string -> substrings

exec ?iflags ?flags ?rex ?pat ?pos ?callout subj

  • returns

    substring information on string subj starting at position pos with pattern pat when given, regular expression rex otherwise. Uses flags when given, the precompiled iflags otherwise. Callouts are handled by callout.

  • parameter iflags

    default = no extra flags

  • parameter flags

    default = ignored

  • parameter rex

    default = matches whitespace

  • parameter pat

    default = ignored

  • parameter pos

    default = 0

  • parameter callout

    default = ignore callouts

  • raises Not_found

    if pattern does not match.
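
A small sketch of exec with a start position, turning Not_found into an option. The helper first_number is hypothetical.

```ocaml
(* Hypothetical helper: first run of digits at or after pos, if any. *)
let first_number ?(pos = 0) subj =
  try Some (Pcre.get_substring (Pcre.exec ~pat:"\\d+" ~pos subj) 0)
  with Not_found -> None

let a = first_number "a1b22"          (* searches from the start *)
let b = first_number ~pos:2 "a1b22"   (* skips the first two characters *)
let c = first_number "abc"            (* no digits at all *)
```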

val exec_all : ?iflags:irflag -> ?flags:rflag list -> ?rex:regexp -> ?pat:string -> ?pos:int -> ?callout:callout -> string -> substrings array

exec_all ?iflags ?flags ?rex ?pat ?pos ?callout subj

  • returns

    an array of substring information of all matching substrings in string subj starting at position pos with pattern pat when given, regular expression rex otherwise. Uses flags when given, the precompiled iflags otherwise. Callouts are handled by callout.

  • parameter iflags

    default = no extra flags

  • parameter flags

    default = ignored

  • parameter rex

    default = matches whitespace

  • parameter pat

    default = ignored

  • parameter pos

    default = 0

  • parameter callout

    default = ignore callouts

  • raises Not_found

    if pattern does not match.
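
For illustration, exec_all can be combined with get_substring to collect the text of every match; the names below are made up for the example.

```ocaml
let all = Pcre.exec_all ~pat:"\\d+" "a1b22c333"

(* Index 0 of each substrings value is the whole match. *)
let texts = Array.map (fun subs -> Pcre.get_substring subs 0) all
```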

val next_match : ?iflags:irflag -> ?flags:rflag list -> ?rex:regexp -> ?pat:string -> ?pos:int -> ?callout:callout -> substrings -> substrings

next_match ?iflags ?flags ?rex ?pat ?pos ?callout substrs

  • returns

    substring information on the match that follows on the last match denoted by substrs, jumping over pos characters (also backwards!), using pattern pat when given, regular expression rex otherwise. Uses flags when given, the precompiled iflags otherwise. Callouts are handled by callout.

  • parameter iflags

    default = no extra flags

  • parameter flags

    default = ignored

  • parameter rex

    default = matches whitespace

  • parameter pat

    default = ignored

  • parameter pos

    default = 0

  • parameter callout

    default = ignore callouts

  • raises Not_found

    if pattern does not match.

  • raises Invalid_argument

    if pos would make matching start outside of the subject string.
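
As a sketch, next_match can drive a manual iteration over successive matches. The helper all_matches is hypothetical and assumes a pattern that never matches the empty string, since otherwise the loop would not advance.

```ocaml
(* Hypothetical helper: texts of all successive non-empty matches. *)
let all_matches rex subj =
  let rec loop acc subs =
    let acc = Pcre.get_substring subs 0 :: acc in
    match Pcre.next_match ~rex subs with
    | next -> loop acc next
    | exception Not_found -> List.rev acc
  in
  match Pcre.exec ~rex subj with
  | subs -> loop [] subs
  | exception Not_found -> []

let numbers = all_matches (Pcre.regexp "\\d+") "a1b22c333"
```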

val extract : ?iflags:irflag -> ?flags:rflag list -> ?rex:regexp -> ?pat:string -> ?pos:int -> ?full_match:bool -> ?callout:callout -> string -> string array

extract ?iflags ?flags ?rex ?pat ?pos ?full_match ?callout subj

  • returns

    the array of substrings that match subj starting at position pos, using pattern pat when given, regular expression rex otherwise. Uses flags when given, the precompiled iflags otherwise. It includes the full match at index 0 when full_match is true, the captured substrings only when it is false. Callouts are handled by callout. If a subpattern did not capture a substring, the empty string is returned in the corresponding position instead.

  • parameter iflags

    default = no extra flags

  • parameter flags

    default = ignored

  • parameter rex

    default = matches whitespace

  • parameter pat

    default = ignored

  • parameter pos

    default = 0

  • parameter full_match

    default = true

  • parameter callout

    default = ignore callouts

  • raises Not_found

    if pattern does not match.

val extract_opt : ?iflags:irflag -> ?flags:rflag list -> ?rex:regexp -> ?pat:string -> ?pos:int -> ?full_match:bool -> ?callout:callout -> string -> string option array

extract_opt ?iflags ?flags ?rex ?pat ?pos ?full_match ?callout subj

  • returns

    the array of optional substrings that match subj starting at position pos, using pattern pat when given, regular expression rex otherwise. Uses flags when given, the precompiled iflags otherwise. It includes Some full_match_str at index 0 when full_match is true, Some captured_substrings only when it is false. Callouts are handled by callout. If a subpattern did not capture a substring, None is returned in the corresponding position instead.

  • parameter iflags

    default = no extra flags

  • parameter flags

    default = ignored

  • parameter rex

    default = matches whitespace

  • parameter pat

    default = ignored

  • parameter pos

    default = 0

  • parameter full_match

    default = true

  • parameter callout

    default = ignore callouts

  • raises Not_found

    if pattern does not match.
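
The difference between extract and extract_opt, and the effect of full_match, can be sketched as follows with made-up inputs:

```ocaml
let pat = "(\\w+)@(\\w+)"

let with_full = Pcre.extract ~pat "bob@example"
let groups_only = Pcre.extract ~pat ~full_match:false "bob@example"

(* A group that did not take part in the match: "" versus None. *)
let plain = Pcre.extract ~pat:"(a)|(b)" "b"
let optional = Pcre.extract_opt ~pat:"(a)|(b)" "b"
```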

val extract_all : ?iflags:irflag -> ?flags:rflag list -> ?rex:regexp -> ?pat:string -> ?pos:int -> ?full_match:bool -> ?callout:callout -> string -> string array array

extract_all ?iflags ?flags ?rex ?pat ?pos ?full_match ?callout subj

  • returns

    an array of arrays of all matching substrings that match subj starting at position pos, using pattern pat when given, regular expression rex otherwise. Uses flags when given, the precompiled iflags otherwise. It includes the full match at index 0 of the extracted string arrays when full_match is true, the captured substrings only when it is false. Callouts are handled by callout.

  • parameter iflags

    default = no extra flags

  • parameter flags

    default = ignored

  • parameter rex

    default = matches whitespace

  • parameter pat

    default = ignored

  • parameter pos

    default = 0

  • parameter full_match

    default = true

  • parameter callout

    default = ignore callouts

  • raises Not_found

    if pattern does not match.

val extract_all_opt : ?iflags:irflag -> ?flags:rflag list -> ?rex:regexp -> ?pat:string -> ?pos:int -> ?full_match:bool -> ?callout:callout -> string -> string option array array

extract_all_opt ?iflags ?flags ?rex ?pat ?pos ?full_match ?callout subj

  • returns

    an array of arrays of all optional matching substrings that match subj starting at position pos, using pattern pat when given, regular expression rex otherwise. Uses flags when given, the precompiled iflags otherwise. It includes Some full_match_str at index 0 of the extracted string arrays when full_match is true, Some captured_substrings only when it is false. Callouts are handled by callout. If a subpattern did not capture a substring, None is returned in the corresponding position instead.

  • parameter iflags

    default = no extra flags

  • parameter flags

    default = ignored

  • parameter rex

    default = matches whitespace

  • parameter pat

    default = ignored

  • parameter pos

    default = 0

  • parameter full_match

    default = true

  • parameter callout

    default = ignore callouts

  • raises Not_found

    if pattern does not match.
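
A short sketch of extract_all on a subject containing two matches; each inner array has the whole match at index 0 followed by the captured groups.

```ocaml
let pairs = Pcre.extract_all ~pat:"(\\w)=(\\d)" "a=1,b=2"

(* Print each key/value pair found in the subject. *)
let () =
  Array.iter (fun m -> Printf.printf "%s -> %s\n" m.(1) m.(2)) pairs
```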

val pmatch : ?iflags:irflag -> ?flags:rflag list -> ?rex:regexp -> ?pat:string -> ?pos:int -> ?callout:callout -> string -> bool

pmatch ?iflags ?flags ?rex ?pat ?pos ?callout subj

  • returns

    true if subj is matched by pattern pat when given, regular expression rex otherwise, starting at position pos. Uses flags when given, the precompiled iflags otherwise. Callouts are handled by callout.

  • parameter iflags

    default = no extra flags

  • parameter flags

    default = ignored

  • parameter rex

    default = matches whitespace

  • parameter pat

    default = ignored

  • parameter pos

    default = 0

  • parameter callout

    default = ignore callouts

String substitution

type substitution

Information on substitution patterns

val subst : string -> substitution

subst str converts the string str representing a substitution pattern to the internal representation.

The contents of the substitution string str can be normal text mixed with any of the following (mostly as in PERL):

  • $[0-9]+ - a "$" immediately followed by an arbitrary number. "$0" stands for the name of the executable, any other number for the n-th backreference.
  • $& - the whole matched pattern
  • $` - the text before the match
  • $' - the text after the match
  • $+ - the last group that matched
  • $$ - a single "$"
  • $! - delimiter which does not appear in the substitution. Can be used to separate "$[0-9]+" from an immediately following other number.

val replace : ?iflags:irflag -> ?flags:rflag list -> ?rex:regexp -> ?pat:string -> ?pos:int -> ?itempl:substitution -> ?templ:string -> ?callout:callout -> string -> string

replace ?iflags ?flags ?rex ?pat ?pos ?itempl ?templ ?callout subj replaces all substrings of subj matching pattern pat when given, regular expression rex otherwise, starting at position pos with the substitution string templ when given, itempl otherwise. Uses flags when given, the precompiled iflags otherwise. Callouts are handled by callout.

  • parameter iflags

    default = no extra flags

  • parameter flags

    default = ignored

  • parameter rex

    default = matches whitespace

  • parameter pat

    default = ignored

  • parameter pos

    default = 0

  • parameter itempl

    default = empty string

  • parameter templ

    default = ignored

  • parameter callout

    default = ignore callouts

  • raises Failure

    if there are backreferences to nonexistent subpatterns.
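
The substitution syntax can be sketched with a few calls to replace; the patterns, templates and value names are made up for the example.

```ocaml
(* Numbered backreferences swap the two captured groups. *)
let swapped =
  Pcre.replace ~pat:"(\\w+)@(\\w+)" ~templ:"$2 at $1" "bob@example"

(* "$&" stands for the whole match. *)
let bracketed = Pcre.replace ~pat:"\\d+" ~templ:"<$&>" "a1b22"

(* "$!" separates "$1" from the digit that follows it. *)
let padded = Pcre.replace ~pat:"(\\d)" ~templ:"$1$!0" "a1"

(* A template converted once with subst and passed as ~itempl. *)
let itempl = Pcre.subst "[$1]"
let boxed = Pcre.replace ~pat:"(\\d)" ~itempl "a1b2"
```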

val qreplace : ?iflags:irflag -> ?flags:rflag list -> ?rex:regexp -> ?pat:string -> ?pos:int -> ?templ:string -> ?callout:callout -> string -> string

qreplace ?iflags ?flags ?rex ?pat ?pos ?templ ?callout subj replaces all substrings of subj matching pattern pat when given, regular expression rex otherwise, starting at position pos with the string templ. Uses flags when given, the precompiled iflags otherwise. Callouts are handled by callout.

  • parameter iflags

    default = no extra flags

  • parameter flags

    default = ignored

  • parameter rex

    default = matches whitespace

  • parameter pat

    default = ignored

  • parameter pos

    default = 0

  • parameter templ

    default = ignored

  • parameter callout

    default = ignore callouts
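
In contrast to replace, qreplace inserts templ verbatim, so "$" sequences carry no special meaning. A one-line sketch:

```ocaml
(* "$1" is inserted literally, not treated as a backreference. *)
let literal = Pcre.qreplace ~pat:"\\d" ~templ:"$1" "a1b2"
```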

val substitute_substrings : ?iflags:irflag -> ?flags:rflag list -> ?rex:regexp -> ?pat:string -> ?pos:int -> ?callout:callout -> subst:(substrings -> string) -> string -> string

substitute_substrings ?iflags ?flags ?rex ?pat ?pos ?callout ~subst subj replaces all substrings of subj matching pattern pat when given, regular expression rex otherwise, starting at position pos with the result of function subst applied to the substrings of the match. Uses flags when given, the precompiled iflags otherwise. Callouts are handled by callout.

  • parameter iflags

    default = no extra flags

  • parameter flags

    default = ignored

  • parameter rex

    default = matches whitespace

  • parameter pat

    default = ignored

  • parameter pos

    default = 0

  • parameter callout

    default = ignore callouts

val substitute : ?iflags:irflag -> ?flags:rflag list -> ?rex:regexp -> ?pat:string -> ?pos:int -> ?callout:callout -> subst:(string -> string) -> string -> string

substitute ?iflags ?flags ?rex ?pat ?pos ?callout ~subst subj replaces all substrings of subj matching pattern pat when given, regular expression rex otherwise, starting at position pos with the result of function subst applied to the match. Uses flags when given, the precompiled iflags otherwise. Callouts are handled by callout.

  • parameter iflags

    default = no extra flags

  • parameter flags

    default = ignored

  • parameter rex

    default = matches whitespace

  • parameter pat

    default = ignored

  • parameter pos

    default = 0

  • parameter callout

    default = ignore callouts
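
When the replacement has to be computed, substitute and substitute_substrings take a function instead of a template. A sketch with made-up inputs:

```ocaml
(* subst receives the matched text: double every number. *)
let doubled =
  Pcre.substitute ~pat:"\\d+"
    ~subst:(fun s -> string_of_int (2 * int_of_string s))
    "a1b22"

(* subst receives the substrings of the match: swap key and value. *)
let swapped =
  Pcre.substitute_substrings ~pat:"(\\w+)=(\\w+)"
    ~subst:(fun subs ->
      Pcre.get_substring subs 2 ^ "=" ^ Pcre.get_substring subs 1)
    "k=v"
```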

val replace_first : ?iflags:irflag -> ?flags:rflag list -> ?rex:regexp -> ?pat:string -> ?pos:int -> ?itempl:substitution -> ?templ:string -> ?callout:callout -> string -> string

replace_first ?iflags ?flags ?rex ?pat ?pos ?itempl ?templ ?callout subj replaces the first substring of subj matching pattern pat when given, regular expression rex otherwise, starting at position pos with the substitution string templ when given, itempl otherwise. Uses flags when given, the precompiled iflags otherwise. Callouts are handled by callout.

  • parameter iflags

    default = no extra flags

  • parameter flags

    default = ignored

  • parameter rex

    default = matches whitespace

  • parameter pat

    default = ignored

  • parameter pos

    default = 0

  • parameter itempl

    default = empty string

  • parameter templ

    default = ignored

  • parameter callout

    default = ignore callouts

  • raises Failure

    if there are backreferences to nonexistent subpatterns.

val qreplace_first : ?iflags:irflag -> ?flags:rflag list -> ?rex:regexp -> ?pat:string -> ?pos:int -> ?templ:string -> ?callout:callout -> string -> string

qreplace_first ?iflags ?flags ?rex ?pat ?pos ?templ ?callout subj replaces the first substring of subj matching pattern pat when given, regular expression rex otherwise, starting at position pos with the string templ. Uses flags when given, the precompiled iflags otherwise. Callouts are handled by callout.

  • parameter iflags

    default = no extra flags

  • parameter flags

    default = ignored

  • parameter rex

    default = matches whitespace

  • parameter pat

    default = ignored

  • parameter pos

    default = 0

  • parameter templ

    default = ignored

  • parameter callout

    default = ignore callouts

val substitute_substrings_first : ?iflags:irflag -> ?flags:rflag list -> ?rex:regexp -> ?pat:string -> ?pos:int -> ?callout:callout -> subst:(substrings -> string) -> string -> string

substitute_substrings_first ?iflags ?flags ?rex ?pat ?pos ?callout ~subst subj replaces the first substring of subj matching pattern pat when given, regular expression rex otherwise, starting at position pos with the result of function subst applied to the substrings of the match. Uses flags when given, the precompiled iflags otherwise. Callouts are handled by callout.

  • parameter iflags

    default = no extra flags

  • parameter flags

    default = ignored

  • parameter rex

    default = matches whitespace

  • parameter pat

    default = ignored

  • parameter pos

    default = 0

  • parameter callout

    default = ignore callouts

val substitute_first : ?iflags:irflag -> ?flags:rflag list -> ?rex:regexp -> ?pat:string -> ?pos:int -> ?callout:callout -> subst:(string -> string) -> string -> string

substitute_first ?iflags ?flags ?rex ?pat ?pos ?callout ~subst subj replaces the first substring of subj matching pattern pat when given, regular expression rex otherwise, starting at position pos with the result of function subst applied to the match. Uses flags when given, the precompiled iflags otherwise. Callouts are handled by callout.

  • parameter iflags

    default = no extra flags

  • parameter flags

    default = ignored

  • parameter rex

    default = matches whitespace

  • parameter pat

    default = ignored

  • parameter pos

    default = 0

  • parameter callout

    default = ignore callouts
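
The _first variants stop after the first match. A short comparison, with illustrative names:

```ocaml
let every = Pcre.replace ~pat:"o" ~templ:"0" "foo"
let first = Pcre.replace_first ~pat:"o" ~templ:"0" "foo"

(* Only the first word is upper-cased. *)
let shouted =
  Pcre.substitute_first ~pat:"[a-z]+" ~subst:String.uppercase_ascii "one two"
```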

Splitting

val split : ?iflags:irflag -> ?flags:rflag list -> ?rex:regexp -> ?pat:string -> ?pos:int -> ?max:int -> ?callout:callout -> string -> string list

split ?iflags ?flags ?rex ?pat ?pos ?max ?callout subj splits subj into a list of at most max strings, using as delimiter pattern pat when given, regular expression rex otherwise, starting at position pos. Uses flags when given, the precompiled iflags otherwise. If max is zero, trailing empty fields are stripped. If it is negative, it is treated as arbitrarily large. If neither pat nor rex is specified, leading whitespace will be stripped. Should behave exactly as in PERL. Callouts are handled by callout.

  • parameter iflags

    default = no extra flags

  • parameter flags

    default = ignored

  • parameter rex

    default = matches whitespace

  • parameter pat

    default = ignored

  • parameter pos

    default = 0

  • parameter max

    default = 0

  • parameter callout

    default = ignore callouts

val asplit : ?iflags:irflag -> ?flags:rflag list -> ?rex:regexp -> ?pat:string -> ?pos:int -> ?max:int -> ?callout:callout -> string -> string array

asplit ?iflags ?flags ?rex ?pat ?pos ?max ?callout subj same as Pcre.split but returns an array instead of a list.
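
A few sketched calls showing how max and the default delimiter affect split; the inputs are made up.

```ocaml
(* max defaults to 0: trailing empty fields are stripped. *)
let fields = Pcre.split ~pat:"," "a,b,,c,,"

(* At most two fields; the remainder stays unsplit. *)
let two = Pcre.split ~pat:"," ~max:2 "a,b,c"

(* Neither pat nor rex: split on whitespace, leading whitespace stripped. *)
let words = Pcre.split "  hello   world "

(* Same as split, but as an array. *)
let arr = Pcre.asplit ~pat:"," "a,b"
```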

type split_result =
  | Text of string
    (* Text part of split string *)
  | Delim of string
    (* Delimiter part of split string *)
  | Group of int * string
    (* Subgroup of matched delimiter (subgroup_nr, subgroup_str) *)
  | NoGroup
    (* Unmatched subgroup *)

Result of a Pcre.full_split

val full_split : ?iflags:irflag -> ?flags:rflag list -> ?rex:regexp -> ?pat:string -> ?pos:int -> ?max:int -> ?callout:callout -> string -> split_result list

full_split ?iflags ?flags ?rex ?pat ?pos ?max ?callout subj splits subj into a list of at most max elements of type "split_result", using as delimiter pattern pat when given, regular expression rex otherwise, starting at position pos. Uses flags when given, the precompiled iflags otherwise. If max is zero, trailing empty fields are stripped. If it is negative, it is treated as arbitrarily large. Should behave exactly as in PERL. Callouts are handled by callout.

  • parameter iflags

    default = no extra flags

  • parameter flags

    default = ignored

  • parameter rex

    default = matches whitespace

  • parameter pat

    default = ignored

  • parameter pos

    default = 0

  • parameter max

    default = 0

  • parameter callout

    default = ignore callouts
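
Unlike split, full_split also reports the delimiters themselves. A sketch, where texts_only is a hypothetical helper that keeps just the Text elements:

```ocaml
let parts = Pcre.full_split ~pat:"," "a,b"

(* Hypothetical helper: drop delimiters and groups, keep the text parts. *)
let texts_only results =
  List.filter_map (function Pcre.Text s -> Some s | _ -> None) results

(* With a capturing group in the delimiter, Group elements appear too. *)
let with_group = Pcre.full_split ~pat:"(,)" "a,b"
let texts = texts_only with_group
```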

Additional convenience functions

val foreach_line : ?ic:Stdlib.in_channel -> (string -> unit) -> unit

foreach_line ?ic f applies f to each line in input channel ic until the end-of-file is reached.

  • parameter ic

    default = stdin

val foreach_file : string list -> (string -> Stdlib.in_channel -> unit) -> unit

foreach_file filenames f opens each file in the list filenames for input and applies f to each filename and the corresponding channel. Channels are closed after each operation (even when exceptions occur - they get reraised afterwards!).
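
The two functions compose naturally into a grep-like line counter. In this sketch, count_matching and demo are hypothetical names, and the temporary file exists only to make the example self-contained.

```ocaml
(* Hypothetical helper: number of lines matching pat across all files. *)
let count_matching ~pat filenames =
  let rex = Pcre.regexp pat in
  let n = ref 0 in
  Pcre.foreach_file filenames (fun _name ic ->
    Pcre.foreach_line ~ic (fun line ->
      if Pcre.pmatch ~rex line then incr n));
  !n

(* Writes a small temporary file, counts its "error" lines, removes it. *)
let demo () =
  let path = Filename.temp_file "pcre_demo" ".txt" in
  let oc = open_out path in
  output_string oc "error: disk full\nok\nerror: timeout\n";
  close_out oc;
  let n = count_matching ~pat:"^error" [path] in
  Sys.remove path;
  n
```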

UNSAFE STUFF - USE WITH CAUTION!

val unsafe_pcre_exec : irflag -> regexp -> pos:int -> subj_start:int -> subj:string -> int array -> callout option -> unit

unsafe_pcre_exec flags rex ~pos ~subj_start ~subj offset_vector callout. You should read the C-source to know what happens. If you do not understand it - don't use this function!

val make_ovector : regexp -> int * int array

make_ovector regexp calculates the tuple (subgroups2, ovector) which is the number of subgroup offsets and the offset array.

val unsafe_pcre_dfa_exec : irflag -> regexp -> pos:int -> subj_start:int -> subj:string -> int array -> callout option -> workspace:int array -> unit

unsafe_pcre_dfa_exec flags rex ~pos ~subj_start ~subj offset_vector callout ~workspace. You should read the C-source to know what happens. If you do not understand it - don't use this function!