Pcre: Perl Compatibility Regular Expressions for OCaml
Version 8.0.2
type error =
  | Partial  (* String only matched the pattern partially *)
  | BadPartial  (* Pattern contains items that cannot be used together with partial matching. *)
  | BadPattern of string * int  (* BadPattern (msg, pos): regular expression is malformed. The reason is in msg, the position of the error in the pattern in pos. *)
  | BadUTF8  (* UTF8 string being matched is invalid *)
  | BadUTF8Offset  (* Gets raised when a UTF8 string being matched with offset is invalid. *)
  | MatchLimit  (* Maximum allowed number of match attempts with backtracking or recursion is reached during matching. ALL FUNCTIONS CALLING THE MATCHING ENGINE MAY RAISE IT!!! *)
  | RecursionLimit
  | WorkspaceSize  (* Raised by pcre_dfa_exec when the provided workspace array is too small. See documentation on pcre_dfa_exec for details on workspace array sizing. *)
  | InternalError of string  (* InternalError msg: C-library exhibits unknown/undefined behaviour. The reason is in msg. *)
exception Error of error
Exception indicating PCRE errors.
exception Regexp_or of string * error
Regexp_or (pat, error) gets raised for sub-pattern pat by regexp_or if it failed to compile.
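A minimal sketch of how these errors might be handled at compile time. It assumes the `pcre` findlib package is installed and linked; `describe_error` and `try_compile` are hypothetical helper names, not part of the library.

```ocaml
(* Assumption: built with the pcre findlib package,
   e.g. ocamlfind ocamlopt -package pcre -linkpkg ... *)

(* Hypothetical helper: turn a Pcre.error into a readable message. *)
let describe_error = function
  | Pcre.BadPattern (msg, pos) ->
      Printf.sprintf "bad pattern at offset %d: %s" pos msg
  | Pcre.MatchLimit -> "match limit reached"
  | Pcre.InternalError msg -> "internal error: " ^ msg
  | _ -> "other PCRE error"

(* Hypothetical helper: compile a pattern, returning a result
   instead of letting Pcre.Error escape. *)
let try_compile pat =
  try Ok (Pcre.regexp pat)
  with Pcre.Error err -> Error (describe_error err)
```

Since any function that calls the matching engine may raise `Error MatchLimit`, the same `try ... with Pcre.Error` wrapper can also be placed around matching calls.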
and cflag = [
  | `CASELESS  (* Case insensitive matching *)
  | `MULTILINE  (* '^' and '$' match before/after newlines, not just at the beginning/end of a string *)
  | `DOTALL  (* '.' matches all characters (newlines, too) *)
  | `EXTENDED  (* Ignores whitespace and PERL-comments. Behaves like the '/x'-option in PERL *)
  | `ANCHORED  (* Pattern matches only at start of string *)
  | `DOLLAR_ENDONLY  (* '$' in pattern matches only at end of string *)
  | `EXTRA  (* Reserved for future extensions of PCRE *)
  | `UNGREEDY  (* Quantifiers not greedy anymore, only if followed by '?' *)
  | `UTF8  (* Treats patterns and strings as UTF8 characters. *)
  | `NO_UTF8_CHECK  (* Turns off validity checks on UTF8 strings for efficiency reasons. WARNING: invalid UTF8 strings may cause a crash then! *)
  | `NO_AUTO_CAPTURE  (* Disables the use of numbered capturing parentheses *)
  | `AUTO_CALLOUT  (* Automatically inserts callouts with id 255 before each pattern item *)
  | `FIRSTLINE  (* Unanchored patterns must match before/at first NL *)
]
Compilation flags
cflags cflag_list converts a list of compilation flags to their internal representation.
cflag_list cflags converts internal representation of compilation flags to a list.
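As an illustration of the two conversions, the sketch below precomputes the internal flag representation once and reuses it through the `?iflags` argument of `regexp`. It assumes the `pcre` findlib package is linked; the pattern and subject are made up for the example.

```ocaml
(* Convert a flag list to the internal representation once. *)
let iflags = Pcre.cflags [`CASELESS; `MULTILINE]

(* Reuse the precomputed flags when compiling. *)
let rex = Pcre.regexp ~iflags "^hello$"

(* `MULTILINE lets '^' and '$' match at line boundaries,
   `CASELESS lets "HELLO" match "hello". *)
let found = Pcre.pmatch ~rex "first line\nHELLO\nlast line"

(* Convert the internal representation back to a list. *)
let flags_back = Pcre.cflag_list iflags
```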
type rflag = [
  | `ANCHORED  (* Treats pattern as if it were anchored *)
  | `NOTBOL  (* Beginning of string is not treated as beginning of line *)
  | `NOTEOL  (* End of string is not treated as end of line *)
  | `NOTEMPTY  (* Empty strings are not considered to be a valid match *)
  | `PARTIAL  (* Turns on partial matching *)
  | `DFA_RESTART  (* Causes matching to proceed presuming the subject string is further to one partially matched previously using the same int-array working set. May only be used with pcre_dfa_exec or unsafe_pcre_dfa_exec, and should always be paired with `PARTIAL. *)
]
Runtime flags
rflags rflag_list converts a list of runtime flags to their internal representation.
rflag_list rflags converts internal representation of runtime flags to a list.
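A small sketch of runtime flags in use, assuming the `pcre` findlib package is linked. The same compiled pattern is matched with and without `` `ANCHORED``, once via `?flags` and once via a precomputed `?iflags` value.

```ocaml
let rex = Pcre.regexp "b+"

(* Unanchored: the engine may find "bb" anywhere in the subject. *)
let unanchored = Pcre.pmatch ~rex "abb"

(* `ANCHORED at run time: the match must begin at the start position. *)
let anchored = Pcre.pmatch ~rex ~flags:[`ANCHORED] "abb"

(* The same flag, converted once to its internal representation. *)
let iflags = Pcre.rflags [`ANCHORED]
let anchored_at_start = Pcre.pmatch ~rex ~iflags "bba"
```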
Default limit recursion for calls to internal matching function
type firstbyte_info = [
  | `Char of char  (* Fixed first character *)
  | `Start_only  (* Pattern matches at beginning and end of newlines *)
  | `ANCHORED  (* Pattern is anchored *)
]
Information on matching of "first chars" in patterns
type study_stat = [
  | `Not_studied  (* Pattern has not yet been studied *)
  | `Studied  (* Pattern has been studied successfully *)
  | `Optimal  (* Pattern could not be improved by studying *)
]
Information on the study status of patterns
val size : regexp -> int
size regexp
val studysize : regexp -> int
studysize regexp
val capturecount : regexp -> int
capturecount regexp
val backrefmax : regexp -> int
backrefmax regexp
val namecount : regexp -> int
namecount regexp
val nameentrysize : regexp -> int
nameentrysize regexp
val names : regexp -> string array
names regex
val firstbyte : regexp -> firstbyte_info
firstbyte regexp
val firsttable : regexp -> string option
firsttable regexp
val lastliteral : regexp -> char option
lastliteral regexp
val study_stat : regexp -> study_stat
study_stat regexp
val get_stringnumber : regexp -> string -> int
get_stringnumber rex name
val get_match_limit : regexp -> int option
get_match_limit rex
val get_match_limit_recursion : regexp -> int option
get_match_limit_recursion rex
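The information functions above can be tried on a pattern with named groups. This is a sketch assuming the `pcre` findlib package is linked; the date pattern is illustrative only, and the order of the array returned by `names` is not relied on here.

```ocaml
(* Two named capturing groups, written in the (?P<name>...) syntax. *)
let rex = Pcre.regexp "(?P<year>\\d{4})-(?P<month>\\d{2})"

let groups = Pcre.capturecount rex   (* number of capturing groups *)
let named = Pcre.namecount rex       (* number of named groups *)

(* Sort the names so the example does not depend on table order. *)
let sorted_names = List.sort compare (Array.to_list (Pcre.names rex))

(* Map a group name to its group number. *)
let month_idx = Pcre.get_stringnumber rex "month"
```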
val maketables : unit -> chtables
Generates new set of char tables for the current locale.
val regexp :
?study:bool ->
?jit_compile:bool ->
?limit:int ->
?limit_recursion:int ->
?iflags:icflag ->
?flags:cflag list ->
?chtables:chtables ->
string ->
regexp
regexp ?jit_compile ?study ?limit ?limit_recursion ?iflags ?flags ?chtables pattern compiles pattern with flags when given, with iflags otherwise, and with char tables chtables. If study is true, then the resulting regular expression will be studied. If jit_compile is true, studying will also perform JIT-compilation of the pattern. If limit is specified, this sets a limit to the amount of recursion and backtracking (only lower than the builtin default!). If this limit is exceeded, MatchLimit will be raised during matching.
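A minimal compile-then-match sketch, assuming the `pcre` findlib package is linked. The pattern and subject are invented for illustration; `jit_compile` is left out because its availability depends on how the underlying C library was built.

```ocaml
(* `EXTENDED ignores the whitespace and the trailing '#' comment;
   `CASELESS makes "COLOR" match "colou?r". *)
let rex =
  Pcre.regexp ~study:true ~flags:[`CASELESS; `EXTENDED]
    "colou?r  # British or American spelling"

let ok = Pcre.pmatch ~rex "My favourite COLOR"
let no = Pcre.pmatch ~rex "My favourite shape"
```

Compiling once and passing the result via `~rex` avoids recompiling the pattern on every call, which is what happens when `~pat` is used instead.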
val regexp_or :
?study:bool ->
?jit_compile:bool ->
?limit:int ->
?limit_recursion:int ->
?iflags:icflag ->
?flags:cflag list ->
?chtables:chtables ->
string list ->
regexp
regexp_or ?study ?limit ?limit_recursion ?iflags ?flags ?chtables patterns is like regexp, but combines patterns as alternatives (or-patterns) into one regular expression.
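For illustration, a short sketch of combining several patterns into one alternative, assuming the `pcre` findlib package is linked. The word list is made up.

```ocaml
(* One regular expression that matches any of the given patterns. *)
let rex = Pcre.regexp_or ["cat"; "dog"; "bird"]

let m1 = Pcre.pmatch ~rex "hotdog stand"
let m2 = Pcre.pmatch ~rex "goldfish"
```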
val get_subject : substrings -> string
get_subject substrings
val num_of_subs : substrings -> int
num_of_subs substrings
val get_substring : substrings -> int -> string
get_substring substrings n
val get_substring_ofs : substrings -> int -> int * int
get_substring_ofs substrings n
val get_substrings : ?full_match:bool -> substrings -> string array
get_substrings ?full_match substrings
val get_opt_substrings : ?full_match:bool -> substrings -> string option array
get_opt_substrings ?full_match substrings
val get_named_substring : regexp -> string -> substrings -> string
get_named_substring rex name substrings
val get_named_substring_ofs : regexp -> string -> substrings -> int * int
get_named_substring_ofs rex name substrings
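The accessors above operate on the `substrings` value returned by `exec`. The following sketch, assuming the `pcre` findlib package is linked, shows the common ones on an invented key=value subject.

```ocaml
let rex = Pcre.regexp "(?P<key>\\w+)=(?P<value>\\w+)"
let subs = Pcre.exec ~rex "set mode=fast now"

let subject = Pcre.get_subject subs          (* the original subject string *)
let count = Pcre.num_of_subs subs            (* whole match plus two groups *)
let whole = Pcre.get_substring subs 0        (* group 0 is the whole match *)
let key = Pcre.get_named_substring rex "key" subs
let value_ofs = Pcre.get_substring_ofs subs 2 (* (start, end) offsets of group 2 *)
let all = Pcre.get_substrings subs
```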
type callout_data = {
  callout_number : int;  (* Callout number *)
  substrings : substrings;  (* Substrings matched so far *)
  start_match : int;  (* Subject start offset of current match attempt *)
  current_position : int;  (* Subject offset of current match pointer *)
  capture_top : int;  (* Number of the highest captured substring so far *)
  capture_last : int;  (* Number of the most recently captured substring *)
  pattern_position : int;  (* Offset of next match item in pattern string *)
  next_item_length : int;  (* Length of next match item in pattern string *)
}
type callout = callout_data -> unit
Type of callout functions
Callouts are referred to in patterns as "(?Cn)" where "n" is a callout_number ranging from 0 to 255. Substrings captured so far are accessible as usual via substrings. You will have to consider capture_top and capture_last to know about the current state of valid substrings.
By raising exception Backtrack within a callout function, the user can force the pattern matching engine to backtrack to other possible solutions. Other exceptions will terminate matching immediately and return control to OCaml.
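A hedged sketch of a callout that only records what it sees, assuming the `pcre` findlib package is linked and the underlying C library was built with callout support. The names `seen` and `record` are hypothetical.

```ocaml
(* Accumulates (callout number, subject offset) pairs, newest first. *)
let seen : (int * int) list ref = ref []

(* A callout function: called whenever the engine reaches "(?Cn)". *)
let record (data : Pcre.callout_data) =
  seen := (data.Pcre.callout_number, data.Pcre.current_position) :: !seen

(* "(?C7)" sits between 'a' and 'b', so the callout fires after 'a' matched. *)
let matched = Pcre.pmatch ~pat:"a(?C7)b" ~callout:record "xab"

let numbers = List.map fst !seen
```

A callout that wanted to reject the current path would raise `Pcre.Backtrack` instead of returning.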
val pcre_exec :
?iflags:irflag ->
?flags:rflag list ->
?rex:regexp ->
?pat:string ->
?pos:int ->
?callout:callout ->
string ->
int array
pcre_exec ?iflags ?flags ?rex ?pat ?pos ?callout subj
val pcre_dfa_exec :
?iflags:irflag ->
?flags:rflag list ->
?rex:regexp ->
?pat:string ->
?pos:int ->
?callout:callout ->
?workspace:int array ->
string ->
int array
pcre_dfa_exec ?iflags ?flags ?rex ?pat ?pos ?callout ?workspace subj invokes the "alternative" DFA matching function.
val exec :
?iflags:irflag ->
?flags:rflag list ->
?rex:regexp ->
?pat:string ->
?pos:int ->
?callout:callout ->
string ->
substrings
exec ?iflags ?flags ?rex ?pat ?pos ?callout subj
val exec_all :
?iflags:irflag ->
?flags:rflag list ->
?rex:regexp ->
?pat:string ->
?pos:int ->
?callout:callout ->
string ->
substrings array
exec_all ?iflags ?flags ?rex ?pat ?pos ?callout subj
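As a sketch of how `exec_all` might be used, the example below collects every match of a pattern and extracts the whole-match text from each result. It assumes the `pcre` findlib package is linked.

```ocaml
let rex = Pcre.regexp "\\d+"

(* One substrings value per match, in subject order. *)
let matches = Pcre.exec_all ~rex "a1b22c333"

(* Group 0 of each result is the text of that match. *)
let numbers = Array.map (fun s -> Pcre.get_substring s 0) matches
```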
val next_match :
?iflags:irflag ->
?flags:rflag list ->
?rex:regexp ->
?pat:string ->
?pos:int ->
?callout:callout ->
substrings ->
substrings
next_match ?iflags ?flags ?rex ?pat ?pos ?callout substrs
val extract :
?iflags:irflag ->
?flags:rflag list ->
?rex:regexp ->
?pat:string ->
?pos:int ->
?full_match:bool ->
?callout:callout ->
string ->
string array
extract ?iflags ?flags ?rex ?pat ?pos ?full_match ?callout subj
val extract_opt :
?iflags:irflag ->
?flags:rflag list ->
?rex:regexp ->
?pat:string ->
?pos:int ->
?full_match:bool ->
?callout:callout ->
string ->
string option array
extract_opt ?iflags ?flags ?rex ?pat ?pos ?full_match ?callout subj
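The difference between `extract` and `extract_opt` shows up when a group does not take part in the match. A short sketch, assuming the `pcre` findlib package is linked:

```ocaml
(* extract: whole match at index 0, then the groups. *)
let parts = Pcre.extract ~pat:"(\\d+)-(\\d+)" "range 10-20"

(* ~full_match:false drops the whole match and keeps only the groups. *)
let groups_only =
  Pcre.extract ~full_match:false ~pat:"(\\d+)-(\\d+)" "range 10-20"

(* extract_opt: a group that did not participate is reported as None. *)
let opt = Pcre.extract_opt ~pat:"(a)|(b)" "b"
```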
val extract_all :
?iflags:irflag ->
?flags:rflag list ->
?rex:regexp ->
?pat:string ->
?pos:int ->
?full_match:bool ->
?callout:callout ->
string ->
string array array
extract_all ?iflags ?flags ?rex ?pat ?pos ?full_match ?callout subj
val extract_all_opt :
?iflags:irflag ->
?flags:rflag list ->
?rex:regexp ->
?pat:string ->
?pos:int ->
?full_match:bool ->
?callout:callout ->
string ->
string option array array
extract_all_opt ?iflags ?flags ?rex ?pat ?pos ?full_match ?callout subj
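A sketch of `extract_all`, which returns one string array per match, assuming the `pcre` findlib package is linked. The key=value subject is invented.

```ocaml
(* Each inner array holds the whole match followed by its groups. *)
let pairs = Pcre.extract_all ~pat:"(\\w+)=(\\d+)" "a=1, b=22"

(* Turn the result into an association list of (name, value). *)
let assoc =
  Array.to_list (Array.map (fun m -> (m.(1), int_of_string m.(2))) pairs)
```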
val pmatch :
?iflags:irflag ->
?flags:rflag list ->
?rex:regexp ->
?pat:string ->
?pos:int ->
?callout:callout ->
string ->
bool
pmatch ?iflags ?flags ?rex ?pat ?pos ?callout subj
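`pmatch` only reports whether a match exists, which makes it a natural fit for validation. A sketch, assuming the `pcre` findlib package is linked; `is_hex` is a hypothetical helper.

```ocaml
(* \A and \z anchor to the very start and end of the subject. *)
let hex_rex = Pcre.regexp "\\A[0-9a-fA-F]+\\z"

(* Hypothetical helper: true if s consists only of hex digits. *)
let is_hex s = Pcre.pmatch ~rex:hex_rex s
```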
val subst : string -> substitution
subst str converts the string str representing a substitution pattern to the internal representation
The contents of the substitution string str can be normal text mixed with any of the following (mostly as in PERL):
0-9+" from an immediately following other number.
val replace :
?iflags:irflag ->
?flags:rflag list ->
?rex:regexp ->
?pat:string ->
?pos:int ->
?itempl:substitution ->
?templ:string ->
?callout:callout ->
string ->
string
replace ?iflags ?flags ?rex ?pat ?pos ?itempl ?templ ?callout subj replaces all substrings of subj matching pattern pat when given, regular expression rex otherwise, starting at position pos with the substitution string templ when given, itempl otherwise. Uses flags when given, the precompiled iflags otherwise. Callouts are handled by callout.
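A sketch of `replace` with both template forms, assuming the `pcre` findlib package is linked. `$1` and `$2` in the template refer to the first and second capturing group.

```ocaml
let rex = Pcre.regexp "(\\d+)-(\\d+)"

(* Template given as a string: swap the two numbers in every match. *)
let swapped = Pcre.replace ~rex ~templ:"$2-$1" "10-20 30-40"

(* Template precompiled with subst and passed as ~itempl. *)
let itempl = Pcre.subst "<$1>"
let tagged = Pcre.replace ~rex ~itempl "10-20"
```

Precompiling with `subst` is mainly useful when the same template is applied many times.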
val qreplace :
?iflags:irflag ->
?flags:rflag list ->
?rex:regexp ->
?pat:string ->
?pos:int ->
?templ:string ->
?callout:callout ->
string ->
string
qreplace ?iflags ?flags ?rex ?pat ?pos ?templ ?callout subj replaces all substrings of subj matching pattern pat when given, regular expression rex otherwise, starting at position pos with the string templ. Uses flags when given, the precompiled iflags otherwise. Callouts are handled by callout.
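Unlike `replace`, `qreplace` inserts the template text literally. The sketch below, assuming the `pcre` findlib package is linked, uses a template containing `$1` to make the difference visible.

```ocaml
(* The "$1" is copied verbatim; it is not treated as a group reference. *)
let literal = Pcre.qreplace ~pat:"\\d+" ~templ:"$1" "a1b22"
```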
val substitute_substrings :
?iflags:irflag ->
?flags:rflag list ->
?rex:regexp ->
?pat:string ->
?pos:int ->
?callout:callout ->
subst:(substrings -> string) ->
string ->
string
substitute_substrings ?iflags ?flags ?rex ?pat ?pos ?callout ~subst subj replaces all substrings of subj matching pattern pat when given, regular expression rex otherwise, starting at position pos with the result of function subst applied to the substrings of the match. Uses flags when given, the precompiled iflags otherwise. Callouts are handled by callout.
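Because the substitution function receives the full `substrings` value, it can compute a replacement from individual groups. A sketch, assuming the `pcre` findlib package is linked; `add` is a hypothetical helper.

```ocaml
let rex = Pcre.regexp "(\\d+)\\+(\\d+)"

(* Hypothetical helper: replace "m+n" by the sum of m and n. *)
let add subs =
  let m = int_of_string (Pcre.get_substring subs 1) in
  let n = int_of_string (Pcre.get_substring subs 2) in
  string_of_int (m + n)

let result = Pcre.substitute_substrings ~rex ~subst:add "1+2 and 10+20"
```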
val substitute :
?iflags:irflag ->
?flags:rflag list ->
?rex:regexp ->
?pat:string ->
?pos:int ->
?callout:callout ->
subst:(string -> string) ->
string ->
string
substitute ?iflags ?flags ?rex ?pat ?pos ?callout ~subst subj replaces all substrings of subj matching pattern pat when given, regular expression rex otherwise, starting at position pos with the result of function subst applied to the match. Uses flags when given, the precompiled iflags otherwise. Callouts are handled by callout.
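When only the matched text is needed, `substitute` takes a plain `string -> string` function. A sketch, assuming the `pcre` findlib package is linked:

```ocaml
(* Upper-case every run of lower-case letters; digits are left alone. *)
let shouted =
  Pcre.substitute ~pat:"[a-z]+" ~subst:String.uppercase_ascii "ab1cd"
```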
val replace_first :
?iflags:irflag ->
?flags:rflag list ->
?rex:regexp ->
?pat:string ->
?pos:int ->
?itempl:substitution ->
?templ:string ->
?callout:callout ->
string ->
string
replace_first ?iflags ?flags ?rex ?pat ?pos ?itempl ?templ ?callout subj replaces the first substring of subj matching pattern pat when given, regular expression rex otherwise, starting at position pos with the substitution string templ when given, itempl otherwise. Uses flags when given, the precompiled iflags otherwise. Callouts are handled by callout.
val qreplace_first :
?iflags:irflag ->
?flags:rflag list ->
?rex:regexp ->
?pat:string ->
?pos:int ->
?templ:string ->
?callout:callout ->
string ->
string
qreplace_first ?iflags ?flags ?rex ?pat ?pos ?templ ?callout subj replaces the first substring of subj matching pattern pat when given, regular expression rex otherwise, starting at position pos with the string templ. Uses flags when given, the precompiled iflags otherwise. Callouts are handled by callout.
val substitute_substrings_first :
?iflags:irflag ->
?flags:rflag list ->
?rex:regexp ->
?pat:string ->
?pos:int ->
?callout:callout ->
subst:(substrings -> string) ->
string ->
string
substitute_substrings_first ?iflags ?flags ?rex ?pat ?pos ?callout ~subst subj replaces the first substring of subj matching pattern pat when given, regular expression rex otherwise, starting at position pos with the result of function subst applied to the substrings of the match. Uses flags when given, the precompiled iflags otherwise. Callouts are handled by callout.
val substitute_first :
?iflags:irflag ->
?flags:rflag list ->
?rex:regexp ->
?pat:string ->
?pos:int ->
?callout:callout ->
subst:(string -> string) ->
string ->
string
substitute_first ?iflags ?flags ?rex ?pat ?pos ?callout ~subst subj replaces the first substring of subj matching pattern pat when given, regular expression rex otherwise, starting at position pos with the result of function subst applied to the match. Uses flags when given, the precompiled iflags otherwise. Callouts are handled by callout.
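The `_first` variants behave like their counterparts but stop after the first match. A brief sketch, assuming the `pcre` findlib package is linked:

```ocaml
(* Only the first "o" is replaced. *)
let once = Pcre.replace_first ~pat:"o" ~templ:"0" "foo boo"

(* Only the first word is capitalized. *)
let capitalized =
  Pcre.substitute_first ~pat:"[a-z]+" ~subst:String.capitalize_ascii
    "hello world"
```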
val split :
?iflags:irflag ->
?flags:rflag list ->
?rex:regexp ->
?pat:string ->
?pos:int ->
?max:int ->
?callout:callout ->
string ->
string list
split ?iflags ?flags ?rex ?pat ?pos ?max ?callout subj splits subj into a list of at most max strings, using as delimiter pattern pat when given, regular expression rex otherwise, starting at position pos. Uses flags when given, the precompiled iflags otherwise. If max is zero, trailing empty fields are stripped. If it is negative, it is treated as arbitrarily large. If neither pat nor rex is specified, leading whitespace will be stripped! Should behave exactly as in PERL. Callouts are handled by callout.
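A sketch of the three most common ways to call `split`, assuming the `pcre` findlib package is linked. The subjects are invented.

```ocaml
(* Split on commas, swallowing surrounding whitespace. *)
let fields = Pcre.split ~pat:"\\s*,\\s*" "a , b,c"

(* ~max bounds the number of fields; the rest stays in the last one. *)
let two = Pcre.split ~pat:"\\s*,\\s*" ~max:2 "a , b,c"

(* Neither pat nor rex: split on whitespace, leading whitespace stripped. *)
let words = Pcre.split "  foo bar"
```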
val asplit :
?iflags:irflag ->
?flags:rflag list ->
?rex:regexp ->
?pat:string ->
?pos:int ->
?max:int ->
?callout:callout ->
string ->
string array
asplit ?iflags ?flags ?rex ?pat ?pos ?max ?callout subj is the same as Pcre.split but returns an array instead of a list.
Result of a Pcre.full_split
val full_split :
?iflags:irflag ->
?flags:rflag list ->
?rex:regexp ->
?pat:string ->
?pos:int ->
?max:int ->
?callout:callout ->
string ->
split_result list
full_split ?iflags ?flags ?rex ?pat ?pos ?max ?callout subj splits subj into a list of at most max elements of type "split_result", using as delimiter pattern pat when given, regular expression rex otherwise, starting at position pos. Uses flags when given, the precompiled iflags otherwise. If max is zero, trailing empty fields are stripped. If it is negative, it is treated as arbitrarily large. Should behave exactly as in PERL. Callouts are handled by callout.
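Unlike `split`, `full_split` keeps the delimiters in its result. The sketch below assumes the `pcre` findlib package is linked and that `split_result` has the constructors `Text` and `Delim` (plus `Group` and `NoGroup` for capturing groups), which are not listed in this section.

```ocaml
(* Text and delimiters alternate in the result. *)
let pieces = Pcre.full_split ~pat:"," "a,b"

(* Keep only the text fields, dropping delimiters and group entries. *)
let texts =
  List.filter_map (function Pcre.Text s -> Some s | _ -> None) pieces
```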
foreach_line ?ic f applies f to each line in inchannel ic until the end-of-file is reached.
foreach_file filenames f opens each file in the list filenames for input and applies f to each filename and the corresponding channel. Channels are closed after each operation (even when exceptions occur - they get reraised afterwards!).
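The two helpers combine into a small grep-like counter. This sketch assumes the `pcre` findlib package is linked and that the signatures are `foreach_line : ?ic:in_channel -> (string -> unit) -> unit` and `foreach_file : string list -> (string -> in_channel -> unit) -> unit`, which this section does not show; `count_matching` is a hypothetical helper.

```ocaml
(* Hypothetical helper: count the lines of the given files that match rex. *)
let count_matching rex filenames =
  let n = ref 0 in
  Pcre.foreach_file filenames (fun _name ic ->
    (* foreach_line reads ic line by line until end-of-file. *)
    Pcre.foreach_line ~ic (fun line ->
      if Pcre.pmatch ~rex line then incr n));
  !n
```

Since `foreach_file` closes each channel itself, the callback does not need to clean up.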
val unsafe_pcre_exec :
irflag ->
regexp ->
pos:int ->
subj_start:int ->
subj:string ->
int array ->
callout option ->
unit
unsafe_pcre_exec flags rex ~pos ~subj_start ~subj offset_vector callout. You should read the C-source to know what happens. If you do not understand it - don't use this function!
val make_ovector : regexp -> int * int array
make_ovector regexp calculates the tuple (subgroups2, ovector) which is the number of subgroup offsets and the offset array.
val unsafe_pcre_dfa_exec :
irflag ->
regexp ->
pos:int ->
subj_start:int ->
subj:string ->
int array ->
callout option ->
workspace:int array ->
unit
unsafe_pcre_dfa_exec flags rex ~pos ~subj_start ~subj offset_vector callout ~workspace. You should read the C-source to know what happens. If you do not understand it - don't use this function!