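Find the first match of "pattern" in "str", starting at position "index". Lua patterns are similar to regular expressions but simpler; for example, they have no alternation operator.
"index" is optional: it defaults to 1, and a negative value counts back from the end of "str".
If found, returns the start and end positions of the match, followed by any captures as additional results.
If not found, returns nil.
If "plain" is true, "pattern" is searched for as plain text, with no characters treated as magic. To supply "plain" you must also supply "index".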
Also see string.match, which works in a similar way but returns the captures (or the whole matched text, if there are no captures) instead of the start and end positions.
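The calls below show the optional arguments. They assume Lua 5.1 or later. The results are written in the same "-->" style as the examples further down; print actually separates multiple results with tabs.

```lua
local subject = "the quick brown fox"

print (string.find (subject, "o"))           --> 13 13  (index defaults to 1)
print (string.find (subject, "o", 14))       --> 18 18  (search starts at position 14)
print (string.find (subject, "o", -5))       --> 18 18  (-5 means position 15 of 19)

-- With plain = true the dot is literal; otherwise it matches any character
print (string.find ("a.b", ".", 1, true))    --> 2 2
print (string.find ("a.b", "."))             --> 1 1

-- string.match returns the matched text or the captures, not positions
print (string.match (subject, "q%a+"))       --> quick
print (string.match (subject, "(%a+) (%a+)")) --> the quick
```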
Patterns
The standard patterns you can search for are:
. --- (a dot) represents all characters.
%a --- all letters.
%c --- all control characters.
%d --- all digits.
%l --- all lowercase letters.
%p --- all punctuation characters.
%s --- all space characters.
%u --- all uppercase letters.
%w --- all alphanumeric characters.
%x --- all hexadecimal digits.
%z --- the character with representation 0 (Lua 5.1; deprecated in later versions, where a pattern may contain \0 directly).
%% --- a single '%' character.
%1 --- matches the same text as capture 1.
%2 --- matches the same text as capture 2 (and so on, up to %9).
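%f[s] --- frontier: matches the empty string at a transition from a character not in set 's' to a character in set 's'.
%b() --- a balanced pair ( ... ), including anything nested inside it.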
Important: the uppercase versions of the character classes above (%a, %c, %d, %l, %p, %s, %u, %w, %x and %z) represent the complement of the class. e.g. %U represents everything except uppercase letters, and %D represents everything except digits.
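Here are a few character classes and their complements, plus the frontier pattern used to find "the" only as a whole word. The examples assume the default C locale, because classes such as %a, %l and %u follow the current locale.

```lua
print (string.find ("abc123", "%d+"))                 --> 4 6   (digits)
print (string.find ("abc123", "%D+"))                 --> 1 3   (non-digits)
print (string.find ("Hello World", "%u%l+", 2))       --> 7 11  (capitalised word)

-- A plain search finds "the" inside "other"...
print (string.find ("other the", "the"))              --> 2 4
-- ...but frontiers require a non-letter before and after it
print (string.find ("other the", "%f[%a]the%f[%A]"))  --> 7 9
```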
There are some "magic characters" (such as %) that have special meanings. These are:
^ $ ( ) % . [ ] * + - ?
To match one of these characters literally in a pattern, you must precede it with a % symbol.
e.g. %% matches a single %, and %. matches a literal dot.
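The sketch below shows escaping by hand, followed by a small helper that escapes every magic character in a piece of text so it can be embedded in a larger pattern. `escape_magic` is a hypothetical name used only for illustration; it is not part of the string library. For a search that is entirely plain text, the "plain" argument does the same job.

```lua
print (string.find ("a.b", "%."))          --> 2 2
print (string.find ("50% off", "%d+%%"))   --> 1 3

-- Hypothetical helper: put a % in front of each of ^ $ ( ) % . [ ] * + - ?
local function escape_magic (s)
  -- "%%%0" in the replacement means: a literal %, then the whole match
  return (string.gsub (s, "[%^%$%(%)%%%.%[%]%*%+%-%?]", "%%%0"))
end

print (escape_magic ("1+1=2"))                               --> 1%+1=2
print (string.find ("cost: 1+1=2", escape_magic ("1+1=2")))  --> 7 11
```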
You can build your own character classes (sets) by using square brackets, e.g.:
[abc] ---> matches a, b or c
[a-z] ---> matches lowercase letters (same as %l in the default C locale)
[^abc] ---> matches anything except a, b or c
[%a%d] ---> matches all letters and digits
[%a%d_] ---> matches all letters, digits and underscore
[%[%]] ---> matches square brackets (here escaped with %)
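The following calls exercise the square-bracket sets listed above, including a range set used for a hexadecimal literal. The strings are arbitrary examples.

```lua
print (string.find ("aabbcd", "[^abc]"))                 --> 6 6   (first char not a, b or c)
print (string.find ("var_name = 42", "[%a_][%w_]*"))     --> 1 8   (an identifier)
print (string.find ("x = [1]", "[%[%]]"))                --> 5 5   (first square bracket)
print (string.find ("Colour: 0xFF", "0x[0-9a-fA-F]+"))   --> 9 12  (ranges inside a set)
```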
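The repetition characters are listed below. Each one applies only to the single character class immediately before it, not to a capture or a longer sequence: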
+ ---> 1 or more repetitions (greedy)
* ---> 0 or more repetitions (greedy)
- ---> 0 or more repetitions (non-greedy: matches as few as possible)
? ---> 0 or 1 repetition only
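The repetition characters above behave as follows. Greedy * takes the longest match and non-greedy - takes the shortest. ? makes one character optional. * can also succeed on an empty match, in which case the end position is one less than the start.

```lua
print (string.find ("<a><b>", "<.*>"))             --> 1 6   (greedy: "<a><b>")
print (string.find ("<a><b>", "<.->"))             --> 1 3   (non-greedy: "<a>")
print (string.find ("color colour", "colou?r"))    --> 1 5
print (string.find ("color colour", "colou?r", 6)) --> 7 12
print (string.find ("abc", "%d*"))                 --> 1 0   (empty match at position 1)
```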
The standard "anchor" characters apply:
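^ ---> anchor to start of subject string (only when it is the first character of the pattern)
$ ---> anchor to end of subject string (only when it is the last character of the pattern)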
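The calls below show the two anchors. With string.find, a leading ^ anchors the match at the starting position "index" rather than always at position 1, as the third call shows (standard Lua behaviour, 5.1 and later).

```lua
print (string.find ("hello world", "^world"))      --> nil
print (string.find ("hello world", "world$"))      --> 7 11
print (string.find ("hello world", "^world", 7))   --> 7 11  (anchored at index 7)
print (string.find ("hello world", "^%a+$"))       --> nil   (the space is not a letter)
```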
You can also use round brackets to specify "captures":
You see (.*) here
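Here, whatever matches (.*) becomes the first capture. Captures are numbered in the order of their opening brackets, and string.find returns them after the start and end positions.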
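Both calls below return more than one capture. The second uses nested brackets, so the captures are numbered by their opening bracket: the outer capture comes first. The sample strings are arbitrary. The hyphen is escaped as %- because - is a magic character.

```lua
-- Two captures, returned after the start and end positions
print (string.find ("key = value", "(%w+)%s*=%s*(%w+)"))  --> 1 11 key value

-- Nested captures: 1 is the outer one, then 2 and 3 inside it
print (string.find ("2024-06", "((%d+)%-(%d+))"))         --> 1 7 2024-06 2024 06
```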
You can also refer to a capture later on in the same pattern, using %1, %2 and so on:
print (string.find ("You see dogs and dogs", "You see (.*) and %1")) --> 1 21 dogs
print (string.find ("You see dogs and cats", "You see (.*) and %1")) --> nil
This example shows how you can look for a repetition of a word matched earlier, whatever that word was ("dogs" in this case).
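Back-references work the same way for finding a quoted string. Because the closing quote must be the same character as the opening one, either quote style is matched correctly. This is a sketch; the pattern does not handle escaped quotes inside the string.

```lua
local quoted = [[say "hi" or 'yo']]

-- Capture 1 is the opening quote, capture 2 the text, %1 the matching close
print (string.find (quoted, "([\"'])(.-)%1"))     --> 5 8 " hi
print (string.find (quoted, "([\"'])(.-)%1", 9))  --> 13 16 ' yo
```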
As a special case, an empty capture () captures the current position in the subject string (as a number) rather than a substring. e.g.
print (string.find ("You see dogs and cats", "You .* ()dogs .*")) --> 1 21 9
The third result, 9, is the position where the word "dogs" starts.
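Position captures are handy for splitting text. In the sketch below, the variable names are illustrative, and the "key=value" layout is only an assumed example format.

```lua
-- Positions before and after "quick"
print (string.find ("the quick brown fox", "()quick()"))   --> 5 9 5 10

-- Use the position of "=" to split a key=value pair
local pair = "key=value"
local _, _, eq = string.find (pair, "()=")
print (string.sub (pair, 1, eq - 1), string.sub (pair, eq + 1))  --> key value
```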
Finally you can look for nested "balanced" things (such as parentheses) by using %b, like this:
print (string.find ("I see a (big fish (swimming) in the pond) here",
"%b()")) --> 9 41
After %b you put two characters, which indicate the start and end of the balanced pair. If it finds a nested pair it keeps counting, and the match ends only when the nesting level returns to zero. In this case the matching string was "(big fish (swimming) in the pond)".
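%b also works with other delimiter pairs. The calls below use braces and show how string.sub can extract the matched text. The last call has an unbalanced "(" at the start. It cannot close, so the search moves on and finds the first pair that does balance.

```lua
local text = "x = {a, {b}, c} y"
local first, last = string.find (text, "%b{}")
print (first, last)                        --> 5 15
print (string.sub (text, first, last))     --> {a, {b}, c}

print (string.find ("(a(b)", "%b()"))      --> 3 5
```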
Examples of string.find:
print (string.find ("the quick brown fox", "quick")) --> 5 9
print (string.find ("the quick brown fox", "(%a+)")) --> 1 3 the
print (string.find ("the quick brown fox", "(%a+)", 10)) --> 11 15 brown
print (string.find ("the quick brown fox", "fruit")) --> nil
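Building on the "index" argument in the examples above, the loop below collects every match by restarting each search just after the previous match ends. It is a minimal sketch; the standard string.gmatch function offers a built-in way to iterate over matches. If the pattern can match an empty string, last + 1 would not move the search forward, so such a loop would need to advance by at least one position.

```lua
local subject = "the quick brown fox"
local words = {}
local init = 1

while true do
  local first, last, word = string.find (subject, "(%a+)", init)
  if not first then break end          -- no more matches
  words [#words + 1] = word
  init = last + 1                      -- safe: %a+ never matches an empty string
end

print (table.concat (words, ","))      --> the,quick,brown,fox
```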