Notice: Any messages purporting to come from this site telling you that your password has expired, or that you need to verify your details, confirm your email, resolve issues, making threats, or asking for money, are
spam. We do not email users with any such messages. If you have lost your password you can obtain a new one by using the
password reset link.
Due to spam on this forum, all posts now need moderator approval.
Entire forum
➜ Programming
➜ General
➜ lpeg functions
It is now over 60 days since the last post. This thread is closed.
Refresh page
Pages: 1 2
Posted by
| Albert Chan
(55 posts) Bio
|
Date
| Mon 29 Jan 2018 02:04 AM (UTC) |
Message
| lpeg manual mentioned P(true) and P(false). why are these needed ?
x true = x
true x = x
so P(true) will always be optimized away.
At first, I thought P(false) is to retain captures, then switched to alternative.
(lpeg codes A with captures ...) * P(false) + (lpeg codes B ...)
But it does not. All the captures are gone.
Above just optimized to (lpeb codes B ...)
The only reason lpeg does not do this optimization is due to its
match-time-capture function.
I cannot think of 1 case where P(boolean) is needed.
May that were the reason lpeg re doesn't have them as variables. | Top |
|
Posted by
| Nick Gammon
Australia (23,158 posts) Bio
Forum Administrator |
Date
| Reply #1 on Mon 29 Jan 2018 03:56 AM (UTC) Amended on Mon 29 Jan 2018 03:57 AM (UTC) by Nick Gammon
|
Message
| The only way I can explain that is from the LPEG documentation:
Quote:
patt1 * patt2
Returns a pattern that matches patt1 and then matches patt2, starting where patt1 finished. The identity element for this operation is the pattern lpeg.P(true), which always succeeds.
Note the reference to "the identity element for this operation". With that as a clue, and looking up identity elements, eg.
Algebraic patterns — Identity element
It appears that, useful or not, lpeg.P(true) and lpeg.P(false) are useful in situation where you may (sometimes) want to make something that always succeeds. Perhaps you are writing a function to combine THIS and THAT, and sometimes THAT needs to always match. Being able to return lpeg.P (true) gives you a useful pattern that you can use in that case. |
- Nick Gammon
www.gammon.com.au, www.mushclient.com | Top |
|
Posted by
| Nick Gammon
Australia (23,158 posts) Bio
Forum Administrator |
Date
| Reply #2 on Mon 29 Jan 2018 04:11 AM (UTC) |
Message
| As an analogy, say we had this Lua code:
Now you might say that multiplying by one (the identity operation) is not particularly useful. However what about:
Where 'foo' is a function that returns a number, and sometimes that number is 1.
Now it is useful for the writer of the function 'foo' to have a way of expressing "identity operation" (ie. do nothing, or "always succeed").
Conversely you might it want to fail in which case you return zero. And the equivalent to that in LPEG would be lpeg.P (false). |
- Nick Gammon
www.gammon.com.au, www.mushclient.com | Top |
|
Posted by
| Nick Gammon
Australia (23,158 posts) Bio
Forum Administrator |
Date
| Reply #3 on Mon 29 Jan 2018 04:22 AM (UTC) |
Message
| Maybe this is a good example, I'm not sure.
I want to match "foo" or "bar" or nothing at all followed by "nick". So:
lpeg.match (C ((P"foo" + P"bar" + P(true))) * "nick", "foonick")
That matches foo OR bar OR (true) and then checks for nick.
Results:
foonick --> captures "foo"
barnick --> captures "bar"
nick --> captures ""
others --> fail
|
- Nick Gammon
www.gammon.com.au, www.mushclient.com | Top |
|
Posted by
| Albert Chan
(55 posts) Bio
|
Date
| Reply #4 on Mon 29 Jan 2018 10:48 AM (UTC) Amended on Mon 05 Feb 2018 10:46 PM (UTC) by Albert Chan
|
Message
| since above pattern, literally translated to lpeg re:
lpeg.re do have lpeg.P(true) as variable ... it is ''
P(true) will be optimized away anyway, so above is same as
To push my understanding a bit furthur ...
expr ^ 0 == P { expr * V(1) + P(true) }
However, with current lpeg implementation, RHS is less efficient
Tried push even furthur to represent P(false) as lpeg re !''
The pcode are different, but they are equivalent, since not true == false | Top |
|
Posted by
| Albert Chan
(55 posts) Bio
|
Date
| Reply #5 on Mon 29 Jan 2018 12:22 PM (UTC) Amended on Mon 29 Jan 2018 01:08 PM (UTC) by Albert Chan
|
Message
| Just wanted to share how to use lpeg.P(function)
this was one of my lpeg re pattern for regex "(.*)and(.*)", using =>
pat = re.compile(
"(g <- 'and' / .g)+ => split",
{split = function(s, i) return s:sub(1, i-4), s:sub(i) end}
)
Redo above using lpeg.P(function). Just search and replace "=> " to "%"
pat = re.compile(
"(g <- 'and' / .g)+ %split",
{split = function(s, i) return s:sub(1, i-4), s:sub(i) end}
)
The split function will implicitly cast as lpeg.P object. Although unnecessary, we can rewrite split explicitly
{split = lpeg.P( function(s, i) return s:sub(1, i-4), s:sub(i) end )}
| Top |
|
Posted by
| Albert Chan
(55 posts) Bio
|
Date
| Reply #6 on Tue 30 Jan 2018 01:28 PM (UTC) Amended on Wed 31 Jan 2018 03:01 AM (UTC) by Albert Chan
|
Message
| Found a real use for P(true) in re.lua
local function mult (p, n)
local np = mm.P(true)
while n >= 1 do
if n%2 >= 1 then np = np * p end
p = p * p
n = n/2
end
return np
end
So, mult(p, 0) return P(true), which will optimized away
function name mult is misleading, it should be named re_pow
P.S.
I recently patched lpeg.B so it can move back from current position (upto 255 bytes)
%b = lpeg.B(-1) = move back 1 byte
To optimize %b ^ n to lpeg.B(-n), all I need is 1 line above P(true)
if p == Predef.b then return mm.B(-n) end
| Top |
|
Posted by
| Albert Chan
(55 posts) Bio
|
Date
| Reply #7 on Tue 30 Jan 2018 03:00 PM (UTC) |
Message
| snippet from re.lua:
local m = require"lpeg"
-- 'm' will be used to parse expressions, and 'mm' will be used to
-- create expressions; that is, 're' runs on 'm', creating patterns
-- on 'mm'
local mm = m
why the need for both m and mm ? | Top |
|
Posted by
| Albert Chan
(55 posts) Bio
|
Date
| Reply #8 on Tue 30 Jan 2018 06:52 PM (UTC) |
Message
| Tried to understand meaning of lpeg.B(-patt) (hint, it is not -lpeg.B(patt))
... it is lpeg.P(-patt) !
lpeg.B(-patt) does nothing, except to make its argument a lpeg object
Worse, this is just based on my informal tests, it might do something nasty !
The best I can say, lpeg.B(-patt) is undefined | Top |
|
Posted by
| Nick Gammon
Australia (23,158 posts) Bio
Forum Administrator |
Date
| Reply #9 on Tue 30 Jan 2018 10:15 PM (UTC) |
Message
| Not sure. Both of these match:
print (lpeg.match (P(3) * B(-P('foo')) * 'bar', 'foobar')) --> 7
print (lpeg.match (P(3) * B( P('foo')) * 'bar', 'foobar')) --> 7
|
- Nick Gammon
www.gammon.com.au, www.mushclient.com | Top |
|
Posted by
| Nick Gammon
Australia (23,158 posts) Bio
Forum Administrator |
Date
| Reply #10 on Tue 30 Jan 2018 10:18 PM (UTC) |
Message
|
Albert Chan said:
why the need for both m and mm ?
Not sure, unless it is for documentation. |
- Nick Gammon
www.gammon.com.au, www.mushclient.com | Top |
|
Posted by
| Albert Chan
(55 posts) Bio
|
Date
| Reply #11 on Tue 30 Jan 2018 11:31 PM (UTC) Amended on Tue 30 Jan 2018 11:58 PM (UTC) by Albert Chan
|
Message
| both of your examples confirm my finding: B(-patt) = P(-patt), which is silly.
foo, bar, foobar, = P'foo', P'bar', P'foobar'
-- B match back characters: (foo == "foo")
= (3 * B(foo) * bar):match "foobar" => 7
-- B is actually P, matching: (foo ~= "bar")
= (3 * P(-foo) * bar):match "foobar" => 7
= (3 * B(-foo) * bar):match "foobar" => 7
both return the same position, but doing different things ... try these:
= (3 * B( foobar) * bar):match "foobar" => nil
= (3 * B(-foobar) * bar):match "foobar" => 7
-> these are undefined behavior, can change in the future
-> NEVER use negation argument for lpeg.B | Top |
|
Posted by
| Albert Chan
(55 posts) Bio
|
Date
| Reply #12 on Wed 31 Jan 2018 12:10 AM (UTC) Amended on Wed 31 Jan 2018 01:50 AM (UTC) by Albert Chan
|
Message
| find a bug in re.lua
pat = re.compile "[^a]^2" -- ok
pat = re.compile "[a]^2" -- bad
.\re.lua: attempt to perform arithmetic on local 'p' (a string value)
| Top |
|
Posted by
| Albert Chan
(55 posts) Bio
|
Date
| Reply #13 on Wed 31 Jan 2018 02:15 AM (UTC) Amended on Wed 31 Jan 2018 11:31 PM (UTC) by Albert Chan
|
Message
| Above class range bug turns out to be an example of using P(false) !
There are many ways to fix the bug:
1. P(false) way
make sure more than 1 items -> P(false) + item = P(item)
-> so add a m.Cc(false) * in front of item
-> this guaranteed folding run at least once.
2. complement (likely more efficient)
since complement function always run, return c == '^' and any - p or m.P(p)
-> this also guaranteed class range is lpeg object
3. patch mult(p, n) (somewhat patchy ...)
all other suffixes ^+n ^-n ? * + use lpeg __pow function, which convert p to lpeg object implicitly.
-> add 1 line in mult(p,n) : p = m.P(p) | Top |
|
Posted by
| Nick Gammon
Australia (23,158 posts) Bio
Forum Administrator |
Date
| Reply #14 on Wed 31 Jan 2018 04:03 AM (UTC) |
Message
|
Albert Chan said:
The best I can say, lpeg.B(-patt) is undefined
I think I can explain what is happening, possibly it's a bug.
Try this:
print (lpeg.match (P(3) * B( -P('foo')) * 3, 'foobar')) --> 7
print (lpeg.match (P(3) * B( -P('foo')) * 3, 'foofoo')) --> nil
In other words, the B(-P'foo') contains an assertion (not a match) (B(P'foo') would be a match) and since it doesn't consume any characters the lpeg.B does not move backwards any characters.
If you print the tree for:
You get:
seq
any
seq
any
seq
any
seq
behind 3
seq
char 'f'
seq
char 'o'
char 'o'
seq
any
seq
any
any
You will notice that the lpeg.B generates "behind 3" because it knows it has to go back the length of 'foo' (and then try matching).
However this:
Generates:
seq
any
seq
any
seq
any
seq
behind 0
not
seq
char 'f'
seq
char 'o'
char 'o'
seq
any
seq
any
any
Now it generates "behind 0" (so it is matching the current position, not 3 back) and checking if that equals 'foo' (and then negating the result).
I'm not sure if this is a bug or not. You could argue that an assertion like -P'foo' should not backtrack because it doesn't move the position of the match pointer.
Judging by the generated opcodes, "behind 3" just moves the current pointer backwards, and as the documentation says:
Quote:
Pattern patt must match only strings with some fixed length ...
That means that, if it matches, then the pointer is guaranteed to be back where it started (by consuming input). And since assertions don't consume input, the pointer is not moved backwards. |
- Nick Gammon
www.gammon.com.au, www.mushclient.com | Top |
|
The dates and times for posts above are shown in Universal Co-ordinated Time (UTC).
To show them in your local time you can join the forum, and then set the 'time correction' field in your profile to the number of hours difference between your location and UTC time.
71,925 views.
This is page 1, subject is 2 pages long: 1 2
It is now over 60 days since the last post. This thread is closed.
Refresh page
top