SmaugFuss IMPORTANT Test
It is now over 60 days since the last post. This thread is closed.
| Posted by
| David Haley
USA (3,881 posts) Bio
|
| Date
| Reply #15 on Tue 23 Jan 2007 11:29 PM (UTC) Amended on Tue 23 Jan 2007 11:30 PM (UTC) by David Haley
|
| Message
| The descriptor isn't becoming -1. The select call is returning a value smaller than 0 indicating that there was an error.
Without looking too deeply at the network code -- I don't have time now unfortunately -- I suspect that the issue has to do with improper cleanup of descriptors that were closed unexpectedly, which is what is giving you those messages ("EOF encountered on read").
Then, since the sockets are no longer in an acceptable state, when select comes around and accept_new adds all descriptors to the FD sets, select emits the error because some of the FDs given to it were invalid. |
David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone
http://david.the-haleys.org | | Top |
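The failure mode described above can be reproduced in isolation. This is a minimal sketch, not SMAUG code: the function name `select_errno_with_stale_fd` is hypothetical, and a pipe stands in for a player socket. It closes a descriptor, leaves it in the read set anyway, and reports the errno that select sets, which on POSIX systems should be EBADF.

```c
#include <errno.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: returns the errno set by select() when a closed
 * descriptor is left in the read set, 0 if select() unexpectedly
 * succeeded, or -1 if the pipe could not be created. */
int select_errno_with_stale_fd(void)
{
    int fds[2];
    fd_set in_set;
    struct timeval null_time = { 0, 0 };   /* poll, do not block */
    int rc, err;

    if (pipe(fds) < 0)
        return -1;

    close(fds[0]);                 /* descriptor is now invalid ... */

    FD_ZERO(&in_set);
    FD_SET(fds[0], &in_set);       /* ... but is still handed to select */

    rc = select(fds[0] + 1, &in_set, NULL, NULL, &null_time);
    err = (rc < 0) ? errno : 0;    /* save errno before further calls */

    close(fds[1]);
    return err;
}
```

If the return value is EBADF, the error is coming from a stale descriptor in the set rather than from the number of connections as such.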
|
| Posted by
| Matteo2303
(16 posts) Bio
|
| Date
| Reply #16 on Tue 23 Jan 2007 11:35 PM (UTC) |
| Message
| >The select call is returning a value smaller than 0
Yes, sorry, not the descriptor. I know that the value is "-1" because I used printf( "%d\n\r", etc.. ) to see it.
In any case I think that this problem isn't only mine.
Thanks for your help. In the next few days I'll investigate what you said here:
> select emits the error because some of the FDs given to it were invalid.
Bye! | | Top |
|
| Posted by
| David Haley
USA (3,881 posts) Bio
|
| Date
| Reply #17 on Tue 23 Jan 2007 11:45 PM (UTC) |
| Message
| | No, I agree that it doesn't appear to be a problem just with you. The SMAUG network code is pretty shaky -- networking stuff is hard to get right. (It's kind of like the mudprog discussion we've been having here.) So like I said, it doesn't really surprise me that it would perform poorly (i.e. die) under stress. |
David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone
http://david.the-haleys.org | | Top |
|
| Posted by
| Nick Gammon
Australia (23,173 posts) Bio
Forum Administrator |
| Date
| Reply #18 on Wed 24 Jan 2007 06:00 AM (UTC) |
| Message
|
Quote:
    if ( select( maxdesc+1, &in_set, &out_set, &exc_set, &null_time ) < 0 )
    {
        perror( "accept_new: select: poll" );
        exit( 1 );
    }
...but how I can prevent this?
You could simply omit the exit line, and just ignore the error if select is overloaded by multiple frequent connections. This would stop the MUD from exiting, and the error may only happen occasionally. |
- Nick Gammon
www.gammon.com.au, www.mushclient.com | | Top |
|
| Posted by
| David Haley
USA (3,881 posts) Bio
|
| Date
| Reply #19 on Wed 24 Jan 2007 07:42 AM (UTC) |
| Message
| I suspect that might cause trouble. The problem is that one of the descriptors has been invalidated and is kept around in memory. If it gets left there, it might cause all kinds of trouble further down the road.
Still, it's worth a shot, I suppose... sometimes it's good to live dangerously. :-) |
David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone
http://david.the-haleys.org | | Top |
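Whether it is safe to carry on after a failed select depends on why it failed. The sketch below is an illustration only: the enum and the function `classify_select_error` are hypothetical names, not part of SMAUG. It separates an interrupted call (EINTR, which is transient and can simply be retried on the next pass) from an invalid descriptor in one of the sets (EBADF, which will keep failing until the descriptor is removed).

```c
#include <errno.h>

/* Hypothetical classification of what to do after select() returns < 0. */
enum select_action {
    SELECT_RETRY,   /* transient: try again on the next pass      */
    SELECT_PRUNE,   /* a descriptor in the sets is invalid        */
    SELECT_FATAL    /* anything else: treat as unrecoverable      */
};

/* Map the errno value left by a failed select() to an action. */
enum select_action classify_select_error(int err)
{
    switch (err) {
    case EINTR:
        return SELECT_RETRY;
    case EBADF:
        return SELECT_PRUNE;
    default:
        return SELECT_FATAL;
    }
}
```

Under this scheme, dropping the exit call outright would amount to treating every error as SELECT_RETRY, which is the case that leaves a bad descriptor in memory.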
|
| Posted by
| Matteo2303
(16 posts) Bio
|
| Date
| Reply #20 on Wed 24 Jan 2007 07:55 AM (UTC) |
| Message
| >You could simply omit the exit line, and just ignore problems
No, changing it to "//exit(1);" is a bad thing!
All existing descriptors reset/restart; at first it appears OK, but it is a panic situation. Believe me, I have tried :( | | Top |
|
| Posted by
| Samson
USA (683 posts) Bio
|
| Date
| Reply #21 on Wed 24 Jan 2007 09:25 PM (UTC) |
| Message
| | What if instead of exit(1) you tried changing it to a continue statement? That in theory would skip it on that pass and come back for it later. Maybe the OS just needs a bit of time to clean up the mess. | | Top |
|
| Posted by
| Matteo2303
(16 posts) Bio
|
| Date
| Reply #22 on Wed 24 Jan 2007 09:44 PM (UTC) |
| Message
| >What if instead of exit(1) you tried changing it to a continue statement?
Mmm, "continue"? Where?
The "select" call isn't inside a loop. | | Top |
|
| Posted by
| Matteo2303
(16 posts) Bio
|
| Date
| Reply #23 on Wed 24 Jan 2007 09:59 PM (UTC) |
| Message
| OK, I tested using a goto (lol) to reproduce a continue effect. This freezes the MUD. Also, if I use a "return" instead of "exit(1)" when the select call returns < 0, then all descriptors reset (players disconnected, etc...). This is not the right way.
Bye | | Top |
|
| Posted by
| Samson
USA (683 posts) Bio
|
| Date
| Reply #24 on Thu 25 Jan 2007 02:35 AM (UTC) |
| Message
| Right, ok. So I didn't notice this was happening in accept_new before. Scratch the continue idea.
With the situation you have you're probably out of luck without some major work on the networking code. | | Top |
|
| Posted by
| Gohan_TheDragonball
USA (183 posts) Bio
|
| Date
| Reply #25 on Thu 25 Jan 2007 06:50 AM (UTC) |
| Message
| | I am not as advanced as some of you, but I wanted to put this out there: instead of exiting, couldn't you just deny the connection and let the connecting client know to wait a minute and retry? | | Top |
|
| Posted by
| David Haley
USA (3,881 posts) Bio
|
| Date
| Reply #26 on Thu 25 Jan 2007 09:18 AM (UTC) |
| Message
| Unfortunately no; what's going on here is that somehow SMAUG's internal picture of the network state has gotten messed up and it thinks some connections are open that really aren't. Then it tries to poll on those connections and fails, which is what is causing select to generate an error.
What might be possible would be to loop over all known descriptors and test each one with select individually until you find the one that returns an error, and then completely dump it. You would only do this after select returned an error, of course. This is not an ideal solution -- the real fix would be to not get into this situation in the first place -- but it's better than nothing. |
David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone
http://david.the-haleys.org | | Top |
|
| Posted by
| Matteo2303
(16 posts) Bio
|
| Date
| Reply #27 on Thu 25 Jan 2007 02:17 PM (UTC) |
| Message
| >With the situation you have you're probably out of luck without some major work on the networking code.
Not "I have", but "we have". SmaugFuss under Cygwin on Windows crashes in this situation. A DoS attack causes a crash within a minute. | | Top |
|
| Posted by
| Nick Gammon
Australia (23,173 posts) Bio
Forum Administrator |
| Date
| Reply #28 on Fri 26 Jan 2007 01:37 AM (UTC) |
| Message
| Does this crash only happen under Cygwin? It wouldn't totally surprise me if it were only unstable under Windows, and most people running production MUDs would be using Unix.
I once wrote a similar piece of code, to test multiple simultaneous connections to PennMUSH. It worked fine under Unix, but degraded rapidly under Windows. And that was using identical code.
I suspect this is an instability in the Windows Cygwin implementation of sockets, or even the underlying operating system support. |
- Nick Gammon
www.gammon.com.au, www.mushclient.com | | Top |
|
| Posted by
| Samson
USA (683 posts) Bio
|
| Date
| Reply #29 on Fri 26 Jan 2007 02:43 AM (UTC) |
| Message
| I would tend to agree with Nick's assessment since I've stress tested my own code with someone deliberately attacking it with a miniature bot-net. The worst that happened was that it simply stopped accepting new connections until some of the existing ones dropped off. Tested on Linux of course.
I don't really see much value in protecting against this in Windows for the very reason that Cygwin isn't suited for a production environment. | | Top |
|
The dates and times for posts above are shown in Universal Co-ordinated Time (UTC).