Gammon Forum

SmaugFuss IMPORTANT Test

Posted by David Haley   USA  (3,881 posts)  Bio
Date Reply #15 on Tue 23 Jan 2007 11:29 PM (UTC)

Amended on Tue 23 Jan 2007 11:30 PM (UTC) by David Haley

Message
The descriptor isn't becoming -1. The select call is returning a value smaller than 0 indicating that there was an error.

Without looking too deeply at the network code -- I don't have time now unfortunately -- I suspect that the issue has to do with improper cleanup of descriptors that were closed unexpectedly, which is what is giving you those messages ("EOF encountered on read").

Then, since the sockets are no longer in an acceptable state, when select comes around and accept_new adds all descriptors to the FD sets, select emits the error because some of the FDs given to it were invalid.
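A minimal sketch of the failure described above, not taken from the SMAUG source: it closes a descriptor and then polls it with select and a zero timeout, the way accept_new polls its sets. The helper names poll_one_fd and errno_after_polling_closed_fd are made up for illustration, and a pipe stands in for a player socket. On a POSIX system select should fail with EBADF in this case.

```c
#include <assert.h>
#include <errno.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: poll a single descriptor with a zero timeout.
 * Returns select()'s result; errno is set if select() fails. */
int poll_one_fd(int fd)
{
    fd_set in_set;
    struct timeval null_time = { 0, 0 };

    /* FD_SET is undefined for values outside this range. */
    assert(fd >= 0 && fd < FD_SETSIZE);

    FD_ZERO(&in_set);
    FD_SET(fd, &in_set);
    return select(fd + 1, &in_set, NULL, NULL, &null_time);
}

/* Hypothetical demo: open a pipe, close its read end, then poll the
 * stale descriptor number. Returns the errno reported by select(),
 * 0 if select() did not fail, or -1 if the pipe could not be created. */
int errno_after_polling_closed_fd(void)
{
    int fds[2];

    if (pipe(fds) != 0)
        return -1;

    int stale = fds[0];
    close(fds[0]);              /* the number in 'stale' is now invalid */

    errno = 0;
    int rc = poll_one_fd(stale);
    int saved = errno;          /* save before close() can change it */

    close(fds[1]);
    return rc < 0 ? saved : 0;
}
```

If the server keeps a closed descriptor in its list, every later select call that includes it would fail the same way, which fits the symptom in this thread.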

David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone

http://david.the-haleys.org

Posted by Matteo2303   (16 posts)  Bio
Date Reply #16 on Tue 23 Jan 2007 11:35 PM (UTC)
Message
>The select call is returning a value smaller than 0

Yes, sorry, not the descriptor. I know that the value is "-1" because I used printf( "%d\n\r", etc.. ) to see it.

In any case, I don't think this problem is only mine.
Thanks for your help. Over the next few days I'll investigate what you said here:

> select emits the error because some of the FDs given to it were invalid.

Bye!

Posted by David Haley   USA  (3,881 posts)  Bio
Date Reply #17 on Tue 23 Jan 2007 11:45 PM (UTC)
Message
No, I agree that it doesn't appear to be a problem just with you. The SMAUG network code is pretty shaky -- networking stuff is hard to get right. (It's kind of like the mudprog discussion we've been having here.) So like I said, it doesn't really surprise me that it would perform poorly (i.e. die) under stress.

David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone

http://david.the-haleys.org

Posted by Nick Gammon   Australia  (23,173 posts)  Bio   Forum Administrator
Date Reply #18 on Wed 24 Jan 2007 06:00 AM (UTC)
Message
Quote:

    if ( select( maxdesc+1, &in_set, &out_set, &exc_set, &null_time ) < 0 )
    {
        perror( "accept_new: select: poll" );
        exit( 1 );
    }

...but how I can prevent this?


You could simply omit the exit line, and just ignore problems if select is overloaded by multiple frequent connections. This would stop the MUD from exiting, and may only happen occasionally.

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by David Haley   USA  (3,881 posts)  Bio
Date Reply #19 on Wed 24 Jan 2007 07:42 AM (UTC)
Message
I suspect that might cause trouble. The problem is that one of the descriptors has been invalidated and is kept around in memory. If it gets left there, it might cause all kinds of trouble further down the road.

Still, it's worth a shot, I suppose... sometimes it's good to live dangerously. :-)

David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone

http://david.the-haleys.org

Posted by Matteo2303   (16 posts)  Bio
Date Reply #20 on Wed 24 Jan 2007 07:55 AM (UTC)
Message
>You could simply omit the exit line, and just ignore problems

No, commenting it out as "//exit(1);" is a bad thing!
All existing descriptors get reset/restarted; it looks OK at first but it's a mess: believe me, I have tried :(

Posted by Samson   USA  (683 posts)  Bio
Date Reply #21 on Wed 24 Jan 2007 09:25 PM (UTC)
Message
What if instead of exit(1) you tried changing it to a continue statement? That in theory would skip it on that pass and come back for it later. Maybe the OS just needs a bit of time to clean up the mess.

Posted by Matteo2303   (16 posts)  Bio
Date Reply #22 on Wed 24 Jan 2007 09:44 PM (UTC)
Message
>What if instead of exit(1) you tried changing it to a continue statement?

Mmm, "continue"? Where?
The select call isn't inside a loop.

Posted by Matteo2303   (16 posts)  Bio
Date Reply #23 on Wed 24 Jan 2007 09:59 PM (UTC)
Message
OK, I tested using a goto statement (lol) to reproduce the effect of a continue. This freezes the MUD. Also, if I use a "return" instead of "exit(1)" when the select call returns < 0, then all descriptors are reset (players disconnected, etc.). This is not the right way.

Bye

Posted by Samson   USA  (683 posts)  Bio
Date Reply #24 on Thu 25 Jan 2007 02:35 AM (UTC)
Message
Right, ok. So I didn't notice this was happening in accept_new before. Scratch the continue idea.

With the situation you have you're probably out of luck without some major work on the networking code.

Posted by Gohan_TheDragonball   USA  (183 posts)  Bio
Date Reply #25 on Thu 25 Jan 2007 06:50 AM (UTC)
Message
I am not as advanced as some of you, but I wanted to put this out there: instead of exiting, couldn't you just deny the connection and let the connecting client know to wait a minute and retry?

Posted by David Haley   USA  (3,881 posts)  Bio
Date Reply #26 on Thu 25 Jan 2007 09:18 AM (UTC)
Message
Unfortunately no; what's going on here is that somehow SMAUG's internal picture of the network state has gotten messed up and it thinks some connections are open that really aren't. Then it tries to poll on those connections and fails, which is what is causing select to generate an error.

What might be possible would be to loop over all known descriptors and test each one with select individually until you find the one that returns an error, and then completely dump it. You would only do this after select returned an error, of course. This is not an ideal solution -- the real fix would be to not get into this situation in the first place -- but it's better than nothing.
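A rough sketch of that idea, under stated assumptions: fd_is_valid and drop_invalid_fds are hypothetical names, and a plain int array stands in for SMAUG's descriptor list, which in the real code carries per-connection state that would also need freeing. Each descriptor is tested on its own with a zero-timeout select, and any that select rejects with EBADF are dropped.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: return 1 if select() accepts this descriptor,
 * 0 if it is out of range or select() rejects it with EBADF.
 * The zero timeout means the check never blocks. */
int fd_is_valid(int fd)
{
    fd_set in_set;
    struct timeval null_time = { 0, 0 };

    if (fd < 0 || fd >= FD_SETSIZE)
        return 0;

    FD_ZERO(&in_set);
    FD_SET(fd, &in_set);
    if (select(fd + 1, &in_set, NULL, NULL, &null_time) < 0 && errno == EBADF)
        return 0;
    return 1;
}

/* Hypothetical stand-in for walking the MUD's descriptor list: compacts
 * the array in place, keeping only descriptors select() still accepts.
 * Returns the number of descriptors kept. */
size_t drop_invalid_fds(int *fds, size_t count)
{
    size_t kept = 0;

    assert(fds != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        if (fd_is_valid(fds[i]))
            fds[kept++] = fds[i];
        /* else: a real server would also free that connection's state */
    }
    return kept;
}
```

Called only after the main select has failed, something like this would cost one extra select per connection on the rare bad pass, and nothing on normal passes.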

David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone

http://david.the-haleys.org

Posted by Matteo2303   (16 posts)  Bio
Date Reply #27 on Thu 25 Jan 2007 02:17 PM (UTC)
Message
>With the situation you have you're probably out of luck without some major work on the networking code.

Not "I have", but "we have". SmaugFuss under Cygwin on Windows crashes in this situation. A DoS attack causes a crash within a minute.

Posted by Nick Gammon   Australia  (23,173 posts)  Bio   Forum Administrator
Date Reply #28 on Fri 26 Jan 2007 01:37 AM (UTC)
Message
Does this crash only happen under Cygwin? It wouldn't totally surprise me if it is only unstable under Windows, and most people running production MUDs would be using Unix.

I once wrote a similar piece of code, to test multiple simultaneous connections to PennMUSH. It worked fine under Unix, but degraded rapidly under Windows. And that was using identical code.

I suspect this is an instability in the Windows Cygwin implementation of sockets, or even the underlying operating system support.

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Samson   USA  (683 posts)  Bio
Date Reply #29 on Fri 26 Jan 2007 02:43 AM (UTC)
Message
I would tend to agree with Nick's assessment since I've stress tested my own code with someone deliberately attacking it with a miniature bot-net. The worst that happened was that it simply stopped accepting new connections until some of the existing ones dropped off. Tested on Linux, of course.

I don't really see much value in protecting against this in Windows for the very reason that Cygwin isn't suited for a production environment.

The dates and times for posts above are shown in Universal Co-ordinated Time (UTC).
