Gammon Forum

Nanny Problem?

It is now over 60 days since the last post. This thread is closed.

Posted by Nick Cash   USA  (626 posts)  Bio
Date Thu 08 Jan 2004 02:41 AM (UTC)

Amended on Thu 08 Jan 2004 03:44 AM (UTC) by Nick Cash

Message
This is confusing. Let me describe the situation.

I try to connect to the mud, and it prompts for my name. I enter my name (Odis), and it prompts for my password. Now, after I enter my password I get the normal Welcome crap and the standard load messages for my area:

Loading...
Linking exits...
Done.

Now, so far so good. However, right after that I press enter, and BOOM! It crashes. Now...I've tried for a while to figure this out. No, it doesn't generate a core file. No, it doesn't give me any errors. This seems to happen to everyone that tries to get on. If they don't have an area, naturally, it doesn't show the area loading messages. I've tried placing a few bug statements in certain locations within the nanny function, and only two of the three show up in the logs.
So, after that, I figured, well, there must be a bug in the code I just installed. However, the only thing it really modified was the loading of asteroids (no snippet, custom code here) and the updating of these asteroids (it compiles with no errors btw). So I uninstalled everything to do with asteroids and it still does the same. So I guess it's not that.

My one hint to this problem:
SEGMENTATION VIOLATION

It's in all the logs. However, it's very unspecific. So, I come to you to tell me where to search. Apparently it is having trouble loading people. Btw, read the next post for the log file information and the places where I put bug statements in nanny.

~Nick Cash
http://www.nick-cash.com

Posted by Nick Cash   USA  (626 posts)  Bio
Date Reply #1 on Thu 08 Jan 2004 02:46 AM (UTC)
Message
From the log file:

Wed Jan 7 19:41:21 2004 :: [*****] BUG: Area Loaded
Wed Jan 7 19:41:36 2004 :: [*****] BUG: Enter Has Been Pressed
Wed Jan 7 19:41:36 2004 :: SEGMENTATION VIOLATION

Corresponding code in the nanny function:

At the very end of case CON_GET_OLD_PASSWORD:

        if ( ch->pcdata->area )
            do_loadarea( ch, "" );
        bug( "Area Loaded" );
        break;


At the very top of case CON_PRESS_ENTER:

   case CON_PRESS_ENTER:
          send_to_pager( "\014", ch );

          bug("Enter Has Been Pressed");


Note: those are the two that show up in the logs

Last one: at the very top of CON_READ_MOTD:

case CON_READ_MOTD:
        send_to_desc_color( "&z\n\r\n\r", d );
        add_char( ch );

        bug("Displaying MOTD");

        d->connected    = CON_PLAYING;

~Nick Cash
http://www.nick-cash.com

Posted by Nick Gammon   Australia  (23,158 posts)  Bio   Forum Administrator
Date Reply #2 on Thu 08 Jan 2004 03:30 AM (UTC)
Message
Core dump may be suppressed. There was a suggestion recently from Ksilyan, I think, about changing your environment variables to allow that.

Anyway, just run under gdb to find the problem, i.e.:

cd area
gdb ../src/smaug
run

Then when it crashes type bt to find what call did it.
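
One common reason for a missing core file is a core-file size limit of zero (the value a shell reports with "ulimit -c"). The following is a minimal sketch, assuming a POSIX system, of raising that limit from inside the program. enable_core_dumps is a hypothetical helper, not part of SMAUG, and it only lifts the soft limit as far as the hard limit allows.

```c
#include <sys/resource.h>

/* Hypothetical helper (not part of SMAUG): raise the soft core-file size
 * limit up to the hard limit so the kernel is allowed to write a core.
 * Returns 0 on success, -1 on failure. */
int enable_core_dumps(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;

    rl.rlim_cur = rl.rlim_max;   /* soft limit may always be raised to the hard limit */
    return setrlimit(RLIMIT_CORE, &rl);
}
```

Calling something like this once near the top of main() would be enough. If the hard limit is itself zero, no core will be written regardless.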

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Nick Gammon   Australia  (23,158 posts)  Bio   Forum Administrator
Date Reply #3 on Thu 08 Jan 2004 03:33 AM (UTC)
Message
Has send_to_desc_color been called before this? That doesn't seem to be standard SMAUG. Maybe it has a bug.

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Nick Cash   USA  (626 posts)  Bio
Date Reply #4 on Thu 08 Jan 2004 03:37 AM (UTC)
Message
#0 0x265e5e62 in ?? ()
Cannot access memory at address 0x265f7026

Err...what exactly does that mean? Which line is it where it can't access the memory?

Thanks for the help...I'm a gdb noob.

~Nick Cash
http://www.nick-cash.com

Posted by Nick Cash   USA  (626 posts)  Bio
Date Reply #5 on Thu 08 Jan 2004 03:39 AM (UTC)

Amended on Thu 08 Jan 2004 03:44 AM (UTC) by Nick Cash

Message
Yes, it has. In almost every case statement in the switch it is called to send color to the descriptor. Switching back to write_to_buffer did not seem to help the cause any.

~Nick Cash
http://www.nick-cash.com

Posted by Nick Cash   USA  (626 posts)  Bio
Date Reply #6 on Thu 08 Jan 2004 03:48 AM (UTC)
Message
Hmm..I just noticed something. The second bug call (Enter Has Been Pressed) comes right before a bunch of do_help calls. However, the last bug call (Displaying MOTD) is not displayed, so I think it's safe to guess it's not getting there. Perhaps there is a problem with do_help? Maybe help.dat got corrupted.
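
Narrowing a crash down between two log lines works best when every checkpoint is flushed immediately, because text still sitting in a stdio buffer is lost when the process is killed by a signal. A minimal sketch of such a checkpoint follows. trace_point and TRACE are hypothetical names for illustration, not SMAUG functions, and the log format is made up.

```c
#include <stdio.h>

/* Hypothetical checkpoint helper (not a SMAUG function): write the source
 * location plus a note, then flush so the line survives a later crash.
 * Returns the number of characters written, or a negative value on error. */
int trace_point(FILE *log, const char *file, int line, const char *what)
{
    int n = fprintf(log, "TRACE %s:%d: %s\n", file, line, what);

    fflush(log);   /* hand the text to the OS now instead of on exit */
    return n;
}

/* Convenience wrapper that fills in the current file and line. */
#define TRACE(log, what) trace_point((log), __FILE__, __LINE__, (what))
```

Placing one TRACE(stderr, "...") before and after each do_help call would show which call is the last one reached.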

~Nick Cash
http://www.nick-cash.com

Posted by Samson   USA  (683 posts)  Bio
Date Reply #7 on Thu 08 Jan 2004 03:51 AM (UTC)
Message
send_to_desc_color isn't a standard Smaug color function. It's usually associated with some form of color add-on. One such color add-on is my custom color snippet, which also doubles as a fix for the broken Smaug color system and is in the 3 FUSS packages I distribute.

However, the particular function is not set up to be used anywhere in the FUSS packages and could very well have an unforeseen bug in it. This could be the source of the problem. I'm in the process of converting a rewrite of this code someone sent me for AFKMud, which has been tested with the colored descriptor stuff.

What bothers me is that "SEGMENTATION VIOLATION" is showing up in the logs. This suggests to me the codebase being used has a signal handler which is interfering with things. I've been opposed to the interception of SIGSEGV ever since I realized how evil doing so can be. It would always produce unusable results. More often than not, people pair it with an auto-copyover thing. The best bet is to just let the code crash, dump its core, and then let the startup script reboot it. Crashes happen. Initiating an auto-copyover is not wise since the crash could very well have come from a corrupt player, or a corrupt descriptor.

So my first bit of advice is to remove the signal trap for SIGSEGV and then let it crash. Chances are you'll get better results and the cause can be narrowed down faster.
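
A minimal sketch of what removing the trap amounts to, assuming the handler was installed with signal(): either comment out the line that registers it, or explicitly put SIGSEGV back to its default action. restore_default_segv is a hypothetical helper name, not something from the SMAUG source.

```c
#include <signal.h>

/* Hypothetical helper (not from the SMAUG source): restore the default
 * SIGSEGV action, so a bad memory access terminates the process at the
 * faulting instruction instead of running a custom handler.
 * Returns 0 on success, -1 on failure. */
int restore_default_segv(void)
{
    return signal(SIGSEGV, SIG_DFL) == SIG_ERR ? -1 : 0;
}
```

With the default action in place, gdb or a core file shows the stack at the point of the fault rather than inside the handler.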

Posted by Nick Cash   USA  (626 posts)  Bio
Date Reply #8 on Thu 08 Jan 2004 04:02 AM (UTC)

Amended on Thu 08 Jan 2004 04:06 AM (UTC) by Nick Cash

Message
Indeed, the SIGSEGV signals are being intercepted. However, all it does is log the last player command and shut down the mud using exit( 1 );, which I think I heard is supposed to generate a core. Now, I've heard lots of bad things about SIGSEGV and why it's originally commented out in smaug/swr distributions; however, I found a site that another SWR coder had put up discussing the speculation, giving a fix for a couple of things and explaining how to handle it, so I figured I'd do the same. So far, it has worked just fine, and it is not stopping results except that it won't generate a core file. Why it does that is something I'm unsure of, and something you gurus can probably tell me. I am running off of SWR 1.0 FUSS, so it does have the color snippet you wrote, Samson. I do not think the problem is with send_to_desc_color. When I did get a core file it just told me the same error:

#0 0x265e5e62 in ?? ()
Cannot access memory at address 0x265f7026

Anyways, if you really think I should, I will comment out SIGSEGV again.

Thanks for the help, time for me to go sleep.

~Nick Cash
http://www.nick-cash.com

Posted by Samson   USA  (683 posts)  Bio
Date Reply #9 on Thu 08 Jan 2004 04:12 AM (UTC)
Message
Doing an exit(1) won't generate a core. exit() is a valid library call which literally tells the mud to stop running. When it does so, it shuts down normally.

You may be able to make it work by calling abort() instead, but as my past experience tells me, intercepting SIGSEGV just doesn't work as well as one might like. I spent the better part of several weeks chasing my tail with an error relating to descriptors and players and the lastcommand variable before I realized the signal handler was trying to act on data that itself had become corrupted. It never dropped core files.
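
For anyone who keeps a handler anyway, here is a minimal sketch of the abort() variant, assuming a POSIX system. crash_handler and install_crash_handler are hypothetical names, and the handler deliberately touches no game data, since that data may be what got corrupted.

```c
#include <signal.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical handler (not the SMAUG/SWR one): log a fixed message with
 * write(), which is safe inside a signal handler, then abort() so the
 * process dies abnormally instead of exiting cleanly as exit(1) would. */
static void crash_handler(int sig)
{
    static const char msg[] = "SEGMENTATION VIOLATION\n";
    ssize_t ignored;

    (void)sig;
    ignored = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)ignored;

    signal(SIGSEGV, SIG_DFL);   /* avoid re-entering this handler */
    abort();                    /* raises SIGABRT, which can produce a core */
}

/* Returns 0 on success, -1 if the handler could not be installed. */
int install_crash_handler(void)
{
    return signal(SIGSEGV, crash_handler) == SIG_ERR ? -1 : 0;
}
```

Even with this, the core is produced from inside the handler, so the default action is still the simpler route when the goal is a clean backtrace.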

Given that you are using the FUSS package, and that there was a bug in our color code that caused crashes with NULL characters, there's a high probability this is what's happening.

Now I don't know what all the guy's site had to say on the issue, but I think he got lucky it hasn't caused him problems thus far. In any case, "Cannot access memory at address 0x265f7026" isn't enough to go on. Would need to see the backtrace info to see what calls led to that.
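
To illustrate the kind of guard that avoids a NULL-character crash, here is a minimal sketch. It assumes "NULL characters" means a NULL character pointer reaching an output function. The struct and function names are hypothetical stand-ins, not the actual FUSS color code or SMAUG types.

```c
#include <stddef.h>
#include <stdio.h>

/* Stand-in types for illustration only; the real SMAUG structures differ. */
struct descriptor_stub { FILE *out; };
struct char_stub { struct descriptor_stub *desc; };

/* Hypothetical guarded send: refuse to dereference a missing character or
 * descriptor. Returns 0 on success, -1 if there is nowhere to write. */
int send_text_checked(const char *txt, const struct char_stub *ch)
{
    if (txt == NULL || ch == NULL || ch->desc == NULL || ch->desc->out == NULL)
        return -1;

    return fputs(txt, ch->desc->out) < 0 ? -1 : 0;
}
```

During login states such as CON_PRESS_ENTER the character may not be fully set up yet, which is the sort of moment a check like this protects against.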

Posted by David Haley   USA  (3,881 posts)  Bio
Date Reply #10 on Thu 08 Jan 2004 04:13 AM (UTC)
Message
Turn off the signal handler (comment out the line that registers it.) Run it using GDB as Nick suggested, then it'll show you precisely what went wrong. Backtrace if necessary to see the call stack (list of functions that have been called to reach this point.)

To backtrace: "bt"
To go "up" the stack frame: "up"
To go "down" ... "down"

Those basic commands are enough to let you navigate through the stack frame.

At any time in GDB you can just type "help" and it'll give you a list of topics you may need help in; but they are a bit obscure unless you know how to use a debugger.

David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone

http://david.the-haleys.org

Posted by Nick Cash   USA  (626 posts)  Bio
Date Reply #11 on Thu 08 Jan 2004 04:35 AM (UTC)
Message
Thanks for the gdb help. Any thoughts on the problem at hand? It doesn't seem to be a help.are problem...

~Nick Cash
http://www.nick-cash.com

Posted by David Haley   USA  (3,881 posts)  Bio
Date Reply #12 on Thu 08 Jan 2004 05:20 AM (UTC)
Message
With the backtrace, it'll be trivial to see where the problem is... without it, personally I have no clue given what you posted. :)

It could be below CON_PRESS_ENTER, below the bit that does the log... but again, I cannot stress how much easier things will be with a backtrace. :)

David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone

http://david.the-haleys.org

Posted by Nick Gammon   Australia  (23,158 posts)  Bio   Forum Administrator
Date Reply #13 on Thu 08 Jan 2004 05:20 AM (UTC)
Message
Virtually any bug could cause a segmentation fault.

Having the signal handler simply confuses the debug, because the core dump, if it was generated, which it isn't, would be from the signal handler, not the *real* problem place.

Turn off the SIGSEGV handler, and then report what the backtrace says. Anything else right now is just guessing.

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Nick Cash   USA  (626 posts)  Bio
Date Reply #14 on Thu 08 Jan 2004 10:09 PM (UTC)
Message
Sorry for not being clear. After I commented it out it gave the same backtrace:

#0 0x265e5e62 in ?? ()
Cannot access memory at address 0x265f7026

~Nick Cash
http://www.nick-cash.com

The dates and times for posts above are shown in Universal Co-ordinated Time (UTC).

This is page 1, subject is 2 pages long.

Information and images on this site are licensed under the Creative Commons Attribution 3.0 Australia License unless stated otherwise.