Nanny Problem?
Posted by
| Nick Cash
USA (626 posts) Bio
|
Date
| Thu 08 Jan 2004 02:41 AM (UTC) Amended on Thu 08 Jan 2004 03:44 AM (UTC) by Nick Cash
|
Message
| This is confusing. Let me describe the situation.
I try to connect to the mud, and it prompts for my name. I enter my name (Odis), and it prompts for my password. Now, after I enter my password I get the normal Welcome crap and then the standard load messages for my area:
Loading...
Linking exits...
Done.
Now, so far so good. However, right after that I press enter, and BOOM! It crashes. Now...I've tried for a while to figure this out. No, it doesn't generate a core file. No, it doesn't give me any errors. This seems to happen to everyone that tries to get on. If they don't have an area, naturally, it doesn't show the area-loading messages. I've tried placing a few bug statements in certain locations within the nanny function, and only two of the three show up in the logs.
So, after that, I figured, well, there must be a bug in the code I just installed. However, the only thing it really modified was the loading of asteroids (no snippet, custom code here) and the updating of these asteroids (it compiles with no errors btw). So I uninstalled everything to do with asteroids and it still does the same. So I guess it's not that.
My one hint to this problem:
SEGMENTATION VIOLATION
It's in all the logs. However, it's very unspecific. So, I come to you to tell me where to search. Apparently it is having trouble loading people. Btw, read the next post for the log file information and the places where I put bug statements in nanny. |
~Nick Cash
http://www.nick-cash.com | Top |
|
Posted by
| Nick Cash
USA (626 posts) Bio
|
Date
| Reply #1 on Thu 08 Jan 2004 02:46 AM (UTC) |
Message
| From the log file:
Wed Jan 7 19:41:21 2004 :: [*****] BUG: Area Loaded
Wed Jan 7 19:41:36 2004 :: [*****] BUG: Enter Has Been Pressed
Wed Jan 7 19:41:36 2004 :: SEGMENTATION VIOLATION
Corresponding code in the nanny function:
At the very end of case CON_GET_OLD_PASSWORD:
    if ( ch->pcdata->area )
        do_loadarea( ch, "" );
    bug( "Area Loaded" );
    break;
At the very top of case CON_PRESS_ENTER:
    case CON_PRESS_ENTER:
        send_to_pager( "\014", ch );
        bug( "Enter Has Been Pressed" );
Note: those are the two that show up in the logs
Last one: at the very top of CON_READ_MOTD:
    case CON_READ_MOTD:
        send_to_desc_color( "&z\n\r\n\r", d );
        add_char( ch );
        bug( "Displaying MOTD" );
        d->connected = CON_PLAYING;
|
~Nick Cash
http://www.nick-cash.com | Top |
|
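The checkpoint approach described above narrows a crash to the code between the last message that reached the log and the first one that did not. A minimal sketch of that idea in C follows. It assumes a hypothetical helper, trace_point, which is not SMAUG's bug() function; the one thing it adds is an explicit flush after every message, so a checkpoint cannot be lost in a stdio buffer when the process dies on the next statement.

```c
#include <stdio.h>

/*
 * Hypothetical helper, not part of SMAUG: write one checkpoint line and
 * flush it immediately, so it reaches the file even if the process dies
 * on the very next statement.
 * Returns 0 on success, -1 if an argument is NULL or the write/flush fails.
 */
int trace_point(FILE *out, const char *file, int line, const char *msg)
{
    if (out == NULL || file == NULL || msg == NULL)
        return -1;
    if (fprintf(out, "TRACE %s:%d: %s\n", file, line, msg) < 0)
        return -1;
    return fflush(out) == 0 ? 0 : -1;
}

/* Convenience macro that fills in the call site automatically. */
#define TRACE(out, msg) trace_point((out), __FILE__, __LINE__, (msg))
```

A call such as TRACE(stderr, "Enter Has Been Pressed"); placed before and after each suspect call (for example around each do_help call) would bracket the crash more tightly than one message per case label.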
Posted by
| Nick Gammon
Australia (23,158 posts) Bio
Forum Administrator |
Date
| Reply #2 on Thu 08 Jan 2004 03:30 AM (UTC) |
Message
| Core dump may be suppressed. There was a suggestion recently from Ksilyan, I think, about changing your environment variables to allow that.
Anyway, just run under gdb to find the problem. ie.
cd area
gdb ../src/smaug
run
Then when it crashes type bt to find what call did it. |
- Nick Gammon
www.gammon.com.au, www.mushclient.com | Top |
|
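One common reason a core dump is suppressed is the per-process core file size limit (the value that ulimit -c reports in a shell). As a sketch only, a server could raise its own soft limit to the hard limit at startup with the POSIX getrlimit/setrlimit calls. The function name enable_core_dumps is hypothetical and is not part of SMAUG.

```c
#include <sys/resource.h>

/*
 * Hypothetical startup helper: raise the soft core-file size limit to the
 * hard limit for this process.
 * Returns 0 on success, -1 on failure.
 */
int enable_core_dumps(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;

    /* An unprivileged process may raise its soft limit up to the hard limit. */
    rl.rlim_cur = rl.rlim_max;

    return setrlimit(RLIMIT_CORE, &rl) == 0 ? 0 : -1;
}
```

If the hard limit itself is 0, this call succeeds but still allows no core file; in that case the limit has to be raised in the environment that launches the server.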
Posted by
| Nick Gammon
Australia (23,158 posts) Bio
Forum Administrator |
Date
| Reply #3 on Thu 08 Jan 2004 03:33 AM (UTC) |
Message
| Has send_to_desc_color been called before this? That doesn't seem to be standard SMAUG. Maybe it has a bug. |
- Nick Gammon
www.gammon.com.au, www.mushclient.com | Top |
|
Posted by
| Nick Cash
USA (626 posts) Bio
|
Date
| Reply #4 on Thu 08 Jan 2004 03:37 AM (UTC) |
Message
| #0 0x265e5e62 in ?? ()
Cannot access memory at address 0x265f7026
Err...what exactly does that mean? Which line is it where it can't access the memory?
Thanks for the help...I'm a gdb noob. |
~Nick Cash
http://www.nick-cash.com | Top |
|
Posted by
| Nick Cash
USA (626 posts) Bio
|
Date
| Reply #5 on Thu 08 Jan 2004 03:39 AM (UTC) Amended on Thu 08 Jan 2004 03:44 AM (UTC) by Nick Cash
|
Message
| Yes, it has. In almost every case statement in the switch it is called to send color to the descriptor. Switching back to write_to_buffer did not seem to help the cause any. |
~Nick Cash
http://www.nick-cash.com | Top |
|
Posted by
| Nick Cash
USA (626 posts) Bio
|
Date
| Reply #6 on Thu 08 Jan 2004 03:48 AM (UTC) |
Message
| Hmm..I just noticed something. The second bug call (Enter Has Been Pressed) comes right before a bunch of do_help calls. However, the last bug call (Displaying MOTD) is not displayed, so I think it's safe to guess it's not getting there. Perhaps there is a problem with do_help? Maybe help.dat got corrupted. |
~Nick Cash
http://www.nick-cash.com | Top |
|
Posted by
| Samson
USA (683 posts) Bio
|
Date
| Reply #7 on Thu 08 Jan 2004 03:51 AM (UTC) |
Message
| send_to_desc_color isn't a standard Smaug color function. It's usually associated with some form of color add-on. One such color add-on is my custom color snippet, which also doubles as a fix for the broken Smaug color system and is in the 3 FUSS packages I distribute.
However, the particular function is not set up to be used anywhere in the FUSS packages and could very well have an unforeseen bug in it. This could be the source of the problem. I'm in the process of converting a rewrite of this code someone sent me for AFKMud, which has been tested with the colored descriptor stuff.
What bothers me is that "SEGMENTATION VIOLATION" is showing up in the logs. This suggests to me the codebase being used has a signal handler which is interfering with things. I've been opposed to the interception of SIGSEGV ever since I realized how evil doing so can be. It would always produce unusable results. More often than not, people pair it with an auto-copyover thing. The best bet is to just let the code crash, dump its core, and then let the startup script reboot it. Crashes happen. Initiating an auto-copyover is not wise since the crash could very well have come from a corrupt player, or a corrupt descriptor.
So my first bit of advice is to remove the signal trap for SIGSEGV and then let it crash. Chances are you'll get better results and the cause can be narrowed down faster. | Top |
|
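Removing the trap normally means commenting out the line that registers it. For illustration only, the same effect can be had at run time by putting SIGSEGV back to its default action with sigaction. Both function names below are hypothetical: install_segv_trap stands in for the kind of handler the thread describes, and restore_default_segv undoes it and reports whether a custom handler was in place.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Stand-in for the kind of trap described in the thread: exit quietly. */
static void segv_trap(int sig)
{
    (void)sig;
    _exit(1);
}

/* Install the stand-in trap. Returns 0 on success, -1 on failure. */
int install_segv_trap(void)
{
    return signal(SIGSEGV, segv_trap) == SIG_ERR ? -1 : 0;
}

/*
 * Put SIGSEGV back to its default action (terminate, with a core dump if
 * the system allows one).
 * Returns 1 if a custom handler had been installed, 0 if the default or
 * ignore disposition was already in place, -1 on error.
 */
int restore_default_segv(void)
{
    struct sigaction dfl, old;

    memset(&dfl, 0, sizeof dfl);
    dfl.sa_handler = SIG_DFL;
    sigemptyset(&dfl.sa_mask);

    if (sigaction(SIGSEGV, &dfl, &old) != 0)
        return -1;
    if (old.sa_flags & SA_SIGINFO)
        return 1;   /* a three-argument handler was installed */
    return (old.sa_handler != SIG_DFL && old.sa_handler != SIG_IGN) ? 1 : 0;
}
```

With the default action restored, a segmentation fault ends the process at the faulting instruction, which is the state a debugger or core file needs to see.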
Posted by
| Nick Cash
USA (626 posts) Bio
|
Date
| Reply #8 on Thu 08 Jan 2004 04:02 AM (UTC) Amended on Thu 08 Jan 2004 04:06 AM (UTC) by Nick Cash
|
Message
| Indeed, the SIGSEGV signals are being intercepted. However, all the handler does is log the last player command and shut down the mud using exit( 1 );, which I think I heard is supposed to generate a core. Now, I've heard lots of bad things about SIGSEGV handling and why it's originally commented out in smaug/swr distributions. However, I found a site that another SWR coder had put up discussing the speculation and giving a fix for a couple of things and how to handle it, so I figured I'd do the same. So far it has worked just fine, and it is not stopping results except that it won't generate a core file. Why it does that is something I'm unsure of, and something you gurus can probably tell me. I am running off of SWR 1.0 FUSS, so it does have the color snippet you wrote, Samson. I do not think the problem is with send_to_desc_color. When I did get a core file it just told me the same error:
#0 0x265e5e62 in ?? ()
Cannot access memory at address 0x265f7026
Anyways, if you really think I should, I will comment out the SIGSEGV handler again.
Thanks for the help, time for me to go sleep. |
~Nick Cash
http://www.nick-cash.com | Top |
|
Posted by
| Samson
USA (683 posts) Bio
|
Date
| Reply #9 on Thu 08 Jan 2004 04:12 AM (UTC) |
Message
| Doing an exit(1) won't generate a core. exit() is an ordinary library call which literally tells the mud to stop running. When it does so, it shuts down normally.
You may be able to make it work by calling abort() instead, but as my past experience tells me, intercepting SIGSEGV just doesn't work as well as one might like. I spent the better part of several weeks chasing my tail with an error relating to descriptors and players and the lastcommand variable before I realized the signal handler was trying to act on data that itself had become corrupted. It never dropped core files.
Given that you are using the FUSS package, and that there was a bug in our color code that caused crashes with NULL characters, it's highly probable this is what's happening.
Now I don't know what all the guy's site had to say on the issue, but I think he got lucky it hasn't caused him problems thus far. In any case, "Cannot access memory at address 0x265f7026" isn't enough to go on. Would need to see the backtrace info to see what calls led to that. | Top |
|
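The difference between leaving a handler through exit and through abort() can be observed from a parent process. The sketch below is an illustration, not code from SMAUG or SWR, and every function name in it is hypothetical. It forks a child that traps SIGSEGV, raises the signal, and then reports how the child ended. The exiting handler uses _exit(1), the async-signal-safe form, which ends in normal termination just as exit(1) does.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Handler that leaves normally: the kernel sees an ordinary exit, no core. */
static void handler_exits(int sig)  { (void)sig; _exit(1); }

/* Handler that calls abort(): the process is then killed by SIGABRT. */
static void handler_aborts(int sig) { (void)sig; abort(); }

/*
 * Fork a child that installs `handler` for SIGSEGV and then raises SIGSEGV.
 * Returns the number of the signal that killed the child, 0 if the child
 * exited normally, or -1 on error.
 */
static int run_child(void (*handler)(int))
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Demo only: this sketch inspects the exit status, so keep it from
         * littering core files. Remove these two lines to let abort()
         * write a core where the system permits it. */
        struct rlimit no_core = { 0, 0 };
        setrlimit(RLIMIT_CORE, &no_core);

        signal(SIGSEGV, handler);
        raise(SIGSEGV);
        _exit(2);   /* not reached: both handlers end the process */
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}

int killing_signal_with_exit(void)  { return run_child(handler_exits); }
int killing_signal_with_abort(void) { return run_child(handler_aborts); }
```

Only termination by a signal whose default action dumps core (SIGSEGV and SIGABRT among them) can leave a core file, which is why a handler that exits normally never produces one.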
Posted by
| David Haley
USA (3,881 posts) Bio
|
Date
| Reply #10 on Thu 08 Jan 2004 04:13 AM (UTC) |
Message
| Turn off the signal handler (comment out the line that registers it.) Run it using GDB as Nick suggested, then it'll show you precisely what went wrong. Backtrace if necessary to see the call stack (list of functions that have been called to reach this point.)
To backtrace: "bt"
To go "up" the stack frame: "up"
To go "down" ... "down"
Those basic commands are enough to let you navigate through the stack frame.
At any time in GDB you can just type "help" and it'll give you a list of topics you may need help in; but they are a bit obscure unless you know how to use a debugger. |
David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone
http://david.the-haleys.org | Top |
|
Posted by
| Nick Cash
USA (626 posts) Bio
|
Date
| Reply #11 on Thu 08 Jan 2004 04:35 AM (UTC) |
Message
| Thanks for the gdb help. Any thoughts on the problem at hand? It doesn't seem to be a help.are problem... |
~Nick Cash
http://www.nick-cash.com | Top |
|
Posted by
| David Haley
USA (3,881 posts) Bio
|
Date
| Reply #12 on Thu 08 Jan 2004 05:20 AM (UTC) |
Message
| With the backtrace, it'll be trivial to see where the problem is... without it, personally I have no clue given what you posted. :)
It could be below CON_PRESS_ENTER, below the bit that does the log... but again, I cannot stress how much easier things will be with a backtrace. :) |
David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone
http://david.the-haleys.org | Top |
|
Posted by
| Nick Gammon
Australia (23,158 posts) Bio
Forum Administrator |
Date
| Reply #13 on Thu 08 Jan 2004 05:20 AM (UTC) |
Message
| Virtually any bug could cause a segmentation fault.
Having the signal handler simply confuses the debugging, because the core dump, if one were generated (which it isn't), would come from the signal handler, not the *real* problem place.
Turn off the SIGSEGV handler, and then report what the backtrace says. Anything else right now is just guessing. |
- Nick Gammon
www.gammon.com.au, www.mushclient.com | Top |
|
Posted by
| Nick Cash
USA (626 posts) Bio
|
Date
| Reply #14 on Thu 08 Jan 2004 10:09 PM (UTC) |
Message
| Sorry for not being clear. After I commented it out it gave the same backtrace:
#0 0x265e5e62 in ?? ()
Cannot access memory at address 0x265f7026
|
~Nick Cash
http://www.nick-cash.com | Top |
|
The dates and times for posts above are shown in Universal Co-ordinated Time (UTC).