Gammon Forum

Parsing XML file to strings?

It is now over 60 days since the last post. This thread is closed.


Posted by Rakon   USA  (123 posts)  Bio
Date Mon 06 Nov 2006 06:25 AM (UTC)
Message
Greetings.

I understand that MUSH already has an 'ExportToXml' function, but I was wondering if anyone knows of a way to take an XML document in Python and convert its nodes to a Python dictionary, or even just a string.

What I'm looking to do is take:

<triggers>
  <trigger
   enabled="y"
   name="trigger1"
   keep_evaluating="y"
   match="*With an heroic effort *"
   send_to="12"
   sequence="100"
  >
  <send>world.ColourTell ("lime", "black", "[CURED]: ")
  world.ColourNote ( "white", "black", "Impaled no longer")world.SetVariable("standing", "off")
  </send>
  </trigger>
  <trigger
   group="gold"
   match="^You get (.*?) gold sovereigns from (a|an) (.*?)\.$"
   regexp="y"
   sequence="100"
  >
  <send>put gold in pack</send>
  </trigger>
</triggers>

And extract each different trigger to display its values:

Trigger: trigger1 
Enabled is yes
Keep evaluating is yes
Match is: *With an heroic effort *

Send value is :
world.ColourTell ("lime", "black", "[CURED]: ")
world.ColourNote ( "white", "black", "Impaled no longer")world.SetVariable("standing", "off")


It doesn't matter exactly what it says; right now I'm just having a hell of a time trying to extract ANYTHING, even using simplified SAX.

Does anyone know how I would be able to do this in Python?
Thanks

Yes, I am a criminal.
My crime is that of curiosity.
My crime is that of judging people by what they say and think, not what they look like.
My crime is that of outsmarting you, something that you will never forgive me for.

Posted by Nick Gammon   Australia  (23,120 posts)  Bio   Forum Administrator
Date Reply #1 on Mon 06 Nov 2006 06:43 AM (UTC)
Message
On this page:

http://www.gammon.com.au/scripts/doc.php?general=lua

See: utils.xmlread (s) - XML parser

You could use that, fairly easily, to break down the XML code into individual items, like you wanted.

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Rakon   USA  (123 posts)  Bio
Date Reply #2 on Mon 06 Nov 2006 06:54 AM (UTC)

Amended on Mon 06 Nov 2006 06:56 AM (UTC) by Rakon

Message
I take it that's Lua, though?

And unless I'm mistaken, that only goes through the attributes, putting each one on a line by itself.
This is fine, but:


a, b, c = utils.xmlread ("<foo><bar x='2'/></foo>")


How would you read a file with it and parse it to a string?
This is why I'd like to use Python, though so far I've only been able to produce a list containing just the 'match' values.
What I'm looking for is a way to capture the '<trigger' attributes, as well as the send text.

import string
from xml.parsers import expat

class Element:
    'A parsed XML element'
    def __init__(self,name,attributes):
        'Element constructor'
        self.name = name
        self.attributes = attributes
        self.cdata = ''
        self.children = []
        
    def AddChild(self,element):
        'Add a reference to a child element'
        self.children.append(element)
        
    def getAttribute(self,key):
        return self.attributes.get(key)
    
    def getData(self):
        return self.cdata
        
    def getElements(self,name=''):
        if not name:
            return self.children
        else:
            elements = []
            for element in self.children:
                if element.name == name:
                    elements.append(element)
            return elements

    def toString(self, level=0):
        retval = " " * level
        retval += "<%s" % self.name
        for attribute in self.attributes:
            retval += " %s=\"%s\"" % (attribute, self.attributes[attribute])
        c = ""
        for child in self.children:
            c += child.toString(level+1)
        if c == "":
            retval += "/>\n"
        else:
            retval += ">\n" + c + ("</%s>\n" % self.name)
        return retval
        
class Xml_list:
    def __init__(self):
        self.root = None
        self.nodeStack = []
        
    def StartElement(self,name,attributes):
        element = Element(name.encode(),attributes)
        
        if len(self.nodeStack) > 0:
            parent = self.nodeStack[-1]
            parent.AddChild(element)
        else:
            self.root = element
        self.nodeStack.append(element)
        
    def EndElement(self,name):
        self.nodeStack = self.nodeStack[:-1]

    def CharacterData(self,data):
        'SAX character data event handler'
        if string.strip(data):
            data = data.encode()
            element = self.nodeStack[-1]
            element.cdata += data
            return

    def Parse(self,filename):
        # Create a SAX parser
        Parser = expat.ParserCreate()

        Parser.StartElementHandler = self.StartElement
        Parser.EndElementHandler = self.EndElement
        Parser.CharacterDataHandler = self.CharacterData

        # Parse the XML File
        ParserStatus = Parser.Parse(open(filename,'r').read(), 1)
        
        return self.root
    
parser = Xml_list()
element = parser.Parse('d:\python\Mush_test.xml')
print element.toString()

But that gives:

<triggers>
 <trigger keep_evaluating="y" enabled="y" send_to="12" match="*With an heroic effort *" sequence="100">
  <send/>
</trigger>
 <trigger regexp="y" enabled="y" match="^Vixen\, the Wilderness Guide makes a complicated(.*?)$" sequence="100">
  <send/>
</trigger>
</triggers>


Any ideas on how to get the 'send' field to show up as well?
I'd also be looking to perform string manipulation on the trigger after the information is stripped from it.
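
A note on the empty <send/> output: in the script above, CharacterData does append the text to element.cdata, but toString never writes cdata out, and it treats any element without children as empty. Below is a minimal sketch of one way to include the text, assuming Python 3; parse_tree and to_string are hypothetical helper names, not part of the original post, and the sketch does not escape special characters on output.

```python
from xml.parsers import expat


class Element:
    """A parsed XML element: name, attribute dict, text and children."""

    def __init__(self, name, attributes):
        self.name = name
        self.attributes = attributes
        self.cdata = ''
        self.children = []


def parse_tree(xml_text):
    """Build a tree of Element objects from an XML string using expat."""
    root = None
    stack = []

    def start(name, attrs):
        nonlocal root
        element = Element(name, attrs)
        if stack:
            stack[-1].children.append(element)
        else:
            root = element
        stack.append(element)

    def end(name):
        stack.pop()

    def chars(data):
        # Text can arrive in several chunks, so always append.
        if stack:
            stack[-1].cdata += data

    parser = expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(xml_text, True)
    return root


def to_string(element, level=0):
    """Serialise an Element, including its text (no escaping is done)."""
    indent = ' ' * level
    attrs = ''.join(' %s="%s"' % item for item in element.attributes.items())
    text = element.cdata.strip()
    if not element.children and not text:
        return '%s<%s%s/>\n' % (indent, element.name, attrs)
    out = '%s<%s%s>\n' % (indent, element.name, attrs)
    if text:
        out += '%s %s\n' % (indent, text)
    for child in element.children:
        out += to_string(child, level + 1)
    out += '%s</%s>\n' % (indent, element.name)
    return out


sample = ('<triggers><trigger match="x" sequence="100">'
          '<send>put gold in pack</send></trigger></triggers>')
print(to_string(parse_tree(sample)))
```

The only change that matters for the question is in to_string: the element's text is checked and written before the children, so <send> is no longer reported as empty.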


Posted by Ked   Russia  (524 posts)  Bio
Date Reply #3 on Mon 06 Nov 2006 08:57 AM (UTC)
Message
Why not use xml.dom.minidom instead? It's much simpler than expat, and you wouldn't need to build a custom Element class.

Posted by Rakon   USA  (123 posts)  Bio
Date Reply #4 on Mon 06 Nov 2006 11:06 AM (UTC)
Message
Care to go through an example of how I would do that, Ked?

Basically I thought expat MIGHT be easier, as all I was trying to do was get the element names and attributes and print them out.

Not as easy as I hoped it would be ;)


Posted by Ked   Russia  (524 posts)  Bio
Date Reply #5 on Mon 06 Nov 2006 01:39 PM (UTC)
Message
There's an example in the docs, but for what you want to do it would probably go along the lines of:

from xml.dom.minidom import parse

mydom = parse(filename)

print(mydom.toprettyxml())


That just reads the document in and prints it out. If you wanted to print a certain element or a list of elements, then you'd need to dive a bit deeper into the DOM API. For example, to print out all triggers you'd do this:


mydom = parse(filename)

trigs = mydom.getElementsByTagName('trigger')

for trig in trigs: print(trig.toprettyxml())


The DOM API is documented in the "Structured Markup Language Processing Tools" section of the library reference, but under xml.dom rather than xml.dom.minidom, which is just a simplified version of the former.
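
Since the original question asked for a Python dictionary, here is a minimal sketch of how the attributes of each trigger could be collected with minidom. It assumes Python 3 and parses an inline string so that it is self-contained; triggers_as_dicts and SAMPLE are hypothetical names used only for illustration.

```python
from xml.dom.minidom import parseString

# Shortened sample in the shape of the XML from the first post.
SAMPLE = """<triggers>
  <trigger enabled="y" name="trigger1" match="*With an heroic effort *" sequence="100">
  <send>world.Note ('cured')</send>
  </trigger>
  <trigger group="gold" match="^You get (.*?) gold$" regexp="y" sequence="100">
  <send>put gold in pack</send>
  </trigger>
</triggers>"""


def triggers_as_dicts(xml_text):
    """Return one {attribute name: value} dict per <trigger> element."""
    dom = parseString(xml_text)
    result = []
    for trig in dom.getElementsByTagName('trigger'):
        # trig.attributes is a NamedNodeMap; items() yields (name, value) pairs.
        result.append(dict(trig.attributes.items()))
    return result


for info in triggers_as_dicts(SAMPLE):
    print(info)
```

For a file on disk, parse(filename) could be used in place of parseString; the rest stays the same.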



Posted by Rakon   USA  (123 posts)  Bio
Date Reply #6 on Tue 07 Nov 2006 02:33 AM (UTC)
Message
Alright, it works well so far with an absolute path, but not if I let the user select a file, e.g.:

from xml.dom.minidom import parse
import EasyGui as eg
file = eg.fileopenbox(msg=None, title='Open file...', argInitialFile=' ')
mydom = parse(file)


trigs = mydom.getElementsByTagName('trigger')

for trig in trigs:
	print trig.getAttribute('match')




Traceback (most recent call last):
File "<pyshell#33>", line 1, in <module>
reload(File_xml)
File "D:\Python\File_xml.py", line 4, in <module>
mydom = parse(file)
File "D:\Python\lib\xml\dom\minidom.py", line 1913, in parse
return expatbuilder.parse(file)
File "D:\Python\lib\xml\dom\expatbuilder.py", line 924, in parse
result = builder.parseFile(fp)
File "D:\Python\lib\xml\dom\expatbuilder.py", line 207, in parseFile
parser.Parse(buffer, 0)
ExpatError: syntax error: line 1, column 0

Do I need to escape it somehow? Reading over the API docs:
Quote:

parse( filename_or_file, parser)

Return a Document from the given input. filename_or_file may be either a file name, or a file-like object. parser, if given, must be a SAX2 parser object. This function will change the document handler of the parser and activate namespace support; other parser configuration (like setting an entity resolver) must have been done in advance.

So there's no reason it should not work; it's an XML file I am trying to load and parse. BUT if I put the absolute path in the code itself, the code works fine. Any reason for this? Or is there a way to get it to work while still being able to select the file?
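
The two forms described in the quoted documentation can be checked in isolation. The sketch below assumes Python 3 and writes a throwaway file so that it is self-contained; the file content is made up for illustration.

```python
import os
import tempfile
from xml.dom.minidom import parse

xml_text = '<triggers><trigger match="a"/></triggers>'

# Write a temporary XML file to parse.
fd, path = tempfile.mkstemp(suffix='.xml')
with os.fdopen(fd, 'w') as handle:
    handle.write(xml_text)

try:
    # Form 1: pass a file name.
    dom_from_path = parse(path)
    # Form 2: pass an open file object.
    with open(path, 'rb') as handle:
        dom_from_file = parse(handle)
    count_path = len(dom_from_path.getElementsByTagName('trigger'))
    count_file = len(dom_from_file.getElementsByTagName('trigger'))
    print(count_path, count_file)
finally:
    os.remove(path)
```

If both forms work on a known-good file, an error at line 1, column 0 may point at what was actually handed to the parser rather than at how it was opened. Printing the value returned by the file dialog before calling parse() is one way to check; this is only a suggestion for narrowing things down, not a confirmed diagnosis.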


Posted by Rakon   USA  (123 posts)  Bio
Date Reply #7 on Tue 07 Nov 2006 03:09 AM (UTC)
Message
Okay, now I can only get it working with the file declared in the code, but when I try to return the 'send' child of 'trigger' it prints out 'DOM Element: at <blah' instead of the node value. I've tried using 'nodeValue' to print the text, but it prints an empty string, even though none of the <send> nodes are empty.


from xml.dom.minidom import parse
import EasyGui as eg
file = open('d:\python\Mush_test.txt')
mydom = parse(file)

trigs = mydom.getElementsByTagName('trigger')
for trig in trigs:
	match = trig.getAttribute('match')
	send = trig.getElementsByTagName('send')
	print 'Match on: ' + str(match) + '\nSend: ' + str(send) + '}\n'


Match on: *With an heroic effort *
Send: [<DOM Element: send at 0x143d0d0>]}

Match on: ^Vixen\, the Wilderness Guide makes a complicated(.*?)$
Send: [<DOM Element: send at 0x1439030>]}

Match on: ^You get (\d+) gold sovereigns from (a|an) (.*?)\.$
Send: [<DOM Element: send at 0x1439e90>]}

Match on: ^You remove (a|an) (.*?)\.$
Send: [<DOM Element: send at 0x1441eb8>]}


What would I use for converting the element to its text value?


Posted by Ked   Russia  (524 posts)  Bio
Date Reply #8 on Tue 07 Nov 2006 05:21 AM (UTC)
Message
You need either element.toprettyxml() or element.toxml(). The str(element) call just returns a generic description of the object but doesn't convert it to XML.

As for the filename, I am not sure what the problem is, but you could try opening the file for reading and feeding the resulting file object to parse(); maybe that will work better.

Posted by Nick Gammon   Australia  (23,120 posts)  Bio   Forum Administrator
Date Reply #9 on Tue 07 Nov 2006 05:25 AM (UTC)
Message
I know you want to use Python and not Lua, however this post shows how I extracted the various fields, including the "send" field, using the internal XML parser:

http://www.gammon.com.au/forum/?id=7123

Maybe you could put that into a plugin, and set a variable, and then query that variable in Python.


Posted by Rakon   USA  (123 posts)  Bio
Date Reply #10 on Tue 07 Nov 2006 06:25 AM (UTC)
Message
Thanks Nick, I'll look into that.

Ked, after I posted... yeah, I realized it helps to open the file for reading :)

Though if I try to use 'toprettyxml' or 'toxml' like this:

send = trig.getElementsByTagName('send')
send = send.toprettyxml()


Returns

send = send.toprettyxml()
AttributeError: 'NodeList' object has no attribute 'toprettyxml'


I mean, from what I can tell DOM is a lot easier for working WITH the XML, but even if I try to use something like

trig.firstChild.nodeValue

just to get what's in between the <send> tags, it returns an empty string, or <DOM Text node "\n">.

And even when I try to use 'toprettyxml' on the above, it returns 'AttributeError: 'unicode' object has no attribute 'toprettyxml'' or
'AttributeError: 'str' object has no attribute 'toprettyxml'':


send = trig.firstChild.nodeValue
send = send.toprettyxml()
---
send = str(trig.firstChild.nodeValue)
send = send.toprettyxml()


Thanks for the help again!


Posted by Ked   Russia  (524 posts)  Bio
Date Reply #11 on Tue 07 Nov 2006 07:29 AM (UTC)
Message
Erm, sorry about that. I forgot that getElementsByTagName() returns a list. In fact, it returns the exact same thing as the same call to get the triggers, so it should be treated the same. Try:


send = trig.getElementsByTagName('send')
send[0].toprettyxml()


That should do the trick, as long as the trigger actually has a "send" child. If it doesn't then you'll probably get an empty list, although I am not sure about that.
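
On the last point: in minidom, getElementsByTagName() does return an empty list when nothing matches, so indexing it with [0] raises IndexError. A minimal sketch of a guard, assuming Python 3; send_xml is a hypothetical helper name.

```python
from xml.dom.minidom import parseString

dom = parseString(
    '<triggers>'
    '<trigger match="a"><send>x</send></trigger>'
    '<trigger match="b"/>'
    '</triggers>')


def send_xml(trig):
    """Return the XML of the first <send> child, or None if there is none."""
    sends = trig.getElementsByTagName('send')
    if len(sends) == 0:
        return None
    return sends[0].toxml()


results = [send_xml(trig) for trig in dom.getElementsByTagName('trigger')]
print(results)
```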

Posted by Rakon   USA  (123 posts)  Bio
Date Reply #12 on Tue 07 Nov 2006 07:01 PM (UTC)
Message
Hmm, that works now, and that's strange, because the last time I tried that exact same thing I received an error of 'Attribute has no function '__getattr__'' or thereabouts.


Posted by Rakon   USA  (123 posts)  Bio
Date Reply #13 on Tue 07 Nov 2006 08:12 PM (UTC)
Message
Mhmm....Always with the problems.

Alright, if I have the following script in a .py by itself, the commands and code work.

filename = eg.fileopenbox(msg=None, title='Open file...', argInitialFile=sys.path[6]+'\example.txt')
file = open(filename)
mydom = parse(file)

trigs = mydom.getElementsByTagName('trigger')

for trig in trigs:
	match = trig.getAttribute('match')
	send = trig.getElementsByTagName('send')
	send = send[0].toprettyxml()
	send = str(send).replace("	","",1)
	send = send.strip("<send>\n")
	send = send.strip("</send>\n")
	print 'Match on: ' + str(match) + '\nSend: ' + str(send) + '\n'
file.close()

And it reports the strings just fine, except that it turns the standard ' " & symbols into entities such as &quot;.

If I put the same code into a function in a different 'main' program, I get the following error when it is called:
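
Two things in the script above are worth a sketch, assuming Python 3. First, toxml() and toprettyxml() produce markup, so special characters come back escaped; reading the Text nodes directly gives the unescaped text. Second, str.strip() takes a set of characters, not a prefix or suffix, so strip("<send>\n") can also eat letters from the command itself. text_of is a hypothetical helper name.

```python
from xml.dom.minidom import parseString

dom = parseString('<trigger><send>say "hi" &amp; wave</send></trigger>')
send = dom.getElementsByTagName('send')[0]

# Serialising gives markup, so the ampersand is written as an entity.
as_markup = send.toxml()
print(as_markup)


def text_of(node):
    """Join the data of all text and CDATA children of a node."""
    wanted = (node.TEXT_NODE, node.CDATA_SECTION_NODE)
    return ''.join(child.data for child in node.childNodes
                   if child.nodeType in wanted)


# Reading the text nodes gives the original characters back.
as_text = text_of(send)
print(as_text)

# strip() removes any of the listed characters from both ends, which
# damages text that happens to start or end with those letters.
stripped = '<send>send gold</send>'.strip("<send>\n").strip("</send>\n")
print(repr(stripped))
```

With the text taken from the nodes, neither the strip() calls nor any entity replacement are needed.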

...
parser.Parse(buffer,0)
ExpatError: syntax error: line 1, column 0

Now my question is: why does the code work fine in a script by itself, reading the file and printing the results, but raise an error when it is called from a function in the 'main' program?

Is there a way I can get it to work inside a function call?


Posted by Ked   Russia  (524 posts)  Bio
Date Reply #14 on Tue 07 Nov 2006 09:04 PM (UTC)
Message
The error seems to imply that you don't have the XML string - same error as before, when you tried to parse a filename. Can you post the entire program, or at least the function this is a part of and the 'main' up to and including a call to that function?

The dates and times for posts above are shown in Universal Co-ordinated Time (UTC).


Information and images on this site are licensed under the Creative Commons Attribution 3.0 Australia License unless stated otherwise.