Parsing XML file to strings?
Posted by
| Rakon
USA (123 posts) Bio
|
Date
| Mon 06 Nov 2006 06:25 AM (UTC) |
Message
| Greetings.
I understand that MUSH already has an 'ExportToXml' function, but I was wondering if anyone knows of a way to take an XML document in python, and convert the nodes to a python dictionary, or even just a string.
What I'm looking to do is take:
<triggers>
  <trigger
   enabled="y"
   name="trigger1"
   keep_evaluating="y"
   match="*With an heroic effort *"
   send_to="12"
   sequence="100"
  >
  <send>world.ColourTell ("lime", "black", "[CURED]: ")
world.ColourNote ( "white", "black", "Impaled no longer")world.SetVariable("standing", "off")
</send>
  </trigger>
  <trigger
   group="gold"
   match="^You get (.*?) gold sovereigns from (a|an) (.*?)\.$"
   regexp="y"
   sequence="100"
  >
  <send>put gold in pack</send>
  </trigger>
</triggers>
And extract each different trigger to display its values:
Trigger: trigger1
Enabled is yes
Keep evaluating is yes
Match is: *With an heroic effort *
Send value is :
world.ColourTell ("lime", "black", "[CURED]: ")
world.ColourNote ( "white", "black", "Impaled no longer")world.SetVariable("standing", "off")
Now, it doesn't matter about what it says, right now I'm just having a hell of a time trying to extract ANYTHING even using simplified SAX.
Does anyone know how I would be able to do this in python?
Thanks |
Yes, I am a criminal.
My crime is that of curiosity.
My crime is that of judging people by what they say and think, not what they look like.
My crime is that of outsmarting you, something that you will never forgive me for. | Top |
|
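For reference, the conversion asked about above (each trigger's attributes plus its send text, as a Python dictionary) can be sketched with xml.etree.ElementTree from the standard library. Note the assumptions: ElementTree is not used anywhere in this thread, the sketch targets Python 3 while the thread's own snippets are Python 2, and SAMPLE and triggers_to_dicts are made-up names for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample, shortened from the triggers quoted in the post.
SAMPLE = """<triggers>
  <trigger enabled="y" name="trigger1" match="*With an heroic effort *" sequence="100">
    <send>world.ColourNote ("white", "black", "Impaled no longer")</send>
  </trigger>
  <trigger group="gold" match="^You get (.*?) gold sovereigns" regexp="y" sequence="100">
    <send>put gold in pack</send>
  </trigger>
</triggers>"""


def triggers_to_dicts(xml_text):
    """Return one dict per <trigger>: its attributes plus the <send> text."""
    root = ET.fromstring(xml_text)
    result = []
    for trig in root.iter("trigger"):
        info = dict(trig.attrib)  # copy of the element's attributes
        # findtext() falls back to the default when there is no <send> child
        info["send"] = trig.findtext("send", default="")
        result.append(info)
    return result


for info in triggers_to_dicts(SAMPLE):
    print("Trigger: %s" % info.get("name", "(unnamed)"))
    print("  Match is: %s" % info["match"])
    print("  Send value is: %s" % info["send"])
```

Once the triggers are plain dictionaries, any further string manipulation is ordinary Python and no longer depends on the XML library.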
Posted by
| Nick Gammon
Australia (23,120 posts) Bio
Forum Administrator |
Date
| Reply #1 on Mon 06 Nov 2006 06:43 AM (UTC) |
Message
| |
Posted by
| Rakon
USA (123 posts) Bio
|
Date
| Reply #2 on Mon 06 Nov 2006 06:54 AM (UTC) Amended on Mon 06 Nov 2006 06:56 AM (UTC) by Rakon
|
Message
| I take it that's Lua, though?
And unless I'm mistaken, that only goes through the attributes, putting each one on a line by itself.
This is fine, but:
a, b, c = utils.xmlread ("<foo><bar x='2'/></foo>")
How would you read a file with it to parse to string?
This is why I'd like to use Python, though so far I've only been able to produce a list containing just the 'match' values.
What I'm looking for is a way to capture the <trigger> attributes as well as the <send> text.
import string
from xml.parsers import expat
class Element:
    'A parsed XML element'
    def __init__(self,name,attributes):
        'Element constructor'
        self.name = name
        self.attributes = attributes
        self.cdata = ''
        self.children = []
    def AddChild(self,element):
        'Add a reference to a child element'
        self.children.append(element)
    def getAttribute(self,key):
        return self.attributes.get(key)
    def getData(self):
        return self.cdata
    def getElements(self,name=''):
        if not name:
            return self.children
        else:
            elements = []
            for element in self.children:
                if element.name == name:
                    elements.append(element)
            return elements
    def toString(self, level=0):
        retval = " " * level
        retval += "<%s" % self.name
        for attribute in self.attributes:
            retval += " %s=\"%s\"" % (attribute, self.attributes[attribute])
        c = ""
        for child in self.children:
            c += child.toString(level+1)
        if c == "":
            retval += "/>\n"
        else:
            retval += ">\n" + c + ("</%s>\n" % self.name)
        return retval
class Xml_list:
    def __init__(self):
        self.root = None
        self.nodeStack = []
    def StartElement(self,name,attributes):
        element = Element(name.encode(),attributes)
        if len(self.nodeStack) > 0:
            parent = self.nodeStack[-1]
            parent.AddChild(element)
        else:
            self.root = element
        self.nodeStack.append(element)
    def EndElement(self,name):
        self.nodeStack = self.nodeStack[:-1]
    def CharacterData(self,data):
        'SAX character data event handler'
        if string.strip(data):
            data = data.encode()
            element = self.nodeStack[-1]
            element.cdata += data
        return
    def Parse(self,filename):
        # Create a SAX parser
        Parser = expat.ParserCreate()
        Parser.StartElementHandler = self.StartElement
        Parser.EndElementHandler = self.EndElement
        Parser.CharacterDataHandler = self.CharacterData
        # Parse the XML File
        ParserStatus = Parser.Parse(open(filename,'r').read(), 1)
        return self.root
parser = Xml_list()
element = parser.Parse('d:\python\Mush_test.xml')
print element.toString()
But that gives:
<triggers>
<trigger keep_evaluating="y" enabled="y" send_to="12" match="*With an heroic effort *" sequence="100">
<send/>
</trigger>
<trigger regexp="y" enabled="y" match="^Vixen\, the Wilderness Guide makes a complicated(.*?)$" sequence="100">
<send/>
</trigger>
</triggers>
Any ideas on how to get the 'send' field to show up as well?
I'd also be looking to perform string manipulation on the trigger after the information is stripped from it. |
|
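A note on the expat code above: its CharacterData handler does store the text in element.cdata, but toString() only writes the name, the attributes and the children, so the text never reaches the output and <send/> looks empty. The following is a minimal sketch of the same expat approach that keeps the send text, written for Python 3 (the post uses Python 2). The class and function names here are hypothetical, and it collects plain dictionaries instead of the Element tree used in the post.

```python
from xml.parsers import expat


class TriggerCollector:
    """Collects each <trigger>'s attributes and the text inside its <send>."""

    def __init__(self):
        self.triggers = []
        self._in_send = False

    def start(self, name, attrs):
        if name == "trigger":
            self.triggers.append({"attributes": dict(attrs), "send": ""})
        elif name == "send" and self.triggers:
            self._in_send = True

    def end(self, name):
        if name == "send":
            self._in_send = False

    def chars(self, data):
        # expat may deliver one text run in several pieces, so append each piece
        if self._in_send:
            self.triggers[-1]["send"] += data


def collect_triggers(xml_text):
    collector = TriggerCollector()
    parser = expat.ParserCreate()
    parser.StartElementHandler = collector.start
    parser.EndElementHandler = collector.end
    parser.CharacterDataHandler = collector.chars
    parser.Parse(xml_text, True)  # True marks this as the final chunk of input
    return collector.triggers
```

Appending in the character-data handler matters because the parser is free to split text at line breaks or entity references.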
Posted by
| Ked
Russia (524 posts) Bio
|
Date
| Reply #3 on Mon 06 Nov 2006 08:57 AM (UTC) |
Message
| Why not use xml.dom.minidom instead? It's much simpler than expat, and you wouldn't need to build a custom Element class. | Top |
|
Posted by
| Rakon
USA (123 posts) Bio
|
Date
| Reply #4 on Mon 06 Nov 2006 11:06 AM (UTC) |
Message
| Care to go through an example on how I would do that Ked?
Basically I thought expat MIGHT be easier, as all I was trying to do was get the element names and attributes and print them out.
Not as easy as I hoped it would be ;) |
|
Posted by
| Ked
Russia (524 posts) Bio
|
Date
| Reply #5 on Mon 06 Nov 2006 01:39 PM (UTC) |
Message
| There's an example in the docs, but for what you want to do it would probably go along the lines of:
from xml.dom.minidom import parse
mydom = parse(filename)
print(mydom.toprettyxml())
That just reads the document in and prints it out. If you wanted to print a certain element or a list of elements, then you'd need to dive a bit deeper into DOM API. For example, to print out all triggers you'd do this:
mydom = parse(filename)
trigs = mydom.getElementsByTagName('trigger')
for trig in trigs:
    print(trig.toprettyxml())
The DOM API is documented in the "Structured Markup Language Processing Tools" section of the library reference, but under xml.dom rather than xml.dom.minidom, which is just a simplified version of the former.
| Top |
|
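Building on the minidom calls in the reply above, a per-trigger report like the one requested in the first post could be sketched as follows. This is a Python 3 sketch, not something posted in the thread; describe_trigger is a hypothetical helper name, and the attributes are sorted only to make the output order predictable.

```python
from xml.dom.minidom import parseString


def describe_trigger(trig):
    """Build a short text report from one <trigger> element's attributes."""
    # getAttribute() returns an empty string when the attribute is absent
    lines = ["Trigger: " + (trig.getAttribute("name") or "(unnamed)")]
    for name, value in sorted(trig.attributes.items()):
        if name != "name":
            lines.append("  %s is: %s" % (name, value))
    return "\n".join(lines)


dom = parseString('<triggers><trigger enabled="y" name="trigger1" match="x*"/></triggers>')
for trig in dom.getElementsByTagName("trigger"):
    print(describe_trigger(trig))
```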
Posted by
| Rakon
USA (123 posts) Bio
|
Date
| Reply #6 on Tue 07 Nov 2006 02:33 AM (UTC) |
Message
| Alright, it works well so far with a hard-coded absolute path, but not if I let the user select a file, e.g.:
from xml.dom.minidom import parse
import EasyGui as eg
file = eg.fileopenbox(msg=None, title='Open file...', argInitialFile=' ')
mydom = parse(file)
trigs = file.getElementsByTagName('trigger')
for trig in trigs:
    print trig.getAttribute('match')
Traceback (most recent call last):
File "<pyshell#33>", line 1, in <module>
reload(File_xml)
File "D:\Python\File_xml.py", line 4, in <module>
mydom = parse(file)
File "D:\Python\lib\xml\dom\minidom.py", line 1913, in parse
return expatbuilder.parse(file)
File "D:\Python\lib\xml\dom\expatbuilder.py", line 924, in parse
result = builder.parseFile(fp)
File "D:\Python\lib\xml\dom\expatbuilder.py", line 207, in parseFile
parser.Parse(buffer, 0)
ExpatError: syntax error: line 1, column 0
Is there a way I need to escape it or something?? Reading over the API docs:
Quote:
parse( filename_or_file, parser)
Return a Document from the given input. filename_or_file may be either a file name, or a file-like object. parser, if given, must be a SAX2 parser object. This function will change the document handler of the parser and activate namespace support; other parser configuration (like setting an entity resolver) must have been done in advance.
So there's no reason it should not work; it's an XML file I am trying to load and parse. BUT if I put the file in as an absolute path in the code itself, the code works fine. Any reason for this? Or a way to get it to work while still being able to select the file? |
|
Posted by
| Rakon
USA (123 posts) Bio
|
Date
| Reply #7 on Tue 07 Nov 2006 03:09 AM (UTC) |
Message
| Okay, now I can only get it working with the file declared in the code, but when I try to return the child 'send' of 'trigger' it prints out 'DOM Element: at <blah' instead of the node value. I've tried using 'nodeValue' to print the text out, but it prints an empty string, even though none of the <send> nodes are empty.
from xml.dom.minidom import parse
import EasyGui as eg
file = open('d:\python\Mush_test.txt')
mydom = parse(file)
trigs = mydom.getElementsByTagName('trigger')
for trig in trigs:
    match = trig.getAttribute('match')
    send = trig.getElementsByTagName('send')
    print 'Match on: ' + str(match) + '\nSend: ' + str(send) + '}\n'
Match on: *With an heroic effort *
Send: [<DOM Element: send at 0x143d0d0>]}
Match on: ^Vixen\, the Wilderness Guide makes a complicated(.*?)$
Send: [<DOM Element: send at 0x1439030>]}
Match on: ^You get (\d+) gold sovereigns from (a|an) (.*?)\.$
Send: [<DOM Element: send at 0x1439e90>]}
Match on: ^You remove (a|an) (.*?)\.$
Send: [<DOM Element: send at 0x1441eb8>]}
What would I use for converting the element to its text value? |
|
Posted by
| Ked
Russia (524 posts) Bio
|
Date
| Reply #8 on Tue 07 Nov 2006 05:21 AM (UTC) |
Message
| You need either element.toprettyxml() or element.toxml(). The str(element) call just returns a generic description of the object but doesn't convert it to XML.
As for the filename, I am not sure what the problem is, but you could try to open the file for reading and feeding the resulting file object to parse() - maybe that will work better. | Top |
|
Posted by
| Nick Gammon
Australia (23,120 posts) Bio
Forum Administrator |
Date
| Reply #9 on Tue 07 Nov 2006 05:25 AM (UTC) |
Message
| I know you want to use Python and not Lua, however this post shows how I extracted the various fields, including the "send" field, using the internal XML parser:
http://www.gammon.com.au/forum/?id=7123
Maybe you could put that into a plugin, and set a variable, and then query that variable in Python. |
- Nick Gammon
www.gammon.com.au, www.mushclient.com | Top |
|
Posted by
| Rakon
USA (123 posts) Bio
|
Date
| Reply #10 on Tue 07 Nov 2006 06:25 AM (UTC) |
Message
| Thanks Nick, I'll look into that.
Ked, after I posted... yeah, I realized it helps to open the file being called for reading :)
Though if I try to use 'toprettyxml' or 'toxml' as such:
send = trig.getElementsByTagName('send')
send = send.toprettyxml()
Returns
send = send.toprettyxml()
AttributeError: 'NodeList' object has no attribute 'toprettyxml'
I mean, from what I can tell DOM is a lot easier for working WITH the XML, but even if I try to use something like
trig.firstChild.NodeValue
Just to get what's in between the tags of <send> it returns an empty string, or <DOM Text node "
">.
And even when I try to use 'toprettyxml' on the above it returns 'AttributeError: 'unicode' object has no attribute 'toprettyxml'' or
'AttributeError: 'str' object has no attribute 'toprettyxml''
send = trig.firstChild.nodeValue
send = send.toprettyxml()
---
send = str(trig.firstChild.nodeValue)
send = send.toprettyxml()
Thanks for the help again! |
|
Posted by
| Ked
Russia (524 posts) Bio
|
Date
| Reply #11 on Tue 07 Nov 2006 07:29 AM (UTC) |
Message
| Erm, sorry about that. I forgot that getElementsByTagName() returns a list. In fact, it returns the exact same thing as the same call to get the triggers, so it should be treated the same. Try:
send = trig.getElementsByTagName('send')
send[0].toprettyxml()
That should do the trick, as long as the trigger actually has a "send" child. If it doesn't then you'll probably get an empty list, although I am not sure about that. | Top |
|
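Two details behind the exchange above. In the DOM, an element's own nodeValue is None, because the text lives in the Text nodes that are the element's children. And trig.firstChild is usually just the whitespace between <trigger> and <send>, which matches the <DOM Text node "\n"> result reported earlier. Also, getElementsByTagName() does return an empty list when nothing matches. A minimal Python 3 sketch of pulling out the send text on that basis (send_text is a hypothetical helper name, not part of minidom):

```python
from xml.dom.minidom import parseString


def send_text(trig):
    """Return the text inside the first <send> child, or "" if there is none."""
    sends = trig.getElementsByTagName("send")
    if not sends:  # an empty NodeList means the trigger has no <send>
        return ""
    parts = []
    for node in sends[0].childNodes:
        # Text and CDATA nodes carry their characters in .data
        if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE):
            parts.append(node.data)
    return "".join(parts)


dom = parseString(
    '<triggers><trigger match="a">\n<send>put gold in pack</send>\n</trigger>'
    '<trigger match="b"/></triggers>'
)
for trig in dom.getElementsByTagName("trigger"):
    print("Match on: %s" % trig.getAttribute("match"))
    print("Send: %r" % send_text(trig))
```

Reading .data gives the text itself, so there are no <send> tags to trim off afterwards.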
Posted by
| Rakon
USA (123 posts) Bio
|
Date
| Reply #12 on Tue 07 Nov 2006 07:01 PM (UTC) |
Message
| Hmm, that works now, and that's strange, because last time I tried to do that exact same thing I received an error of 'Attribute has no function '__getattr__' or thereabouts. |
|
Posted by
| Rakon
USA (123 posts) Bio
|
Date
| Reply #13 on Tue 07 Nov 2006 08:12 PM (UTC) |
Message
| Mhmm....Always with the problems.
Alright, if I have the following script in a .py by itself, the commands and code work.
filename = eg.fileopenbox(msg=None, title='Open file...', argInitialFile=sys.path[6]+'\example.txt')
file = open(filename)
mydom = parse(file)
trigs = mydom.getElementsByTagName('trigger')
for trig in trigs:
    match = trig.getAttribute('match')
    send = trig.getElementsByTagName('send')
    send = send[0].toprettyxml()
    send = str(send).replace(" ","",1)
    send = send.strip("<send>\n")
    send = send.strip("</send>\n")
    print 'Match on: ' + str(match) + '\nSend: ' + str(send) + '\n'
file.close()
And it reports the strings just fine, except that it turns standard symbols such as " and & into entities like &quot;.
If I put them into a function in a different 'main' program, then I get the following error when it gets called:
...
parser.Parse(buffer,0)
ExpatError: syntax error: line 1, column 0
Now my question is why the code works fine in a script by itself (it reads the file and prints out the results), but raises an error in the 'main' program when it is called from a function?
Is there a way I can get it to work inside a function call? |
|
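Two side effects of the script above are worth spelling out. toprettyxml() and toxml() produce markup, so special characters are written back as entities, whereas the Text node's .data holds the decoded text. And str.strip() takes a set of characters to remove from both ends, not a prefix or suffix, so strip("<send>\n") can also eat letters that belong to the command. A small Python 3 illustration with a made-up send value:

```python
from xml.dom.minidom import parseString

# Made-up trigger body; "&amp;" is how a literal "&" is written in XML.
dom = parseString("<trigger><send>send gold &amp; gems</send></trigger>")
send = dom.getElementsByTagName("send")[0]

serialized = send.toxml()        # markup: the "&" comes back as "&amp;"
raw_text = send.firstChild.data  # decoded text, as typed into the trigger

# strip() removes any of the listed characters from both ends, so letters
# of the command itself are lost along with the tags.
mangled = serialized.strip("<send>\n").strip("</send>\n")

print(repr(serialized))
print(repr(raw_text))
print(repr(mangled))
```

Reading the text from the node avoids both the entities and the trimming.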
Posted by
| Ked
Russia (524 posts) Bio
|
Date
| Reply #14 on Tue 07 Nov 2006 09:04 PM (UTC) |
Message
| The error seems to imply that you don't have the XML string - same error as before, when you tried to parse a filename. Can you post the entire program, or at least the function this is a part of and the 'main' up to and including a call to that function? | Top |
|
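"syntax error: line 1, column 0" means expat rejected the very first character it was given, which fits the suggestion above that the parser is not actually receiving the XML. One way to narrow this down is to check the path before parsing and to include the start of the file in the error. The sketch below is for Python 3, load_dom is a hypothetical helper name, and it assumes the file dialog may hand back None or an empty string when cancelled, which should be confirmed against the dialog library actually in use.

```python
import os
from xml.dom.minidom import parse
from xml.parsers.expat import ExpatError


def load_dom(path):
    """Parse an XML file, raising ValueError with some context on failure."""
    # Catch a cancelled dialog or a mistyped path before expat sees anything
    if not path or not os.path.isfile(path):
        raise ValueError("not a readable file: %r" % (path,))
    try:
        return parse(path)
    except ExpatError as exc:
        # Show the first bytes so it is obvious what the parser was given
        with open(path, "rb") as handle:
            head = handle.read(60)
        raise ValueError(
            "could not parse %r (%s); file starts with %r" % (path, exc, head)
        )
```

Calling load_dom(filename) from inside the function in the 'main' program, and printing the resulting message, should show whether the path or the file contents differ from the standalone script.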
The dates and times for posts above are shown in Universal Co-ordinated Time (UTC).