see whatever…


March 2, 2008

Namespaces are bad but just not bad enough?

Filed under: RelaxNG,XML — see @ 1:15 pm

namespaces are one of the arguments people keep pushing when they want to reject XML (as whole or not). That was the first argument Dave Winer threw on RSS 1.0 back in 2000 and last year at XTech 2007, this was again one of the main arguments the WHATWG threw against XHTML 2.0. And people keep doing that because they’ve noticed that we can’t seriously deny that XML namespaces are insane and because we’ve written it many times in the past

–Eric van der Vlist on the xml-dev mailing list, Tuesday, 12 Feb 2008 20:31:19


Every now and then people seem to argue against namespaces but it seems there is no better alternative just yet:

Compare with (W3C) DOM: DOM is bad, really bad, and so people went and invented something better: JDOM, dom4j, XOM etc for Java; Amara, ElementTree for Python; see your favorite XML tool for your language and even DOM based wrapper libraries like jQuery (which does more but can be seen as a DOM replacement) for Javascript used in Browsers which only have access to the DOM.

Compare to (W3C) XML Schema: XML Schema is at least not what people had hoped for, and for some it is simply a failure. So people went and invented something better, like Schematron, Examplotron etc. and probably most importantly RELAX NG, which has been getting more and more support in recent years.

But namespaces? Maybe I am ignorant of alternative proposals but it seems no one has come up with a better solution yet. Of course namespaces are a bit more core XML than DOM or XML Schema but hey, people even suggest improvements for XML 2.0 (see the current discussion on xml-dev).

So I guess namespaces are just not bad enough ;)

February 1, 2008

coincidence or pre-pycon release wants?

Filed under: CSS,cssutils,Python,XML — see @ 7:40 pm

Maybe just coincidence but at least two Python libraries (or tools) I frequently use just got or will soon get a new release:

  • Epydoc, a Python package documentation tool, just released v3.0
  • lxml is close to 2.0 (2.0b2 is out and the final 2.0 should be out next week at the latest)

maybe coincidence but maybe also because Pycon is coming up…

On a side note, cssutils will probably have a new release in the next week or two as well ;-)

UPDATE: lxml just got a definite 2.0 release.

HTML5 or XHTML 2.0

Filed under: Web,XML — see @ 2:01 pm

Very interesting notes by Eric van der Vlist about HTML5 versus XHTML 2.0. I did not take part in the WHATWG (which would be an option – I know) nor do I know both specs in every detail, but I still think I may have an opinion on the matter: I have read quite a lot about both, and as a web developer I have quite a lot of experience with HTML 4 (and its obvious shortcomings of course) but also with XHTML and even XHTML 2, which I used for a little project just to see if it is usable.

There are a few areas where both specs basically want the same, like removing older and unneeded, unused or misunderstood elements or attributes so both try to clean the cruft off HTML4.

I do think XHTML 2.0 seems a bit better in its concept, e.g. using @role for certain elements instead of HTML5’s approach of introducing quite a few new elements, which I guess gives it better forward compatibility (Eric’s comparison with Docbook is very convincing).

On the other hand HTML5 at first seems easier for authors as it does not use the strict XML syntax XHTML wants. (Also HTML5’s Webforms are not as revolutionary as XForms, but that is probably for another discussion.)
What I did not know is that this strictness of XML is not specified as strictly as it is currently implemented (see the article for a nice overview). So it seems thinkable to have an XML syntax for XHTML 2 which is usable for almost everybody and not just the <irony>strange people</irony> (like me) who actually care about standards…

All in all I don’t think either of the two specs is much better and should win (like HD-DVD versus Blu-Ray ;) ) but one could think of a mixed vocabulary taking the best of both ideas. Just some ideas:

  • use XHTML 2 <section> and <h> and @role instead of HTML5 <article>. <section> is in both I think but there are too many elements in HTML5 already (are <aside>, <header>, <footer> or <dialog> really necessary?)
  • use HTML5 video/audio elements (<object> does seem to have failed somehow).
  • use HTML5 forms but make XForms available in its own namespace (putting XForms into XHTML 2 is not the best idea but this way it is available if wanted or needed). HTML5 forms could then be the default, with XForms for when you really need the extra power. Maybe HTML5 forms could be made a bit simpler in this case – <datagrid>? (please don’t take this point too hard, I know XForms better than WebForms)
  • use the XML syntax and DOM and therefore do not lose the ability to treat HTML5 with all the XML techniques and libs available: XPath, XSLT, XQuery, adding other vocabularies via namespaces like RDF, Microformats etc
  • use XHTML2 meta data ideas (the simple triples) which I think are a great but simple way to add semantics (compare RDF…)
  • make the resulting format be parsed in browsers with a more forgiving (but well defined in the scope of XML or even more defined) parser and behaviour

The main point I took from Eric van der Vlist’s article is that it is impossible to imagine all possible use cases or usages for a format during specification, so the specification should be as general (not the right term, I can’t think of a better one now) as possible. This is one area where XHTML 2 is superior, while HTML5 seems to look at the uses of the past without making it easy enough to cope with future enhancements. (BTW, as far as I know HTML5 is based on statistical information about e.g. which CSS classes are used most often. This is very important but maybe this point has had too much influence on some of HTML5’s decisions? – no personal offense meant!)

I do not think XHTML 2 is perfect so I really think the result should be a mixed vocabulary with a mixture of the best ideas from both specs.

December 17, 2007


Filed under: Web,XSLT — see @ 11:22 pm

Noticing that the Django book is final I began reading it today (I have wanted to look into Django for some time now but never really got around to it). Anyway the book is quite nice, easy reading with simple examples (a bit too targeted at beginners, which I guess I am not really anymore).

To try the examples I downloaded the newest Django release (0.96.1 I think), untarred it and got started. Funnily enough this simple step failed… After looking into the source I noticed this line: package = dirpath[len_root_dir:].lstrip('/').replace('/', '.'). Being on Windows this of course failed and I changed it to ‘\\’. I guess one should use os.path.sep in this case so I decided to add a proper bug report to the Django Trac. Again this simple idea failed: my short post was not accepted because the system took it for spam :(

So I gave up on this. Two simple things failing does demotivate me…
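
As a rough sketch of the os.path.sep idea mentioned above: the function name dotted_package and its sep parameter are made up for illustration (the parameter only exists so both path styles can be tried on any platform), and this is not the code Django actually ships.

```python
import os


def dotted_package(dirpath, len_root_dir, sep=os.path.sep):
    """Turn a directory path below some root into a dotted package name.

    Same idea as the quoted line, but using the platform's path
    separator instead of a hard-coded '/'. `sep` is a hypothetical
    parameter, added only so the behaviour can be checked for both
    Windows and POSIX style paths.
    """
    return dirpath[len_root_dir:].lstrip(sep).replace(sep, ".")


if __name__ == "__main__":
    root = "C:\\src"
    print(dotted_package("C:\\src\\django\\core", len(root), sep="\\"))
```

With the default sep the function follows whatever platform it runs on, which is the behaviour the hard-coded '/' was missing on Windows.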

But I started reading the book anyway and tried the examples too (after the hacked install ;) ). Up to Chapter 4, where I am currently, it all makes sense (as it should, given Django’s good reputation). I am not quite sure if another templating system was needed though. I still prefer XSLT for templating as it gives me total freedom over the HTML, and if the source XML is based on XHTML it is not even difficult to write or understand. A few well-placed additional elements (in another namespace) do all I want in these cases (e.g. a menu renderer). Anyway, Django’s templating and its philosophical background seem to make sense but look like JATS in the end.

As a side note, I use at work for a small project at the moment. It works very well and XSLT usage was very simple to add. I guess Django is better for more complex apps but is fine for simple stuff. Documentation is lacking though and I don’t think there will be any “web.pybook” soon ;)

Another side note: Another project at work uses Wicket (Java Framework) which is not too nice for/to web developers, too difficult to really control HTML and Javascript. I do not have to do the backend here so that’s ok but still and also Django look much much nicer and easier (no surprise really but anyway…).

August 19, 2007

Parsing XPath…

Filed under: Java,Python,XML — see @ 8:00 pm

… not using XPath on XML but actually parsing the XPath itself. I did some quick googling and one of the few libs there seem to be is SAXPath, which is part of Jaxen, the Java XPath lib also bundled with XOM. While I do like XOM and try to use it whenever I need to use XML and Java, I was actually looking for an XPath parser for Python. Of course I could use SAXPath with Jython, but I could not find an XPath parser for pure (C)Python.

A basic splitter simply splitting an XPath into its steps would be sufficient for the beginning, so I thought I would write my own. This basic functionality should not be too hard but in the end not too useful either… A complete XPath parser would probably be very hard to do; even XPath 1.0 is not the easiest, and XPath 2.0 support would be even better.

So I am still wondering whether there is a library (or a part of a library like 4Suite or libxml (or better lxml), which do not seem to do this at first sight…) which does XPath anyway and would expose the actual parsing of the XPath itself?
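
Just to illustrate how small such a basic step splitter could be, here is a minimal sketch. It is by no means an XPath parser: it only handles plain location paths, assumes well-formed input, and the name split_xpath_steps is made up for this example. It splits on "/" unless the slash sits inside a predicate, inside parentheses or inside a string literal.

```python
def split_xpath_steps(xpath):
    """Split a location path into its steps.

    A '/' only counts as a step separator when it is outside of
    [...] predicates, (...) groups and quoted string literals.
    An empty first item means the path is absolute; an empty item
    in the middle comes from the '//' abbreviation.
    """
    steps = []
    current = []
    depth = 0      # nesting level of [...] and (...)
    quote = None   # quote character of the literal we are in, if any
    for ch in xpath:
        if quote:
            current.append(ch)
            if ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
            current.append(ch)
        elif ch in "[(":
            depth += 1
            current.append(ch)
        elif ch in "])":
            depth -= 1
            current.append(ch)
        elif ch == "/" and depth == 0:
            steps.append("".join(current))
            current = []
        else:
            current.append(ch)
    steps.append("".join(current))
    return steps


if __name__ == "__main__":
    print(split_xpath_steps("/a/b[@x='1/2']/c"))
```

Anything beyond simple paths (unions with |, arithmetic, function calls around whole paths) would need a real tokenizer and grammar, which is exactly the hard part.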

April 19, 2007

beautiful XML

Filed under: RelaxNG,XML,XSLT — see @ 8:37 pm

Completely my view too: beautiful XML. Maybe XQuery is not ugly either, but I do not know enough to judge this one.

March 3, 2007

Ajax history

Filed under: Javascript,Programming,XML,XSLT — see @ 12:10 pm

I added Ajax functionality to parts of my showcase website, namely to most of the galleries (like the photo or portrait gallery), a while ago. The galleries are built using a simple homegrown gallery XML format which is rendered with XSLT. Adding the Ajax functionality was not very difficult, I use the fine jQuery library for that. (I still have to migrate my own scripts of the past years to use jQuery, which should make most of them obsolete but also easier to use and also much shorter ;) )

Adding a working browser history to it was not as easy as expected though. I simply did not understand how to get the available jQuery history plugins to work. The whole matter is not easy but not too difficult either, and to better understand what is going on I decided to build my own history (but still using jQuery of course).

First try was to use the URL hash method, which does work perfectly on Firefox but not on IE (for reasons query Google for “ajax history”, several interesting sites describe the problems much better than I can). The “hack” using dynamically built iframes does work cross-browser (at least Firefox and IE; Safari is a different beast [as far as I understand there is no working possibility at all yet], and I did not test Opera yet) so I changed the implementation to use iframes instead.

It does work now, the only missing bit is bookmarking, which would be possible if I added the URL hash in addition to the iframe hacking. I may add it in the future; it is not a very important issue for the use case of the galleries though, as they are too simple to actually need bookmarking of single works in one gallery (there is always an overview). More application-like sites would need that though.

BTW, during testing I found a bug (at least I think it is one) in jQuery which was very hard to track down. Most people probably won’t be affected by it but for people using REST it might be quite relevant.

Basically IE below version 7 seems to use POST for all Ajax requests if initialized with the wrong ActiveX control. The servlet I wrote for my galleries implements GET only, so it resulted in a “Method not implemented” HTTP error which I had never seen before (as most servlets use the same implementation for GET and POST, which is even recommended by most books but maybe is not too intelligent anymore), and I did not know where it might come from. Tracking it down was not easy as I normally use all the fine Firefox plugins like Firebug or LiveHTTPheaders, which simply are of no use for debugging IE ;)
But I came across Fiddler some time ago, which actually tracks all HTTP traffic on a PC. So after checking that I saw that IE used POST for its Ajax requests, which of course failed on a servlet implementing GET only…

November 12, 2006

IE and apos character entity

Filed under: Markup,Web,XML — see @ 8:08 pm

I never thought about the character entity references too much. For XML I assumed at least the 5 predefined ones, for the characters &, <, >, " and '. Naively I assumed this for HTML as well (plus the usual other ones used for years, like entities for German umlauts etc).

But it seems I was wrong. &apos;, which is the character entity for ', is only defined for XML but not for HTML. It is used so seldom that it never struck me until I did some tests on IE 7 today. For the GIVE-A-WORD feature on my personal site I needed to escape ' as it is used in Javascript parts in element attributes which use " as delimiter. So I had been escaping ' with &apos; for quite some time now. It seems though that IE does not know this entity and simply outputs the complete string… Serving the exact same thing as application/xml does trick IE into XML mode, which then knows the entity. Really strange and unnecessary too. The solution would be to use a numeric reference (&#39;); I might do that or simply ignore this case, which will happen rather seldom anyway.
For more details see e.g.


Not easy to write this post in WordPress BTW, the HTML editor keeps changing the entities or does weird things with them…
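
For anyone generating such attribute values from Python, a small sketch of the numeric reference approach: the standard library's html.escape with quote=True writes the apostrophe as a numeric reference (&#x27;) rather than &apos;, so the result should be understood in HTML mode and in XML mode alike. The helper name escape_attr is only chosen for this example.

```python
import html


def escape_attr(value):
    """Escape a string for use inside a double-quoted HTML attribute.

    html.escape(..., quote=True) emits &quot; for the double quote and
    the numeric reference &#x27; for the apostrophe, never the named
    entity &apos; that HTML does not define.
    """
    return html.escape(value, quote=True)


if __name__ == "__main__":
    print(escape_attr("give 'a' word"))
```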

June 20, 2006

allowed characters in XML

Filed under: XML — see @ 9:36 pm

I never really thought about which characters are allowed in XML. I vaguely knew NUL was not but that was all I cared about maybe because I never had a problem (I mostly do XSLT on given XML so no real problem there).
At work we now had a problem with “XML” documents (built with dom4j) which contained the characters #28 and #31, probably from copy/pasting text from PDF documents into the XML generating application (and which were therefore not really XML at all). dom4j never complained when generating the XML but of course complained when parsing the resulting document “thing” again, as the characters from #00 to #31 (except #09, #10 and #13, which I have used in XSLT several times to generate nicer looking text output) are not allowed in XML 1.0 at all.

Until now I never had these problems maybe because I used libs like XOM which are smart enough to know when building the XML that these characters are not valid. To be sure I looked in my XML in a Nutshell and found the above information.
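
If the library does not check this for you, one option is to filter the text before it goes into the document. The following is a minimal sketch based on the Char production of the XML 1.0 specification; the function name strip_invalid_xml_chars is made up, and whether dropping the characters silently is acceptable (instead of raising an error) depends on the application. Note that under this production #10 is allowed alongside #09 and #13, and #127 is not excluded.

```python
import re

# Everything NOT matched by the XML 1.0 production
#   Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text, replacement=""):
    """Remove (or replace) characters that may not appear in XML 1.0."""
    return _INVALID_XML_CHARS.sub(replacement, text)


if __name__ == "__main__":
    dirty = "pasted\x1c from\x1f a PDF"
    print(repr(strip_invalid_xml_chars(dirty)))
```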

Some time ago I told my coworkers better not to use dom4j because of several problems we had encountered in the last year. So this is another problem added to that pile.

BTW I re-read an interview with Elliotte Rusty Harold on Artima which explained exactly this problem. After that I did a quick test with the simple minidom implementation for Python. Same problem there: generating the XML goes through but results in a broken document which is no longer parsable.
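
The quick minidom test can be reproduced roughly like this. It is a sketch written for current Python versions, not the exact test from back then, and the helper name roundtrip_ok is made up: it builds a one-element document, serializes it and checks whether the result can be parsed again.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def roundtrip_ok(text):
    """Build a tiny document containing `text`, serialize it with minidom
    and report whether the serialized form can be parsed again."""
    doc = minidom.Document()
    root = doc.createElement("root")
    root.appendChild(doc.createTextNode(text))  # no validity check here
    doc.appendChild(root)
    serialized = doc.toxml()
    try:
        minidom.parseString(serialized)
    except ExpatError:
        return False
    return True


if __name__ == "__main__":
    print(roundtrip_ok("plain text"))
    print(roundtrip_ok("pasted\x1c from a PDF"))
```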

Got to be aware of these problems, luckily most of the time I vaguely remember something like the above interview ;)

May 17, 2006


Filed under: XML,XSLT — see @ 8:55 pm

I guess I use XML as a domain specific language in most cases. Processing with XSLT simply is very convenient. The topic came to my attention while reading an article about DSLs (processed by Java) in a German Java magazine.
Not specific to DSLs, but I wonder where the advantages of using Java generally are; I speculate that hardly anyone will be using Java in 5 years’ time. Everything seems to get complicated as Java is still growing from a huge thing to a giant mass…
