I guess encutils is obsolete, as today I found chardet (Universal Encoding Detector), which is actually from last year. I assumed there would be a similar (and better) library to detect encodings but never found one, so I started encutils, which is probably much inferior to chardet. Now everyone needing this functionality is better off with chardet, whose author Mark Pilgrim is much more involved in these things. But I guess it was worth the effort of writing my own lib, and maybe I will try to update some parts of it by just using chardet…
March 18, 2006
August 23, 2005
Released yet another version. Finally implemented XML encoding autodetection (or rather, used a Python Cookbook recipe for it). BTW, is that actually all right license-wise? On the cookbook site I could only find a short note like “usage free”.
Of course I could have linked to the recipe, but this way it is more convenient, and I changed a small bit as well: in case no encoding can be “autodetected”, "utf-8" is returned instead of None, which is IMHO closer to XML 1.0. I may be wrong of course, but there will be quite a few versions of encutils to come anyway as there are a few todos left…
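To make the idea concrete, here is a rough sketch of such an autodetection in present-day Python 3. It is not the cookbook recipe and not the encutils code: the name detect_xml_encoding, the BOM table and the regular expression are my own assumptions, and it only handles declarations written in an ASCII-compatible encoding.

```python
import codecs
import re

# BOMs are checked with the UTF-32 ones first, because the UTF-32 LE BOM
# starts with the same two bytes as the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# Matches the encoding pseudo-attribute of an XML declaration that sits at
# the very start of the data (ASCII-compatible encodings only).
_XML_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def detect_xml_encoding(data, default="utf-8"):
    """Hypothetical helper: guess the encoding of XML given as bytes."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    match = _XML_DECL.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    # Nothing found: fall back to "utf-8" rather than None.
    return default


print(detect_xml_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><a/>'))
print(detect_xml_encoding(b"<a/>"))
```

Returning the default instead of None means callers can always pass the result straight to a decode call, which is the convenience I was after.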
I wonder if anyone actually needs or uses this module anyway; for now it is more or less an exercise in understanding the spec dependencies and the whole encoding thing, and maybe in improving my Python a bit (the buildlog function is definitely something I will use in quite a few places, as the logging module is nice but a bit complicated to set up for a simple log).
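For illustration, a helper in the spirit of buildlog could look roughly like this. Only the name comes from encutils; the parameters and the log format here are assumptions of mine, not the actual signature.

```python
import logging
import sys


def buildlog(name="simplelog", level=logging.INFO, stream=None):
    """Return a logger with a single stream handler (hypothetical signature)."""
    log = logging.getLogger(name)
    log.setLevel(level)
    if not log.handlers:  # do not stack handlers on repeated calls
        target = stream if stream is not None else sys.stderr
        handler = logging.StreamHandler(target)
        handler.setFormatter(logging.Formatter("%(levelname)s\t%(message)s"))
        log.addHandler(handler)
    return log


log = buildlog("encutils-demo")
log.info("encoding guessed from HTTP header")
```

The point is simply to hide the getLogger, handler and formatter steps behind one call.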
August 21, 2005
The one thing about Java I like is the javadoc files in a standard format; Python seems to have a few formats rather than one standard, but I guess that is ok.
I generated API documentation for both cssutils and encutils with epydoc. As the inline documentation is in reStructuredText format anyway, this was quite simple. I had not realized before that epydoc (aside from its own markup) supports ReST markup; a simple command line switch or __docformat__ = 'restructuredtext' in each file is sufficient (thanks to the docutils mailing list!). A few adjustments in the comments and a nice package of docs is done. Actually an editor supporting ReST while writing code would be very nice; maybe this will come out of the docutils project sometime, but I guess not in the near future. If I were a better programmer I would give it a try, but I guess it would take me years to do that… (and I should at least try to finish my own projects before that).
August 17, 2005
A new release which at least acknowledges a possible XML declaration of a document. No autodetection (e.g. of a possible BOM) is done yet. I wonder if the guessEncoding function should do this anyway; I guess an optional parameter (like useBOM) would be better than enabling this by default. Depends on what you want to do with the script (if anything). The specs should be followed, and useBOM seems a good way to do that.
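Roughly what I have in mind, as a sketch only: the names guessEncoding and useBOM are from above, but the rest of the signature (the data and declared parameters) and the behaviour are assumptions, not the released code.

```python
import codecs

# UTF-32 BOMs come first because the UTF-32 LE BOM begins with the
# UTF-16 LE BOM bytes.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def guessEncoding(data, declared=None, useBOM=False):
    """Hypothetical sketch: prefer a BOM only when the caller opts in."""
    if useBOM:
        for bom, name in _BOMS:
            if data.startswith(bom):
                return name
    # Without useBOM (or without a BOM) trust the declared encoding, if any.
    return declared


data = codecs.BOM_UTF8 + b"<a/>"
print(guessEncoding(data, declared="iso-8859-1"))
print(guessEncoding(data, declared="iso-8859-1", useBOM=True))
```

Keeping the BOM check off by default means existing callers see no change in behaviour.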
August 16, 2005
I was working on encutils and wondering if the algorithms to find the encoding from an HTTP header, XML declaration, HTML meta information and also the default encodings for content types are complete. It is quite complicated, as quite a few RFCs and specs define the problem. I hope I have these complete (probably not); maybe I should post it to xml-dev, or is there a better newsgroup or mailing list?
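As a sketch of the kind of precedence I mean, based on my simplified reading of HTTP/1.1 and RFC 3023 and certainly not complete: the function name resolve_encoding and its parameters are invented for this example, and it expects the media type and the charset to have been pulled out of the header already.

```python
def resolve_encoding(media_type, http_charset=None, document_encoding=None):
    """
    Hypothetical sketch of an encoding precedence.

    media_type        -- e.g. "text/html", without parameters
    http_charset      -- charset parameter of the HTTP Content-Type header
    document_encoding -- encoding found in the XML declaration or HTML meta
    """
    media_type = media_type.lower()
    # 1. An explicit charset in the HTTP header wins.
    if http_charset:
        return http_charset.lower()
    # 2. text/xml without charset is us-ascii; the XML declaration is ignored.
    if media_type == "text/xml" or (
        media_type.startswith("text/") and media_type.endswith("+xml")
    ):
        return "us-ascii"
    # 3. Otherwise information inside the document is used.
    if document_encoding:
        return document_encoding.lower()
    # 4. Defaults per media type.
    if media_type == "application/xml" or media_type.endswith("+xml"):
        return "utf-8"
    if media_type.startswith("text/"):
        return "iso-8859-1"
    return None


print(resolve_encoding("text/xml", document_encoding="utf-8"))
print(resolve_encoding("application/xml", document_encoding="ISO-8859-1"))
print(resolve_encoding("text/html"))
```

The text/xml case is the surprising one: the declaration inside the file does not count at all when the header carries no charset.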
July 4, 2005
I suspected there should be libraries in the Python lib that do what I was looking for in encutils and today I found them. I looked through “Python In A Nutshell” again and found at least mimetools.Message which I – at least partly – re-did.
I guess there is still much to learn and know, the standard lib does contain lots of stuff I suspect it would contain but have not noticed yet.
Well, at least I got some more experience writing stuff… (and maybe encutils contains a bit more than the stuff that’s already available).
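For anyone reading this with a current Python: mimetools is gone in Python 3, but the email package in the standard lib does the same kind of header parsing. A small sketch, where the wrapper parse_content_type is just a name I made up:

```python
from email import message_from_string


def parse_content_type(header_value):
    """Hypothetical wrapper: split a Content-Type value into type and charset."""
    msg = message_from_string("Content-Type: %s\n\n" % header_value)
    # get_content_charset() returns None when no charset parameter is given.
    return msg.get_content_type(), msg.get_content_charset()


print(parse_content_type("text/html; charset=ISO-8859-1"))
print(parse_content_type("application/xml"))
```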
June 26, 2005
This small set of helper functions grew out of work on an addition to the cssutils package. The whole issue of encodings relating to HTTP header information, media types, and encoding information in the files themselves (like HTML meta elements or the XML declaration) is a tricky business. I read the O’Reilly article “XML on the Web Has Failed” quite a while ago and actually used it as a starting point. I guess I missed lots of tiny bits in the array of specifications, but hopefully this thing will prove useful.
I could not find any Python library doing this kind of stuff (I currently need HTML and XML only), so I thought that's a small enough project I might be able to handle. If there is a similar library or so available please let me know, but at least it was kind of fun to work on it.
If you find any problems, issues or spec violations please tell me too…