I’ve just moved an improved license validator into place. While I had initially planned to migrate away from straight CGI coding with this update, the gods were against me. I started prototyping in Quixote and ran into problems with Unicode encoding, so I decided to put the conversion on hold. If I figure out how to emit Unicode from Quixote, I’ll probably do it at some point in the future.
This update does, however, have some real improvements, all of which were suggested by the excellent testers on the cc-metadata mailing list. These include:
- Separate forms for URL and paste-in validation; previously the URL always overrode the text area, which could be frustrating
- The addition of some sample RDF, as well as convenience buttons on the form
- Better error reporting. If your RDF (XML) is not well-formed, the validator will now tell you where the error occurs (if it knows); more on this in a moment
- When parsing a web page, problems in one RDF segment won’t affect validation of other segments
- You can now view the raw RDF from the results page by clicking the “show raw RDF” links
- Things that look like links now are links
- Magnets are properly magnetized
- rdf:about is displayed
- If an RDF segment does not contain a license, the validator says so, and then tells you what it might contain
- Various cosmetic changes
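The error-location and per-segment behaviour described in the list can be illustrated with Python’s standard library. This is a minimal sketch, not the validator’s actual code, which relies on RDFlib. It assumes `xml.dom.minidom` as the parser, and the function names `check_segment` and `check_all` are hypothetical.

```python
import xml.dom.minidom
from xml.parsers.expat import ExpatError


def check_segment(rdf_text):
    """Return None if rdf_text is well-formed XML, else (message, line, column)."""
    try:
        xml.dom.minidom.parseString(rdf_text)
    except ExpatError as err:
        # expat reports a 1-based line number and a 0-based column offset
        return (str(err), err.lineno, err.offset)
    return None


def check_all(segments):
    """Check each RDF segment on its own, so one broken block can't hide the others."""
    results = []
    for index, segment in enumerate(segments):
        results.append((index, check_segment(segment)))
    return results
```

Because each segment is parsed separately, a mismatched tag in the first block is reported with its line number, while a later, valid block still comes back clean.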
With such a large number of changes, the source code layout has also changed. Instead of a single file, the code is now split into a handful of modules. I’ll have a tarball up later this evening.
A note regarding validation: currently the validator only “validates” as much as RDFlib does; I’m working on that. Further, if you receive some weird traceback, or an “unknown error” message, please e-mail me.
Thanks for all the suggestions and trouble reports; try it out and let me know what you think.
Thanks to the feedback I’ve received about the Creative Commons license validator, I now have a list of improvements and changes to implement. I’ve also decided that now is the time to move away from emitting plain-jane HTML with print statements, and move to a framework that actually helps me out. To that end, I’ve initially decided to use Quixote. I’ve used Quixote casually a few times, and think it fits for a few reasons:
- it’s lightweight
- it’s very code-oriented: a function or module is a page
- its templating language is about as simple as you get
New things that will be implemented include better error messages, sample RDF in the form for instant gratification, and a more useful summary of your RDF input. If you have any other suggestions, please let me know.
It seems like all I’m writing about here lately is ccValidator and ccLicense, but I suppose that’s OK. After my announcement this morning, it was pointed out that the validator didn’t handle Unicode properly. So, some more regexing (for the XML encoding) and a bit of print "foo".encode(encoding) action, and ta-da! A new ccValidator, now with a gooey Unicode center.
Thanks, Maxime (http://www.organigramme.net/); keep those bug reports coming.
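The regex-plus-encode fix described above might look roughly like this. It is a Python 3 sketch under stated assumptions: the exact pattern ccValidator uses isn’t given. `XML_ENCODING`, `detect_encoding`, `to_output`, and the UTF-8 defaults are all hypothetical, and the XML declaration is assumed to be ASCII-compatible.

```python
import re

# Hypothetical pattern: captures the encoding pseudo-attribute of an XML
# declaration, e.g. <?xml version="1.0" encoding="iso-8859-1"?>
XML_ENCODING = re.compile(r'<\?xml[^>]*?encoding=["\']([A-Za-z][A-Za-z0-9._-]*)["\']')


def detect_encoding(raw_bytes, default="utf-8"):
    """Return the encoding named in the XML declaration, or a default."""
    # Only the declaration matters here; assume it is ASCII-compatible.
    head = raw_bytes[:200].decode("ascii", "replace")
    match = XML_ENCODING.search(head)
    return match.group(1).lower() if match else default


def to_output(raw_bytes, output_encoding="utf-8"):
    """Decode input with its declared encoding, then encode it for the response."""
    text = raw_bytes.decode(detect_encoding(raw_bytes))
    return text.encode(output_encoding)
```

The design point is to decode to Unicode as early as possible, using whatever the document declares, and to encode only once at output time.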
Based on yesterday’s excellent feedback, I’ve updated the ccValidator code to its new, improved version. Things fixed include:
- handling of large RDF blocks
- handling of work information is greatly improved (multiple works are supported, sub-segments render properly, etc.)
- tweaks to handle changes made in ccLicense.py
So go on, validate your license, and let me know if you run into any problems. I still need to wrap the exception handling better so a CGI traceback doesn’t spew when something doesn’t validate, and I’d like to modify the validation script to use some sort of templating (instead of print statements). Suggestions for a good, Pythonic templating solution?
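One possible answer is the standard library’s `string.Template`, which keeps the whole page in one template string with `$` placeholders. This is offered as a suggestion, not something ccValidator currently does. The page layout and the `render_result` helper below are hypothetical.

```python
from html import escape
from string import Template

# Hypothetical results page; $-placeholders are filled in by substitute().
RESULT_PAGE = Template("""<html>
<head><title>License validation: $title</title></head>
<body>
<h1>$title</h1>
<p>License: <a href="$license_url">$license_url</a></p>
</body>
</html>
""")


def render_result(title, license_url):
    """Build the result page as one string instead of a series of print statements."""
    # Escape user-supplied values so they can't inject markup into the page.
    return RESULT_PAGE.substitute(
        title=escape(title),
        license_url=escape(license_url, quote=True),
    )
```

The CGI then needs a single `print(render_result(...))` after the headers. The layout can change without touching the parsing code.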
Shortly after announcing the validator to the cc-metadata list this morning, I had my first bug report. In my defense, it’s not even a bug in my code, per se, but some invalid metadata. However, I made several discoveries that I’m working to roll into both ccValidator and ccLicense.py. In no particular order:
- Python 2.2 seems to have a broken re module that barfs on really long matches. 2.2.3 and the whole 2.3 series seem to have this fixed (note: this is just my observation; if anyone can confirm or correct it, feel free)
- ccLicense.py returns incorrect results if the RDF block defines more than one work and…
- …I was trying way too hard to parse the work metadata; some simple TripleStore action will do fine, thank you
- And finally, the way I extracted the licenses was also a little embarrassing.
So I’ve fixed the last three, and my webhost is graciously upgrading Python as we speak (I hope), so a new, improved, working-better-than-ever version of the validator should be ready real soon now. And I’m just kicking myself that I didn’t think of the last three issues sooner.
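The “simple TripleStore action” idea is that once the RDF is parsed into (subject, predicate, object) triples, each work and its licenses can be found with pattern queries instead of hand-walking the XML. RDFlib’s store isn’t reproduced here. This sketch uses a hypothetical pure-Python `match` helper in the same spirit, with `None` as a wildcard. The `cc:` namespace URI is the one Creative Commons RDF used at the time.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
CC = "http://web.resource.org/cc/"


def match(triples, subject=None, predicate=None, obj=None):
    """Yield triples matching a pattern; None acts as a wildcard (hypothetical helper)."""
    for s, p, o in triples:
        if subject is not None and s != subject:
            continue
        if predicate is not None and p != predicate:
            continue
        if obj is not None and o != obj:
            continue
        yield (s, p, o)


def works_and_licenses(triples):
    """Map every cc:Work to the licenses it points to, however many works there are."""
    result = {}
    for work, _, _ in match(triples, predicate=RDF_TYPE, obj=CC + "Work"):
        result[work] = [o for _, _, o in match(triples, subject=work, predicate=CC + "license")]
    return result
```

Because works are found by type rather than position, a block that defines several works yields one entry per work. An empty license list is how “this segment has no license” would show up.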
Well, what good is a weekend not spent coding? In this case, I managed to hack together a very rudimentary RDF parser/validator specialized for Creative Commons licenses. You can find the web interface here and the source code for the CGI here. The Python module that does most of the heavy lifting, cclicense.py, will be up soon. I’d love your feedback on any or all of it.
Right now it can parse raw RDF or retrieve a URL and scan it for RDF (using a simple regular expression). In either case, it parses the RDF and spits out the licensing and work information it finds.
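A scan like the one described might look something like this. The validator’s actual pattern isn’t given, so `RDF_BLOCK`, `find_rdf_blocks`, and `fetch_page` are hypothetical names in a Python 3 sketch.

```python
import re
from urllib.request import urlopen

# Non-greedy match from an opening <rdf:RDF ...> tag to the next </rdf:RDF>;
# re.DOTALL lets "." cross newlines, since RDF blocks usually span many lines.
RDF_BLOCK = re.compile(r"<rdf:RDF\b.*?</rdf:RDF>", re.DOTALL)


def find_rdf_blocks(page):
    """Return every <rdf:RDF>...</rdf:RDF> segment found in an HTML page."""
    return RDF_BLOCK.findall(page)


def fetch_page(url):
    """Retrieve a page and decode it using the charset from the HTTP headers."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, "replace")
```

The non-greedy `.*?` keeps two embedded blocks from being swallowed as one. It is also the kind of very long match that tripped up Python 2.2’s re module, as noted in a later post.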
It still needs some work when it comes to parsing work description information, especially sub-elements (like Agents). I’d also love to hear suggestions for improving the output mechanism; it currently runs as a simple CGI, so the result page’s HTML is emitted by hand with print statements. Any suggestions for making this work smarter?