Issue 9 - Revision 8  /   February 7, 2005 


 

Improving WebDAV in Zope and Plone
-Leveraging Standards to Improve the WebDAV experience... NOW!
- - - - - - - - - - - -

By Sidnei da Silva  | September 20, 2004


____
 
 
Sidebar - Definitions
WebDAV: WebDAV stands for "Web-based Distributed Authoring and Versioning". It is a set of extensions to the HTTP protocol which allows users to collaboratively edit and manage files on remote web servers. With the right tools, editing documents can be as easy as editing a file on your local machine, without having to use a Web browser.
Schema: A schema in Archetypes is simply an abstraction of the properties belonging to your object. It is built from field objects, each responsible for representing one type of data, such as an integer with IntegerField, a string with StringField, and so on.
Dublin Core Metadata Initiative (DCMI): An open forum engaged in the development of interoperable online metadata standards that support a broad range of purposes and business models. DCMI's activities include consensus-driven working groups, global workshops, conferences, standards liaison, and educational efforts to promote widespread acceptance of metadata standards and practices.
 
____
Introduction

New tools are continually being created to enable new features and faster development of the most varied kinds of applications. Naturally, the main focus has been on browser interaction. Plone has proven especially suited to this niche and is widely recognized as one of the best Open Source Content Management Systems available.

However, Content Management is not only about being able to control your content. The Web browser is a universal tool, available on the most varied platforms, that lets people reach and control their content from anywhere; even so, it is not rich enough to match established tools with years of acceptance and tight desktop integration. This is the domain of Document Management, which is increasingly converging with Content Management.

The WebDAV protocol was created to integrate Web content into desktop authoring environments. WebDAV stands for “Web-based Distributed Authoring and Versioning”. It is a set of extensions to the HTTP protocol which allows users to collaboratively edit and manage files on remote Web servers. WebDAV has been present in Zope since the early days as a supported way of interacting with content, along with FTP and XML-RPC, but it has mostly been ignored for the reasons mentioned above.

Providing a reasonable WebDAV experience is not a Herculean task, but it does require some preparation and thought. Care must be taken not to break backwards compatibility. All contracts must be clear and predictable, and the scope must be well defined so that we don't get bogged down in overengineering and endless discussions.

Over the last couple of months, while working for Enfold Systems, I've developed a couple of tools to improve and provide better control over the process of creating and modifying content via WebDAV. In the following pages I will explain the goals of each of these tools and discuss how they can be used together to build a better WebDAV experience by employing existing standard technologies and libraries, instead of reinventing the wheel.

3C: Controlling Content Creation

The first area where control is needed when using WebDAV is consistent behavior when you upload a piece of content. What kind of resource should be created? What is the format? How should the content be extracted from the upload? Is metadata available in the upload?

For this purpose, a tool exists in CMF which decides what object should be created given an uploaded file. This tool is called “Content Type Registry” and consists of a series of “predicates” that are applied in sequence until one of them matches certain aspects of the upload request. When a match occurs, the “portal_type” associated with the matching predicate is used to create a new content object.

This tool is available on stock CMF and Plone sites, and can be found at the root of the site under the “content_type_registry” id. Though the UI for controlling the predicates is not optimal (and, in fact, can be extremely confusing), a lot of power is hidden there, and with the right configuration it can save you a lot of sweat.

Each predicate is composed of a configurable “rule” which maps to exactly one “portal_type”. By default, the available predicates that you can choose from are the following:

Figure 1: Content Type Registry inside a Plone Site

  • major_minor: Matches on the major and minor parts of a content type header. Examples: “text/html”, “text/plain”, “application/pdf”. You can select just the major part (the one before the “/”), both parts, or none. If you select just the major part, the predicate will apply to anything which starts with it. If you select none, the predicate will match any content type.
  • extension: Matches on the filename extension of the uploaded file. Examples: filenames ending with “.gif”, “.jpg” and “.png” are matched to the “Image” portal_type.
  • mimetype_regex: A regular expression is matched to the content type header. This is very similar to “major_minor”, except that it can match a wider range of content types.
  • name_regex: A regular expression is matched against the filename. This is similar to the “extension” predicate, except that it can also match the part of the filename before the extension. Example: filenames of the form “doc[0-9].html” are matched to the “Document” portal type.

As you can see, these predicates can be very useful for simple cases, where you have an “opaque” piece of content being uploaded and your tools behave accordingly.
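
The first-match behavior described above can be illustrated with a short Python sketch. It is a simplified model, not the CMF implementation: `ContentTypeRegistrySketch`, `find_type_name` and the predicate factories are hypothetical names, predicates are plain functions of the filename and Content-Type header, and `mimetype_regex` is omitted because it works like `name_regex` applied to the content type.

```python
import re
from typing import Callable, List, Optional, Tuple

# Hypothetical model: a predicate receives the filename and Content-Type header.
Predicate = Callable[[str, str], bool]


def major_minor(major: str = "", minor: str = "") -> Predicate:
    """Match 'major/minor'; an empty part acts as a wildcard."""
    def match(filename: str, content_type: str) -> bool:
        if "/" not in content_type:
            return False
        ct_major, ct_minor = content_type.split("/", 1)
        return (not major or ct_major == major) and (not minor or ct_minor == minor)
    return match


def extension(*extensions: str) -> Predicate:
    """Match on the filename extension (case-insensitive)."""
    wanted = {ext.lower() for ext in extensions}
    def match(filename: str, content_type: str) -> bool:
        _, dot, ext = filename.rpartition(".")
        return bool(dot) and ext.lower() in wanted
    return match


def name_regex(pattern: str) -> Predicate:
    """Match a regular expression against the start of the filename."""
    compiled = re.compile(pattern)
    def match(filename: str, content_type: str) -> bool:
        return compiled.match(filename) is not None
    return match


class ContentTypeRegistrySketch:
    """Hypothetical stand-in: predicates are tried in order, first match wins."""

    def __init__(self) -> None:
        self.predicates: List[Tuple[Predicate, str]] = []

    def add(self, predicate: Predicate, portal_type: str) -> None:
        self.predicates.append((predicate, portal_type))

    def find_type_name(self, filename: str, content_type: str) -> Optional[str]:
        for predicate, portal_type in self.predicates:
            if predicate(filename, content_type):
                return portal_type
        return None  # nothing matched; the caller must decide what to do


registry = ContentTypeRegistrySketch()
registry.add(extension("gif", "jpg", "png"), "Image")
registry.add(name_regex(r"doc[0-9]\.html$"), "Document")
registry.add(major_minor("text"), "Document")
registry.add(major_minor(), "File")  # empty major and minor: any content type
```

Because evaluation stops at the first match, ordering matters: the catch-all `major_minor()` entry only makes sense at the end of the list.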

However, one may want extra flexibility, such as the ability to control the type of content being created based on some value inside the uploaded file. This is especially true for XML files, where you may want to encode the content type to be created in a special tag, or select a different content type depending on whether a tag is present.

Figure 2: Content Type Registry interface, showing the new predicates added by the CTRExtras product

For such special needs, a new product has been created to provide an additional set of predicates. This product is called “CTRExtras” and provides the following new predicates:

  • html_meta_headers: Matches the content of an HTML “<meta>” tag and maps it to a content type. Example: <meta name="Type" content="Document" />
  • rfc822_headers: Matches an RFC822-style header value and maps it to a content type. Example: Type: Document
  • xmlns_predicate: Matches some combination of element namespace, element name, attribute namespace, attribute name and attribute value to select the content type. Examples:

    <type>Document</type>

    <type xmlns="http://cmf.zope.org/namespaces/default/"> Document</type>

    <metadata xmlns="http://plone.org/ns/archetypes/" xmlns:cmf="http://cmf.zope.org/namespaces/default/" cmf:type="Document" />

    <metadata type="Document" />

    <metadata xmlns:cmf="http://cmf.zope.org/namespaces/default/" cmf:type="Document" />

Of the examples above, certainly the most interesting one is that for XML. This kind of flexibility is desired, indeed required, given the nature of XML itself: it is possible to express a piece of information in several different ways.
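
Since the xmlns_predicate is essentially a namespace-aware lookup, a sketch using Python's standard ElementTree module shows how the examples above could be matched. The function `xmlns_type` and its parameters are hypothetical, and this is not the CTRExtras code; it simply returns the type name found in the text or attribute of the first matching element.

```python
import xml.etree.ElementTree as ET
from typing import Optional

CMF_NS = "http://cmf.zope.org/namespaces/default/"
AT_NS = "http://plone.org/ns/archetypes/"


def _qname(namespace: Optional[str], name: str) -> str:
    """Build ElementTree's '{namespace}name' form (plain name if no namespace)."""
    return "{%s}%s" % (namespace, name) if namespace else name


def xmlns_type(data: str, element_ns: Optional[str], element_name: str,
               attr_ns: Optional[str] = None,
               attr_name: Optional[str] = None) -> Optional[str]:
    """Hypothetical predicate: return the type named by the first matching
    element, or None if nothing matches.

    With attr_name set, the type is read from that attribute; otherwise it
    is taken from the element's text content.
    """
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        return None  # not well-formed XML: the predicate does not apply
    # iter() includes the root element itself as well as all descendants.
    for elem in root.iter(_qname(element_ns, element_name)):
        if attr_name is None:
            value = (elem.text or "").strip()
        else:
            value = (elem.get(_qname(attr_ns, attr_name)) or "").strip()
        if value:
            return value
    return None
```

For instance, `xmlns_type('<metadata type="Document" />', None, "metadata", None, "type")` reads the type from an un-namespaced attribute, while the namespaced variants only match when the namespace URIs agree exactly.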

Importing and Exporting

Though many people come to Zope from other CMSs, or from no CMS at all, very little effort has been put into creating a standard way of bulk-loading existing content and pre-populating content objects from existing data. It's common practice to build one-off tools that solve only the customer's immediate problem. A similar problem arises when one has to export content from Zope to an external system, either for integration or for out-of-band publishing directly from Apache.

The biggest obstacle to a structured approach to importing and exporting content has been the lack of an introspectable content definition. With the advent of Archetypes, this situation has improved considerably: most projects are moving existing development to Archetypes, and it's now uncommon to see a new project that is not based on it.

Bearing this in mind, it is clear that having a pluggable import/export framework for Archetypes has a good chance of bridging the gap and defining the next standard for content import/export in Zope.

Figure 3: Changing an XMLNS Marshaller Predicate in the Marshaller Registry tool

Archetypes has several layers of indirection that abstract storage from presentation and allow new components to be plugged in with great flexibility. One of these layers is called “marshall”, and it's used mainly for handling the uploading and downloading of content through WebDAV and FTP. Out of the box, Archetypes provides two Marshaller implementations:

  • PrimaryFieldMarshaller: Relies on the presence of a “primary field” for import and export. When a file is uploaded, the content of the file is set as the value of this “primary field”, untouched. When a piece of content is downloaded, the contents of the primary field will be returned, also untouched.
  • RFC822Marshaller: Also makes use of the primary field, although one is not required. It takes the values of all fields in the schema and serializes them into an RFC822-like text file, which is then downloaded. When uploading, it parses the file and sets the value of each field whose name matches one of the RFC822-like headers.
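
To make the RFC822 idea concrete, here is a minimal round trip in Python using the standard `email` parser. It is a sketch under simplifying assumptions, not the Archetypes RFC822Marshaller: fields are a plain dict, `marshall` and `demarshall` are hypothetical names, the primary field (if given) is assumed to become the message body, and multi-line values, which would need header folding, are not handled.

```python
from email.parser import Parser
from typing import Dict, Optional, Tuple


def marshall(fields: Dict[str, str], primary: Optional[str] = None) -> str:
    """Write every non-primary field as a 'name: value' header line.

    Assumption of this sketch: the primary field, if any, becomes the body.
    Values must be single-line; header folding is not handled.
    """
    headers = "".join("%s: %s\n" % (name, value)
                      for name, value in fields.items() if name != primary)
    body = fields.get(primary, "") if primary else ""
    return headers + "\n" + body


def demarshall(text: str, schema: Tuple[str, ...],
               primary: Optional[str] = None) -> Dict[str, str]:
    """Parse headers back into fields, keeping only names found in `schema`."""
    message = Parser().parsestr(text)
    values = {name: value for name, value in message.items() if name in schema}
    if primary:
        values[primary] = message.get_payload()
    return values


fields = {"title": "Some Title", "subject": "Data", "text": "Hello, body."}
text = marshall(fields, primary="text")
restored = demarshall(text, ("title", "subject", "text"), primary="text")
```

Headers whose names are not in the schema are silently ignored, which mirrors the "field names matched with RFC822-like fields" behavior described above.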

Currently, Archetypes allows only one Marshaller implementation to be used at a time and doesn't check whether the uploaded file complies with the Marshaller in use. For example, if you are using an RFC822Marshaller and upload an HTML file, it's very unlikely that anything useful will happen; in fact, there's a good chance that all you will get is an exception.

In order to improve the use of this feature, the “Marshall” product has been developed. It provides the following features:

  • A “ControlledMarshaller” implementation that delegates to a tool the decision of which Marshaller will handle an upload or download request.
  • A “Marshall Registry Tool” where you can configure which Marshaller will be used depending on a series of conditions.

The Marshaller Registry approach is quite similar to the “Content Type Registry”. Several predicates are configured, and then evaluated sequentially until the first one matches. When there's a match, a Marshaller is fetched and processing is delegated to it.

In addition to the above-mentioned features, the Marshall product also provides two predicates. By default, each predicate has a “TALES Expression” that is evaluated to decide whether the predicate applies. A predicate may also be configured with additional rules; the only requirement is that it implements the “IPredicate” interface. The following variables are available to the TALES Expression:

  • object_url: You guessed it: the object's absolute URL
  • object: The object itself
  • nothing: A.k.a. “None”
  • user: The currently logged-in user
  • modules: Same one as in Page Templates
  • request: The current REQUEST
  • mode: Either “marshall” or “demarshall”
  • filename, content_type and data: The filename, content type and data of the file being uploaded

The predicates available by default are:

  • Default Predicate: The simplest predicate possible. A TALES Expression to be evaluated, which maps to a Marshaller implementation.
  • XMLNS Element/Attribute: An XML-geared predicate, closely based on the “xmlns_predicate” from “CTRExtras” and with exactly the same semantics, except that it maps to a Marshaller implementation instead of a portal_type.
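
A rough Python model of the registry's evaluation loop follows. Plain Python callables stand in for TALES expressions and receive a dict holding some of the variables listed above (`mode`, `content_type`, `data`); the class names, the marshaller labels and the fallback used when nothing matches are all assumptions of this sketch rather than features of the Marshall product.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical: the variables a TALES expression would see, as a plain dict.
Context = Dict[str, object]


class DefaultPredicateSketch:
    """A condition (standing in for a TALES expression) mapped to a marshaller."""

    def __init__(self, condition: Callable[[Context], bool], marshaller: str) -> None:
        self.condition = condition
        self.marshaller = marshaller

    def apply(self, context: Context) -> Optional[str]:
        return self.marshaller if self.condition(context) else None


class MarshallerRegistrySketch:
    """Evaluate predicates in order; the first match picks the marshaller."""

    def __init__(self, fallback: str) -> None:
        self.predicates: List[DefaultPredicateSketch] = []
        self.fallback = fallback  # assumption: used when no predicate matches

    def register(self, predicate: DefaultPredicateSketch) -> None:
        self.predicates.append(predicate)

    def get_marshaller(self, context: Context) -> str:
        for predicate in self.predicates:
            chosen = predicate.apply(context)
            if chosen is not None:
                return chosen
        return self.fallback


registry = MarshallerRegistrySketch(fallback="rfc822")
# Uploads that look like XML documents go to an XML-aware marshaller.
registry.register(DefaultPredicateSketch(
    lambda ctx: ctx["mode"] == "demarshall"
    and str(ctx.get("data", "")).lstrip().startswith("<?xml"),
    "atxml"))
# HTML is stored untouched in the primary field.
registry.register(DefaultPredicateSketch(
    lambda ctx: ctx.get("content_type") == "text/html", "primary_field"))
```

Selecting the marshaller per request, rather than fixing one per content type, is what avoids the "HTML uploaded to an RFC822Marshaller" failure described earlier.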

ATXML: Towards a standard format for exchanging Archetypes data/metadata

During my short history of development in the Zope community, I've been constantly surprised by the attitude of most developers when they are faced with a request for new features. Their gut reaction, nearly 90% of the time, goes like this: “Nothing existing fits the job. Let's write something from scratch.” In doing this, they usually choose non-standard ways of dealing with the problem.

While this might solve the immediate problem and give the developer a warm feeling of inner satisfaction, in the long term it just yields a tangle of gnarly code and tons of different tools that don't play well with each other. More than that, it causes more pain down the road, when someone who didn't develop the system is recruited to maintain it.

As you can imagine, besides having to understand the gnarly code, that person also needs to figure out an undocumented, non-standard format. As a result, he may end up rewriting large parts of the system from scratch before even trying to find where the real problem lies.

I had this issue in mind for a long time when, to my surprise, Paul Everitt came up with the idea of defining an XML-based format, if possible with “RelaxNG” validation, which could be used for importing and exporting content, specifically Archetypes-based content. He had a couple of requirements, such as being able to roundtrip content while keeping “UIDs” and “references” intact and being able to make non-UID-based references.

The first suggestion was to use RDF. Though this initially seemed like a great idea, it was quickly dropped because of our limited familiarity with it and because we would have had to introduce yet another library dependency, which was not desired in the short term. The fact that we expect this to be available by default in the next major release of Plone is yet another reason not to add more dependencies.

If you are familiar with RelaxNG, or with XML validation in general, you may already suspect a reasonably complex issue lurking here. Archetypes schemas can define arbitrary fields, and each field can be of one of many different types. It gets even trickier: a field may be validated by several validators that exist only as Python code, which we cannot introspect or easily map into an XML schema.

Thus, the interim solution we've come up with is the following:

  • Define a single base RelaxNG schema for basic validation
  • Use existing vocabularies where the mapping applies, so that we have some level of interoperability with external systems
  • Provide a registry mapping Archetypes schemas to RelaxNG schemas, so that every schema can be validated at a more fine-grained level
  • Be able to export an Archetypes schema to a RelaxNG schema to support the above

Figure 4: Editing an ATXML document in Emacs, using nxml-mode and the ATXML RelaxNG Schema. nxml-mode shows the valid tags for the current context.

Obviously, mapping an entity as complex as an Archetypes schema to RelaxNG is not a lossless conversion. However, extra care will be taken to keep the loss as low as possible.

At this moment, we have not yet started on the RelaxNG schema export for the registry.

One of the ideas we have in mind for the RelaxNG registry is to use an extension of the format used by James Clark's nxml-mode for its schema locating files.

Deconstructing ATXML

The ATXML RelaxNG schema is relatively simple, yet still powerful given the recursive nature of RelaxNG. We tried to map the most commonly used fields to the “Dublin Core Specification for XML”, as CMF by default provides a working subset of the Dublin Core element set.

For Archetypes references, we tried to be flexible and allow three distinct kinds of references:

  • UID references, when you know the UID for the target element
  • Path references, when you don't know the UID, but you do know the relative path within the site
  • Metadata references, when you know neither the UID nor the path, but you know some metadata that might uniquely identify the piece of content within the site.

UID and path references are resolved directly at runtime, when the file is being processed. Metadata references are resolved by doing a Catalog query, and if there's exactly one match, the reference is made by using the object found. If zero matches or more than one match is found, an exception is raised, signaling the error.
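
The three resolution strategies can be sketched in Python with a list of dicts standing in for the site and its catalog. All names here (`resolve_reference`, `UnresolvedReference`, the `uid`/`path`/`title` keys) are hypothetical; the point is the rule stated above, that a reference must resolve to exactly one object or an error is raised.

```python
from typing import Dict, List, Union

Content = Dict[str, str]


class UnresolvedReference(Exception):
    """Hypothetical error: a reference matched zero or several objects."""


def find(objects: List[Content], criteria: Dict[str, str]) -> List[Content]:
    """A toy stand-in for a catalog query: exact match on every criterion."""
    return [obj for obj in objects
            if all(obj.get(key) == value for key, value in criteria.items())]


def resolve_reference(kind: str, value: Union[str, Dict[str, str]],
                      objects: List[Content]) -> Content:
    if kind == "uid":
        matches = find(objects, {"uid": value})
    elif kind == "path":
        matches = find(objects, {"path": value})
    elif kind == "metadata":
        matches = find(objects, value)  # value is a dict of metadata criteria
    else:
        raise ValueError("unknown reference kind: %r" % kind)
    if len(matches) != 1:
        raise UnresolvedReference("%s reference %r matched %d objects"
                                  % (kind, value, len(matches)))
    return matches[0]


site = [
    {"uid": "a1", "path": "news/launch", "title": "Launch", "portal_type": "News Item"},
    {"uid": "b2", "path": "docs/intro", "title": "Intro", "portal_type": "Document"},
    {"uid": "c3", "path": "docs/faq", "title": "FAQ", "portal_type": "Document"},
]
```

Here a metadata reference such as `{"title": "Launch"}` resolves, whereas `{"portal_type": "Document"}` is ambiguous and raises `UnresolvedReference`.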

Looking at the schema definition, you will probably notice (if you have some basic knowledge of RelaxNG) that pretty much everything is optional. This is especially handy when you want to make bite-sized updates using XML, such as updating only one field or a small set of fields without having to provide values for the whole schema.

The possible elements have been split into three main groups:

  • DublinCore: A subset of the Dublin Core specification that we support. Note that we also used XSD datatypes to validate element contents. The Dublin Core Specification for XML says that any Dublin Core element may be specified multiple times. We extended this to ArchetypesFields; during processing, multiple values are collected either as lists or as strings joined by newlines, depending on the target field.

  • DateInfo: CreateDate and ModifyDate, expressed using the Adobe XMP namespace. We decided to use these instead of the suggested Dublin Core elements because they provide a more concise and stricter date format.

  • ArchetypesFields: Archetypes fields that do not fit into either of the two categories above. They have an “id” attribute which maps to the field name, and the field value is expressed as the content of the element, with the exception of ReferenceFields. These instead express the reference using a choice of “uid”, “path” or “metadata” elements, where the “metadata” element is a recursive definition allowing you to use the same format as the whole file, as if you had included an excerpt from an external file.
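
Collecting repeated Dublin Core elements can be sketched with ElementTree. In this hypothetical helper, which fields receive a list and which receive a newline-joined string is passed in by hand via `list_fields`; in Archetypes that decision follows the type of the target field.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, Iterable, List, Union

DC_PREFIX = "{http://purl.org/dc/elements/1.1/}"


def collect_dublin_core(data: str,
                        list_fields: Iterable[str]) -> Dict[str, Union[str, List[str]]]:
    """Gather repeated dc:* children of the root element.

    Fields named in `list_fields` (a hypothetical, hand-written setting)
    keep a list of values; all others are joined with newlines.
    """
    as_lists = set(list_fields)
    collected: Dict[str, List[str]] = defaultdict(list)
    for elem in ET.fromstring(data):  # direct children of the root only
        if elem.tag.startswith(DC_PREFIX):
            name = elem.tag[len(DC_PREFIX):]
            collected[name].append((elem.text or "").strip())
    return {name: values if name in as_lists else "\n".join(values)
            for name, values in collected.items()}


SAMPLE = """<metadata xmlns="http://plone.org/ns/archetypes/"
    xmlns:cmf="http://cmf.zope.org/namespaces/default/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <cmf:type>Press Release</cmf:type>
  <dc:title>Some Title</dc:title>
  <dc:subject>Data</dc:subject>
  <dc:subject>Test</dc:subject>
  <dc:contributor>sidnei</dc:contributor>
  <dc:contributor>alan</dc:contributor>
</metadata>"""

result = collect_dublin_core(SAMPLE, list_fields=["subject"])
```

With these settings, `result["subject"]` is a list of two values, while the two contributors are joined into a single newline-separated string; the `cmf:type` element is ignored because it is not in the Dublin Core namespace.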

Exposing extra features through WebDAV

When interacting with Zope exclusively through WebDAV, one may want to retrieve extra information, such as workflow state, permission settings, Dublin Core metadata, etc. By default, Zope doesn't make this information available through WebDAV. However, given the way WebDAV is implemented in Zope, especially PROPFIND requests, it was relatively simple for us to build a framework that simplifies exposing those features.

When Zope's WebDAV implementation receives a PROPFIND query it looks for an object's “property sheets” (which are by default contained in a “propertysheets” attribute, which is an instance of “OFS.PropertySheets.DefaultPropertySheets”) to build a proper response. The property sheets implementation then computes a list of existing property sheets on the object, extending this list with a “DavProperties” instance, which is a “virtual” (in the sense that it's non-persistent) property sheet created at runtime.
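
For readers curious about what travels over the wire, the following standard-library sketch builds a PROPFIND request body asking for one property and extracts that property from a 207 Multi-Status response, as defined by the WebDAV specification (RFC 2518). davlib does this work for you in the example further down; `propfind_body`, `parse_multistatus` and the sample response are illustrative only.

```python
import xml.etree.ElementTree as ET
from typing import Dict

ET.register_namespace("D", "DAV:")


def propfind_body(namespace: str, name: str) -> str:
    """Build a PROPFIND request body asking for a single property."""
    propfind = ET.Element("{DAV:}propfind")
    prop = ET.SubElement(propfind, "{DAV:}prop")
    ET.SubElement(prop, "{%s}%s" % (namespace, name))
    return ET.tostring(propfind, encoding="unicode")


def parse_multistatus(xml_text: str, namespace: str, name: str) -> Dict[str, str]:
    """Map each response href to the property's text, for 200-status propstats."""
    result = {}
    root = ET.fromstring(xml_text)
    for response in root.findall("{DAV:}response"):
        href = response.findtext("{DAV:}href", "")
        for propstat in response.findall("{DAV:}propstat"):
            if " 200 " not in propstat.findtext("{DAV:}status", ""):
                continue  # e.g. 404: the property is absent on this resource
            elem = propstat.find("{DAV:}prop/{%s}%s" % (namespace, name))
            if elem is not None:
                result[href] = elem.text or ""
    return result


WF = "http://cmf.zope.org/propsets/dcworkflow"
RESPONSE = """<D:multistatus xmlns:D="DAV:" xmlns:wf="%s">
  <D:response>
    <D:href>/plone/Members/admin/index_html</D:href>
    <D:propstat>
      <D:prop><wf:review_state>visible</wf:review_state></D:prop>
      <D:status>HTTP/1.1 200 OK</D:status>
    </D:propstat>
  </D:response>
  <D:response>
    <D:href>/plone/Members/admin</D:href>
    <D:propstat>
      <D:prop><wf:review_state/></D:prop>
      <D:status>HTTP/1.1 404 Not Found</D:status>
    </D:propstat>
  </D:response>
</D:multistatus>""" % WF
```

A virtual property sheet only has to supply the property values; building the multistatus envelope remains the job of Zope's existing PROPFIND handling.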

Based on this analysis, we decided that it would be trivial to patch this mechanism in order to provide additional “virtual” property sheets which would then be exposed through a PROPFIND query.

To make this even more powerful, and to add some control over how and when the properties are made available, a new tool named “property_set_registry” was created. The tool itself is nothing more than a container for “Property Set Predicates”, which are evaluated to build a list of “virtual property sheets” to be made available for a given object.

Figure 5: Showing the Property Set Registry tool inside a Plone site

Each predicate has a permission, checked in the context of the target object, and a TALES expression. If the permission check succeeds and the TALES expression evaluates to true, the predicate applies to the given object and returns a list of “PropertySheet” instances. To achieve maximum flexibility with a minimal set of dependencies, these features were made available in a standalone product called “PropertySets”. The PropertySets product provides only the basic framework and a dummy “PropertySetPredicate” as an example, and it works with a bare-bones Zope installation. An additional product, called “CMFPropertySets”, provides a working example of how to use the framework. It offers the following features:

  • An abstract “DynamicPropset” implementation that makes it easy to create new “virtual property sheets” by subclassing and overriding some methods and attributes.
  • A “DublinCoreProperties” implementation, which exposes the Dublin Core metadata for objects that implement the “Products.CMFCore.interfaces.DublinCore.DublinCore” interface.
  • A “DCWorkflowProperties” implementation, which exposes the “catalog variables” for an object that has an associated “DCWorkflow”.

Figure 6: Adding a Property Set Predicate to the Property Set Registry
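
The predicate logic can be modelled in a few lines of Python. This is a hypothetical sketch, not the PropertySets API: a set of permission names stands in for Zope's security check, a Python callable replaces the TALES expression, and the registry is assumed to collect sheets from every applicable predicate rather than stopping at the first match. The namespace URIs are the ones used in the example further down.

```python
from typing import Callable, Dict, List, Set

Obj = Dict[str, str]


class VirtualSheet:
    """A non-persistent property sheet computed from the object on demand."""

    def __init__(self, namespace: str, compute: Callable[[Obj], Dict[str, str]]) -> None:
        self.namespace = namespace
        self.compute = compute

    def properties(self, obj: Obj) -> Dict[str, str]:
        return self.compute(obj)


class PropertySetPredicateSketch:
    """Applies when the user holds `permission` and `condition(obj)` is true."""

    def __init__(self, permission: str, condition: Callable[[Obj], bool],
                 sheets: List[VirtualSheet]) -> None:
        self.permission = permission
        self.condition = condition
        self.sheets = sheets

    def apply(self, obj: Obj, granted: Set[str]) -> List[VirtualSheet]:
        if self.permission in granted and self.condition(obj):
            return list(self.sheets)
        return []


class PropertySetRegistrySketch:
    def __init__(self) -> None:
        self.predicates: List[PropertySetPredicateSketch] = []

    def sheets_for(self, obj: Obj, granted: Set[str]) -> List[VirtualSheet]:
        """Assumption: every applicable predicate contributes its sheets."""
        sheets: List[VirtualSheet] = []
        for predicate in self.predicates:
            sheets.extend(predicate.apply(obj, granted))
        return sheets


DC = "http://cmf.zope.org/propsets/dublincore"
WF = "http://cmf.zope.org/propsets/dcworkflow"

registry = PropertySetRegistrySketch()
registry.predicates.append(PropertySetPredicateSketch(
    "View", lambda obj: "title" in obj,
    [VirtualSheet(DC, lambda obj: {"title": obj["title"]})]))
registry.predicates.append(PropertySetPredicateSketch(
    "Review portal content", lambda obj: "review_state" in obj,
    [VirtualSheet(WF, lambda obj: {"review_state": obj["review_state"]})]))
```

Because the sheets are computed on each request, nothing is stored on the object, which matches the non-persistent nature of the “DavProperties” sheet described earlier.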

Show me the code

For our WebDAV project, we made some improvements to Greg Stein's davlib. Our improvements include the following features:

  • Add a per-thread callback function to allow for notification of progress when sending a request
  • Use streams, where possible, when sending requests.
  • Add some error handling to be able to distinguish between “PermanentFailure”, “TransientFailure”, “TimeoutFailure” and “AddressFailure”.
  • Quote URLs passed in to davlib.
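
The progress notification can be sketched without davlib. The Python function below is hypothetical and is not the modified davlib API: it reads a seekable stream in 16,384-byte blocks and calls the callback with `(total, position)`, the same arguments the `Callback` class in the example further down receives. Treating a false return value as a request to abort, and ignoring the per-thread storage of the callback, are assumptions of this sketch.

```python
import io
from typing import BinaryIO, Callable

BLOCKSIZE = 16384  # block size quoted for davlib.py in the example


def send_stream(stream: BinaryIO, write: Callable[[bytes], object],
                callback: Callable[[int, int], bool],
                blocksize: int = BLOCKSIZE) -> int:
    """Send `stream` in fixed-size blocks, reporting (total, position) after each.

    Assumption: a false return value from the callback aborts the upload.
    """
    stream.seek(0, io.SEEK_END)
    total = stream.tell()
    stream.seek(0)  # always start from the beginning, even if not rewound
    position = 0
    while True:
        block = stream.read(blocksize)
        if not block:
            break
        write(block)
        position += len(block)
        if not callback(total, position):
            raise RuntimeError("upload aborted by progress callback")
    return position


steps = []


def progress(total: int, position: int) -> bool:
    steps.append("%.2f%%" % (100.0 / total * position))
    return True


sent = bytearray()
payload = b"xXx" * 1024 * 30  # 92,160 bytes, the same size as in the example
send_stream(io.BytesIO(payload), sent.extend, progress)
# steps == ['17.78%', '35.56%', '53.33%', '71.11%', '88.89%', '100.00%']
```

Reading fixed-size blocks keeps memory use flat regardless of file size, which is the point of preferring streams for large uploads.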

Let's go through a complete example showing Zope's behavior with a standard Plone site, after installation and configuration of the following products:

  • PropertySets
  • CMFPropertySets
  • Marshall
  • CTRExtras

>>> from WebDAV import davlib

First, let's create a WebDAV connection to the Zope instance running on localhost:8188


>>> conn = davlib.DAV('localhost', 8188)
>>> conn.setauth('admin', '123')

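Define some helper functions

>>> from xml.sax.saxutils import escape
>>> def get_prop(conn, url, prop, depth=0):
...    r = conn.getprops(url, prop[1], ns=prop[0], depth=depth)
...    r.parse_multistatus()
...    values = []
...    for resp in r.msr.responses:
...      values.append(resp.propstat[0].prop[prop].textof())
...    return values
...
>>> def set_prop(conn, url, prop, value):
...    # Assumption: davlib's setprops() takes the property as a keyword
...    # argument plus an 'ns' keyword, mirroring getprops() above
...    kw = {str(prop[1]): escape(value), 'ns': prop[0]}
...    conn.setprops(url, **kw)
...
>>> def objectIds(conn, url):
...    prop = (u'DAV:', u'displayname')
...    return get_prop(conn, url, prop, depth=1)
...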

List the ids of the objects in a folder

>>> objectIds(conn, '/plone/Members/admin')
[u'admin', u'index_html', u'.personal']

Create a file-like object with a reasonable size

>>> from cStringIO import StringIO
>>> s = StringIO()
>>> s.write(1024*30*'xXx')
>>> s.seek(0)


Define a callback object that prints out a status percentage

>>> class Callback:
...    def __init__(self):
...      self.reset()
...    def reset(self):
...      self.steps = []
...      self.last_total = None
...      self.last_position = None
...    start = exception = reset
...    def __call__(self, total, position):
...      self.steps.append(100.0 / total * position)
...      print '%.2f%%' % self.steps[-1]
...      return True
...

Make the connection use the callback

>>> conn.set_callback(None)
>>> conn.set_callback(Callback())

Upload the file to ``test.dav``, printing a status percentage every 16,384 bytes (see BLOCKSIZE in davlib.py)

>>> conn.put('/test.dav', s)
17.78%
35.56%
53.33%
71.11%
88.89%
100.00%
<WebDAV.davlib.DAVResponse instance at 0x30077d28>

Define a helper function for getting the review state


>>> def review_state(conn, url):
...    prop = (u'http://cmf.zope.org/propsets/dcworkflow',
...    u'review_state')
...    return get_prop(conn, url, prop, depth=0)
...

Get the review state of the ``index_html`` document in the admin member folder

>>> review_state(conn, '/plone/Members/admin/index_html')
[u'visible']

Define helper functions for getting/setting the subject

>>> def subject(conn, url):
...    prop = (u'http://cmf.zope.org/propsets/dublincore',
...    u'subject')
...    return get_prop(conn, url, prop, depth=0)
...
>>> def set_subject(conn, url, value):
...    prop = (u'http://cmf.zope.org/propsets/dublincore',
...    u'subject')
...    set_prop(conn, url, prop, value)
...

Get the current subject...

>>> old_value = subject(conn, '/plone/Members/admin/index_html')
>>> print old_value
[u'']

Set a new subject...

>>> set_subject(conn, '/plone/Members/admin/index_html',
...    'WebDAV\nTutorial')
>>> subject(conn, '/plone/Members/admin/index_html')
[u'WebDAV\nTutorial']


And then restore the old subject...


>>> set_subject(conn, '/plone/Members/admin/index_html',
...    old_value[0])
>>> subject(conn, '/plone/Members/admin/index_html')
[u'']

Get the ``left_slots`` property for the Plone site...

>>> prop = (u'http://www.zope.org/propsets/default', u'left_slots')
>>> old_value = get_prop(conn, '/plone', prop)
>>> print old_value
[u'\nhere/portlet_login/macros/portlet\nhere/portlet_recent/macros/portlet\nhere/portlet_related/macros/portlet']

Set a new value for ``left_slots``...

>>> value = ('here/portlet_recent/macros/portlet\nhere/'
...    'portlet_related/macros/portlet')
>>> set_prop(conn, '/plone', prop, value)
>>> get_prop(conn, '/plone', prop)
[u'here/portlet_recent/macros/portlet\nhere/portlet_related/macros/portlet']


And then restore the old value

>>> set_prop(conn, '/plone', prop, old_value[0])
>>> get_prop(conn, '/plone', prop)
[u'\nhere/portlet_login/macros/portlet\nhere/portlet_recent/macros/portlet\nhere/portlet_related/macros/portlet']

Create a 'Press Release' object, setting some metadata

>>> data = """\
... <?xml version="1.0" ?>
... <metadata xmlns="http://plone.org/ns/archetypes/"
...    xmlns:cmf="http://cmf.zope.org/namespaces/default/"
...    xmlns:dc="http://purl.org/dc/elements/1.1/">
...   <cmf:type>
...    Press Release
...   </cmf:type>
...   <dc:title>
...    Some Title
...   </dc:title>
...   <dc:subject>
...    Data
...   </dc:subject>
...   <dc:subject>
...    Test
...   </dc:subject>
...   <dc:description>
...    There is no one who loves pain itself, who seeks after it and wants to
...    have it, simply because it is pain.
...   </dc:description>
...   <dc:contributor>
...    sidnei
...   </dc:contributor>
...   <dc:contributor>
...    alan
...   </dc:contributor>
... </metadata> """
>>> conn.put('/plone/Members/admin/press_release', data)

>>> objectIds(conn, '/plone/Members/admin')
[u'admin', u'index_html', u'.personal', u'press_release']

And check some properties to make sure their values have been initialized correctly

>>> prop = (u'http://cmf.zope.org/propsets/dublincore',
...      u'description')
>>> get_prop(conn, '/plone/Members/admin/press_release', prop)
[u'There is no one who loves pain itself, who seeks after it and wants to\nhave it, simply because it is pain.']

>>> prop = (u'http://cmf.zope.org/propsets/dublincore', u'title')
>>> get_prop(conn, '/plone/Members/admin/press_release', prop)
[u'Some Title']

Final Remarks

Most of the products shown in this article are already available in public CVS/SVN, either as part of the “Collective” or the “Archetypes” project. Some of them are not yet available, but should be released “Real Soon Now” :).

We encourage you to try these products for yourself, as time permits, and to report your experience to us.

Note that we eventually plan to fold some or all of these products into Plone itself, provided that the proposed PLIPs are accepted.

Acknowledgements

  • Enfold Systems, for the great work environment and for making these cool products available to the general public
  • Zope Europe (ZEA), for sponsoring the Marshall product
  • Mark Hammond, for his Jedi skills and infinite knowledge, always fixing my bad (en)coding

Sidnei da Silva:

Sidnei da Silva started working with Python just after leaving the ISP where he worked as a PHP programmer and did sysadmin tasks. Actually, he fell into Python by accident. At X3ng, the company he started with a few friends from university, they decided to use Zope, so he installed Zope and started playing with ZClasses, only to drop them a few hours later in favor of writing Python products. He used to do Python programming for food, until a good friend from Texas opened his eyes: now he does Python for food and a beer.



Reproduction of material from any of ZopeMag's pages without prior written permission is strictly prohibited. Copyright 2003 - 2005 ZopeMag Zope/Plone hosting by Nidelven IT