Wild pipeline API thoughts

(Note: this is a long post, and most certainly the syntax highlighter will make it look funny on your aggregator. You might want to visit the web page to get a better grasp of it)

In 2006 it will be roughly 6 years since I started juggling with XML pipelines. As my few fellow readers might remember, I’m starting to hate XML languages with a passion, but once again this doesn’t mean I don’t like XML anymore and, even more, this doesn’t mean my love for Cocoon is fading. I’m still convinced that pipeline-based processing is the way to go: the road to complex yet maintainable results clearly goes through decomposing the problem into a set of easy steps to be performed sequentially and incrementally.

I’m also still convinced that XML is here to stay, for a number of valid reasons, yet I think that the overall scenario has changed since the original Cocoon vision. XML is possibly the most important player out there, but it didn’t manage to fulfil its Borgish ambition to assimilate everything else: there is a growing party of people who are realizing how the idea that everything could (and should!) be represented as XML is pretentious at best, and stupid at worst.

This leaves us, however, with two important concepts: we need pipelines, and we need to steer clear of XML when it doesn’t make sense. To achieve the first goal what we need is a generic, easy and intuitive pipeline API. And it should be a programmatic API, because we need pipelines everywhere, and we need them to be easy enough to grasp for the average programmer (think Facade on steroids): what bugs me about the currently available pipeline APIs is how they tend to be clumsy and counter-intuitive. Think of SAX as the perfect example of why we need a more generic and easier pipeline API and machinery: in the SAX world, if you want to pipe events from foo to bar you just do this:

[java]
foo.setContentHandler(bar);
[/java]

Things however get complicated when baz and boo enter the picture. Now you have to:

[java]
baz.setContentHandler(boo);
bar.setContentHandler(baz);
foo.setContentHandler(bar);
[/java]

Which, counter-intuitively, means building the pipeline starting from the last component and going all the way to the first one. In addition to that, those statements are usually interspersed with other statements, such as the creation and configuration of the various pieces. Moreover, from a functional point of view the above code could be rewritten as:

[java]
foo.setContentHandler(bar);
bar.setContentHandler(baz);
baz.setContentHandler(boo);
[/java]

Which, even if it looks better from the user’s point of view (the pipeline steps are now ordered), has no relation with the underlying model: each call stands on its own, so the order of the statements means nothing. In fact, you could actually obfuscate stuff when considering switching jobs:

[java]
foo.setContentHandler(bar);
baz.setContentHandler(boo);
bar.setContentHandler(baz);
[/java]

The above lines will still work as expected, but I dare anyone to understand who sends events to whom in a real life scenario where those statements might be ten lines away from each other. To me, this just doesn’t sound right.

Now enter Cocoon and see how, in its declarative sitemap, it shines from the user’s point of view:

[xml]

[/xml]

This just sounds right: pipeline steps are listed sequentially, as they should be, and everyone now understands who’s first and who’s next. But, heck, this is a domain-specific language by all means, and written in XML to boot. No way this stuff can be reused in a different context, and the “strong typing” nature of the Cocoon pipeline (where everything starts with a Generator and ends with a Serializer, assuming not only that the whole world will talk XML but actually that the whole world will talk SAX) makes things even more difficult.

Finally, consider what the Unix geniuses have brought us:

[code]
$ grep index.html access.log | awk '{ print $1 }' | sort | uniq | wc -l
[/code]

I think there’s no better comment for the above solution than this:

“A designer knows he has achieved perfection not when there is nothing left to add, but when there is nothing left to take away.” (Antoine de Saint-Exupery)

Expressing the pipeline concept with just one character (the | sign) is a clear indicator of what can be achieved when thinking about simplicity: the concept is powerful, yet the user-space view of it is as simple as it can get (admittedly a bit opaque, but it doesn’t take much to get used to it).

So, what do the above snippets bring to us? In this quest for a simple pipeline API we learn that simplicity is key and that the principle of least surprise suggests that the pipeline declaration should happen at once and in an ordered way. Sticking to Java, this leaves us with something like

[java]
pipeline.setupPipeline(List components);
[/java]

or (uglier, but sometimes just effective enough):

[java]
pipeline.setupPipeline(PipelineComponent[] components);
[/java]

Actually I’d much rather see the setup happening during the construction phase, but for the sake of interface design I’ll leave the convenience method for now. This means that our Pipeline interface becomes something like:

[java]
interface Pipeline extends PipelineComponent {

void setupPipeline(List components);

void start();

}
[/java]

Easy and effective: whoever knows the pipeline concept should be able to grasp how this API works in minutes. We need to talk about the PipelineComponent interface though, and this is related to the next wild idea: pipeline machinery.
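Before moving on, here is a minimal sketch of the "declare it once, in order" idea applied to the SAX example from earlier. It only relies on the standard `org.xml.sax` classes; the `SaxChain` class and its `connect` method are names I made up for illustration, not part of SAX or Cocoon, and it assumes every intermediate stage is an `XMLFilterImpl` (or a subclass).

```java
import java.util.List;

import org.xml.sax.ContentHandler;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical helper (not part of SAX): wires the stages in reading order,
// so the caller never has to think "last component first".
final class SaxChain {

    private SaxChain() {
    }

    /** Connects source -> filters[0] -> ... -> filters[n-1] -> sink. */
    static void connect(XMLReader source, List<XMLFilterImpl> filters, ContentHandler sink) {
        XMLReader upstream = source;
        for (XMLFilterImpl filter : filters) {
            // The filter receives the events produced by whatever precedes it...
            upstream.setContentHandler(filter);
            // ...and becomes the producer for the stage that follows.
            upstream = filter;
        }
        // The last stage (or the source itself, if there are no filters) feeds the sink.
        upstream.setContentHandler(sink);
    }
}
```

With something like this, the foo/bar/baz/boo example would boil down to a single call along the lines of `SaxChain.connect(foo, Arrays.asList(bar, baz), boo)`, which at least keeps the wiring in one place and in the right order.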

(warning: shaky ground ahead, this is the part which needs *much* more thinking)

In the OO world things aren’t quite as simple as in a CLI environment, where all you have are basic pipeline contracts such as “whatever byte stream comes from the left side is pumped to the right side”. We have objects here, and I don’t want this generic pipeline API to be strongly typed as in being able to work just with, say, SAX events or other XML gibberish. What I want is an API which is able to work with many formats in a way that’s transparent to the user: this means that the various pipeline stages should be able to express their contracts in terms of required input and output format. It’s up to the pipeline machinery to provide adapters and bridges so that, say, a pipeline component working with SAX events might be able to cooperate with another component working with streams. This could be accomplished through some kind of PipelineDescriptor, with annotations, or just through different interfaces: whatever keeps things simple makes me happy.

Another nice solution comes (again!) from a chat with Sylvain reminding me of the IAdaptable approach in Eclipse. This solution fits like hand in glove with a world of interchangeable and heterogeneous pipeline component stages, even though I have a few concerns thinking about the added complexity for pipeline component writers in implementing an Adaptable strategy: if the first and foremost objective of this API is simplicity, then writing components should be as easy as possible.

Anyway, the final outcome of all this would be something like this during the pipeline assembly phase:

[java]
PipelineComponent current;
PipelineComponent next;

if (next.accepts(current)) {

current.setNext(next);

} else {

PipelineComponent adapter = next.getAdapter(current.getClass());

current.setNext(adapter);
adapter.setNext(next);

}
[/java]

With this mechanism in place, in theory, the pipeline is much more versatile: sticking to the XML world it would be possible to build pipelines using whatever mix of SAX, DOM, StAX, AXIOM and YouNameWhat. Moreover, it would be easy enough to provide adapters to the stream world, tees and nested pipelines (there is a reason for Pipeline to extend PipelineComponent after all).
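To make the assembly step a bit more concrete, here is a rough sketch of how the check above could be looped over an ordered list of components. Everything in it is an assumption on my side: the `PipelineComponent` contract is just my guess based on the snippet above, `PipelineAssembler` is a made-up name, and `Stage` is a toy component that only knows about a named format, so take it as a thinking aid rather than a design.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical contract, loosely following the assembly snippet above.
interface PipelineComponent {
    boolean accepts(PipelineComponent previous);
    PipelineComponent getAdapter(Class<? extends PipelineComponent> previousType);
    void setNext(PipelineComponent next);
}

// Hypothetical assembler: links components in list order, adding adapters where needed.
final class PipelineAssembler {

    private PipelineAssembler() {
    }

    /** Returns the full chain, adapters included, in processing order. */
    static List<PipelineComponent> assemble(List<PipelineComponent> components) {
        List<PipelineComponent> chain = new ArrayList<>();
        PipelineComponent current = null;
        for (PipelineComponent next : components) {
            if (current != null) {
                if (!next.accepts(current)) {
                    // Ask the downstream component for a bridge from the upstream type.
                    PipelineComponent adapter = next.getAdapter(current.getClass());
                    if (adapter == null) {
                        throw new IllegalArgumentException("No adapter available for " + next);
                    }
                    current.setNext(adapter);
                    chain.add(adapter);
                    current = adapter;
                }
                current.setNext(next);
            }
            chain.add(next);
            current = next;
        }
        return chain;
    }
}

// Toy component used only to exercise the assembler: it speaks one named format.
class Stage implements PipelineComponent {
    final String name;
    final String format;
    PipelineComponent next;

    Stage(String name, String format) {
        this.name = name;
        this.format = format;
    }

    public boolean accepts(PipelineComponent previous) {
        return previous instanceof Stage && ((Stage) previous).format.equals(format);
    }

    public PipelineComponent getAdapter(Class<? extends PipelineComponent> previousType) {
        // Toy behaviour: pretend a bridge into our own format always exists.
        return new Stage("adapter-to-" + format, format);
    }

    public void setNext(PipelineComponent next) {
        this.next = next;
    }
}
```

In this toy setup, assembling a "sax" reader, a "sax" transformer and a "stream" writer yields four links, with one adapter slipped in right before the writer. The interesting (and still open) part is of course how real components would describe their formats, which is exactly the shaky ground mentioned above.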

Of course I expect the pipeline machinery to do much more than just adaptation: caching, logging, monitoring and management are vital to the pipeline deployer. But the real point of this effort is, once again, simplicity. Once I’m able to do this:

[java]
// Get the default pipeline implementation
Pipeline pipeline = PipelineFactory.getPipeline();

// Set up the pipeline with an array of PipelineComponents
pipeline.setupPipeline(new PipelineComponent[] { reader, transform1, transform2, streamAdapter });

// grab the InputStream from the last component
InputStream result = streamAdapter.getInputStream();

// start processing
pipeline.start();

// enjoy results
result.read()…
[/java]

or this:

[java]
// This time we use SAX events straight away
pipeline.setupPipeline(new PipelineComponent[] { reader, transform1, transform2 });

// connect to the pipeline result
transform2.setContentHandler(myContentHandler);

// start processing and handle events coming in
pipeline.start();
[/java]

or, why not:

[java]
pipeline.setupPipeline(new PipelineComponent[] { reader, transform1, transform2 });

anotherPipeline.setupPipeline(new PipelineComponent[] { something, pipeline, somethingElse });

anotherPipeline.start();
[/java]

Then I could do this from within Cocoon:

[javascript]
function handlePage() {

var pipeline = cocoon.newPipeline([ file("something.xml"), xslt("foo.xsl"), forms(), i18n() ]);

cocoon.sendPipelineAndWait(pipeline);
}
[/javascript]

but also, when Cocoon is not an option:

[xml]
<%@ taglib uri="http://jakarta.apache.org/taglibs/pipeline" prefix="pipeline" %>
[/xml]

Conclusion: if you managed to survive this far, well, congrats and thanks for sticking around: it’s been a bumpy ride and there are certainly a ton of rough edges, but the more I think about it, the more I’m convinced that a simple, painless and easy-to-use Pipeline API could be an invaluable tool. I’d love to use the incredible experience of Cocoon in building solid pipelines to factor out a new and fresh approach that allows anyone to enjoy the power of pipeline-based processing: it’s not going to be easy, but the goal is indeed worth the effort. Now, finding the time to make it happen is a totally different question…

(Unusual) hacking fun

These post-Christmas days are a bit less hectic (but hey, just a tad) than what I normally have to survive, so I’m having some good old hacking fun (you know, the kind of stuff you don’t know how much you’ve been missing until you return to it).

The excuses for firing up that IDE again were multiple: I wanted to take Maven 2 for a spin and, after my recent rant about pipeline languages vs. APIs, I wanted to try some concepts out and see where they might lead. I’m still far away from a solution, but so far I have a few points to make:

  • Maven 2 is really, really nice. I’m still reluctant to say that it rocks since I need to see how it behaves with complex stuff, but so far it has way exceeded my expectations. After a few years spent juggling megabytes of jars even for the simplest stuff, seeing that my (automagic, just mvn assembly:assembly) distribution of what I’ve got so far is *just* 4K despite having 15 different library dependencies makes me so happy that I might burst into tears any moment now.
  • E4X looks terrific. It’s so immensely powerful that it sparks a lot of weird ideas on how to exploit every single bit of it in my quest to reduce XML-induced overtyping and messy stuff. Now, if only I could convince Rhino to use my (Java) DOM trees directly in E4X instead of having to go through string serialization every time (yuck!) I’d be a very, very happy puppy (suggestions are welcome, of course).
  • JSON is another neat piece of technology worth visiting. A suggestion from Sylvain revealed how my quest for a comfy pipeline API might soon be over if I manage to bend it a little bit to my needs; so far it looks promising indeed.

I’m almost positive work stuff will make me drown any moment now, so that I’ll be forced to quit these nice experiments, but I definitely want to pursue the above technologies, see if and how they might help in the OSS stuff I’m involved in and, last but not least, bring them to our projects where it makes sense. Moreover, as my formal New Year’s resolution, I want to commit myself to some hacking on a regular basis: yes, old lawyer farts can code!

Your presentation sucks, your presentation rocks!

I’m an easily bored kind of guy: if you want me to pay attention to what you’re saying, you’d better make sure that the topic is interesting enough AND that you’re presenting it the right way.

This gets extremely important if I’m at a conference and I’m following your talk: there is a good chance I can get Internet access while you’re talking, and if you don’t manage to grab my attention, I will happily switch to surfing and e-mail. So, in the spirit of being constructive, here are a few random suggestions for you to make sure I listen up.

Slides: the root of all evil. I’m strongly convinced that a good talk doesn’t need slides at all, except for graphs, code snippets and nice pictures; however, I also realize that slides are what the audience expects nowadays. That being said, I have a whole slew of problems with slideware, but let’s stick to the main points:

  • first of all, here is some news for you: we all can read. If your presentation is all about reading slides, then thanks, but I can do that myself in a fraction of the time it’s taking you. Not to mention that what you’re saying is no news anymore.
  • for the same reason, mile-long slides should be avoided. Stick to a few bullet points, and make them interesting enough for me to want to hear from you what that catchy phrase was about.
  • do your homework, and study your slides. It sucks so badly when you advance to the next slide and stop for a few seconds to actually read it. Actually, you should start talking about the next slide before it hits my eyes: have me expect something, and I’ll be all ears.

The way you present: in most cases, if attendees’ thoughts were floating as comic balloons, you would see a bold and loud “BORING!” flashing all over the room. Ok, this is technical stuff so you shouldn’t act like a clown, but still there are a few tricks that keep us from snoring:

  • walk around: don’t do your presentation sitting on a table or standing behind a conference desk, as if you were nailed to it. If you move around, the audience will have to follow you, and coincidentally might even hear a word or two of what you’re saying.
  • if you walk around (and you should) do us all a favor, and buy yourself a wireless presenting mouse: it’s just a few bucks, but it will radically change your audience’s experience. Changing slides shouldn’t require walking to your notebook and clicking a mouse: we’ll get bored in no time flat, especially if you’re the kind of guy who needs to read what the next slide is about. I know it’s just a second or two, but it’s more than enough to kill our attention.
  • use your body: let your gestures walk your talk. Clap your hands, raise your arms, snap your fingers, squat, duck, tilt: anything will do. Let us know you’re a human being, with moving parts.
  • change your tone of voice: 50-60 minutes are way too long to pay attention to what sounds like automatic text-to-speech output. Shout, whisper and talk: the audience will be with you.
  • look at me. Actually, make sure you look everyone in the eye in turn. If you stare at the end of the room, people won’t feel you’re having a conversation with them, and will start wondering where to go for dinner.
  • interact with the audience. Ask for a show of hands, and ask the audience a few open questions. But be careful with it: keep in mind that the audience came to the room to hear something from you, not the opposite. Also, if you’re giving a talk to an international audience, know that you might get less feedback because people are shy about speaking out in a foreign language. And there is little, if anything, worse than an open question with no answers.

Finally, you: given all the points above, ask yourself if you’re the speaker kind at all. This has nothing to do with technical background: I’m sure you know your stuff. But do ask yourself:

  • do you have a sound knowledge of the language you’re speaking, if that’s not your native one? A telling sign is knowing a few jokes and being able to leverage them. If you’re barely able to write short emails and/or you have a frightening accent that won’t let anyone but your fellow countrymen understand what you’re saying, the answer is probably no.
  • do you feel comfortable standing in front of a crowded room? If you’re somewhat shy or easily feel under pressure, that will show up: you’ll speak with a feeble tone, you’ll start muttering stuff, and the audience will turn its attention to something more interesting like counting people in the room and performing statistics over their hair colour.

Presenting is something of an art form: like it or not, technical content is not enough to have people walk out of your room enthusiastic, thinking they learnt something or that they should definitely give your stuff a try: the way you’re presenting makes the difference. Make sure you’re entertaining: your audience will thank you, and they will come back to your next talk.

ApacheCon random bits

ApacheCon just started with a great keynote about computer history, I chaired my first session and I’m now sitting in a crowded room to hear all the best and latest from Apache Geronimo, while trying to catch up with e-mail and office-related issues (sometimes being able to check e-mail sucks badly). These have been incredibly packed days, full of geek fun hacking on Cocoon and its possible migration path to OSGi and with some wonderful moments spent with my wife. Some random bits of what happened during the past few days follow:

  • driving to Stuttgart went smoothly, despite Switzerland doing its best to piss me off. I always hated having to pay a yearly fee to use Swiss highways even if it’s just for a day: even though the total amount is probably less than what I’d pay in tolls for a corresponding mileage on Italian highways, I still don’t like the idea, so finding out that a good deal of the route was on ordinary single-lane roads going across villages and cities provided a good chance to refresh my vocabulary of curses and swear words.
  • the hotel is great, offering one of the best breakfasts I could experience in a long time (people who know me are aware of how much of a difference that can make). Beds are incredibly soft and fluffy, and I had a few sleeps to remember (well, driving 500kms surely helps in sleeping well).
  • Stuttgart is having an important tennis event, and a good deal of players share our hotel: it’s somewhat funny, in the evening, to see the hotel reception packed with tanned athletes on one side, and over/underweight pale geeks with funny shirts on the other.
  • food (well, apart from breakfast, that is) hasn’t been that great so far. We’ve been unable to find a decent place in Stuttgart: when I’m abroad I constantly refuse to eat Italian food, and my wife doesn’t like oriental. Given that here you basically have to choose between Italian and Chinese restaurants, you can imagine us walking frantically downtown to find some local place, with no real luck (bad Swabian food really sucks, believe me). Suggestions are welcome, of course.
  • A notable exception to the bad food experience happened yesterday. I left the Hackathon early, to drive around Stuttgart with my wife and do some sightseeing. Eventually, we had quite a trip: we were able to get on a scenic road crossing the Black Forest, where we had some breathtaking views of the impressive landscape, with our final destination being Baden-Baden. Well, actually Baden-Baden was just so-so, which made us consider having dinner in Karlsruhe, just to be able to see a different place. On the highway, we found out that Heidelberg wasn’t that far away, and we decided to go the extra (20) miles. Well, it was definitely worth the fuss: Heidelberg was really nice, and after a short walk I had one of the best “Schweinaxe” (spelling?) I can remember. It took us a while to get back to Stuttgart, and eventually I got lost trying to find the hotel, but overall it has been a great day to remember.

It’s now time to get back to the conference with the usual setup going on: enjoy conference, use the wi-fi network, hunt power plugs to recharge your batteries, meet great people and so on. More on this later…

Cocoon tutorial at ApacheCon

I just received the final confirmation for my tutorial at the upcoming ApacheCon Europe (are you subscribed yet? If not, hurry up: it’s a great place to be!). Time to start updating slides, handouts and other courseware: squeezing all things Cocoon in 180 minutes is going (again) to be a challenging task, given also my aim to give the audience a comprehensive overview of what’s going on under Cocoon’s hood and how to build fantastic software on top of it. See you there then: meeting Apache people will be great as usual, and German beer won’t hurt at all.

On a side note, this time I will be traveling with my better and beloved half: we are planning to drive to Stuttgart and get a few days off (the following weekend, at the very least, but possibly a few days more) after the conference. Any advice from locals or experienced visitors about places to stay and sights to see around the Stuttgart area (or on our drive back to Italy) would be greatly appreciated.

8 simple rules for building a manageable Cocoon application

A few notes to self to learn from my own mistakes and avoid spending endless hours next time trying to clean things up when you’re close to production. Some of these rules are of course already part of our internal Cocoon development process, but I thought I could share them online and see if someone has better approaches to suggest to a few simple but cumbersome issues when starting a Cocoon-based project:

  • Don’t even start any development before customizing the Cocoon environment for the project needs. This boils down to “cut the crap”: be ruthless in excluding blocks, don’t enable any of them “just in case”. Unnecessary stuff will clutter at best and bite you at random times in most cases. Do version-control your local.build.properties and local.blocks.properties.
  • Using Cocoon’s xconf tool, have a set of files ready to further edit the Cocoon configurations, cutting even more crap. The block system still has some quirks, and you end up with unnecessary stuff such as JMS or XSP when you don’t need it and really don’t want it around.
  • Always take relocatability of your Cocoon application into account, from the very beginning of your project. Your application should work from the webapp root (mounted as /), under a context path (/whatever) or as a sub-sitemap (/whatever/the/user/wants). This means that you should always pass a “mount-point” attribute within your rendering pipelines and components, calculating where your application has been mounted, and building paths accordingly. Unfortunately the current Cocoon API will only tell you the full request path and the URI handled by the sitemap, so you need to mangle those data by subtracting (string-wise) the sitemap URI from the request URI in order to obtain the path you’re looking for. That is, if your sitemap is mounted on /context/foo and your request is for /context/foo/bar/baz, the sitemap URI will be bar/baz, while the data you need to build your links properly is actually /context/foo: yeah, that gives me headaches as well, and I have an overdue patch to provide this location in an easier way. But please tell me if I’m just missing an obvious solution.
  • Don’t ever rely on Cocoon samples for functionality. If you need any feature such as XSLTs, JS files or sitemaps, import it into your project right away. Do it now, or you will regret it in the future. A lot.
  • The above note has a special meaning for form handling. The impressive and powerful Cocoon Forms resources are part of the samples. Since you probably want to reuse them, to avoid any temptation, just package everything up in a JAR file and use resource://whatever as your $resources-uri parameter in your forms stylesheets. The following Ant target might come in handy:
   <target name="pack-cforms-resources" depends="jar" 
      description="package all the CForms resources">
      <jar jarfile="${dist.dir}/cocoon-forms-block-resources.jar"
         basedir="${cocoon.home}/src/blocks/forms/samples">
          <include name="resources/**"/>
          <exclude name="**/.svn"/>
      </jar>   
   </target>

and, later on:

      <map:transform type="forms"/>
      <map:transform 
          src="resource://resources/forms-samples-styling.xsl">
        <map:parameter name="resources-uri" 
          value="resource://resources"/>
      </map:transform>
  • Keep a very strict separation between application files and resources that a user might want to customize. This is especially true for stylesheets and client-side javascripts/images/icons. Consider using the xsl:import feature in your stylesheets, providing an empty user-oriented version of your stylesheets, where templates can be overridden. It might hurt performance a bit, but it sure helps tremendously in real life scenarios, where your customer is supposed to customise files and you’re supposed to provide updated versions of your applications. If you don’t, every deployment after the first one becomes an incredibly painful process.
  • Ideally, package the whole application as a jar file: remember that Cocoon can easily access jar-packaged resources with the resource: protocol, and this holds true even for that overridable XSLT, which can import from your jar files. You can even package your root sitemap inside a jar and reference it with the resource:// protocol.
  • Don’t forget error and logging management. There is nothing worse than the default Cocoon error page popping up in front of your customers, so plan for a sensible handle-errors section right from the beginning of your project. Also, don’t forget to customise your logkit.xconf file. These resources can easily be part of a shared project codebase, so there is no excuse for not preparing them in advance.
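As for the mount-point headache in the relocatability rule above, the string subtraction is simple enough to be sketched in a few lines of plain Java. Note that `MountPoint` and its `of` method are hypothetical names of mine, and that the two arguments are assumed to be the full request URI and the sitemap URI exactly as described in that rule: how you obtain them from Cocoon is left out on purpose.

```java
// Hypothetical helper: derives the mount point by removing the sitemap URI
// from the end of the request URI (plain string subtraction).
final class MountPoint {

    private MountPoint() {
    }

    /**
     * @param requestUri full request path, e.g. "/context/foo/bar/baz"
     * @param sitemapUri part handled by the sitemap, e.g. "bar/baz"
     * @return the mount point without a trailing slash, e.g. "/context/foo"
     *         (an empty string when the application is mounted on the root)
     */
    static String of(String requestUri, String sitemapUri) {
        if (!requestUri.endsWith(sitemapUri)) {
            throw new IllegalArgumentException(
                    "'" + requestUri + "' does not end with '" + sitemapUri + "'");
        }
        String prefix = requestUri.substring(0, requestUri.length() - sitemapUri.length());
        // Drop the trailing slash so that links can be built as mountPoint + "/some/path".
        return prefix.endsWith("/") ? prefix.substring(0, prefix.length() - 1) : prefix;
    }
}
```

The resulting value is what I would pass around as the “mount-point” parameter, so that stylesheets and components can prepend it to every link they generate.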

Geeky propositions for 2005

2004 has definitely been a business-oriented year. Not that I don’t enjoy talking to customers and advocating open source in general, quite the opposite actually, but I’ve been sorely missing some “hands on” stuff. So, given my huge amount of Copious Free Time (not!) these are my geek side Open Source objectives for Y2005:

  • build a Lucene & Cocoon based blog engine/micro CMS, as much feature complete as possible;
  • finish up and publish Sourcerer, our Cocoon based remote repository management tool;
  • help the Jackrabbit & JSR170 effort;
  • get back to some serious Cocoon stuff;
  • build an ESI compliant HTTP proxy cache, possibly with Simpleweb if nothing better comes out in the meantime;
  • check out Rails, which looks definitely cool;
  • decide whether Daisy (which looks quite good but still has a few things I’m not really sold on) can really be my preferred CMS alternative. If so, join the party.

Phew, that’s a lot of stuff. I know I won’t have the time to accomplish even a fraction of that, but the list looks interesting anyway…

Writing stuff for/with Cocoon

I’ve been coding again lately. Boy, that feels good for a change, after a lot of time spent doing project management, offerings, backoffice and a LOT of travel. Sure, you earn free miles for vacation flights, but what’s the point when you don’t have any time off?

Anyway, back to coding: I’ve been using Cocoon again and I’m amazed at how far it has come, how incredibly powerful CForms + continuations are, and how easy it is to build a seemingly complex app with very few lines of code: I have sitting on my hard disk a full-fledged web file manager with inline XHTML editing, with full internationalization and skin support. With this little thingy you can manage almost everything: a local file system or a remote FTP/WebDAV server (we’re using it as a Subversion frontend). All this fits into less than 150K of overall code, and could definitely use more optimization. We’re using it as a poor man’s CMS and it fits the job quite nicely. And yes, as soon as I manage to polish it a bit better, it will be Open Sourced. Hey, glad you asked! :)

There is quite a bit of room for optimization in Cocoon though, especially with regard to ease of configuration and deployment. There is very promising work in progress (Sylvain and Carsten rock!) on better componentization and modular configuration, and it is really, and badly, needed. It’s incredibly and overly painful to come up with a nice setup to integrate, deploy and distribute an application inside Cocoon. At best, there is a royal PITA in managing patch files for configuration, copying resources and libraries all around the place and figuring out the best way to make an application really portable (where “portable” to me means that it can be mounted anywhere inside a Cocoon installation: Application “foo” should work as /, /foo, /bar/foo and so on).

However, no pain no gain, and using Cocoon indeed pays off in the end: big time. And I hope to have some time this year to help in this area.

(now, if only the Slide folks had not badly broken their client library I would be a much happier man…)