15 December
2006

The ravenous bugblatter beast of planet Python (aka the Devil Framework).

Well, it's time to come out of the shadowy dungeons and talk about the beast we at D-Level have been nourishing since 2000: the Devil Framework system. I don't want to talk about its strengths and features, but about the challenges, small and big, that we faced, and still face, while developing and using it.

Not entirely unlike: an introduction to the Devil Framework

First of all, a brief introduction is needed: the Devil Framework is an Open-Source platform for controlling, integrating and visualizing heterogeneous technologies in a distributed environment, developed 99.9% (97.72% actually) in Python. Not very enlightening, I know. Whenever I give this description to a potential customer, she asks: “Fine, but what can this software do?” “Almost everything,” I reply. Even less enlightening. So, as I do with the customer, I'll explain the platform with a real-life, in-production example.

We had this manufacturing group, owner of multiple independent business/production units distributed all over the country. These units were incorporated into the group at different times, and each one kept its own IT systems, creating a very heterogeneous and poorly interoperable IT environment. There was no simple way, or no way at all, to access, aggregate and operate on the available data sources, neither at the group level nor at the single plant/business unit level. The management of the group had to wait for reports from the management of the units, who had to wait for reports from the sub-unit managers (production, sales, etc.), and so on. A long, boring, incomplete and error-prone process, with an outcome that was neither flexible nor reusable.

Here comes our hero. We installed one or more collection nodes (Collectors) in each of the group's units. These nodes interoperate with the administration/sales databases, collect data from production hardware, and manage plant infrastructure (air conditioning, security/presence systems, video surveillance, etc.). Collected data, once normalized into a standard format, is used for real-time and batch reporting (through the IceBridge graphical console, e-mail and SMS messages, etc.), automatic anomaly detection, alarm management, historization, cost/price simulations and more. Relevant data is also transmitted to the group master system (MCP, Master Control Program... a Tron reference) where more reports are generated, anomalies checked, etc.

All these nodes are accessible over the Internet through our IceBridge multi-platform graphical console (which currently runs on Linux, Windows and OS X). Each user is presented with only the data, forms and modules assigned to his/her role. Administration of this distributed system is performed at the MCP level. All configuration changes, software upgrades, user interface updates and other administrative operations are automatically propagated to all sub-nodes and consoles.

Security and access control are enforced through the built-in multi-user authentication, authorization and PKI sub-systems; data transmissions are encrypted. Each user has roles assigned and access permissions granted. The user interface automatically reconfigures itself according to the user's roles.

Vaporware?

You have gone to the web site but you have not found a way to get the framework! Does it really exist? Why is it not on sale?

Yes, it exists and it is in production use. It is not on sale as a package because we have not (yet) completed the documentation: this is a very complex system with a lot of functionality, and without proper documentation it's mostly useless. But if you want to test it, just send me an e-mail and I'll try to give you a working copy.

And if you think you may need it and want more information, or want to install it in production, don't hesitate to contact me or info@dlevel.com.

Design challenges

The Devil Framework is a complex system. It needs to interact with a lot of devices and software (mostly unknown at design time), it needs to be easy to manage and develop with/for, it needs to be multi-platform, and it needs to be secure and reliable. Not an easy task. Here comes a brief and incomplete chronicle of the choices that influenced our design and our way of working.

About Laziness: Tools For Fast Development Cycle

I'm a lazy developer. The other Devil Framework developer, my friend Andrea, aka “bolz” (aka “Manare”; you can find him as a “dead body target” on an America's Army server near you), is even lazier than me, if possible.

We don't like the long write/build/run/test/crash code development cycle. We don't like to type 300 lines of code to open an empty window. We don't like programming languages and libraries that get in the way of developers. We don't like to write documentation (bolz does not write documentation at all !!!). WE ARE LAZY DEVELOPERS. PERIOD.

We wanted to use an “expressive” programming language: a language with powerful programming concepts built in, a big library of reusable code, and a development cycle free of the write/compile/run loop.

So I chose Perl. I was a Perl fan at the time (~1999/2000). It was an illuminating experience: before discovering the world of dynamic languages I was a C/C++ programmer (you know, char *, segfaults, mem-leaks and all the other wonderful stuff). Perl is a wonderful language, super-powerful and super-fast to develop with. It let me develop the first version of the Devil Framework (at the time creatively named “Project-X”) in a 36-hour code-burst experience (it never happened before or after... I typed code for hours without interruption, then “perl server.pl ; perl client.pl” and everything worked at the first run! Amazing!). But Perl is also super-prone to unmaintainable code nightmares. I'm Italian, I love pasta, but spaghetti-code? No thanks.

The “mutable-laws” of Perl were not enough to scare us, so we kept working with it.

But our Copernican revolution was just around the corner: we believed (wrongly) that we needed a web-based GUI instead of the PerlTK toolset we were using. So I searched for a web framework and I found Zope. So cool, so hot, but wait... what's this snake thing? Oh my God, a positional language, where spaces influence execution flow. No, no, no, back to my lovely ultra-compact friend. But interfacing Perl with Zope was a pain in the ass. So we started learning Python to port our software (client and server) to Zope. Oh my God, this positional thing is not bad at all, and the language is clean and powerful.

To make a long story short, as the project grew, Python proved to be the right tool for the task; but Zope did not, and neither did the decision to use a web GUI. So we were back at the drawing board. But Python was there to stay (and so was Zope, as our web server :-) ).

Laziness 2, The Revenge: Portability

The first lazy-developer-aware tool was found. Nice.

One of the benefits Python brought was its high portability; thanks to it we could write a highly portable server (and bolz actually did it).

As we had a portable server, my silly pride made me badly want a portable client. I had to choose a portable GUI toolkit for the client. After evaluating the Python TK port and the wxPython toolkit, I chose the latter. After implementing a hell of a lotta code, I had to accept reality: at the time, wxPython's portability and stability were not so good. In a dark moment of desperation and frustration I chose to rewrite all the code using another toolkit, one I had previously discarded because of the high license costs: PyQT (a Python wrapper for the QT toolkit).

If you need to write highly portable GUI code on Windows, OS X and Linux, if you need a stable GUI environment, if you need a consistent API, if you need a fast GUI development environment and if you need a documented toolkit, believe me, use QT/PyQT and you'll not regret it. The toolkit and the wrapper have their glitches, but they are irrelevant compared to the advantages (at the time of this writing we are using version 3.3.7 of the QT toolkit, so I cannot comment on the new 4.x version; but I suppose it should be even better).

Laziness 3: Source Code Management

The Devil Framework is a big project, with thousands of files and ~240,000 lines of Python code.

Totals grouped by language (dominant language first):
python:      233020 (97.72%)
ansic:         2480 (1.04%)
cpp:           1833 (0.77%)
makefile:       571 (0.24%)
sh:             560 (0.23%)


Total Physical Source Lines of Code (SLOC)                = 238,464

Development Effort Estimate, Person-Years (Person-Months) = 62.71 (752.50)
 (Basic COCOMO model, Person-Months = 2.4 * (KSLOC**1.05))

Schedule Estimate, Years (Months)                         = 2.58 (30.98)
 (Basic COCOMO model, Months = 2.5 * (person-months**0.38))

Estimated Average Number of Developers (Effort/Schedule)  = 24.29

Total Estimated Cost to Develop                           = $ 8,471,018
 (average salary = $56,286/year, overhead = 2.40).

SLOCCount, Copyright (C) 2001-2004 David A. Wheeler
Please credit this data as "generated using David A. Wheeler's 'SLOCCount'."
We are only two developers, but we are really good (especially me) at losing data and code. Moreover, the source code structure has evolved a lot over time, and code refactoring is something we do on a daily basis. A source code management system is a sine qua non.
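The SLOCCount estimates above come straight from the Basic COCOMO formulas it prints, so they are easy to re-derive. The snippet below is just a quick arithmetic check using the line count, salary and overhead factor quoted in that output; the variable names are mine.

```python
# Re-derive the SLOCCount estimates from the Basic COCOMO formulas
# printed in its report (all constants are copied from that report).
ksloc = 238_464 / 1000  # thousands of physical source lines of code

person_months = 2.4 * ksloc ** 1.05
schedule_months = 2.5 * person_months ** 0.38
developers = person_months / schedule_months
# yearly salary * overhead factor * person-years
cost = 56_286 * 2.40 * (person_months / 12)

print(f"effort:     {person_months:.2f} person-months")
print(f"schedule:   {schedule_months:.2f} months")
print(f"developers: {developers:.2f}")
print(f"cost:       ${cost:,.0f}")
```

The results should agree with the reported figures up to rounding. They also show why the estimate is so far from our reality: the model assumes about two dozen developers, not two.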

The primary phase (CVS)

It must be said, in our defense, that we adopted a version control system almost from the beginning of our venture (hundreds of CVS backup CDs are stored somewhere to prove it). Nevertheless we managed to lose data and corrupt the CVS repository too many times (I told you we are good at it). Moreover, CVS impacted, both technically and psychologically, our ability to manipulate the source tree (and consequently the source code itself). When it's so difficult to move and rename a file, you don't do it, and you postpone check-ins/commits (defeating the purpose of the SCM concept itself). You risk becoming conservative and very prone to justifying bad design decisions. We were so conservative that we ignored other revision control systems for years.

The quintessential phase (GIT)

One year ago bolz (an extremely curious guy, and a terminal danger when using someone else's computer) gave this new Linus brain-child tool a try. It completely changed the way we look at source code management. I'll give neither a dissertation on GIT nor an introduction to it. Just read the docs and try it. It's very fast and powerful. It works, and it is wonderful if you happen to work in a distributed development environment (as we do). Great tool. Try it.

Don't Panic: Complexity Management

As I said above, the Devil Framework is a complex system, both on the server side and on the client side (even if in completely different ways). Our approach to the design challenges was to apply the KISS principle as often as we could and to stay consistent with the “pythonic way” of doing things.

Common Design

A common requisite of both sides was a plugin based infrastructure that gives the system free room to expand as needed. Our laziness added another requisite: the plugin infrastructure had to support run-time adding, updating and removing of plugins from a central location.
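To give an idea of what run-time adding, updating and removing of plugins can look like in Python, here is a minimal sketch. It is not the Devil Framework's code: `PluginRegistry`, its methods and the `greeter` plugin are hypothetical names, and I assume plugins arrive as source text from the central location.

```python
import types


class PluginRegistry:
    """Hypothetical registry that swaps plugin code without a restart."""

    def __init__(self):
        self._plugins = {}  # plugin name -> (version, module object)

    def install(self, name, version, source):
        """Add a plugin, or replace it if the name is already installed."""
        module = types.ModuleType(name)
        # Run the plugin source inside a fresh module namespace.
        # A real system must authenticate the source before doing this.
        exec(compile(source, f"<plugin {name}>", "exec"), module.__dict__)
        self._plugins[name] = (version, module)

    def remove(self, name):
        self._plugins.pop(name, None)

    def names(self):
        return sorted(self._plugins)

    def call(self, name, func, *args, **kwargs):
        _, module = self._plugins[name]
        return getattr(module, func)(*args, **kwargs)


registry = PluginRegistry()
registry.install("greeter", 1, "def run(who):\n    return 'hello ' + who\n")
first = registry.call("greeter", "run", "bolz")

# Updating is just installing a newer version under the same name.
registry.install("greeter", 2, "def run(who):\n    return 'ciao ' + who\n")
second = registry.call("greeter", "run", "bolz")

registry.remove("greeter")
```

Callers always go through the registry, so they pick up the new code on the next call and nothing has to be restarted.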

Another requisite was an expandable API management system with a fine-grained security sub-system. Again, our laziness added the requisite of a distributed transparent API: calling a remote API on a node had to be like calling a local API on the console or on a server (so we could incorporate a Python shell into the console that lets us work interactively on every node of the system, much like ssh on Unix).
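As a rough illustration of an expandable API with a per-call permission check, consider the sketch below. Everything in it is an assumption made for the example (the `ApiRegistry` class, the dotted API names, the permission strings); the real security sub-system is more elaborate.

```python
class PermissionDenied(Exception):
    """Raised when the caller lacks the permission an API call requires."""


class ApiRegistry:
    """Hypothetical API table: dotted name -> (required permission, function)."""

    def __init__(self):
        self._calls = {}

    def register(self, name, permission, func):
        # New API calls can be added at any time, e.g. by a plugin.
        self._calls[name] = (permission, func)

    def call(self, user_permissions, name, *args, **kwargs):
        permission, func = self._calls[name]
        if permission not in user_permissions:
            raise PermissionDenied(f"{name} requires {permission!r}")
        return func(*args, **kwargs)


def read_temperature():
    return 21.5  # made-up sample value


api = ApiRegistry()
api.register("plant.temperature.read", "plant.read", read_temperature)

# A user holding "plant.read" may call it; any other user gets PermissionDenied.
value = api.call({"plant.read"}, "plant.temperature.read")
```

Because the check sits in one place, every API call is subject to it, whether the call was issued locally or arrived from another node.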

Application Server Design

First of all, I must explain the application field of the first design (the Perl one) of our system: collection, normalization and analysis of data generated by network security and logging systems like Snort, netfilter / iptables, Windows Event Log, syslog, etc. (it had to be a component of a network security appliance for remote monitoring, for which we had developed a small Linux distribution too). That was our plan until one day we saw the light: security is a process, with its own rules, but nevertheless a process like any other.

To work on a process you collect data (input), process it (mangle) and produce some result (output). And this is the design of a Devil Framework Application Server node. The node collects data from devices (like a PLC, a webcam, a database, a syslog file, another program, etc.) through input plugins, processes it through mangle plugins (scanning for anomalies, activating response actions, applying normalization procedures, etc.) and sends it to the proper output plugins (to store data into databases, send it to other nodes, etc.).
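The input/mangle/output flow can be sketched in a few lines of Python. This is only an illustration of the idea, not our plugin API: the function names, the record layout and the "failed" rule are all invented for the example.

```python
def syslog_input(lines):
    """Input plugin: turn raw 'host message' lines into normalized records."""
    for line in lines:
        host, _, message = line.partition(" ")
        yield {"source": "syslog", "host": host, "message": message}


def flag_anomalies(record):
    """Mangle plugin: mark records that look suspicious (toy rule)."""
    record = dict(record)  # work on a copy, leave the input record untouched
    record["anomaly"] = "failed" in record["message"].lower()
    return record


class MemoryOutput:
    """Output plugin: stores records in a list instead of a real database."""

    def __init__(self):
        self.stored = []

    def __call__(self, record):
        self.stored.append(record)


def run_node(inputs, manglers, outputs):
    """Push every record from every input through all manglers, then all outputs."""
    for source in inputs:
        for record in source:
            for mangle in manglers:
                record = mangle(record)
            for output in outputs:
                output(record)


store = MemoryOutput()
raw = ["web01 login ok", "web01 Failed password for root"]
run_node([syslog_input(raw)], [flag_anomalies], [store])
```

Each stage only sees normalized records, so a new device means a new input plugin and nothing else has to change.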

A simple yet powerful design.

IceBridge Console Client Design

The IceBridge console is an example of what Microsoft calls a smart-client application: a simple infrastructure that provides just the tools to connect to a server and download and execute the client code provided by “remote” plugins. This simple concept almost removes the burden of software upgrade and gives the power of complete application customization to the needs of every single user.

And it's a mega-productivity-upgrade for the developer too. Imagine this: you change your plugin's code, data and user interface. With a simple double-click you install the plugin on the central server, update it (propagating the changes to all the nodes of the system that are using it), run it, download its client stuff into the console and make it active (dynamically updating the GUI). All without restarting either the server or the client. Not bad, eh? Once you get used to this style of working, you'll never go back!

Share and Enjoy: Seamless Distributed Environment

As the original business plan for our system was to develop a network/host monitoring service and related appliance, we had to face the problem of managing and running software in a distributed and heterogeneous environment. We had no previous contact with distributed systems (except a brief “relationship” with the Amoeba OS while at university), so we had to learn the perils of distributed environments the hard way. We decided to base our distributed system on a “tree-like” topology, with a master (MCP) node that stores and propagates all configurations and plugins to the underlying nodes.
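Top-down propagation in a tree-like topology can be pictured with a toy model like the one below. The `ConfigNode` class, the node names and the `poll_seconds` setting are hypothetical, and real nodes obviously talk over the network rather than through method calls.

```python
class ConfigNode:
    """Hypothetical node in the tree; the root plays the role of the MCP."""

    def __init__(self, name, parent=None):
        self.name = name
        self.children = []
        self.config = {}
        if parent is not None:
            parent.children.append(self)

    def push(self, changes):
        """Apply changes locally, then forward them to every sub-node."""
        self.config.update(changes)
        for child in self.children:
            child.push(changes)


mcp = ConfigNode("mcp")
plant_a = ConfigNode("plant_a", mcp)
plant_b = ConfigNode("plant_b", mcp)
collector = ConfigNode("collector_1", plant_a)

# A change made once at the master reaches every node below it.
mcp.push({"poll_seconds": 30})
```

Pushing from an intermediate node instead of the root would only affect that node's own sub-tree.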

The sub-eta data transmission system.

First we needed a way to send and request data in both proactive and reactive ways. We developed a protocol-agnostic RPC and event transmission layer: the first incarnation was based on the XML-RPC protocol, but it proved too slow and limited for our purposes, so we had to develop our own RPC protocol (PRP, Python Remote Protocol) and event transport layer (but we kept the infrastructure for adding and using other protocols). We came up with a simple, elegant and efficient system (at least for us) that makes it possible to transparently call local and remote APIs on every node and console connected to the system. A local call is something like:

result = self.api.my.api.call (*args, **kargs)

while its remote equivalent (on the first reachable node) is:

result = self.rpc.my.api.call (*args, **kargs)

or for any node:

result = self.rpc["node_id"].my.api.call (*args, **kargs)

or:

result = self.rpc.node_id.my.api.call (*args, **kargs)

Return values and exceptions are transparently transmitted back to the caller, and un-pickleable objects are encoded/decoded via a customizable set of codecs.
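For the curious, this kind of attribute-style call syntax is easy to get in Python with `__getattr__` and `__getitem__`. The sketch below is not PRP: `RpcProxy`, `_CallPath`, `fake_send` and the `known_nodes` set are hypothetical, and the transport is a stand-in that simply returns what it would have sent.

```python
class _CallPath:
    """Collects attribute names until the chain is finally called."""

    def __init__(self, send, node, path=()):
        self._send = send
        self._node = node
        self._path = path

    def __getattr__(self, name):
        if name.startswith("_"):
            raise AttributeError(name)
        return _CallPath(self._send, self._node, self._path + (name,))

    def __call__(self, *args, **kwargs):
        # Hand the dotted API name and the arguments to the transport.
        return self._send(self._node, ".".join(self._path), args, kwargs)


class RpcProxy:
    """Entry point: rpc.my.api.call(), rpc["node"].x() or rpc.node.x()."""

    def __init__(self, send, known_nodes):
        self._send = send
        self._known_nodes = set(known_nodes)

    def __getitem__(self, node_id):
        return _CallPath(self._send, node_id)

    def __getattr__(self, name):
        if name.startswith("_"):
            raise AttributeError(name)
        if name in self._known_nodes:  # assumption: node ids are known up front
            return _CallPath(self._send, name)
        # No explicit node: None stands for "first reachable node".
        return _CallPath(self._send, None, (name,))


def fake_send(node, method, args, kwargs):
    """Stand-in transport: a real one would serialize and send the request."""
    return (node, method, args, kwargs)


rpc = RpcProxy(fake_send, known_nodes={"node_id"})
request = rpc.my.api.call(1, x=2)
```

Since the proxy only builds a name and forwards the arguments, the same calling code works whatever protocol sits behind `send`.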

Event types are defined in a central location but can be generated from any node and walk up to the root node (if not specified otherwise).
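A toy version of this event flow, with centrally defined types and events climbing toward the root, might look like the following. The event type names, the `EventNode` class and the `stop_at` parameter are assumptions made up for the example.

```python
# Hypothetical central definition of the allowed event types.
EVENT_TYPES = {"alarm", "heartbeat"}


class EventNode:
    """Hypothetical node that records events and passes them to its parent."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.seen = []

    def emit(self, event_type, payload, stop_at=None):
        """Send an event up the tree, to the root unless stop_at names a node."""
        if event_type not in EVENT_TYPES:
            raise ValueError(f"unknown event type: {event_type}")
        node = self
        while node is not None:
            node.seen.append((event_type, payload))
            if node.name == stop_at:
                break
            node = node.parent


mcp = EventNode("mcp")
plant = EventNode("plant", mcp)
collector = EventNode("collector", plant)

collector.emit("alarm", {"temperature": 90})    # climbs up to the MCP
collector.emit("heartbeat", {}, stop_at="plant")  # stops at the plant node
```

Rejecting unknown types at the emitting node keeps the set of events consistent across the whole tree.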

Don't panic

Next we wanted the system to work in the presence of unreliable, not-always-connected networks, possibly even hidden behind NAT. The problem was resolved using:

Thanks For All The Fish

Wow, I've finished for now. This is a very long post, especially as it's my first blog post, but summarizing more than 5 years of work is not easy. This is also an introductory post; in the next ones I promise I'll be more technical and more pythonic.

One last note. I'm not the first one to say it, nor the last: “Eat your own dog-food.” I eat it every day, and every day I find something to change, a little or a lot. Using your own products puts you, in a limited way, in the seat of the final user (well, at the moment we are our own final users, but someday, sooner or later...). Don't be afraid to critique your work; try to look at it as you look at other products you buy and use. But keep your vision. Try to do it every day, while “eating”.

Iob's Final Message to His Creation

We apologise for the inconvenience.

(And the bad English)

(And thanks to Douglas Adams for the quotes/hints/ideas)


Category Devil Framework 
Posted by alex at 13:48