Event rate of arrival analysis with R
Here is a very common problem: suppose you’re given a series of event timestamps. The events can be anything: website logins, persons entering a building, anything that recurs regularly in time but whose rate of arrival is not known in advance. Here is, for example, such a file which I had to analyze:
05.02.2010 09:00:18
05.02.2010 09:00:18
05.02.2010 09:00:21
05.02.2010 09:00:23
05.02.2010 09:00:24
05.02.2010 09:00:29
05.02.2010 09:00:29
05.02.2010 09:00:30
05.02.2010 09:00:35
and so on for several thousand lines. Your task is to analyze this data and to derive “interesting” statistics from it. How do you do that?
My initial reaction was, hey let’s try to derive the rate of arrival per second over time. Histograms are one way of doing this, except that histograms are known for the dramatically different results they can yield for different choices of bin width and position. So instead of histograms, I tried doing this with so-called density plots.
That, it turns out, was a terrible idea. I think I must have spent a day and a half figuring out how to use R’s density function and its arguments, especially the bandwidth parameter. There are two problems with density: 1) it yields a density, which means that you have to scale it if you want to obtain a rate of events; 2) it’s an interpolation of a sum of kernels, which has the unfortunate side-effect of yielding a curve whose integral is not necessarily unity.
On the morning of the second day, I realized I had been solving the wrong problem. I’m not really interested in knowing the rate of arrival. What I really need to know is how many items my system is handling simultaneously. Think about that for a second. If your data represent the visitors to your website, knowing how many visitors arrive per second tells you little unless you also know how much time the server needs to serve each one. In other words, if you want to make sure the server never melts down, you need to know how many users are served concurrently by the server, and for that you also need to know how much time each request takes.
Or again, if you’re designing a building or a space to which people are supposed to come, be served somehow, and then leave, you really don’t need to know how many people will come per hour; you need to know how many people will be in the building at the same time.
There is something called Little’s Law which states this a bit more formally. Assuming the system can serve the requests (or people, or jobs) without any pileup, then N = t × Λ, where N is the number of requests being served concurrently, t is the time spent on each request, and Λ is the request rate. Now it should be obvious that if you know t, the data will give you N, from which you can derive Λ = N / t (if you want).
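As a quick sanity check of the formula, here is a worked example. The numbers are made up for illustration (a 4-second service time and 2.5 requests per second); they do not come from the data analyzed in this post.

```latex
N = t \times \Lambda
\qquad\text{e.g.}\qquad
t = 4\ \text{s},\quad \Lambda = 2.5\ \text{requests/s}
\;\Longrightarrow\;
N = 4 \times 2.5 = 10\ \text{concurrent requests}
```

Read the other way around, observing N = 10 with t = 4 s gives Λ = N / t = 2.5 requests per second.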
Here’s how I did it in R. Suppose the data comes in a zipped data.zip file, with timestamps formatted as above. Then:
library(lattice) # always, always, always use this library

# Get raw data as a vector of DateTime objects
data <- as.POSIXct(scan(unzip("data.zip"),
                        what = character(0),
                        sep = "\n"),
                   format = "%d.%m.%Y %T")

# Turn it into a dataframe which will be easier to use
data.lt <- as.POSIXlt(data)
data.df <- data.frame(time = data,
                      sec = jitter(data.lt$sec, amount = .5),
                      min = data.lt$min,
                      hour = data.lt$hour)
data.df$timeofday <- with(data.df, sec + 60*min + 3600*hour)
rm(data, data.lt)
Note that for this example we'll assume all events happened on the same day.
Now here's the idea. We're going to build a counter that counts +1 for each event and -1 when that event has been served. In R, we can do that with the cumsum function. For example, suppose we have a series of ten events spaced apart according to a Poisson distribution with mean 4:
> x <- cumsum(rpois(10,4))
> x
[1] 6 9 14 20 26 33 37 38 38 44
Suppose each event takes 6 seconds to serve. We build a structure holding x, the times of the original events and of their completions, and y, the running counter of the number of events being served:
> kLatency <- 6 # seconds
> temp <- list(x = c(x, x + kLatency),
+              y = c(rep(1, length(x)), rep(-1, length(x))))
> reorder <- order(temp$x)
> temp$x <- temp$x[reorder]
> temp$y <- cumsum(temp$y[reorder])
If you now plot the temp structure as a step function, you see the number of events being served at each instant.
With all this in place, we now have everything we need to go ahead. The little script above can go into its own function or it can be defined as xyplot's panel argument:
kLatency <- 4 # seconds
xyplot(1 ~ timeofday,
       data.df,
       main = paste("Concurrent requests assuming", kLatency, "seconds latency"),
       xlab = "Time of day",
       ylab = "# concurrent requests",
       panel = function(x, ...) {
         # +1 at each arrival, -1 at each completion
         temp <- list(x = c(x, x + kLatency),
                      y = c(rep(1, length(x)),
                            rep(-1, length(x))))
         reorder <- order(temp$x)
         temp$x <- temp$x[reorder]
         temp$y <- cumsum(temp$y[reorder])
         panel.lines(temp, type = "s")
       },
       scales = list(x = list(at = seq(0, 86400, 7200),
                              labels = c(0, "", "",
                                         6, "", "",
                                         12, "", "",
                                         18, "", "", 24)),
                     y = list(limits = c(-1, 20)))
)
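The counter itself does not depend on R. As a rough sketch, here is the same +1/-1 sweep in Java, fed with the ten sample arrivals from the rpois example and its 6-second service time. The class and method names are invented for this illustration. Ties are resolved the way R's order() resolves them here: arrivals were listed before completions, and the sort is stable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class ConcurrencyCounter {

    // Returns the running number of events in service after each step,
    // in time order. Each step is {time, +1} (arrival) or {time, -1} (completion).
    static int[] runningCount(int[] arrivals, int latency) {
        List<int[]> steps = new ArrayList<>();
        for (int t : arrivals) {
            steps.add(new int[] {t, +1});
        }
        for (int t : arrivals) {
            steps.add(new int[] {t + latency, -1});
        }
        // List.sort is stable, so on equal times arrivals stay ahead of completions.
        steps.sort(Comparator.comparingInt(s -> s[0]));

        int[] counts = new int[steps.size()];
        int running = 0;
        for (int i = 0; i < counts.length; i++) {
            running += steps.get(i)[1];
            counts[i] = running;
        }
        return counts;
    }

    // Highest value reached by the running counter.
    static int peak(int[] counts) {
        int max = 0;
        for (int c : counts) {
            max = Math.max(max, c);
        }
        return max;
    }

    public static void main(String[] args) {
        int[] x = {6, 9, 14, 20, 26, 33, 37, 38, 38, 44};
        int[] counts = runningCount(x, 6);
        System.out.println(Arrays.toString(counts));
        System.out.println("peak = " + peak(counts));
    }
}
```

For that sample the counter peaks at 4, around t = 38, where two events arrive in the same second while two earlier ones are still being served.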
Unfortunately I cannot show you the results here, but this analysis showed me immediately when and how often the webserver would be under its heaviest load, and directly informed our infrastructure needs.
Book review: Agile Project Management with Scrum
I began reading Ken Schwaber’s ‘Agile Project Management with Scrum’ for two reasons: 1) it’s a book about Scrum, and 2) it’s from Ken Schwaber, one of the fathers of Scrum. Having now read it, I think these are the only reasons I don’t entirely regret reading it.
The book is a series of case studies, based on real-world experiences Schwaber has had managing projects. Each case study shows how a particular aspect of Scrum was applied, adapted, or tweaked on a real project. Schwaber devotes each chapter to a distinct aspect of Scrum, e.g. the ScrumMaster’s role, the project backlog, the sprint planning, etc.
As such, the book is clearly intended for experienced ScrumMasters, which I am not. But I think an experienced ScrumMaster reading this book will find it lacking in depth. There is almost too much material in this book, covered too shallowly. Each chapter might easily provide enough material for a separate book, or a workshop, or a detailed whitepaper. Even with my limited knowledge and experience of Scrum I felt frustrated by the lack of detail Schwaber gave in the book. For instance, he never shows us a real-world example of a product backlog. Instead of showing us a snapshot of a real sprint backlog taped to the wall of a team’s room, the book shows us a nicely formatted table with very obviously watered-down entries.
In short, I think an experienced ScrumMaster will find this book lacking in detail, whereas the Scrum student will find it hard to relate to the case studies. As such I find it hard to recommend this book, and I think anyone interested in Scrum should rather consult Schwaber’s earlier book, ‘Agile Software Development with Scrum’, or Henrik Kniberg’s excellent ‘Scrum and XP from the Trenches’, also available for free from here.
How to include MathML in a WordPress blog
UPDATE
I’ve decided to disable direct MathML support on this blog due to the many browser incompatibilities it introduces. You can read the original version of this post on this blog’s former home.
In: Tools · Tagged with: MathML, XHTML
Project GreenFire mentioned on Java Posse
On a recent episode of The Java Posse they mentioned a project I had never heard about before: project GreenFire.
From the project’s website:
GreenFire efficient manages and controls heating control of houses and saves energy. GreenFire is based on Java EE 5 (is tested with Glassfish v2), Scripting, RMI and Shoal. SunSpot integration is planed.
I applaud the effort to create a J2EE-based solution to cheap and efficient home automation. I’ve often felt that home automation was, and still is, the domain of do-it-yourselfers, with its attendant reliability problems. Running a home automation solution on a J2EE stack might solve many of these problems.
In short, kudos to the creator(s) of this project; I’ll definitely keep an eye on what you are doing.
In: General · Tagged with: Home automation, Java Posse
Looking up an EJB from a Web Service under JBoss 4.x
EJB injection in Web Services does not work with JBoss (yet), so when you want to use an EJB from your @WebService annotated POJO you have no choice but to look it up yourself.
This can get a little tricky, because each J2EE container can use its own JNDI naming convention when registering the EJB in the global naming context. Here I document how I configured JBoss to register my EJB with a name that I choose, and how I mapped a proper EJB name to that JNDI name.
Naming the EJB
Give your EJB a reasonably unique name in its annotation:
@Stateless(name = "MyServiceBean")
@Local(SomeInterface.class)
public class ServiceBean implements SomeInterface {
    …
}
Tell JBoss what JNDI name to register it under
Include the following in your jboss.xml file:
Tell JBoss to map an EJB name to the JNDI name
In your WAR, configure your jboss-web.xml file to include this:
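Look up the EJB from your Web Service
You can now look up the EJB directly from your Web Service:
import javax.ejb.EJBException;
import javax.jws.WebService;
import javax.naming.InitialContext;
import javax.naming.NamingException;

@WebService
public class MyWebService {
    private SomeInterface getMyBean() {
        try {
            // Resolved through the mapping declared in jboss-web.xml
            return (SomeInterface) new InitialContext().lookup("java:comp/env/ejb/MyEJBName");
        } catch (NamingException e) {
            throw new EJBException(e);
        }
    }
}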
In: Programming · Tagged with: ejb, jboss, jndi, Web service, webservice
7 numbers why building automation can save the world
Automating buildings costs money. Lots and lots of money. The return on investment (ROI) is usually very low, and it takes a long, long time (on the order of 5 to 10 years) for such an investment to pay for itself.
To make matters worse, people who rent the home or apartment they live in have little incentive to make it energy-efficient. They have no guarantee they will still live in the same place 10 years in the future. And landlords? Why would they invest? Energy costs are always borne by the tenants, so they too have little incentive.
If financial considerations won’t motivate people to invest in smarter buildings, here I propose another incentive. Building automation, if implemented globally, is one of the most cost-effective strategies for keeping the atmospheric CO2 concentration at safe levels until 2050.
I reviewed Thomas L. Friedman‘s Hot, Flat and Crowded in an earlier post. In that book, Mr Friedman refers to a paper published by Pacala and Socolow in Science in August 2004.
I’ve traced that paper. You can find it here: Stabilization Wedges: Solving the Climate Problem for the Next 50 Years with Current Technologies. Even if you don’t read the full paper, please do read the first couple of pages. The authors do a fantastic job of summarizing our current situation with respect to CO2 emissions and where we are headed if we do not act now. The abstract speaks for itself:
Humanity already possesses the fundamental scientific, technical, and industrial know-how to solve the carbon and climate problem for the next half-century. A portfolio of technologies now exists to meet the world’s energy needs over the next 50 years and limit atmospheric CO2 to a trajectory that avoids a doubling of the preindustrial concentration. Every element in this portfolio has passed beyond the laboratory bench and demonstration project; many are already implemented somewhere at full industrial scale. Although no element is a credible candidate for doing the entire job (or even half the job) by itself, the portfolio as a whole is large enough that not every element has to be used.
Let me summarize the key figures, and please commit them to memory:
280 ppm CO2 atmospheric concentration
For most of human history, the CO2 concentration in the atmosphere remained relatively stable at 280 ppm (parts per million). The industrial revolution coincided with the start of a clear increase in CO2 concentration.
375 ppm CO2 atmospheric concentration
The CO2 concentration at the time of the article (2004). But remember that CO2 concentration has always increased since careful measurements started in the late fifties:

CO2 atmospheric concentration measured on Mauna Loa (Hawaii) for the past 50 years, adapted from my thesis.
500 ppm CO2 atmospheric concentration
Even allowing for (healthy) skepticism, most scientists believe that mankind must at all costs prevent the CO2 levels from reaching double the preindustrial concentration, or about 560 ppm. To err on the side of caution, we as a species should pledge never to let CO2 levels cross the 500 ppm limit.
7 billion tons of carbon per year
When Pacala and Socolow wrote the article, mankind was dumping into the atmosphere the equivalent of 7 billion tons of carbon per year (7 GtC/year). That’s enough CO2 to fill 1 billion hot-air balloons each year. It is also the upper limit of allowed global emissions over the next 50 years if we are to keep atmospheric CO2 concentrations below the 500 ppm limit.
14 billion tons of carbon per year
If we fail to act now, by 2054 we will be pumping 14 billion tons of carbon per year (14 GtC/year) into the atmosphere, according to the so-called Business As Usual (BAU) scenarios. Such an emission rate will almost certainly result in a CO2 concentration of more than 500 ppm, i.e. beyond the safe upper limit. The consequences for global warming can only be disastrous.

Average global temperatures for the last 150 years, adapted from my thesis.
50 years
Stabilizing CO2 emissions is only the first half of the battle. Our goal is to stabilize them at their current levels for the next 50 years, but after that we must devise solutions to reduce them.
7 wedges
The paper proposed 15 potential solutions (or “wedges”) for stabilizing our CO2 emissions. Each one of these is technologically feasible and has been commercially demonstrated. Each of them would avoid 1 GtC/year of emissions by 2054. Thus, to keep our CO2 emissions at current levels by 2054, we must implement at least 7 of these 15 strategies on a global scale.
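The count of seven follows directly from the figures quoted above (7 GtC/year today, 14 GtC/year under Business As Usual in 2054, 1 GtC/year avoided per wedge). Spelled out:

```latex
\begin{aligned}
\text{emissions to avoid in 2054} &= 14\ \text{GtC/year} - 7\ \text{GtC/year} = 7\ \text{GtC/year} \\
\text{wedges needed} &= \frac{7\ \text{GtC/year}}{1\ \text{GtC/year per wedge}} = 7
\end{aligned}
```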
Wedge 3
The third wedge proposed by the authors appears to me as the easiest to implement:
Cut carbon emissions by one-fourth in buildings and appliances projected for 2054.
Yes, that’s right. If we or our children are to make it safely through the second half of this century, we must implement at least 7 of 15 strategies, one of which is the reduction in carbon emissions by 25% in buildings and appliances.
And how, you may ask, can we achieve this? Well, there are really only two solutions. We may switch to more carbon-neutral energy sources, or we may reduce our energy demand. As I’ve argued in a previous post, we should prefer the latter option for the following reasons:
- Our fundamental problem is our dependency on cheap sources of energy. Carbon-neutral energy sources, although much cheaper than only ten years ago, are still far from competitive.
- We have enjoyed cheap sources of energy for so long that we have never had to consider the need to reduce our demand. In other words, we are addicted to energy, not oil.
- It is much, much more cost-effective to reduce the energy demand of buildings and appliances, particularly through better home and building automation, than to attempt to replace our current sources of energy with carbon-neutral ones.
Conclusion: an elevator pitch for building automation
We currently emit 7 GtC/year into the atmosphere. If we fail to act now, we will be emitting 14 GtC/year in 2054 and the CO2 concentration will be on track to exceed twice its preindustrial level. Building automation, if implemented on a global scale, can make buildings at least 25% more energy-efficient, which will prevent the emission of 1 GtC/year by 2054, out of the 7 GtC/year reduction required to hold emissions at current levels. It is arguably the most cost-effective strategy for mitigating climate change.
In: Energy, Home automation · Tagged with: An Inconvenient Truth, Climate Change, Global warming, Greenhouse gas, Mitigation of global warming
Enjoying a NO energy home
There’s a nice little article about green energy homes over at Greenprofs.com. The author makes a nice point that home owners have many options when it comes to choosing carbon-neutral energy sources: he mentions solar, wind and even hydroelectric power. I cannot help but submit, respectfully, that this article completely misses the point.
There are, conventionally, five different sources of energy, four of which are carbon-neutral and only three of which can be said to be completely renewable:
But, as Thomas L. Friedman mentioned in his excellent Hot, Flat and Crowded (see my review here), there is a sixth energy source: the energy that we do not use.
And the key challenge we face as a race for the next century or so is to shift our energy consumption patterns more and more towards that sixth source, instead of trying to draw more and more energy from the four carbon-neutral ones.
We have the technology and we have the means, particularly when it comes to the energy needs of homes and buildings; it is only a question of bringing those possibilities to global awareness, instead of letting people believe they will eventually be able to simply replace all coal-powered plants with windmills.
In: General · Tagged with: Carbon neutrality, Energy, Energy development, Fossil fuel, Hydroelectricity, Renewable energy, Technology, Thomas Friedman
5 Java logging frameworks for building automation software
Back in 2005, when I was writing Java software for an embedded home automation controller, we ran into a little unforeseen problem. The embedded virtual machine we were using implemented only the Java 1.3 API, and so did not offer any logging facilities (which only Java 1.4 started having).
We ended up using the logging implementation from the GNU Classpath project, but choosing a good logging framework in future projects will make a crucial difference for embedded applications that will run for months, if not years, unattended.
Here I recap the most popular Java logging frameworks of the day, and their relevance to the field of building automation.
- Java Logging API
- The default logging framework, an implementation of which comes with the JRE. Expect this to be available on any platform that implements at least Java 1.4. A good default choice for many applications, but very limited in functionality. Probably the best choice when memory and/or disk space is a critical issue.
- Log4j
- Arguably the most popular logging framework for Java, with equivalent implementations for many other languages. Extremely flexible and easy to configure. If your application runs on a “real” machine then you would be wise to choose this framework, or the more recent Logback (see below). If you run on an embedded platform, the choice will be more difficult and require careful thought. You can also use Chainsaw, a graphical tool for parsing Log4j logfiles if they are formatted in XML, which should probably never be done on an embedded system. The development of Log4j seems, however, to be stuck on version 1.2.
- Logback
- Intended as the successor of Log4j, and written by the same author. I don’t have much experience with it but it’s probably a smart move to get to know it.
- Jakarta Commons Logging
- Not a logging framework per se, but a logging framework wrapper. Certain kinds of applications, such as libraries, should avoid any tight coupling with any particular logging framework and instead use a framework wrapper such as JCL. There will, however, be a (small) memory penalty, which should be evaluated if the application runs on an embedded platform.
- SLF4J
- The author of Logback and Log4j also wrote the Simple Logging Facade for Java, another logging framework wrapper that solves several classloading issues with JCL. I cannot see how a building automation application could be concerned with these kinds of classloading problems, but I like the idea of statically binding the wrapper with the logging framework, something which JCL did not do. Also, it sort of lazily evaluates the log strings, so you avoid the little performance hit when debugging is turned off (an important factor for embedded systems). You should probably prefer this wrapper over JCL, especially if Logback takes off and eventually replaces Log4j.
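To make the point about lazily built log strings concrete, here is a minimal sketch that uses only the Java Logging API from the JRE, not SLF4J itself. It shows two ways to avoid building an expensive message when the level is turned off: an explicit isLoggable guard, which works on Java 1.4-era platforms, and the Supplier overload of Logger.fine, which needs Java 8 or later and so would not be available on the embedded VM described above. The class name, logger name and the describeSensors method are invented for this illustration.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class LazyLoggingSketch {
    // Hypothetical logger name, chosen for this illustration only.
    static final Logger LOG = Logger.getLogger("automation.sketch");

    // Counts how often the "expensive" message is actually built.
    static int expensiveCalls = 0;

    // Stand-in for a costly state dump on an embedded controller.
    static String describeSensors() {
        expensiveCalls++;
        return "temperature=21.5 valve=open";
    }

    // Explicit guard: works wherever java.util.logging exists (Java 1.4+).
    static void logGuarded() {
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine("State: " + describeSensors());
        }
    }

    // Supplier overload (Java 8+): the lambda runs only if FINE is enabled.
    static void logLazy() {
        LOG.fine(() -> "State: " + describeSensors());
    }

    // Resets the counter, applies the level, logs both ways, returns the count.
    static int callsAt(Level level) {
        expensiveCalls = 0;
        LOG.setLevel(level);
        logGuarded();
        logLazy();
        return expensiveCalls;
    }

    public static void main(String[] args) {
        System.out.println("calls at INFO: " + callsAt(Level.INFO));
        System.out.println("calls at FINE: " + callsAt(Level.FINE));
    }
}
```

With the logger at INFO the message is never built (0 calls); at FINE it is built once per log statement (2 calls). SLF4J reaches a similar effect through parameterized messages rather than suppliers.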
In: Programming · Tagged with: Java, Java Virtual Machine, Programming
When I hear the word “Entity” I reach for my thesaurus
According to Eric Evans, the author of Domain-Driven Design: Tackling Complexity in the Heart of Software, one of your first goals as a domain analyst is to define what he calls a Ubiquitous Language, i.e., a common vocabulary that your technical team and your business sponsor will agree on, and will use to communicate. Having such a common vocabulary will prevent misunderstandings and will probably help you name things in your program, such as classes and methods.
But what do you do when the business people seem unsure themselves of how to define certain concepts in their own domain? How do you even detect such situations?
I’ve found that certain words, when they begin to show up too often in conversations between the technical team and the business people, are good indicators that something needs to be clarified. Words such as object, element, perform, or do. But one word in particular seems to be particularly difficult to clarify once it has become accepted by business people: entity.
I was working on a business application that involved, well, entities that sold widgets in the world on behalf of a certain company. I asked the business people to help me understand what an entity was, or what exactly its responsibilities were.
So is, for instance, an entity associated with a country? Well no, because entity 33 is responsible for more than one country. Ok, so is an entity responsible for a geographic zone, including one or more countries? Well no, because see here, entity 12 handles all the sales in country Foo but only B2B sales in country Bar. The best definition they seemed to be able to come up with was that an entity was something that sold widgets to some people. Not exactly helpful.
Then I asked them a simple question. “Could you give me a dump from your database of the names of all your entities?” Sure enough, they did. And when I read the list of entities, I saw that they all were names of companies incorporated in their respective countries. “Say”, I asked, “aren’t these, like, subsidiaries?”
Subsidiaries they were indeed. It turned out that the company had segmented the market in each country into B2B and B2C products, and the company’s history explained why a given subsidiary got a given market segment in a given country, even in a country other than its own.
This realization led to an immediate simplification of our class diagrams, and a lot of class renaming. But the resulting code is now much, much clearer, something even the business people admit.
It almost always pays off to work on redefining vague or incomplete terms. It will help you as a developer, and it might well help your customer too.
In: Programming · Tagged with: Business-to-business, Business-to-consumer, Domain-driven design, Entity, Eric Evans, Market segment
Book review: “Hot, Flat and Crowded”
I’ve read Thomas Friedman’s “Hot, Flat and Crowded”, and firmly believe this book belongs on the shelf of anyone involved in making buildings more energy-efficient.
Mr Friedman’s previous bestseller, “The World is Flat”, discussed the changes to our world that enabled more and more people to participate in a global economy. “Hot, Flat and Crowded” is sort of a sequel to that book, although there is no need to have read it first (I haven’t).
Mr Friedman’s thesis is that as more and more people participate in the global economy (“flat”), the standards of living are increased everywhere and world population grows (“crowded”), while more and more people aspire to a western style of life, with its damning consequences to the environment (“hot”).
According to Mr Friedman, the small habit changes that we are all asked to pick up (changing incandescent lights to more efficient lights, turning off our TVs instead of leaving them on standby, etc.), while commendable, are simply insufficient. There is nothing we can do individually to prevent atmospheric CO2 levels from reaching dangerously high levels in this century. Instead, he proposes a series of measures nations should take, such as imposing carbon taxes, cap-and-trade schemes, and several regulatory laws on energy efficiency for vehicles and buildings. Only such drastic measures, he argues, will make a significant difference.
An entire chapter, in particular, is dedicated to what he calls an “energy internet”, in which he envisions how a network of appliances and utilities could cooperate to dramatically reduce the energy demand for buildings. Buildings, incidentally, represent about 40% of any nation’s energy demand, and have this annoying property of being long-lived. Once a building is built, it will consume energy and water for the next 30-40 years, which makes it all the more important to build them right from the beginning.
Contrary to many “green” books, though, Mr Friedman gives more arguments in favour of “going green” than the obvious environmental ones. There are at least two other, less obvious reasons why we should work on reducing our energy demand and developing renewable energy sources:
- Current demand for oil finances petro-dictatorships all over the world, preventing many countries from achieving freedom and democracy.
- Many poor countries lack ready access to cheap and clean energy, preventing their further development.
The arguments Mr Friedman develops for these aspects of the problem are worth the price of the book alone, in my opinion. But the real eye-opener is the extensive discussion on the way utilities have traditionally worked and how they should work in the future. He doesn’t say that our homes and buildings should simply become more efficient; he says that by wiring together buildings, appliances and utilities, we can escape from local optima in energy efficiency and aim for much more important savings.
And that, of course, is one of the aims of this blog: to bring together information on how to achieve that vision. We definitely have the technology (we have had it for the past 30 years or so); we only need the economic incentives to work on this problem. And Mr Friedman provides several suggestions that governments the world over could and should adopt, suggestions that will, if implemented, leave us no choice but to build the infrastructure he describes.
In: Book reviews · Tagged with: friedman book review hot flat crowded




