Pomodoro + Workrave = Well-being

I’ve been a big fan of the Pomodoro Technique for almost a year now. No, I don’t go as far as actually having a ticking timer in my office in front of my co-workers, and I don’t necessarily plan the day in advance, but I do try to break up my work in 25-min iterations punctuated by 5-min breaks.

I used to use Ubuntu’s Timer applet to alert me to take a break, but a month ago I found what I now believe to be the perfect complement to the Pomodoro Technique: a nice little application called Workrave.

Workrave will let you define work intervals after which it will, shall we say, strongly encourage you to take a break and propose a couple of physical exercises. I’ve installed Workrave under Ubuntu (I believe it also runs on Windows) and configured it for:

* no micro-breaks
* a 5-min rest break after 25 min of work
* a daily limit of 8 hours (never reached)

Try it out! It’s completely free and quite nice. I can also recommend the Pomodoro Technique Illustrated book from the Pragmatic Programmers, but you might also want to begin with the free “official” Pomodoro book.

Posted on April 18, 2011 at 11:30 am by David Lindelöf · Permalink · 2 Comments
In: Uncategorized

Converting Sweave/R plots for inclusion in Word/OpenOffice

Just a quick note to myself:

When you use Sweave and produce high-quality plots in both EPS and PDF formats, you sometimes want to include them in Word or OpenOffice documents.

You can’t directly include PDF files, and you can only include EPS files when they have a preview, which Sweave will not produce by default. And anyway, the EPS versions of the graphs are made for black-and-white printers and will not be in color.

I’m using `convert` from the ImageMagick package to convert the PDF plots to PNG files. However, by default `convert` will give low-quality pictures. To have better images, I do:

convert -density 300 XXX.pdf XXX.png

If you feel geeky enough you can even include this in a Makefile:

figures: $(patsubst %.pdf,%.png,$(wildcard report-*.pdf))

report-%.png: report-%.pdf
	convert -density 300 $< $@

Posted on April 14, 2011 at 6:13 am by David Lindelöf · Permalink · Leave a comment
In: Uncategorized

Git and Scientific Reproducibility

I firmly believe that scientists and engineers—particularly scientists, by the way—should learn about, and use, version control systems (VCS) for their work. Here is why.

I’ve been a user of free VCSs for a while now, beginning with my first exposure to CVS at CERN in 2002, through my discovery of Subversion during my doctoral years at EPFL, culminating in my current infatuation with Git as a front-end to Subversion. I’m now a complete convert to that system and could not imagine working without it. Every week I discover new use cases for this tool that I had not thought about before (and that I suspect the Git developers didn’t, either).

This week I found such a use case for Git: enforcing scientific reproducibility. Let me explain. I’m currently working on prototype software written in MATLAB that implements some advanced algorithms for the smart, predictive control of heating in buildings. As part of that work we need to evaluate several competing algorithm designs, and try out different parameters for the algorithms.

The traditional way of doing this is, of course, to set all your parameters right in your code for the first simulation, to run it, then to set the parameters right for the second one, to run it again, and so on. There are several problems with this approach.

First, you need a really good naming convention for the data you are going to generate to make sure that you know exactly which parameters you set for each run. And coming up with a good naming scheme for data files is not trivial.

Second, even if your data file naming convention is good enough that you can easily reproduce the experiment, how can you be sure that the settings are exactly right? That you didn’t, perhaps, tweak just that little extra configuration file just to work around that little bug in the software?

Third, how will you reproduce those results? Even assuming that you ran all your simulations based on a given, well-known revision number in your VCS (you do use a VCS, don’t you?), you will still need to dive into the code and set those configuration parameters yourself. A tedious, error-prone process, even if you manage to keep them all in one source file.

I think a system like Git solves all these problems. Here is how I did it.

I needed to run 7 simulations with different parameters, based on a stable version of our software, say r1409 in our Subversion repository.

I’m using Git as a front-end to Subversion. I began by creating a local branch (something Git, not Subversion, will let you do):

$ git checkout -b simulations_based_on_r1409

This will create a new branch from the current HEAD. Now the idea is to make a local commit on that local branch for each different set of parameters. Here is how:

  1. Edit your source code so that all parameters are set right.
  2. Commit the changes on your local branch:
    $ git commit -am "With parameter X set to Y"
    [simulations_based_on_r1409 66cea68] With parameter X set to Y
  3. Note the 7 characters (66cea68 above) next to the branch name. These are the first 7 characters of the SHA-1 hash of the commit you just made, which identifies the exact state of your entire project as recorded by Git.
  4. Run your simulation. Log the results, along with the short hash.
  5. Repeat the steps above for each different configuration you want to run the experiment with.

By the end of this process, you should have in your logbook a list of experimental results along with the short hash of the entire project as it looked during that experiment. It might, for instance, look something like this:

| Hash | Parameter X | Parameter Y | Result |
|---|---|---|---|
| 66cea68 | 23 | 42 | 1024 |
| a4f683f | etc | etc | etc |

As you can see there are at least two reasons why it’s important to record the short hash:

  1. It will let you go back in time and reproduce an experiment exactly as it was when you ran it first.
  2. It will force you to commit all changes before running the experiment, which is a good thing.
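
The logging step can be automated. Below is a minimal Python sketch, not the tooling actually used for these simulations: the helper names (`short_hash`, `is_clean`, `log_result`) and the CSV logbook layout are assumptions made for illustration. It refuses to log a result while tracked files have uncommitted changes, and otherwise records the short hash next to the parameters and the result. The `run` argument is there only so that the Git calls can be swapped out when testing.

```python
import csv
import subprocess


def _git(args, run=subprocess.check_output):
    """Run a git command and return its output as stripped text."""
    return run(["git"] + list(args)).decode().strip()


def short_hash(run=subprocess.check_output):
    """Abbreviated hash of the commit currently checked out."""
    return _git(["rev-parse", "--short", "HEAD"], run)


def is_clean(run=subprocess.check_output):
    """True when no tracked file has uncommitted changes.

    Untracked files (such as the logbook itself) are deliberately ignored.
    """
    status = _git(["status", "--porcelain", "--untracked-files=no"], run)
    return status == ""


def log_result(logbook, params, result, run=subprocess.check_output):
    """Append one row (hash, parameters, result) to a CSV logbook."""
    if not is_clean(run):
        raise RuntimeError("commit your changes before running the experiment")
    row = [short_hash(run)] + list(params) + [result]
    with open(logbook, "a", newline="") as f:
        csv.writer(f).writerow(row)
    return row
```

Calling `log_result("logbook.csv", [23, 42], 1024)` from inside the working copy would append one line per experiment, in the same column order as the table above.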

I’ve been running a series of simulations using a variation on this process, whereby I actually run several simulations in parallel on my 8-core machine. For this to work you need to clone your entire project, once per simulation. Then for each simulation you check out the right version of your project in its own clone, and run the experiment.
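
One way to script this parallel variant is sketched below in Python. It is an illustration under assumptions, not the setup actually used: the directory layout (`sim_<hash>` under a work directory) and the function names are made up for the example. For each logged hash it clones the repository into its own directory and checks out that commit there, preparing several clones at a time. Launching the simulation itself is left out.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def clone_commands(repo, workdir, commit):
    """Return the clone directory and the git commands that prepare it."""
    target = os.path.join(workdir, "sim_" + commit)
    return target, [
        ["git", "clone", repo, target],
        # `git -C <dir>` runs the command inside <dir> (Git 1.8.5 or later).
        ["git", "-C", target, "checkout", commit],
    ]


def prepare_clone(repo, workdir, commit, run=subprocess.check_call):
    """Clone the project and check out one commit; return the clone path."""
    target, commands = clone_commands(repo, workdir, commit)
    for command in commands:
        run(command)
    return target


def prepare_all(repo, workdir, commits, run=subprocess.check_call, workers=8):
    """Prepare one clone per commit, up to `workers` at a time."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        jobs = [pool.submit(prepare_clone, repo, workdir, c, run) for c in commits]
        return [job.result() for job in jobs]
```

Checking out a bare hash leaves each clone in a detached-HEAD state, which is fine for a read-only simulation run.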

Quite seriously, I would never have been able to do anything remotely like this with a centralized version control system. The possibility to create local branches and to commit to them is a truly awesome feature of distributed version control systems such as Git. I don’t suppose the Git developers had scientists and engineers in mind when they developed this system, but hey, here we are.

Are you a scientist or an engineer wishing to dramatically improve your way of working? Then run, do not walk, to read the best book on Git there is.

Posted on January 24, 2011 at 4:00 pm by David Lindelöf · Permalink · One Comment
In: Research, Tools

Thou shalt save energy

I’m not sure anyone else is saying this, so I will: I think **there is
a clear and unambiguous scriptural mandate to reduce our current
energy consumption**.

Now before you dismiss this post, this author and this blog as just
another bible-thumping fanatic, remember that in certain countries,
certain political parties profess strict adherence to Scripture while
being overtly skeptical about the whole climate warming problem. I
think they are wrong and here’s why.

Let’s first review why, from a scriptural point of
view, one could in principle argue that whether we take action or let things run
as they are makes no difference. I’ve heard some of these arguments from very good
(Christian) friends, and I hope I’m not going to offend anyone by
refuting them later in this post:

* Revelation 21:1 tells us that all of creation will eventually be
destroyed and replaced by a new one:

Then I saw a new heaven and a new earth; for the first heaven and the first earth passed away, and there is no longer any sea.

* God is sovereign, so no matter what we do, things will run according
to His will.

* It is highly arrogant for Man to believe that he can do anything
about the climate.

The last two arguments are probably the easiest to answer. God is
certainly sovereign, but that doesn’t remove our responsibility for
doing the right things and making the right choices in life. In fact,
God intends us to be co-creators with Him and to participate, so to
speak, in the creative act. This point has been persuasively argued
for by several authors such as C.S. Lewis and J.I. Packer.

The big problem with the first argument is that, even though the
current creation is indeed doomed in the long run, God asked us from
the beginning to take care of it, cf. Genesis 1:28:

God blessed them; and God said to them, “Be fruitful and
multiply, and fill the earth, and subdue it; and rule over the fish of
the sea and over the birds of the sky and over every living thing that
moves on the earth.”

See that? Commandment number 3 in the whole Bible: subdue the Earth and
rule over it. No mention here of letting things run their course simply
because the creation is about (in the biblical perspective) to be
replaced.

Ah, but you might argue that this command was given *before* the Fall,
and that everything has gone downhill since then. You’re right about the
downhill part, but look at this, Gen 3:23:

therefore the Lord God sent him out from the garden of
Eden, to cultivate the ground from which he was taken.

Man is kicked out of the Garden of Eden, and what is he to do?
Essentially the same thing, i.e. rule over the Earth and cultivate it
and take care of it. The only difference being, of course, that now
it’s going to be painful to do so (Gen 3:19).

The mandate to take care of creation is repeated several times, for
instance right after Noah comes out of the Ark after the Flood, Gen
9:1-2:

And God blessed Noah and his sons and said to them, “Be
fruitful and multiply, and fill the earth. The fear of you and the
terror of you will be on every beast of the earth and on every bird of
the sky; with everything that creeps on the ground, and all the fish
of the sea, into your hand they are given.”

Or of course Psalm 8:5-6:

Yet You have made him a little lower than God,
And You crown him with glory and majesty! You make him to rule over
the works of Your hands; You have put all things under his
feet.

God clearly intends us to rule and manage His creation, no matter what
ultimate fate awaits it.

But the real surprise came to me while re-reading the following
(Deut. 22:6-7):

“If you happen to come upon a bird’s nest along the way,
in any tree or on the ground, with young ones or eggs, and the mother
sitting on the young or on the eggs, you shall not take the mother
with the young; you shall certainly let the mother go, but the young
you may take for yourself, in order that it may be well with you and
that you may prolong your days.”

This was one of the many commands given Israel before entering the
promised land. The spirit of this passage, and of others like it, is
unambiguous: God is asking us simply to be **extremely careful in managing our
natural resources**. We are forbidden to view Earth as just a vast source of
riches to be exploited as quickly and efficiently as possible. We are
explicitly commanded to make sure that Earth can go on being such a
source of riches, indefinitely if needed.

(I’ve even read somewhere that the number of years Israel spent in
Babylonian captivity, 70, corresponds to the number of years the land
should have been allowed to rest since Israel took possession of it,
but didn’t. Here again, the importance of allowing natural resources
to replenish themselves is evident.)

The concept of “rest” is a potent one in Scripture. We are to rest
once a week. The land was to rest once every 7 years. We were supposed
to leave alone the corners of our fields that weren’t harvested. In
other words, **Scripture is full of passages mandating a careful
management of our natural resources**. Arguing that we can do whatever
we want with Earth simply because it is doomed to the eternal fire
anyway is not only lazy and criminal, it is also doctrinally false.

Posted on January 6, 2011 at 4:18 pm by David Lindelöf · Permalink · 2 Comments
In: Energy

Neurobat, day one

Yesterday marked my first day as Chief Technology Officer at Neurobat AG, a young company formed in Switzerland to industrialize and market advanced building control algorithms, such as the ones commonly researched and developed at my former laboratory, the Solar Energy and Building Physics Laboratory at EPFL.

This also marks the end of almost three years spent building enterprise integration systems in Java for a certain coffeeshop. I’m now moving back to my original topics of interest, namely the intelligent control and simulation of buildings. Indeed, without disclosing too much, the very first project I will be working on is the implementation of certain ideas formulated during the Neurobat research project carried out aeons ago at LESO-PB. Except this time the systems won’t be running in the quiet and safe environment of an experimental building whose occupants have a history of forgiveness towards enthusiastic graduate students (myself included) and their ideas. No, this time we mean business, that is, embedded systems that must be built rock-solid and run unattended for years, or possibly decades.

One issue that’s come up more than once was whether we should keep MATLAB as our lingua franca for prototyping and trying out new ideas and concepts before porting them to languages that are, shall we say, closer to the machine. Or should we just dump it (including its non-negligible licensing costs, especially for a non-academic organization) and work directly as close to the metal as we dare?

Personally, without wanting to sound overly smug or anything, I think that someone asking this question has obviously never tried multiplying two matrices in C. The implementation contributed by James Trevelyan to the Numerical Recipes in C website runs to about 33 lines:

void dmmult( double **a, int a_rows, int a_cols,
             double **b, int b_rows, int b_cols, double **y)
/* multiply two matrices a, b, result in y. y must not be same as a or b */
{
    int i, j, k;
    double sum;

    if ( a_cols != b_rows ) {
        fprintf(stderr,"a_cols b_rows (%d,%d): dmmult\n", a_cols, b_rows);
        exit(1);
    }

#ifdef V_CHECK
    if ( !valid_dmatrix_b( a ) )
        nrerror("Invalid 1st matrix: dmmult\n");
    if ( !valid_dmatrix_b( b ) )
        nrerror("Invalid 2nd matrix: dmmult\n");
    if ( !valid_dmatrix_b( y ) )
        nrerror("Invalid result matrix: dmmult\n");
#endif

/*  getchar();
    dmdump( stdout, "Matrix a", a, a_rows, a_cols, "%8.2lf");
    dmdump( stdout, "Matrix b", b, b_rows, b_cols, "%8.2lf");
    getchar();
*/
    for ( i=1; i<=a_rows; i++ )
        for ( j=1; j<=b_cols; j++ ) {
            sum = 0.0;
            for ( k=1; k<=a_cols; k++ ) sum += a[i][k]*b[k][j];
            y[i][j] = sum;
        }
}

Give me instead MATLAB's

y = a * b

anytime. Now of course I realize the comparison is completely unfair. The C version includes error checking, comments, etc. But still, C is, after all, originally a systems programming language, while MATLAB-the-language is a DSL for doing precisely this sort of stuff. I never wanted to prove that C sucked at doing linear algebra—I just wanted to show that most trivial operations in MATLAB would have to be—by us—re-implemented in C before we can even begin using them. And I don't think we have that sort of time. Not outside of academia.
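
To make explicit what that one-line `a * b` hides, here is a rough Python sketch of the same algorithm, written for illustration only. It assumes matrices stored as plain lists of rows and uses 0-based indices, unlike the 1-based C routine above. The dimension check and the triple loop still have to be spelled out by hand.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows and return the product."""
    a_rows, a_cols = len(a), len(a[0])
    b_rows, b_cols = len(b), len(b[0])
    if a_cols != b_rows:
        raise ValueError("a_cols != b_rows (%d, %d)" % (a_cols, b_rows))
    # Allocate the result, then fill it one element at a time.
    y = [[0.0] * b_cols for _ in range(a_rows)]
    for i in range(a_rows):
        for j in range(b_cols):
            # Dot product of row i of a with column j of b.
            y[i][j] = sum(a[i][k] * b[k][j] for k in range(a_cols))
    return y
```

Even in a high-level general-purpose language, this is the kind of plumbing that would have to be written and tested before any real algorithm work could start.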

Posted on September 14, 2010 at 2:43 pm by David Lindelöf · Permalink · Leave a comment
In: Announcements

Resources for building simulation

About two weeks ago I posed the following question on the Bldg-sim mailing list:

Where can I find a list of publications relevant to the field of
building simulation? I’m particularly interested in refereed journals
and books.

The ensuing thread has been extremely helpful, in particular the reply from Shanta Tucker, who pointed me to the IBPSA website. There, under the References link, you will find a fairly complete listing of articles, resources and books, including the full contents of every IBPSA conference paper. What’s not to like?

Posted on August 23, 2010 at 12:00 pm by David Lindelöf · Permalink · Leave a comment
In: Research

DB4ALL: reformatting the mess that Internet has become

I always try very hard to keep my posts within the main topic of this blog, namely computers in the context of building automation and simulation. Occasionally I fail, as with today’s post.

I’d like to tell you about a software company co-founded by a friend and fellow Toastmaster of mine, David Portabella. The company’s name is [DB4ALL](http://www.db4all.com), and they specialize in software for retrieving structured data from the web.

(Disclaimer: I am not affiliated with this company. I have had the opportunity to play with their tool, which I sincerely think is a high-quality one, but I derive no remuneration from writing this piece.)

They’ve developed ‘Webminer’, a Java library for extracting data in a structured manner from any website. Suppose, for instance, that you need a relational database with the data from the [CIA World Factbook](https://www.cia.gov/library/publications/the-world-factbook/). That data, though in the public domain, cannot be obtained in the form of a relational database, but only by clicking around on the CIA website. But with ‘Webminer’, the smart guys at DB4ALL can write a custom application that will know how to navigate such a website, ‘scrape’ and ‘normalize’ its data, and save it to a relational database for you.

On [DB4ALL's website](http://www.db4all.com) you will find references to [the two most popular datasets](http://db4all.com/databases/) that they’ve mined: the above-mentioned CIA World Factbook, and the SourceForge database of open-source projects. Having such data in a relational form is invaluable for any researcher or marketing analyst. Suppose for instance that you want hard data on the popularity of different programming languages over time in open-source projects. Well, with these datasets you have all you need to get started.

This, for instance, is a screenshot of the SourceForge dataset opened in Excel:

All in all, if you need publicly available data from a website stored in a relational database form, you should definitely consider using [DB4ALL](http://www.db4all.com)’s services.

Posted on July 12, 2010 at 11:09 am by David Lindelöf · Permalink · Leave a comment
In: Announcements

Software engineering best practices in academia

As you might know, my background is primarily in academia and
research, but over the past years my interests have
focused increasingly on software engineering.

With the benefit of hindsight, it’s clear to me today that if I had
known what I know today about software, I would without doubt have
been a much, much more productive researcher and graduate
student. It’s simply not possible today to carry out research without
programming. And research itself, to be considered valuable, requires
exactly the same qualities demanded from modern software
engineering: repeatability, versioning, and safe explorations.

I’m convinced today that researchers would benefit if practicing
software engineers gave them some feedback on how they solve
these problems. And I’ve often pondered whether I should begin writing
on software engineering topics that I think could be relevant for
scientists and/or engineers, particularly in the academic field. It
could even form the basis for a series of blog posts.

I’d rather ask you, dear reader, for advice on this. **Would you like
me to begin a series of posts on software engineering topics relevant
to scientists and engineers in academia?** And if yes, which particular
subjects would you like to see me discuss?

I’m really, really looking forward to reading your comments on this matter.

Posted on July 8, 2010 at 8:56 am by David Lindelöf · Permalink · Leave a comment
In: Announcements

Weird certificate verification error

I spent most of the day today debugging a very mysterious error we
encountered when trying to programmatically call a web service over SSL
from Java.

Here is the source code with which we managed to reliably reproduce
the error:

import javax.net.SocketFactory;
import javax.net.ssl.SSLSocketFactory;
import java.io.*;
import java.net.Socket;

public class SimpleSSLTest {
    public static void main(String[] args) throws IOException {
        try {
            int port = 443;
            String hostname = "somehost.com";
            // Open an SSL socket; the handshake validates the server certificate.
            SocketFactory socketFactory = SSLSocketFactory.getDefault();
            Socket socket = socketFactory.createSocket(hostname, port);
            InputStream in = socket.getInputStream();
            OutputStream out = socket.getOutputStream();
            // Send a minimal HTTP request.
            PrintWriter pout = new PrintWriter(new BufferedWriter(new OutputStreamWriter(out)));
            pout.println("GET " + "/" + " HTTP/1.0");
            pout.println();
            pout.flush();
            // Print the response line by line.
            BufferedReader bin = new BufferedReader(new InputStreamReader(in));
            String inputLine;
            while ((inputLine = bin.readLine()) != null) {
                System.out.println(inputLine);
            }
            in.close();
            out.close();
        } catch (IOException e) { throw e; }
    }
}

The website, `somehost.com`, used an SSL certificate signed by our own
internal certificate authority. That authority’s certificate was
stored in a `cacerts` Java keystore. We run this code from the command
line thus:

$ java -Djavax.net.ssl.trustStore=cacerts -cp target/classes/ SimpleSSLTest

When we run this, the application bombs with an exception, the root
cause of which reads as follows:

Caused by: java.security.cert.CertPathValidatorException: CA key usage check failed: keyCertSign bit is not set
at sun.security.provider.certpath.PKIXMasterCertPathValidator.validate(PKIXMasterCertPathValidator.java:153)
at sun.security.provider.certpath.PKIXCertPathValidator.doValidate(PKIXCertPathValidator.java:325)
at sun.security.provider.certpath.PKIXCertPathValidator.engineValidate(PKIXCertPathValidator.java:187)
at java.security.cert.CertPathValidator.validate(CertPathValidator.java:267)
at sun.security.validator.PKIXValidator.doValidate(PKIXValidator.java:261)
… 22 more

We’ve tried to wrap our heads around this problem the whole day and
could make neither head nor tail of it, especially as we didn’t get
this error at all when targeting another host, using another
certificate but signed by the same certificate authority.

As a last resort, I thought of checking exactly which version of Java
we were using. Turned out we were using OpenJDK, the version that
replaced Sun’s version in Ubuntu 10.04. Running the same code with
Sun’s Java SDK solved the problem, but we can’t confidently state that
we understand what was wrong. Perhaps a bug in OpenJDK’s
implementation of JSSE. Who knows.

If you’ve run into the same problem, feel free to leave a comment. I’d
be interested to hear if (and how) you’ve solved it.

Posted on July 1, 2010 at 1:45 pm by David Lindelöf · Permalink · One Comment
In: Programming

MATLAB’s inane idea of time

MATLAB seems to have a very peculiar notion of how to represent dates
and times. Yesterday I spent a wonderful couple of hours debugging
some code that’s supposed to compute the sun’s position, most of which
could have been avoided if the MATLAB designers had followed a simple
convention used by, I believe, most computing platforms.

In MATLAB, dates and times are represented internally by a so-called
*serial date number*, defined as the number of time units counted
since a given reference date. If you are like me you will, I suppose,
assume that this reference date is the standard UNIX *epoch*,
i.e. midnight, January 1st, 1970. Well you’re only about two millennia
off—the reference date in MATLAB is the hypothetical (and
non-existent) date of midnight, January 1st, 0000. Never mind that
there never was a year 0000—the calendar goes straight from 1 BC to
1 AD.

And if you *really* are like me you will of course assume that the
unit of time in which this serial date number is counted is seconds,
or at least milliseconds. Wrong again: MATLAB chose *days* as its
fundamental unit of time. And of course, Octave was forced to follow
MATLAB’s choice:

octave:4> format long
octave:5> now
ans = 734313.962094548

Besides making it much more difficult to make MATLAB interoperate
with, say, Java libraries, there are several problems with this
approach (documented in Octave’s help file, haven’t checked in
MATLAB):

1. The Julian calendar is ignored, so anything before 1582 will be
wrong;
2. Leap seconds are ignored. In other words, MATLAB ignores days that
happened to be 86401 seconds long (yes, there are some).

When working with timeseries data, in particular climate data, I
always try to count time from the UNIX epoch—ideally as the number
of seconds from the epoch, the way `date(1)` works when called with
the `+%s` format argument:

18:08:49@netbook$ date +%s
1277309333
18:08:53@netbook$ date +%s
1277309336

In Java, `System.currentTimeMillis()` will return the number of
milliseconds since the epoch:

scala> System.currentTimeMillis
res0: Long = 1277395240485

In R, converting a `POSIXct` object (such as the one returned by `Sys.time()`) to numeric yields the number of
seconds:

> as.numeric(Sys.time())
[1] 1277310008

In short, every computing platform I’ve touched in recent weeks
represents time starting from the standard UNIX/POSIX epoch, and
always does so in a unit related to seconds. In other words, there is no
justification for MATLAB’s decision to represent time since year 0000,
and even less for doing so in number of days. I don’t mean to bash
MATLAB (well… a bit, maybe). I just regret that anytime I need
MATLAB to interoperate with some other code, I need to include a factor
of 86400 and shift everything by 1970 years.
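
For reference, the conversion boils down to one offset and one scale factor. The sketch below is in Python and is only an illustration: the function names are made up, and it assumes the serial date number is expressed in UTC (MATLAB's and Octave's `now` return local time without any time zone information). The offset 719529 is the serial date number of January 1st, 1970, i.e. the value of `datenum(1970, 1, 1)`.

```python
from datetime import datetime, timedelta, timezone

SECONDS_PER_DAY = 86400
# Serial date number of 1970-01-01 00:00, i.e. datenum(1970, 1, 1).
UNIX_EPOCH_DATENUM = 719529


def datenum_to_unix(datenum):
    """Convert a serial date number (days) to seconds since the UNIX epoch."""
    return (datenum - UNIX_EPOCH_DATENUM) * SECONDS_PER_DAY


def unix_to_datenum(seconds):
    """Convert seconds since the UNIX epoch to a serial date number (days)."""
    return seconds / SECONDS_PER_DAY + UNIX_EPOCH_DATENUM


def datenum_to_datetime(datenum):
    """Interpret a serial date number as a UTC datetime (an assumption)."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(seconds=datenum_to_unix(datenum))
```

As a sanity check, `datenum_to_datetime(734313)` falls on June 24, 2010, which matches the integer part of the Octave output shown earlier.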

Posted on June 24, 2010 at 2:18 pm by David Lindelöf · Permalink · 4 Comments
In: Programming