How not to get hired by Neurobat

When I recruit software engineers I always ask them to first take a short online programming test. Following a recommendation from Jeff Atwood, we use Codility as an online programming testing tool.

The goal of this test is not to assess whether you are a good programmer. I believe there’s more to software engineering than merely being able to code a simple algorithm under time pressure. The goal is to filter out self-professed programmers who, in fact, can’t program. And according to Jeff Atwood again, these people are uncomfortably numerous.

During our current recruitment round we got an angry email from a candidate who performed less than stellarly:

Thank you for your e-mail, outlining that you don’t wish to proceed further with my application.

I fully understand your position, though I feel that your online testing system is flawed. I have been programming C and C++ on and off for 25 years, so I guess if I don’t know it, then nobody does.

It’s simply not realistic to test people under such artificial conditions against the clock, relatively unprepared and in a strange development environment.

Nevertheless, I’m glad to have experienced the test, and it has helped resolve my focus on exactly the type of jobs that I don’t wish to pursue, and the types of people I don’t wish to work with.

This is from a candidate who, according to his resume, is an “experienced IT professional” with 10+ years of experience in C/C++, Javascript, Perl, SQL, and many others. Let’s have a look at the programming test and his solution.

The test consists in two problems, rated “Easy” and “Medium” respectively by the Codility platform. The candidates have one hour to carry out the test. They can take the test only once, but whenever they want. They are given the opportunity to practice first.

Here is the gist of the first, “Easy” problem:

Write a function int solution(string &S); that, given a non-empty string S consisting of N characters, returns 1 if S is an anagram of some palindrome and returns 0 otherwise.

For example, "dooernedeevrvn" is an anagram of the palindrome "neveroddoreven". A minute of reflexion should be enough to realise that a string is an anagram of a palindrome if and only if not more than one letter occurs an odd number of times.

Here is Mr. If-I-don’t-know-it-nobody-does’s solution in toto:

// you can use includes, for example:
// #include <algorithm>
#include <iostream>
#include <vector>

using namespace std;

// you can write to stdout for debugging purposes, e.g.
// cout << "this is a debug message" << endl;

int solution(string &S) {
// --- string size ---
int N = S.size();
char *str;
bool even;
vector<int> cnt(N,0);

// --- even no of letters? ---
if (N % 2)
even = false;
else
even = true;

// --- for faster access ---
str = (char *)S.c_str();

// --- count each letter occurence ---

// --- for each letter and check letter count of all others ---
for (int i=0;i<N;i++)
{
//cout << "checking " << i << str[i]<<'n';

// --- exists at least once ---
cnt[i]++;

// --- check all other positions ---
for (int j=0;j<N;j++)
{
if (j==i)
continue;

if (str[i] == str[j])
//cout << "found match" << 'n';
cnt[i]++;
}
}

// --- if length even, chk all letter count is even ---
if (even)
{
for (int i=0;i<N;i++)
{
// --- if odd count found, then no palindromes ---
if (cnt[i] % 2)
return 0;
}
// --- all letters are even ---
return 1;
}
// --- if odd chk only one letter has odd count ---
else
{
int l_odd_cnt=0;

for (int i=0;i<N;i++)
{
// --- letter appears odd number of times ---
if (cnt[i] %2)
l_odd_cnt++;
}

// --- more than one odd letter count found ---
if (l_odd_cnt == 1)
return 1;
}

return 0;
}


Never mind that this solution has O(n2) time complexity and O(n) space complexity (the test asked for O(n) and O(1) respectively), it is also wrong. It returns 0 for “zzz”. But perhaps the use of C-style char* “for faster access” will compensate for the algorithmic complexity.

Let’s have a look at another solution proposed by a self-titled senior programmer:

// you can use includes, for example:
// #include <algorithm>

// you can write to stdout for debugging purposes, e.g.
// cout << "this is a debug message" << endl;
#define NUM_ALPH 30
#define a_ASCII_OFFSET 97

int solution(string &S) {
// write your code in C++11
//std::map<char,int> letters_to_counts;
//std::map<char,int>::iterator it;
//int len = S.size;

//string alph = "abcdefghijklmnopqrstuvwxyz";

int count[NUM_ALPH]={};  // set to 0
for(int i==0; i< len; i++)
{
char ch= S[i];
int index= (int)ch;  // cast to int
count[index-a_ASCII_OFFSET]^=1;  //toggles bit, unmatched will have 1,
}
int sum_unmatched=0;
for(int i=0; i < NUM_ALPH; i++)
{
sum_unmatched+=count[i];

}
if(sum_unmatched<=1)return 1;
return 0;
}
// did not have time to polish but the solution logic should work


This one doesn’t even compile, but fortunately the “logic should work”. I’m sure it will, being written as it is in C, and with helpful comments too (“cast to int”, really?)

I have several more examples like this one, all coming from candidates who applied to a job ad where I made the mistake to ask for a Senior Software Engineer.

Compare this with a contribution from one who applies to a non-senior position:

#include <algorithm>
#include <map>

map<char, int> createDictionary(string & S) {
map<char, int> result;
for (char ch : S) {
++ result[ch];
}
return result;
}

int solution(string & S) {
map<char, int> dictionary = createDictionary(S);
int numEvens = count_if(dictionary.begin(), dictionary.end(),
[] (const pair<char, int> & p) { return p.second % 2 == 1; });

return numEvens < 2? 1: 0;
}


Not only is this code correct, it also reads well and demonstrates knowledge of the recent additions to the C++ language. And this comes from a relatively younger candidate, who came as far as the in-person interview.

Again, software engineering is about much more than merely programming skills. This test is only the first filter; when the candidates are invited for the interview I ask them to explain their reasoning and their code to a non-programmer, to see how their communication skills stack up. Only then will we consider making them an offer.

Posted on October 16, 2014 at 9:53 am by David Lindelöf · Permalink · 8 Comments
In: Uncategorized

Review: Growing Object-Oriented Software, Guided by Tests

I didn’t know what to expect when I picked up this book. In spite of its excellent reviews I feared it was going to be another redundant addition to the mountain of books harping on the virtues of Test-Driven Development (TDD), without adding anything significant to the standard sermon.

Nothing could be further from the truth.

I read a fair share of technical books, but this book is the only one in years that I immediately began to re-read again after finishing. It is easily one of the most important books on software engineering out there, and is likely to remain so for some time to come.

The authors present what is now known as the London school of TDD, where the correctness of an object is defined by its interactions with its collaborators, not necessarily by its state. Although I had seen mocking frameworks in action before, never had I seen one being used throughout the development of a software project.

Another fascinating idea is the notion of writing an end-to-end test first, before even starting to write unit tests. We have been so thoroughly drilled on the virtues of fast tests, that it doesn’t occur to us anymore that it’s even possible—even preferable—to exercice the whole system, perhaps in a separate test suite.

But the best part of the book is the sample project used to illuminate these concepts. It consists in writing a desktop application with which a user can automate the process of bidding in online auctions. The graphical part is done with the Swing framework in Java, and the application talks via XMPP to the auction house. The first chapter in the case study is about setting up literally an end-to-end test, i.e. a test (written with JUnit) that will verify if the graphical display matches the XMPP communications.

From there on, the case study proceeds with the implementation of feature after feature, always following the same pattern: write the end-to-end test first, implement the feature with TDD, refactor, repeat.

No book is worth reading if it doesn’t change your approach to your existing projects. This one showed me immediately where our current project (an embedded system for energy management) was lacking in terms of testing.

Go read this book, and send me flowers and chocolates.

In: Book reviews

How to determine if a sample is drawn from a normal distribution

Suppose you’ve performed some experiment on a given population sample. Each experiment yields a single numeric result. You have also derived the usual statistics (say, the sample mean and the sample standard deviation). Now you want to draw inferences about the rest of the population. How do you do that?

I was surprised the other day to learn that there’s an ISO norm for that. However, life gets much simpler if you can assume that the parent population is normally distributed. There are several ways to check this assumption, and here we’ll cover what I believe are two of the easiest yet most powerful ones: first an informal, graphical one; then a formal, statistical one.

The graphical method is called a (normal) Q-Q plot. If your sample is normally distributed then the points in a normal Q-Q plot will fall on a line.

Here is a vector of measurements that I’ve been working with recently. (Never mind what these represent. Consider them as abstract data.)

> x
[1] 20.539154 -1.314532 4.096133 28.578643 36.497943 12.637312 6.783382 18.195836 15.464364 20.155207


The command to produce a normal Q-Q plot is included in R by default:

> qqnorm(x)
> qqline(x, col=2)


Note that I also call qqline() in order to draw a line through the 25% and 75% quantiles. This makes it easier to spot significant departures from normality. Here is the result:

No nomination for best linear fit ever, but nothing either to suggest non-normality.

Now for the statistical test. There are actually a lot of statistical tests for non-normality out there, but according to Wikipedia the Shapiro-Wilk test has the highest power, i.e. the highest probability of detecting non-normality on non-normally-distributed data. (I hope I’m getting this right or my statistician friends will tan my hide.)

This test is built-in to R with the shapiro.test() function:

> shapiro.test(x)

Shapiro-Wilk normality test

data: x
W = 0.9817, p-value = 0.9736


You probably have a part of your brain trained to release endorphins when it sees a p-value lower than 0.05, and to trigger a small depression when the p-value is higher than 0.9. But remember what it is we are testing for here. What is the null hypothesis here?

Here, the null hypothesis is that the data is normally distributed. You might find this counter-intuitive; for years, you have been trained into thinking that the null hypothesis is the thing you usually dont’t want to be true. But here it is the other way around: we want to confirm that the data is normally distributed, so we apply tests that detect non-normality and therefore hope the resulting p-value will be high. Here, any p-value lower than, say, 0.05 will ruin your day.

So we have determined both graphically and numerically that there is no evidence for non-normality in our data. We can therefore state that to the best of our knowledge, there is no evidence that the data comes from anything else than a normal distribution.

(Ever noticed how much statisticians love double-negatives?)

In: Uncategorized

MATLAB Coding Conventions

Over the course of four years we have developed, at Neurobat, a set of coding conventions for MATLAB that I would like to share here. The goal of these conventions is three-fold:

• Help scientists and engineers write clean MATLAB code
• Help write MATLAB code that will be easily ported to C
• Provide guidelines to external parties that write MATLAB code for us.

Feel free to redistribute and/or adapt these rules to suit your organization.

Rationale

We have observed that scientists and engineers who use MATLAB tend to write MATLAB code that mirrors their way of thinking: long scripts that perform computations as a series of steps.

Our experience has shown that code written in that style tends to become hard to understand and to modify. Furthermore, it tends also to be hard to port to C. As an alternative, we suggest that both MATLAB and C programs will benefit from the application of the so-called Opaque Data Type programming idiom. Our experience has shown that a disciplined application of this idiom leads to more modular, cleaner code that is also easier to port to C.

In the rest of this article we enumerate the rules that should be followed to apply this idiom to the MATLAB language.

Represent an object with state as a struct

Neither C nor MATLAB has (a satisfying) support for object-oriented programming; however, some degree of encapsulation can be achieved by using structs, which both C and MATLAB support.

We have found structs to be the best way to represent state in MATLAB. The alternatives, namely global variables, or persistent variables, are effectively global variables and cannot be used to represent state held by more than one object.

Provide a meaningful name to the structure

The state-holding structure should represent some kind of object in the real world; provide a name for this structure, so we can understand the purpose of this object.

Represent a module by a folder

Keep all the code related to a particular data structure (constructor, methods and unit tests) under the same folder, with the same name as the structure. The C language lets you implement all functions in the same file, usually called a module. MATLAB requires each (public) function to be defined as the first function in their own .m file. Keep all those .m files in the same folder.

Never expect the client code to access fields directly

No code, except the methods defined in the enclosing folder, is expected (or allowed) to access the fields of the structure directly.

Define a constructor

Never expect the client code to build the struct itself; always provide a suitable function, called a constructor, that will instantiate the proper fields in the structure. The client code should never even be aware that they are dealing with a structure.

Keep a consistent naming convention for functions

C has no namespace, and neither has MATLAB. It is therefore important to adhere to a naming convention for functions. Keep the following naming convention, where xxx is the name of the enclosing folder:

Constructor: xxx_new(...)

Methods: xxx_method_name(xxx, ...)

Destructor (if needed): xxx_free(xxx)

Methods, including the constructor, may accept optional arguments. The first argument to all methods should be an instance of xxx, on which it is understood that the operations will apply.

Keep the Command-Query Separation principle

The Command-Query Separation principle states that a method should either return a computed value, or update the state of the object, but not both. Keep this principle unless doing so would obviously lead to less readable and less maintainable code.

Unit tests

We believe that the practice of Test-driven development leads to better software. We are however aware that applying this practice requires training and discipline. We therefore strongly encourage it for code provided by third parties, without (yet) requiring it. Internally developed code is almost always test-driven.

Code Quality

We understand that producing quality code requires experience, training and discipline. It would be unreasonable to expect the same code quality from scientists and engineers as from professional software craftsmen; however, we encourage you to remain alert to the following signs of deteriorating quality:

• Duplicated code
• Long functions (more than half a screen)
• Long parameter list
• Too many comments (a sign that the code has become hard to read)

Example

This is an example of how a simple PI controller could be implemented, following the guidelines above. Put the three files below under a pid folder, together with test data and test functions:

function pid = pid_new(setpoint, P, I)
pid.setpoint = setpoint;
pid.P = P;
if nargin < 3
pid.I = 0;
else
pid.I = I;
end
pid.error = 0;
pid.ui = 0;
end

function pid = pid_new_value(pid, new_value)
pid.error = pid.setpoint - new_value;
pid.ui = pid.ui + pid.error * pid.I;
end

function control = pid_control(pid)
control = pid.P * (pid.error + pid.ui);
end

In: Uncategorized

How to install RPostgreSQL on OSX Mavericks

Even if you’ve installed the PostgreSQL client binaries via Brew (i.e., brew install postgres), you will run into problems if you try to install the RPostgreSQL package in the usual way, i.e. install.packages('RPostgreSQL') from the R console. That’s simply because there are no binaries on CRAN for Mavericks. If you look at http://cran.r-project.org/web/checks/check_results_RPostgreSQL.html you’ll see that OSX Mavericks is the only flavor that runs into problems building the package.

Fortunately the solution is trivial. Install it from source, and yes, this sounds way scarier than it really is.

Download the source tarball from http://cran.r-project.org/web/packages/RPostgreSQL/index.html. Now resist for a moment the urge to untar the file. Instead, install first the DBI package from the R console:

install.packages('DBI')


and then run this from the directory where you downloaded the tarball:

sudo R CMD INSTALL RPostgreSQL_0.4.tar.gz


Posted on May 23, 2014 at 9:45 am by David Lindelöf · Permalink · 2 Comments
In: Uncategorized

Energy savings with Neurobat

The official video presenting our main product and explaining how it works:

In: Uncategorized

Programming Wingmen

For the past few weeks we’ve been experimenting with a variant of the Pair Programming theme.

Conventional wisdom holds that pairs should be rotated frequently, i.e. several times per day. In our experience, this has been hard to sustain. Instead, we’ve experimented with having two team members bond together and take collective ownership of a feature (or bug, or research item). They stick together until that feature is done. That can last anywhere between a few hours and several days.

The Pair Programming community use the metaphor of two drivers alternating being in the driver’s seat. Implicit in this metaphor is the idea that you stay with your fellow driver only for part of the whole road; several different programming pairs will end up working on the same feature.

What we are proposing instead, is that the same pair sticks together until the feature is complete. If the Pair Programming community uses a land-based metaphor, I’m going to use an aerial one. I’ll liken this to a pair of wingmen flying a missions, sticking together until the mission is done.

I’ve been pairing for almost two weeks now with our test engineer. We write the R scripts that automatically analyze, and generate a report on, this winter’s test data. Speaking for myself, I’m now convinced that this practice carries the same benefits as traditional pair programming, but brings several extra benefits:

First, since we are both responsible for the quality of this work, I feel far more accountable to my fellow team member. If I screw up, it’s our collective work that suffers. Conversely, I have a much stronger incentive for watching out over what he’s doing while he’s flying. We are 100% responsible for the quality of the work; we cannot later claim that some other pair came and introduced errors.

Secondly, it shortens the daily standup since it’s now the wings that presents what was done, what will be done, etc. In fact, it’s been the first time in years that we kept our daily standups within the allocated time.

Thirdly, it strongly encourages us to show, not tell, what was done the day before during the standup. This one is a bit tricky to explain; in fact I’m not sure I understand it, but I suspect some psychological rush is experienced from showing off what I, as a wingman, have accomplished with my partner. I see many more charts, pictures, plots, printouts at the standups since we’ve introduced this practice.

Fourthly, it reduces the guilt associated with pairing with someone instead of working on one’s “own” story. I think one of the main reasons why people don’t pair more often is the fear of not completing what they perceive as their “own” tasks. Indeed, perhaps it is time to revise the daily standup mantra. Forcing people to chant “Yesterday I did this, today I‘ll do that, etc” emphasizes their individual accomplishments.

A big issue with traditional Scrum is that by the end of the daily standup, everyone is so energized for the day ahead that they simply can’t be bothered to pair with someone else. In fact, once everyone has spoken their piece, we often found it useful to re-discuss who will be doing what for the day and with whom. This is obviously a waste of time, that can be eliminated by this wingmen system.

How did we get started? By a simple change. So simple, in fact, that it borders on the trivial, and you can try it today. Assuming you use a whiteboard to track your work in progress, figure out a way to indicate that a story is now owned by two people instead of one. For us, we used to have parallel swimlanes across the whiteboard, one lane per team member. What we did was as simple as merge them together by pairs.

If you have any position of authority in your team, I dare you to try this experiment for just one week, and for just one pair of team members. You’ll thank me for it.

In: Uncategorized

C’est “Test Unité”, m***e!

Most french-speaking professional programmers I’ve worked with will translate “Unit Test” by “Test Unitaire”. This is indeed the term used by the french translation of the Wikipedia article on unit testing.

Forgive my obsessive-compulsive disorder, but I believe the proper french translation of “unit test” should be “test unité”, and not “test unitaire”.

]2 Psst I told them that we write unit tests!

The english “unit” and the french “unitaire” mean two completely different things. “Unit” refers to a small, indivisible part of a system. “Unitaire” is a word that I have never seen used outside of linear algebra. For example a “matrice unitaire” (“unitary matrix“) refers to a complex matrix $U$ whose inverse is its conjugate transpose: $U \times U^* = I$.

I don’t know what a “test unitaire” is, nor do I know that a unitary test is. I do know, however, that a unit test is a test that tests a unit. Therefore, I also know that a “test unité” is a test that tests an “unité”.

In: Uncategorized

When to disable software assertions

I’m currently taking the Software Testing class on Udacity. The instructor spent a couple of videos discussing the pros and cons of leaving software assertions turned on in your production systems. Here is my interpretation:

Posted on April 7, 2014 at 10:08 am by David Lindelöf · Permalink · 2 Comments
In: Uncategorized

The only statistical distribution you need to know

If you pick two people at random and measure their heights, what can you say about the average height of the population?

When dealing with small sample sizes (less than, say, 20) it’s generally not possible to assume that the sample variance is a good approximation of the population variance. There simply isn’t enough data, and you cannot use the normal distribution to infer meaningful confidence intervals.

Statistics is like electrical work: it’s very easy to do a job that looks correct but could kill or maim someone. For example we have all, at least once, taken a small sample and used the normal distribution to get confidence intervals. Here is what we would get for the example above, assuming that the measured heights were 184 cm and 178 cm:

> x <- c(184, 178)
> xbar <- mean(x)
> s <- sd(x)
> qnorm(c(0.025, 0.975), xbar, s)
[1] 172.6846 189.3154


The confidence interval we just obtained ($172 \le \mu \le 189$) looks perfectly reasonable, but it’s wrong. What’s worse, there’s nothing obviously wrong with it. If $s$ was the true population standard deviation it might even be correct; but that’s extremely unlikely.

This is why Student’s t-distribution was invented (or discovered?), to deal with small sample sizes. If we draw $N$ random samples from a normally distributed population, compute the sample mean $\bar{x}$ and the sample standard deviation $s$, then the so-called t-value $\frac{\bar{x}-\mu}{s/\sqrt{N}}$ will be distributed according to the t-distribution with $N-1$ degrees of freedom.

With our example above, $N=2$, $\bar{x} = \frac{184+178}{2} = 181$, and $s=\sqrt{(3^2 + 3^2)/(N-1)}=4.24$. If we want a 95% confidence interval for the true mean $\mu$ we use the 2.5% and the 97.5% quantiles of the t-distribution with 1 degree of freedom:

> N <- 2
> x <- c(184, 178)
> xbar <- mean(x)
> s <- sd(x)
> qt(c(0.025, 0.975), N-1) * s / sqrt(N) + xbar
[1] 142.8814 219.1186


That last result gives you the correct 95% confidence interval for the true mean: $142 \le \mu \le 219$. Compare this with the interval obtained earlier: it is much wider, and therefore the uncertainty on the true mean is much higher than what the normal distribution would have let you believe.

This is, however, pretty good news. As long as your sample size is larger than 1, you can always infer correct (though perhaps useless) confidence intervals for the true mean, provided the population is normally distributed.

Recently I got angry was discussing statistics in a phone conference. The question was how many experimental test sites were necessary to estimate the CO2 reduction potential of Neurobat’s technology. I pointed out that no matter how many test sites we had, the t-distribution would always let us infer confidence intervals about the true CO2 reduction potential. But the statistics experts on the other side of the line disagreed, arguing that anything less than 10 was not statistically significant.

I tried the analogy above and showed that $N=2$ was enough to derive a confidence interval for the average height of the population.

“No, we don’t think so”, was the answer.

That’s when I decided to shut up and write a blog post.

We’ll run a little computer experiment. Suppose again that the height of the swiss population is drawn from a normal distribution of mean 180 cm and standard deviation 20 cm (I am totally making this up). We’re going to draw one sample at a time from this distribution and see the evolution of the sample mean and the confidence interval.

The true mean $\mu$ lies with 95% confidence within $\left[t_{2.5\%, N-1}\times(\frac{s}{\sqrt{N}}+\bar{x}), t_{97.5\%, N-1}\times(\frac{s}{\sqrt{N}}+\bar{x})\right]$. Let’s form this confidence interval as we progressively sample elements from the parent population:

N <- 1000
mean.height <- 180
sd.height <- 20
height.sample <- rnorm(N, mean.height, sd.height)

mean.sample <- numeric(N)
sd.sample <- numeric(N)

for (i in 1:N) {
mean.sample[i] <- mean(height.sample[1:i])
sd.sample[i] <- sd(height.sample[1:i])
}

low.interval <- sd.sample / sqrt(1:N) * qt(0.025, 1:N - 1) + mean.sample
upper.interval <- sd.sample / sqrt(1:N) * qt(0.975, 1:N - 1) + mean.sample


And here is the result:

It’s obvious that for small sample sizes, the confidence interval quickly becomes smaller than the parent standard deviation. We can see this by plotting the log of the confidence interval over time:

The confidence interval becomes smaller than the population standard deviation (20) after only 15 samples. However, even for extremely small sample sizes (less than, say, 5) the confidence interval is roughly 40 cm. The true mean can therefore be estimated within $\pm 20$, or with about 11% error, after only 5 samples. And that is thanks to the t-distribution.