
It’s such a joyous time becoming a parent! Your software’s been conceived and is now going through an exciting and healthy period of development. You’ve fought your way through bouts of sickness and the odd craving (I’ll write it in… Malbolge*!!!), but everything’s on schedule. But wait… the delivery date is next week! Stay calm, remember your breathing and follow these top tips for delivering something wonderful into the world.

1 – Always release software with a licence – without a birth certificate, no one will know who the parents are.

2 – Waiting isn’t going to make the delivery any better – a late delivery can be as difficult as one that is premature. If things are taking too long, it can be best to induce labour.

3 – Version your software releases – things are going to get complicated if you don’t distinguish between who’s Sr. and who’s Jr. (Michael Jackson’s a bit of a visionary in this field, having included a version number in his son’s name “Michael II”).

4 – Provide contact information – make sure your software has a name tag just in case there’s a mix up at the repository.

5 – Download and test your release – on the big day, you should make a test run so that you’re happy that there’ll be no complications (pack an overnight bag).

For a slightly more, er… considered… take on this subject, check out my Top 5 Tips for Releasing Software article on the Institute’s blog.

* not recommended if you wish to stay sane. It’s named after the eighth circle of hell in Dante’s Inferno for a reason.

So what shouldn’t you do?

We mention things you should do when developing software quite a bit. But we were asked an interesting question at the Software Sustainability Institute’s Collaboration Workshop this year: what things shouldn’t you do when developing software?

Come on, there have to be some. And there are – many! But let’s focus on five of the big ones…

1. Don’t develop code you can’t maintain

This has got to be high on the list. Code can turn into spaghetti from out of nowhere, and it’s always worth avoiding. Best to get into good habits early on in the project!

2. Don’t make your software difficult to build and install

We’ve all experienced this with other people’s software. If users can’t install it, they’ll move on – perhaps to a piece of software that has inferior capabilities. Why not make it easy, and simplify (or even automate) the build and install processes that are so often fraught with peril?

3. Don’t keep the source code to yourself

If you hold the source code in only one place – your development machine – and you lose it, you only have yourself to blame. If your development machine is your laptop, it’s even easier to lose. Avoid the pain and use a suitable source code repository from the outset!

4. Don’t forget documentation

Writing documentation is boring, but it is necessary. It’s your primary means of communicating to your users what the software does and how to use it. As with difficult build and install processes, you risk disenfranchisement if users can’t find out what they need to know.

5. Don’t overlook testing

Features, features, features. But if you neglect testing your software, you risk losing users, users, users. If you rush to implement and release a really handy new feature and end up with a release that doesn’t work, you will not instil confidence in your product. Providing a solid set of automated tests that developers can run and extend also gives them a fail-fast environment when they want to modify the software themselves.

Well, that’s my five. You’ll notice I haven’t covered any software release “don’ts”, but that’s because I’m currently putting together a related top 5 list of software release “Dos” :) So these are just scoped to software development. If you’re interested, you can check out these in more detail in my post on the Software Sustainability Institute’s blog.

Maybe you disagree with my list above, in which case let me know what you think are the big software development don’ts.

I first heard about “code smells” during a session I was chairing at Dev8D 2012 on “what makes good code good?”: Ian Bayley, from Oxford University, suggested “no ‘code smells’”. I’d never heard of this term so I turned to my trusty friend Google to see what was what. Although I hadn’t heard the term before, it turns out that I knew what code smells are… source code that just looks “odd” or doesn’t feel quite right, signs that suggest to a developer that refactoring might be in order.

Not only were many of the smells familiar but the “deodorants” were too. For example, replacing arrays that are used as records with objects,

String[] row = new String[3];
row[0] = "Liverpool";
row[1] = "15";

can be replaced by,

Performance row = new Performance();
row.setName("Liverpool");
row.setWins("15");

Or, replacing nested conditionals with guard clauses e.g.:

double getPayAmount() {
    double result;
    if (_isDead) result = deadAmount();
    else {
        if (_isSeparated) result = separatedAmount();
        else {
            if (_isRetired) result = retiredAmount();
            else result = normalPayAmount();
        }
    }
    return result;
}

can be replaced by

double getPayAmount() {
    if (_isDead) return deadAmount();
    if (_isSeparated) return separatedAmount();
    if (_isRetired) return retiredAmount();
    return normalPayAmount();
}

The term “code smell” is attributed to Kent Beck in Martin Fowler’s book Refactoring: Improving the Design of Existing Code (Addison-Wesley, 1999, ISBN 0-201-48567-2). There are lots of online resources that will teach you how to spot smelly code and help with the deodorising so that your code ends up as pure as an Alpine breeze. As a starting point you could try Martin Fowler’s own “catalog of code smells and refactorings”, which lists both symptoms and cures, judiciously highlighted with examples. A complementary resource is Mäntylä and Lassenius’s “bad code smells taxonomy”. This groups together bad smells into a useful, recognisable and amusingly named taxonomy. As a couple of examples they have,

  • The Bloaters, including long methods, large classes, long parameter lists and data clumps (sets of data like 3 integers for RGB colours which could be encapsulated).
  • The Dispensables, anything which can, and should, be removed including lazy classes that don’t do enough, duplicated code, dead code and speculative generality (code which “might possibly be useful someday, maybe” but which incurs maintenance overheads).
  • Other classifications are the change preventers, the couplers and the object-orientation abusers.
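
To make the “data clump” idea from the Bloaters concrete, here is a minimal Python sketch. The names brightness_loose and Colour are made up for this illustration and do not come from either catalogue. Three loose integers for an RGB colour are encapsulated in one small object:

```python
from dataclasses import dataclass


# Smelly version: the same three integers travel together everywhere.
def brightness_loose(red, green, blue):
    return (red + green + blue) / 3


# Deodorised version: the clump becomes a single, named value.
@dataclass(frozen=True)
class Colour:
    red: int
    green: int
    blue: int

    def brightness(self):
        """Mean of the three channels (a deliberately simple measure)."""
        return (self.red + self.green + self.blue) / 3


orange = Colour(red=255, green=165, blue=0)
print(orange.brightness())
```

Once the clump has a name, any function that needs a colour takes one parameter rather than three, which also helps with long parameter lists.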

Another useful resource is SourceMaking’s pages on refactoring which motivates refactoring before describing many code smells and their refactorings.

Static code analysis tools such as CheckStyle or Pylint can also automatically detect (but not fix, that’s your job) code smells, and these might become a useful part of a nightly test system for your software, or part of a continuous integration server.

I hope this answers your question and the resources above will help you to write more fragrant code in future!

At the Software Sustainability Institute I’m often asked – unsurprisingly – to evaluate the sustainability of software. This typically leads to a report for the developers with observations and recommendations for improvement. Wouldn’t it be better if there was some way of evaluating your own software?

There is! Having a third party assess the state of your software in some way, be it a colleague testing the install process and documentation to provide feedback or the Institute performing a full evaluation, is always useful. However, developing the skill to impartially assess your own software is invaluable. Adopting an objective ‘green’ user or developer perspective – removing your inner assumptions and knowledge of the software from the equation – can only help your software to become better.

Not only are the Institute’s processes for evaluation available for you to use yourselves, but there is now a very helpful, lightweight and quick sustainability evaluation you can do on your own. You just fill in a simple web form with details about your software, and it returns a list of recommendations (with helpful links) on how you can improve your software. Simple! It investigates a number of key areas related to the sustainability of your software, including the processes for building and installation, documentation, availability, support, licences and source code structure, amongst many others.

You can check out this nifty evaluation resource.

Testing questions about testing!

Today’s post comes courtesy of Mike Jackson, also from the Software Sustainability Institute. If the Institute was the Dukes of Hazzard television show, with Steve as Bo Duke, then Mike Jackson would surely be Luke Duke. In this post, Mike answers a testing question about testing frameworks in Python.

Software testing is a vital part of software development. It allows us not only to demonstrate that our software satisfies its requirements but also to ensure that our software is both correct and robust. Automated software testing provides us with a safety net during development, allowing us to fix bugs and make enhancements and extensions, secure in the knowledge that if we break anything then the tests will catch this. After all, there are few things worse than fixing a bug to discover later that, in doing so, we’ve introduced a new one.

Philip Maechling of the Southern California Earthquake Center (SCEC), at USC, recently contacted the Institute with questions about software testing. Philip and his colleagues develop scientific software that outputs computational results into files. These files are typically simple ASCII text files but contain series of floating point numbers, e.g. time series. Their acceptance testing involves comparing these files to existing reference result files.

Philip posed two questions:

  • Many unit test frameworks (e.g. JUnit and PyUnit) are focused around instantiating an object, or other software module, within a test class, calling methods on that module, then checking the values returned against expected values. While file comparisons can be done with such frameworks, they are complicated due to the need for floating point compares (which is tricky at the best of times), and differences in header information, or non-significant file contents. So, are you aware of any testing tools designed to support tests that are based on file-based comparisons?
  • In our file-based comparison tests, we often use the same reference files in multiple tests. In some testing circles, a directory of tests and expected test results are collected into a datastore called an “oracle”. When you want to know the correct results, you look up your test and find the expected result in this oracle. Are you aware of any software unit or acceptance testing tools that support the idea of a test oracle? The concept is simple, and we have implemented a couple of our own oracle datastores, but we seem to re-invent this each project. If there is a standard solution, I am interested in trying it out.

Question 2 is a generalisation of 1, using a set of reference files across multiple tests. As Philip comments, these reference files can be termed an “oracle”. More generally, “oracle” can be used to refer to anything which validates the outputs of a test, i.e. checks the outputs of the software during the test against the expected outputs. So, for example, in a PyUnit test that compares the outputs of a function, for some specific inputs, to some hard-coded values, the comparison code and the hard-coded values serve as the oracle. If a developer tests a GUI and assesses the correctness of its behaviour then they are serving as the oracle. Douglas Hoffman’s paper A Taxonomy for Test Oracles from Quality Week, 1998, gives an overview and taxonomy of oracles.

For question 1, an internet search did not reveal any Python frameworks that explicitly support tests that involve comparing floating point data files for equality. Even if a framework were available, there would still be work required by the developer to customise it towards the structure and content of their specific files. Two frameworks which adopt such a solution and provide something close to Philip’s requirements are Cram and TextTest. Cram is a framework for testing command-line applications. It runs commands and compares their outputs to expected outputs. The outputs are compared using pattern matching and regular expressions. TextTest is similar but also has support for GUI testing. Outputs are compared directly, but filters are provided to handle run-dependent content and floating point differences outside user-defined tolerances.

One can envisage at least two general approaches to comparing output files of floating point values to reference files. The first is to:

  • Write a convertor that can be used to convert the output file data format into a simpler format containing just the floating point data.
  • Write a validator that takes in two floating point data sets and compares these, applying rounding or allowing for equality within defined tolerances.
  • Write each test to load the expected results from the reference files, the actual results from the output files, apply the convertor to both sets of results, then use the validator to compare the two.
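
The three steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names extract_floats and floats_match are hypothetical, header lines are assumed to start with “#”, and the file contents are given as strings so that the example is self-contained.

```python
import math


def extract_floats(text):
    """Convertor: keep only the tokens that parse as floating point numbers.

    Lines starting with '#' are treated as header lines and skipped; that
    header convention is an assumption made for this sketch.
    """
    values = []
    for line in text.splitlines():
        if line.startswith("#"):
            continue
        for token in line.split():
            try:
                values.append(float(token))
            except ValueError:
                pass  # non-numeric content is not significant here
    return values


def floats_match(actual, expected, rel_tol=1e-6, abs_tol=1e-9):
    """Validator: same length, and each pair equal within the tolerances."""
    if len(actual) != len(expected):
        return False
    return all(
        math.isclose(a, e, rel_tol=rel_tol, abs_tol=abs_tol)
        for a, e in zip(actual, expected)
    )


# Stand-ins for the contents of a reference file and an output file.
reference = "# run on host A\n0.0 1.5\n0.1 1.7000000\n"
output = "# run on host B\n0.0 1.5\n0.1 1.7000001\n"

print(floats_match(extract_floats(output), extract_floats(reference)))
```

In a real test the two strings would be read from the reference and output files, and the tolerances would be chosen to suit the science.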

The second is to:

  • Manually convert the reference files into template files. Regular expressions can be used both to handle parts of the files that might vary across test runs (e.g. headers) and to specify expected floating point values.
  • Write a validator which compares an output file to a reference file, applying the regular expressions in the reference file to assess whether the output file matches.
  • Write each test to apply the validator, comparing the output files to the reference files.
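
A rough sketch of this second approach is shown below. The template format (one regular expression per expected output line), the FLOAT pattern and the function name matches_template are all assumptions made for illustration, not part of Cram or TextTest.

```python
import re

# Hypothetical template: one regular expression per expected output line.
FLOAT = r"[-+]?\d+\.\d+(?:[eE][-+]?\d+)?"
template_lines = [
    r"# generated on .*",   # header varies between runs
    r"0\.0\s+1\.50*",       # 1.5, with any number of trailing zeros
    rf"0\.1\s+{FLOAT}",     # any well-formed float is accepted here
]


def matches_template(output_text, patterns):
    """Return True if every output line fully matches its pattern."""
    lines = output_text.splitlines()
    if len(lines) != len(patterns):
        return False
    return all(re.fullmatch(p, line) for p, line in zip(patterns, lines))


# Stand-in for the contents of an output file.
output = "# generated on 2011-10-03\n0.0 1.5\n0.1 1.7000001\n"
print(matches_template(output, template_lines))
```

Note that a pattern can only check the form of a number, or a fixed set of digits, not whether a value lies within a numeric tolerance, which is one reason the first approach can be easier to live with.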

Personally, I prefer the former solution as it avoids messing around with regular expressions.

For either solution, there are a number of Python libraries that can be used to construct a possible solution. These include:

  • PyUnit, Python’s unit test library. This has test assertion commands (assertAlmostEqual and assertNotAlmostEqual) for comparing floating point values to a specific number of decimal places or within a specific tolerance.
  • Python difflib library. This provides functions to compare two files and return the lines for which they differ. This is similar to the output from CVS and SVN “diff” commands. Cram uses difflib.
  • Python re regular expression library. Cram uses re.
  • Python filecmp file comparison library.
  • An introduction to writing regular expressions for floating point numbers.
  • TextTest (source code) and Cram (source code) are both open source products and it might be possible to reuse their functionality for comparing script files.
  • Hamcrest library for building “matchers” which are useful for expressing custom comparisons. It has been ported to many languages including Python.
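
As a brief illustration of the first two libraries in this list, the sketch below uses the standard unittest module (PyUnit) and difflib. The test class name and the sample values are invented for the example.

```python
import difflib
import unittest


class FloatComparisonTest(unittest.TestCase):
    def test_places(self):
        # Equal when the difference rounds to zero at 5 decimal places.
        self.assertAlmostEqual(1.7000001, 1.7, places=5)

    def test_delta(self):
        # Equal when the absolute difference is at most delta.
        self.assertAlmostEqual(100.0, 100.4, delta=0.5)

    def test_not_almost_equal(self):
        self.assertNotAlmostEqual(1.7, 1.8, places=3)


# difflib reports which lines differ, in the style of a "diff" command.
expected = ["0.0 1.5\n", "0.1 1.7\n"]
actual = ["0.0 1.5\n", "0.1 1.9\n"]
diff = list(difflib.unified_diff(expected, actual, "reference", "output"))
print("".join(diff))

# Run the test case programmatically and keep the result object.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(FloatComparisonTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

A plain diff flags any textual difference, so on its own it is too strict for floating point output. It is most useful for reporting where two files differ once a tolerance-aware comparison has failed.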

Mike

Choosing suitable open-source software

At the SeIUCCR Summer School in September I was asked a blinder of a question:

“How do I choose sustainable software for my project?”

Assuming an open-source context for this question, there are many things worth considering. It could be that the functionality of your software needs to be extended. Not wanting to re-invent the wheel, you’re looking for an appropriate library to provide that functionality. Or perhaps you have an analysis tool that outputs a certain data format that you need to post-process into an image. What should you look for in software?

It’s easy to reach for the first software package you come across that seems to do what you want. Perhaps it’s already installed in your target platform, or it’s the first thing you found on Google. But picking the wrong software can have expensive consequences if it doesn’t do everything you want or, even worse, development and support come to a stop!

Taking a little time to make an informed choice is time well spent. So what questions can you ask about the software to find out if it’s suitable?

First off, and most obvious: does it do what you want? Be sure you know your requirements, not only what you need now, but what you need in the future. What are the goals? If it’s for a wider community, think about the goals of the user community too. If the software doesn’t meet your needs, you should check to see if the functionality can be extended, or look for more suitable software.

Have a look at the software’s support, and check for an active user and developer community. Check the forums, issue tracker and mailing lists (they should have them!) for activity and responsiveness. If you run into problems, support is your first port of call, so it must be good.

Most importantly, check that the software has a future! If the software’s development and support were to stop, you could find yourself looking around for a replacement. This is ultimately what you are trying to avoid! Some positive indicators for sustainability – in addition to a well established community – are a roadmap, a solid track record of previous releases and an actively maintained website. If you are aware of appropriate open standards that are commonly used within your research field, does the software use them?

How is the software provided? Check that documentation is available, and whether the pre-requisites for the software are appropriate for your needs. If you plan to extend the software then access to appropriate source code is very important – is it provided via a source code repository, and is the code in an understandable and maintainable state that you can extend?

Perhaps the most important of all, check the licence conditions of the software. If you intend to distribute the software, check that the licence allows this, and check that you can distribute any modifications or extensions you make.

Of course, with such a complex question, there’s always more to know. Check out the Software Sustainability Institute’s guide on Choosing the right open source software for your project, which goes into more detail.

Lastly, don’t be afraid to ask the software developers if you have any questions. If you get prompt and helpful responses, that’s a good indication you’ll be able to get the right support should you need it. If not, it might be time to look elsewhere. Now where did you put that list of alternatives…

Security Decay: Enter the Dragon!

Security in complex systems is always a tricky business. Consider production Grid infrastructures as an example. Establishing working trust relationships between the users and the infrastructure, and between the systems themselves, is an intricate, mammoth task. Solving problems with such systems is also very tricky, as I’ve previously found when developing EU-wide Grid interoperability demonstrators of open standards. They appear like dragons: huge, daunting, and difficult to defeat.

The UK National Grid Service asked Steve (well, the Institute really) to help them out with their SARoNGS system. Our arrangement was very effective. The Software Sustainability Institute provided development effort for the investigation, whilst the NGS fixed issues and offered the in-depth systems knowledge that only they could provide.

So what is SARoNGS all about? The Shibboleth Access to Resources on the NGS service greatly simplifies authentication to NGS resources by accepting institutional Shibboleth credentials. It’s great for users, because they don’t need to apply for, own and use an X509 certificate. However, it appeared that the automatically generated SARoNGS certificates were being rejected by the NGS’s Workload Management Service (WMS). In short, you could no longer use SARoNGS certificates to submit jobs through the WMS without seeing a rather ominous error light up the screen:

Connection failed: CA certificate verification failed

We were warned here be dragons, but we ploughed on heedless of the danger.

You may have heard of software decay. This can occur when the environment around a piece of software changes, which leads to failures in the system as a whole. For example, an update to a dependent library or to the operating system could cause a problem. Updating one of the ubiquitous jar files in Java, only to find some of the API functions have become deprecated, can also cause grief. The good news is that there are things you can do to avoid this problem, some of which I’ve looked at already.

Security problems are often esoteric and difficult to solve. The problem could be a software dependency issue, say a newly updated security library with a bug that incorrectly interprets certificate attributes. It could also be a problem with the way in which trust relationships are defined. Sophisticated production Grid systems often trust a veritable legion of Certificate Authorities (CAs). Each CA has its own CA trust certificates, Certificate Revocation Lists (CRLs – a list of certificates not to trust) and signing policies. (I won’t get into how VOMS fits into the picture in this post, but if you’re interested, let me know.) Sorting out certificate problems can be like looking for a needle in a haystack… in a tornado. However, once identified, these issues are often easy to fix.

Systems can also fail when you haven’t changed anything at all, and this was the case with the first problem we found with SARoNGS.

Time is an important concept in security. For example, the NGS proxy certificates have a limited lifetime to reduce their vulnerability if the proxy is compromised. CRLs must also be kept up to date. The problem with expired CRLs is that they can cause the entire authentication step to fail, and this is what had happened with SARoNGS: a CRL in a critical location had expired. We updated the CRL and the first dragon lay dead on our screens!
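
The time dependence described above can be sketched in a few lines of Python. This is not the check the NGS uses. It assumes the CRL’s “next update” time has already been read from the file (parsing is left out), and the function names and timestamps are made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def crl_is_stale(next_update, now=None):
    """Return True once the CRL's next-update time has passed."""
    now = now or datetime.now(timezone.utc)
    return now > next_update


def crl_needs_refresh(next_update, now=None, margin=timedelta(days=1)):
    """Return True if the CRL is stale or will be within `margin`."""
    now = now or datetime.now(timezone.utc)
    return now + margin > next_update


# Made-up timestamps, purely for illustration.
next_update = datetime(2011, 6, 1, tzinfo=timezone.utc)
print(crl_is_stale(next_update, now=datetime(2011, 6, 2, tzinfo=timezone.utc)))
```

Running a check like this on a schedule, with a margin, gives a warning before the CRL expires, which matters because nothing else in the system has to change for the failure to appear.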

When establishing trust in Grid systems, you need to decide which certificates to trust and where in the system to trust them. The second problem with SARoNGS was caused by two different signing policies being in use simultaneously. Some sites were intentionally configured to trust SARoNGS, and others were not. However, the installation of an update using the International Grid Trust Federation (IGTF) bundle meant that the UK e-Science signing policy reverted to the IGTF default: do not trust SARoNGS certificates. Again, an easy problem to fix, but a difficult one to identify. Once SARoNGS trust was reinstated in the signing policy (we used a modified NGS IGTF+ bundle) the problem was resolved and the last dragon soundly defeated.

And so the legend goes, the dragons of SARoNGS were slain. If you ever find yourself developing Grid software and run into a security brick wall, why not take a look at those conspicuous looking CRL and signing policy files? They could be dragons. And dragons need slaying!

“Programming is 10% science, 20% ingenuity, and 70% getting the ingenuity to work with the science.” – Anon

Software developers working on an academic research project will, unsurprisingly, do just that: develop software. But to what extent should a developer care about the research field in which they work? This and other related questions have popped up a few times recently, and this theme also permeated a few sessions at our Collaborations Workshop back in March.

The first thing to realise is that researchers and software developers both use long words, but they are generally different long words. Each group comprises experts (in their own discipline), but each group has its own language. For example (with elementary translations in [ ]) …

Researcher: Can you imagine what we could do with a more timely lattice energy landscape minimisation for these carboxylate salts!? [If only we could do our calculations faster!]

Developer: Nope – does it have something to do with matrices?


Developer: Did you see that amazing multi-threaded C++ app for distributed problem solving for Android appear on the appstore last night? [Just imagine how many calculations you could do in a shorter space of time - on a smartphone!]

Researcher: Sorry, I don’t watch Star Trek

Ok, so this is a little oversimplified. But such a disparity in communication means that real opportunities are missed. How do developers make sure that they understand the importance of what researchers are saying, and vice versa?

Developers and researchers are waking up to what can be achieved if they work together, and find ways to communicate effectively. As developers, we shouldn’t expect researchers to automatically believe in our religious arguments about proper software design. But if we understand the science a little bit better, then we increase our chance of identifying ways to improve the science through what we know of software – and there lies the true benefit for researchers. You only have to look at the successes of Taverna, OGSA-DAI, GridSAM and the ENGAGE projects to see what is possible when researchers and software developers come together in the right way. In the CPOSS ENGAGE project, the improved coupling of a number of software technologies enabled them to perform science that wasn’t even possible before.

Of course, taking the time to learn more about the science takes effort. But if you go that extra mile every once in a while, digging a little deeper into the science behind the software you’re writing, who knows? Software and software infrastructures have a lot to offer research, and you may just be the one to help the scientists make that important discovery.

One of the big problems with research, particularly in these austere times, is finding the money to travel to all those great conferences. You miss opportunities to present your work, and you can miss out on discovering first hand what is afoot in your research field.

The great news is that if you use software in your research and you have a good understanding of what’s happening in your field, funding is available from the Software Sustainability Institute. Regardless of discipline, the Institute will pay a number of researchers £3000 a year – in return for keeping the Institute up to date on the latest developments in the researcher’s field. This helps you with disseminating your greatness and keeping up to date, while allowing the Institute to build a network of Agents to understand which fields most need help.

There are some compelling benefits to consider:

  • Up to £3000 a year to attend the conferences and events that you want to attend
  • Your advocacy will ensure that your field benefits from the best support for software development
  • Add world-leading researchers to your professional network
  • Free attendance at training events for new tools and technologies
  • If you develop code, improve your knowledge of effective techniques for developing sustainable software
  • A great addition to your CV

Not bad eh? And you don’t have to be a professor or Principal Investigator to qualify – you just need to be in the know. The Institute is looking for applicants from all disciplines, especially from those fields flagged as strategically important to UK research: the ageing population, environment and climate change, the digital economy, energy and food security.

The closing date for applications is 8th August 2011. If you’re interested, or would like to nominate someone else, why not find out more and apply at http://software.ac.uk/join-our-agents-network?

Plus, saying ‘I’m an Agent for the SSI’ sounds cool.

A unit test framework in MATLAB?

You may recall a while back I looked at test-driven development, and covered unit testing. Well, I received a related question asking whether there was a unit test framework for MATLAB, so let’s have a quick look at a few of these…

Arguably the most popular is the xUnit Test Framework, which is compatible with MATLAB 7.6 (R2008a) or later. You can write unit tests using the standard MATLAB function files, or xUnit-style subclasses (like Java JUnit) and it has comprehensive documentation. There is also a very good technical article on Automated Software Testing for MATLAB which is aimed at researchers wanting to do unit testing in xUnit, complete with examples and advice. Greg Wilson from the excellent Software Carpentry project contributed to the article, and the Software Carpentry site has many general tutorials on using MATLAB.

There are others, including mlunit_2008a, which may also be worth a look, as well as the interesting Doctest for MATLAB, which works like doctest in Python – you embed simple tests into the function’s help description in the code. For example, you could specify in a MATLAB function:

function sum = subtract2(value)
% subtracts 2 from a number
%
% subtract2(value)
% returns (value - 2)
%
% Examples:
% >> subtract2(3)
% ans =
% 1
%
% >> subtract2([8 5])
% ans =
% 6 3
if ~ isnumeric(value) 
 error('subtract2(value) requires value to be a number'); 
end
sum = value - 2;

Then you can run doctest subtract2 and have those embedded tests returned:

TAP version 13
1..2
ok 1 - "subtract2(3)"
ok 2 - "subtract2([8 5])"

Whether this approach is suitable depends on the nature and complexity of the tests you wish to write.
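
For comparison, here is roughly what the same idea looks like with Python’s own doctest module, which the MATLAB tool is modelled on. This is a sketch that covers scalar values only, and the explicit finder and runner are used just so the example can report its results wherever it is run.

```python
import doctest


def subtract2(value):
    """Subtract 2 from a number.

    >>> subtract2(3)
    1
    >>> subtract2(8)
    6
    """
    if not isinstance(value, (int, float)):
        raise TypeError("subtract2(value) requires value to be a number")
    return value - 2


# Collect and run the examples embedded in the docstring above.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(subtract2, "subtract2", globs={"subtract2": subtract2}):
    runner.run(test)
results = runner.summarize(verbose=False)
print(results.failed, results.attempted)
```

If the function is saved in a file, the embedded examples can also be run from the command line with python -m doctest followed by the file name.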

’til next time!
