2020-01-11 - self-documenting code

throughout my career i went through different projects, coding styles, conventions… and documentation styles. some better and some worse, but over time i did start extracting useful patterns out of it.

this time i'd like to talk a little bit about the concept of “self-documenting code”. there is a reason to do it now – some time ago i came across an article explaining why the self-documenting code approach does not work in practice. the author does make a lot of valid points along the way. eg. comments stating the obvious like:

return temperature; // returns temperature

are brain-dead in nature. no argument here.

in my experience however, the overall conclusion of the article (“there is no such thing as self-documenting code and you're only fooling yourself thinking it is so”) is wrong.

stuff rots

we all know that code rots. fun fact – the same thing happens with documentation! more interestingly – in my experience documentation rots WAY faster than the code itself. the reasons are fairly simple:

docs are often kept aside from code, so it is easy to miss that they need an update.
code does not easily trace back to the document, so 1 might not even be aware that what was just changed requires a documentation update as well.
even if code has comments, ppl tend to skip them (TL;DR/boring/etc).
the more comments you have, the exponentially smaller the chance some1 will actually read them (note that this applies to user-facing documentation as well!).
when refactoring, ppl often make “quick changes here and there” to see/check some concept/improvement. this pretty much never involves comment updates, as the changes are a quick PoC by nature… and often such changes turn out for the better and, after a few adjustments, are kept… while comments remain to reflect the previous implementation.
compilers don't read comments – if these are out of sync, most likely you'll either never learn that (comment was useless) or learn the hard way (eg. when a documented invariant does not hold in practice – then the comment becomes misleading).

long story short – the longer the docs, the bigger the chance they will get outdated. high-level descriptions tend to be more stable (see discussion at the bottom).
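a tiny sketch of the “compilers don't read comments” point. all names here are made up for illustration: the first function states its invariant only in a comment, the second one makes the code itself enforce it.

```cpp
#include <cassert>
#include <stdexcept>

// invariant lives only in the comment below: nothing stops it from going stale.
// percent must be in range [0..100]
inline int scale_commented(int value, int percent)
{
  return value * percent / 100;
}

// invariant lives in the code: if it does not hold, you learn about it right away.
inline int scale_checked(int value, int percent)
{
  if( percent < 0 || percent > 100 )
    throw std::out_of_range{"percent must be in range [0..100]"};
  return value * percent / 100;
}
```

calling scale_commented(200, 150) happily returns a value, while scale_checked(200, 150) throws. whether an exception, an assert or sth else is the right reaction depends on the context.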

documenting

to make things clear: aside from trivial projects – code DOES need documentation! i think it's just required on a different level than comments, most of the time.

basically you can document your project in 3 basic ways. let's briefly go through them.

external documents

i typically tend to lean towards having a top level design (TLD) document, that boils the whole (sub?)system down to 1-2 UML(-style) diagrams, showing basic interactions/dependencies, plus a few words of description of what is where and why. IMHO this typically should take no more than a few pages (say: 1-3 pages). if you make it longer, you're going into too many details. you'll forget to update it and ppl will tend not to read it anyway. and those who will – will likely get outdated pieces of information. so just comment on the really key aspects.

the idea here is to provide a fast-start for new ppl joining/maintaining things. they should be able to quickly make heads or tails of the system design. this is the sole purpose of this document. once it is in place, there is little chance it will ever need a change. unless there is a major system redesign of course, but then updating 1-3 pages of text will suffice. hell – you can write that from scratch in no time! :)

note that such documentation comes best served… as code. think: LaTeX, markdown, etc… then it is always alongside the code it describes, CI generates an artifact out of it and it is always at hand when you need it. you might even find it handy just to open it as “text source” when browsing the repo.

project structure

so you have just read the TLD. before you dive into “the real code”, you see the project's overall structure: loads of directories and files. if you lay these out carefully, you provide a missing link between the TLD and implementation details, hidden inside the source files.

example layout that does NOT provide you any information:

include/
  class1.hpp
  class2.hpp
  …

source/
  class1.cpp
  class2.cpp
  …

test/
  class1.cpp
  class2.cpp
  …

example layout that DOES carry some quality pieces of information:

FrameGrabber/
  class1.hpp
  class1.cpp
  class1.ut.cpp
  …

FaceDetect/
  class2.hpp
  class2.cpp
  class2.ut.cpp
  …

Gui/
  class3.hpp
  class3.cpp
  class3.ut.cpp
  …

well, now we see a camera input module, a face detection module and some GUI – most likely some software that takes a camera stream and displays input images with detected faces. note that this is fairly apparent, even though i kept shitty file names!

code structure

now it is time for me to comment on what struck me in the article that triggered this blog post. the author gives a “well documented” code example, that goes like this:

/**
 * Returns the temperature in tenth degrees Celsius
 * in range [0..1000], or -1 in case of an error.
 *
 * The temperature itself is set in the periodically
 * executed read_temperature() function.
 *
 * Make sure to call init_adc() before calling this
 * function here, or you will get undefined data.
 */
int get_temperature(void)
{
  return temperature;
}

as Scott Meyers said: “a good API is easy to use correctly and hard to use incorrectly”. the above function clearly does not meet this expectation.

how to improve things? let's get rid of the comment and replace it with something meaningful.

return int

having int with a comment? really? how about:

struct Temperature
{
  explicit Temperature(float value): value_{value} { /* check your invariants here: assert? exceptions? depends on context. */ }
  auto get() const { return value_; }
private:
  float value_;
};

feeling lazy? too much writing? you can skip invariants checking and just go for:

struct Temperature
{
  float value_;
};

very often this is good enough, yet now you have a real type, that is not to/from int convertible and carries a meaning.
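to make the “not to/from int convertible” claim concrete, here is a small sketch. Temperature is the lazy struct from above; is_freezing() and the static_asserts are my additions, for illustration only.

```cpp
#include <cassert>
#include <type_traits>

struct Temperature
{
  float value_;
};

// hypothetical consumer: takes a Temperature, not a bare number.
inline bool is_freezing(Temperature t) { return t.value_ <= 0.0f; }

// the compiler now rejects accidental mixing with raw numbers:
static_assert( not std::is_convertible_v<int, Temperature> );
static_assert( not std::is_convertible_v<float, Temperature> );
static_assert( not std::is_convertible_v<Temperature, int> );
```

so is_freezing(Temperature{-3.5f}) compiles, while is_freezing(5) does not. the caller has to say explicitly that the number is a temperature.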

what about an error? std::optional is your friend!

so now we have the very same function, w/o top part of the comment:

/**
 * The temperature itself is set in the periodically
 * executed read_temperature() function.
 *
 * Make sure to call init_adc() before calling this
 * function here, or you will get undefined data.
 */
std::optional<Temperature> get_temperature()
{
  if( g_read == -1 )
    return {};
  return Temperature{g_read/10.0f};
}
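from the caller's side this might look like the sketch below. g_read and get_temperature() mirror the snippet above (i assume g_read is a plain int holding tenths of a degree, -1 on error); to_display() is a hypothetical caller added for illustration.

```cpp
#include <cassert>
#include <optional>
#include <string>

struct Temperature
{
  float value_;
};

// assumption: raw reading in tenths of a degree Celsius, -1 on error.
inline int g_read = -1;

inline std::optional<Temperature> get_temperature()
{
  if( g_read == -1 )
    return {};
  return Temperature{g_read/10.0f};
}

// hypothetical caller: the value can only be reached through the optional,
// so the error case is visible right in the code.
inline std::string to_display()
{
  auto const t = get_temperature();
  if( not t )
    return "sensor error";
  return std::to_string( static_cast<int>(t->value_) ) + " C";
}
```

there is no magic -1 that can silently leak into further computations. the “no value” state is part of the return type.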

read_temperature part

this is an internal implementation detail of the function. it should not be exposed to the user of the function anyway. if you think it should, and this is important, maybe what you're really after is a callback?
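if the “periodically updated” part really matters to the user, a callback could look roughly like this. everything here (TemperatureSource, on_read(), publish()) is hypothetical naming, not a part of the original example, and std::function may well be too heavy for a small µC.

```cpp
#include <cassert>
#include <functional>
#include <utility>

struct Temperature
{
  float value_;
};

// hypothetical sketch: instead of documenting "value is updated periodically",
// let the user subscribe to the updates.
class TemperatureSource
{
public:
  using Callback = std::function<void(Temperature)>;

  void on_read(Callback cb) { callback_ = std::move(cb); }

  // would be called from the periodic read; in this sketch it is called by hand.
  void publish(int raw_tenths)
  {
    if( callback_ )
      callback_( Temperature{raw_tenths/10.0f} );
  }

private:
  Callback callback_;
};
```

the user registers a handler with on_read() and gets each new reading pushed to it, so there is nothing left to explain in a comment about who sets the value and when.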

so we're down to:

/**
 * Make sure to call init_adc() before calling this
 * function here, or you will get undefined data.
 */
std::optional<Temperature> get_temperature()
{
  if( g_read == -1 )
    return {};
  return Temperature{g_read/10.0f};
}

init_adc

so we have a call sequence anti-pattern here. this is common in much of embedded code, though it can be easily addressed in multiple ways.

first you can simply check if ADC is initialized and abort if not:

std::optional<Temperature> get_temperature()
{
  ASSERT( SOME_SPECIAL_REGISTER & ADC_ENABLED );
  if( g_read == -1 )
    return {};
  return Temperature{g_read/10.0f};
}

second way is to just enable it, if it is not:

std::optional<Temperature> get_temperature()
{
  if( not (SOME_SPECIAL_REGISTER & ADC_ENABLED) )
  {
    enable_adc();
    wait_for_first_read();
  }
  if( g_read == -1 )
    return {};
  return Temperature{g_read/10.0f};
}

third way is to make it a class, so that you're not able to call get_temperature() without initializing ADC first:

struct AdcTemperature
{
  AdcTemperature()
  {
    enable_adc();
    wait_for_first_read();
  }
  ~AdcTemperature()
  {
    disable_adc();
  }
  AdcTemperature(AdcTemperature const&) = delete;
  AdcTemperature& operator=(AdcTemperature const&) = delete;
 
  std::optional<Temperature> get_temperature() const
  {
    if( g_read == -1 )
      return {};
    return Temperature{g_read/10.0f};
  }
};

if you have some super-tiny µC (like 64B of RAM, or similar), note that the class can in fact be ref-counted and/or singleton, to avoid overhead of having *this pointer around.
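a rough sketch of the ref-counted variant. enable_adc()/disable_adc() are stubbed here so that the snippet stands on its own, and g_adc_enabled, AdcUser and users_ are made-up names; on a real µC the stubs would touch hardware registers instead. note that it is not interrupt/thread safe as written.

```cpp
#include <cassert>

// stubs standing in for real hardware access (assumption for this sketch).
inline bool g_adc_enabled = false;
inline void enable_adc()  { g_adc_enabled = true; }
inline void disable_adc() { g_adc_enabled = false; }

// all state is static, so instances carry no data of their own.
struct AdcUser
{
  AdcUser()
  {
    if( users_++ == 0 )   // first user turns the ADC on
      enable_adc();
  }
  ~AdcUser()
  {
    if( --users_ == 0 )   // last user turns it off
      disable_adc();
  }
  AdcUser(AdcUser const&) = delete;
  AdcUser& operator=(AdcUser const&) = delete;

private:
  inline static unsigned users_ = 0;
};
```

the ADC stays enabled as long as at least 1 AdcUser is alive, and gets disabled when the last one goes out of scope.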

we're now left w/o any need of a comment. pick your favorite way to get there! :) code is now self-documenting and as a free lunch it is now much harder to use it incorrectly! in the original code just think how often ppl would forget to check if temperature is not -1 and instead just use it directly in some computations. same goes for the init_adc() part.

code comments

inside the code – most of the time there should be no comments. if you write a comment, you basically say that you failed to express yourself with the code and thus there is a good chance that you'll also fail to do this in plain text (see: Uncle Bob).

there however ARE places, where a comment is meaningful. one of the best examples i have seen recently was in my previous project. i got a code review, that involved a wavelet tree. nodes were packed into an array (private class member) and it was commented with the nodes' order, showing how the tree is built. i do not have the code in front of me now, but it was sth like this:

/*
 *     ACTNG
 *    /    \
 *   AC    TNG
 *  / \    / \
 * A   C  TN  G
 *       / \
 *      T   N
 */
stuff wt[9];

just look how much information this little ASCII art adds! think how hard it would be to reasonably encode it into the code itself.

comments are good, if used with care. if you read the code, and spot a single comment – you'll read it for sure. if you have a comment every 2nd line – no1 will. ever.

kung-fu levels

self-documenting code is not avoiding documentation. it is more about making things in such a way, that documentation is in the compilation loop.

writing code is generally hard.
writing code that works correctly is even harder.
writing understandable and/or maintainable code is way harder.

you can think of it as climbing a ladder of difficulty. first you get “sth that mostly works”, then you get “sth that works reliably” and then you make it so that others can maintain/understand it as well. writing self-documenting code falls into the 3rd category – i.e. it is really hard and most programmers don't reach that point before “senior level” (and some never – life). ppl heavily commenting code are typically around the 2nd level (1st level often has neither comments nor order ;)).

note that an excellent feedback loop regarding your coding style + code/documentation balance is a code review. during this process, another human being tries to understand what you meant. getting code reviews from some1 more experienced than yourself is of great benefit here.

btw: i've recently added a note to my CV, regarding my ambition/goal:

Being a least-competent member of a high-end professionals team.

it's always good to be surrounded by ppl you can learn from.

blog/2020/01/11/2020-01-11_-_self-documenting_code.txt · Last modified: 2021/06/15 20:09 by 127.0.0.1
