DesignHints
We all have opinions about good design. Here are a couple of good, short papers about software design:
The rest of this page is a collection of principles, which you might find useful. They'll be organized over time. For now, they are written in FIFO order.
All these hints have exceptions. However, please limit the exceptions for the sake of your users.
Loops that depend on a single exit condition can be problematic. Consider this loop:

```python
while i:
    i -= 1
```

If i is negative when it enters the loop, it will not behave like you expect. Rather, do this:

```python
while i > 0:
    i -= 1
```

This ensures the loop exits when i is outside the expected range.
Another issue is when you are depending on an external event. For example, this code:

```python
while not srdb.runner_socket_path().exists():
    time.sleep(0.1)
```

When depending on an external event, you should always have a timeout or some other counter that terminates the loop. If the event never occurs, the loop will never terminate. Write the code this way:

```python
for _ in range(30):
    if srdb.runner_socket_path().exists():
        break
    time.sleep(0.1)
else:
    pkunit.pkfail('runner daemon did not start up')
```

Here we ensure the loop exits after 3 seconds if the file never shows up.
Programmers need to know what a program is doing at various stages of deployment. This type of output goes by many names: print, debug, trace, logging, etc. We'll just call it output in this section.
The first type of output is when you are first developing software.
Programmers often put in "naked print" statements. In Python this is print or sys.stderr.write. This is a problem if you forget that you left the print statement in the code. Then, in production, you'll see random, useless output, and usually not know where it came from.
The next type of output is when you want to have a permanent output in production. In Python, people recommend using the logging module for this. While that's not a "naked print", it is contextless output by default. In order to get context, you have to create a Logger object that puts the code execution context in the output. And, with logging, you have to specify a "level", which is a traditional means by which people specify the importance of the output. Usually, only a particular level (or higher) is output by default.
Python's logging module is too complex for the last type of output: filtered, or what is sometimes called trace output. The idea that you would turn on "debug" output for a whole system simply by changing a level from "info" to "debug" is silly. What you want is to know about something specific, e.g. show me all output related to socket I/O. You want to filter on the context with a regular expression or other complex filter, not a simple integer.
In PyKern, the pkdebug module defines these three cases clearly:
- pkdp is used for contextful output during development. It should never appear in commits, and you can easily write a grep assertion to validate that it doesn't appear in commits.
- pkdlog is for permanent log messages that will always come out.
- pkdc is for filtered log messages that only come out when the filter is turned on. Otherwise, they cost almost nothing.
All three methods output the context (including process id and timestamp, if configured) of the output call. This means that you can always find the output message in your code. This is crucially important when you are trying to figure out a production problem in real-time.
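To illustrate what contextful output buys you, here is a small stdlib sketch of the idea (this is not pkdebug's implementation, just the shape of it): every message is prefixed with the caller's file, line, and process id, so you can always find the call site.

```python
import inspect
import os


def fmt_msg(fmt, *args):
    """Sketch of contextful output: prefix a message with the caller's
    file name, line number, and pid so the output can always be traced
    back to the code that produced it."""
    f = inspect.stack()[1]
    return '{} {}:{} {}'.format(
        os.getpid(), os.path.basename(f.filename), f.lineno, fmt.format(*args)
    )


def start_server(port):
    # The context is collected automatically; the caller only supplies
    # the message, as with pkdp/pkdlog.
    print(fmt_msg('listening on port={}', port))


start_server(8080)
```

Compare this with a naked print of "listening on port=8080": when that shows up in a production log two years later, the prefix is the only thing that tells you where it came from.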
When an error occurs, give more information than you think the user might need. I was running into a problem with JupyterHub, and it was printing:
```
docker: Operation not permitted
```
What operation? What parameter was causing the problem? It turned out it was an NFS issue with SELinux. The print statement didn't help with that.
Dump whatever data you might have. Catch exceptions on external operations and log (don't print!) the object and operation along with the error message.
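A sketch of that advice with the stdlib (run_external is a hypothetical helper, not from any library mentioned here): on failure, log the command, the exit code, and everything the command printed, then re-raise.

```python
import logging
import subprocess
import sys

log = logging.getLogger(__name__)


def run_external(cmd):
    """Run an external command; on failure, dump the object and
    operation along with the error, not just a terse message like
    "Operation not permitted"."""
    try:
        return subprocess.run(
            cmd, check=True, capture_output=True, text=True
        ).stdout
    except subprocess.CalledProcessError as e:
        # Log (don't print!) whatever data we have before re-raising.
        log.exception(
            'cmd=%s returncode=%s stdout=%r stderr=%r',
            cmd, e.returncode, e.stdout, e.stderr,
        )
        raise


print(run_external([sys.executable, '-c', 'print("ok")']))
```

The caller still gets the exception, but the log now answers "what operation, with what parameters, said what?" before anyone starts guessing.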
https://www.robnagler.com/2015/08/24/Dispatch-Do-not-Decorate.html
When a data structure is misaligned with an algorithm, the code gets longer and more verbose. For example:

```python
paths = []
for lf_prop in LOADED_FILE_PROPS:
    paths.extend(
        map(lambda file_name:
                _pkg_relative_path_static(file_type + '/' + lf_prop.dir, file_name),
            schema[lf_prop.source][file_type])
    )
return paths
```

LOADED_FILE_PROPS happens to be used only for this purpose and looks like:
```python
LOADED_FILE_PROPS = [
    pkcollections.Dict(source='external', dir='ext'),
    pkcollections.Dict(source='local', dir=''),
]
```

A better data structure would be:

```python
LOADED_FILE_PROPS = (
    ('external', 'ext'),
    ('local', ''),
)
```

And the algorithm is shorter and clearer:
```python
paths = []
for prop, path in LOADED_FILE_PROPS:
    paths.extend(
        map(
            lambda p: _pkg_relative_path_static(file_type + '/' + path, p),
            schema[prop][file_type],
        )
    )
return paths
```

Bonus change: renaming file_name to p allows the lambda to be on one line, which makes it clearer. Lambdas should be short. Ideally, _pkg_relative_path_static would be shorter, and it can be, but that's another topic.
_pkg_relative_path_static in the previous section is actually used in only this one place. The use of the lambda means there's a problem with the definition of the private function. Here's the definition:
```python
def _pkg_relative_path_static(file_dir, file_name):
    if file_name[0] == '/':
        return file_name
    root = str(pkresource.filename(''))
    return str(static_file_path(file_dir, file_name))[len(root):]
```

The function could easily be a loop, hoisting the loop invariant:
```python
def _pkg_static_paths(prefix, paths):
    root_len = len(str(pkresource.filename('')))
    return map(
        lambda p: p if p[0] == '/' else str(static_file_path(prefix, p))[root_len:],
        paths,
    )
```

Then the use becomes:
```python
return list(
    itertools.chain.from_iterable(
        _pkg_static_paths(file_type + '/' + p, schema[k][file_type])
        for k, p in LOADED_FILE_PROPS
    ),
)
```

Often there's an API for some code you are writing. It's good to use other APIs. Then again, there's the anti-pattern: a little copying is better than a little dependency. This is where judgment comes in.
In the previous section, we have:
```python
def _pkg_static_paths(prefix, paths):
    root_len = len(str(pkresource.filename('')))
    return map(
        lambda p: p if p[0] == '/' else str(static_file_path(prefix, p))[root_len:],
        paths,
    )
```

static_file_path returns a py.path.local, which can be used to compute the relative path. Really, though, this is computing a route from the pkresource, so make a function for it, which describes what it is doing:
```python
def _pkg_static_paths(prefix, paths):
    return map(
        lambda p: p if p[0] == '/' else _static_file_uri(prefix, p),
        paths,
    )


def _static_file_uri(prefix, path):
    return '/' + pkresource.filename('').bestrelpath(static_file_path(prefix, path))
```

This is another case of a function use mismatch. static_file_path was only used in this case, so really, that function could be eliminated. Also, pkresource.filename('') could be a constant.
We often have to balance between doing the simplest thing (YAGNI) and not repeating yourself (OAOO). One reason for not OAOO is that you want to decrease the cognitive load on the reader. Here's an example of a sequence of code with a high cognitive load:
```javascript
axes.x.values = plotting.linspace(fullDomain[0][0], fullDomain[0][1], json.x_range[2]);
axes.y.values = plotting.linspace(fullDomain[1][0], fullDomain[1][1], json.y_range[2]);
axes.x.scale.domain(fullDomain[0]);
axes.x.indexScale.domain(fullDomain[0]);
axes.y.scale.domain(fullDomain[1]);
axes.y.indexScale.domain(fullDomain[1]);
```

The differences:
- axes.x and axes.y
- fullDomain[0] and fullDomain[1]
- x_range and y_range
- rows neither alternate nor group x and y
The problems with making this programmatic:

- 0 and 1 are alternate names for x and y
- x_range and y_range aren't directly indexable
Here's a refactoring assuming no global variables:

```javascript
['x', 'y'].forEach(function(d, i) {
    var f = fullDomain[i];
    var a = axes[d];
    a.values = plotting.linspace(f[0], f[1], json[d + '_range'][2]);
    a.scale.domain(f);
    a.indexScale.domain(f);
});
```

While this is more lines, each line is different. It reduces the cognitive load on the reader, because the differences are differentiated.
You Aren't Gonna Need It (YAGNI)
One way to bind values is through names. Constants are the most obvious example of this principle.
Encapsulation is the more general concept, where you rely on a module to abstract a value via a name, about which you make no assumptions.
Use introspection to cascade values throughout a program. Do not assume a type; for example, use type() to extract it.
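For example, a small sketch of cascading a type with introspection (scaled is a hypothetical function): instead of hard-coding a list for the result, build it with type(values) so the caller's choice of container propagates.

```python
def scaled(values, factor):
    """Return a new container of the same type as ``values`` with every
    element multiplied by ``factor``. type() cascades the caller's
    container type instead of assuming a list."""
    return type(values)(v * factor for v in values)
```

The caller who passes a tuple gets a tuple back; the function made no assumption it didn't need to make.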
Brian Reid helped deal with the Stanford Breakins in 1986. He got a lot of hate mail (for being the messenger). He responded with an admonition: do not make screwdriver handles out of plastic explosives. This would seem obvious, of course, but it is a hard lesson to internalize.
A modern day example of a gelignite screwdriver is app.config['SECRET_KEY']
in Flask. Miguel Grinberg, author of Flask
and its gelignitely-named SECRET_KEY feature,
warns programmers:
> Many times I hear people say that user sessions in Flask are encrypted, so it is safe to write private information in them. Sadly, this is a misconception that can have catastrophic consequences for your applications and, most importantly, for your users.
Names are important, and SECRET_KEY was a poor choice. A better one
would be COOKIE_HASH_PRIVATE_KEY. The word SECRET_KEY has nothing
to do with secrets except that the value should be kept secret (private).
The SECRET_KEY is a private value used to ensure the integrity of the hash of the clear-text cookie value. Flask cookies are sent in the clear, albeit obfuscated, so it's pretty easy to assume that SECRET_KEY is used to encrypt the cookie; in fact the cookie is merely compressed, not encrypted.
It's unclear why Flask does not encrypt the cookie to ensure both integrity and privacy. If it did, it would be a well-designed tool. Instead, not only is it poorly designed, it tricks the programmer into thinking it does something it doesn't. All the blog entries in the world (including this one) won't prevent programmers from misusing it.
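To see why a signing key protects integrity but not privacy, here is a minimal stdlib sketch of a signed cookie (this is the general technique, not Flask's actual implementation): the signature detects tampering, but anyone holding the cookie can read the payload without the key.

```python
import base64
import hashlib
import hmac
import json

# The "SECRET_KEY" of this sketch: it must be kept private, but it
# only protects integrity, never privacy.
KEY = b'cookie-hash-private-key'


def sign_cookie(payload):
    """Encode the payload in the clear and append an HMAC of it."""
    body = base64.urlsafe_b64encode(json.dumps(payload).encode())
    sig = hmac.new(KEY, body, hashlib.sha256).hexdigest().encode()
    return body + b'.' + sig


cookie = sign_cookie({'user': 'alice', 'ssn': '123-45-6789'})
# Anyone holding the cookie can read the payload without knowing KEY:
leaked = json.loads(base64.urlsafe_b64decode(cookie.split(b'.')[0]))
```

The base64 body is obfuscation, not encryption; the private data leaks to anyone who sees the cookie, which is exactly the misconception the quote above warns about.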
I propose in this section a new term for comments called POSIT, which documents
the implied connections or assumptions in the subsequent piece of code. It's
different than an assertion, which can be tested (asserted). When you posit
a state in a piece of software, you are documenting a connection with another
piece of code that cannot be coupled explicitly through a bound reference. The
reason might be explained in the posited comment, but it is not necessary.
The purpose is merely to document the implicit connection between two or more
pieces of code so that the reader can find all places related to the POSIT.
So, when you are starting to make an assumption, you should write it like these two examples:

```
# POSIT: "/jupyter" in salt-conf/srv/pillar/jupyterhub/base.yml
local notebook_dir_base=jupyter
```

and

```
# POSIT: notebook_dir_base in containers/radiasoft/beamsim-jupyter/build.sh
'bind': '/home/{{ pillar.jupyterhub.jupyter_guest_user }}/jupyter',
```

When a value becomes unposited, you won't necessarily fail fast. You probably can't know all the places the posited value is used. Something might fail, but you might not know why. The definition and reference are unbound, which means anything can happen, which is what makes posited values dangerous and something that should be avoided.
Sometimes it is easier (lazier, cleaner?) to make an implied connection between two pieces of code. It's a hard problem to always create explicitly bound names for things to be used in common with disparate code. Sometimes these unbound connections are posited in two different programming languages, two different systems/repos, etc.
We make a lot of suppositions in software. Many are almost postulates. For example, we assume the following commands on "Unix" (another basket of assumptions) do the same thing:

```sh
cd
cd $HOME
cd ~
cd ~$USER
```

No one would argue that these are all literally the same thing. There are syntactic differences and an error (environment variable references should be quoted). Yet, the concept being named by these four commands is the same.
It's important to acknowledge that command sequences are names for things. They identify the action known as "switching the execution thread's current directory to the user's home directory." This concept is important. It's the premise of all programming. The web relies on this view of naming: a link/URL/URI/etc. binds a remote procedure call and its arguments into a single, structured name. We save that name in bookmarks and include it in email messages and documents such as this one.
Links do get "broken", which is a good thing. You have a name for something, and that name no longer refers to it. You get an error message. It is not posited that the name shall stay in existence forever. A "404" is a really important part of the web, because it's fail fast, which is a very important principle in software.
The introduction of web APIs has eliminated the problem of random error messages and system exits from APIs. The implementer of a web API cannot print error messages to the console of the client nor can it force the client to exit. Ordinary programming libraries (aka APIs) should follow this same principle.
System exits and print statements are side effects that assume a particular execution path, which in turn assumes the end-user has all the context necessary to interpret the event (exit or print). Libraries can't know all their uses, and by making assumptions about execution context, library designers limit their applicability.
When a library executes a system exit, it assumes that the caller cannot continue after the event that caused the system exit. That is a big assumption, especially given the complexity of execution contexts that exist today. You might be calling the function from within a web API, for example, which makes the server's job more complicated, because a response is expected on the other end of the HTTP request, always. You can't know that.
Instead of a system exit, throw a reasonable exception which contains the context describing the error. If the programming language doesn't have exceptions, return an error value that describes the error. Give enough structured context that it can be passed on programmatically to the end-user. Consider that it might have to pass through an internationalization layer or be translated to some other medium besides text.
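A sketch of what a structured exception might look like (ConfigError and load_port are hypothetical names, not from any library mentioned here):

```python
class ConfigError(Exception):
    """Structured error: carries enough context that the caller (a CLI,
    a web API, an i18n layer) can render it however it needs to."""

    def __init__(self, path, key, reason):
        super().__init__('path={} key={} reason={}'.format(path, key, reason))
        self.path = path
        self.key = key
        self.reason = reason


def load_port(config, path='app.cfg'):
    # A library raises; it never calls sys.exit(), because only the
    # caller knows whether the error is fatal in its context.
    if 'port' not in config:
        raise ConfigError(path, 'port', 'missing required key')
    return int(config['port'])
```

A command-line wrapper can catch ConfigError and exit with a message; a web server can catch the same exception and return a 500 with structured details. Neither decision belongs inside load_port.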
A common use for print statements is to provide information about the state of the computation. This might be to display error messages or to announce that progress has been made. The former is often for programmers' eyes only, and the latter is often not delivered in a way that is practical to transmit to the user -- again, think of progress bars in GUIs.
Instead of print statements, use a logger to help provide the context to the programmer debugging a problem. The log function can collect the execution context, e.g. from whence the function is called, the process ID, and the user's browser. That context can be passed along with the error message to give the programmer a complete picture of what's going on.
Programming has been around long enough that we have some pretty good conventions to follow. Any organization should agree to a set of conventions and stick to them. The more conventions, the better. Why? Less thinking. Too much thought goes into arbitrary style decisions. Despite your brilliance, you aren't likely to think up something better than what 60+ years of programming in high-level languages has produced.
Being consistent is hard work at first, but you'll get used to it. It saves a lot of time, because you can read other people's code and know where to find things. We can also build tools around the style so that it makes it easier to program.
In Python, we use the Google Python Guide for the most part. We use Sphinx Napoleon Google Style Docstrings. We follow PEP 8, of course.
There are always exceptions, but less is really more in this case.
The sh module dynamically imports names like this:

```python
sh.ls('/')
```

The ls name does not exist in the module sh. It's dynamically generated through some quite complex code. The intention is to save the user "boilerplate".
The purpose of sh is to run shell commands. /bin/sh does this, too. If you find yourself writing a lot of shell commands in a row, the best thing to do is use /bin/sh or, alternatively, use itertools or a loop to reduce the repetition.
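For example, a sketch using the stdlib subprocess module instead of sh's magic names (run is a hypothetical helper): the command is ordinary data, so call sites are greppable and there's nothing dynamically generated to chase.

```python
import subprocess
import sys


def run(*args):
    """Run a command given as plain arguments and return its stdout.
    No magic attribute lookup: the command is visible data."""
    return subprocess.run(args, capture_output=True, text=True, check=True).stdout


# A loop over commands-as-data instead of a row of repeated calls:
for msg in ('hello', 'world'):
    print(run(sys.executable, '-c', 'print({!r})'.format(msg)).strip())
```

Anyone grepping for run or for the literal command arguments finds every use; there is no dynamically generated ls to puzzle over.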
Magic names should be used very rarely, because they make reading code very difficult. You have to assume that people will want to chase references. Make it easy for them.
Another type of magic name is an operator overload. There
was a time when people were afraid that operator overloading
would corrupt languages like C++. That didn't really
happen, but it seems to be a pattern in Python. SQLAlchemy uses
== to mean "create a join" to be evaluated when the statement
executes. That's arguably useful syntax, but it could be just as
easy to say x.join(y). That's polymorphism, too, but it's
easier to find when grepping and clearer what's going on.
It's important to understand that inline operators in math
are the way they are because they are used frequently, and
brevity is something valuable in this case. It's also
arguable that simply saying add is not exactly onerous.
Using overloaded operators has a subtle cost in long term
maintenance: you can't easily find all uses of a particular
method. For example, py.path uses / to mean concatenation,
but within a string, a / means something different (a literal).
This is significant, because:
```python
>>> os.chdir('/tmp')
>>> py.path.local('a/b')
local('/tmp/a/b')
>>> py.path.local('a')/py.path.local('b')
local('/tmp/a/tmp/b')
```

That's a problem. Is it a bug? If it is a bug and the semantics
of py.path changes such that it produces the same result,
all the existing uses of the / operator may need to be fixed.
(This is identical to the Python 3 change to /, btw.) How do you
find all uses of /? If the name of the method was join_paths,
it would be a lot easier to find all uses.
pkunit.work_dir creates a directory relative to the test that executes the function. The directory stays around after the test finishes. The name is constant. This means you can always find the "work" done during the test after the test runs either successfully or not. This facilitates debugging.
If you must have a temporary file or directory that disappears after a program runs, make sure you can override the removal in an obvious way. That's not how tempfile.TemporaryFile works. This tool makes it hard to debug code which generates other code, which is already hard enough to debug.
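A sketch of an overridable temporary directory (work_dir and the KEEP_WORK_DIR environment variable are hypothetical here, not pkunit's implementation): removal is the default, but the override is obvious.

```python
import contextlib
import os
import shutil
import tempfile


@contextlib.contextmanager
def work_dir(keep=None):
    """Temporary directory whose removal can be overridden in an
    obvious way: pass keep=True, or set the (hypothetical)
    KEEP_WORK_DIR environment variable, to leave it behind for
    debugging code that generates other code."""
    if keep is None:
        keep = bool(os.environ.get('KEEP_WORK_DIR'))
    d = tempfile.mkdtemp()
    try:
        yield d
    finally:
        if not keep:
            shutil.rmtree(d)


with work_dir(keep=True) as kept:
    generated = os.path.join(kept, 'generated.py')
    with open(generated, 'w') as f:
        f.write('print(1)\n')
# The directory survives, so the generated file can be inspected.
```

Contrast this with tempfile.TemporaryFile, which offers no such switch: when the program misbehaves, the evidence is already gone.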
The purpose of from is to bring names into another module without
further qualification. Any serious Python programmer never
imports with *, but even importing module attributes
explicitly is often unnecessary or confusing. If you end up
using a reference from another module multiple times, the
Rule of Three would say to encapsulate that use to avoid
repeating yourself. If you don't, then the further qualification
necessary to use a module's attribute is not only
clearer, but also not much extra typing.
When maintaining lots of code, you are unlikely to know all the modules involved. You need an easy path to finding the code and/or documentation. Unqualified imports require an extra step for the reader to chase down the reference. The writer is increasing readability at the moment of writing, not for the reader who shows up two years later to figure out a bug.
When you specify a package, avoid specifying a version. With Docker, we will get the latest version of the package unless otherwise specified.
The problem with specifying a version is that you very easily get version conflicts with other packages that require a newer version of the package.
With local libraries, you can accidentally get your "editable versions" uninstalled. Unfortunately, an "editable version" has a version number specified, and that version doesn't increase as you edit the code. It stays the same. If a newer version of the library is available to pip, a pip install might install the library twice.
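A sketch of the principle as a setup.py fragment (the package name is illustrative):

```python
# setup.py fragment: name the dependency, don't pin the version
setup(
    name='mypkg',
    install_requires=[
        'requests',           # good: any compatible version
        # 'requests==2.19.1'  # avoid: invites version conflicts
    ],
)
```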
More info about setup.py:
- https://www.b-list.org/weblog/2018/apr/25/lets-talk-about-packages
- https://packaging.python.org/discussions/install-requires-vs-requirements/
- https://caremad.io/posts/2013/07/setup-vs-requirement/
Backpatching can sneak into a design pretty easily, especially emergent designs. It starts out small, which is fine, but sometimes it gets big and turns into convoluted control flow. At this point, you want to switch to multipass programming. Fundamentally, backpatching was an optimization technique. In emergent design, it is caused by deferred refactoring.
Example: RSConf was generating a file called /etc/postfix/local-host-names by having modules backpatch the file. It also was backpatching /etc/postfix/main.cf using postconf. The refactoring to multipass greatly simplified the code and explicitly coupled at build time instead of at run-time. This not only simplified the code it made it more robust.
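A sketch of the multipass shape (the data and function names are illustrative, not RSConf's actual code): first collect every module's contribution, then generate the whole file in one place, instead of letting each module append to it at run-time.

```python
def collect_host_names(modules):
    """Pass 1: every module declares the host names it contributes."""
    res = []
    for m in modules:
        res.extend(m.get('host_names', []))
    return res


def render_local_host_names(host_names):
    """Pass 2: generate the entire file content in one place, at build
    time, instead of backpatching it module by module."""
    return '\n'.join(sorted(set(host_names))) + '\n'


modules = [
    {'name': 'mail', 'host_names': ['mail.example.com']},
    {'name': 'web', 'host_names': ['www.example.com']},
]
content = render_local_host_names(collect_host_names(modules))
```

The coupling between modules and the generated file is now explicit and happens in one pass order, which is what makes the multipass version simpler and more robust than scattered backpatches.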