Carl Kingsford and Phillip Compeau
In the Computational Biology Department in the School of Computer Science at Carnegie Mellon, we have piloted using Go as an introductory programming language in our course 02-201: Programming for Scientists. We chose Go for a number of reasons. Its well-defined syntax, similarity to C, Java, and related languages, its web-based playgrounds, its cross-platform compiler, and its easy-to-use build environment all make it an attractive language for an introductory course. It is also useful that Go has explicit pointers, which allow that important concept to be introduced, and the treatment of pointers is very nice — students can quickly learn to use pointers, understand them, and then mostly not worry about them. Go’s built-in parallelism allows us to get even novice programmers to experiment with parallel programming a bit by the end of the course. Students who take the course largely appreciate Go. Nevertheless, there are some aspects of Go that could be improved to increase its suitability as an introductory programming language.
Why should Go care about teachability? There are two reasons. The first is natural selection: languages that are easier to learn have more programmers and often last longer. The second is Richard Feynman’s classic proverb that if one can’t reduce something “to the freshman level,” then “that means we really don’t understand it.” A Go that maximizes teachability likely has had its rough edges well sanded. Two recent changes in Go were steps in this direction: a default GOPATH value, which eliminates a setup step that occurs early in the class and trips up some novices, and setting GOMAXPROCS to the number of processors by default, which allows parallel programs to actually run in parallel without an additional “make it work” step.
Challenges Using Go for Teaching
Below are a few of the main sticking points we’ve encountered that confuse students or make explaining Go more difficult than it needs to be. Most of these aspects of Go make sense when considering Go as a production language — generally, these are not “flaws” in the language. But they are suboptimal features when employing Go as an instructional language.
1. The “package main” statement. The first program a student writes is some kind of “hello, world” program. The first instruction of that program in Go is `package main`, and this requires the first statement of a teacher going through a short example program to be “don’t worry about packages, we’ll cover them later.” Starting your explanation with an unexplained mystery is immediately off-putting. Of course, a short diversion into the value of packages and what they do could be undertaken, but this would fall on deaf ears — packages solve a problem a truly beginner programmer doesn’t yet know exists.
2. The term “slice”. Slices are so fundamental in Go that, for interesting assignments, it makes sense to introduce slices early on. However, the name “slice” raises the question: slice of what? That necessitates introducing arrays, which in our experience are rarely used directly in Go. Ideally, slices could be introduced as an abstract list-like data structure, with the implementation detail that they are based on arrays covered later. The name “slice” is the only thing that currently prevents following that order cleanly. An instructor is forced either to drop down one level of abstraction to explain how slices are backed by arrays or to say something like “the name slice will make sense eventually.”
3. int vs. int32 vs. int64. The `int` type in Go is strange. It is an integer of undetermined size (depending on the machine) and not the same type as either `int32` or `int64`. When first starting programming, it does not make sense to delve immediately into “bits” or word sizes, so naturally it makes sense to introduce the `int` type in isolation without mentioning `int32`, etc. However, variables of type `intXX` creep into student code (usually due to the use of a standard library routine), and then confusion arises since `int` and `int64` (or `int32`) are not the same type.
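A minimal sketch of how this mismatch tends to show up, assuming a student reaches for `strconv.ParseInt` to read a number. The helper name `parseCount` is hypothetical and chosen only for illustration:

```go
package main

import (
	"fmt"
	"strconv"
)

// parseCount is a hypothetical helper. strconv.ParseInt returns an int64,
// so the result needs an explicit conversion before it can be used as an int.
func parseCount(s string) (int, error) {
	n64, err := strconv.ParseInt(s, 10, 64)
	if err != nil {
		return 0, err
	}
	// var n int = n64 // would not compile: int and int64 are distinct types
	return int(n64), nil
}

func main() {
	steps := 5 // steps has type int
	extra, err := parseCount("37")
	if err != nil {
		fmt.Println("bad input:", err)
		return
	}
	fmt.Println(steps + extra) // prints 42
}
```

Without the `int(n64)` conversion, the addition in `main` would be rejected as a mismatch between `int` and `int64`, which is the point at which word sizes have to be explained.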
4. Easy number-string conversions. Early on, we want students to write programs that take user input. Often this input is in the form of numbers (e.g., how many steps of a random walk to take) on the command line. This requires the introduction of the strconv package and the Atoi / Itoa functions. We C programmers from 20 years ago appreciate seeing these functions live on (though the “A” in Atoi is somewhat unintuitive to new programmers), and crucially they allow instructors to talk about the difference between a string representing “42” and the integer 42. But that topic is prematurely forced since `int("42")` doesn’t work, while `string([]byte)`, `float64(int)`, `float32(float64(x))`, and `rune(int64(x))` all work fine. Note that `float64(int8)` may allocate new memory, `float32(float64(x))` may lose data, and `string([]byte)` fundamentally changes how a language construct (`for … range`) operates on the data, so data loss, memory allocation, and language semantics are not inviolate reasons to disallow `int(string)` or `string(int)`. For the latter case, one might object that strings are large data structures, and the compiler can’t just create them without some explicit memory allocation like `new` or `make`. But this is not generally true in Go: `f(a)` where f expects an [x]int array type, and `s1 + s2` where s1 and s2 are strings, both create new strings or arrays from existing data “behind the scenes”. In addition, the type assertion `x.(t)` may panic, so catastrophic failure of the conversion is not disqualifying in Go.
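The contrast can be sketched in a few lines. This is an illustrative example only; the helper name `doubleText` is hypothetical:

```go
package main

import (
	"fmt"
	"strconv"
)

// doubleText is a hypothetical helper: it parses a base-10 integer from
// text, doubles it, and formats the result back into text.
func doubleText(s string) (string, error) {
	n, err := strconv.Atoi(s) // string "42" -> integer 42
	if err != nil {
		return "", err
	}
	return strconv.Itoa(2 * n), nil // integer 84 -> string "84"
}

func main() {
	// Conversions that the language accepts with conversion syntax.
	fmt.Println(string([]byte{'G', 'o'})) // prints Go
	fmt.Println(float64(7) / 2)           // prints 3.5

	// n := int("42") // does not compile: a string cannot be converted to int

	out, err := doubleText("42")
	if err != nil {
		fmt.Println("bad input:", err)
		return
	}
	fmt.Println(out)         // prints 84
	fmt.Println("42" + "42") // prints 4242: concatenation, not addition
}
```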
5. Lack of built-in precondition, postcondition, assertion and loop_invariant mechanisms. When getting students to think about breaking a big problem down into smaller steps, programmatically-checked preconditions and postconditions for functions are helpful. Several other courses at CMU use a variant of C called C0 that includes these features explicitly. Preconditions and assertions can be provided via a library. Postconditions are harder. Go’s defer mechanism is tantalizingly close; assuming assert() were defined:
defer func() {assert(final_value < 10)}()
provides a way to check postconditions. This has two problems: first, it’s long-winded. Second, it requires any variables involved in the postcondition to be declared before the defer statement executes, and named return values must be used if they are to be referred to. This is sometimes limiting, since postconditions often involve variables created during the course of the function (though C0 has the same limitations on its postconditions).
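A minimal sketch of this pattern, assuming a hand-written `assert` helper (Go has no built-in one). The function `clampDigit` is hypothetical; its return value is named so that the deferred check can refer to it:

```go
package main

import "fmt"

// assert is a hypothetical helper that panics when cond is false.
func assert(cond bool, msg string) {
	if !cond {
		panic("assertion failed: " + msg)
	}
}

// clampDigit returns n limited to the range 0..9.
// The result is named so the deferred postcondition can see it.
func clampDigit(n int) (result int) {
	// The deferred function runs after the return value has been set.
	defer func() { assert(result >= 0 && result < 10, "result must be a single digit") }()
	if n < 0 {
		return 0
	}
	if n > 9 {
		return 9
	}
	return n
}

func main() {
	fmt.Println(clampDigit(42)) // prints 9
	fmt.Println(clampDigit(-3)) // prints 0
	fmt.Println(clampDigit(7))  // prints 7
}
```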
6. Assignment to fields in a map of structs. Currently, the following code is illegal in Go:
type point struct {
    x, y int
}
m := make(map[int]point)
m[0] = point{}
m[0].x = 10 // *** illegal
This gives the error “cannot assign to struct field m[0].x in map”. Maps of structs arise when teaching using Go because we’d like to be able to have assignments that involve maps and data elements (like points) before teaching pointers. The strangeness above impedes this. Why? Non-addressability prohibits &m[0], but there is no reason that `m[0].x = 10` can’t desugar into
tmp := m[0]
tmp.x = 10
m[0] = tmp
This is the current workaround for the disallowed m[0].x=y, but it creates unnecessary ugliness in this common case.
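For completeness, here is the workaround as a small runnable program. The helper name `setX` is hypothetical and simply wraps the copy-modify-store sequence:

```go
package main

import "fmt"

type point struct {
	x, y int
}

// setX is a hypothetical helper that applies the copy-modify-store
// workaround for updating one field of a struct stored in a map.
func setX(m map[int]point, key, x int) {
	p := m[key] // copy the struct out of the map (zero value if absent)
	p.x = x     // modify the copy
	m[key] = p  // store the copy back
}

func main() {
	m := make(map[int]point)
	m[0] = point{x: 1, y: 2}
	setX(m, 0, 10)
	fmt.Println(m[0]) // prints {10 2}
}
```

Reading a field, as in `m[0].x`, is already legal; only the assignment is rejected.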
7. Serialization of random number generation. A natural assignment when introducing parallelism is a Monte Carlo simulation of some sort, where each goroutine does one trial, making a series of calls to obtain a random number from “math/rand”. When using the convenience functions, the calls to (e.g.) rand.Int() are serialized because they access the same source of randomness. So each goroutine needs to create its own source of randomness using:
r := rand.New(rand.NewSource(time.Now().UnixNano()))
which is somewhat unwieldy.
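A sketch of what such an assignment might look like, using a Monte Carlo estimate of pi as an assumed example. The function `estimatePi` and its parameters are hypothetical; adding the goroutine index to the seed is one simple way to keep the seeds distinct when goroutines start within the same nanosecond:

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
	"time"
)

// estimatePi is a hypothetical example. Each goroutine throws darts at the
// unit square using its own random source and counts hits inside the
// quarter circle of radius 1.
func estimatePi(workers, dartsPerWorker int) float64 {
	hits := make([]int, workers) // one slot per goroutine, so no locking is needed
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			// A private generator avoids contention on the shared default source.
			r := rand.New(rand.NewSource(time.Now().UnixNano() + int64(id)))
			for i := 0; i < dartsPerWorker; i++ {
				x, y := r.Float64(), r.Float64()
				if x*x+y*y < 1 {
					hits[id]++
				}
			}
		}(w)
	}
	wg.Wait()

	total := 0
	for _, h := range hits {
		total += h
	}
	return 4 * float64(total) / float64(workers*dartsPerWorker)
}

func main() {
	fmt.Printf("pi is roughly %.3f\n", estimatePi(4, 250000))
}
```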
8. Linguistically, the statement `for x < max_x` is less clear than it could be. `For` is being used here in its C-derived meaning, not its English meaning. A small point, but when students are trying to grapple with flow control for the first time, it’s not clear that `for` introduces a loop, or what the statement `for x < max_x` means when read aloud. (A similar problem exists with the assignment operator “=” vs. “==”: to a novice `x = 10; y = 20; x = y` seems like a set of contradictory statements, and `if x = 30` seems completely reasonable. The use of = in this counterintuitive way is so ingrained now in programming that we have to live with it; we tell students that “equals-equals equals equals”). While it’s nice in some ways that there is only a single type of loop in Go, it would be better for teaching if the syntax more closely matched the pseudocode that is introduced at the start of a programming course. Pseudocode invariably uses the term “while”, providing evidence that this term more closely matches the concept.
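As a small illustration of the condition-only form, the loop below has to be read as “while x is greater than 1”. The function `countHalvings` is a hypothetical example:

```go
package main

import "fmt"

// countHalvings is a hypothetical example: it counts how many times x can be
// halved (using integer division) before it reaches 1 or less.
func countHalvings(x int) int {
	steps := 0
	for x > 1 { // read aloud as "while x is greater than 1"
		x = x / 2
		steps++
	}
	return steps
}

func main() {
	fmt.Println(countHalvings(100)) // prints 6
}
```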
Suggested Changes to Go
We suggest the following modifications, none of which break existing Go programs:
1. A file with a `main()` function that omits the `package main` declaration is assumed to be `package main`. The omission of `package main` could also be a signal to enable other language modifications, such as those discussed below, if they are considered inappropriate for common use. It could also by default import several packages (e.g. “fmt” and “edu”) if they are used in the code. This also slightly reduces the friction of using Go as a tool for one-off scripts.
2. Rename “slices” to “lists”. This is a documentation change only. An additional advantage of the name “list” is that it further abstracts what are now called slices from their implementation (as parts of arrays) — this encapsulation would allow the idea of lists to grow more independent of their implementation.
3. Make `int` a type alias for `intXX` (where XX is the machine word size), similar to `rune` and `byte`. For consistency, `uint` should similarly be an alias for `uintXX`. The behavior of programs that use `int` would be unchanged: `int` would still correspond to the word size of the machine and would still have an undetermined size. Programs that rely on a known int size would still need to use `intXX`. On (say) 64-bit machines, `int` and `int64` would be interchangeable. This is more consistent with `byte`.
4.1. Support int(string) and floatXX(string) conversions that assume base-10 encoded integers. These would panic if the string could not be converted. `v = int(s)` would be syntactic sugar for the following code:
v, err = strconv.Atoi(s)
if err != nil {
panic(strconv.ErrSyntax)
}
The conversion v = floatXX(s) from string s would be equivalent to the code:
v, err = strconv.ParseFloat(s, XX)
if err != nil {
panic(strconv.ErrSyntax)
}
For consistency, intXX(string) should probably work as well.
4.2. Support `string(int)` and `string(floatXX)` conversions that assume base-10. These are syntactic sugar for the calls to `strconv.Itoa()` and `strconv.FormatFloat(f, 'g', -1, XX)`. For user-defined types that support Stringer, `string(x)` should call `x.String()`.
Neither the “string->number” conversions nor `string(floatXX)` breaks any existing programs, since these currently generate a compiler error. One caveat is `string(int)`: Go currently accepts it and yields the UTF-8 encoding of the integer interpreted as a code point (`string(65)` is “A”), so that case would change the meaning of existing code rather than make invalid code valid. The downside of supporting `number(string)` is error handling: by panicking on error, it requires the programmer to ensure that the conversion will succeed before applying it (it is similar in this respect to a type assertion `x.(t)`). This conversion is not ultimately the right way to parse numbers from user input, since typically the error should be dealt with rather than raising a panic. In this respect it’s like `println` in Go: a feature that exists only as an onramp to the right way to do things.
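Until such conversions exist, an instructor could approximate the proposed behavior with helper functions. The sketch below is only an approximation under that assumption; the names `mustInt` and `mustFloat64` are hypothetical and are not part of any Go package:

```go
package main

import (
	"fmt"
	"strconv"
)

// mustInt is a hypothetical stand-in for the proposed int(string)
// conversion: it parses a base-10 integer and panics on failure.
func mustInt(s string) int {
	v, err := strconv.Atoi(s)
	if err != nil {
		panic(strconv.ErrSyntax)
	}
	return v
}

// mustFloat64 is a hypothetical stand-in for the proposed float64(string)
// conversion, with the same panic-on-failure behavior.
func mustFloat64(s string) float64 {
	v, err := strconv.ParseFloat(s, 64)
	if err != nil {
		panic(strconv.ErrSyntax)
	}
	return v
}

func main() {
	fmt.Println(mustInt("42") + 1)      // prints 43
	fmt.Println(mustFloat64("2.5") * 2) // prints 5
}
```

As with the proposed conversions, these helpers are an onramp: later in a course they would be replaced by code that checks the returned error.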
5.1. Add package `edu` to the standard library with the following functions:
edu.Assert(bool, string, ...interface{})
edu.LoopInvariant(bool, string, ...interface{})
edu.Requires(bool, string, ...interface{})
edu.Ensures(bool, string, ...interface{})
If the first argument is false, these functions print the given message (formatted as in Printf), likely with some information obtained from runtime.Caller(1), and then panic. The last of these functions allows post-conditions to be expressed using:
defer func() {edu.Ensures(...)}()
The package also should contain the following functions:
edu.AssertFunc(func()bool, string, ...interface{})
edu.LoopInvariantFunc(func()bool, string, ...interface{})
edu.RequiresFunc(func()bool, string, ...interface{})
edu.EnsuresFunc(func()bool, string, ...interface{})
These operate the same as the previous four functions, except that they call the passed-in function and fail if it returns false.
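A rough sketch of how two of these functions might be implemented as an ordinary library. Since no `edu` package exists, everything here lives in `package main`, and the internal helper `check` and the example function `isqrt` are hypothetical. The message is carried in the panic value instead of being printed separately, and the frame depth passed to `runtime.Caller` is 2 because of the extra wrapper layer in this sketch:

```go
package main

import (
	"fmt"
	"runtime"
)

// check is a hypothetical shared implementation: if cond is false, it formats
// the message, prefixes the caller's file and line, and panics.
func check(kind string, cond bool, format string, args ...interface{}) {
	if cond {
		return
	}
	msg := fmt.Sprintf(format, args...)
	// Skip two frames: check itself and the wrapper that called it.
	if _, file, line, ok := runtime.Caller(2); ok {
		msg = fmt.Sprintf("%s:%d: %s", file, line, msg)
	}
	panic(kind + " failed: " + msg)
}

// Requires checks a precondition.
func Requires(cond bool, format string, args ...interface{}) {
	check("precondition", cond, format, args...)
}

// Ensures checks a postcondition; it is meant to be called from a defer.
func Ensures(cond bool, format string, args ...interface{}) {
	check("postcondition", cond, format, args...)
}

// isqrt returns the integer square root of a non-negative n.
func isqrt(n int) (root int) {
	Requires(n >= 0, "n must be non-negative, got %d", n)
	defer func() {
		Ensures(root*root <= n && n < (root+1)*(root+1),
			"%d is not the integer square root of %d", root, n)
	}()
	for (root+1)*(root+1) <= n {
		root++
	}
	return root
}

func main() {
	fmt.Println(isqrt(17)) // prints 4
}
```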
5.2. Add a built-in function `ensures(boolean, string, ...interface{})` that desugars into `defer func() {edu.Ensures(boolean, string, ...interface{})}()`. As with other builtin functions, `ensures` would not be a reserved word. Programs that use `ensures` as an identifier now would continue to work (but obviously won’t be able to use the new ensures function in the same scope).
6. Allow `m[key].field = value` as a conceptual shorthand for `tmp:=m[key]; tmp.field=value; m[key]=tmp`. Since the original statement is illegal currently, no existing Go programs break. One might object that copying an entire struct to change a field is expensive, but this is (a) no more expensive than the current workaround, (b) no more expensive than passing a struct to a function now, and (c) the compiler is free to optimize this statement.
7. Add a convenience function to “math/rand” that creates a new, non-concurrency-safe random number generator using the current Unix time:
// NewFromTime would live in math/rand, so New and NewSource are unqualified.
func NewFromTime() *Rand {
    return New(NewSource(time.Now().UnixNano()))
}
We also propose the following change, which does break existing Go programs:
8. Make `while` be a synonym for `for`. Alternatively, `while` could require that the initialization and increment parts of the loop be empty. That is, `while i := 0; i < 10; i++` would be illegal, and only `while COND` would be supported. From a teaching perspective, it is sufficient if this change is enabled only when `package main` is omitted.
Conclusion
Go is a great teaching language — it grows from introductory course to production systems well. Many of the points above deal with allowing better topological sorting of the concepts of Go: being able to introduce things in a logical, linear order, while continuously expanding the interesting assignments that can be given to the students.