Matthew does an excellent job in bringing together two independent but highly interdependent concepts: cloud native computing and use of the Go language. The merger creates a must-read for anyone working (or wanting to work) in the cloud native application space.
Lee Atchison
Author, Architecting for Scale (O’Reilly)
Technical books can be hard to consume, but Matthew Titmus does an amazing job of telling a story that is easy to follow and includes working examples to help readers really understand the concepts being presented.
Celeste Stinger
Senior Site Reliability Engineer, Limit Break
This is the first book I’ve come across that covers such a breadth and depth of modern cloud native practices in such a practical way. The patterns presented here have clear examples to solve real problems that are faced by engineers on a daily basis.
Alvaro Atienza
Software Engineer, Aktos
Matt’s expertise in the art and science of building reliable systems in a fundamentally unreliable world is clearly (and humorously) captured in the pages within. Join him as he introduces you to the fundamental building blocks and system designs that enable large-scale, reliable systems to be constructed from the ephemeral and unreliable components that comprise the underlying cloud infrastructure of today’s modern computing environment.
David Nicponski
Former Principal Engineer, Robinhood
Over the past few years, two infrastructure trends have been happening: Go has been increasingly used for infrastructure, in addition to backend, and the infrastructure is moving to the cloud. This book summarizes the state of the art of the combination of the two.
Natalie Pistunovich
Lead Developer Advocate, Aerospike
I came in knowing next to nothing about Go and left feeling like an expert. I would go so far as to say that simply reading this book made me a better engineer.
James Quigley
Staff Site Reliability Engineer, Oscar Health
Second Edition
Building Reliable Services in Unreliable Environments
Italic
Indicates new terms, URLs, email addresses, filenames, and file extensions.

Constant width
Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.

Constant width bold
Shows commands or other text that should be typed literally by the user.

Constant width italic
Shows text that should be replaced with user-supplied values or by values determined by context.
1 Including CNCF Sandbox, Incubating, and Graduated code-based (nonspecification) projects, as of February 2024.
The most dangerous phrase in the language is, “We’ve always done it this way.”1
Grace Hopper, Computerworld (January 1976)
Cloud native technologies empower organizations to build and run scalable applications in modern, dynamic environments such as public, private, and hybrid clouds….
These techniques enable loosely coupled systems that are resilient, manageable, and observable. Combined with robust automation, they allow engineers to make high-impact changes frequently and predictably with minimal toil.6
Cloud Native Computing Foundation, CNCF Cloud Native Definition v1.0
A system can be vertically scaled (or scaled up) by upsizing (or downsizing) the hardware resources that are already allocated to it—for example, by adding memory or CPU to a database that’s running on a dedicated computing instance. Vertical scaling has the benefit of being technically relatively straightforward, but any given instance can be upsized only so much.
A system can be horizontally scaled (or scaled out) by adding (or removing) service instances. For example, this can be done by increasing the number of service nodes behind a load balancer or containers in Kubernetes, or another container orchestration system. This strategy has a number of advantages, including redundancy and freedom from the limits of available instance sizes. However, more replicas mean greater design and management complexity, and not all services can be horizontally scaled.
1 Esther Surden, “Privacy Laws May Usher in Defensive DP: Hopper,” Computerworld, January 26, 1976, 9.
2 Which is Go. Don’t get me wrong—this is still a Go book after all.
3 A kludge is “an awkward or inelegant solution.” It’s a fascinating word with a fascinating history.
4 Have you ever wondered why so many Kubernetes migrations fail?
5 Especially for me. I get to write this cool book.
6 Cloud Native Computing Foundation, “CNCF Cloud Native Definition v1.0”, GitHub, December 7, 2020.
7 Those of us who remember the Browser Wars of the 1990s will recall that this wasn’t always strictly true.
8 Or if every website required a different browser. That would stink, wouldn’t it?
9 If you’re interested in a complete academic treatment, I highly recommend Reliability and Availability Engineering by Kishor S. Trivedi and Andrea Bobbio (Cambridge University Press, 2017).
10 There are some wrong ones though.
11 Plus, they both start with M. Super confusing.
12 That time was the 1990s.
13 International Telecommunication Union (ITU), “Internet users per 100 inhabitants 1997 to 2007” and “Internet users per 100 inhabitants 2005 to 2017,” ICT Data and Statistics (IDS).
Any intelligent fool can make things bigger, more complex, and more violent. It takes a touch of genius—and a lot of courage—to move in the opposite direction.1
E. F. Schumacher, “Small Is Beautiful” (August 1973)
Code had become too hard to read. Unnecessary bookkeeping and repetition were compounded by functionally overlapping features that often encouraged cleverness over clarity.
Language construction and years of feature creep resulted in build times that ran for minutes or hours, even on large build clusters.
Many programmers responded to the aforementioned problems by adopting more fluid, dynamic languages, effectively trading efficiency and type safety for expressiveness.
Incompatibilities between even minor versions of a language, as well as any dependencies it may have (and its transitive dependencies!), often made updating an exercise in frustration.
type Shape interface {                // Any Shape must have an Area
    Area() float64
}

type Rectangle struct {               // Rectangle doesn't explicitly
    width, height float64             // declare itself to be a Shape
}

func (r Rectangle) Area() float64 {   // Rectangle has an Area method; it
    return r.width * r.height         // satisfies the Shape interface
}
Do not communicate by sharing memory. Instead, share memory by communicating.
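To make the proverb concrete, here is a minimal sketch of my own (not from the text): a single goroutine owns a counter, and other goroutines interact with it only through channels rather than locking shared memory.

```go
package main

import "fmt"

// counter owns its state; callers never touch total directly,
// they communicate over the inc and get channels instead.
func counter(inc <-chan int, get chan<- int) {
	total := 0
	for {
		select {
		case n := <-inc:
			total += n // Only this goroutine mutates total
		case get <- total: // Report the current value on request
		}
	}
}

func main() {
	inc := make(chan int)
	get := make(chan int)
	go counter(inc, get)

	for i := 0; i < 5; i++ {
		inc <- 1 // Each send hands the increment to the owner
	}
	fmt.Println(<-get) // "5"
}
```

Because only the counter goroutine ever reads or writes total, no mutex is needed: the channel operations themselves provide the synchronization.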
mtitmus:~/workspace/kubernetes[MASTER]$ time make

make  67.02s user 32.51s system 247% cpu 40.231 total
[Table: Results from The Computer Language Benchmarks Game for the Fannkuch-Redux, FASTA, K-Nucleotide, Mandelbrot, N-Body, and Spectral norm benchmarks, comparing C++, Go, Java, NodeJS, Python3, Ruby, and Rust. Source: Isaac Gouy, The Computer Language Benchmarks Game website, June 20, 2023.]
my_variable = 0
while my_variable < 10:
    my_varaible = my_variable + 1   # Typo! Infinite loop!
def send(message, recipient):
1 E. F. Schumacher, “Small Is Beautiful,” The Radical Humanist (August 1973): 22.
2 These were “cloud native” services before the term cloud native was coined.
3 Of course, they weren’t called “cloud native” at the time; to Google they were just “services.”
4 Rob Pike, “Go at Google: Language Design in the Service of Software Engineering”, Google, 2012.
5 In languages that use duck typing, the type of an object is less important than the methods it defines. In other words, “If it walks like a duck and it quacks like a duck, then it must be a duck.”
6 Dave Cheney, “Clear Is Better than Clever”, The Acme of Foolishness (blog), July 19, 2019.
7 C. A. R. Hoare, “Communicating Sequential Processes”, Communications of the ACM 21, no. 8 (August 1978): 666–77.
8 At least among the “mainstream” languages, whatever that means.
9 Andrew Gerrand, “Concurrency Is Not Parallelism”, The Go Blog, January 16, 2013.
10 C++. We’re talking about C++.
11 Not counting comments; Openhub.net. “Kubernetes”, Open Hub, Black Duck Software, Inc., accessed June 20, 2023.
12 The Go Team, “Go 1 and the Future of Go Programs”, The Go Documentation.
13 Anybody remember Java 1.1? I remember Java 1.1. Sure, we didn’t have generics or autoboxing or enhanced for loops back then, but we were happy. Happy, I tell you.
14 We finally got generics! Go, fightin’ Parametric Polymorphics!
15 The Go Team. “Proposing Changes to Go”, GitHub, accessed August 7, 2019.
16 Rob Pike, “Sydney Golang Meetup—Rob Pike—Go 2 Draft Specifications”, YouTube, November 13, 2018.
17 Take that, Java.
18 Take that, Python.
19 Few arguments in programming generate as many snarky comments as static versus dynamic typing, except perhaps the Great Tabs versus Spaces Debate, on which Go’s unofficial position is “Shut up, who cares?”
20 I, too, have been lauded for my forgiving nature and somewhat English-like syntax.
21 Or for the same human after a few months of thinking about other things.
A language that doesn’t affect the way you think about programming is not worth knowing.1
Alan Perlis, ACM SIGPLAN Notices (September 1982)
and := true && false
fmt.Println(and)   // "false"

or := true || false
fmt.Println(or)    // "true"

not := !true
fmt.Println(not)   // "false"
int8, int16, int32, int64
uint8, uint16, uint32, uint64
float32, float64
var x complex64 = 3.1415i
fmt.Println(x)   // "(0+3.1415i)"
// The interpreted form
"\"Hello\nworld!\""

// The raw form
`"Hello
world!"`
var name type = expression
name := expression
var i int
var f float64
var b bool
var s string
fmt.Printf("integer: %d\n", i)   // integer: 0
fmt.Printf("float: %f\n", f)     // float: 0.000000
fmt.Printf("boolean: %t\n", b)   // boolean: false
fmt.Printf("string: %q\n", s)    // string: ""
str := "world"
_, err := fmt.Printf("Hello %s\n", str)
if err != nil {
    // Do something
}
import _ "github.com/lib/pq"
const language string = "Go"

var favorite bool = true

func main() {
    const text = "Does %s rule? %t!"
    var output = fmt.Sprintf(text, language, favorite)
    fmt.Println(output)   // "Does Go rule? true!"
}
A fixed-length sequence of zero or more elements of a particular type
An abstraction around an array that can be resized at runtime
An associative data structure that allows distinct keys to be arbitrarily paired with, or “mapped to,” values
var a [3]int        // Zero-value array of type [3]int
fmt.Println(a)      // "[0 0 0]"
fmt.Println(a[1])   // "0"

a[1] = 42           // Update second index
fmt.Println(a)      // "[0 42 0]"
fmt.Println(a[1])   // "42"

i := a[1]
fmt.Println(i)      // "42"
b := [3]int{2, 4, 6}
b := [...]int{2, 4, 6}
fmt.Println(len(b))        // "3"
fmt.Println(b[len(b)-1])   // "6"
n := make([]int, 3)   // Create an int slice with 3 elements
fmt.Println(n)        // "[0 0 0]"
fmt.Println(len(n))   // "3"; len works for slices and arrays

n[0] = 8
n[1] = 16
n[2] = 32
fmt.Println(n)        // "[8 16 32]"
m := []int{1}    // A literal []int declaration
fmt.Println(m)   // "[1]"
m = append(m, 2)   // Append 2 to m
fmt.Println(m)     // "[1 2]"
m = append(m, 2)      // Append to m from the previous snippet
fmt.Println(m)        // "[1 2]"

m = append(m, 3, 4)
fmt.Println(m)        // "[1 2 3 4]"

m = append(m, m...)   // Append m to itself
fmt.Println(m)        // "[1 2 3 4 1 2 3 4]"
s0 := []int{0, 1, 2, 3, 4, 5, 6}   // A slice literal
fmt.Println(s0)                    // "[0 1 2 3 4 5 6]"
s1 := s0[:4]
fmt.Println(s1)   // "[0 1 2 3]"

s2 := s0[3:]
fmt.Println(s2)   // "[3 4 5 6]"
s0[3] = 42        // Change reflected in all 3 slices
fmt.Println(s0)   // "[0 1 2 42 4 5 6]"
fmt.Println(s1)   // "[0 1 2 42]"
fmt.Println(s2)   // "[42 4 5 6]"
s := "foö"   // UTF-8: f=0x66 o=0x6F ö=0xC3B6
r := []rune(s)
b := []byte(s)
fmt.Printf("%T %v\n", s, s)   // "string foö"
fmt.Printf("%T %v\n", r, r)   // "[]int32 [102 111 246]"
fmt.Printf("%T %v\n", b, b)   // "[]uint8 [102 111 195 182]"
freezing := make(map[string]float32)   // Empty map of string to float32

freezing["celsius"] = 0.0
freezing["fahrenheit"] = 32.0
freezing["kelvin"] = 273.2

fmt.Println(freezing["kelvin"])   // "273.2"
fmt.Println(len(freezing))        // "3"

delete(freezing, "kelvin")        // Delete "kelvin"
fmt.Println(len(freezing))        // "2"
freezing := map[string]float32{
    "celsius":    0.0,
    "fahrenheit": 32.0,
    "kelvin":     273.2,   // The trailing comma is required!
}
foo := freezing["no-such-key"]   // Get non-existent key
fmt.Println(foo)                 // "0" (float32 zero value)
newton, ok := freezing["newton"]   // What about the Newton scale?
fmt.Println(newton)                // "0"
fmt.Println(ok)                    // "false"
The address of a named variable can be retrieved by using the & operator. For example, the expression p := &a will obtain the address of a and assign it to p.
The variable p, which you can say “points to” a, has a type of *int, where the * indicates that it’s a pointer type that points to an int.
To retrieve the value of a from p, you can dereference it using a * before the pointer variable name, allowing us to indirectly read or update a.
var a int = 10

var p *int = &a   // p of type *int points to a
fmt.Println(p)    // "0x0001"
fmt.Println(*p)   // "10"

*p = 20           // Indirectly update a
fmt.Println(a)    // "20"
var n *int
var x, y int

fmt.Println(n)          // "<nil>"
fmt.Println(n == nil)   // "true" (n is nil)

fmt.Println(x == y)     // "true" (x and y are both zero)
fmt.Println(&x == &x)   // "true" (&x is equal to itself)
fmt.Println(&x == &y)   // "false" (different vars)
fmt.Println(&x == nil)  // "false" (&x is not nil)
sum := 0
for i := 0; i < 10; i++ {
    sum += 1
}
fmt.Println(sum)   // "10"
sum, i := 0, 0
for i < 10 {   // Equivalent to: for ; i < 10;
    sum += i
    i++
}
fmt.Println(i, sum)   // "10 45"
fmt.Println("For ever...")
for {
    fmt.Println("...and ever")
}
s := []int{2, 4, 8, 16, 32}   // A slice of ints

for i, v := range s {         // range gets each index/value
    fmt.Println(i, "->", v)   // Output index and its value
}
0 -> 2
1 -> 4
2 -> 8
3 -> 16
4 -> 32
a := []int{0, 2, 4, 6, 8}
sum := 0

for _, v := range a {
    sum += v
}
fmt.Println(sum)   // "20"
m := map[int]string{
    1: "January",
    2: "February",
    3: "March",
    4: "April",
}

for k, v := range m {
    fmt.Println(k, "->", v)
}
3 -> March
4 -> April
1 -> January
2 -> February
if 7%2 == 0 {
    fmt.Println("7 is even")
} else {
    fmt.Println("7 is odd")
}
if _, err := os.Open("foo.ext"); err != nil {
    fmt.Println(err)
} else {
    fmt.Println("All is fine.")
}
_, err := os.Open("foo.go")
if err != nil {
    fmt.Println(err)
} else {
    fmt.Println("All is fine.")
}
i := 0

switch i % 3 {
case 0:
    fmt.Println("Zero")
    fallthrough
case 1:
    fmt.Println("One")
case 2:
    fmt.Println("Two")
default:
    fmt.Println("Huh?")
}
Zero
One
hour := time.Now().Hour()

switch {
case hour >= 5 && hour < 9:
    fmt.Println("I'm writing")
case hour >= 9 && hour < 18:
    fmt.Println("I'm working")
default:
    fmt.Println("I'm sleeping")
}
switch hour := time.Now().Hour(); {   // Empty expression means "true"
case hour >= 5 && hour < 9:
    fmt.Println("I'm writing")
case hour >= 9 && hour < 18:
    fmt.Println("I'm working")
default:
    fmt.Println("I'm sleeping")
}
file, err := os.Open("somefile.ext")
if err != nil {
    log.Fatal(err)
    return err
}
type error interface {
    Error() string
}
e1 := errors.New("error 42")
e2 := fmt.Errorf("error %d", 42)
type NestedError struct {
    Message string
    Err     error
}

func (e *NestedError) Error() string {
    return fmt.Sprintf("%s\n  contains: %s", e.Message, e.Err.Error())
}
func add(x int, y int) int {
    return x + y
}

func main() {
    sum := add(10, 5)
    fmt.Println(sum)   // "15"
}
func foo(i int, j int, a string, b string) { /* ... */ }
func foo(i, j int, a, b string)            { /* ... */ }
func swap(x, y string) (string, string) {
    return y, x
}
a, b := swap("foo", "bar")
func factorial(n int) int {
    if n < 1 {
        return 1
    }
    return n * factorial(n-1)
}

func main() {
    fmt.Println(factorial(11))   // "39916800"
}
func main() {
    defer fmt.Println("cruel world")
    fmt.Println("goodbye")
}
goodbye
cruel world
func main() {
    file, err := os.Create("/tmp/foo.txt")   // Create an empty file
    if err != nil {
        return
    }
    defer closeFile(file)                    // Ensure closeFile(file) is called

    _, err = fmt.Fprintln(file, "Your mother was a hamster")
    if err != nil {
        return
    }

    fmt.Println("File written to successfully")
}

func closeFile(f *os.File) {
    if err := f.Close(); err != nil {
        fmt.Println("Error closing file:", err.Error())
    } else {
        fmt.Println("File closed successfully")
    }
}
File written to successfully
File closed successfully
func main() {
    defer fmt.Println("world")
    defer fmt.Println("cruel")
    defer fmt.Println("goodbye")
}
goodbye
cruel
world
func main() {
    x := 5

    zeroByValue(x)
    fmt.Println(x)   // "5"

    zeroByReference(&x)
    fmt.Println(x)   // "0"
}

func zeroByValue(x int) {
    x = 0
}

func zeroByReference(x *int) {
    *x = 0   // Dereference x and set it to 0
}
func update(m map[string]int) {
    m["c"] = 2
}

func main() {
    m := map[string]int{"a": 0, "b": 1}
    fmt.Println(m)   // "map[a:0 b:1]"

    update(m)
    fmt.Println(m)   // "map[a:0 b:1 c:2]"
}
func Printf(format string, a ...any) (n int, err error)
const name, age = "Kim", 22
fmt.Printf("%s is %d years old.\n", name, age)
func product(factors ...int) int {
    p := 1
    for _, n := range factors {
        p *= n
    }
    return p
}

func main() {
    fmt.Println(product())          // "1"
    fmt.Println(product(2, 2, 2))   // "8"
}
m := []int{3, 3, 3}
fmt.Println(product(m...))   // "27"
func sum(x, y int) int {
    return x + y
}

func product(x, y int) int {
    return x * y
}

func main() {
    var f func(int, int) int   // Function variables have types

    f = sum
    fmt.Println(f(3, 5))       // "8"

    f = product                // Legal: product has same type as sum
    fmt.Println(f(3, 5))       // "15"
}
func incrementor() func() int {
    i := 0

    return func() int {   // Return an anonymous function
        i++               // "Closes over" parent function's i
        return i
    }
}
func main() {
    increment := incrementor()

    fmt.Println(increment())    // "1"
    fmt.Println(increment())    // "2"
    fmt.Println(increment())    // "3"

    increment2 := incrementor()
    fmt.Println(increment2())   // "1"
}
type Vertex struct {
    X, Y float64
}

func main() {
    var v Vertex           // Structs are never nil
    fmt.Println(v)         // "{0 0}"

    v = Vertex{}           // Explicitly define an empty struct
    fmt.Println(v)         // "{0 0}"

    v = Vertex{1.0, 2.0}   // Defining fields, in order
    fmt.Println(v)         // "{1 2}"

    v = Vertex{Y: 2.5}     // Defining specific fields, by label
    fmt.Println(v)         // "{0 2.5}"
}
func main() {
    v := Vertex{X: 1.0, Y: 3.0}
    fmt.Println(v)   // "{1 3}"

    v.X *= 1.5
    v.Y *= 2.5
    fmt.Println(v)   // "{1.5 7.5}"
}
func main() {
    var v *Vertex = &Vertex{1, 3}
    fmt.Println(v)   // &{1 3}

    v.X, v.Y = v.Y, v.X
    fmt.Println(v)   // &{3 1}
}
func (v *Vertex) Square() {   // Attach method to the *Vertex type
    v.X *= v.X
    v.Y *= v.Y
}

func main() {
    vert := &Vertex{3, 4}
    fmt.Println(vert)   // "&{3 4}"

    vert.Square()
    fmt.Println(vert)   // "&{9 16}"
}
type MyMap map[string]int

func (m MyMap) Length() int {
    return len(m)
}

func main() {
    mm := MyMap{"A": 1, "B": 2}

    fmt.Println(mm)            // "map[A:1 B:2]"
    fmt.Println(mm["A"])       // "1"
    fmt.Println(mm.Length())   // "2"
}
type Shape interface {
    Area() float64
}
type Circle struct {
    Radius float64
}

func (c Circle) Area() float64 {
    return math.Pi * c.Radius * c.Radius
}

type Rectangle struct {
    Width, Height float64
}

func (r Rectangle) Area() float64 {
    return r.Width * r.Height
}
func PrintArea(s Shape) {
    fmt.Printf("%T's area is %0.2f\n", s, s.Area())
}

func main() {
    r := Rectangle{Width: 5, Height: 10}
    PrintArea(r)   // "main.Rectangle's area is 50.00"

    c := Circle{Radius: 5}
    PrintArea(c)   // "main.Circle's area is 78.54"
}
var s Shape = Circle{}   // s is an expression of Shape
c := s.(Circle)          // Assert that s is a Circle
fmt.Printf("%T\n", c)    // "main.Circle"
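The single-value assertion above panics at runtime if s turns out not to hold a Circle. Go also has a two-value ("comma ok") form that reports failure safely instead of panicking; here is a minimal self-contained sketch of my own, repeating the Shape types from the earlier example:

```go
package main

import (
	"fmt"
	"math"
)

// Shape, Circle, and Rectangle repeat the earlier definitions.
type Shape interface{ Area() float64 }

type Circle struct{ Radius float64 }

func (c Circle) Area() float64 { return math.Pi * c.Radius * c.Radius }

type Rectangle struct{ Width, Height float64 }

func (r Rectangle) Area() float64 { return r.Width * r.Height }

func main() {
	var s Shape = Circle{Radius: 1}

	// Two-value form: ok is false (and c is the zero value)
	// if the assertion fails, instead of panicking.
	if c, ok := s.(Circle); ok {
		fmt.Println("Circle with radius", c.Radius)
	}

	if _, ok := s.(Rectangle); !ok {
		fmt.Println("s is not a Rectangle")
	}
}
```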
type Reader interface {
    Read(p []byte) (n int, err error)
}

type Writer interface {
    Write(p []byte) (n int, err error)
}
type ReadWriter interface {
    Reader
    Writer
}
type ReadWriter struct {
    *Reader
    *Writer
}
var rw *bufio.ReadWriter = GetReadWriter()
var bytes []byte = make([]byte, 1024)

n, err := rw.Read(bytes)
if err != nil {
    // Do something
}
func UseReader(r *bufio.Reader) {
    fmt.Printf("We got a %T\n", r)   // "We got a *bufio.Reader"
}

func main() {
    var rw *bufio.ReadWriter = GetReadWriter()

    UseReader(rw.Reader)
}
rw := &bufio.ReadWriter{Reader: &bufio.Reader{}, Writer: &bufio.Writer{}}
func Count(ss []string) map[string]int {
    m := make(map[string]int)
    for _, s := range ss {
        m[s]++
    }
    return m
}
func Count[T comparable](ss []T) map[T]int {
    m := make(map[T]int)
    for _, s := range ss {
        m[s]++
    }
    return m
}
ss := []string{"a", "b", "b", "c", "c", "c"}
m := Count[string](ss)
type Tree[T any] struct {
    left, right *Tree[T]
    value       T
}

func (t *Tree[T]) Lookup(x T) *Tree[T] { ... }
var stringTree Tree[string]
type Number interface {
    int64 | float64
}
func Square[T Number](v T) T {
    return v * v
}
func Square[T int64 | float64](v T) T {
    return v * v
}
var n int64 = 9
s := Square(n)   // 81
foo()      // Call foo() and wait for it to return
go foo()   // Spawn a new goroutine that calls foo() concurrently
func Log(w io.Writer, message string) {
    go func() {
        fmt.Fprintln(w, message)
    }()   // Don't forget the trailing parentheses!
}
var ch chan int = make(chan int)
ch <- val    // Sending on a channel
val = <-ch   // Receiving on a channel and assigning it to val
<-ch         // Receiving on a channel and discarding the result
func main() {
    ch := make(chan string)    // Allocate a string channel

    go func() {
        message := <-ch        // Blocking receive; assigns to message
        fmt.Println(message)   // "ping"
        ch <- "pong"           // Blocking send
    }()

    ch <- "ping"               // Send "ping"
    fmt.Println(<-ch)          // "pong"
}
ch := make(chan string, 2)   // Buffered channel with capacity 2

ch <- "foo"                  // Two non-blocking sends
ch <- "bar"

fmt.Println(<-ch)            // Two non-blocking receives
fmt.Println(<-ch)            // The buffer is now empty

fmt.Println(<-ch)            // The third receive will block
ch := make(chan string, 10)
ch <- "foo"
close(ch)                         // One value left in the buffer

msg, ok := <-ch
fmt.Printf("%q, %v\n", msg, ok)   // "foo", true

msg, ok = <-ch
fmt.Printf("%q, %v\n", msg, ok)   // "", false
ch := make(chan string, 3)

ch <- "foo"           // Send three (buffered) values to the channel
ch <- "bar"
ch <- "baz"

close(ch)             // Close the channel

for s := range ch {   // Range will continue to the "closed" flag
    fmt.Println(s)
}
select {
case <-ch1:                        // Discard received value
    fmt.Println("Got something")
case x := <-ch2:                   // Assign received value to x
    fmt.Println(x)
case ch3 <- y:                     // Send y to channel
    fmt.Println(y)
default:
    fmt.Println("None of the above")
}
var ch = make(chan int)

select {
case m := <-ch:                        // Read from ch; blocks forever
    fmt.Println(m)
case <-time.After(10 * time.Second):   // time.After returns a channel
    fmt.Println("Timed out")
}
1 Alan Perlis, ACM SIGPLAN Notices 17, no. 9, (September 1982): 7–13.
2 Earlier versions of C, C++, and Python lacked a native Boolean type, instead representing them using the integers 0 (for false) or 1 (for true). Some languages like Perl, Lua, and Tcl still use a similar strategy.
3 See what I did there?
4 Why would you?
5 In fact, the freezing point of water on the Newton scale actually is 0.0, but that’s not important.
6 Clearly this code needs to be recalibrated.
7 Rob Pike, “Go Proverbs”, YouTube, December 1, 2015.
8 This example was borrowed from the article “An Introduction to Generics”, The Go Blog, March 22, 2022. It was just too perfect not to use.
9 This isn’t Java, after all.
10 One last time, if you haven’t read it yet, go read Learning Go by Jon Bodner (O’Reilly).
Progress is possible only if we train ourselves to think about programs without thinking of them as pieces of executable code.1
Edsger W. Dijkstra, August 1979
type Context interface {
    // Deadline returns the time when this Context should be canceled; it
    // returns ok==false if no deadline is set.
    Deadline() (deadline time.Time, ok bool)

    // Done returns a channel that's closed when this Context is canceled.
    Done() <-chan struct{}

    // Err indicates why this context was canceled after the Done channel is
    // closed. If Done is not yet closed, Err returns nil.
    Err() error

    // Value returns the value associated with this context for key, or nil
    // if no value is associated with key. Use with care.
    Value(key any) any
}
Background() Context
Returns an empty Context that's never canceled, has no values, and has no deadline. It's typically used by the main function, initialization, and tests and as the top-level Context for incoming requests.

TODO() Context
Also provides an empty Context, but it's intended to be used as a placeholder when it's unclear which Context to use or when a parent Context is not yet available.
WithDeadline(Context, time.Time) (Context, CancelFunc)
Accepts a specific time at which the Context will be canceled and the Done channel will be closed.

WithTimeout(Context, time.Duration) (Context, CancelFunc)
Accepts a duration after which the Context will be canceled and the Done channel will be closed.

WithCancel(Context) (Context, CancelFunc)
Unlike the previous functions, WithCancel accepts nothing additional and only returns a function that can be called to explicitly cancel the Context.

WithDeadlineCause(Context, time.Time, error) (Context, CancelFunc)
Introduced in Go 1.21. Behaves like WithDeadline but also sets the cause of the returned Context when the deadline is exceeded. The returned CancelFunc does not set the cause.

WithTimeoutCause(Context, time.Duration, error) (Context, CancelFunc)
Introduced in Go 1.21. Behaves like WithTimeout but also sets the cause of the returned Context when the timeout expires. The returned CancelFunc does not set the cause.

WithCancelCause(Context) (Context, CancelCauseFunc)
Introduced in Go 1.20. Behaves like WithCancel but returns a CancelCauseFunc instead of a CancelFunc. Calling cancel with a non-nil error (the "cause") records that error in ctx; it can then be retrieved using Cause(ctx).

WithValue(parent Context, key, val any) Context
Returns a derivation of parent in which key is associated with the value val.
func Stream(ctx context.Context, out chan<- Value) error {
    // Create a derived Context with a 10s timeout; dctx
    // will be canceled upon timeout, but ctx will not.
    // cancel is a function that will explicitly cancel dctx.
    dctx, cancel := context.WithTimeout(ctx, 10*time.Second)

    // Release resources if SlowOperation completes before timeout
    defer cancel()

    res, err := SlowOperation(dctx)   // res is a Value channel
    if err != nil {                   // True if dctx times out
        return err
    }

    for {
        select {
        case out <- <-res:       // Read from res; send to out
        case <-ctx.Done():       // Triggered if ctx is canceled
            return ctx.Err()     // but not if dctx is canceled
        }
    }
}
Context and descriptions of where this pattern may be applied.
A listing of the components of the pattern and their roles.
A discussion of the solution and its implementation.
A demonstration of how the code may be implemented in Go.
The function that interacts with the service.
A closure with the same function signature as Circuit.
type Circuit func(context.Context) (string, error)
func Breaker(circuit Circuit, threshold int) Circuit {
    var failures int
    var last = time.Now()
    var m sync.RWMutex

    return func(ctx context.Context) (string, error) {
        m.RLock()                       // Establish a "read lock"

        d := failures - threshold

        if d >= 0 {
            // Exponential backoff: wait 2<<d seconds before retrying
            shouldRetryAt := last.Add(time.Second * time.Duration(2<<d))
            if !time.Now().After(shouldRetryAt) {
                m.RUnlock()
                return "", errors.New("service unavailable")
            }
        }

        m.RUnlock()                     // Release read lock

        response, err := circuit(ctx)   // Issue the request proper

        m.Lock()                        // Lock around shared resources
        defer m.Unlock()

        last = time.Now()               // Record time of attempt

        if err != nil {                 // Circuit returned an error,
            failures++                  // so we count the failure
            return response, err        // and return
        }

        failures = 0                    // Reset failures counter

        return response, nil
    }
}
The function to regulate
A closure with the same function signature as Circuit
type Circuit func(context.Context) (string, error)
func myFunction(ctx context.Context) (string, error) { /* ... */ }

wrapped := Breaker(Debounce(myFunction))
response, err := wrapped(ctx)
func DebounceFirst(circuit Circuit, d time.Duration) Circuit {
    var threshold time.Time
    var result string
    var err error
    var m sync.Mutex

    return func(ctx context.Context) (string, error) {
        m.Lock()
        defer m.Unlock()

        if time.Now().Before(threshold) {
            return result, err
        }

        result, err = circuit(ctx)
        threshold = time.Now().Add(d)

        return result, err
    }
}
func DebounceFirstContext(circuit Circuit, d time.Duration) Circuit {
    var threshold time.Time
    var m sync.Mutex
    var lastCtx context.Context
    var lastCancel context.CancelFunc

    return func(ctx context.Context) (string, error) {
        m.Lock()

        if time.Now().Before(threshold) {
            lastCancel()
        }

        lastCtx, lastCancel = context.WithCancel(ctx)
        threshold = time.Now().Add(d)

        m.Unlock()

        result, err := circuit(lastCtx)

        return result, err
    }
}
type Circuit func(context.Context) (string, error)

func DebounceLast(circuit Circuit, d time.Duration) Circuit {
    var m sync.Mutex
    var timer *time.Timer
    var cctx context.Context
    var cancel context.CancelFunc

    return func(ctx context.Context) (string, error) {
        m.Lock()

        if timer != nil {
            timer.Stop()
            cancel()
        }

        cctx, cancel = context.WithCancel(ctx)

        ch := make(chan struct {
            result string
            err    error
        }, 1)

        timer = time.AfterFunc(d, func() {
            r, e := circuit(cctx)
            ch <- struct {
                result string
                err    error
            }{r, e}
        })

        m.Unlock()

        select {
        case res := <-ch:
            return res.result, res.err
        case <-cctx.Done():
            return "", cctx.Err()
        }
    }
}
The function that interacts with the service.
A function that accepts Effector and returns a closure with the same function signature as Effector.
type Effector func(context.Context) (string, error)
func Retry(effector Effector, maxRetries int, delay time.Duration) Effector {
    return func(ctx context.Context) (string, error) {
        for r := 0; ; r++ {
            response, err := effector(ctx)
            if err == nil || r >= maxRetries {
                return response, err
            }

            log.Printf("Attempt %d failed; retrying in %v", r+1, delay)

            select {
            case <-time.After(delay):
            case <-ctx.Done():
                return "", ctx.Err()
            }
        }
    }
}
var count int

func EmulateTransientError(ctx context.Context) (string, error) {
    count++

    if count <= 3 {
        return "intentional fail", errors.New("error")
    } else {
        return "success", nil
    }
}

func main() {
    r := Retry(EmulateTransientError, 5, 2*time.Second)

    res, err := r(context.Background())

    fmt.Println(res, err)
}
The function to regulate
A function that accepts Effector and returns a closure with the same function signature as Effector
This is the most basic strategy and is common when you’re only trying to restrict unreasonable or potentially abusive numbers of client requests. A RESTful service adopting this strategy might respond with a status 429 (Too Many Requests).
This strategy can be useful when a service or expensive function call is likely to provide an identical result if called too soon. It’s commonly used in the JavaScript world.
This approach can be useful when you want to eventually handle all requests, but it’s also more complex and may require taking care to ensure that memory isn’t exhausted.
type Effector func(context.Context) (string, error)

func Throttle(e Effector, max uint, refill uint, d time.Duration) Effector {
    var tokens = max
    var once sync.Once
    var m sync.Mutex

    return func(ctx context.Context) (string, error) {
        if ctx.Err() != nil {
            return "", ctx.Err()
        }

        once.Do(func() {
            ticker := time.NewTicker(d)

            go func() {
                defer ticker.Stop()

                for {
                    select {
                    case <-ctx.Done():
                        return
                    case <-ticker.C:
                        m.Lock()
                        t := tokens + refill
                        if t > max {
                            t = max
                        }
                        tokens = t
                        m.Unlock()
                    }
                }
            }()
        })

        m.Lock()
        defer m.Unlock()

        if tokens <= 0 {
            return "", fmt.Errorf("too many calls")
        }

        tokens--

        return e(ctx)
    }
}
The client who wants to execute SlowFunction
The long-running function that implements the functionality desired by Client
A wrapper function around SlowFunction that implements the timeout logic
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel()

result, err := SomeFunction(ctx)
type SlowFunction func(string) (string, error)
type WithContext func(context.Context, string) (string, error)

func Timeout(f SlowFunction) WithContext {
    return func(ctx context.Context, arg string) (string, error) {
        ch := make(chan struct {
            result string
            err    error
        }, 1)

        go func() {
            res, err := f(arg)
            ch <- struct {
                result string
                err    error
            }{res, err}
        }()

        select {
        case res := <-ch:
            return res.result, res.err
        case <-ctx.Done():
            return "", ctx.Err()
        }
    }
}
func main() {
    ctx, cancel := context.WithTimeout(context.Background(), time.Second)
    defer cancel()

    timeout := Timeout(Slow)
    res, err := timeout(ctx, "some input")

    fmt.Println(res, err)
}
A set of one or more input channels with the same type. Accepted by Funnel.
An output channel of the same type as Sources. Created and provided by Funnel.
Accepts Sources and immediately returns Destination. Any input from any Sources will be output by Destination.
func Funnel(sources ...<-chan int) <-chan int {
    dest := make(chan int)         // The shared output channel

    wg := sync.WaitGroup{}         // Used to automatically close dest
                                   // when all sources are closed

    wg.Add(len(sources))           // Set size of the WaitGroup

    for _, ch := range sources {   // Start a goroutine for each source
        go func(ch <-chan int) {
            defer wg.Done()        // Notify WaitGroup when ch closes

            for n := range ch {
                dest <- n
            }
        }(ch)
    }

    go func() {                    // Start a goroutine to close dest
        wg.Wait()                  // after all sources close
        close(dest)
    }()

    return dest
}
func main() {
    var sources []<-chan int            // Declare an empty channel slice

    for i := 0; i < 3; i++ {
        ch := make(chan int)
        sources = append(sources, ch)   // Create a channel; add to sources

        go func() {                     // Run a toy goroutine for each
            defer close(ch)             // Close ch when the routine ends

            for i := 1; i <= 5; i++ {
                ch <- i
                time.Sleep(time.Second)
            }
        }()
    }

    dest := Funnel(sources...)
    for d := range dest {
        fmt.Println(d)
    }
}
An input channel. Accepted by Split.
An output channel of the same type as Source. Created and provided by Split.
A function that accepts Source and immediately returns Destinations. Any input from Source will be output to a Destination.
func Split(source <-chan int, n int) []<-chan int {
    var dests []<-chan int       // Declare the dests slice

    for i := 0; i < n; i++ {     // Create n destination channels
        ch := make(chan int)
        dests = append(dests, ch)

        go func() {              // Each channel gets a dedicated
            defer close(ch)      // goroutine that competes for reads

            for val := range source {
                ch <- val
            }
        }()
    }

    return dests
}
func main() {
    source := make(chan int)          // The input channel
    dests := Split(source, 5)         // Retrieve 5 output channels

    go func() {                       // Send the number 1..10 to source
        for i := 1; i <= 10; i++ {    // and close it when we're done
            source <- i
        }
        close(source)
    }()

    var wg sync.WaitGroup             // Use WaitGroup to wait until
    wg.Add(len(dests))                // the output channels all close

    for i, d := range dests {
        go func(i int, d <-chan int) {
            defer wg.Done()

            for val := range d {
                fmt.Printf("#%d got %d\n", i, val)
            }
        }(i, d)
    }

    wg.Wait()
}
func ConcurrentInverse(m Matrix) <-chan Matrix {
    out := make(chan Matrix)

    go func() {
        out <- BlockingInverse(m)
        close(out)
    }()

    return out
}
func InverseProduct(a, b Matrix) Matrix {
    inva := ConcurrentInverse(a)
    invb := ConcurrentInverse(b)

    return Product(<-inva, <-invb)
}
return Product(<-ConcurrentInverse(a), <-ConcurrentInverse(b))
The interface that is received by the consumer to retrieve the eventual result
A wrapper function around some function to be asynchronously executed; provides Future
Satisfies the Future interface; includes an attached method that contains the result access logic
type Future interface {
    Result() (string, error)
}
type InnerFuture struct {
    once sync.Once
    wg   sync.WaitGroup

    res   string
    err   error
    resCh <-chan string
    errCh <-chan error
}

func (f *InnerFuture) Result() (string, error) {
    f.once.Do(func() {
        f.wg.Add(1)
        defer f.wg.Done()
        f.res = <-f.resCh
        f.err = <-f.errCh
    })

    f.wg.Wait()

    return f.res, f.err
}
func SlowFunction(ctx context.Context) Future {
    resCh := make(chan string)
    errCh := make(chan error)

    go func() {
        select {
        case <-time.After(2 * time.Second):
            resCh <- "I slept for 2 seconds"
            errCh <- nil
        case <-ctx.Done():
            resCh <- ""
            errCh <- ctx.Err()
        }
    }()

    return &InnerFuture{resCh: resCh, errCh: errCh}
}
func main() {
    ctx := context.Background()
    future := SlowFunction(ctx)

    // Do stuff while SlowFunction chugs along in the background.

    res, err := future.Result()
    if err != nil {
        fmt.Println("error:", err)
        return
    }

    fmt.Println(res)
}
An abstraction around one or more Shards providing read and write access as if the Shards were a single map
An individually lockable collection representing a single data partition
type Shard[K comparable, V any] struct {
    sync.RWMutex               // Compose from sync.RWMutex
    items map[K]V              // items contains the shard's data
}

type ShardedMap[K comparable, V any] []*Shard[K, V]
func NewShardedMap[K comparable, V any](nshards int) ShardedMap[K, V] {
    shards := make([]*Shard[K, V], nshards) // Initialize a *Shard slice

    for i := 0; i < nshards; i++ {
        shard := make(map[K]V)              // Create a map for each shard
        shards[i] = &Shard[K, V]{items: shard}
    }

    return shards                           // A ShardedMap IS a slice!
}
func (m ShardedMap[K, V]) getShardIndex(key K) int {
    str := reflect.ValueOf(key).String() // Get string representation of key
    hash := fnv.New32a()                 // Get a hash implementation
    hash.Write([]byte(str))              // Write bytes to the hash
    sum := int(hash.Sum32())             // Get the resulting checksum
    return sum % len(m)                  // Mod by len(m) to get index
}

func (m ShardedMap[K, V]) getShard(key K) *Shard[K, V] {
    index := m.getShardIndex(key)
    return m[index]
}
func (m ShardedMap[K, V]) Get(key K) V {
    shard := m.getShard(key)
    shard.RLock()
    defer shard.RUnlock()

    return shard.items[key]
}

func (m ShardedMap[K, V]) Set(key K, value V) {
    shard := m.getShard(key)
    shard.Lock()
    defer shard.Unlock()

    shard.items[key] = value
}
func (m ShardedMap[K, V]) Keys() []K {
    var keys []K                       // Declare an empty keys slice
    var mutex sync.Mutex               // Mutex for write safety to keys

    var wg sync.WaitGroup              // Create a wait group and add a
    wg.Add(len(m))                     // wait value for each slice

    for _, shard := range m {          // Run a goroutine for each slice in m
        go func(s *Shard[K, V]) {
            s.RLock()                  // Establish a read lock on s

            defer wg.Done()            // Tell the WaitGroup it's done
            defer s.RUnlock()          // Release the read lock

            for key := range s.items { // Get the slice's keys
                mutex.Lock()
                keys = append(keys, key)
                mutex.Unlock()
            }
        }(shard)
    }

    wg.Wait()                          // Block until all goroutines are done

    return keys                        // Return combined keys slice
}
func main() {
    m := NewShardedMap[string, int](5)

    keys := []string{"alpha", "beta", "gamma"}
    for i, k := range keys {
        m.Set(k, i+1)
        fmt.Printf("%5s: shard=%d value=%d\n", k, m.getShardIndex(k), m.Get(k))
    }

    fmt.Println(m.Keys())
}
alpha: shard=3 value=1
 beta: shard=2 value=2
gamma: shard=0 value=3
[gamma beta alpha]
A function that does some work on items from Jobs and sends the results to Results
A channel from which Worker receives the raw data to be worked on
A channel into which Worker sends the results of its work
func worker(id int, jobs <-chan int, results chan<- int) {
    for j := range jobs {
        fmt.Println("Worker", id, "started job", j)
        time.Sleep(time.Second)
        results <- j * 2
    }
}
func main() {
    jobs := make(chan int, 10)
    results := make(chan int)

    wg := sync.WaitGroup{}

    for w := 1; w <= 3; w++ {           // Spawn 3 worker processes
        go worker(w, jobs, results)
    }

    for j := 1; j <= 10; j++ {          // Send jobs to workers
        wg.Add(1)
        jobs <- j
    }

    go func() {
        wg.Wait()
        close(jobs)
        close(results)
    }()

    for r := range results {
        fmt.Println("Got result:", r)
        wg.Done()
    }
}
A set of one or more input channels with the same type. Accepted by Chord.
An output channel of the same type as Sources. Created and provided by Chord.
Accepts Sources and immediately returns Destination. Any input from any Sources will be output by Destination.
func Chord(sources ...<-chan int) <-chan []int {
    type input struct {                 // Used to send inputs
        idx, input int                  // between goroutines
    }

    dest := make(chan []int)            // The output channel
    inputs := make(chan input)          // An intermediate channel

    wg := sync.WaitGroup{}              // Used to close channels when
    wg.Add(len(sources))                // all sources are closed

    for i, ch := range sources {        // Start goroutine for each source
        go func(i int, ch <-chan int) {
            defer wg.Done()             // Notify WaitGroup when ch closes

            for n := range ch {
                inputs <- input{i, n}   // Transfer input to next goroutine
            }
        }(i, ch)
    }

    go func() {
        wg.Wait()                       // Wait for all sources to close,
        close(inputs)                   // then close inputs channel
    }()

    go func() {
        res := make([]int, len(sources))   // Slice for incoming inputs
        sent := make([]bool, len(sources)) // Slice to track sent status
        count := len(sources)              // Counter for channels

        for r := range inputs {
            res[r.idx] = r.input        // Update incoming input

            if !sent[r.idx] {           // First input from channel?
                sent[r.idx] = true
                count--
            }

            if count == 0 {
                c := make([]int, len(res)) // Copy and send inputs slice
                copy(c, res)
                dest <- c

                count = len(sources)    // Reset counter
                clear(sent)             // Clear status tracker
            }
        }

        close(dest)
    }()

    return dest
}
func main() {
    ch1 := make(chan int)
    ch2 := make(chan int)
    ch3 := make(chan int)

    go func() {
        for n := 1; n <= 4; n++ {
            ch1 <- n
            ch1 <- n * 2                // Writing twice to ch1!
            ch2 <- n
            ch3 <- n
            time.Sleep(time.Second)
        }

        close(ch1)                      // Closing all input channels
        close(ch2)                      // causes res to be closed
        close(ch3)                      // as well
    }()

    res := Chord(ch1, ch2, ch3)
    for s := range res {                // Read results
        fmt.Println(s)
    }
}
[2 1 1]
[4 2 2]
[6 3 3]
[8 4 4]
1 Spoken August 1979. Attested to by Vicki Almstrum, Tony Hoare, Niklaus Wirth, Wim Feijen, and Rajeev Joshi. In Pursuit of Simplicity: A Symposium Honoring Professor Edsger Wybe Dijkstra, May 12–13, 2000.
2 L (yes, his legal name is L) is a brilliant and fascinating human being. Look him up some time.
3 Pun unavoidable.
4 Erich Gamma et al., Design Patterns: Elements of Reusable Object-Oriented Software, 1st ed. (Addison-Wesley Professional, 1994).
5 If you prefer boring names.
6 While these terms are often used interchangeably, they can also have shades of meaning depending on their context. I know. Please don’t write me any angry letters about this.
7 See the article, “Share Memory by Communicating”, The Go Blog.
8 If you’re into that kind of thing.
9 Java developers, you know what I’m talking about.
10 Which is why we call it a “worker pool” instead of a “thread pool.”
11 Did I leave out your favorite? Let me know, and I’ll try to include it in the next edition!
Life was simple before World War II. After that, we had systems.1
Grace Hopper, OCLC Newsletter (1987)
What if you make a request to a service but get no response? You’ll probably try again. But what if it heard you the first time?4 If the service method is idempotent, then no harm done. But if it’s not, you could have a problem. This scenario is more common than you might think. Networks are unreliable. Responses can be delayed; packets can get dropped.
Idempotent operations are more self-contained and easier to implement. Compare, for example, an idempotent PUT method that just adds a key-value pair to a backing data store with a similar but nonidempotent CREATE method that returns an error if the data store already contains the key. The PUT logic is simple: receive request, set value. The CREATE, on the other hand, requires additional layers of error checking and handling, and possibly even distributed locking and coordination among any service replicas, making the service harder to scale.
Building an idempotent API encourages the designer to focus on end states, producing methods that are more declarative: they allow users to tell a service what needs to be done, instead of telling it how to do it. This may seem like a fine distinction, but declarative methods, as opposed to imperative ones, free users from having to deal with low-level constructs, allowing them to focus on their goals and minimizing potential side effects.
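To make the contrast concrete, here is a minimal sketch. The store, Put, and Create below are toy stand-ins for illustration, not the key-value service built later in this chapter:

```go
package main

import (
	"errors"
	"fmt"
)

// A hypothetical in-memory store, used only for illustration.
var store = map[string]string{}

var ErrKeyExists = errors.New("key already exists")

// Put is idempotent: applying the same request twice leaves
// the store in exactly the same state, with the same result.
func Put(key, value string) error {
	store[key] = value
	return nil
}

// Create is NOT idempotent: repeating the same request fails, so it
// needs extra error handling (and, with replicas, coordination).
func Create(key, value string) error {
	if _, exists := store[key]; exists {
		return ErrKeyExists
	}
	store[key] = value
	return nil
}

func main() {
	fmt.Println(Put("a", "1"))    // <nil>
	fmt.Println(Put("a", "1"))    // <nil> -- same request, same result
	fmt.Println(Create("b", "2")) // <nil>
	fmt.Println(Create("b", "2")) // key already exists
}
```

A client retrying a failed Put can do so blindly; a client retrying Create has to first ask whether the original request actually succeeded.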
For now we can implement this with a plain map, but what kind? For the sake of simplicity, we’ll limit ourselves to keys and values that are strings, though we can choose to allow arbitrary types later. We’ll just use a map[string]string as our core data structure.
In this initial iteration, we’ll create a bare-bones Go API that we can call to perform the basic modification operations. Partitioning the functionality from the code that uses it will make it easier to test and easier to update in future iterations.
var store = make(map[string]string)
func Put(key, value string) error {
    store[key] = value
    return nil
}
var ErrorNoSuchKey = errors.New("no such key")

func Get(key string) (string, error) {
    value, ok := store[key]
    if !ok {
        return "", ErrorNoSuchKey
    }

    return value, nil
}

func Delete(key string) error {
    delete(store, key)
    return nil
}
if errors.Is(err, ErrorNoSuchKey) {
    http.Error(w, err.Error(), http.StatusNotFound)
    return
}
package main

import (
    "fmt"
    "log"
    "net/http"
)

func helloGoHandler(w http.ResponseWriter, r *http.Request) {
    fmt.Fprintln(w, "Hello net/http!")
}

func main() {
    http.HandleFunc("/", helloGoHandler)
    log.Fatal(http.ListenAndServe(":8080", nil))
}
type HandlerFunc func(http.ResponseWriter, *http.Request)
$ go run .
$ curl http://localhost:8080
Hello net/http!
package main

import (
    "fmt"
    "log"
    "net/http"

    "github.com/gorilla/mux"
)

func helloMuxHandler(w http.ResponseWriter, r *http.Request) {
    fmt.Fprintln(w, "Hello gorilla/mux!")
}

func main() {
    r := mux.NewRouter()
    r.HandleFunc("/", helloMuxHandler)
    log.Fatal(http.ListenAndServe(":8080", r))
}
$ go run .
main.go:8:2: no required module provides package github.com/gorilla/mux; to add:
go get github.com/gorilla/mux
$ go mod init example.com/gorilla
go: creating new go.mod: module example.com/gorilla
$ cat go.mod
module example.com/gorilla

go 1.20
$ go mod tidy
go: finding module for package github.com/gorilla/mux
go: found github.com/gorilla/mux in github.com/gorilla/mux v1.8.0
$ cat go.mod
module example.com/gorilla

go 1.20

require github.com/gorilla/mux v1.8.0
$ go run .
$ curl http://localhost:8080
Hello gorilla/mux!
r := mux.NewRouter()
r.HandleFunc("/products/{key}", ProductHandler)
r.HandleFunc("/articles/{category}/", ArticlesCategoryHandler)
r.HandleFunc("/articles/{category}/{id:[0-9]+}", ArticleHandler)
vars := mux.Vars(request)
category := vars["category"]
r := mux.NewRouter()
r.HandleFunc("/products", ProductsHandler).
    Host("www.example.com").        // Only match a specific domain
    Methods("GET", "PUT").          // Only match GET+PUT methods
    Schemes("http")                 // Only match the http scheme
| Functionality | Method | Possible statuses |
|---|---|---|
| Put a key-value pair into the store | PUT | 201 (Created), 500 (Internal Server Error) |
| Read a key-value pair from the store | GET | 200 (OK), 404 (Not Found), 500 (Internal Server Error) |
| Delete a key-value pair | DELETE | 200 (OK), 500 (Internal Server Error) |
// putHandler expects to be called with a PUT request for
// the "/v1/key/{key}" resource.
func putHandler(w http.ResponseWriter, r *http.Request) {
    vars := mux.Vars(r)                 // Retrieve "key" from the request
    key := vars["key"]

    value, err := io.ReadAll(r.Body)    // The request body has our value
    if err != nil {                     // If we have an error, report it
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }
    defer r.Body.Close()

    err = Put(key, string(value))       // Store the value as a string
    if err != nil {                     // If we have an error, report it
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }

    w.WriteHeader(http.StatusCreated)   // Success! Return StatusCreated
}
func main() {
    r := mux.NewRouter()

    // Register putHandler as the handler function for PUT
    // requests matching "/v1/key/{key}"
    r.HandleFunc("/v1/key/{key}", putHandler).Methods("PUT")

    log.Fatal(http.ListenAndServe(":8080", r))
}
$ curl -X PUT -d 'Hello, key-value store!' -v http://localhost:8080/v1/key/key-a
> PUT /v1/key/key-a HTTP/1.1
< HTTP/1.1 201 Created
$ curl -X GET -v http://localhost:8080/v1/key/key-a
> GET /v1/key/key-a HTTP/1.1
< HTTP/1.1 405 Method Not Allowed
func getHandler(w http.ResponseWriter, r *http.Request) {
    vars := mux.Vars(r)                 // Retrieve "key" from the request
    key := vars["key"]

    value, err := Get(key)              // Get value for key
    if errors.Is(err, ErrorNoSuchKey) {
        http.Error(w, err.Error(), http.StatusNotFound)
        return
    }
    if err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }

    fmt.Fprint(w, value)                // Write the value to the response
}
func main() {
    r := mux.NewRouter()

    r.HandleFunc("/v1/key/{key}", putHandler).Methods("PUT")
    r.HandleFunc("/v1/key/{key}", getHandler).Methods("GET")

    log.Fatal(http.ListenAndServe(":8080", r))
}
$ curl -X PUT -d 'Hello, key-value store!' -v http://localhost:8080/v1/key/key-a
> PUT /v1/key/key-a HTTP/1.1
< HTTP/1.1 201 Created

$ curl -v http://localhost:8080/v1/key/key-a
> GET /v1/key/key-a HTTP/1.1
< HTTP/1.1 200 OK
Hello, key-value store!
$ curl -X PUT -d 'Hello, again, key-value store!' \
-v http://localhost:8080/v1/key/key-a
> PUT /v1/key/key-a HTTP/1.1
< HTTP/1.1 201 Created
$ curl -v http://localhost:8080/v1/key/key-a
> GET /v1/key/key-a HTTP/1.1
< HTTP/1.1 200 OK
Hello, again, key-value store!
type LockableMap struct {
    sync.RWMutex
    m map[string]string
}

var myMap = LockableMap{
    m: make(map[string]string),
}
myMap.Lock()                        // Take a write lock
defer myMap.Unlock()                // Release the write lock

myMap.m["some_key"] = "some_value"
myMap.RLock()                       // Take a read lock
defer myMap.RUnlock()               // Release the read lock

value := myMap.m["some_key"]
fmt.Println("some_key:", value)
var store = LockableMap{
    m: make(map[string]string),
}
func Get(key string) (string, error) {
    store.RLock()
    defer store.RUnlock()

    value, ok := store.m[key]
    if !ok {
        return "", ErrorNoSuchKey
    }

    return value, nil
}

func Put(key string, value string) error {
    store.Lock()
    defer store.Unlock()

    store.m[key] = value
    return nil
}
A unique record ID, in monotonically increasing order.
A descriptor of the type of action taken; this can be PUT or DELETE.
A string containing the key affected by this transaction.
If the event is a PUT, the value of the transaction.
type TransactionLogger interface {
    WriteDelete(key string)
    WritePut(key, value string)
}
There’s no dependency on an external service that could fail or that we can lose access to.
The logic isn’t especially sophisticated. We can be up and running quickly.
You’ll need some additional way to distribute your state between nodes when you want to scale out.
These logs have to be stored on disk, so you can’t let them grow forever. You’ll need some way of periodically compacting them.
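Compaction boils down to replaying the log into its end state and rewriting only what survives. A minimal in-memory sketch, reusing the Event and EventType definitions from later in this chapter (the compact function itself is illustrative, not part of the service):

```go
package main

import "fmt"

type EventType byte

const (
	_                     = iota // ignore the zero value
	EventDelete EventType = iota
	EventPut
)

type Event struct {
	Sequence  uint64
	EventType EventType
	Key       string
	Value     string
}

// compact replays events in order and emits one PUT per surviving key,
// discarding overwritten values and deleted keys entirely.
func compact(events []Event) []Event {
	state := map[string]string{}
	for _, e := range events {
		switch e.EventType {
		case EventPut:
			state[e.Key] = e.Value
		case EventDelete:
			delete(state, e.Key)
		}
	}

	out := make([]Event, 0, len(state))
	var seq uint64
	for k, v := range state {
		seq++
		out = append(out, Event{Sequence: seq, EventType: EventPut, Key: k, Value: v})
	}
	return out
}

func main() {
	log := []Event{
		{1, EventPut, "a", "1"},
		{2, EventPut, "a", "2"},   // overwrites the first PUT
		{3, EventPut, "b", "9"},
		{4, EventDelete, "b", ""}, // removes "b" entirely
	}
	fmt.Println(len(compact(log))) // 1
}
```

A real implementation would write the compacted events to a new file and atomically swap it for the old one, but the state-replay core is the same.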
type FileTransactionLogger struct {
    // Something, something, fields
}

func (l *FileTransactionLogger) WritePut(key, value string) {
    // Something, something, logic
}

func (l *FileTransactionLogger) WriteDelete(key string) {
    // Something, something, logic
}
type Event struct {
    Sequence  uint64    // A unique record ID
    EventType EventType // The action taken
    Key       string    // The key affected by this transaction
    Value     string    // The value of the transaction
}
const (
    EventDelete byte = 1
    EventPut    byte = 2
)
type EventType byte

const (
    _                     = iota // iota == 0; ignore the zero value
    EventDelete EventType = iota // iota == 1
    EventPut                     // iota == 2; implicitly repeat
)
type FileTransactionLogger struct {
    events       chan<- Event // Write-only channel for sending events
    errors       <-chan error // Read-only channel for receiving errors
    lastSequence uint64       // The last used event sequence number
    file         *os.File     // The location of the transaction log
}

func (l *FileTransactionLogger) WritePut(key, value string) {
    l.events <- Event{EventType: EventPut, Key: key, Value: value}
}

func (l *FileTransactionLogger) WriteDelete(key string) {
    l.events <- Event{EventType: EventDelete, Key: key}
}

func (l *FileTransactionLogger) Err() <-chan error {
    return l.errors
}
func NewFileTransactionLogger(filename string) (TransactionLogger, error) {
    file, err := os.OpenFile(filename, os.O_RDWR|os.O_APPEND|os.O_CREATE, 0755)
    if err != nil {
        return nil, fmt.Errorf("cannot open transaction log file: %w", err)
    }

    return &FileTransactionLogger{file: file}, nil
}
os.O_RDWR: Opens the file in read/write mode.
os.O_APPEND: Any writes to this file will append, not overwrite.
os.O_CREATE: If the file doesn't exist, creates it.
func (l *FileTransactionLogger) Run() {
    events := make(chan Event, 16)      // Make an events channel
    l.events = events

    errors := make(chan error, 1)       // Make an errors channel
    l.errors = errors

    go func() {
        for e := range events {         // Retrieve the next Event
            l.lastSequence++            // Increment sequence number

            _, err := fmt.Fprintf(      // Write the event to the log
                l.file,
                "%d\t%d\t%s\t%s\n",
                l.lastSequence, e.EventType, e.Key, e.Value)
            if err != nil {
                errors <- err
                return
            }
        }
    }()
}
func (l *FileTransactionLogger) ReadEvents() (<-chan Event, <-chan error) {
    scanner := bufio.NewScanner(l.file) // Create a Scanner for l.file
    outEvent := make(chan Event)        // An unbuffered Event channel
    outError := make(chan error, 1)     // A buffered error channel

    go func() {
        var e Event

        defer close(outEvent)           // Close the channels when the
        defer close(outError)           // goroutine ends

        for scanner.Scan() {
            line := scanner.Text()

            if err := fmt.Sscanf(line, "%d\t%d\t%s\t%s",
                &e.Sequence, &e.EventType, &e.Key, &e.Value); err != nil {

                outError <- fmt.Errorf("input parse error: %w", err)
                return
            }

            // Sanity check! Are the sequence numbers in increasing order?
            if l.lastSequence >= e.Sequence {
                outError <- fmt.Errorf("transaction numbers out of sequence")
                return
            }

            l.lastSequence = e.Sequence // Update last used sequence #

            outEvent <- e               // Send the event along
        }

        if err := scanner.Err(); err != nil {
            outError <- fmt.Errorf("transaction log read failure: %w", err)
            return
        }
    }()

    return outEvent, outError
}
type TransactionLogger interface {
    WriteDelete(key string)
    WritePut(key, value string)
    Err() <-chan error

    ReadEvents() (<-chan Event, <-chan error)

    Run()
}
var logger TransactionLogger
func initializeTransactionLog() error {
    var err error

    logger, err = NewFileTransactionLogger("transaction.log")
    if err != nil {
        return fmt.Errorf("failed to create event logger: %w", err)
    }

    events, errors := logger.ReadEvents()
    e := Event{}
    ok := true

    for ok && err == nil {
        select {
        case err, ok = <-errors:        // Retrieve any errors

        case e, ok = <-events:
            switch e.EventType {
            case EventDelete:           // Got a DELETE event!
                err = Delete(e.Key)
            case EventPut:              // Got a PUT event!
                err = Put(e.Key, e.Value)
            }
        }
    }

    logger.Run()

    return err
}
Less need to worry about distributed state and closer to “cloud native.”
Not having to share data between replicas makes scaling out easier (but not easy).
What if you had to scale way up? What if all replicas had to read from the database at once?
Creates a dependency on another resource that might fail.
What if the Transactions table doesn’t exist?
Yet another thing to manage and configure.
import (
    "database/sql"

    _ "github.com/lib/pq" // Anonymously import the driver package
)
type TransactionLogger interface {
    WriteDelete(key string)
    WritePut(key, value string)
    Err() <-chan error

    ReadEvents() (<-chan Event, <-chan error)

    Run()
}
type PostgresTransactionLogger struct {
    events chan<- Event // Write-only channel for sending events
    errors <-chan error // Read-only channel for receiving errors
    db     *sql.DB      // The database access interface
}

func (l *PostgresTransactionLogger) WritePut(key, value string) {
    l.events <- Event{EventType: EventPut, Key: key, Value: value}
}

func (l *PostgresTransactionLogger) WriteDelete(key string) {
    l.events <- Event{EventType: EventDelete, Key: key}
}

func (l *PostgresTransactionLogger) Err() <-chan error {
    return l.errors
}
func NewPostgresTransactionLogger(host, dbName, user, password string) (TransactionLogger, error) {
    ...
}
type PostgresDBParams struct {
    dbName   string
    host     string
    user     string
    password string
}
logger, err = NewPostgresTransactionLogger(PostgresDBParams{
    host:     "localhost",
    dbName:   "kvs",
    user:     "test",
    password: "hunter2",
})
func NewPostgresTransactionLogger(config PostgresDBParams) (TransactionLogger, error) {
    connStr := fmt.Sprintf("host=%s dbname=%s user=%s password=%s",
        config.host, config.dbName, config.user, config.password)

    db, err := sql.Open("postgres", connStr)
    if err != nil {
        return nil, fmt.Errorf("failed to open db: %w", err)
    }

    err = db.Ping()                     // Test the database connection
    if err != nil {
        return nil, fmt.Errorf("failed to open db connection: %w", err)
    }

    logger := &PostgresTransactionLogger{db: db}

    exists, err := logger.verifyTableExists()
    if err != nil {
        return nil, fmt.Errorf("failed to verify table exists: %w", err)
    }
    if !exists {
        if err = logger.createTable(); err != nil {
            return nil, fmt.Errorf("failed to create table: %w", err)
        }
    }

    return logger, nil
}
func (l *PostgresTransactionLogger) Run() {
    events := make(chan Event, 16)      // Make an events channel
    l.events = events

    errors := make(chan error, 1)       // Make an errors channel
    l.errors = errors

    go func() {
        // The INSERT query
        query := `INSERT INTO transactions(event_type, key, value)
                  VALUES ($1, $2, $3)`

        for e := range events {         // Retrieve the next Event
            _, err := l.db.Exec(        // Execute the INSERT query
                query,
                e.EventType, e.Key, e.Value)
            if err != nil {
                errors <- err
            }
        }
    }()
}
func (l *PostgresTransactionLogger) ReadEvents() (<-chan Event, <-chan error) {
    outEvent := make(chan Event)        // An unbuffered events channel
    outError := make(chan error, 1)     // A buffered errors channel

    go func() {
        defer close(outEvent)           // Close the channels when the
        defer close(outError)           // goroutine ends

        query := `SELECT sequence, event_type, key, value
                  FROM transactions
                  ORDER BY sequence`

        rows, err := l.db.Query(query)  // Run query; get result set
        if err != nil {
            outError <- fmt.Errorf("sql query error: %w", err)
            return
        }

        defer rows.Close()              // This is important!

        e := Event{}                    // Create an empty Event

        for rows.Next() {               // Iterate over the rows
            err = rows.Scan(            // Read the values from the
                &e.Sequence, &e.EventType, // row into the Event
                &e.Key, &e.Value)
            if err != nil {
                outError <- fmt.Errorf("error reading row: %w", err)
                return
            }

            outEvent <- e               // Send e to the channel
        }

        err = rows.Err()
        if err != nil {
            outError <- fmt.Errorf("transaction log read failure: %w", err)
        }
    }()

    return outEvent, outError
}
logger, err = NewFileTransactionLogger("transaction.log")
logger, err = NewPostgresTransactionLogger(PostgresDBParams{
    host:     "localhost",
    dbName:   "db-name",
    user:     "db-user",
    password: "db-password",
})
The server certificate (including the CA-signed public key)
A private key, not to be shared
func main() {
    http.HandleFunc("/", helloGoHandler) // Add a root path handler
    http.ListenAndServe(":8080", nil)    // Start the HTTP server
}
func ListenAndServeTLS(addr, certFile, keyFile string, handler Handler) error
http.ListenAndServeTLS(":8080", "cert.pem", "key.pem", nil)
$ curl -X PUT -d 'Hello, key-value store!' -v https://localhost:8080/v1/key/key-a
* SSL certificate problem: self signed certificate
curl: (60) SSL certificate problem: self signed certificate
$ curl -X PUT -d 'Hello, key-value store!' --insecure -v \
https://localhost:8080/v1/key/key-a
* SSL certificate verify result: self signed certificate (18), continuing anyway.
> PUT /v1/key/key-a HTTP/2
< HTTP/2 201
Unlike virtual machines, which are saddled with an entire operating system and a colossal memory footprint, containers boast image sizes in the megabyte range and startup times measured in milliseconds. This is particularly true of Go applications, whose binaries have few, if any, dependencies.
This was hinted at previously but bears repeating. Containers virtualize CPU, memory, storage, and network resources at the operating system level, providing developers with a sandboxed view of the OS that is logically isolated from other applications.
Containers let you package an application alongside its dependencies, such as specific versions of language runtimes and libraries, as a single distributable binary, making your deployments reproducible, predictable, and versionable.
Sophisticated container orchestration systems like Kubernetes provide a huge number of benefits. By containerizing your application(s), you’re taking the first step toward being able to take advantage of them.
# The parent image. At build time, this image will be pulled and
# subsequent instructions run against it.
FROM ubuntu:22.04

# Update apt cache and install nginx without an approval prompt.
RUN apt-get update && apt-get install --yes nginx

# Tell Docker this image's containers will use port 80.
EXPOSE 80

# Run Nginx in the foreground. This is important: without a
# foreground process the container will automatically stop.
CMD ["nginx", "-g", "daemon off;"]
FROM: Specifies a base image that this build will extend, and will typically be a common Linux distribution, such as ubuntu or alpine. At build time, this image is pulled and run, and the subsequent commands applied to it.
RUN: Will execute any commands on top of the current image. The result will be used for the next step in the Dockerfile.
EXPOSE: Tells Docker which port(s) the container will use. See "What's the Difference Between Exposing and Publishing Ports?" for more information on exposing ports.
CMD: The command to execute when the container is executed. There can be only one CMD in a Dockerfile.
$ docker build --tag my-nginx .
$ docker images
REPOSITORY   TAG      IMAGE ID       CREATED          SIZE
my-nginx     latest   64ea3e21a388   29 seconds ago   159MB
ubuntu       22.04    f63181f19b2f   3 weeks ago      72.9MB
$ docker run --detach --publish 8080:80 --name nginx my-nginx
61bb4d01017236f6261ede5749b421e4f65d43cb67e8e7aa8439dc0f06afe0f3
$ docker ps
CONTAINER ID   IMAGE      STATUS         PORTS                  NAMES
4cce9201f484   my-nginx   Up 4 minutes   0.0.0.0:8080->80/tcp   nginx
CONTAINER ID: The first 12 characters of the container ID. You'll notice it matches the output of your docker run.
IMAGE: The name (and tag, if specified) of this container's source image. No tag implies latest.
COMMAND (not shown): The command running inside the container. Unless overridden in docker run, this will be the same as the CMD instruction in the Dockerfile. In our case, this will be nginx -g 'daemon off;'.
CREATED (not shown): How long ago the container was created.
STATUS: The current state of the container (up, exited, restarting, etc.) and how long it's been in that state. If the state changed, then the time will differ from CREATED.
PORTS: Lists all exposed and published ports (see "What's the Difference Between Exposing and Publishing Ports?"). In our case, we've published 0.0.0.0:8080 on the host and mapped it to 80 on the container, so that all requests to host port 8080 are forwarded to container port 80.
NAMES: The name of the container. Docker will randomly set this if it's not explicitly defined. Two containers with the same name, regardless of state, cannot exist on the same host at the same time. To reuse a name, you'll first have to delete the unwanted container.
$ curl localhost:80
curl: (7) Failed to connect to localhost port 80: Connection refused
$ docker run --detach --publish 8081:80 --name httpd httpd
$ curl localhost:8081
<html><body><h1>It works!</h1></body></html>
$ docker stop 4cce      # "docker stop nginx" will work too
4cce
$ docker ps --all
CONTAINER ID   IMAGE      STATUS                     PORTS   NAMES
4cce9201f484   my-nginx   Exited (0) 3 minutes ago           nginx
$ docker rm 4cce        # "docker rm nginx" will work too
4cce
# We use a "scratch" image, which contains no distribution files. The
# resulting image and containers will have only the service binary.
FROM scratch

# Copy the existing binary from the host.
COPY kvs .

# Copy in your PEM files.
COPY *.pem .

# Tell Docker we'll be using port 8080.
EXPOSE 8080

# Tell Docker to execute this command on a `docker run`.
CMD ["/kvs"]
$ CGO_ENABLED=0 GOOS=linux go build -a -o kvs
CGO_ENABLED=0: Tells the compiler to disable cgo and statically link any C bindings. We won't go into what this is, other than that it enforces static linking, but I encourage you to look at the cgo documentation if you're curious.
GOOS=linux: Instructs the compiler to generate a Linux binary, cross-compiling if necessary.
-a: Forces the compiler to rebuild any packages that are already up-to-date.
-o kvs: Specifies that the binary will be named kvs.
$ file kvs
kvs: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, not stripped
$ docker build --tag kvs .
...output omitted.

$ docker images
REPOSITORY   TAG      IMAGE ID       CREATED              SIZE
kvs          latest   7b1fb6fa93e3   About a minute ago   6.88MB
openjdk      22       834f87a06187   42 hours ago         505MB
node         20       250e9c100ea2   47 hours ago         1.1GB
python       3.11     c0e63845ae98   5 weeks ago          1.01GB
$ docker run --detach --publish 8080:8080 kvs
4a05617539125f7f28357d3310759c2ef388f456b07ea0763350a78da661afd3

$ curl -X PUT -d 'Hello, key-value store!' -v http://localhost:8080/v1/key/key-a
> PUT /v1/key/key-a HTTP/1.1
< HTTP/1.1 201 Created

$ curl http://localhost:8080/v1/key/key-a
Hello, key-value store!
FROM golang:1.20 as build
COPY --from=build /src/kvs .
# Stage 1: Compile the binary in a containerized Golang environment
#
FROM golang:1.20 as build

# Copy the source files from the host
COPY . /src

# Set the working directory to the same place we copied the code
WORKDIR /src

# Build the binary!
RUN CGO_ENABLED=0 GOOS=linux go build -o kvs

# Stage 2: Build the Key-Value Store image proper
#
# Use a "scratch" image, which contains no distribution files
FROM scratch

# Copy the binary from the build container
COPY --from=build /src/kvs .

# If you're using TLS, copy the .pem files too
COPY --from=build /src/*.pem .

# Tell Docker we'll be using port 8080
EXPOSE 8080

# Tell Docker to execute this command on a "docker run"
CMD ["/kvs"]
$ docker build --tag kvs:multipart .
...output omitted.

$ docker images
REPOSITORY   TAG         IMAGE ID       CREATED         SIZE
kvs          latest      7b1fb6fa93e3   2 hours ago     6.88MB
kvs          multipart   b83b9e479ae7   4 minutes ago   6.56MB
1 Philip Schieber, “The Wit and Wisdom of Grace Hopper,” OCLC Newsletter, March/April 1987, no. 167.
2 For some definition of “love.”
3 If it does, something is very wrong.
4 Or, like my son, was only pretending not to hear you.
5 “Cloud native is not a synonym for microservices…. If cloud native has to be a synonym for anything, it would be idempotent, which definitely needs a synonym.” —Holly Cummins, Cloud Native London, 2018.
6 Isn’t this exciting?
7 It’s a good thing too. Mutexes can be pretty tedious to implement correctly!
8 Didn’t I tell you that we’d make it more complicated?
9 That’s you.
10 That’s a lie. There are probably lots of better names.
11 What makes a transaction log “good” anyway?
12 Naming is hard.
13 After all this time, I still think that’s pretty neat.
14 You’re welcome.
15 Ideally written by somebody who knows more than I do about security.
16 This is a gross oversimplification, but it’ll do for our purposes. I encourage you to learn more about this and correct me, though.
17 You don’t know where that key has been.
18 SSL 2.0 was released in 1995, and TLS 1.0 was released in 1999. Interestingly, SSL 1.0 had some pretty profound security flaws and was never publicly released.
19 Public keys only, please.
20 Containers are not virtual machines. They virtualize the operating system instead of hardware.
21 Repetition intended. This is an important point.
22 Yup. I said it. Again.
23 The initial draft had several more, but this chapter is already pretty lengthy.
24 To be fair, these images are “only” 234 MB and 363 MB compressed, respectively.
25 I’m intentionally ignoring solutions like Amazon’s Elastic Block Store, which can help but have issues of their own.
The most important property of a program is whether it accomplishes the intention of its user.1
C. A. R. Hoare, Communications of the ACM (October 1969)
The dependability of a computer system is its ability to avoid failures that are more frequent or more severe, and outage durations that are longer, than is acceptable to the user(s).8
Fundamental Concepts of Computer System Dependability (2001)
The ability of a system to perform its intended function at a random moment in time. This is usually expressed as the probability that a request made of the system will be successful, defined as uptime divided by total time.
The ability of a system to perform its intended function for a given time interval. This is often expressed as either the mean time between failures (MTBF: total time divided by the number of failures) or failure rate (number of failures divided by total time).
The ability of a system to undergo modifications and repairs. There are a variety of indirect measures for maintainability, ranging from calculations of cyclomatic complexity to tracking the amount of time required to change a system’s behavior to meet new requirements or to restore it to a functional state.
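The availability and reliability measures above reduce to simple arithmetic. A quick illustration with made-up numbers (a 720-hour month with one hour of cumulative downtime spread across three outages):

```go
package main

import "fmt"

func main() {
	totalHours := 720.0 // Total observation window
	downHours := 1.0    // Cumulative downtime
	failures := 3.0     // Number of distinct failures

	uptime := totalHours - downHours

	availability := uptime / totalHours  // Uptime divided by total time
	mtbf := totalHours / failures        // Total time divided by failure count
	failureRate := failures / totalHours // Failures divided by total time

	fmt.Printf("availability: %.4f\n", availability) // 0.9986
	fmt.Printf("MTBF: %.0f hours\n", mtbf)           // 240 hours
	fmt.Printf("failure rate: %.4f/hour\n", failureRate)
}
```

Note that availability and MTBF capture different things: many very short outages can leave availability high while driving MTBF (and user trust) through the floor.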
Fault prevention techniques are used during system construction to prevent the occurrence or introduction of faults.
Fault tolerance techniques are used during system design and implementation to prevent service failures in the presence of faults.
Fault removal techniques are used to reduce the number and severity of faults.
Fault forecasting techniques are used to identify the presence, creation, and consequences of faults.
Automated, rule-based code analysis performed without actually executing programs. Static analysis is useful for providing early feedback, enforcing consistent practices, and finding common errors and security holes without depending on human knowledge or effort.
Verifying the correctness of a system or subsystem by executing it under controlled conditions and evaluating its behavior. More commonly referred to simply as “testing.”
One codebase tracked in revision control, many deploys.
The Twelve-Factor App
Explicitly declare and isolate (code) dependencies.
The Twelve-Factor App
Store configuration in the environment.
The Twelve-Factor App
name := os.Getenv("NAME")
place := os.Getenv("CITY")

fmt.Printf("%s lives in %s.\n", name, place)
viper.BindEnv("id")          // Will be uppercased automatically
viper.SetDefault("id", "13") // Default value is "13"

id1 := viper.GetInt("id")
fmt.Println(id1) // 13

os.Setenv("ID", "50") // Typically done outside of the app!

id2 := viper.GetInt("id")
fmt.Println(id2) // 50
Treat backing services as attached resources.
The Twelve-Factor App
Strictly separate build and run stages.
The Twelve-Factor App
In the build stage, an automated process retrieves a specific version of the code, fetches dependencies, and compiles an executable artifact we call a build. Every build should always have a unique identifier, typically a timestamp or an incrementing build number.
In the release stage, a specific build is combined with a configuration specific to the target deployment. The resulting release is ready for immediate execution in the execution environment. Like builds, releases should also have a unique identifier. Importantly, producing releases with the same version of a build shouldn’t involve a rebuild of the code: to ensure environment parity, each environment-specific configuration should use the same build artifact.
In the run stage, the release is delivered to the deployment environment and executed by launching the service’s processes.
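One common way to give every build the unique identifier described above is to stamp a version string into the binary at build time with linker flags. The `main.version` variable and the build command shown in the comment are illustrative assumptions, not from the text:

```go
package main

import "fmt"

// version is overwritten at build time by the build process, e.g.:
//
//	go build -ldflags "-X main.version=20240101.42" .
//
// The default "dev" value applies only to local, unstamped builds.
var version = "dev"

func main() {
	fmt.Println("build:", version)
}
```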
Execute the app as one or more stateless processes.
The Twelve-Factor App
Each service manages its own data.
Cloud Native (“Data Isolation”)
Scale out via the process model.
The Twelve-Factor App
Maximize robustness with fast startup and graceful shutdown.
The Twelve-Factor App
Keep development, staging, and production as similar as possible.
The Twelve-Factor App
Development branches should be small and short-lived and should be tested and deployed into production as quickly as possible. This minimizes functional differences between environments and reduces the risk of both deploys and rollbacks.
Rather than having different components for development and production (say, SQLite on macOS versus MySQL on Linux), environments should remain as similar as possible. Lightweight containers are an excellent tool for this. This minimizes the possibility that inconvenient differences between almost-but-not-quite-the-same implementations will emerge to ruin your day.
Once it was common to have programmers who wrote code and operators who deployed code, but that arrangement created conflicting incentives and counterproductive adversarial relationships. Keeping code authors involved in deploying their work and responsible for its behavior in production helps break down development/operations silos and aligns incentives around stability and velocity.
Treat logs as event streams.
The Twelve-Factor App
Run administrative/management tasks as one-off processes.
The Twelve-Factor App
1 C. A. R. Hoare, “An Axiomatic Basis for Computer Programming”, Communications of the ACM, 12, no. 10 (October 20, 1969): 576–583.
2 When Edsger W. Dijkstra coined the expression “GOTO considered harmful,” he was referencing Hoare’s work in structured programming.
3 Tony Hoare, “Null References: The Billion Dollar Mistake”, InfoQ.com, August 25, 2009.
4 Holly Cummins, Cloud Native Is About Culture, Not Containers, Cloud Native London, 2018.
5 If you ever have a chance to see her speak, I strongly recommend you take it.
6 Remember what Walt did to Jane that time? That was so messed up.
7 Jean-Claude Laprie, “Dependable Computing and Fault Tolerance: Concepts and Terminology”, FTCS-15 The 15th Int’l Symposium on Fault-Tolerant Computing, June 1985, 2–11.
8 Algirdas Avižienis et al., “Fundamental Concepts of Computer System Dependability”, Research Report No. 1145, LAAS-CNRS, April 2001.
9 If you haven’t, start with Site Reliability Engineering: How Google Runs Production Systems by Betsy Beyer et al. (O’Reilly, 2016). It really is very good.
10 Many organizations use service-level objectives (SLOs) for precisely this purpose.
11 Application state is hard, and when done wrong, it’s poison to scalability.
12 Adam Wiggins, The Twelve-Factor App, 2017.
13 Although it was for too long!
14 The world’s worst configuration language (except for all the other ones).
15 Adam Wiggins, “Port Binding”, The Twelve-Factor App, 2017.
16 Probably at three in the morning.
17 Baking is a term sometimes used to refer to the process of creating a new container or server image.
Some of the best programming is done on paper, really. Putting it into the computer is just a minor detail.1
Max Kanat-Alexander, Code Simplicity: The Fundamentals of Software
A system can be vertically scaled (or scaled up) by increasing its resource allocations. In a public cloud, an existing server can be vertically scaled fairly easily just by changing its instance size, but only until you run out of larger instance types (or money).
A system can be horizontally scaled (or scaled out) by duplicating the system or service to limit the burden on any individual server. Systems using this strategy can typically scale to handle greater amounts of load, but as you’ll see in “State and Statelessness”, the presence of state can make this strategy difficult or impossible for some systems.
CPU
The number of operations per unit of time that can be performed by a system’s central processor and a common bottleneck for many systems. Scaling strategies for CPU include caching the results of expensive deterministic operations (at the expense of memory) or simply increasing the size or number of processors (at the expense of network I/O if scaling out).
Memory
The amount of data that can be stored in main memory. While today’s systems can store incredible amounts of data on the order of tens or hundreds of gigabytes, even this can fall short, particularly for data-intensive systems that lean on memory to circumvent disk I/O speed limits. Scaling strategies include offloading data from memory to disk (at the expense of disk I/O) or an external dedicated cache (at the expense of network I/O), or simply increasing the amount of available memory.
Disk I/O
The speed at which data can be read from and written to a hard disk or other persistent storage medium. Disk I/O is a common bottleneck on highly parallel systems that read and write heavily to disk, such as databases. Scaling strategies include caching data in RAM (at the expense of memory) or using an external dedicated cache (at the expense of network I/O).
Network I/O
The speed at which data can be sent across a network, either from a particular point or in aggregate. Network I/O translates directly into how much data the network can transmit per unit of time. Scaling strategies for network I/O are often limited,5 but network I/O is particularly amenable to various optimization strategies that we’ll discuss shortly.
The most visible and most often cited benefit is that stateless applications can handle each request or interaction independent of previous requests. This means that any service replica can handle any request, allowing applications to grow, shrink, or be restarted without losing data required to handle any in-flight sessions or requests. This is especially important when autoscaling your service, because the instances, nodes, or pods hosting the service can (and usually will) be created and destroyed unexpectedly.
Data that lives in exactly one place (such as a single service replica) can (and, at some point, will) get lost when that replica goes away for any reason. Remember: everything in “the cloud” evaporates eventually.
Without any application state, stateless services are freed from the need to…well…manage their state.7 Not being burdened with having to maintain service-side state synchronization, consistency, and recovery logic8 makes stateless APIs less complex and therefore easier to design, build, and maintain.
APIs provided by stateless services are relatively easy to design for cacheability. If a service knows that the result of a particular request will always be the same, regardless of who’s making it or when, the result can be safely set aside for easy retrieval later, increasing efficiency and reducing response time.
// New creates an LRU cache with the given capacity.
func New[K comparable, V any](size int) (*Cache[K, V], error) {}

// NewWithEvict creates an LRU cache with the given capacity and also accepts
// an "eviction callback" function that's called when an eviction occurs.
func NewWithEvict[K comparable, V any](size int,
    onEvicted func(key K, value V)) (*Cache[K, V], error) {}
// Add adds a value to the cache and returns true if an eviction occurred.
func (c *Cache[K, V]) Add(key K, value V) bool {}

// Check if a key is in the cache (without updating the recent-ness).
func (c *Cache[K, V]) Contains(key K) bool {}

// Get looks up a key's value and returns (value, true) if it exists.
// If the value doesn't exist, it returns (nil, false).
func (c *Cache[K, V]) Get(key K) (V, bool) {}

// Len returns the number of items in the cache.
func (c *Cache[K, V]) Len() int {}

// Remove removes the provided key from the cache.
func (c *Cache[K, V]) Remove(key K) bool {}
package main

import (
    "fmt"

    lru "github.com/hashicorp/golang-lru/v2"
)

var cache *lru.Cache[int, string]

func init() {
    cache, _ = lru.NewWithEvict(2,
        func(key int, value string) {
            fmt.Printf("Evicted: key=%d value=%s\n", key, value)
        },
    )
}

func main() {
    cache.Add(1, "a") // adds 1
    cache.Add(2, "b") // adds 2; cache is now at capacity

    fmt.Println(cache.Get(1)) // "a true"; 1 now most recently used

    cache.Add(3, "c") // adds 3, evicts key 2

    fmt.Println(cache.Get(2)) // " false" (not found)
}
$ go run lru.go
a true
Evicted: key=2 value=b
 false
type Resource struct {
    url        string
    polling    bool
    lastPolled int64
}

type Resources struct {
    data []*Resource
    lock *sync.Mutex
}
func Poller(res *Resources) {
    for {
        // Get the least recently polled Resource and mark it as being polled
        res.lock.Lock()
        var r *Resource

        for _, v := range res.data {
            if v.polling {
                continue
            }
            if r == nil || v.lastPolled < r.lastPolled {
                r = v
            }
        }

        if r != nil {
            r.polling = true
        }
        res.lock.Unlock()

        if r == nil {
            continue
        }

        // Poll the URL

        // Update the Resource's polling and lastPolled
        res.lock.Lock()
        r.polling = false
        r.lastPolled = time.Nanoseconds()
        res.lock.Unlock()
    }
}
type Resource string

func Poller(in, out chan *Resource) {
    for r := range in {
        // Poll the URL

        // Send the processed Resource to out
        out <- r
    }
}
for i := 0; i < numPollers; i++ {
    go Poller(in, out)
}
ch := make(chan type, capacity)
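A short sketch of what that capacity buys you: sends to a buffered channel don’t block until the buffer is full, while receives drain it in FIFO order:

```go
package main

import "fmt"

func main() {
	ch := make(chan int, 2) // capacity 2: two sends proceed without a receiver

	ch <- 1 // doesn't block
	ch <- 2 // doesn't block; the buffer is now full
	// A third send would block here until a receive frees a slot.

	fmt.Println(<-ch) // 1
	fmt.Println(<-ch) // 2
}
```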
type FileTransactionLogger struct {
    events       chan<- Event // Write-only channel for sending events
    lastSequence uint64       // The last used event sequence number
}

func (l *FileTransactionLogger) WritePut(key, value string) {
    l.events <- Event{EventType: EventPut, Key: key, Value: value}
}

func (l *FileTransactionLogger) Run() {
    events := make(chan Event, 16) // Make an events channel
    l.events = events

    go func() {
        for e := range events { // Retrieve the next Event
            l.lastSequence++    // Increment sequence number
            _ = e               // (the Event would be written to the log here)
        }
    }()
}
var cache = struct {
    sync.RWMutex
    data map[string]string
}{
    data: make(map[string]string),
}
func ThreadSafeWrite(key, value string) {
    cache.Lock() // Establish write lock
    cache.data[key] = value
    cache.Unlock() // Release write lock
}
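A matching thread-safe read can take the read half of the `RWMutex`, which any number of readers may hold concurrently. This `ThreadSafeRead` helper is a sketch to pair with the write function above, not code from the text:

```go
package main

import (
	"fmt"
	"sync"
)

var cache = struct {
	sync.RWMutex
	data map[string]string
}{
	data: make(map[string]string),
}

// ThreadSafeRead takes the read lock, which blocks writers but
// allows any number of concurrent readers.
func ThreadSafeRead(key string) (string, bool) {
	cache.RLock() // Establish read lock
	defer cache.RUnlock()
	value, ok := cache.data[key]
	return value, ok
}

func main() {
	cache.Lock()
	cache.data["greeting"] = "hello"
	cache.Unlock()

	if v, ok := ThreadSafeRead("greeting"); ok {
		fmt.Println(v) // hello
	}
}
```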
func leaky() {
    ch := make(chan string)

    go func() {
        s := <-ch
        fmt.Println("Message:", s)
    }()
}
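One way to avoid that leak, sketched here as a hypothetical `notLeaky` function: give the goroutine a second channel it can always select on, so it has a guaranteed way to exit even when no message ever arrives:

```go
package main

import (
	"fmt"
	"sync"
)

// notLeaky starts a goroutine that, unlike leaky's, can always
// terminate: closing done releases it even if ch never receives a value.
func notLeaky() {
	ch := make(chan string)
	done := make(chan struct{})
	var wg sync.WaitGroup

	wg.Add(1)
	go func() {
		defer wg.Done()
		select {
		case s := <-ch:
			fmt.Println("Message:", s)
		case <-done:
			return // the goroutine always has an exit path
		}
	}()

	close(done) // no message is coming; release the goroutine
	wg.Wait()   // the goroutine is guaranteed to have returned
	fmt.Println("goroutine exited cleanly")
}

func main() {
	notLeaky()
}
```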
func timely() {
    timer := time.NewTimer(5 * time.Second)
    ticker := time.NewTicker(1 * time.Second)
    done := make(chan bool)

    go func() {
        for {
            select {
            case <-ticker.C:
                fmt.Println("Tick!")
            case <-done:
                return
            }
        }
    }()

    <-timer.C
    fmt.Println("It's time!")
    close(done)
}
func timelyFixed() {
    timer := time.NewTimer(5 * time.Second)
    ticker := time.NewTicker(1 * time.Second)
    defer ticker.Stop() // Be sure to stop the ticker!

    done := make(chan bool)

    go func() {
        for {
            select {
            case <-ticker.C:
                fmt.Println("Tick!")
            case <-done:
                return
            }
        }
    }()

    <-timer.C
    fmt.Println("It's time!")
    close(done)
}
Perhaps the most obvious benefit of serverless architectures is that there’s considerably less operational overhead.19 There are no servers to provision and maintain, no licenses to buy, and no software to install.
When using serverless functions, it’s the provider—not the user—who’s responsible for scaling capacity to meet demand. As such, the implementor can spend less time and effort considering and implementing scaling rules.
FaaS providers typically use a “pay-as-you-go” model, charging for the time and memory allocated only when the function is run. This can be considerably more cost-effective than deploying traditional services to (likely underutilized) servers.
In a FaaS model, the unit of work is an event-driven function. This model tends to encourage a “function first” mindset, resulting in code that’s often simpler, more readable, and easier to test.
When a function is first called, it has to be “spun up” by the cloud provider. This typically takes less than a second, but in some cases can add 10 or more seconds to the initial requests. This is known as the cold start delay. What’s more, if the function isn’t called for several minutes—the exact time varies between providers—it’s “spun down” by the provider so that it has to endure another cold start when it’s called again. This usually isn’t a problem if your function doesn’t have enough idle time to get spun down but can be a significant issue if your load is particularly “bursty.”
While most of the cloud vendors provide some basic monitoring for their FaaS offerings, it’s usually quite rudimentary. While third-party providers have been working to fill the void, the quality and quantity of data available from your ephemeral functions is often less than desired.
While unit testing tends to be pretty straightforward for serverless functions, integration testing is quite hard. It’s often difficult or impossible to simulate the serverless environment, and mocks are approximations at best.
Although the pay-as-you-go model can be considerably cheaper when demand is lower, there is a point at which this is no longer true. In fact, extremely high levels of load can grow to be quite expensive.
1 Max Kanat-Alexander, Code Simplicity: The Fundamentals of Software (O’Reilly, 2012).
2 Honestly, if we had autoscaling in place, I probably wouldn’t even remember that this happened.
3 If you want to know more about cloud native infrastructure and architecture, a bunch of excellent books on the subject have already been written. I particularly recommend Cloud Native Infrastructure by Justin Garrison and Kris Nova, and Cloud Native Transformation by Pini Reznik et al. (both O’Reilly).
4 This is my definition. I acknowledge that it diverges from other common definitions.
5 Some cloud providers impose lower network I/O limits on smaller instances. Increasing the size of the instance may increase these limits in some cases.
6 If you have a better definition, let me know. I’m already thinking about the third edition.
7 I know I said the word state a bunch of times there. Writing is hard.
8 See also: idempotence.
9 However, if you’re interested in learning more about high-performance caching in Go, take a look at Manish Rai Jain’s excellent post on the subject, “The State of Caching in Go”, on the Dgraph Blog.
10 Andrew Gerrand, “Share Memory by Communicating”, The Go Blog, July 13, 2010. Portions of this section are modifications based on work created and shared by Google and used according to terms described in the Creative Commons 4.0 Attribution License.
11 You could probably shoehorn channels into a solution for interacting with a cache, but you might find it difficult to make it simpler than locking.
12 If you are, let me know!
13 Dave Cheney wrote an excellent article on this topic called “Why Is a Goroutine’s Stack Infinite?” that I recommend you take a look at if you’re interested in the dynamics of goroutine memory allocation.
14 There’s a good article by Vincent Blanchon on the subject of goroutine recycling entitled “Go: How Does Go Recycle Goroutines?”.
15 Dave Cheney, “Never Start a Goroutine without Knowing How It Will Stop”, dave.cheney.net, December 22, 2016.
16 Not that they’ve gone away.
17 Yes, even Go.
18 Yefim Natis et al., “Hype Cycle for Cloud Platform Services, 2022”, Gartner Research, August 2022.
19 It’s right in the name!
20 Sorry, there’s no such thing as NoOps.
We build our computers the way we build our cities—over time, without a plan, on top of ruins.1
Ellen Ullman, “The Dumbing-Down of Programming” (May 1998)
In client-side discovery, the client is responsible for determining the location of service instances. The client queries a service registry, retrieves the list of available instances, and selects one to use, typically using a load-balancing algorithm. Examples include Netflix Eureka and Consul.
In server-side discovery, the client makes a request to a service discovery server, which then queries the registry and forwards the request to an appropriate service instance. The client remains unaware of the service instance details. Examples are AWS Elastic Load Balancer (ELB) and Kubernetes.
When a service starts, that service registers itself with the service registry, which maintains an up-to-date mapping of service names to their corresponding instances and addresses. This registration can also include metadata such as service health, version, and load metrics, which can be used for intelligent routing and load balancing.
When a service instance shuts down, it deregisters itself from the registry. Heartbeat mechanisms generally ensure that stale entries are also removed automatically in case of less graceful failures.
Clients or discovery servers query the service registry to get a list of available service instances. Depending on the mechanism (client-side or server-side), the actual service instance is selected and the request is routed accordingly.
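The register/deregister/lookup cycle above can be sketched as a minimal in-memory registry. This `Registry` type is purely illustrative; real registries (Consul, Eureka, etcd-backed systems) add health checks, TTLs, and replication:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Registry is a toy in-memory service registry mapping service
// names to instance addresses.
type Registry struct {
	mu        sync.RWMutex
	instances map[string][]string
}

func NewRegistry() *Registry {
	return &Registry{instances: make(map[string][]string)}
}

// Register adds an instance address under a service name, as a
// service would do on startup.
func (r *Registry) Register(service, addr string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.instances[service] = append(r.instances[service], addr)
}

// Lookup returns all known instances for a service; a client (or
// discovery server) would pick one, typically via load balancing.
func (r *Registry) Lookup(service string) ([]string, error) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	addrs := r.instances[service]
	if len(addrs) == 0 {
		return nil, errors.New("no instances registered for " + service)
	}
	return addrs, nil
}

func main() {
	reg := NewRegistry()
	reg.Register("key-value", "10.0.0.1:8080")
	reg.Register("key-value", "10.0.0.2:8080")

	addrs, _ := reg.Lookup("key-value")
	fmt.Println(addrs) // [10.0.0.1:8080 10.0.0.2:8080]
}
```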
A two-way message exchange in which a requester (the client) issues a request of a receiver (the service) and waits for a response. A textbook example is HTTP.
A one-way message exchange in which a requester (the publisher) issues a message, often via some kind of messaging middleware, which can be retrieved asynchronously and acted on by one or more services (consumers).
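The fan-out at the heart of publish-subscribe can be sketched in-process with channels: the publisher delivers one message to every subscriber independently. This is purely illustrative; real systems would use messaging middleware (NATS, Kafka, etc.) rather than Go channels:

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	// Each subscriber gets its own channel; the "middleware" here is
	// just a slice of them.
	subscribers := []chan string{
		make(chan string, 1),
		make(chan string, 1),
	}

	// Publish: one message, fanned out to every subscriber.
	for _, sub := range subscribers {
		sub <- "order created"
	}

	// Each consumer retrieves and acts on the message asynchronously.
	var wg sync.WaitGroup
	for i, sub := range subscribers {
		wg.Add(1)
		go func(i int, sub chan string) {
			defer wg.Done()
			fmt.Printf("consumer %d got: %s\n", i, <-sub)
		}(i, sub)
	}
	wg.Wait()
}
```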
You’re likely already familiar with REST, which we discussed in some detail in “Building an HTTP Server with net/http”. REST has some things going for it. It’s human-readable and easy to implement, making it a good choice for outward-facing services (which is why we chose it in Chapter 5). We’ll discuss it a little more in “Issuing HTTP requests with net/http”.
Remote procedure call (RPC) frameworks allow programs to execute procedures in a different address space, often on another computer. Go provides a standard Go-specific RPC implementation in the form of net/rpc. There are also two big language-agnostic RPC players: Apache Thrift and gRPC. While both are similar in design and usage goals, gRPC seems to have taken the lead with respect to adoption and community support. We’ll discuss gRPC in much more detail in “Remote procedure calls with gRPC”.
A relative newcomer on the scene, GraphQL is a query and manipulation language generally considered an alternative to REST and is particularly powerful when working with complex datasets. We don’t discuss GraphQL in much detail in this book, but I encourage you to look into it the next time you’re designing an outward-facing API.
// Get issues a GET to the specified URL
func Get(url string) (*http.Response, error) {}

// Head issues a HEAD to the specified URL
func Head(url string) (*http.Response, error) {}
type Response struct {
    Status     string // e.g. "200 OK"
    StatusCode int    // e.g. 200

    // Header maps header keys to values.
    Header Header

    // Body represents the response body.
    Body io.ReadCloser

    // ContentLength records the length of the associated content. The
    // value -1 indicates that the length is unknown.
    ContentLength int64

    // Request is the request that was sent to obtain this Response.
    Request *Request
}
package main

import (
    "fmt"
    "io"
    "net/http"
)

func main() {
    resp, err := http.Get("http://example.com") // Send an HTTP GET
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close() // Close your response!

    body, err := io.ReadAll(resp.Body) // Read body as []byte
    if err != nil {
        panic(err)
    }

    fmt.Println(string(body))
}
// Post issues a POST to the specified URL
func Post(url, contentType string, body io.Reader) (*Response, error) {}

// PostForm issues a POST to the specified URL, with data's keys
// and values URL-encoded as the request body
func PostForm(url string, data url.Values) (*Response, error) {}
package main

import (
    "fmt"
    "io"
    "net/http"
    "strings"
)

const json = `{ "name":"Matt", "age":44 }` // This is our JSON

func main() {
    in := strings.NewReader(json) // Wrap JSON with an io.Reader

    // Issue HTTP POST, declaring our content-type as "text/json"
    resp, err := http.Post("http://example.com/upload", "text/json", in)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close() // Close your response!

    message, err := io.ReadAll(resp.Body)
    if err != nil {
        panic(err)
    }

    fmt.Println(string(message))
}
Its messages are more compact, consuming less network I/O.
Its binary exchange format is much faster to marshal and unmarshal.
It’s natively strongly typed, eliminating a lot of boilerplate and removing a common source of errors.
It has a number of built-in features such as authentication, encryption, timeout, and compression (to name a few) that you would otherwise have to implement yourself.
gRPC’s contracts make it less suitable for external-facing services.
gRPC data isn’t human-readable, making it harder to inspect and debug.
$ apt install -y protobuf-compiler
$ protoc --version
$ brew install protobuf
$ protoc --version
$ go install google.golang.org/protobuf/cmd/protoc-gen-go@latest
.proto file; the message blocks define remote procedure payloads

syntax = "proto3";

option go_package = "github.com/cloud-native-go/ch08/point";

// Point represents a labeled position on a 2-dimensional surface
message Point {
    int32 x = 1;
    int32 y = 2;
    string label = 3;
}

// Line contains start and end Points
message Line {
    Point start = 1;
    Point end = 2;
    string label = 3;
}

// Polyline contains any number (including zero) of Points
message Polyline {
    repeated Point point = 1;
    string label = 2;
}
Point
Contains x and y integer values, and a label string
Line
Contains exactly two Point values
Polyline
Uses the repeated keyword to indicate that it can contain any number of Point values
keyvalue.proto—the messages that will be passed to and from our key-value service procedures

syntax = "proto3";

option go_package = "github.com/cloud-native-go/ch08/keyvalue";

// GetRequest represents a request to the key-value store for the
// value associated with a particular key
message GetRequest {
    string key = 1;
}

// GetResponse represents a response from the key-value store for a
// particular value
message GetResponse {
    string value = 1;
}

// PutRequest represents a request to the key-value store to associate
// a value with a particular key
message PutRequest {
    string key = 1;
    string value = 2;
}

// PutResponse represents a response from the key-value store for a
// Put action.
message PutResponse {}

// DeleteRequest represents a request to the key-value store to delete
// the record associated with a key
message DeleteRequest {
    string key = 1;
}

// DeleteResponse represents a response from the key-value store for a
// Delete action.
message DeleteResponse {}
keyvalue.proto—the procedures for our key-value service

service KeyValue {
    rpc Get(GetRequest) returns (GetResponse);
    rpc Put(PutRequest) returns (PutResponse);
    rpc Delete(DeleteRequest) returns (DeleteResponse);
}
Get
Accepts a GetRequest and returns a GetResponse
Put
Accepts a PutRequest and returns a PutResponse
Delete
Accepts a DeleteRequest and returns a DeleteResponse
$ protoc --proto_path=$SOURCE_DIR \
    --go_out=$DEST_DIR --go_opt=paths=source_relative \
    --go-grpc_out=$DEST_DIR --go-grpc_opt=paths=source_relative \
    $SOURCE_DIR/keyvalue.proto
type KeyValueServer interface {
    Get(context.Context, *GetRequest) (*GetResponse, error)
    Put(context.Context, *PutRequest) (*PutResponse, error)
    Delete(context.Context, *DeleteRequest) (*DeleteResponse, error)
}
type UnimplementedKeyValueServer struct{}

func (*UnimplementedKeyValueServer) Get(context.Context, *GetRequest) (*GetResponse, error) {
    return nil, status.Errorf(codes.Unimplemented, "method not implemented")
}
package main

import (
    "context"
    "log"
    "net"

    pb "github.com/cloud-native-go/ch08/keyvalue"
    "google.golang.org/grpc"
)

// server is used to implement KeyValueServer. It MUST embed the generated
// struct pb.UnimplementedKeyValueServer
type server struct {
    pb.UnimplementedKeyValueServer
}

func (s *server) Get(ctx context.Context, r *pb.GetRequest) (*pb.GetResponse, error) {
    log.Printf("Received GET key=%v", r.Key)

    // The local Get function is implemented back in Chapter 5
    value, err := Get(r.Key)

    // Return expects a GetResponse pointer and an err
    return &pb.GetResponse{Value: value}, err
}

func main() {
    // Create a gRPC server and register our KeyValueServer with it
    s := grpc.NewServer()
    pb.RegisterKeyValueServer(s, &server{})

    // Open a listening port on 50051
    lis, err := net.Listen("tcp", ":50051")
    if err != nil {
        log.Fatalf("failed to listen: %v", err)
    }

    // Start accepting connections on the listening port
    if err := s.Serve(lis); err != nil {
        log.Fatalf("failed to serve: %v", err)
    }
}
type KeyValueClient interface {
    Get(ctx context.Context, in *GetRequest,
        opts ...grpc.CallOption) (*GetResponse, error)
    Put(ctx context.Context, in *PutRequest,
        opts ...grpc.CallOption) (*PutResponse, error)
    Delete(ctx context.Context, in *DeleteRequest,
        opts ...grpc.CallOption) (*DeleteResponse, error)
}
package main

import (
    "context"
    "log"
    "os"
    "strings"
    "time"

    pb "github.com/cloud-native-go/ch08/keyvalue"
    "google.golang.org/grpc"
)

func main() {
    // Set up a connection to the gRPC server
    conn, err := grpc.Dial("localhost:50051",
        grpc.WithInsecure(),
        grpc.WithBlock(),
        grpc.WithTimeout(time.Second))
    if err != nil {
        log.Fatalf("did not connect: %v", err)
    }
    defer conn.Close()

    // Get a new instance of our client
    client := pb.NewKeyValueClient(conn)

    var action, key, value string

    // Expect something like "put foo bar"
    if len(os.Args) > 2 {
        action, key = os.Args[1], os.Args[2]
        value = strings.Join(os.Args[3:], " ")
    }

    // Use context to establish a 1-second timeout.
    ctx, cancel := context.WithTimeout(context.Background(), time.Second)
    defer cancel()

    // Call client.Get() or client.Put() as appropriate.
    switch action {
    case "get":
        r, err := client.Get(ctx, &pb.GetRequest{Key: key})
        if err != nil {
            log.Fatalf("could not get value for key %s: %v\n", key, err)
        }
        log.Printf("Get %s returns: %s", key, r.Value)

    case "put":
        _, err := client.Put(ctx, &pb.PutRequest{Key: key, Value: value})
        if err != nil {
            log.Fatalf("could not put key %s: %v\n", key, err)
        }
        log.Printf("Put %s", key)

    default:
        log.Fatalf("Syntax: go run [get|put] KEY VALUE...")
    }
}
WithInsecure
Disables transport security for this ClientConn. Don’t use insecure connections in production.
WithBlock
Makes Dial block until a connection is established; otherwise, the connection will occur in the background.
WithTimeout
Makes a blocking Dial return an error if it takes longer than the specified amount of time.
A plug-in is a Go main package with one or more exported functions and variables that has been built with the -buildmode=plugin build flag. It’s represented in the plugin package by the Plugin type.
Opening a plug-in is the process of loading it into memory, validating it, and discovering its exposed symbols. A plug-in at a known location in the file system can be opened using the Open function, which returns a *Plugin value:
func Open(path string) (*Plugin, error) {}
A plug-in symbol is any variable or function that’s exported by the plug-in’s package. Symbols can be retrieved by “looking them up” and are represented in the plugin package by the Symbol type:
type Symbol any
Looking up describes the process of searching for and retrieving a symbol exposed by a plug-in. The plugin package’s Lookup method provides that functionality and returns a Symbol value:
func (p *Plugin) Lookup(symName string) (Symbol, error) {}
~/cloud-native-go/ch08/go-plugin
├── duck
│ └── duck.go
├── frog
│ └── frog.go
└── main
└── main.go
type Sayer interface {
    Says() string
}
package main

type duck struct{}

func (d duck) Says() string {
    return "quack!"
}

// Animal is exported as a symbol.
var Animal duck
$ go build -buildmode=plugin -o duck/duck.so duck/duck.go
$ file duck/duck.so
duck/duck.so: Mach-O 64-bit dynamically linked shared library x86_64
import (
    "fmt"
    "log"
    "os"
    "plugin"
)
if len(os.Args) != 2 {
    log.Fatal("usage: run main/main.go animal")
}

// Get the animal name, and build the path where we expect to
// find the corresponding shared object (.so) file.
name := os.Args[1]
module := fmt.Sprintf("./%s/%s.so", name, name)
// Open our plugin and get a *plugin.Plugin.
p, err := plugin.Open(module)
if err != nil {
    log.Fatal(err)
}
// Lookup searches for a symbol named "Animal" in plug-in p.
symbol, err := p.Lookup("Animal")
if err != nil {
    log.Fatal(err)
}
type Symbol any
// Asserts that the symbol interface holds a Sayer.
animal, ok := symbol.(Sayer)
if !ok {
    log.Fatal("that's not a Sayer")
}

// Now we can use our loaded plug-in!
fmt.Printf("A %s says: %q\n", name, animal.Says())
$ go run main/main.go duck
A duck says: "quack!"
$ go run main/main.go frog
A frog says: "ribbit!"
$ go run main/main.go fox
A fox says: "ring-ding-ding-ding-dingeringeding!"
Because they’re separate processes, a panic in a plug-in doesn’t automatically crash the plug-in consumer.
Go plug-ins are famously version-specific. HashiCorp plug-ins are far less so, expecting only that plug-ins adhere to a contract. HashiCorp plug-ins also support explicit protocol versioning.
HashiCorp plug-ins have access to only the interfaces and parameters passed to them, as opposed to the entire memory space of the consuming process.
HashiCorp plug-ins require more boilerplate than Go plug-ins.
Because all data exchange with HashiCorp plug-ins occurs over RPC, they’re generally less performant than Go plug-ins.
~/cloud-native-go/ch08/hashicorp-plugin
├── commons
│ └── commons.go
├── duck
│ └── duck.go
└── main
└── main.go
type Sayer interface {
    Says() string
}
type SayerPlugin struct {
    Impl Sayer
}

func (SayerPlugin) Client(b *plugin.MuxBroker, c *rpc.Client) (any, error) {
    return &SayerRPC{client: c}, nil
}

func (p *SayerPlugin) Server(*plugin.MuxBroker) (any, error) {
    return &SayerRPCServer{Impl: p.Impl}, nil
}
type SayerRPC struct {
    client *rpc.Client
}

func (g *SayerRPC) Says() string {
    var resp string

    err := g.client.Call("Plugin.Says", new(any), &resp)
    if err != nil {
        panic(err)
    }

    return resp
}
var HandshakeConfig = plugin.HandshakeConfig{
    ProtocolVersion:  1,
    MagicCookieKey:   "BASIC_PLUGIN",
    MagicCookieValue: "hello",
}
type SayerRPCServer struct {
    Impl Sayer // Impl contains our actual implementation
}

func (s *SayerRPCServer) Says(args any, resp *string) error {
    *resp = s.Impl.Says()
    return nil
}
package main
import (
    "github.com/cloud-native-go/ch08/hashicorp-plugin/commons"
    "github.com/hashicorp/go-plugin"
)
type Duck struct{}

func (g *Duck) Says() string {
    return "Quack!"
}
func main() {
    // Create and initialize our service implementation.
    sayer := &Duck{}

    // pluginMap is the map of plug-ins we can dispense.
    var pluginMap = map[string]plugin.Plugin{
        "sayer": &commons.SayerPlugin{Impl: sayer},
    }

    plugin.Serve(&plugin.ServeConfig{
        HandshakeConfig: commons.HandshakeConfig,
        Plugins:         pluginMap,
    })
}
package main

import (
    "fmt"
    "log"
    "os"
    "os/exec"

    "github.com/cloud-native-go/ch08/hashicorp-plugin/commons"
    "github.com/hashicorp/go-plugin"
)
func main() {
    if len(os.Args) != 2 {
        log.Fatal("usage: run main/main.go animal")
    }

    // Get the animal name, and build the path where we expect to
    // find the corresponding executable file.
    name := os.Args[1]
    module := fmt.Sprintf("./%s/%s", name, name)

    // Does the file exist?
    _, err := os.Stat(module)
    if os.IsNotExist(err) {
        log.Fatal("can't find an animal named ", name)
    }
    // pluginMap is the map of plug-ins we can dispense.
    var pluginMap = map[string]plugin.Plugin{
        "sayer": &commons.SayerPlugin{},
    }

    // Launch the plugin process!
    client := plugin.NewClient(&plugin.ClientConfig{
        HandshakeConfig: commons.HandshakeConfig,
        Plugins:         pluginMap,
        Cmd:             exec.Command(module),
    })
    defer client.Kill()

    // Connect to the plugin via RPC
    rpcClient, err := client.Client()
    if err != nil {
        log.Fatal(err)
    }
HandshakeConfig
The handshake configuration. This has to match the plug-in's own handshake configuration or we'll get an error in the next step.
Plugins
A map that specifies the name and type of plug-in we want.
Cmd
An *exec.Cmd value that represents the command for starting the plug-in subprocess.
    // Request the plug-in from the client
    raw, err := rpcClient.Dispense("sayer")
    if err != nil {
        log.Fatal(err)
    }

    // We should have a Sayer now! This feels like a normal interface
    // implementation, but is actually over an RPC connection.
    sayer := raw.(commons.Sayer)

    // Now we can use our loaded plug-in!
    fmt.Printf("A %s says: %q\n", name, sayer.Says())
}
The application proper, represented by the hexagon. This contains all of the business logic but has no direct reference to any technology, framework, or real-world device. The business logic shouldn’t depend on whether it exposes a REST or a gRPC API, or whether it gets data from a database or a .csv file. Its only view of the world should be through ports.
The ports and adapters are represented on the edge of the hexagon. Ports allow different kinds of actors to “plug in” and interact with the core service. Adapters can “plug into” a port and translate signals between the core application and an actor.
For example, your application might have a “data port” into which a “data adapter” might plug. One data adapter might write to a database, while another might use an in-memory datastore or automated test harness.
The actors can be anything in the environment that interacts with the core application (users, upstream services, etc.) or that the core application interacts with (storage devices, downstream services, etc.). They exist outside the hexagon.
Back in “Generation 1: The Monolith”, we implemented a REST frontend, and then in “Remote procedure calls with gRPC”, we implemented a separate gRPC frontend. We can describe these with a single “driver” port into which we’ll be able to plug either (or both!) as adapters.
In “What’s a Transaction Log?”, we created two implementations of a transaction log. These seem like a natural choice for a “driven” port and adapters.
~/cloud-native-go/ch08/hexarch/
├── core
│ └── core.go
├── frontend
│ ├── grpc.go
│ └── rest.go
├── main.go
└── transact
├── filelogger.go
└── pglogger.go
core
The core key-value application logic. Importantly, it has no dependencies outside of the Go standard libraries.
frontend
Contains the REST and gRPC frontend driver adapters. These have a dependency on core.
transact
Contains the file and PostgreSQL transaction logger driven adapters. These also have a dependency on core.
main.go
Creates the core application instance, passes the driven adapters into it, and passes it to the driver adapters.
type TransactionLogger interface {
    WriteDelete(key string)
    WritePut(key, value string)
}
package core

import (
    "errors"
    "log"
    "sync"
)

type KeyValueStore struct {
    m        map[string]string
    transact TransactionLogger
}

func NewKeyValueStore(tl TransactionLogger) *KeyValueStore {
    return &KeyValueStore{
        m:        make(map[string]string),
        transact: tl,
    }
}

func (store *KeyValueStore) Delete(key string) error {
    delete(store.m, key)
    store.transact.WriteDelete(key)
    return nil
}

func (store *KeyValueStore) Put(key string, value string) error {
    store.m[key] = value
    store.transact.WritePut(key, value)
    return nil
}
func NewTransactionLogger(logger string) (core.TransactionLogger, error) {
    switch logger {
    case "file":
        return NewFileTransactionLogger(os.Getenv("TLOG_FILENAME"))

    case "postgres":
        return NewPostgresTransactionLogger(
            PostgresDbParams{
                host:     os.Getenv("TLOG_DB_HOST"),
                dbName:   os.Getenv("TLOG_DB_DATABASE"),
                user:     os.Getenv("TLOG_DB_USERNAME"),
                password: os.Getenv("TLOG_DB_PASSWORD"),
            })

    case "":
        return nil, fmt.Errorf("transaction logger type not defined")

    default:
        return nil, fmt.Errorf("no such transaction logger %s", logger)
    }
}
package frontend

type FrontEnd interface {
    Start(kv *core.KeyValueStore) error
}
package frontend

import (
    "net/http"

    "github.com/cloud-native-go/examples/ch08/hexarch/core"

    "github.com/gorilla/mux"
)

// restFrontEnd contains a reference to the core application logic,
// and complies with the contract defined by the FrontEnd interface.
type restFrontEnd struct {
    store *core.KeyValueStore
}

// deleteHandler handles the logic for the DELETE HTTP method.
func (f *restFrontEnd) deleteHandler(w http.ResponseWriter, r *http.Request) {
    vars := mux.Vars(r)
    key := vars["key"]

    err := f.store.Delete(key)
    if err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }
}

// ...other handler functions omitted for brevity.

// Start includes the setup and start logic that previously
// lived in a main function.
func (f *restFrontEnd) Start(store *core.KeyValueStore) error {
    // Remember our core application reference.
    f.store = store

    r := mux.NewRouter()

    r.HandleFunc("/v1/{key}", f.getHandler).Methods("GET")
    r.HandleFunc("/v1/{key}", f.putHandler).Methods("PUT")
    r.HandleFunc("/v1/{key}", f.deleteHandler).Methods("DELETE")

    return http.ListenAndServe(":8080", r)
}
package main

import (
    "log"
    "os"

    "github.com/cloud-native-go/examples/ch08/hexarch/core"
    "github.com/cloud-native-go/examples/ch08/hexarch/frontend"
    "github.com/cloud-native-go/examples/ch08/hexarch/transact"
)

func main() {
    // Create our TransactionLogger. This is an adapter that will plug
    // into the core application's TransactionLogger port.
    tl, err := transact.NewTransactionLogger(os.Getenv("TLOG_TYPE"))
    if err != nil {
        log.Fatal(err)
    }

    // Create Core and tell it which TransactionLogger to use.
    // This is an example of a "driven agent."
    store := core.NewKeyValueStore(tl)
    store.Restore()

    // Create the frontend.
    // This is an example of a "driving agent."
    fe, err := frontend.NewFrontEnd(os.Getenv("FRONTEND_TYPE"))
    if err != nil {
        log.Fatal(err)
    }

    log.Fatal(fe.Start(store))
}
1 Ellen Ullman, “The Dumbing-Down of Programming”, Salon, May 12, 1998.
2 This is actually a pretty nuanced discussion. See “Service Architectures”.
3 Get off my lawn.
4 In XML, no less. We didn’t know any better at the time.
5 At Google, even the acronyms are recursive.
6 If you’re into that kind of thing.
7 If you wanted to be creative, this could be a FileListener or even a stdio stream.
8 Yes, I know the animal thing has been done before. Sue me.
9 So, naturally, we’re building a duck. Obviously.
10 Good riddance.
11 I’m sorry.
Safety work is today recognized as an economic necessity. It is the study of the right way to do things.
Robert W. Campbell, addressing the Third National Safety Council Congress & Expo (1914)
For want of a nail, the shoe was lost,
for want of a shoe, the horse was lost;
for want of a horse, the rider was lost;
all for want of care about a horse-shoe nail.
Benjamin Franklin, The Way to Wealth (1758)
Throttling is a relatively straightforward strategy that kicks in when requests come in faster than some predetermined frequency, typically by just refusing to handle them. This is often used as a preventative measure by ensuring that no particular user consumes more resources than they would reasonably require.
Load shedding is a little more adaptive. Services using this strategy intentionally drop (“shed”) some proportion of load as they approach overload conditions by either refusing requests or falling back into a degraded mode.
// Effector is the function that you want to subject to throttling.
type Effector func(context.Context) (string, error)

// Throttled wraps an Effector. It accepts the same parameters, plus a
// "UID" string that represents a caller identity. It returns the same,
// plus a bool that's true if the call is not throttled.
type Throttled func(context.Context, string) (bool, string, error)

// A bucket tracks the requests associated with a UID.
type bucket struct {
    tokens uint
    time   time.Time
}

// Throttle accepts an Effector function, and returns a Throttled
// function with a per-UID token bucket with a capacity of max
// that refills at a rate of refill tokens every d.
func Throttle(e Effector, max uint, refill uint, d time.Duration) Throttled {
    // buckets maps UIDs to specific buckets
    buckets := map[string]*bucket{}

    return func(ctx context.Context, uid string) (bool, string, error) {
        b := buckets[uid]

        // This is a new entry! It passes. Assumes that capacity >= 1.
        if b == nil {
            buckets[uid] = &bucket{tokens: max - 1, time: time.Now()}

            str, err := e(ctx)
            return true, str, err
        }

        // Calculate how many tokens we now have based on the time
        // passed since the previous request.
        refillInterval := uint(time.Since(b.time) / d)
        tokensAdded := refill * refillInterval
        currentTokens := b.tokens + tokensAdded

        // We don't have enough tokens. Return false.
        if currentTokens < 1 {
            return false, "", nil
        }

        // If we've refilled our bucket, we can restart the clock.
        // Otherwise, we figure out when the most recent tokens were added.
        if currentTokens > max {
            b.time = time.Now()
            b.tokens = max - 1
        } else {
            deltaTokens := currentTokens - b.tokens
            deltaRefills := deltaTokens / refill
            deltaTime := time.Duration(deltaRefills) * d

            b.time = b.time.Add(deltaTime)
            b.tokens = currentTokens - 1
        }

        str, err := e(ctx)

        return true, str, err
    }
}
var throttled = Throttle(getHostname, 1, 1, time.Second)

func getHostname(ctx context.Context) (string, error) {
    if ctx.Err() != nil {
        return "", ctx.Err()
    }

    return os.Hostname()
}

func throttledHandler(w http.ResponseWriter, r *http.Request) {
    ok, hostname, err := throttled(r.Context(), r.RemoteAddr)

    if err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }

    if !ok {
        http.Error(w, "Too many requests", http.StatusTooManyRequests)
        return
    }

    w.WriteHeader(http.StatusOK)
    w.Write([]byte(hostname))
}

func main() {
    r := mux.NewRouter()
    r.HandleFunc("/hostname", throttledHandler)
    log.Fatal(http.ListenAndServe(":8080", r))
}
const MaxQueueDepth = 1000

// Middleware function, which will be called for each request.
// If queue depth is exceeded, it returns HTTP 503 (service unavailable).
func loadSheddingMiddleware(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        // CurrentQueueDepth is fictional and for example purposes only.
        if CurrentQueueDepth() > MaxQueueDepth {
            log.Println("load shedding engaged")
            http.Error(w,
                "service overloaded",
                http.StatusServiceUnavailable)
            return
        }

        next.ServeHTTP(w, r)
    })
}

func main() {
    r := mux.NewRouter()

    // Register middleware
    r.Use(loadSheddingMiddleware)

    log.Fatal(http.ListenAndServe(":8080", r))
}
res, err := SendRequest()
for err != nil {
    res, err = SendRequest()
}
res, err := SendRequest()
for err != nil {
    time.Sleep(2 * time.Second)
    res, err = SendRequest()
}
res, err := SendRequest()
base, cap := time.Second, time.Minute

for backoff := base; err != nil; backoff <<= 1 {
    if backoff > cap {
        backoff = cap
    }
    time.Sleep(backoff)
    res, err = SendRequest()
}
res, err := SendRequest()
base, cap := time.Second, time.Minute

for backoff := base; err != nil; backoff <<= 1 {
    if backoff > cap {
        backoff = cap
    }

    jitter := rand.Int63n(int64(backoff * 3))
    sleep := base + time.Duration(jitter)
    time.Sleep(sleep)

    res, err = SendRequest()
}
func UserName(ctx context.Context, id int) (string, error) {
    const query = "SELECT username FROM users WHERE id=?"

    dctx, cancel := context.WithTimeout(ctx, 15*time.Second)
    defer cancel()

    var username string
    err := db.QueryRowContext(dctx, query, id).Scan(&username)

    return username, err
}
func UserGetHandler(w http.ResponseWriter, r *http.Request) {
    vars := mux.Vars(r)
    id := vars["id"]

    // Get the request's context. This context is canceled when
    // the client's connection closes, the request is canceled
    // (with HTTP/2), or when the ServeHTTP method returns.
    rctx := r.Context()

    ctx, cancel := context.WithTimeout(rctx, 10*time.Second)
    defer cancel()

    username, err := UserName(ctx, id)

    switch {
    case errors.Is(err, sql.ErrNoRows):
        http.Error(w, "no such user", http.StatusNotFound)
    case errors.Is(err, context.DeadlineExceeded):
        http.Error(w, "database timeout", http.StatusGatewayTimeout)
    case err != nil:
        http.Error(w, err.Error(), http.StatusInternalServerError)
    default:
        w.Write([]byte(username))
    }
}
var client = &http.Client{
    Timeout: 10 * time.Second,
}

response, err := client.Get(url)
func (c *Client) Get(url string) (resp *Response, err error) {
    req, err := NewRequest("GET", url, nil)
    if err != nil {
        return nil, err
    }
    return c.Do(req)
}
type ClientContext struct {
    http.Client
}

func (c *ClientContext) GetContext(ctx context.Context, url string) (resp *http.Response, err error) {
    req, err := http.NewRequestWithContext(ctx, "GET", url, nil)
    if err != nil {
        return nil, err
    }

    return c.Do(req)
}
func main() {
    client := &ClientContext{}

    ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
    defer cancel()

    response, err := client.GetContext(ctx, "http://www.example.com")
    if err != nil {
        log.Fatal(err)
    }

    bytes, _ := io.ReadAll(response.Body)
    fmt.Println(string(bytes))
}
$ go run .
2020/08/25 14:03:16 Get "http://www.example.com": context deadline exceeded
exit status 1
opts := []grpc.DialOption{
    grpc.WithInsecure(),
    grpc.WithBlock(),
    grpc.WithTimeout(5 * time.Second),
}

conn, err := grpc.Dial(serverAddr, opts...)
func TimeoutKeyValueGet() *pb.Response {
    // Use context to set a 5-second timeout.
    ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
    defer cancel()

    // We can still set other options as desired.
    opts := []grpc.DialOption{grpc.WithInsecure(), grpc.WithBlock()}

    conn, err := grpc.DialContext(ctx, serverAddr, opts...)
    if err != nil {
        log.Fatal(err)
    }
    defer conn.Close()

    client := pb.NewKeyValueClient(conn)

    // We can reuse the same Context in the client calls.
    response, err := client.Get(ctx, &pb.GetRequest{Key: key})
    if err != nil {
        log.Fatal(err)
    }

    return response
}
var store = make(map[string]string)

func Create(key, value string) error {
    if _, ok := store[key]; ok {
        return errors.New("duplicate key")
    }
    store[key] = value
    return nil
}

func Update(key, value string) error {
    if _, ok := store[key]; !ok {
        return errors.New("no such key")
    }
    store[key] = value
    return nil
}

func Delete(key string) error {
    if _, ok := store[key]; !ok {
        return errors.New("no such key")
    }
    delete(store, key)
    return nil
}
var store = make(map[string]string)

func Set(key, value string) {
    store[key] = value
}

func Delete(key string) {
    delete(store, key)
}
{"credit":{"accountID":12345,"amount":500}}
{"credit":{"accountID":12345,"amount":500,"transactionID":789}}
A liveness probe determines whether an application inside a container is alive and functioning correctly. Failure indicates that the application is in a broken state and must be restarted. Liveness probes are used to catch deadlocks or other issues where an application is running but is unable to make progress.
A readiness probe checks whether an application is fully initialized and ready to serve traffic. If a readiness probe fails, the orchestration system won’t send traffic to that container until it passes the probe again, though the container isn’t restarted. Readiness probes are particularly useful for applications with long initialization times.
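In Kubernetes, for example, both kinds of probes are declared on the container spec. The following fragment is an illustrative sketch, not taken from the book's manifests; it assumes the container serves /health and /ready endpoints on port 8080:

```yaml
containers:
- name: kvs
  image: example/kvs:latest
  livenessProbe:             # Failure restarts the container
    httpGet:
      path: /health
      port: 8080
    periodSeconds: 10
  readinessProbe:            # Failure stops traffic, no restart
    httpGet:
      path: /ready
      port: 8080
    periodSeconds: 5
```

The orchestrator polls each endpoint on its own schedule and reacts according to the probe's semantics, which is why the two endpoints should answer different questions.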
Do little more than return a “success” signal. They make no additional attempt to determine the status of the service, and say nothing about the service except that it’s listening and reachable. Then again, sometimes this is enough. We’ll talk more about reachability checks in “Reachability checks”.
Go further than reachability checks by verifying that the service instance is likely to be able to function. These health checks test only local resources, so they're unlikely to fail on many instances simultaneously, but they can't say for certain whether a particular request to the service instance will succeed. We'll wade into shallow health checks in "Shallow health checks".
Provide a much better understanding of instance health, since they actually inspect the ability of a service instance to perform its function, which also exercises downstream resources like databases. While thorough, they can be expensive and are susceptible to false positives. We’ll dig into deep health checks in “Deep health checks”.
func healthReachabilityHandler(w http.ResponseWriter, r *http.Request) {
    w.WriteHeader(http.StatusOK)
    w.Write([]byte("OK"))
}

func main() {
    r := mux.NewRouter()
    r.HandleFunc("/health", healthReachabilityHandler)
    log.Fatal(http.ListenAndServe(":8080", r))
}
func healthShallowHandler(w http.ResponseWriter, r *http.Request) {
    // Create our test file.
    // This will create a filename like /tmp/shallow-123456
    tmpFile, err := os.CreateTemp(os.TempDir(), "shallow-")
    if err != nil {
        http.Error(w, err.Error(), http.StatusServiceUnavailable)
        return
    }
    defer os.Remove(tmpFile.Name())

    // Make sure that we can write to the file.
    text := []byte("Check.")
    if _, err = tmpFile.Write(text); err != nil {
        http.Error(w, err.Error(), http.StatusServiceUnavailable)
        return
    }

    // Make sure that we can close the file.
    if err := tmpFile.Close(); err != nil {
        http.Error(w, err.Error(), http.StatusServiceUnavailable)
        return
    }

    // We got this far -- we're healthy.
    w.WriteHeader(http.StatusOK)
}

func main() {
    r := mux.NewRouter()
    r.HandleFunc("/health", healthShallowHandler)
    log.Fatal(http.ListenAndServe(":8080", r))
}
func healthDeepHandler(w http.ResponseWriter, r *http.Request) {
    // Retrieve the context from the request and add a 5-second timeout
    ctx, cancel := context.WithTimeout(r.Context(), 5*time.Second)
    defer cancel()

    // service.GetUser is a hypothetical method on a service interface
    // that executes a database query
    if err := service.GetUser(ctx, 0); err != nil {
        http.Error(w, err.Error(), http.StatusServiceUnavailable)
        return
    }

    // All good -- return OK
    w.WriteHeader(http.StatusOK)
}

func main() {
    r := mux.NewRouter()
    r.HandleFunc("/health", healthDeepHandler)
    log.Fatal(http.ListenAndServe(":8080", r))
}
SIGTERM (Signal terminate)
Sent to a process to request its termination, allowing for a graceful shutdown. Can be caught and acted upon or simply ignored.
SIGINT (Signal interrupt)
Sent to a process by its controlling terminal when a user wishes to interrupt the process, such as by pressing Ctrl+C.
SIGQUIT (Signal quit)
Sent to a process by its controlling terminal when the user requests that the process quit and perform a core dump.
SIGKILL (Signal kill)
Sent to cause a process to terminate immediately. SIGKILL cannot be caught or ignored, and the receiving process cannot perform any clean-up upon receiving this signal.
SIGHUP (Signal hangup)
Originally designed to notify a process of a serial line drop (a hang-up), but now many services interpret this signal as a request to reload their configuration files and flush their log files.
// Create the signal channel
c := make(chan os.Signal, 1)

// Relay incoming SIGTERM, SIGINT, and SIGQUIT signals to the channel
signal.Notify(c, syscall.SIGTERM, syscall.SIGINT, syscall.SIGQUIT)

// Wait for a signal
<-c
func NotifyContext(parent context.Context, signals ...os.Signal) (context.Context, context.CancelFunc)
func main() {
    // Get a context that closes on SIGTERM, SIGINT, or SIGQUIT
    ctx, cancel := signal.NotifyContext(context.Background(),
        syscall.SIGTERM, syscall.SIGINT, syscall.SIGQUIT)
    defer cancel()

    server := &http.Server{Addr: ":8080"}

    // Register a cleanup function to be automatically
    // called when the server is shut down
    server.RegisterOnShutdown(doCleanup)

    // Register the readiness and liveness probes.
    http.Handle("/ready", handleReadiness(ctx))
    http.Handle("/health", handleLiveness())

    // This goroutine will respond to context closure
    // by shutting down the server
    go func() {
        // Read from the context's Done channel
        // This operation will block until the context closes
        <-ctx.Done()

        log.Println("Got shutdown signal.")

        // Wait for the readiness probe to detect the failure
        <-time.After(5 * time.Second)

        // Issue the shutdown proper. Don't pass the
        // already-closed Context value to it!
        if err := server.Shutdown(context.Background()); err != nil {
            log.Printf("Error while stopping HTTP listener: %s", err)
        }
    }()

    // Begin listening on :8080
    log.Println(server.ListenAndServe())
}

func handleReadiness(ctx context.Context) http.Handler {
    f := func(w http.ResponseWriter, r *http.Request) {
        select {
        case <-ctx.Done():
            w.WriteHeader(http.StatusServiceUnavailable)
        default:
            w.WriteHeader(http.StatusOK)
        }
    }

    return http.HandlerFunc(f)
}
1 “Summary of the Amazon DynamoDB Service Disruption and Related Impacts in the US-East Region”, Amazon AWS, September 2015.
2 Richard I. Cook, “How Complex Systems Fail”, 1998.
3 If you’re interested in a complete academic treatment, I highly recommend Reliability and Availability Engineering by Kishor S. Trivedi and Andrea Bobbio (Cambridge University Press, 2017).
4 Importantly, many faults are evident only in retrospect.
5 See? We eventually got there.
6 Go on, ask me how I know this.
7 Especially if the service is available on the open sewer that is the public internet.
8 Wikipedia contributors, “Token bucket”, Wikipedia, June 5, 2019.
9 Available in the associated GitHub repository is the code used to simulate all data in this section.
10 Doing that here felt redundant, but I’ll admit that I may have gotten a bit lazy.
11 And, technically, request-scoped values, but the correctness of this functionality is debatable.
12 Roy Fielding et al., “Hypertext Transfer Protocol—HTTP/1.1”, Proposed Standard, RFC 2068, June 1997.
13 Tim Berners-Lee et al., “Hypertext Transfer Protocol—HTTP/1.0”, Informational, RFC 1945, May 1996.
14 Roy Fielding, “Architectural Styles and the Design of Network-Based Software Architectures”, PhD dis., University of California, Irvine, 2000, pp. 76–106.
15 You monster.
16 Building Secure and Reliable Systems: Best Practices for Designing, Implementing, and Maintaining Systems by Heather Adkins—and a host of other authors (O’Reilly)—is one excellent example.
17 Brace yourself. We’re going in.
18 This assumes that the failure rates of the components are absolutely independent, which is very unlikely in the real world. Treat as you would spherical cows in a vacuum.
19 Cindy Sridharan (@copyconstruct), “Health checks are like bloom filters…”, Twitter (now X), August 5, 2018.
20 Though I’ve seen it happen.
21 It’s an imaginary function, so let’s just agree that that’s true.
22 There’s a good reason why this book is subtitled “Building Reliable Services in Unreliable Environments.”
Everyone knows that debugging is twice as hard as writing a program in the first place. So if you’re as clever as you can be when you write it, how will you ever debug it?1
Brian Kernighan, The Elements of Programming Style (1978)
A system, and each of its components, should be easy to set up and configure for optimal availability and performance. Some systems need regular or even real-time adjustment, so having the right "knobs and levers" is absolutely fundamental. This is where we'll focus most of our attention in this chapter.
These functions keep track of the system’s ability to do its job and are critical to effective system management. After all, without them, how would we know when our system requires management? As vital as these features are to manageability, we won’t discuss them in this chapter. Instead, they get an entire chapter of their own in Chapter 11.
Even in the absence of code changes, the ability to easily deploy, update, roll back, and scale system components is valuable, especially when there are many systems to manage. Obviously, this is useful during the initial deployment, but it remains useful throughout a system's lifetime, whenever the system has to be updated. Fortunately, Go's lack of an external runtime and its single-file executable artifacts make this an area in which it excels.
A key feature of cloud native systems is their distributed nature. It’s critical that components be able to quickly and accurately detect one another, a function called service discovery. Since service discovery is an architectural feature rather than a programmatic one, we won’t go too deeply into it in this book.
Store configuration in the environment.
Configuration—anything that’s likely to vary between environments—should always be cleanly separated from the code. While configuration can vary substantially across deploys, code does not. Configuration shouldn’t be baked into the code. Ever.
Storing configurations in version control—separately from the code—allows you to quickly roll back a configuration change if necessary and aids system re-creation and restoration. Some deployment frameworks, like Kubernetes, make this distinction naturally and relatively seamlessly by providing configuration primitives like the ConfigMap.
Yes, I’m repeating myself, but this bears repeating. Configuration files should be stored in version control before being deployed to the system. This makes it possible to review them before deployment, to quickly reference them afterward, and to quickly roll back a change if necessary. It’s also helpful if (and when) you need to re-create and restore your system.
Write your configuration files using a standard format like JSON, YAML, or TOML. We’ll cover some of these later in the chapter. If you must roll your own format, be sure that you’re comfortable with the idea of maintaining it—and forcing any future maintainers to deal with it—forever.
Don’t use nonzero default values unnecessarily. This is actually a good rule in general; there’s even a “Go proverb” about it.8 Whenever possible, the behavior that results from an undefined configuration should be acceptable, reasonable, and unsurprising. A simple, minimal configuration makes errors less likely.
name := os.Getenv("NAME")
place := os.Getenv("CITY")

fmt.Printf("%s lives in %s.\n", name, place)
if val, ok := os.LookupEnv(key); ok {
    fmt.Printf("%s=%s\n", key, val)
} else {
    fmt.Printf("%s not set\n", key)
}
package main

import (
    "flag"
    "fmt"
)

func main() {
    // Declare a string flag with a default value "foo"
    // and a short description. It returns a string pointer.
    strp := flag.String("string", "foo", "a string")

    // Declare number and Boolean flags, similar to the string flag.
    intp := flag.Int("number", 42, "an integer")
    boolp := flag.Bool("boolean", false, "a boolean")

    // Call flag.Parse() to execute command-line parsing.
    flag.Parse()

    // Print the parsed options and trailing positional arguments.
    fmt.Println("string:", *strp)
    fmt.Println("integer:", *intp)
    fmt.Println("boolean:", *boolp)
    fmt.Println("args:", flag.Args())
}
$ go run . -help
Usage of /var/folders/go-build618108403/exe/main:
-boolean
a boolean
-number int
an integer (default 42)
-string string
a string (default "foo")
$ go run . -boolean -number 27 -string "A string." Other things.
string: A string.
integer: 27
boolean: true
args: [Other things.]
package main

import (
    "fmt"
    "os"

    "github.com/spf13/cobra"
)

var strp string
var intp int
var boolp bool

var rootCmd = &cobra.Command{
    Use:  "flags",
    Long: "A simple flags experimentation command, built with Cobra.",
    Run:  flagsFunc,
}

func init() {
    rootCmd.Flags().StringVarP(&strp, "string", "s", "foo", "a string")
    rootCmd.Flags().IntVarP(&intp, "number", "n", 42, "an integer")
    rootCmd.Flags().BoolVarP(&boolp, "boolean", "b", false, "a boolean")
}

func flagsFunc(cmd *cobra.Command, args []string) {
    fmt.Println("string:", strp)
    fmt.Println("integer:", intp)
    fmt.Println("boolean:", boolp)
    fmt.Println("args:", args)
}

func main() {
    if err := rootCmd.Execute(); err != nil {
        fmt.Println(err)
        os.Exit(1)
    }
}
$ go run . --help
A simple flags experimentation command, built with Cobra.

Usage:
  flags [flags]

Flags:
  -b, --boolean         a boolean
  -h, --help            help for flags
  -n, --number int      an integer (default 42)
  -s, --string string   a string (default "foo")
$ go run . --boolean --number 27 --string "A string." Other things.
string: A string.
integer: 27
boolean: true
args: [Other things.]
var flagsCmd = &cobra.Command{
    Use:   "flags",
    Short: "Experiment with flags",
    Long:  "A simple flags experimentation command, built with Cobra.",
    Run:   flagsFunc,
}

var rootCmd = &cobra.Command{
    Use:  "cng",
    Long: "A super simple command.",
}
func init() {
    flagsCmd.Flags().StringVarP(&strp, "string", "s", "foo", "a string")
    flagsCmd.Flags().IntVarP(&intp, "number", "n", 42, "an integer")
    flagsCmd.Flags().BoolVarP(&boolp, "boolean", "b", false, "a boolean")

    rootCmd.AddCommand(flagsCmd)
}
$ go run . --help
A super simple command.

Usage:
  cng [command]

Available Commands:
  flags       Experiment with flags
  help        Help about any command

Flags:
  -h, --help   help for cng

Use "cng [command] --help" for more information about a command.
$ go run . help flags
A simple flags experimentation command, built with Cobra.

Usage:
  cng flags [flags]

Flags:
  -b, --boolean         a boolean
  -h, --help            help for flags
  -n, --number int      an integer (default 42)
  -s, --string string   a string (default "foo")
Can be mapped to corresponding fields in a specific struct type. For example, a configuration that contains the attribute host: localhost could be unmarshalled into a struct type that has a Host string field.
Can be decoded and unmarshalled into one or more, possibly nested, maps of type map[string]any. This can be convenient when you’re working with arbitrary configurations, but it’s awkward to work with.
type Config struct {
    Host string
    Port uint16
    Tags map[string]string
}
func Marshal(v any) ([]byte, error)
c := Config{
    Host: "localhost",
    Port: 1313,
    Tags: map[string]string{"env": "dev"},
}

bytes, err := json.Marshal(c)

fmt.Println(string(bytes))
{"Host":"localhost","Port":1313,"Tags":{"env":"dev"}}
func MarshalIndent(v any, prefix, indent string) ([]byte, error)
bytes, err := json.MarshalIndent(c, "", " ")

fmt.Println(string(bytes))
{
 "Host": "localhost",
 "Port": 1313,
 "Tags": {
  "env": "dev"
 }
}
func Unmarshal(data []byte, v any) error
c := Config{}
bytes := []byte(`{"Host":"127.0.0.1","Port":1234,"Tags":{"foo":"bar"}}`)

err := json.Unmarshal(bytes, &c)
{127.0.0.1 1234 map[foo:bar]}
c := Config{}
bytes := []byte(`{"Host":"127.0.0.1", "Food":"Pizza"}`)

err := json.Unmarshal(bytes, &c)
{127.0.0.1 0 map[]}
{"Host":"","Port":0,"Tags":null}
type User struct {
    Name string `example:"name"`
}
By default, a struct field will case-sensitively map to a JSON key of the exact same name. A tag overrides this default name by setting the first (or only) value in the tag’s options list.
Example: CustomKey string `json:"custom_key"`
By default, a field will always appear in the JSON, even if it’s empty. Using the omitempty option will cause fields to be skipped if they contain a zero-value. Note the leading comma in front of omitempty!
Example: OmitEmpty string `json:",omitempty"`
Fields using the - (dash) option will always be completely ignored during encoding and decoding.
Example: IgnoredName string `json:"-"`
type Tagged struct {
    // CustomKey will appear in JSON as the key "custom_key".
    CustomKey string `json:"custom_key"`

    // OmitEmpty will appear in JSON as "OmitEmpty" (the default),
    // but will only be written if it contains a nonzero value.
    OmitEmpty string `json:",omitempty"`

    // IgnoredName will always be ignored.
    IgnoredName string `json:"-"`

    // TwoThings will appear in JSON as the key "two_things",
    // but only if it isn't empty.
    TwoThings string `json:"two_things,omitempty"`
}
func Marshal(v any) ([]byte, error)
c := Config{
    Host: "localhost",
    Port: 1313,
    Tags: map[string]string{"env": "dev"},
}

bytes, err := yaml.Marshal(c)
host: localhost
port: 1313
tags:
    env: dev
func Unmarshal(data []byte, v any) error
// Caution: Indent this YAML with spaces, not tabs.
bytes := []byte(`
host: 127.0.0.1
port: 1234
tags:
  foo: bar
`)

c := Config{}

err := yaml.Unmarshal(bytes, &c)
{127.0.0.1 1234 map[foo:bar]}
Fields using the flow option will be marshalled using the flow style, which can be useful for structs, sequences, and maps.
Example: Flow map[string]string `yaml:",flow"`
The inline option causes all of a struct's fields or a map's keys to be processed as if they were part of the outer struct. For maps, keys must not conflict with the keys of other struct fields.
Example: Inline map[string]string `yaml:",inline"`
type TaggedMore struct {
    // Flow will be marshalled using a "flow" style
    // (useful for structs, sequences and maps).
    Flow map[string]string `yaml:",flow"`

    // Inlines a struct or a map, causing all of its fields
    // or keys to be processed as if they were part of the outer
    // struct. For maps, keys must not conflict with the yaml
    // keys of other struct fields.
    Inline map[string]string `yaml:",inline"`
}
var config Config
func loadConfiguration(filepath string) (Config, error) {
    dat, err := os.ReadFile(filepath) // Ingest file as []byte
    if err != nil {
        return Config{}, err
    }

    config := Config{}

    err = yaml.Unmarshal(dat, &config) // Do the unmarshal
    if err != nil {
        return Config{}, err
    }

    return config, nil
}
func startListening(updates <-chan string, errors <-chan error) {
    for {
        select {
        case filepath := <-updates:
            c, err := loadConfiguration(filepath)
            if err != nil {
                log.Println("error loading config:", err)
                continue
            }
            config = c

        case err := <-errors:
            log.Println("error watching config:", err)
        }
    }
}
func init() {
    updates, errors, err := watchConfig("config.yaml")
    if err != nil {
        panic(err)
    }

    go startListening(updates, errors)
}
func watchConfig(filepath string) (<-chan string, <-chan error, error)
func calculateFileHash(filepath string) (string, error) {
    file, err := os.Open(filepath) // Open the file for reading
    if err != nil {
        return "", err
    }
    defer file.Close() // Be sure to close your file!

    hash := sha256.New() // Use the Hash in crypto/sha256
    if _, err := io.Copy(hash, file); err != nil {
        return "", err
    }

    sum := fmt.Sprintf("%x", hash.Sum(nil)) // Get encoded hash sum

    return sum, nil
}
func watchConfig(filepath string) (<-chan string, <-chan error, error) {
    errs := make(chan error)
    changes := make(chan string)
    hash := ""

    go func() {
        ticker := time.NewTicker(time.Second)

        for range ticker.C {
            newhash, err := calculateFileHash(filepath)
            if err != nil {
                errs <- err
                continue
            }

            if hash != newhash {
                hash = newhash
                changes <- filepath
            }
        }
    }()

    return changes, errs, nil
}
func watchConfigNotify(filepath string) (<-chan string, <-chan error, error) {
    changes := make(chan string)

    watcher, err := fsnotify.NewWatcher()   // Get an fsnotify.Watcher
    if err != nil {
        return nil, nil, err
    }

    err = watcher.Add(filepath)             // Tell watcher to watch
    if err != nil {                         // our config file
        return nil, nil, err
    }

    go func() {
        changes <- filepath                 // First is ALWAYS a change

        for event := range watcher.Events { // Range over watcher events
            if event.Op&fsnotify.Write == fsnotify.Write {
                changes <- event.Name
            }
        }
    }()

    return changes, watcher.Errors, nil
}
This takes precedence over all other methods and can be useful during testing.
Viper is designed to be a companion to Cobra, which we introduced in “The Cobra command-line parser”.
Viper has full support for environment variables. Importantly, Viper treats environment variables as case-sensitive!
Out of the box, Viper supports JSON and YAML with the packages we introduced previously, as well as TOML, HCL, INI, envfile, and Java Properties files. It can also write configuration files to help bootstrap your configurations and even optionally supports live watching and rereading of configuration files.
Viper can access key-value stores like etcd or Consul and can watch them for changes.
viper.Set("Verbose", true)
viper.Set("LogFile", LogFile)
var rootCmd = &cobra.Command{ /* omitted for brevity */ }

func init() {
    rootCmd.Flags().IntP("number", "n", 42, "an integer")
    viper.BindPFlag("number", rootCmd.Flags().Lookup("number"))
}
n := viper.GetInt("number")
viper.BindEnv("id")                     // Bind "id" to var "ID"
viper.BindEnv("port", "SERVICE_PORT")   // Bind "port" to var "SERVICE_PORT"

id := viper.GetInt("id")
port := viper.GetInt("port")
viper.SetConfigName("config")       // Optional if the config has a file extension
viper.SetConfigType("yaml")
viper.AddConfigPath("/etc/service/")
viper.AddConfigPath("$HOME/.service")
viper.AddConfigPath(".")

if err := viper.ReadInConfig(); err != nil {
    panic(fmt.Errorf("fatal error reading config: %w", err))
}
viper.WatchConfig()

viper.OnConfigChange(func(e fsnotify.Event) {
    fmt.Println("Config file changed:", e.Name)
})
import _ "github.com/spf13/viper/remote"
func AddRemoteProvider(provider, endpoint, path string) error
viper.AddRemoteProvider("etcd", "http://127.0.0.1:4001", "/config/service.json")
viper.SetConfigType("json")
err := viper.ReadRemoteConfig()
viper.BindEnv("id")             // Will be upper-cased automatically
viper.SetDefault("id", "13")    // Default value is "13"

id1 := viper.GetInt("id")
fmt.Println(id1)                // 13

os.Setenv("ID", "50")           // Explicitly set the envvar

id2 := viper.GetInt("id")
fmt.Println(id2)                // 50
func getHandler(w http.ResponseWriter, r *http.Request) {
    vars := mux.Vars(r)         // Retrieve "key" from the request
    key := vars["key"]

    value, err := Get(key)      // Get value for key
    if err != nil {             // Unexpected error!
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }

    w.Write([]byte(value))      // Write the value to the response
}
// Set to true if you're working on the new storage backend
const useNewStorage bool = false

func getHandler(w http.ResponseWriter, r *http.Request) {
    vars := mux.Vars(r)
    key := vars["key"]

    var value string
    var err error

    if useNewStorage {
        value, err = NewGet(key)
    } else {
        value, err = Get(key)
    }

    if err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }

    w.Write([]byte(value))
}
func getHandler(w http.ResponseWriter, r *http.Request) {
    vars := mux.Vars(r)
    key := vars["key"]

    var value string
    var err error

    if FeatureEnabled("use-new-storage", r) {
        value, err = NewGet(key)
    } else {
        value, err = Get(key)
    }

    if err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }

    w.Write([]byte(value))
}

func FeatureEnabled(flag string, r *http.Request) bool {
    return viper.GetBool(flag)
}
type Enabled func(flag string, r *http.Request) (bool, error)
// The list of CIDR ranges associated with internal networks.
var privateCIDRs []*net.IPNet

// We use an init function to load the privateCIDRs slice.
func init() {
    for _, cidr := range []string{
        "10.0.0.0/8",
        "172.16.0.0/12",
        "192.168.0.0/16",
    } {
        _, block, _ := net.ParseCIDR(cidr)
        privateCIDRs = append(privateCIDRs, block)
    }
}

// fromPrivateIP receives the flag name (which it ignores) and the
// request. If the request's remote IP is in a private range per
// RFC1918, it returns true.
func fromPrivateIP(flag string, r *http.Request) (bool, error) {
    // Grab the host portion of the request's remote address
    remoteIP, _, err := net.SplitHostPort(r.RemoteAddr)
    if err != nil {
        return false, err
    }

    // Turn the remote address string into a net.IP
    ip := net.ParseIP(remoteIP)
    if ip == nil {
        return false, errors.New("couldn't parse ip")
    }

    // Loopbacks are considered "private."
    if ip.IsLoopback() {
        return true, nil
    }

    // Search the CIDRs list for the IP; return true if found.
    for _, block := range privateCIDRs {
        if block.Contains(ip) {
            return true, nil
        }
    }

    return false, nil
}
var enabledFunctions map[string]Enabled

func init() {
    enabledFunctions = map[string]Enabled{}
    enabledFunctions["use-new-storage"] = fromPrivateIP
}
func FeatureEnabled(flag string, r *http.Request) bool {
    // Explicit flags take precedence
    if viper.IsSet(flag) {
        return viper.GetBool(flag)
    }

    // Retrieve the flag function, if any. If none exists,
    // return false
    enabledFunc, exists := enabledFunctions[flag]
    if !exists {
        return false
    }

    // We now have the flag function: call it and return
    // the result
    result, err := enabledFunc(flag, r)
    if err != nil {
        log.Println(err)
        return false
    }

    return result
}
1 Brian W. Kernighan and P.J. Plauger, The Elements of Programming Style (McGraw-Hill, 1978).
2 Staff, America’s Test Kitchen, Perfect Pie: Your Ultimate Guide to Classic and Modern Pies, Tarts, Galettes, and More, America’s Test Kitchen, 2019.
3 They’re doing some pretty amazing things with genetic engineering. Don’t stop believing.
4 “Systems and Software Engineering: Vocabulary”, ISO/IEC/IEEE 24765:2010(E), December 15, 2010.
5 Byron Radle et al., “What Is Manageability?”, NI, National Instruments, July 1, 2024.
6 Or so I told my editors. Hi, Melissa!
7 This makes me sad. These are important topics, but we have to focus.
8 Rob Pike, “Go Proverbs”, Gopherfest, YouTube, November 18, 2015.
9 Neat trick, huh?
10 Well, like you.
11 Seriously, that really is what it stands for.
12 Also, I just love JSON so much.
13 Or that feature just hasn’t been implemented. I don’t know.
14 I’ve also seen “feature switch,” “feature flipper,” “feature toggle,” “conditional feature,” and more. The industry seems to have settled on “feature flag,” probably because the other names are just a little silly.
Data is not information, information is not knowledge, knowledge is not understanding, understanding is not wisdom.1
Clifford Stoll, High-Tech Heretic: Reflections of a Computer Contrarian
Distributed tracing (or simply tracing) follows a request as it propagates through a (typically distributed) system, allowing the entire end-to-end request flow to be reconstructed as a directed acyclic graph (DAG) called a trace. Analysis of these traces can provide insight into how a system’s components interact, making it possible to pinpoint failures and performance issues.
Distributed tracing will be discussed in more detail in “Distributed Tracing”.
Metrics involves the collection of numerical data points representing the state of various aspects of a system at specific points in time. Collections of data points, representing observations of the same subject at various times, are particularly useful for visualization and mathematical analysis and can be used to highlight trends, identify anomalies, and predict future behavior.
We’ll discuss more about metrics in “Metrics”.
Logging is the process of appending records of noteworthy events to an immutable record—the log—for later review or analysis. A log can take a variety of forms, from a continuously appended file on disk to a full-text search engine like Elasticsearch. Logs provide valuable, context-rich insight into application-specific events emitted by processes. However, it’s important that log entries are properly structured; not doing so can sharply limit their utility.
We’ll dive into logging in more detail in “Logging”.
The OpenTelemetry specifications describe the requirements and expectations for all OpenTelemetry APIs, SDKs, and data protocols.
Language-specific interfaces and implementations based on the specifications that can be used to add OpenTelemetry instrumentation to an application.
The concrete OpenTelemetry implementations that sit between the APIs and the Exporters, providing functionality such as (for example) state tracking and batching data for transmission. An SDK also offers a number of configuration options for behaviors such as request filtering and transaction sampling.
In-process SDK plug-ins that are capable of sending data to a specific destination, which may be local (such as a log file or stdout), or remote (such as Jaeger, or a commercial solution like Honeycomb or ServiceNow).7 Exporters decouple the instrumentation from the backend, making it possible to change destinations without having to reinstrument your code.
An optional, but very useful, vendor-agnostic service that can receive and process telemetry data before forwarding it to one or more destinations. It can be run either as a sidecar process alongside your application or as a standalone proxy elsewhere, providing greater flexibility for sending the application telemetry. This can be particularly useful in the kind of tightly controlled environments that are common in the enterprise.
A span describes a unit of work performed by a request, such as a fork in the execution flow or hop across the network, as it propagates through a system. Each span has an associated name, a start time, and a duration. They can be (and typically are) nested and ordered to model causal relationships.
A trace represents all of the events—individually represented as spans—that make up a request as it flows through a system. A trace may be thought of as a directed acyclic graph (DAG) of spans, or more concretely as a “stack trace” in which each span represents the work done by one component.
stdExporter, err := stdouttrace.New(
    stdouttrace.WithPrettyPrint(),
)
const jaegerEndpoint = "localhost:4317"

otlpExporter, err := otlptracegrpc.New(context.Background(),
    otlptracegrpc.WithEndpoint(jaegerEndpoint),
    otlptracegrpc.WithInsecure(),
)
otlptracegrpc.WithEndpoint
Used to define the URL that points to the target Jaeger collector endpoint.
otlptracegrpc.WithInsecure
Disables client transport security for the gRPC connection, just like grpc.WithInsecure back in “Implementing the gRPC client”. Don’t use insecure connections in production.
tp := sdktrace.NewTracerProvider(
    sdktrace.WithResource(res),
    sdktrace.WithSyncer(stdExporter),
    sdktrace.WithSyncer(otlpExporter),
)
otel.SetTracerProvider(tp)
gtp := otel.GetTracerProvider()
tr := otel.GetTracerProvider().Tracer("fibonacci")
type Tracer interface {
    Start(ctx context.Context, spanName string, opts ...trace.SpanOption) (context.Context, trace.Span)
}
const serviceName = "foo"

func main() {
    // EXPORTER SETUP OMITTED FOR BREVITY

    // Retrieve the Tracer from the otel TracerProvider.
    tr := otel.GetTracerProvider().Tracer(serviceName)

    // Start the root span; receive a child context (which now
    // contains the trace ID), and a trace.Span.
    ctx, sp := tr.Start(context.Background(), "main")
    defer sp.End()              // End completes the span.

    SomeFunction(ctx)
}
func SomeFunction(ctx context.Context) {
    tr := otel.GetTracerProvider().Tracer(serviceName)
    _, sp := tr.Start(ctx, "SomeFunction")
    defer sp.End()

    // Do something MAGICAL here!
}
ctx, sp := tr.Start(ctx, "attributesAtCreation",
    trace.WithAttributes(
        attribute.String("greeting", "hello"),
        attribute.String("foo", "bar"),
    ))
defer sp.End()
answer := LifeTheUniverseAndEverything()
span.SetAttributes(attribute.Int("answer", answer))
serviceName    = "Fibonacci"
serviceVersion = "0.0.2"

resources := resource.NewWithAttributes(
    semconv.SchemaURL,
    semconv.ServiceName(serviceName),
    semconv.ServiceVersion(serviceVersion),
)
resources, err := resource.Merge(resource.Default(), resources)
if err != nil {
    return nil, fmt.Errorf("failed to merge resources: %w", err)
}
provider := sdktrace.NewTracerProvider(
    ...
    sdktrace.WithResource(resources),
)
span.AddEvent("Acquiring mutex lock")
mutex.Lock()

// Do something amazing.

span.AddEvent("Releasing mutex lock")
mutex.Unlock()
span.AddEvent("Canceled by external signal", trace.WithAttributes(
    attribute.Int("pid", 1234),
    attribute.String("signal", "SIGHUP")))
func main() {
    http.HandleFunc("/", helloGoHandler)

    log.Fatal(http.ListenAndServe(":3000", nil))
}
func NewHandler(handler http.Handler, operation string, opts ...Option) http.Handler
func main() {
    http.Handle("/",
        otelhttp.NewHandler(http.HandlerFunc(helloGoHandler), "root"))

    log.Fatal(http.ListenAndServe(":3000", nil))
}
func main() {
    r := mux.NewRouter()
    r.Handle("/",
        otelhttp.NewHandler(http.HandlerFunc(helloGoHandler), "root"))

    log.Fatal(http.ListenAndServe(":3000", r))
}
s := grpc.NewServer()
pb.RegisterKeyValueServer(s, &server{})
s := grpc.NewServer(
    grpc.UnaryInterceptor(otelgrpc.UnaryServerInterceptor()),
    grpc.StreamInterceptor(otelgrpc.StreamServerInterceptor()),
)
pb.RegisterKeyValueServer(s, &server{})
func printSpanHandler(w http.ResponseWriter, req *http.Request) {
    ctx := req.Context()                    // Get the request Context

    span := trace.SpanFromContext(ctx)      // Get the current span

    fmt.Printf("current span: %v\n", span)  // Why not print the span?
}
This will do the Fibonacci computation proper—at the request of the service handler—by recursively calling itself, with each call generating its own span.
This is an HTTP handler function as defined by the net/http package, which will be used just like in “Building an HTTP Server with net/http” to receive the client request, call the service API, and return the result in the response.
In the main function, the OpenTelemetry exporters are created and registered, the service handler function is provided to the HTTP framework, and the HTTP server is started.
const serviceName = "Fibonacci"

func Fibonacci(ctx context.Context, n int) int {
    ctx, sp := otel.GetTracerProvider().
        Tracer(serviceName).
        Start(ctx, "Fibonacci",
            trace.WithAttributes(attribute.Int("fibonacci.n", n)))
    defer sp.End()

    result := 1
    if n > 1 {
        a := Fibonacci(ctx, n-1)
        b := Fibonacci(ctx, n-2)
        result = a + b
    }

    sp.SetAttributes(attribute.Int("fibonacci.result", result))

    return result
}
func fibHandler(w http.ResponseWriter, req *http.Request) {
    ctx := req.Context()

    // Get the Span associated with the current context and
    // attach the parameter and result as attributes.
    sp := trace.SpanFromContext(ctx)

    args := req.URL.Query()["n"]
    if len(args) != 1 {
        msg := "wrong number of arguments"
        sp.SetStatus(codes.Error, msg)
        http.Error(w, msg, http.StatusBadRequest)
        return
    }

    sp.SetAttributes(attribute.String("fibonacci.argument", args[0]))

    n, err := strconv.Atoi(args[0])
    if err != nil {
        msg := fmt.Sprintf("couldn't parse index n: %s", err.Error())
        sp.SetStatus(codes.Error, msg)
        http.Error(w, msg, http.StatusBadRequest)
        return
    }

    sp.SetAttributes(attribute.Int("fibonacci.parameter", n))

    // Call the child function, passing it the request context.
    result := Fibonacci(ctx, n)
    sp.SetAttributes(attribute.Int("fibonacci.result", result))

    // Finally, send the result back in the response.
    fmt.Fprintln(w, result)
}
const jaegerEndpoint = "localhost:4317"

func newTracerProvider(ctx context.Context) (*sdktrace.TracerProvider, error) {
    // Create and configure the OTLP exporter for Jaeger
    otlpExporter, err := otlptracegrpc.New(ctx,
        otlptracegrpc.WithEndpoint(jaegerEndpoint),
        otlptracegrpc.WithInsecure(),
    )
    if err != nil {
        return nil, fmt.Errorf("failed to build OtlpExporter: %w", err)
    }

    // Create and configure the TracerProvider using the
    // newly created exporter.
    tp := sdktrace.NewTracerProvider(
        sdktrace.WithResource(res),
        sdktrace.WithBatcher(otlpExporter),
    )

    return tp, nil
}
func main() {
    ctx, cancel := context.WithCancel(context.Background())
    defer cancel()

    tp, err := newTracerProvider(ctx)
    if err != nil {
        slog.ErrorContext(ctx, err.Error())
        return
    }

    // Handle shutdown properly so nothing leaks
    defer func() { tp.Shutdown(ctx) }()

    // Registers tp as the global trace provider to allow
    // auto-instrumentation to access it
    otel.SetTracerProvider(tp)

    fmt.Println("Browse to localhost:3000?n=6")

    http.Handle("/",
        otelhttp.NewHandler(http.HandlerFunc(fibHandler), "root"))

    if err := http.ListenAndServe(":3000", nil); err != nil {
        slog.ErrorContext(ctx, err.Error())
        return
    }
}
$ docker run -d --name jaeger \
    -p 4317:4317 \
    -p 16686:16686 \
    jaegertracing/all-in-one:1.52
$ go run .
$ curl localhost:3000?n=6
13
{
    "Name": "root",
    "SpanContext": {
        "TraceID": "2fa1365e3fe3b603ab5268a64725c647",
        "SpanID": "bc7f17daf0dc845c",
        "TraceFlags": "01"
    },
    "Parent": {
        "TraceID": "00000000000000000000000000000000",
        "SpanID": "0000000000000000",
        "TraceFlags": "00"
    },
    "StartTime": "2024-01-01T13:44:24.600967-05:00",
    "EndTime": "2024-01-01T13:44:24.601064604-05:00",
    "Attributes": [
        {
            "Key": "fibonacci.parameter",
            "Value": { "Type": "INT64", "Value": 6 }
        },
        {
            "Key": "fibonacci.result",
            "Value": { "Type": "INT64", "Value": 13 }
        }
    ]
}
From an engineering standpoint, in reality, the question of push versus pull largely doesn’t matter. In either case, there’s advantages and disadvantages, and with engineering effort, you can work around both cases.14
prometheusExporter, err := prometheus.New()
// Now we can register it as the otel meter provider.
mp := sdkmetric.NewMeterProvider(sdkmetric.WithReader(prometheusExporter))

// Set it as the global meter provider.
otel.SetMeterProvider(mp)
http.Handle
To register a handler function that implements the http.Handler interface
http.ListenAndServe
To start the server listening
// Register the exporter as the handler for the "/metrics" pattern.
http.Handle("/metrics", promhttp.Handler())

// Start the HTTP server listening on port 3000.
log.Fatal(http.ListenAndServe(":3000", nil))
meter := otel.GetMeterProvider().Meter("fibonacci")
Explicitly invoked to record a metric, as we’ll see in “Synchronous instruments”.
Also called observers, can monitor a specific property and are asynchronously called by the SDK during collection. We’ll demonstrate in “Asynchronous instruments”.
Track a non-negative value that can only increase,15 like the number of requests served, tasks completed, or errors observed.
Track a value that can be either incremented or decremented. They’re used for sums that can go up and down, like the current number of concurrent requests or active threads.
Track a value that can go arbitrarily up or down. They’re useful for measured values like temperatures or current memory usage.
Capture a distribution of observations, like request durations or response sizes, and count them in ranged buckets. They also provide a sum of all observed values.
| Instrument type | Synchronous | Asynchronous |
|---|---|---|
| Counter | ✓ | ✓ |
| Up-down counter | ✓ | ✓ |
| Gauge | | ✓ |
| Histogram | ✓ | |
func (m Meter) Int64Counter(name string, options ...Int64CounterOption) (Int64Counter, error)
// The requests counter instrument. As a synchronous instrument,
// we'll need to keep it so we can use it later to record data.
var requests metric.Int64Counter

func buildRequestsCounter(meter metric.Meter) error {
    var err error

    // Get an Int64Counter for a metric called "fibonacci_requests_total".
    requests, err = meter.Int64Counter("fibonacci_requests_total",
        metric.WithDescription("Total number of Fibonacci requests."),
    )

    return err
}
// Define our attributes here so that we can easily reuse them.
var attributes = []attribute.KeyValue{
    attribute.Key("application").String(serviceName),
    attribute.Key("container_id").String(os.Getenv("HOSTNAME")),
}

func Fibonacci(ctx context.Context, n int) chan int {
    // Use the Add method on our metric.Int64Counter instance
    // to increment the counter value.
    requests.Add(ctx, 1, metric.WithAttributes(attributes...))

    // The rest of the function...
}
func (m Meter) Int64ObservableUpDownCounter(name string, options ...Int64ObservableUpDownCounterOption) (Int64ObservableUpDownCounter, error)
type Float64Callback func(context.Context, Float64Observer) error

type Int64Callback func(context.Context, Int64Observer) error
func buildRuntimeObservers(meter metric.Meter) error {
    var err error
    var m runtime.MemStats

    _, err = meter.Int64ObservableUpDownCounter("fibonacci_memory_usage_bytes",
        metric.WithInt64Callback(
            func(_ context.Context, result metric.Int64Observer) error {
                runtime.ReadMemStats(&m)
                result.Observe(int64(m.Sys), metric.WithAttributes(attributes...))
                return nil
            },
        ),
        metric.WithDescription("Amount of memory used."),
    )
    if err != nil {
        return err
    }

    _, err = meter.Int64ObservableGauge("fibonacci_num_goroutines",
        metric.WithInt64Callback(
            func(_ context.Context, o metric.Int64Observer) error {
                o.Observe(int64(runtime.NumGoroutine()), metric.WithAttributes(attributes...))
                return nil
            },
        ),
        metric.WithDescription("Number of running goroutines."),
    )
    if err != nil {
        return err
    }

    return nil
}
$ go run .
scrape_configs:
  - job_name: fibonacci
    scrape_interval: 5s
    static_configs:
      - targets: ['host.docker.internal:3000']
$ docker run -d --name prometheus \
    -p 9090:9090 \
    -v "${PWD}/prometheus.yml:/etc/prometheus/prometheus.yml" \
    prom/prometheus:v2.23.0
$ curl localhost:3000?n=6
13
$ curl localhost:3000/metrics
# HELP fibonacci_memory_usage_bytes Amount of memory used.
# TYPE fibonacci_memory_usage_bytes gauge
fibonacci_memory_usage_bytes{application="fibonacci",container_id=""} 5.791e+07
# HELP fibonacci_num_goroutines Number of running goroutines.
# TYPE fibonacci_num_goroutines gauge
fibonacci_num_goroutines{application="fibonacci",container_id=""} 6
# HELP fibonacci_requests_total Total number of Fibonacci requests.
# TYPE fibonacci_requests_total counter
fibonacci_requests_total{application="fibonacci",container_id=""} 21891
The fibonacci_requests_total metric after three calls to the Fibonacci service.

2024/01/09 02:15:10AM User 12345: GET /help in 23ms
2024/01/09 02:15:11AM Database error: connection reset by peer
{"time":1604888110,"level":"info","method":"GET","path":"/help","duration":23,"message":"Access"}
{"time":1604888111,"level":"error","error":"connection reset by peer","database":"user","message":"Database error"}
time
A timestamp, which is a piece of contextual information that’s critical for tracking and correlating issues. Note that the JSON example is also in an easily parsable format that’s far less computationally expensive to extract meaning from than the first, barely structured example. When you’re processing billions of log events, little things add up.
level
A log level, which is a label that indicates the level of importance for the log event. Frequently used levels include INFO, WARN, and ERROR. These are also key for filtering out low-priority messages that might not be relevant in production.
These contain background information that provides insight into the state of the application at the time of the message. The entire point of a log event is to express this context information.
func Print(v ...any)
func Printf(format string, v ...any)
func Println(v ...any)
package main

import "log"

func main() {
    log.Print("Hello, world!")
}
$ go run .
2024/01/10 09:15:39 Hello, world!
func Fatal(v ...any)
func Fatalf(format string, v ...any)
func Fatalln(v ...any)
func Panic(v ...any)
func Panicf(format string, v ...any)
func Panicln(v ...any)
log.SetFlags(log.Ldate | log.Ltime | log.Lshortfile)
2024/01/10 10:14:36 main.go:7: Hello, world!
package main

import "log/slog"

func main() {
    slog.Info("Hello, world!")
}
$ go run . 2024/01/10 14:03:19 INFO Hello, world!
| Log level | Constant | Int value | Output function |
|---|---|---|---|
| Debug | slog.LevelDebug | -4 | slog.Debug |
| Info | slog.LevelInfo | 0 | slog.Info |
| Warn | slog.LevelWarn | 4 | slog.Warn |
| Error | slog.LevelError | 8 | slog.Error |
slog.Log(context.TODO(), 2, "Hello, world!")
2024/01/10 14:03:19 INFO+2 Hello, world!
logger := slog.Default()
logger.Info("Hello, world!")
slog.Info("Hello", "number", 3)
2024/01/10 23:00:00 INFO Hello number=3
slog.Info("Hello", slog.Int("number", 3))
logger := slog.Default()
logger2 := logger.With("url", r.URL)
logger := slog.New(slog.NewJSONHandler(os.Stdout, nil))
logger.Info("Hello, world!", "number", 3)
{"time":"2023-11-10T23:00:00Z","level":"INFO","msg":"Hello, world!","number":3}
1 Clifford Stoll, High-Tech Heretic: Reflections of a Computer Contrarian (Random House, 2000).
2 Interestingly, this was also just after AWS launched its Lambda functions as a service (FaaS) offering. Coincidence? Maybe.
3 Assuming, of course, that all of our network and platform configurations are correct!
4 I’m not one of the cool kids, but I still call it that anyway.
5 In addition to Go, implementations exist for Python, Java, JavaScript, C#/.NET, C++, Rust, PHP, Erlang/Elixir, Ruby, and Swift.
6 If you’ve never seen Charity Majors’ blog, I recommend that you check it out immediately. It’s one part genius plus one part experience, tied together with rainbows, cartoon unicorns, and a generous helping of rude language.
7 Formerly known as Lightstep.
8 Benjamin H. Sigelman et al., “Dapper, a Large-Scale Distributed Systems Tracing Infrastructure”, Google Technical Report, April 2010.
9 Which you probably do.
10 Recall that the name “mux” is short for “HTTP request multiplexer.”
11 That wins the record for longest package name, at least in this book.
12 It can also refer to the numerical relationship between two database tables (i.e., one-to-one, one-to-many, or many-to-many), but that definition is arguably less relevant here.
13 Whatever “better” means.
14 Kiran Oliver, “Exploring Prometheus Use Cases with Brian Brazil”, The New Stack Makers, October 30, 2016.
15 I’m quite fond of the technical term for such a value: monotonically increasing. Pure poetry.
16 Any wrapping in the example is for the benefit of formatting for presentation only. Don’t use line breaks in your log events if you can help it.
17 If they are, why are you producing them at all?
18 This is a Go book, after all. At least that’s what I keep telling my incredibly patient editors.
No technology that’s connected to the Internet is unhackable.1
Abhijit Naskar, The Gospel of Technology
Go’s garbage collection mechanism eliminates an entire class of vulnerabilities that are common in languages where manual memory management is required. So common, in fact, that the NSA placed Go on its list of languages recommended for their memory safety, alongside C#, Java, Python, Rust, and Swift.
Go’s type system minimizes common vulnerabilities like buffer overflows and type mismatches, which are common in dynamically typed languages.
Go is designed for simplicity and readability, which is further reinforced by a culture that actively discourages magic and cleverness. This clarity reduces the likelihood of security flaws going unnoticed.
Go’s concurrency model helps developers avoid common concurrency issues like race conditions that can be exploited in security attacks.
Go’s standard library includes well-implemented cryptographic packages, such as crypto/tls, that provide strong cryptographic primitives. These libraries are vetted by the community and are continually refined to address new security vulnerabilities and to adhere to best practices.
The introduction of Go Modules provided a reliable system for managing dependencies, reducing the risk of introducing security vulnerabilities via outdated or compromised dependencies.
Go’s toolchain includes built-in tools like go vet—complemented by third-party linters like gosec—that can aid in identifying security vulnerabilities early in the development cycle.
func authenticateUser(username, password string, db *sql.DB) (bool, error) {
    // Vulnerable SQL query
    query := "SELECT * FROM users WHERE username = '%s' AND password = '%s'"

    // Executing the SQL query
    results, err := db.Query(fmt.Sprintf(query, username, password))
    if err != nil {
        return false, fmt.Errorf("error executing query: %w", err)
    }
    defer results.Close()

    // Check if any rows returned
    return results.Next(), nil
}
SELECT * FROM users WHERE username = 'admin' --' AND password = 'foo'
package main

import (
    "io"
    "net/http"
)

func echoHandler(w http.ResponseWriter, r *http.Request) {
    io.WriteString(w, r.URL.Query().Get("param"))
}

func main() {
    http.HandleFunc("/", echoHandler)
    http.ListenAndServe(":8000", nil)
}
http://example.com/get-file?file=report.pdf
http://example.com/get-file?file=../../../../etc/passwd
package main

import "net/http"

func main() {
    dir := http.Dir("/var/www/myapp/files/")    // Define the allowed path
    fs := http.FileServer(dir)                  // Get the file server handler

    http.Handle("/", fs)                        // Register the handler

    http.ListenAndServe(":8080", nil)
}
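When a handler must build file paths itself rather than delegating to http.FileServer, the untrusted path segment can be validated explicitly. The following sketch uses a hypothetical safeJoin helper built on the standard filepath.IsLocal function (Go 1.20+), which reports whether a path stays within the directory it's joined to:

```go
package main

import (
	"fmt"
	"path/filepath"
)

// safeJoin is a hypothetical helper: it cleans the untrusted path
// segment and rejects anything—absolute paths, ".." traversal—that
// could escape the base directory.
func safeJoin(base, untrusted string) (string, error) {
	clean := filepath.Clean(untrusted)

	// filepath.IsLocal (Go 1.20+) reports whether the path is
	// relative and stays within the directory it's joined to.
	if !filepath.IsLocal(clean) {
		return "", fmt.Errorf("illegal file path: %s", untrusted)
	}

	return filepath.Join(base, clean), nil
}

func main() {
	fmt.Println(safeJoin("/var/www/myapp/files", "report.pdf"))
	fmt.Println(safeJoin("/var/www/myapp/files", "../../../../etc/passwd"))
}
```

Run against the earlier attack URL, the traversal attempt is rejected outright instead of being "corrected," consistent with the reject-don't-repair guidance that follows.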
This was already hinted at in “To Trust, or Not to Trust”, but it’s worth spelling out: always identify all of your data sources (databases, file streams, user input,5 etc.) and classify them as either “trusted” or “untrusted.” Data from all untrusted sources should be carefully validated and sanitized.
All input validation should be performed on the server side (typically by the service itself) where the processing can’t be easily manipulated by a bad actor.
Ideally, there should be a single, centralized input validation routine. This ensures that the validation logic is consistent throughout the entire application and is easier to maintain and test.
If input fails validation, it should be immediately rejected—don’t try to correct it. This is important not only from a security standpoint but from the perspective of data consistency and integrity as well, since data is often used across multiple systems and applications.
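Putting these practices together, here is a minimal sketch of a centralized, reject-don't-correct validation routine. The ValidateUsername function and its 3-to-16-word-character allow-list rule are hypothetical examples, not a prescription:

```go
package main

import (
	"fmt"
	"regexp"
)

// usernamePattern is a hypothetical allow-list rule: 3 to 16 ASCII
// word characters. Allow-lists are generally safer than deny-lists.
var usernamePattern = regexp.MustCompile(`^\w{3,16}$`)

// ValidateUsername is the single place usernames are checked, so the
// rule stays consistent across the application. Invalid input is
// rejected outright—never "corrected."
func ValidateUsername(s string) error {
	if !usernamePattern.MatchString(s) {
		return fmt.Errorf("invalid username: %q", s)
	}
	return nil
}

func main() {
	fmt.Println(ValidateUsername("gopher"))
	fmt.Println(ValidateUsername("no spaces!"))
}
```

Because every caller funnels through the same function, tightening the rule later is a one-line change rather than a hunt through scattered ad hoc checks.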
package main

import (
    "fmt"
    "unicode/utf8"
)

func main() {
    valid := "äǒů"
    invalid := string([]byte{0xfd, 0xfe, 0xff})

    fmt.Println(valid, utf8.ValidString(valid))
    fmt.Println(invalid, utf8.ValidString(invalid))
}
äǒů true ��� false
package main

import (
    "fmt"
    "net/url"
)

func CanonicalizeInput(in string) (string, error) {
    var err error
    var prev string
    var count int

    for in != prev {
        prev = in

        if count > 10 {
            return "", fmt.Errorf("too many escape layers")
        }

        if in, err = url.QueryUnescape(in); err != nil {
            return "", err
        }

        count++
    }

    return in, nil
}

func main() {
    doubleEncoded := "%253Cfoo%253E"    // Double-encoded "<foo>"

    canonicalized, err := CanonicalizeInput(doubleEncoded)
    if err != nil {
        fmt.Println("Error canonicalizing:", err)
        return
    }

    fmt.Println("Canonicalized string:", canonicalized)
}
package main

import "fmt"

var allowed map[rune]bool

func init() {
    allowed = make(map[rune]bool)

    for r := 'A'; r <= 'Z'; r++ {
        allowed[r] = true
    }
    for r := 'a'; r <= 'z'; r++ {
        allowed[r] = true
    }

    allowed[' '] = true
    allowed[','] = true
}

func ValidateHazardous(str string) error {
    for i, r := range str {
        if !allowed[r] {
            return fmt.Errorf("hazardous character at position %d", i)
        }
    }

    return nil
}

func main() {
    if err := ValidateHazardous("Hello, Ğö"); err != nil {
        fmt.Println(err)
    }
}
hazardous character at position 7
package main

import (
    "fmt"
    "strconv"
)

func main() {
    strs := []string{"42", "puppies"}

    for _, v := range strs {
        i, err := strconv.Atoi(v)
        fmt.Println(i, err)
    }
}
42 <nil>
0 strconv.Atoi: parsing "puppies": invalid syntax
package main

import (
    "fmt"

    "github.com/microcosm-cc/bluemonday"
)

func main() {
    // Unsanitized HTML input
    in := `<a onblur="alert(secret)" href="http://oreilly.com">O'Reilly</a>`

    // The strict policy will strip all elements and attributes
    p := bluemonday.StrictPolicy()

    out := p.Sanitize(in)   // Sanitize the input

    fmt.Println(out)        // O'Reilly
}
When dynamic content is inserted into HTML, characters like < and > should be encoded to their corresponding entities (&lt; and &gt;). This prevents any tags in the content from being interpreted as actual HTML tags. This is similar to the input sanitization method we covered in “Converting special characters to HTML entities”.
When incorporating user-provided data into SQL queries, it’s crucial to escape specific characters that could otherwise be misused to alter the SQL command, leading to SQL injection vulnerabilities. We’ll cover how to use parameterized queries to combat this case in “Parameterized Queries”.
In URLs, characters like spaces, ampersands, and slashes should be URL-encoded so that the URLs are interpreted correctly by web servers and browsers without accidentally delimiting URL parameters or changing the intended path. The standard net/url package includes the PathEscape and QueryEscape functions for this purpose, which escape a string by replacing any special characters with %XX sequences.
When inserting content into JavaScript code, characters like quotes and backslashes should be escaped (\", \', \\) to prevent breaking out of strings in scripts and inadvertently executing malicious code.
Grants access on the basis of a secret, typically a password or passphrase, that’s known only to the user and the system. It’s by far the most common form of authentication due to its familiarity, simplicity, and ease of implementation, but its efficacy depends heavily on users choosing strong passwords and keeping them confidential. We’ll cover password-based authentication in “Password-Based Authentication”.
In this system, a user enters their credentials one time and receives a digital token in exchange that then provides temporary access. This provides an additional layer of security, and the stateless nature of this approach makes it ideal for distributed systems where maintaining state is challenging. It does add a fair amount of complexity, though. We’ll cover token-based authentication in “Token-Based Authentication”.
This uses digital certificates to verify the identity of users, devices, or services, leveraging public key infrastructure (PKI) for enhanced security. This method involves issuing a certificate by a trusted authority after verifying identity credentials, and it uses cryptographic techniques to ensure secure, mutual authentication between parties without transmitting sensitive information.
This combines multiple authentication methods (something you know, something you have, something you are) to verify identity. MFA is more complex to implement than any one form of authentication, and users don’t exactly love having to authenticate more than once, but it’s the de facto standard for user authentication because multiple authentication mechanisms have to be broken to circumvent it. We’re not going to cover MFA explicitly, but you should absolutely know about it.
These attacks attempt to guess every possible combination of passwords until the correct one is found. This kind of attack is very resource-intensive, and can be mitigated by implementing account lockout policies, strong password policies, and CAPTCHA.
These attacks are similar to brute force attacks, except that they use a list of common passwords instead of all possible combinations. Dictionary attacks can be foiled using the same techniques as brute force attacks, but especially by using complex passwords that are not simple words or common password combinations.
These attacks use a precomputed table of hashes for a very large set of character combinations, which an attacker uses to quickly discover what passwords correspond to a given hash (hashes will be introduced in the next section). This method is effective against systems that use unsalted hashes for storing passwords.
These attacks attempt to use stolen account credentials to gain access to user accounts on another unrelated system. These attacks depend on the fact that users often reuse their passwords across multiple services, but they can be mitigated by enforcing MFA.
GenerateFromPassword(password []byte, cost int) ([]byte, error)
Accepts a password string and a “cost” value (more on this soon) and returns the bcrypt hash of the password.
CompareHashAndPassword(hashedPassword, password []byte) error
Compares a bcrypt-hashed password with its possible plain-text equivalent and returns nil if they match or an error value if they don’t.
package main

import (
    "fmt"

    "golang.org/x/crypto/bcrypt"
)

func main() {
    // The password bytes
    password := []byte("password123")

    // Apply the bcrypt algorithm with the default cost
    hash, _ := bcrypt.GenerateFromPassword(password, bcrypt.DefaultCost)

    fmt.Println("Password:", string(password))
    fmt.Println("Hashed:  ", string(hash))

    // Does the hash match the original password?
    if err := bcrypt.CompareHashAndPassword(hash, password); err != nil {
        fmt.Println("Result: Password mismatch")
    } else {
        fmt.Println("Result: Password match!")
    }
}
Password: password123
Hashed:   $2a$10$lryHYRUgH4ZP7GGy.HZTY.01HlY5hxqxMS7rT7iL3VGxgOIlZepJa
Result: Password match!
Specifies the token type and the algorithm used for the signature
Contains the claims
Used to verify the token and ensure that it hasn’t been tampered with
hhhhhh.pppppp.ssssss
{"alg":"HS256","typ":"JWT"}
{"iat":1715524558,"name":"Matt Titmus"}
HMACSHA256(secret, base64UrlEncode(header) + "." + base64UrlEncode(payload))
eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.
eyJpYXQiOjE3MTU1MjQ1NTgsIm5hbWUiOiJNYXR0IFRpdG11cyJ9.
2NS8o1huBkPZ-3-7W2LrgmOunzOjrW5LR7foC2ypJO0
// Secret is used for signing our token.
// Never, ever check secrets into source code.
var secret = []byte(os.Getenv("JWT_KEY"))

func buildToken(username string) (string, error) {
    issuedAt := time.Now()
    expirationTime := issuedAt.Add(time.Hour)

    // Define our claims map
    claims := jwt.MapClaims{
        "iat":  issuedAt.Unix(),
        "exp":  expirationTime.Unix(),
        "name": username,
    }

    // Create a new token, specifying its signing method and claims.
    token := jwt.NewWithClaims(jwt.SigningMethodHS256, claims)

    // Sign and get the complete encoded token as a string using the secret.
    return token.SignedString(secret)
}
func main() {
    http.HandleFunc("/authenticate", authenticateHandler)
    http.ListenAndServe(":8000", nil)
}

func authenticateHandler(w http.ResponseWriter, r *http.Request) {
    // This is required to populate r.Form.
    if err := r.ParseForm(); err != nil {
        w.WriteHeader(http.StatusBadRequest)
        return
    }

    // Retrieve and validate the POST-ed credentials.
    username := r.Form.Get("username")
    password := r.Form.Get("password")

    // Authenticate the password, responding to errors appropriately.
    valid, err := authenticatePassword(username, password)
    if err != nil {
        w.WriteHeader(http.StatusInternalServerError)
        return
    } else if !valid {
        w.WriteHeader(http.StatusUnauthorized)
        return
    }

    // Password is valid; build a new token using our buildToken function.
    tokenString, err := buildToken(username)
    if err != nil {
        w.WriteHeader(http.StatusInternalServerError)
        return
    }

    // Respond with the new token string.
    fmt.Fprint(w, tokenString)
}
curl -X POST -d "username=user" -d "password=SuperSecret" http://localhost:8000/authenticate
type CustomClaims struct {
    Name string `json:"name"`
    jwt.RegisteredClaims
}
// Secret is used for verifying our token.
// Never, ever check secrets into source code.
var secret = []byte(os.Getenv("JWT_KEY"))

func keyFunc(token *jwt.Token) (any, error) {
    return secret, nil
}
func verifyToken(tokenString string) (*CustomClaims, error) {
    token, err := jwt.ParseWithClaims(tokenString, &CustomClaims{}, keyFunc)
    if err != nil {
        return nil, err
    }

    claims, ok := token.Claims.(*CustomClaims)
    if !ok {
        return nil, fmt.Errorf("unknown claims type")
    }

    return claims, nil
}
func verifyHandler(w http.ResponseWriter, r *http.Request) {
    header := r.Header.Get("Authorization")
    token := strings.TrimPrefix(header, "Bearer ")

    if header == "" || token == header {
        log.Println("Missing authorization header")
        w.WriteHeader(http.StatusUnauthorized)
        return
    }

    claims, err := verifyToken(token)
    if err != nil {
        log.Println("invalid authorization token:", err)
        w.WriteHeader(http.StatusUnauthorized)
        return
    }

    fmt.Fprintf(w, "Welcome back, %s!\n", claims.Name)
}
package main

import (
    "log"
    "net"

    "google.golang.org/grpc"
    "google.golang.org/grpc/credentials"
)

func main() {
    // Specify the credentials to use
    creds, err := credentials.NewServerTLSFromFile("cert.pem", "key.pem")
    if err != nil {
        log.Fatalf("failed to setup tls: %v", err)
    }

    // Create a gRPC server and register our server with it
    s := grpc.NewServer(grpc.Creds(creds))
    pb.RegisterKeyValueServer(s, &server{})

    // Open a listening port on 50051
    lis, err := net.Listen("tcp", ":50051")
    if err != nil {
        log.Fatalf("failed to listen: %v", err)
    }

    // Start accepting connections on the listening port
    if err := s.Serve(lis); err != nil {
        log.Fatalf("failed to serve: %v", err)
    }
}
Hashes can be generated from input of any size, making tampering or changes evident.
As mentioned in “Hashing passwords”, systems will typically store hashes of passwords instead of the passwords themselves. When a user logs in, the system hashes the password entered and compares it to the stored hash.
Hash functions are used to generate checksums for data blocks being transmitted over a network. The checksum helps ensure that the data has not been altered during transmission, allowing for error detection.
Each block in a blockchain is connected to the previous one via a cryptographic hash, ensuring the security and immutability of the entire chain.
Hashes can be used to generate unique identifiers for large sets of data, which are useful in databases for quick data retrieval without data conflicts.
Hashing is used in various cryptographic protocols to commit to a chosen value while keeping it hidden, only revealing the value at a later time. This technique ensures that the commitment is kept unchanged once made.
The MD5 algorithm is probably the most popular hashing algorithm in use today, being widely used for verifying data integrity and checksums. You’ll sometimes see SHA-1 used in some legacy systems, and it can even be seen in some older SSL/TLS certificates and digital signatures. Both of these are generally considered obsolete for cryptographic purposes and are not recommended for new applications.
These algorithms are designed to balance speed and security. SHA-256 is secure, widely available, and suitable for general cryptographic use, with the newer SHA-3 providing a modern alternative. BLAKE2b (or just BLAKE2) and BLAKE2s are less widely supported but are considered versatile, fast, and secure. BLAKE2b is optimized for 64-bit platforms, BLAKE2s for 8-bit to 32-bit platforms.
BLAKE2 is generally considered to be the strongest and most flexible of these, but SHA-256 is perfectly good if that’s unavailable.
Algorithms like Argon2, bcrypt, and scrypt are specialized for hashing passwords. They’re designed to be computationally expensive to resist brute-force attacks, and they include some kind of salting strategy to foil rainbow table attacks.
package main

import (
    "crypto/md5"
    "crypto/sha256"
    "fmt"

    "golang.org/x/crypto/blake2b"
    "golang.org/x/crypto/sha3"
)

func main() {
    str := []byte("Welcome to Cloud Native Go, Second Edition!")

    hMd5 := md5.New()
    hSha256 := sha256.New()
    hSha3 := sha3.New256()
    hBlake2b, _ := blake2b.New256(nil)

    hMd5.Write(str)
    hSha256.Write(str)
    hSha3.Write(str)
    hBlake2b.Write(str)

    fmt.Printf("MD5       : %x\n", hMd5.Sum(nil))
    fmt.Printf("SHA256    : %x\n", hSha256.Sum(nil))
    fmt.Printf("SHA3-256  : %x\n", hSha3.Sum(nil))
    fmt.Printf("BLAKE2-256: %x\n", hBlake2b.Sum(nil))
}
MD5       : 390127f420dbc185fce3ed7fe914a08d
SHA256    : a64c3cadf4803bcbcbe5a1ca277af785cb604a492ae578dbd9cd87809b1e6bb8
SHA3-256  : a4abab18bd01ae4e85e07f27e575022b614d5a06d100707cd43229fa447cd20c
BLAKE2-256: 37fad4546db45abfe6aa85324460098d7c9dde02d0c03a9d7d67dcd9dce760f7
import (
    "crypto/aes"
    "crypto/cipher"
    "crypto/rand"
    "io"
)

// encryptAES encrypts plaintext using the given key with AES-GCM.
func encryptAES(key, plaintext []byte) ([]byte, error) {
    // Create a new `cipher.Block`, which implements the AES cipher
    // using the given key
    block, err := aes.NewCipher(key)
    if err != nil {
        return nil, err
    }

    // Specify the cipher mode to be GCM (Galois/Counter Mode).
    gcm, err := cipher.NewGCM(block)
    if err != nil {
        return nil, err
    }

    // GCM requires the use of a nonce (number used once), which is
    // a []byte of (pseudo) random values.
    nonce := make([]byte, gcm.NonceSize())
    if _, err = io.ReadFull(rand.Reader, nonce); err != nil {
        return nil, err
    }

    // We can now encrypt our plaintext. Prepends the nonce value
    // to the ciphertext.
    ciphertext := gcm.Seal(nonce, nonce, plaintext, nil)

    return ciphertext, nil
}
import (
    "crypto/aes"
    "crypto/cipher"
    "fmt"
)

// decryptAES decrypts ciphertext using the given key with AES-GCM.
func decryptAES(key, ciphertext []byte) ([]byte, error) {
    // Retrieve our AES cipher
    block, err := aes.NewCipher(key)
    if err != nil {
        return nil, err
    }

    // Specify the cipher mode to be GCM (Galois/Counter Mode).
    gcm, err := cipher.NewGCM(block)
    if err != nil {
        return nil, err
    }

    // Retrieve the nonce value, which was prepended to the ciphertext,
    // and the ciphertext proper.
    nonceSize := gcm.NonceSize()
    if len(ciphertext) < nonceSize {
        return nil, fmt.Errorf("invalid input")
    }
    nonce, cipherbytes := ciphertext[:nonceSize], ciphertext[nonceSize:]

    // We can now decrypt our ciphertext!
    plaintext, err := gcm.Open(nil, nonce, cipherbytes, nil)
    if err != nil {
        return nil, err
    }

    return plaintext, nil
}
import (
    "encoding/base64"
    "fmt"
)

func main() {
    // 32 bytes for AES-256. Obviously, don't do this.
    key := []byte("example.key.12345678.example.key")
    plaintext := []byte("Hello, Cloud Native Go!")

    encrypted, err := encryptAES(key, plaintext)
    if err != nil {
        panic(err)
    }

    decrypted, err := decryptAES(key, encrypted)
    if err != nil {
        panic(err)
    }

    // Encode the encrypted bytes to Base64
    encoded := base64.StdEncoding.EncodeToString(encrypted)

    fmt.Println("Encrypted:", encoded)
    fmt.Println("Decrypted:", string(decrypted))
}
Encrypted: L0+xZEX6mdS1kX7hSdQ1m3KXvdj4DyvmoypXNGoxqZxMv0DkUx9sYkBSzkAk5Z/XgZDP
Decrypted: Hello, Cloud Native Go!
import (
    "crypto/rand"
    "crypto/rsa"
)

// generateKeyPair generates an RSA key pair.
func generateKeyPair(bits int) (*rsa.PrivateKey, *rsa.PublicKey, error) {
    privateKey, err := rsa.GenerateKey(rand.Reader, bits)
    if err != nil {
        return nil, nil, err
    }

    return privateKey, &privateKey.PublicKey, nil
}
import (
    "crypto/rsa"
    "crypto/x509"
    "encoding/pem"
    "fmt"
)

// exportKeys exports keys to PEM format for demonstration purposes
func exportKeys(privateKey *rsa.PrivateKey, publicKey *rsa.PublicKey) {
    privBytes := x509.MarshalPKCS1PrivateKey(privateKey)
    privPEM := pem.EncodeToMemory(&pem.Block{
        Type:  "RSA PRIVATE KEY",
        Bytes: privBytes,
    })

    pubBytes, _ := x509.MarshalPKIXPublicKey(publicKey)
    pubPEM := pem.EncodeToMemory(&pem.Block{
        Type:  "RSA PUBLIC KEY",
        Bytes: pubBytes,
    })

    fmt.Println(string(privPEM))
    fmt.Println(string(pubPEM))
}
import (
    "crypto/rand"
    "crypto/rsa"
    "crypto/sha256"
)

// encryptRSA encrypts the given message with the RSA public key.
func encryptRSA(publicKey *rsa.PublicKey, message []byte) ([]byte, error) {
    ciphertext, err := rsa.EncryptOAEP(sha256.New(), rand.Reader,
        publicKey, message, nil)
    if err != nil {
        return nil, err
    }

    return ciphertext, nil
}
import (
    "crypto/rand"
    "crypto/rsa"
    "crypto/sha256"
)

// decryptRSA decrypts the given ciphertext with the RSA private key.
func decryptRSA(privateKey *rsa.PrivateKey, ciphertext []byte) ([]byte, error) {
    plaintext, err := rsa.DecryptOAEP(sha256.New(), rand.Reader,
        privateKey, ciphertext, nil)
    if err != nil {
        return nil, err
    }

    return plaintext, nil
}
import (
    "encoding/base64"
    "fmt"
)

func main() {
    // Generate 2048-bit RSA keys
    privateKey, publicKey, err := generateKeyPair(2048)
    if err != nil {
        panic(err)
    }

    plaintext := []byte("Hello, Cloud Native Go!")

    // Encrypt message
    encrypted, err := encryptRSA(publicKey, plaintext)
    if err != nil {
        panic(err)
    }

    // Decrypt message
    decrypted, err := decryptRSA(privateKey, encrypted)
    if err != nil {
        panic(err)
    }

    data := base64.StdEncoding.EncodeToString(encrypted)
    fmt.Println("Encrypted:", data)
    fmt.Println("Decrypted:", string(decrypted))
}
Encrypted: ucX/ZZj0ISevpqL+rn+bKMBAubsUdJ+wA7ah2r+PpIVyOVaSm67zZSivTpZl2...
Decrypted: Hello, Cloud Native Go!
Relatively fast and efficient, making them best suited for encrypting large volumes of data with minimal computational overhead. However, since they use the same key for both encryption and decryption, key distribution and management is critical but often challenging.
Simplify key distribution by using a pair of keys (public and private) for encryption and decryption, enhancing security for key exchanges and digital signatures. Their main drawbacks are their slower performance and higher computational requirements, which make them less suitable for encrypting large amounts of data.
Using an asymmetric algorithm to securely exchange a symmetric cipher key, which is then used to secure the remainder of the communication, provides something of a best of both worlds: the security of asymmetric key exchange combined with the efficiency of symmetric encryption. It does add a fair amount of complexity, though.
import (
    "crypto/rand"
    "fmt"
)

func main() {
    b := make([]byte, 10)
    if _, err := rand.Read(b); err != nil {
        fmt.Println("error:", err)
        return
    }

    // The slice will now contain random bytes instead of only zeroes.
    fmt.Println(b)
}
import (
    "database/sql"

    _ "github.com/lib/pq"
)

func setupDB() (*sql.DB, error) {
    db, err := sql.Open("postgres", "user:password@host/dbname")
    if err != nil {
        return nil, err
    }

    return db, nil
}
func getUserId(ctx context.Context, db *sql.DB, username string) (int, error) {
    var userId int

    query := "SELECT id FROM users WHERE username = $1"
    row := db.QueryRowContext(ctx, query, username)
    if err := row.Scan(&userId); err != nil {
        return 0, err
    }

    return userId, nil
}
func insertUser(ctx context.Context, db *sql.DB, username, email string) error {
    query := "INSERT INTO users (username, email) VALUES ($1, $2)"
    _, err := db.ExecContext(ctx, query, username, email)
    return err
}
1 Abhijit Naskar, The Gospel of Technology (Independently published, January 2020).
2 And dedicated they are—but also often chronically overworked and underfunded.
3 I would have printed it here if I could have, but licensing, ya know?
4 An argument can be made that all data is sensitive to a greater or lesser degree.
5 As a rule, user input generally shouldn’t be considered trusted.
6 The name “Atoi” stands for “ASCII to integer.” This name comes by way of the C language, of which it’s been a part since at least 1971.
7 Curiously, the html/template package does have a stripTags function, but it’s unexported.
8 At least in theory. Netflix knows this isn’t necessarily true, though.
9 Wikipedia maintains a very informative list of data breaches that contains about 500 entries spanning 20 years.
10 This one’s my favorite.
11 Or, perhaps, fortunately.
12 Michael B. Jones et al., “RFC 7519: JSON Web Token (JWT)”, IETF Datatracker, May 2015.
13 Which can be good fun, for a very specific definition of “fun”.
14 Which is essentially what the parser is doing.
15 Sadly, gRPC over TLS doesn’t have a fancy acronym like HTTPS does.
16 Probably many more less-popular ones, too.
17 At least it is if you don’t have a supercomputer and a couple thousand years.
18 Although this is less true (but still somewhat true) with Go 1.22. See “Secure Randomness in Go 1.22” by Russ Cox and Filippo Valsorda for more information on that.
19 Of course there is. There always is.
20 I rather like regular-expressions.info.
A distributed system is one in which the failure of a computer you didn’t even know about can render your own computer unusable.1
Leslie Lamport, DEC SRC Bulletin Board (May 1987)
Keeping data synchronized across multiple nodes introduces the challenge of maintaining consistency across multiple replicas. Strategies include synchronous and asynchronous replication, each with its own trade-offs. Inconsistent replicas can lead to data conflicts, which require robust conflict resolution mechanisms.
Maintaining a consistent view of data across multiple nodes can be challenging. There are different consistency models, which we’ll discuss in “Consistency Models”, that trade off between performance and reliability. Algorithms like Paxos or Raft are often used to ensure that all nodes agree on the data state, but these are complex and can introduce latency.
A network partition occurs when a network is divided into disjoint segments, preventing nodes in one segment from communicating with those in another. This can result in a “split-brain scenario” in which each partition operates independently, potentially leading to inconsistent states and conflicting actions across the system.
Distributing data across multiple nodes (sharding) can help with scalability but introduces complexity in ensuring efficient query processing and data rebalancing. Ensuring an even distribution of data and queries across nodes is challenging and crucial for performance.
Diagnosing and fixing issues in a distributed system is significantly more challenging due to the number and complexity of components and their interactions.
In a perfectly consistent system, every read receives the most recent write (or an error). In other words, all nodes see the same data at the same time. This ensures that any read operation returns the latest write value.
This degree of consistency comes at a cost, however, requiring complex synchronization mechanisms that can increase latency and reduce availability during network issues.
In a perfectly available system, every read or write request receives a response, regardless of the success of that response. The system remains operational and responsive, even in the presence of some node failures.
Guaranteeing that the system is always responsive means sometimes serving stale data or sacrificing consistency, especially during network partitions.
A partition-tolerant system will continue to function even if an arbitrary number of messages are dropped or delayed by the network. Such a system can handle network failures or partitions without completely shutting down.
Handling network partitions means the system can continue to operate despite failures, but it must trade off between returning consistent data or ensuring every request is processed.
CP systems ensure consistency even in the presence of network partitions but may sacrifice availability. In practice this often means that the system can refuse to respond to some requests—becoming less available—to ensure data consistency.
Examples of CP systems include distributed databases such as Google Spanner that use consensus algorithms2 to ensure consistency but that delay responses during partitions.
AP systems are designed to remain available and to tolerate network partitions but may return stale or inconsistent data during partitions.
Examples include NoSQL databases like Apache Cassandra and DynamoDB that provide high availability and tolerate partitions but allow for eventual consistency.
In a CA system, if there’s a network partition, the system must become unavailable to maintain consistency. However, since network partitions are inevitable, this is usually impractical for distributed systems.
type Store struct {
    sync.Mutex
    store map[string]string
    raft  *raft.Raft
}
// Get returns the value for the given key.
func (s *Store) Get(key string) (string, error) {
    s.Lock()
    defer s.Unlock()

    return s.store[key], nil
}
type command struct {
    Op    string
    Key   string
    Value string
}

// Put sets the value for the given key.
func (s *Store) Put(key, value string) error {
    c := command{Op: "put", Key: key, Value: value}
    b, err := json.Marshal(c)
    if err != nil {
        return err
    }

    f := s.raft.Apply(b, time.Second*10)
    return f.Error()
}

// Delete deletes the given key.
func (s *Store) Delete(key string) error {
    c := command{Op: "delete", Key: key}
    b, err := json.Marshal(c)
    if err != nil {
        return err
    }

    f := s.raft.Apply(b, time.Second*10)
    return f.Error()
}
type FSM interface {
    // Apply is called once a log entry is committed by
    // a majority of the cluster.
    Apply(*Log) interface{}
}
func (s *Store) Apply(log *raft.Log) any {
    // Unmarshal the log data into a command value
    var cmd command
    if err := json.Unmarshal(log.Data, &cmd); err != nil {
        return err
    }

    s.Lock()
    defer s.Unlock()

    // Make the appropriate change based on the operation
    switch cmd.Op {
    case "put":
        s.store[cmd.Key] = cmd.Value
    case "delete":
        delete(s.store, cmd.Key)
    }

    return nil
}
Contains settings that define a variety of behaviors, such as the protocol version and various timeouts.
Maintain the state and durability of the system by managing the state machine log and persisting Raft metadata, respectively.
Responsible for handling communication between Raft nodes, including message exchange, network configuration, and handling network errors.
package main

import (
    "fmt"
    "net"
    "os"
    "path/filepath"
    "time"

    "github.com/hashicorp/raft"
    raftboltdb "github.com/hashicorp/raft-boltdb/v2"
)

var (
    // The identifier for the Raft node
    localID string = "node1"

    // The network address to which the Raft node will bind
    raftBind string = "127.0.0.1:8080"
)

// Initializes the Raft node.
func (s *Store) Open() error {
    // Create a default Raft configuration and set the local node ID
    config := raft.DefaultConfig()
    config.LocalID = raft.ServerID(localID)

    // Create a BoltDB-backed log store at the specified path
    logStore, err := raftboltdb.NewBoltStore(filepath.Join("raft", "raft.db"))
    if err != nil {
        return fmt.Errorf("failed to create log store: %w", err)
    }

    // Create a BoltDB-backed stable store
    stableStore, err := raftboltdb.NewBoltStore(filepath.Join("raft", "stable.db"))
    if err != nil {
        return fmt.Errorf("failed to create stable store: %w", err)
    }

    // Create a file-based snapshot store in the "raft" directory,
    // with a maximum of 1 snapshot retained
    snapshots, err := raft.NewFileSnapshotStore("raft", 1, os.Stdout)
    if err != nil {
        return fmt.Errorf("failed to create snapshot store: %w", err)
    }

    // Resolve the TCP address specified by raftBind
    addr, err := net.ResolveTCPAddr("tcp", raftBind)
    if err != nil {
        return fmt.Errorf("failed to resolve TCP address: %w", err)
    }

    // Create a TCP transport for Raft communication, with a connection pool
    // of 3 and a timeout of 10 seconds
    transport, err := raft.NewTCPTransport(raftBind, addr, 3, 10*time.Second, os.Stdout)
    if err != nil {
        return fmt.Errorf("failed to create transport: %w", err)
    }

    // Initialize a new Raft node with the given configuration, state machine,
    // log store, stable store, snapshot store, and transport
    raftNode, err := raft.NewRaft(config, s, logStore, stableStore, snapshots, transport)
    if err != nil {
        return fmt.Errorf("failed to create Raft node: %w", err)
    }

    // If the Raft node is not already part of a cluster, bootstrap the
    // cluster with the current node as the initial server
    if raftNode.Leader() == "" {
        config := raft.Configuration{
            Servers: []raft.Server{
                {
                    ID:      raft.ServerID(localID),
                    Address: transport.LocalAddr(),
                },
            },
        }
        raftNode.BootstrapCluster(config)
    }

    // Assign the created Raft node to the Store struct
    s.raft = raftNode

    return nil
}
type Service struct {
    store *Store
}
// getHandler is the handler function for HTTP requests to get a value
// from the key-value store.
func (s *Service) getHandler(w http.ResponseWriter, r *http.Request) {
    vars := mux.Vars(r)
    key := vars["key"]

    value, err := s.store.Get(key)
    if err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }

    w.Write([]byte(value))
}

// putHandler is the handler function for HTTP requests to set a value
// to the key-value store.
func (s *Service) putHandler(w http.ResponseWriter, r *http.Request) {
    vars := mux.Vars(r)
    key := vars["key"]

    value, err := io.ReadAll(r.Body)
    if err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }
    defer r.Body.Close()

    if err = s.store.Put(key, string(value)); err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }

    w.WriteHeader(http.StatusCreated)
}
func main() {
    store := &Store{store: map[string]string{}}
    store.Open()

    service := &Service{store: store}

    // Create a new mux router
    r := mux.NewRouter()
    r.HandleFunc("/v1/{key}", service.getHandler).Methods("GET")
    r.HandleFunc("/v1/{key}", service.putHandler).Methods("PUT")

    if err := http.ListenAndServe(":8080", r); err != nil {
        slog.Error(err.Error())
        os.Exit(1)
    }
}
1 Leslie Lamport, DEC SRC Bulletin Board, May 28, 1987.
2 Specifically, Spanner uses a variant of Paxos, though nobody outside of Google knows what the variations are.
3 It’s also a common source of headaches for the people who maintain those systems.
4 That’s academic-speak for your service and its central logic.
5 The worst kind of correct.
6 A great place to start would be Leslie Lamport’s own “Paxos Made Simple”.
7 Diego Ongaro and John Ousterhout, “In Search of an Understandable Consensus Algorithm”, Proceedings of the 2014 USENIX Annual Technical Conference, 2014.
8 Miguel Castro and Barbara Liskov, “Practical Byzantine Fault Tolerance”, Proceedings of the Third Symposium on Operating Systems Design and Implementation, February 1999.
9 “ZooKeeper Internals”, June 5, 2022.
10 Mike Burrows, “The Chubby Lock Service for Loosely-Coupled Distributed Systems”, 7th Symposium on Operating Systems Design and Implementation, November 2006, 335–350.
11 Brian M. Oki and Barbara H. Liskov, “Viewstamped Replication: A New Primary Copy Method to Support Highly-Available Distributed Systems”, Proceedings of the Seventh Annual ACM Symposium on Principles of Distributed Computing, August 15–17, 1988, 8–17.
12 Iulian Moraru et al., “There Is More Consensus in Egalitarian Parliaments”, Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, November 3–6, 2013.