What Adrian Did Next — Part 2 — Sun Microsystems
I spent six years at Cambridge Consultants, building some interesting systems, managing our Sun workstations and learning a lot, but by then Sun had opened a sales office across the street, and I wanted to find out what they were going to release next, before everyone else. So in 1988 I joined the Sun sales force and became a Systems Engineer (nowadays we would say Solutions Architect). This was a big jump to make: I had to wear a suit to work every day, and spend all my time paired up with a sales rep visiting customers. I really enjoyed the variety of working with several different customers every day, on different problems, and being part of an extremely innovative and fast growing company. I also learned a lot about how to work directly with customers, when to shut up and let the sales guy drive the conversation, and generally how technology sales works. The early days at Sun Cambridge were special. I absorbed a lot about networking and the technical side of the role from my fellow systems engineer Martin Baines, and we were driving all over the region in cool company cars (I had a Citroen BX 16V) selling a really hot product. The company and the team got bigger over time, but with the initial sales team of Paul Martin (Ford Sierra Cosworth, the fastest sales guy in East Anglia), Martin Harris (Land Rover Discovery, recruited an inflatable skeleton as an extra sales guy), David Morley (Vauxhall Carlton, always betting on the horses at Newmarket), and Gary Littlefair (BMW 320, did over 200% of his target several years running as we took British Telecom to Sun UK’s biggest account) we were “kicking butt and having fun” as Scott McNealy liked to say.
I also got to visit Sun head office in California regularly for training sessions, and met some interesting people there, in particular Reynold Jabbour from New York, who is deeply insightful on both people and technology, ran the Systems Engineers team for the east coast, later went to work for various Wall Street banks, and is still one of my closest friends.
I became the Sun UK local specialist in performance and hardware, and as Sun transitioned from a desktop workstation company to selling high end multiprocessor servers I was helping customers find and fix scalability problems. In 1991 I wrote a white paper on performance that was widely read, and in 1993 (with help from Brian Wong) that got me a job in the USA, working alongside Brian for Mike Briggs in technical product marketing. Another big jump, but now it was my job to run benchmarks in the lab, and write white papers that explained the new products to the world, as they were launched. I also learned how marketing worked, and began to build my presentation and training skills as I was sent around the world by Sun to teach workshops and speak at events. I was mostly coding in C, tuning FORTRAN, and when I needed to do a lot of data analysis of benchmark results I used the S-PLUS statistics language, which is a predecessor to R. I still code in R…
One role I had for many years was to lead technical training events for the global specialist systems engineers, which we called the Ambassador Program. I had attended these events when I was based in the UK, but became one of the organizers. I had to set up a week of talks by all the relevant product teams, with a hundred or so of the most experienced systems engineers from all over the world as an audience. We had specializations in hardware, operating systems, databases, graphics, etc. At one point we had two conflicting high availability products from different teams, who weren’t cooperating. I deliberately scheduled them back to back at one event so they were all in the room at the same time, and the audience roasted them. As a result there were product and messaging changes. We’d sometimes manage to get Sun founder Andy Bechtolsheim to give us a talk, and he’d tell us far more than the product managers wanted us to know, about all the cool things he was working on that hadn’t happened yet. He hasn’t changed. So many global friendships were made that continue to today. I saw Erik Fisher last time I was in Budapest, and Constantin Gonzalez later became one of the first AWS solutions architects in Germany. There are too many names to mention...
I remember organizing my first week long event, having it go well, and heading back to the office to tell my manager Mike Briggs. He was pleased, but also told me he was leaving that day to be one of the first employees of NetApp. At the time I was shocked! We ended up hiring Phil Parkman to take over managing our group, which by then also included my friend Dave Fisk, who I still hang out with regularly. Later on after some re-orgs we ended up reporting to Greg Papadopoulos, who was CTO of the server organization at the time.
I’d been using the Internet since about 1984, for email and newsgroups, sharing files like my white papers via ftp sites, but in the early 1990s the world wide web came along, and Sun was a very early adopter. Corporate IT had no idea what was going on, but someone got a T1 line hooked up and we put a spare desktop machine on it and set up www.sun.com (link to archive from 1996). I ended up with root access helping Shane Sigler run the site around 1995–1996 when it was one of the very first corporate web sites. I learned about tuning systems for this new network connection intensive workload, helped Solaris engineering tune the network stack and helped customers run some of the early web properties. Mike Briggs suggested that Brian Wong and I write books, because then we’d have more influence on the engineering teams, which turned out to be excellent advice. Brian wrote Configuration and Capacity Planning for Sun Servers, and I turned my collection of white papers into a book, Sun Performance and Tuning, which became widely read. I also published a monthly Q&A column at SunWorld Online. The book was followed by a greatly expanded second edition with some additional chapters by Rich Pettit.
Brian Wong started a “rotation program” that he and I ran for many years, with admin help and funding from Barb Hill, where we would borrow a systems engineer from somewhere in the world for a month, and have them work with us benchmarking something or writing a paper. We made great friends and found some amazing talent this way, and so many people ended up getting hired into permanent roles that it was called the job donation program by some of the SE managers, who lost good people to the central teams, but were mostly happy to see them succeed.
Rich Pettit is one of the best programmers I ever met. He turned up on a month long rotation one Monday in 1993, took a look at a complicated Perl script Brian had written to try to capture some tuning ideas, and decided to use lex and yacc to write a dedicated language to make it easier. He had it up and running on Wednesday. I adopted the language, called the SE Toolkit, which was a C-based interpreter that understood how to directly read all the performance data out of the kernel, and wrote a lot of scripts over the years, including one called virtual_adrian.se that a lot of people used. The SE Toolkit was fast, had no memory leaks (monitoring scripts could run for years) and let me implement lots of cool performance monitoring ideas. Rich became co-author of the second edition of the Sun Performance and Tuning book, to describe how it worked.
Phil Harman is a friend of mine from Sun UK. We were the two most senior systems engineers in the UK at the time I left for the USA, and he came over on rotation a few times. He became a leading expert in Solaris multi-threading. Jim Mauro and Allan Packer visited on rotation from New Jersey and Adelaide, Australia, and worked out how to produce a database sizing guide for Oracle Financials, which I wrote up and published. It was the first database sizing guide Sun had ever produced. Allan moved to the USA for several years to work on database performance and wrote the book Configuring and Tuning Databases on the Solaris Platform. He also worked closely with IBM as they ported DB2 to run on Solaris. Jim also joined the database performance team.
Allan also introduced me to Richard McDougall, who had worked with Allan for Sun Australia, and Richard visited for a rotation to build some monitoring tools. No-one could believe how much he got done in a month, building several tools including one that could lay out the entire memory usage of a machine in a way no one had ever seen before. A bit later our manager asked Brian and me who we should hire next, and we both said Richard. He wanted someone easier to hire than an international relocation, but in the end we made it happen. Later on Richard teamed up with Jim Mauro to write the Solaris Internals series of books, and while trying to figure out how something worked for the first book, ended up finding and fixing a major bug in the kernel code that was the biggest performance win in Solaris 8.
Paul Reithmuller was yet another imported Australian engineer who did amazing work. One day we were chatting in a group meeting and he said he was working on a TPC-C benchmark that was spending too much time in the SCSI disk driver, so he wrote a 10-line awk script running against kernel memory to look at the SPARC machine code execution path and find which branches were being predicted wrong, then wrote another awk script to flip the branch prediction bits in memory and it showed a significant speedup. It was a memorable jaw drop moment for a room full of people, you did what!?! After that we had a running joke that every impossibly hard problem would be solved with a ten line awk script.
I met Paul Strong when he was working on a rotation in 1998, and we became close friends, bonding over a shared interest in music by Hawkwind, Frank Zappa and King Crimson. He later moved to the US to work on Solaris, and for many years we worked and played together, in particular when I helped him form the progressive rock band Fractal. He’s an excellent drummer, and I wasn’t a good enough bass player, but I recorded many of their rehearsals and gigs and named some of their early instrumental songs (mostly with bad puns).
After the success of the books that Brian and I had written, which we got a cut of royalties for, our management decided that we should write books as our day job, and we assembled a team of experts to write up best practice guides as books in the Sun Blueprints series. I made major contributions to two Sun Blueprints books: Resource Management (1999) with Richard McDougall, Evert Hoogendoorn, Enrique Vargas, and Jim Bialaski, and Capacity Planning for Internet Services (2001) with Bill Walker. Our team also produced The Guide to High Availability by storage expert Jeannie Johnstone Kobert, Sun Cluster Environment by Enrique Vargas, Solaris PC Netlink by Don Devitt (we had a great team outing near Boston on his sailboat), and several other books. In the end, the books we wrote in our own time were more opinionated, and I think more useful, and sold better, than the official books that were more “correct” and reviewed by lots of people. I also came up with Cockcroft’s Law of Book Writing, which is that books get bigger as you write them, faster than you can write them, so you need to prune them to finish them.
Eventually, after becoming the go-to people for capacity planning and performance for Sun, Brian Wong and I became Distinguished Engineers, and I joined the central performance engineering team in 1999. I led a small group, and had lots of fun working with Elizabeth Purcell at the Menlo Park campus that now houses Facebook, along with Roch Bourbonnais in France, and Bob Sneed in Virginia. We ran what our datacenter designer Rob Snevely (who wrote a Blueprint book that contains a chapter by Elizabeth Purcell) named the Can of Worms Project, where we would install and run performance tests on large server configurations that were more customer-like and complex than the usual test or benchmarking environments that engineering teams ran. We found many things that didn’t work properly together, filed lots of bugs, and headed off a bunch of customer issues. We called this “meltdown prevention” and had to work hard to get people to see that near misses are worth tracking as well as actual meltdowns.
In the early 2000s, Sun was designing its next generation high end server, and to avoid reliability issues that had caused problems in previous systems, adopted Six Sigma to create quality best practices. I was looking for a new challenge, so joined the program and ended up getting certified as a “black belt”. It was like an intense and very technical MBA, re-learning a lot of the statistics I’d forgotten from my degree, along with courses in change management, requirements analysis, failure modeling, and other things I still find useful today. I worked with the engineering team that was designing the backplane connector, but the project was eventually cancelled when Sun ran out of money. I also applied Six Sigma to capacity planning and presented this at a conference in 2003.
One fun side project I got involved in was that Sun provided the back end server systems for the 2002 Winter Olympics in Salt Lake City, and I joined the effort as the capacity planning consultant, to make sure nothing would go wrong when the events were live. I used some custom monitoring tools I’d built over the years as part of Rich Pettit’s SE Toolkit, including the virtual_adrian.se script, which implemented all the ideas I had come up with. Nothing went wrong, I got to see some events, and got very cold at times. I also helped out in the run-up to the 2004 Athens Olympics.
The last role I had at Sun was as Chief Architect for the High Performance Technical Computing team led by Shahin Khan. We had fun and he was one of the best managers I’ve had, developing me in lots of different directions. At some point our product boss left, and I took over the role, so I spent a year or two learning how to be the product manager for our high performance computing and high end graphics product lines. I was also working alongside a very experienced Business Development Manager, Susanna Kass, to set up relationships with many internal and external partners, and got a good insight into how BD works. The team partnered with Sun’s director of standards Carl Cargill, as Sun and Oracle created a new standards body, the Enterprise Grid Alliance. I worked as the main representative for Sun, and on our competing relationship with the IBM dominated Global Grid Forum. I got Paul Strong involved in the EGA before I left Sun, the two competing bodies eventually merged, and Grid and Web Services evolved into Cloud Computing. We were thinking about the trends that eventually became cloud computing when AWS launched it, but Sun didn’t have the right business model to go direct to developers and sell capacity on a credit card, which is how AWS bypassed the resistance from corporate CIOs that we saw at Sun. This job stretched me in many different ways and we were having a good time, but by then Sun was shrinking and one of the many layoffs included our entire team.
We had just launched our main HPC product at an event in Shanghai: racks of Intel Linux based compute nodes using a SPARC Solaris server to configure them and provide storage, with InfiniBand or Ethernet connectivity, Sun Grid Engine etc. While I was there, we heard that there were layoffs coming, but that everyone thought our team was still needed. On the way back I went to a Global Grid Forum meeting in Honolulu, representing EGA, and met Vikas Deolaliker, who said it looked as if no-one could find the head count to save the HPC team. I remember sitting in a bar with Savas Parastatidis and Jim Webber (now CTO of Neo4j) and telling them what was going on, then later that night starting an email thread with Maynard Webb, COO of eBay, on the pretext that eBay should start a new line of business auctioning compute capacity on demand, but really fishing for a job…
So in mid-2004 I was laid off, and looking for a change, I decided to get out of the business of selling computers to big companies and figure out consumer oriented technology at eBay. I’d helped eBay recover from capacity related outages in 1999 and had set up their capacity planning processes. I had also turned down a job offer at that time, but they took me on this time as a Distinguished Engineer in their Operations Architecture team.
Vikas left and co-founded Sonoa Systems, which later became Apigee, which is now part of Google Cloud. Shahin Khan ran marketing for Azul Systems for a while, and is still active in the HPC community. Susanna Kass is working on renewable energy for datacenters nowadays, and we reconnected recently.
There are too many stories and people to mention, so apologies to those I left out, and these are my memories, which may be faulty. Sun was a formative part of my career, a very cool place to work, and was a big concentration of talent for many years.