Timing Technology

How do you time your startup? Technological forecasts are often surprisingly prescient in terms of predicting that something was possible & desirable and what they predict eventually happens; but they are far less successful at predicting the timing, and almost always fail, with the success (and riches) going to another.
Why is their knowledge so useless? Why are success and failure so intertwined in the tech industry? The right moment cannot be known exactly in advance, so attempts to forecast will typically be off by years or worse. For many claims, there is no way to invest in an idea except by going all in and launching a company, resulting in extreme variance in outcomes, even when the idea is good and the forecasts correct about the (eventual) outcome.
Progress can happen and can be foreseen long before, but the details and exact timing due to bottlenecks are too difficult to get right. Launching too early means failure, but being conservative & launching later is just as bad because regardless of forecasting, a good idea will draw overly-optimistic researchers or entrepreneurs to it like moths to a flame: all get immolated but the one with the dumb luck to kiss the flame at the perfect instant, who then wins everything, at which point everyone can see that the optimal time is past. All major success stories overshadow their long list of predecessors who did the same thing, but got unlucky. So, ideas can be divided into the overly-optimistic & likely doomed, or the fait accompli. On an individual level, ideas are worthless because so many others have them too—‘multiple invention’ is the rule, and not the exception. Progress, then, depends on the ‘unreasonable man’.
This overall problem falls under the reinforcement learning paradigm, and successful approaches are analogous to Thompson sampling/posterior sampling: even an informed strategy can’t reliably beat random exploration which gradually shifts towards successful areas while continuing to take occasional long shots. Since people tend to systematically over-exploit, how is this implemented? Apparently by individuals acting suboptimally on the personal level, but optimally on societal level by serving as random exploration.
A major benefit of R&D, then, is that its results can lie fallow until the ‘ripe time’, when they can be immediately exploited in previously-unpredictable ways; applied R&D or VC strategies should focus on maintaining diversity of investments, while continuing to flexibly revisit previous failures which forecasts indicate may have reached ‘ripe time’. This balances overall exploitation & exploration to progress as fast as possible, showing the usefulness of technological forecasting on a global level despite its uselessness to individuals.

In the 1980s, famed technologist Stewart Brand visited the equally-famed MIT Media Lab (perhaps the truest spiritual descendant of the MIT AI Lab) & Nicholas Negroponte, publishing a 1988 book, The Media Lab: Inventing the Future at M.I.T. (TML). Brand summarized the projects he saw there and Lab members’ extrapolations into the future which guided their projects, and added his own forecasting thoughts.

Three decades later, the book is highly dated, and the descriptions are of mostly historical interest for the development of various technologies (particularly in the 1990s). But enough time has passed since 1988 to enable us to judge the basic truthfulness of the predictions and expectations held by dreamers such as Nicholas Negroponte: they were remarkably accurate! And the Media Lab wasn’t the only one: General Magic (1989) had an almost identical vision of a networked future powered by small touchscreen devices. (And what about Douglas Engelbart, or Alan Kay/Xerox PARC, who explicitly aimed to ‘skate towards where the puck would be’?) If you aren’t struck by a sense of déjà vu or pity when you read this book, compare the claims by people at the Media Lab with contemporary—or later—works like Clifford Stoll’s Stigler’s Law, the last Bill Gates13; many people made huge fortunes off OSes, both before and after Gates—you may have forgotten Wang, but hopefully you remember Steve Jobs (before, Mac) and Steve Jobs (after, NeXT). Similarly, Mark Zuckerberg was not the first and only Zuckerberg, he was the last Zuckerberg; many people made social networking fortunes before him—maybe Orkut didn’t make its Google inventor a fortune, but you can bet that MySpace’s DeWolfe and Anderson did well. And there were plenty of lucrative search engine founders (is Jerry Yang still a billionaire? Yes).
Gates, however, proved the market, and refined the Gates strategy to perfection, using up the trick; no one can get historically rich off shipping an OS plus some business productivity software because there are too many competitors and too many players interested in ensuring that no one becomes the next Gates, and so opportunity has moved on to the next area.
A successful company rewrites history and its precursors14; history must be lived forward, progressing to an obscure destination, but we always recall it backwards as progressing towards the clarity of the present.
The Wise in their Craftiness

“It is universally admitted that the unicorn is a supernatural being and one of good omen; thus it is declared in the Odes, in the Annals, in the biographies of illustrious men, and in other texts of unquestioned authority. Even the women and children of the common people know that the unicorn is a favorable portent. But this animal does not figure among the domestic animals, it is not easy to find, it does not lend itself to any classification. It is not like the horse or the bull, the wolf or the deer. Under such conditions, we could be in the presence of a unicorn and not know with certainty that it is one. We know that a given animal with a mane is a horse, and that one with horns is a bull. We do not know what a unicorn is like.”
Jorge Luis Borges, “Kafka And His Precursors” (1951)

Can you ask researchers if the time is ripe? Well: researchers have a slight conflict of interest in the matter, and are happy to spend arbitrary amounts of money on topics without anything to show for it. After all, why would they say no?
Scott Fisher:

I ended up doing more work in Japan than anything else because Japan in general is so tech-smitten and obsessed that they just love it [VR]. The Japanese government in general was funding research, building huge research complexes just to focus on this. There were huge initiatives while there was nothing happening in the US. I ended up moving to Japan and working there for many years.

Indeed, this would have been around the time of the Japanese boondoggle, the Fifth Generation Project (note that despite Japan’s reputed prowess at robotics, it is not Japan’s robots that went into Fukushima / flying around the Middle East / revolutionizing agriculture and construction). All those ‘huge initiatives’ and…? Don’t ask Fisher; he’s hardly going to say, “oh yeah, all the money was completely wasted, we were trying to do it too soon; our bad”. And Lanier implies that Japan alone spent a lot of money:

Jaron Lanier: “The components have finally gotten cheap enough that we can start to talk about them as being accessible in the way that everybody’s always wanted…Moore’s law is so interesting because it’s not just the same components getting cheaper, but it really changes the way you do things. For instance, in the old days, in order to tell where your head was so that you could position virtual content to be standing still relative to you, we used to have to use some kind of external reference point, which might be magnetic, ultrasonic, or optical. These days you put some kind of camera on the head and look around in the room and it just calculates where you are—the headsets are self-sufficient instead of relying on an external reference infrastructure. That was inconceivable before because it would have been just so expensive to do that calculation. Moore’s law really just changes again and again, it re-factors your options in really subtle and interesting ways.”
Kevin Kelly: “Our sense of history in this world is very dim and very short. We were talking about the past: VR wasn’t talked about for a long time, right? 35 years. Most people have no idea that this is 35 years old. 30 years later, it’s the same headlines. Was the technological power just not sufficient 30 years ago?”
…On the Nintendo Power Glove, based on a VPL dataglove design:
JL: “Both I and a lot of other people really, really wanted to get a consumerable version of this stuff out. We managed to get a taste of the experience with something called the Power Glove…Sony [sic: Nintendo] actually brought out a little near-eye display called Virtual Boy; not very good, but they gave it their best shot. And there were huge projects that have never been shown to the public to try to make a consumable [VR product], very expensive ones. Counting for inflation, probably more money was spent than Facebook just spent on Oculus. We just could never, never, never get it quite there.”
KK: “Because?”
JL: “The component cost. It’s Moore’s law. Sensors, displays… batteries! Batteries is a big one.”

Issues like component cost were not something that could be solved by a VR research project, no matter how ambitious. Those were hard binding limits, and solving them, by creating tiny high-resolution LED/LCD screens for smartphones, required the benefit of decades of Moore’s law and the experience-curve effects of manufacturing billions of smartphones.
Researchers in general have no incentive to say, “this is not the right time, wait another 20 years for Moore’s law to make it doable”, even if everyone in the field is perfectly aware of this—Palmer Luckey:

I spent a huge amount of time reading…I think that there were a lot of people that were giving VR too much credit, because they were working as VR researchers. You don’t want to publish a paper that says, ‘After the study, we came to the conclusion that VR is useless right now and that we should just not have a job for 20 years.’ There were a few people that basically came to that conclusion. They said, ‘Current VR gear is low field of view, high lag, too expensive, too heavy, can’t be driven properly from consumer-grade computers, or even professional-grade computers.’ It turned out that I wasn’t the first person to realize these problems. They’d been known for decades.

AI researcher Donald Michie claimed in 1970, based on a 1969 poll, that a majority of AI researchers estimated 10–100 years for AGI (or 1979–2069) and that “There is also fair agreement that the chief obstacles are not hardware limitations.”15 While AI researcher surveys still suggest that wasn’t a bad range (Gruetzemacher et al 2019), the success of deep learning makes clear that hardware was a huge limitation, and resources 50 years ago fell short by at least 6 orders of magnitude. Michie went on to point out that in a previous case, that of Charles Babbage, the work was foredoomed because it came at an “unripe time” due to hardware limitations, and represented a complete waste of time & money16. This, arguably, was the case for Michie’s own research.
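A rough check on the size of that gap, assuming (this is not from the text) a Moore’s-law doubling time of 18 to 24 months for compute per dollar: six orders of magnitude is about 20 doublings, or roughly three to four decades of progress, which is consistent with the hardware only catching up long after Michie’s poll.

$$
10^{6} = 2^{6\log_2 10} \approx 2^{19.9}, \qquad 19.9 \times (1.5\ \text{to}\ 2\ \text{years}) \approx 30\ \text{to}\ 40\ \text{years}
$$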
Nor Riches to Men of Understanding

“But to come very near to a true theory, and to grasp its precise application, are two very different things, as the history of science teaches us. Everything of importance has been said before by somebody who did not discover it.”
Alfred North Whitehead, The Organization of Thought (1917)

So you don’t know the timing well enough to reliably launch. You can’t imitate a successful entrepreneur, the time is past. You can’t foresee what will be successful based on what has been successful; you can’t even foresee what won’t be successful based on what was already unsuccessful; and you can’t ask researchers because they are incentivized to not know the timing any better than anyone else.
Can you at least profit from your knowledge of the outcome? Here again we must be pessimistic.
Certainty is irrelevant; you still have problems making use of this knowledge. Example: in retrospect, we know everyone wanted computers, OSes, social networks—but the history of them is strewn with flaming rubble. Suppose you somehow knew in 2000 that “in 2010, the founder of the most successful social network will be worth at least $10b”; this is a falsifiable belief at odds with all conventional wisdom and about a tech that blindsided everyone. Yet, how useful would this knowledge be, really? What would you do with it? Do you have the capital to start a VC fund of your own, and throw multi-million-dollar investments at every social media startup until finally in 2010 you knew for sure that Facebook was the winning ticket and could cash out in the IPO? I doubt it.
It’s difficult to invest in ‘computers’ or ‘AI’ or ‘social networking’ or ‘VR’; there is no index for these things, and it is hard to see how there even could be such a thing. (How do you force all relevant companies to sell tradable stakes? “If people don’t want to go to the ball game, how are you going to stop them?” as Yogi Berra asked.) There is no convenient CMPTR you can buy 100 shares of and hold indefinitely to capture gains from your optimism about computers. IBM and Apple both went nearly bankrupt at points, and Microsoft’s stock has been flat since 1999 or whenever (translating to huge real losses and opportunity costs to long-term holders of it). If you knew for certain that Facebook would be as huge as it was, what stocks, exactly, could you have invested in, pre-IPO, to capture gains from its growth? Remember, you don’t know anything else about the tech landscape in the 2000s, like that Google will go way up from its IPO, you don’t know about Apple’s revival under Jobs—all you know is that a social network will exist and will grow hugely. Why would anyone think that the future of smartphones would be won by “a has-been 1980s PC maker and an obscure search engine”? (The best I can think of would be to sell any Murdoch stock you owned when you heard they were buying MySpace, but offhand I’m not sure that Murdoch didn’t just stagnate rather than drop as MySpace increasingly turned out to be a writeoff.) In the hypothetical that you didn’t know the name of the company, you might’ve bought up a bunch of Google stock hoping that Orkut would be the winner, but while that would’ve been a decent investment (yay!) it would have had nothing to do with Orkut (oops)…
And even when there are stocks available to buy, you only benefit based on the specifics—like one of the existing stocks being a winner, rather than all the stocks being eaten by some new startup. Let’s imagine a different scenario, where instead you were confident that home robotics were about to experience a huge growth spurt. Is this even nonpublic knowledge at all? The world economy grows at something like 2% a year, labor costs generally seem to go up, prices of computers and robotics usually fall… Do industry projections expect to grow their sales by Ecclesiastes

Where does this leave us? In what I would call, in a nod to Thiel’s ‘definite’ vs ‘indefinite optimism’, definitely-maybe optimism. Progress will happen and can be foreseen long before, but the details and exact timing are too difficult to get right, and the benefit of R&D is that its results can lie fallow until the ripe time and then be exploited in unpredictable ways.
Returning to Donald Michie: one could make fun of his extremely overly-optimistic AI projections, and write him off as the stock figure of the biased AI researcher blinded by the ‘Maes-Garreau law’ where AI is always scheduled for right when a researcher will retire17; but while he was wrong, it is unclear this was a mistake, because in other cases, an apparently doomed research project—Marconi’s attempt to radio across the Atlantic Ocean—succeeded because of an unknown factor—the Kennelly–Heaviside layer18. We couldn’t know for sure that such projections were wrong, and the amount of money being spent back then on AI was truly trivial (and the commercial spinoffs likely paid for it all anyway).
Further, on the gripping hand, Michie suggests that research efforts like Babbage’s should be thought of not as commercial R&D, expected to usually pay off right now, but as prototypes buying optionality, demonstrating that a particular technology was approaching its ‘ripe time’ & indicating what the bottlenecks are, so society can go after the bottlenecks and then has the option to scale up the prototype as soon as the bottlenecks are fixed19. Richard Hamming describes ripe time as finally enabling attacks on consequential problems.20 Edward Boyden describes the development of both optogenetics & expansion microscopy as “failure rebooting”, revisiting (failed) past ideas which may now be workable in the light of progress in other areas21. As time passes, more options may open up, and any of them may bypass what was formerly a necessary or serial dependency which was fatal. Enough progress in one domain (particularly computing power) can sometimes make up for stasis in another domain.
So, what Babbage should have aimed for was not making a practical thinking machine which could churn out naval tables, but demonstrating that a programmable thinking machine is possible & useful, and currently limited by the slowness & size of its mechanical logic—so that transistors could be pursued with higher priority by governments, and programmable computers could be created with transistors as soon as possible, instead of the historical course of a meandering piecemeal development where Babbage’s work was forgotten & then repeatedly reinvented with delays (eg Konrad Zuse vs von Neumann). Similarly, the benefit of taking Moore’s law seriously is that one can plan ahead to take advantage of it22 even if one doesn’t know exactly when, if ever, it will happen.
Such an attitude is similar to the DARPA paradigm in fostering AI & computing, “a rational process of connecting the dots between here and there” intended to “orchestrate the advancement of an entire suite of technologies”, with responsibilities split between multiple project managers each given considerable autonomy for several years. These project managers tend to pick polarizing projects rather than consistent projects (Goldstein & Kearney 2017), ones which generate disagreement among reviewers or critics. Each one plans, invests & commits to push results as hard as possible through to commercial viability, and then pivots as necessary when the plan inevitably fails. (DARPA indeed saw itself as much like a VC firm.)
The benefit for someone like DARPA of a forecast like Moore’s law is that it provides one fixed trend to gauge overall timing to within a decade or so, and to look for those dots which have lagged behind and become reverse salients.23 For an entrepreneur, the advantage of exponential thinking is more fatalistic: being able to launch in the window between technical feasibility and someone else randomly giving it a try; if wrong because it was always impossible, it doesn’t matter when one launches, and if wrong because the timing is wrong, one’s choice is effectively random and little is lost by delay.
Try & Try Again (But Less & Less)

“The road to wisdom?—Well, it’s plain
and simple to express:
Err
and err
and err again
but less
and less
and less.”
Piet Hein, Grooks

This presents a conflict between personal and social incentives. Socially, one wants people regularly tossing their bodies into the marketplace to be trampled by uncaring forces just on the off chance that this time it’ll finally work, and since the critical factors are unknown and constantly changing, one needs a sacrificial startup every once in a while to check (for a good idea, no amount of failures is enough to prove that it should never be tried—many failures just imply that there should be a backoff). Privately, given the skewed returns, diminishing utility, the oversized negative impacts (a bad startup can ruin one’s life and drive one to suicide), the limited number of startups any individual can engage in (yielding gambler’s ruin)24, and the fact that startups & VC will capture only a minute percentage of the total gains from any success (most of which will turn into consumer surplus/positive externalities), the only startups that make any rational sense, which you wouldn’t have to be crazy to try, are the overdetermined ones which anyone can see are a great idea. However, those are precisely the startups that crazy people will have done years before when they looked like bad ideas, avoiding the waste of delay. Further, people in general appear to overexploit & underexplore, exacerbating the problem—even if the expected value of a startup (or experimentation, or R&D in general) is positive for individuals
So, it seems that rapid progress depends on crazy people.
There is a more than superficial analogy here, I think, to Thompson sampling25/posterior sampling (PSRL) Bayesian reinforcement learning. In RL’s multi-armed bandit setting, each turn one has a set of ‘arms’ or options with unknown payoffs and one wants to maximize the total long-term reward. The difficulty is in coping with failure: even good options may fail many times in a row, and bad options may succeed, so options cannot simply be ruled out after a failure or two, and if one is too hasty to write an option off, one may take a long time to realize that, losing out for many turns.
One of the simplest & most efficient MAB solutions, which achieves near-optimal total long-term reward and low ‘regret’ (opportunity cost), is Thompson sampling & its generalization PSRL26: randomly select each option with a probability equal to the current estimated probability that it is the most profitable option. This explores all options initially but gradually homes in on the most profitable option to exploit most of the time, while still exploring the other options once in a while, just in case. Strictly speaking, Thompson sampling never bans an option permanently: the probability of selecting it merely becomes vanishingly small. Bandit settings can further assume that options are ‘restless’ and the optimal option may ‘drift’ over time or ‘run out’ or ‘switch’, in which case one also estimates the probability that an option has switched, and when it does, one changes over to the new best option; instead of the regular Thompson sampling where bad options become ever more unlikely to be tried, a restless bandit results in constant low-level exploration because one must constantly check lest one fail to notice a switch.
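A minimal sketch of the probability-matching rule just described, assuming Bernoulli (success/failure) payoffs with uniform Beta(1, 1) priors; the payoff rates in the example are made up for illustration. Drawing one sample from each arm’s posterior and pulling the arm with the largest draw selects each arm with exactly the posterior probability that it is the best one, and because a Beta posterior never rules out any value, no arm is ever banned outright.

```python
import random

def thompson_sampling(true_probs, steps, seed=0):
    """Beta-Bernoulli Thompson sampling on a stationary multi-armed bandit.

    Returns how many times each arm was pulled.
    """
    rng = random.Random(seed)
    k = len(true_probs)
    successes = [0] * k
    failures = [0] * k
    pulls = [0] * k
    for _ in range(steps):
        # One posterior draw per arm; the argmax is chosen with probability
        # equal to the posterior probability that this arm is the best.
        draws = [rng.betavariate(1 + successes[i], 1 + failures[i]) for i in range(k)]
        arm = max(range(k), key=draws.__getitem__)
        pulls[arm] += 1
        if rng.random() < true_probs[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return pulls

# Hypothetical payoff rates for three "options".
pulls = thompson_sampling([0.1, 0.3, 0.7], steps=5000)
print(pulls)
```

With these made-up rates, the large majority of pulls should go to the 0.7 arm, while the weaker arms keep receiving an occasional pull rather than being written off; the exact counts depend on the seed.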
This bears a resemblance to startup rates over time: an initial burst of enthusiasm for a new ‘option’, when it still has high prior probability of being the most profitable option at the moment, triggers a bunch of startups selecting that option, but then when they fail, the posterior probability drops substantially; however, even if something now looks like a bad idea, there will still be people every once in a while who insist on trying again anyway, and, because the probability is not 0, once in a while they succeed wildly and everyone is astonished that ‘so, X is a thing now!’
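For the restless case, one common variant (an assumption here, since the text does not name one) is discounted Thompson sampling: every step, all success and failure counts are multiplied by a factor `gamma` below 1, so old evidence fades back toward the prior and even a long-failed option keeps getting a low-level trickle of trials. The sketch compares it to ordinary Thompson sampling in a hypothetical scenario where arm 1 pays off 10% of the time until step 3000 and then, once its ‘ripe time’ arrives, 90% of the time.

```python
import random

SWITCH = 3000  # hypothetical step at which arm 1's "ripe time" arrives

def restless_env(t):
    """Hypothetical payoff rates: arm 1 is a bad idea until SWITCH, then a great one."""
    return [0.5, 0.1] if t < SWITCH else [0.5, 0.9]

def discounted_thompson(prob_schedule, steps, gamma=1.0, seed=0):
    """Bernoulli Thompson sampling with optional forgetting.

    gamma=1.0 is ordinary (stationary) Thompson sampling. With gamma < 1, every
    arm's success/failure counts shrink by that factor each step, so evidence
    fades back toward the Beta(1, 1) prior and no arm stays written off for long.
    Returns the arm pulled at each step.
    """
    rng = random.Random(seed)
    k = len(prob_schedule(0))
    succ = [0.0] * k
    fail = [0.0] * k
    history = []
    for t in range(steps):
        probs = prob_schedule(t)
        draws = [rng.betavariate(1 + succ[i], 1 + fail[i]) for i in range(k)]
        arm = max(range(k), key=draws.__getitem__)
        reward = rng.random() < probs[arm]
        for i in range(k):  # forget a little of all past evidence
            succ[i] *= gamma
            fail[i] *= gamma
        if reward:
            succ[arm] += 1
        else:
            fail[arm] += 1
        history.append(arm)
    return history

def share_after_switch(history, arm=1):
    """Fraction of post-switch pulls that went to `arm`."""
    post = history[SWITCH:]
    return post.count(arm) / len(post)

stationary = discounted_thompson(restless_env, 4000, gamma=1.0)
forgetful = discounted_thompson(restless_env, 4000, gamma=0.98)
print(share_after_switch(stationary), share_after_switch(forgetful))
```

With these made-up numbers, one would expect the forgetful learner to move most of its pulls to arm 1 soon after the switch, while the ordinary learner, having effectively written arm 1 off, may take far longer to notice. The price of forgetting is that it also keeps re-checking arm 1 before the switch, when doing so is pure loss; `gamma=0.98` is an arbitrary choice.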
In DARPA’s research funding and VC, they often aren’t looking for a plan which looks good on average to everyone, or which no one can find any particular problem with, but something closer to a plan which at least one person thinks could be awesome for some reason. An additional analogy from reinforcement learning is PSRL, which handles more complex problems by committing to a strategy and following it through to the end, whether that end is success or failure. A naive Thompson sampler would do badly in a long-term problem because at every step, it would ‘change its mind’ and be unable to follow any plan consistently for long enough to see what happens; what is necessary is to do ‘deep exploration’, following a single plan long enough to see how it works: even if one thinks that plan is almost certainly wrong, one must “Disagree and commit”. The average of multiple plans is often worse than any single plan. The most informative plan is the most polarizing one.27
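A toy illustration of why committing matters, under made-up assumptions: there are five candidate plans, and a plan only reveals whether it works after being followed for `horizon` consecutive steps. This is not PSRL itself (which samples a whole model of the environment and plans within it); it isolates only the ‘commit for an episode’ part of the idea, contrasted with an agent that re-samples its Thompson draws every step and switches plans whenever another one looks better. The function name and parameters are hypothetical.

```python
import random

def run_plans(true_probs, horizon, steps, commit, seed=0):
    """Count how many plans get carried through to an observed outcome.

    commit=True: sample from the Beta posteriors once, then follow the chosen
    plan for a full episode of `horizon` steps before deciding again.
    commit=False: re-sample every step; switching plans resets progress to zero.
    """
    rng = random.Random(seed)
    k = len(true_probs)
    succ = [0] * k
    fail = [0] * k

    def choose():
        draws = [rng.betavariate(1 + succ[i], 1 + fail[i]) for i in range(k)]
        return max(range(k), key=draws.__getitem__)

    current, progress, outcomes = None, 0, 0
    for _ in range(steps):
        if commit:
            if progress == 0:  # pick a new plan only between episodes
                current = choose()
        else:
            choice = choose()  # re-decide every single step
            if choice != current:
                current, progress = choice, 0  # switching plans throws away progress
        progress += 1
        if progress == horizon:  # followed long enough to judge the plan
            if rng.random() < true_probs[current]:
                succ[current] += 1
            else:
                fail[current] += 1
            outcomes += 1
            progress = 0
    return outcomes

plans = [0.1, 0.2, 0.3, 0.4, 0.5]  # hypothetical success rates
committed = run_plans(plans, horizon=10, steps=1000, commit=True)
dithering = run_plans(plans, horizon=10, steps=1000, commit=False)
print(committed, dithering)
```

The committed agent finishes exactly `steps // horizon` plans (100 here), so its posteriors keep sharpening. The dithering agent starts with five equally attractive plans, so its per-step choice keeps jumping between them; it almost never sticks with one for 10 steps in a row, and therefore typically learns nothing at all.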
The system as a whole can be seen in RL terms. One theme I notice in many systems is that they follow a multi-level optimization structure where slow blackbox methods give rise to more efficient Bayesian inference. Ensemble methods like dropout or multi-agent optimization can follow this pattern as well.
A particularly germane example here is Krafft et al 2016/Krafft 2017 (discussion), which examines a large dataset of trades made by eToro online traders, who are able to clone financial trading strategies of more successful traders; as traders find successful strategies, others gradually imitate them, and so the system as a whole converges on better strategies in what they identify as a sort of particle filter-like implementation of “distributed Thompson sampling” which they dub “social sampling”. So for the most part, traders clone popular strategies, but with certain probabilities, they’ll randomly explore rarer apparently-unsuccessful strategies.
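A toy version of that imitate-and-occasionally-explore dynamic, under made-up assumptions: strategies are Bernoulli arms with hypothetical payoff rates; each agent looks at one randomly met agent and copies that agent’s strategy if it just paid off, otherwise keeping its own; and with a small probability an agent instead jumps to a uniformly random strategy regardless of popularity. This is only loosely in the spirit of what Krafft et al call social sampling, not their actual model or data.

```python
import random

def social_sampling(true_probs, n_agents=200, rounds=200, explore=0.05, seed=0):
    """Population of imitating agents with a small random-exploration rate.

    Returns the fraction of agents using each strategy after the last round.
    """
    rng = random.Random(seed)
    k = len(true_probs)
    strategy = [rng.randrange(k) for _ in range(n_agents)]
    for _ in range(rounds):
        paid = [rng.random() < true_probs[s] for s in strategy]  # this round's results
        new_strategy = []
        for i in range(n_agents):
            if rng.random() < explore:
                new_strategy.append(rng.randrange(k))  # long shot, ignoring popularity
            else:
                j = rng.randrange(n_agents)  # meet a random agent
                new_strategy.append(strategy[j] if paid[j] else strategy[i])
        strategy = new_strategy
    return [strategy.count(a) / n_agents for a in range(k)]

shares = social_sampling([0.3, 0.5, 0.7])  # hypothetical strategy payoff rates
print(shares)
```

Most of the population should end up on the best strategy, while the `explore` rate keeps a small, nonzero fraction trying the apparently-worse ones, which is what would let the population notice if their payoffs ever changed.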
This sounds a good deal like individuals pursuing standard careers & occasionally exploring unusual strategies like a startup; they will occasionally explore strategies which have performed badly (ie. previous similar startups failed). Entrepreneurs, with their speculations and optimistic biases, serve as randomization devices to sample a strategy regardless of the ‘conventional wisdom’, which at that point may be no more than an information cascade; information cascades, however, can be broken by the existence of outliers who are either informed or act at random (“misfits”). While each time a failed option is tried, it may seem irrational (“how many times must VR fail before people finally give up on it‽”), it was still rational in the big picture to give it a try, as this collective strategy collectively minimizes regret & maximizes collective total long-term returns—as long as failed options aren’t tried too often.
Reducing Regret
What does this analogy suggest? The two failure modes of a MAB algorithm are investing too much in one option early on, and investing too little in apparently-bad options later on; in the former, you inefficiently buy too much information on an option which happened to have good luck but is not guaranteed to be the best, at the expense of others (which may in fact be the best), while in the latter, you buy too little & risk permanently making a mistake by prematurely rejecting an apparently-bad option (which simply had bad luck early on). To the extent that VC/startups stampede into particular sectors, this leads to inefficiency of the first type—were so many ‘green energy’ startups necessary? When they began failing in a cluster, information-wise, that was highly redundant. And then on the other hand, if a startup idea becomes ‘debunked’, and no one is willing to invest in it ever, that idea may be starved of investment long past its ripe time, and this means big regret.
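The second failure mode, prematurely rejecting an option that merely had bad luck, is easy to reproduce with a purely greedy learner. In this sketch (hypothetical success rates of 0.6 and 0.4), the greedy rule tries each arm once and thereafter always pulls the arm with the better observed rate. Whenever the good arm fails its first trial and the bad arm succeeds on its first (probability 0.4 × 0.4 = 0.16), the good arm’s observed rate is stuck at 0 while the bad arm’s can never fall to 0, so the good arm is never tried again.

```python
import random

def greedy_locks_in(good=0.6, bad=0.4, steps=500, seed=0):
    """Pure-exploitation two-arm bandit.

    Returns True if the good arm ends up with fewer pulls than the bad arm,
    i.e. it was effectively written off after early bad luck.
    """
    rng = random.Random(seed)
    probs = [good, bad]
    succ = [0, 0]
    pulls = [0, 0]
    for t in range(steps):
        if t < 2:
            arm = t  # try each arm exactly once
        else:
            rates = [succ[i] / pulls[i] for i in range(2)]
            arm = 0 if rates[0] >= rates[1] else 1  # always exploit; ties favor arm 0
        pulls[arm] += 1
        succ[arm] += int(rng.random() < probs[arm])
    return pulls[0] < pulls[1]

lock_in_rate = sum(greedy_locks_in(seed=s) for s in range(1000)) / 1000
print(lock_in_rate)
```

The lock-in rate should come out at or above roughly 16% of runs from that scenario alone; a Thompson sampler, by contrast, keeps drawing occasional optimistic samples for the good arm and so eventually revisits it.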
I think most people are aware of fads/stampedes in investing, but the latter error is not so commonly discussed. One idea is that a VC firm could explicitly track ideas that seem great but have had several failed startups, and try to schedule additional investments at ever greater intervals (similar to DS-PSRL), which bounds losses (if the idea turns out to be truly a bad idea after all) but ensures eventual success (if a good one). For example, even if online pizza delivery has failed every time it’s tried, it still seems like a good idea that people will want to order pizza online via their smartphones, so one could try to do a pizza startup 2.5 years later, then 5 years later, then 10 years, then 20 years, or perhaps every time computer costs drop an order of magnitude, or perhaps every time the relevant market doubles in size? Since someone wanting to try the business again might not pop up at the exact time desired, a VC might need to create one themselves by trying to inspire someone to do it.
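A minimal sketch of such a backoff schedule, reading the 2.5/5/10/20-year sequence as successive gaps between retries (an assumption; it could also be read as offsets from the first failure). The helper names and the “ripe time” parameter are hypothetical. It contrasts the doubling schedule with retrying at a fixed 2.5-year interval over a 40-year horizon.

```python
def backoff_schedule(first_gap, horizon, factor=2.0):
    """Retry times (years after the first failure); each gap is `factor` times the last."""
    times = []
    t, gap = 0.0, first_gap
    while t + gap <= horizon:
        t += gap
        times.append(t)
        gap *= factor
    return times

def first_attempt_after(times, ripe_time):
    """Earliest scheduled retry at or after the ripe time (None if past the horizon)."""
    return next((t for t in times if t >= ripe_time), None)

doubling = backoff_schedule(2.5, 40)           # gaps of 2.5, 5, 10, 20 years
fixed = backoff_schedule(2.5, 40, factor=1.0)  # a retry every 2.5 years
print(doubling, len(fixed))                    # [2.5, 7.5, 17.5, 37.5] 16
print(first_attempt_after(doubling, 12), first_attempt_after(fixed, 12))  # 17.5 12.5
```

Over 40 years, the doubling schedule spends 4 attempts instead of 16, at the cost of a longer lag: if the idea became viable at year 12, it would not be retried until year 17.5 rather than 12.5. Because each new gap equals the time already elapsed plus the first gap, the wait after the ripe time stays proportional to how long the idea has been waiting, while the number of attempts grows only logarithmically with the horizon.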
What other lessons could we draw if we thought about technology this way? The use of lottery grants is one idea which has been proposed, to help break the over-exploitation fostered by peer review; the randomization gives disfavored low-probability proposals (and people) a chance. If we think about multi-level optimization systems & population-based training, and optimization of evolution like strong amplifiers (which resemble small but networked communities: Pavlogiannis et al 2018), that would suggest we should have a bias against both large and small groups/institutes/granters, because small ones are buffeted by random noise/drift and can’t afford well-powered experiments, but large ones are too narrow-minded.28 But a network of medium ones can both explore well and then efficiently replicate the best findings across the network to exploit them.

Origins of Innovation: Bakewell & Breeding
Evolution as Backstop for Reinforcement Learning
Littlewood’s Law and the Global Media (It only takes one—for good & ill)
“Guess 2/3 of the average”

“The Myth of The Infrastructure Phase”
“Book Review: Zero To One”, SSC
“Resistant protocols: How decentralization evolves”, John Backus
“Technological convergence in drug discovery and other endeavors”
“Explicit and Tacit Rationality”
“The Milo Criterion”
“Quantifying the evolution of individual scientific impact”, Sinatra et al 2016
“Large teams develop and small teams disrupt science and technology”, Wu et al 2019
“Why did we wait so long for the bicycle?”

ARPA and SCI: Surfing AI (Review of Roland & Shiman 2002)

Review of DARPA history book, Strategic Computing: DARPA and the Quest for Machine Intelligence, 1983–1993, Roland & Shiman 2002, which covers a large-scale DARPA effort to jumpstart real-world uses of AI in the 1980s by a multi-pronged research effort into more efficient computer chip R&D, supercomputing, robotics/self-driving cars, & expert system software. Roland & Shiman 2002 particularly focus on the various ‘philosophies’ of technological forecasting & development, which guided DARPA’s strategy in different periods, ultimately endorsing a weak technological determinism where the bottlenecks are too large for a small (in comparison to the global economy & global R&D) organization to overcome, and the best a DARPA can hope for is a largely agnostic & reactive strategy in which granters ‘surf’ technological changes, rapidly exploiting new technology while investing their limited funds into targeted research patching up any gaps or lags that accidentally open up and block broader applications.

While reading “Funding Breakthrough Research: Promises and Challenges of the ‘ARPA Model’”, Azoulay et al 2018, on DARPA, I noticed an interesting comment:

In this paper, we propose that the key elements of the ARPA model for research funding are: organizational flexibility on an administrative level, and significant authority given to program directors to design programs, select projects and actively manage projects. We identify the ARPA model’s domain as mission-oriented research on nascent S-curves within an inefficient innovation system.
…Despite a great deal of commentary on DARPA, lack of access to internal archival data has hampered efforts to study it empirically. One notable exception is the work of Roland and Shiman (2002),2 who offer an industrial history of DARPA’s effort to develop machine intelligence under the “Strategic Computing Initiative” [SCI]. They emphasize both the agency’s positioning in the research ecosystem—carrying military ideas to proof of concept that would be otherwise neglected—as well as the program managers’ role as “connectors” in that ecosystem. Roland and Shiman are to our knowledge the only academic researchers ever to receive internal access to DARPA’s archives. Recent work by Goldstein and Kearney (2018a) on ARPA-E is to-date the only quantitative analysis using internal program data from an ARPA agency. [For insights into this painful process, see the preface of Roland and Shiman (2002).]

The two Goldstein & Kearney 2018 papers sounded interesting but alas, are listed as “manuscript under review”/“manuscript in preparation”; only one is available as a preprint. I was surprised that an agency as well known and intimately involved in computing history could be described as having one internal history, ever, and looked up a PDF copy of Strategic Computing: DARPA and the Quest for Machine Intelligence, 1983–1993, Roland & Shiman 2002.
The preface makes clear the odd footnote: while they may have had some access to internal archival data, they had a lot less access than they requested, DARPA was not enthusiastic about it, and eventually canceled their book contract (they published anyway). This leads to an… interesting preface. You don’t often hear historians of solicited official histories describe the access as a “mixed blessing” and say things like “they never lied to us, as best as we can tell”, they just “simply could not understand why we wanted to see the materials we requested”, or recount that their “requests for access to these [emails] were most often met with laughter”, noting that “We were never explicitly denied access to records controlled by DARPA; we just never gained complete access.” Frustrated, they

…then asked if they could identify any document in the SC program that spoke the truth, that could be accepted at face value. They [ARPA interviewees] found this an intriguing question. They could not think of a single such document. All documents, in their view, distorted reality one way or another—always in pursuit of some greater good.

In one anecdote from the interviews, Lynn Conway shows up with a stack of internal DARPA documents, states that an NDA prevents her from talking about them (as if anyone cared about NDAs from decades before), and refuses to show any of the documents to the interviewer, leaving me rather bemused—why bother? (Although in this case, it may just be that Conway is a jerk—one might remember her from helping try to frame Michael Bailey for sexual abuse.) I was reminded a little of Carter Scholz’s novel Radiance (also 2002), which touches on SDI and indirectly on SCI.
The book itself doesn’t seem to have suffered too badly for the birth pangs. It’s an overview of the birth and death of the SCI, organized in chunks by program manager. The division by manager is not an accident—R&S comment deprecatingly on DARPA personnel’s focus on the technology and their reluctance to “talk about people and politics”, and invoke the strawman of “technological determinists”; they seem to adopt the common historian pose that a sophisticated historian focuses on people and it is naive & unsophisticated to invoke objective constraints of science & technology & physics. This is wrong in the context of SCI, as their in-depth recounting will eventually make clear. The people did not have much to do with the failures: stuff like gallium arsenide or expert systems or autonomous robots didn’t work out because they don’t work or are hard or require computing power unavailable at the time, not because some bureaucrat made a bad naming choice or ran afoul of the wrong Senator. People don’t matter to something like Moore’s law. Man proposes but Nature disposes—you can fake medicine or psychology easily, but it’s harder to fake a robot not running into trees. Fortunately, for all the time R&S spend on project managers shuffling around acronyms, they still devote adequate space to the actual science & technology and do a good job of it.
So what was SCI? It was a 1983–1993 add-on to ARPA’s existing funding programs, where the spectre of Japan’s Fifth Generation Project was used to lobby Congress for additional R&D funding which would be devoted to a cluster of interconnected technological opportunities ARPA spied on the US horizon, to push them forward simultaneously and break the logjams. (As always, “funding comes from the threat”, though many were highly skeptical that Fifth Generation would go anywhere or that its intended goals—much of which was to simply work around flaws in Japanese language handling—were much of a threat, and most Western evaluations of it generally describe it as a failure or at least not a notably productive R&D investment.) The systems included gallium arsenide chips to replace silicon’s poor thermal/radiation tolerance and operate at faster frequencies as well, VLSI chips which would combine previously disparate chips onto a single small chip as part of a silicon design ecosystem which would design & manufacture chips much faster than previously, parallel processing computers going far beyond just 1 or 2 processors, autonomous car robots, AI expert systems, and advanced user-friendly software tools in general. The name “Strategic Computing Initiative” was chosen to try to benefit from Reagan’s SDI, but while the military connections remained throughout, the connection was ultimately quite tenuous and the gallium arsenide chips were deliberately split out to SDI to avoid contamination, although the US military would still be the best customer for many of the products & the connections continued to alienate people. Surprisingly—shockingly, even—computer networking was not a major SCI focus: the ARPA networking PM Barry Leiner kept clear of SCI (not needing the money & fearing a repeat of know-nothing Republican Congressmen searching for something to axe).
The funding ultimately amounted to $1,000,000 (1993 dollars; $2,141,366 adjusted for inflation), trivial compared to total military funding, but still real money.
The project implementation followed ARPA’s existing loose oversight paradigm, where traveling project managers were empowered to dispense grants to applicants on their own authority, depending primarily on their own good taste to match talented researchers with ripe opportunities, with bureaucracy limited to meeting with the grantees semi-annually or annually for progress reports & evaluation, often in groups so as to let researchers test each other’s mettle & form social ties. (“ARPA program managers like to repeat the quip that they are 75 entrepreneurs held together by a common travel agent.”) An ARPA PM would humbly ‘surf’ the cutting-edge, going with the waves rather than swimming upstream, so to speak, to follow growing trends while cutting their losses on dead ends, to bring things through the ‘valley of death’ between lab prototype and the real world:

Steven Squires, who rose from program manager to be Chief Scientist of SC and then director of its parent office, sought orders-of-magnitude increases in computing power through parallel connection of processors. He envisioned research as a continuum. Instead of point solutions, single technologies to serve a given objective, he sought multiple implementations of related technologies, an array of capabilities from which users could connect different possibilities to create the best solution for their particular problem. He called it “gray coding”. Research moved not from the white of ignorance to the black of revelation, but rather it inched along a trajectory stepping incrementally from one shade of gray to another. His research map was not a quantum leap into the unknown but a rational process of connecting the dots between here and there. These and other DARPA managers attempted to orchestrate the advancement of an entire suite of technologies. The desideratum of their symphony was connection. They perceived that research had to mirror technology. If the system components were to be connected, then the researchers had to be connected. If the system was to connect to its environment, then the researchers had to be connected to the users. Not everyone in SC shared these insights, but the founders did, and they attempted to instill this ethos in the program.

Done wrong, of course, this results in a corrupt slush fund doling out R&D funds to an incestuous network of grantees for technologies always just on the horizon and whose failure is always excused by the claim that high-risk research often won’t work out, or results in elaborate systems trying to do too many things and collapsing under the weight of many advanced half-debugged systems chaotically interacting (eg ILLIAC IV). Having been conceived in scientific sin and born of blue-uniform bureaucracy while midwifed by conniving committees, SCI’s prospects might not look too great.
So, did SCI work out? The answer is a definite, unqualified—maybe:

At the end of their decade, 1983–1993, the connection failed. SC never achieved the machine intelligence it had promised. It did, however, achieve some remarkable technological successes. And the program leaders and researchers learned as much from their failures as from their triumphs. They abandoned the weak components in their system and reconfigured the strong ones. They called the new system “high performance computing”. Under this new rubric they continued the campaign to improve computing systems. “Grand challenges” replaced the former goal, machine intelligence; but the strategy and even the tactics remained the same.

The end of SCI coincided with (and partially caused) the “AI Winter”, but SCI went beyond just the Lisp machine & expert system software companies we associate with the AI winter. Of the systems, some worked out, others were good ideas but the time wasn’t ripe in an unforeseeable way and have been maturing ever since, some have poked along in a kind of permanent stasis (not dead but not alive either), others were dead ends but dead ends in important ways, and some are plain dead. In order, one might list: parallel commodity processors and rapid development of large silicon chips via a subsidized foundry; autonomous cars/vehicles and generalized machine intelligence systems; expert systems; Thinking Machines’s Connection Machine; and Josephson junctions.
Pining for the fjords: super-fast superconducting Josephson junctions were rapidly abandoned before becoming officially part of SCI research, while gallium arsenide suffered a similar fate—at the time, they were exciting and Cray Computers infamously bet big on the Cray 3 achieving its OOM improvement in part with gallium arsenide chips, but somehow it never quite worked out or replaced silicon and remains in a small niche. (I doubt it was SDI’s fault, since gallium arsenide has had 2 decades since, and there’s been a ton of commercial incentive to find a replacement for silicon as it gets ever harder to shrink silicon nodes.)
Important failures: autonomous vehicles and generalized AI systems represent an interesting intermediate case: the funded vehicles, like the work at CMU, were useless—expensive, slow, trivially confused by slight differences in roads or scenery, unable to cope in realtime with more than monochrome images with pitiful resolutions like 640x640px or smaller because the computer vision algorithms were too computationally demanding, and the development bogged down by endless tweaks and hacking with regular regressions in capability. But these research programs and demos were direct ancestors of the DARPA Grand Challenge, which itself kickstarted the current self-driving car boom a decade ago. ARPA and the military didn’t get the exciting vehicles promised by the early ’90s, but they do now have autonomous cars and especially drones, and it’s amazing to think that Google Waymo cars are wandering around Arizona now regularly picking up and dropping off riders without a single fatality or major injury after millions of miles. As far as I can tell, Waymo wouldn’t exist now without the DARPA Grand Challenge, and it seems possible that DARPA was encouraged by the mixed success of the SCI vehicles, so that’s an interesting case of potential success albeit delayed. (But then, we do expect that with technology—Amara’s law.)
Parallel computers: Thinking Machines benefited a lot from SCI as did other parallel computing projects, and while TM did fail and the computers we use now don’t resemble the Connection Machine at all, the field of parallel processing was proven out (ie. systems with thousands of weak CPUs could be successfully built, programmed, realize OOM performance gains, and commercially sold); I’d noticed once that a lot of parallel computing architectures we use now seemed to stem from an efflorescence in the 1980s, but it was only while reading R&S and noting all the familiar names that I realized that was not a coincidence: many of them were ARPA-funded at this time. Even without R&S noting that the parallel computing was successfully rolled over into “HPC”, SCI’s investment into parallel computing was a big success.
A successful adjunct to the parallel computing was an interesting program I’d never heard of before: MOSIS. MOSIS was essentially a government-subsidized chip foundry, competitive with commercial chip foundries, which would accept student & researcher submissions of VLSI chip designs like CPUs or ASICs and make physical chips in combined batches to save costs. Anyone with interesting new ideas could email in a design and get back within 2 months a real live chip for a few hundred dollars. The chips would be made cheaply, quickly, quality-checked, with assurance of privacy, and MOSIS ran thousands of projects a year (peaking at 1880 in 1989). This is quite a cool program to run and must have been a godsend, especially for anyone trying to make custom chips for parallel projects. (“SC also supported BBN’s Butterfly parallel processor, Charles Seitz’s Hypercube and Cosmic Cube at CalTech, Columbia’s Non-Von, and the CalTech Tree Machine. It supported an entire newcomer as well, Danny Hillis’s Connection Machine, coming out of MIT. All of these projects used MOSIS services to move their design ideas into experimental chips.”) It was involved in early GPU work (Clark’s Geometry Engine) and RISC designs like MIPS and even oddities like systolic array chips/computers like the iWarp. Sadly, MOSIS was a bit of a victim of its own success and drew political fire.
Expert systems and planners are generally listed as a ‘failure’ and the cause of the AI Winter, and it’s true they didn’t give us HAL as some GOFAI people hoped, but they did find a useful niche and have been important—R&S give a throwaway paragraph noting that one system from SCI, DART, was used in planning logistics for the first Gulf War and saved the DoD more money than the whole SCI program combined cost. (The listed reference, “DART: Revolutionizing Logistics Planning”, Hedberg 2002, actually makes the bolder claim that DART “paid back all of DARPA’s 30 years of investment in AI in a matter of a few months, according to Victor Reis, Director of DARPA at the time.” Which could be equally well taken as a comment on how expensive a war is, how inefficient DoD logistics planning was, or how little has been invested in AI.) It’s also worth noting that speech recognition based on Hidden Markov models & n-grams, the first speech recognition systems which were any use (underlying successes like Dragon Naturally Speaking), was a success here, even if now obsolesced by deep learning.
Perhaps the most relevant area to contemporary AI discussions of deep learning is expert systems. Why was there such optimism? Expert systems had accomplished a few successes: MYCIN/DENDRAL (although MYCIN was never used in production), some mining/oil case studies like PROSPECTOR, a customer configuration assistant XCON for DEC… And SCI was a synergistic program, remember, providing the chips and then powerful parallel computers on which expert systems would scale up to the tens of thousands of rules per second estimated necessary for things like the autonomous vehicles:

Small wonder, then, that Robert Kahn and the architects of SC believed in 1983 that AI was ripe for exploitation. It was finally moving out of the laboratory and into the real world, out of the realm of toy problems and into the realm of real problems, out of the sterile world of theory and into the practical world of applications.
…That such a goal appeared within reach in the early 1980s is a measure of how far the field had already come. In the early 1970s, the MYCIN expert system had taken twenty person-years to produce just 475 rules. The full potential of expert systems lay in programs with thousands, even tens and hundreds of thousands, of rules. To achieve such levels, production of the systems had to be dramatically streamlined. The commercial firms springing up in the early 1980s were building custom systems one client at a time. DARPA would try to raise the field above that level, up to the generic or universal application.
Thus was shaped the SC agenda for AI. While the basic program within IPTO continued funding for all areas of AI, SC would seek “generic applications” in four areas critical to the program’s applications: (1) speech recognition would support Pilot’s Associate and Battle Management; (2) natural language would be developed primarily for Battle Management; (3) vision would serve primarily the Autonomous Land Vehicle; and (4) expert systems would be developed for all of the applications. If AI was the penultimate tier of the SC pyramid, then expert systems were the pinnacle of that tier. Upon them all applications depended. Development of a generic expert system that might service all three applications could be the crowning achievement of the program. Optimism on this point was fueled by the whole philosophy behind SC. AI in general, and expert systems in particular, had been hampered previously by lack of computing power. Feigenbaum, for example, had begun DENDRAL on an IBM 7090 computer, with about 130K bytes of core memory and an operating speed between 50 and 100,000 floating point operations per second. Computer power was already well beyond that stage, but SC promised to take it to unprecedented levels—a gigaflop by 1992. Speed and power would no longer constrain expert systems. If AI could deliver the generic expert system, SC would deliver the hardware to run it. Compared to existing expert systems running 2,000 rules at 50–100 rules per second, SC promised “multiple cooperating expert systems with planning capability” running 30,000 rules firing at 12,000 rules per second and six times real time.

What happened was that the hardware came into existence, but the expert systems didn’t scale. They instantly hit a combinatorial wall, couldn’t solve the grounding problem, and knowledge engineering never became feasible at the level where you might encode a human’s knowledge. Expert systems also struggled to be extended beyond symbolic systems to real data like vision or sound. AI didn’t have remotely enough computing power to do anything useful, and it didn’t have methods which could use the computing power if it had it. We got the VLSI chips, we got the gigahertz processors even without gallium arsenide, we got the gigaflops and then the teraflops and now the petaflops—but what do you do with an expert system on those? Nothing. The grand goals of SCI relied on all the parts doing their part, and one part fell through:

Only four years into the SC program, when Schwartz was about to terminate the IntelliCorp and Teknowledge contracts, expectations for expert systems were already being scaled back. By the time that Hayes-Roth revised his article for the 1992 edition of the Encyclopedia, the picture was still more bleak. There he made no predictions at all about program speeds. Instead he noted that rule-based systems still lacked “a precise analytical foundation for the problems solvable by RBSs . . . and a theory of knowledge organization that would enable RBSs to be scaled up without loss of intelligibility of performance.” SC contractors in other fields, especially applications, had to rely on custom-developed software of considerably less power and versatility than those envisioned when contracts were made with IntelliCorp and Teknowledge. Instead of a generic expert system, SC applications relied increasingly on “domain-specific software”, a change in terminology that reflected the direction in which the entire field was moving. This is strikingly similar to the pessimistic evaluation Schwartz had made in 1987. It was not just that IntelliCorp and Teknowledge had failed; it was that the enterprise was impossible at current levels of experience and understanding…Does this mean that AI has finally migrated out of the laboratory and into the marketplace? That depends on one’s perspective. In 1994 the U.S. Department of Commerce estimated the global market for AI systems to be about $900 million [1994 dollars; $1,862 million adjusted for inflation], with North America accounting for two-thirds of that total. Michael Schrage, of the Sloan School’s Center for Coordination Science at MIT, concluded in the same year that “AI is—dollar for dollar—probably the best software development investment that smart companies have made.” Frederick Hayes-Roth, in a wide-ranging and candid assessment, insisted that “KBS have attained a permanent and secure role in industry”, even while admitting the many shortcomings of this technology. Those shortcomings weighed heavily on AI authority Daniel Crevier, who concluded that “the expert systems flaunted in the early and mid-1980s could not operate as well as the experts who supplied them with knowledge. To true human experts, they amounted to little more than sophisticated reminding lists.” Even Edward Feigenbaum, the father of expert systems, has conceded that the products of the first generation have proven narrow, brittle, and isolated. As far as the SC agenda is concerned, Hayes-Roth’s 1993 opinion is devastating: “The current generation of expert and KBS technologies had no hope of producing a robust and general human-like intelligence.”
…Each new [ALV] feature and capability brought with it a host of unanticipated problems. A new panning system, installed in early 1986 to permit the camera to turn as the road curved, unexpectedly caused the vehicle to veer back and forth until it ran off the road altogether. The software glitch was soon fixed, but the panning system had to be scrapped anyway; the heavy, 40-pound camera stripped the device’s gears whenever the vehicle made a sudden stop. Given such unanticipated difficulties and delays, Martin increasingly directed its efforts toward achieving just the specific capabilities required by the milestones, at the expense of developing more general capabilities. One of the lessons of the first demonstration, according to the ALV engineers, was the importance of defining “expected experimental results”, because “too much time was wasted doing things not appropriate to proof of concept.” Martin’s selection of technology was conservative. It had to be, as the ALV program could afford neither the lost time nor the bad publicity that a major failure would bring. One BDM observer expressed concern that the pressure of the demonstrations was encouraging Martin to cut corners, for instance by using the “flat earth” algorithm with its two-dimensional representation. ADS’s obstacle-avoidance algorithm was so narrowly focused that the company was unable to test it in a parking lot; it worked only on roads.…The vision system proved highly sensitive to environmental conditions—the quality of light, the location of the sun, shadows, and so on. The system worked differently from month to month, day to day, and even test to test. Sometimes it could accurately locate the edge of the road, sometimes not. The system reliably distinguished the pavement of the road from the dirt on the shoulders, but it was fooled by dirt that was tracked onto the roadway by heavy vehicles maneuvering around the ALV. In the fall, the sun, now lower in the sky, reflected brilliantly off the myriads of polished pebbles in the tarmac itself, producing glittering reflections that confused the vehicle. Shadows from trees presented problems, as did asphalt patches from the frequent road repairs made necessary by the harsh Colorado weather and the constant pounding of the eight-ton vehicle.
…Knowledge-based systems in particular were difficult to apply outside the environment for which they had been developed. A vision system developed for autonomous navigation, for example, probably would not prove effective for an automated manufacturing assembly line. “There’s no single universal mechanism for problem solving”, Amarel would later say, “but depending on what you know about a problem, and how you represent what you know about the problem, you may use one of a number of appropriate mechanisms.”…In another major shift in emphasis, SC2 removed “machine intelligence” from its own plateau on the pyramid, subsuming it under the general heading “software”. This seemingly minor shift in nomenclature signaled a profound reconceptualization of AI, both within DARPA and throughout much of the computer community. The effervescent optimism of the early 1980s gave way to more sober appraisal. AI did not scale. In spite of impressive achievements in some fields, designers could not make systems work at a level of complexity approaching human intelligence. Machines excelled at data storage and retrieval; they lagged in judgment, learning, and complex pattern recognition.
…During SC, AI had proved unable to exploit the powerful machines developed in SC’s architectures program to achieve Kahn’s generic capability in machine intelligence. On the fine-grained level, AI, including many developments from the SC program, is ubiquitous in modern life. It inhabits everything from automobiles and consumer electronics to medical devices and instruments of the fine arts. Ironically, AI now performs miracles unimagined when SC began, though it can’t do what SC promised.

Given how people keep reaching back to the AI Winter in discussions of connectionism—I mean, deep learning—it’s interesting to contrast the two paradigms.
While working on the Wikipedia article for Lisp machines (and articles on related high-profile successes like MYCIN/DENDRAL) back in 2009, I read many journals & magazines from the 1980s, the Lisp machine heyday, and even played with a Genera OS image in a VM; the more I read about AI, the MIT AI Lab, Lisp machines, the ‘AI winter’, and so on, the more impressed I was by the operating systems & tools (such as the sophisticated hypertext documentation & text editors and capabilities of Common Lisp & its predecessors, which still put contemporary OS ecosystems on Windows/Mac/Linux to shame in many ways) and the less I was impressed by the actual AI algorithms of the era. In contrast, with deep learning, I am increasingly unimpressed by the surrounding ecosystem of software tools (with its endless layers of buggy Python & rigid C++) the more I use it, but more and more impressed by what is possible with deep learning.
Deep learning has long ago escaped into the commercial market, indeed, is primarily driven by industry researchers at this point. The case studies are innumerable (and many are secret due to their considerable commercial value). DL handles grounding problems & raw sensory data well and indeed struggles most on problems with richly formalized structures like hierarchies/categories/directed graphs (ML practitioners currently tend to use decision tree methods like XGBoost for those), or which require using rules & logical reasoning (somewhat like humans). Perhaps most importantly from the perspective of SCI and HPC, deep learning scales: it parallelizes in a number of ways, and it can soak up indefinite amounts of computing power & data. You can train a CNN on a few hundred or thousand images usefully, but Facebook & Google have run experiments going from millions to large datasets such as hundreds of millions and billions of images (eg Gao et al 2017, Gross et al 2017, Sun et al 2017, Mahajan et al 2018, Laanait et al 2019, Anonymous et al 2019), and the CNNs steadily improve their performance on both their assigned task and in accomplishing transfer learning. Similarly in reinforcement learning, the richer the resources available, the richer a NN can be trained (OA chart; consider how deep Zero’s NN is compared to the original AlphaGo, or Ape-X or Impala for learning many ALE games simultaneously, or OpenAI’s 5×5 DoTA progress via essentially brute force). Even self-driving car programs which are a byword for incompetence deal just fine with all the issues that bedeviled ALV by using, well, ‘a single universal mechanism for problem solving’ (which we call CNNs, which can do anything from image segmentation to human language translation).
These points are all the more striking as there is no sign that hardware improvements are over or that any inherent limits have been hit; even the large-scale experiments criticized as ‘boil the oceans’ projects nevertheless spend what are trivial amounts of money by both global economic and R&D criteria, like a few million dollars of GPU time. But none of this could have been done in the 1980s, or early 1990s. (As Hinton says, why didn’t connectionism work back then? Because the computers were thousands of times too slow, the datasets were thousands of times too small, and some of the neural network details like initializations & activations were broken.)
Considering all this, it’s not a surprise that the AI part of SC didn’t pan out and eventually got axed, as it should have. Sometimes the time is not ripe. Hero can invent the steam engine, but you don’t get steam engine trains until it’s steam engine train time, and the best intentions of all the bureaucrats in the world can’t affect that much. The turnover in managers and political interference may well have been enough to “disrupt the careful orchestration that its ambitious agenda required”, but this was more in the nature of shooting a dead horse. R&S seem, somewhat reluctantly, to ultimately assent to the view they critiqued at the beginning, held by the ARPA staff, that the failure of SC is more a demonstration of technological determinism than of social & political contingency, and more about the technology than the people:

…Thus, for all of their agency, their story appears to be one driven by the technology. If they were unable to socially construct this technology, to maintain agency over technological choice, does it then follow that some technological imperative shaped the SC trajectory, diverting it in the end from machine intelligence to high performance computing? Institutionally, SC is best understood as an analog of the development programs for the Polaris and Atlas ballistic missiles. An elaborate structure was created to sell the program, but in practice the plan bore little resemblance to day-to-day operations. Conceptually, SC is best understood by mixing Thomas Hughes’s framework of large-scale technological systems with Giovanni Dosi’s notions of research trajectories. Its experience does not quite map on Hughes’s model because the managers could not or would not bring their reverse salients on line. It does not quite map on Dosi because the managers regularly dealt with more trajectories and more variables than Dosi anticipates in his analyses. In essence, the managers of SC were trying to research and develop a complex technological system. They succeeded in developing some components; they failed to connect them in a system. The overall program history suggests that at this level of basic or fundamental research it is best to aim for a broad range of capabilities within the technology base and leave integration to others…While the Fifth Generation program contributed significantly to Japan’s national infrastructure in computer technology, it did not vault that country past the United States…SC played an important role, but even some SC supporters have noted that the Japanese were in any event headed on the wrong trajectory even before the United States mobilized itself to meet their challenge.
…In some ways the varying records of the SC applications shed light on the program models advanced by Kahn and Cooper at the outset. Cooper believed that the applications would pull technology development; Kahn believed that the evolving technology base would reveal what applications were possible. Kahn’s appraisal looks more realistic in retrospect. It is clear that expert systems enjoyed significant success in planning applications. This made possible applications ranging from Naval Battle Management to DART. Vision did not make comparable progress, thus precluding achievement of the ambitious goals set for the ALV. Once again, the program went where the technology allowed. Some reverse salients resisted efforts to orchestrate advance of the entire field in concert. If one component in a system did not connect, the system did not connect.
In the final analysis, SC failed for want of connection.

Reading about SC furnishes an unexpected lesson about the importance of believing in Moore’s Law and having techniques which can scale. What are we doing now which won’t scale, and what waves are we paddling up instead of surfing?
Reverse Salients

Excerpts from The First Miracle Drugs: How the Sulfa Drugs Transformed Medicine, Lesch 2006, describing Heinrich Hörlein’s drug development programs & Thomas Edison’s electrical programs as strategically aimed at “reverse salients”: lagging areas which hold back the practical application of progress elsewhere, and where research therefore has disproportionate payoffs because it removes a bottleneck.

From pg48, “A System of Invention”, The First Miracle Drugs: How the Sulfa Drugs Transformed Medicine, Lesch 2006:

Hörlein’s attitude was based not simply, or even primarily, on the situation of any particular area of research considered in isolation, but on his comprehensive overview of advance in areas in which chemistry and biomedicine intersected. These areas shared a number of generic problems and solutions, for example, the need to isolate a substance (natural product, synthetic product, body substance) in chemically pure form, the need to synthesize the substance and to do so economically if it was to go on the market, and the need for pharmacological, chemotherapeutic, toxicological, and clinical testing of the substance. Hörlein’s efforts to translate success in certain areas (vitamin deficiency disease, chemotherapy of protozoal infections) into optimism about possibilities in other areas (cancer, antibacterial chemotherapy) was characteristic. He regarded the chemical attack on disease as a many-fronted battle in which there was a generally advancing line but also many points at which advance was slow or arrested.
In this sense, Hörlein might be said to have thought—as Thomas Hughes has shown that Edison did—in terms of reverse salients and critical problems. Reverse salients are areas of research and development that are lagging in some obvious way behind the general line of advance. Critical problems are the research questions, cast in terms of the concrete particulars of currently available knowledge and technique and of specific exemplars or models (e.g., insulin, chemotherapy of kala-azar and malaria) that are solvable and whose solutions would eliminate the reverse salients.[18]
On Edison, see Thomas P. Hughes, Networks of Power: Electrification in Western Society 1880–1930 (Baltimore, MD: Johns Hopkins University Press, 1983), 18–46.
Ibid.; and Thomas P. Hughes, “The evolution of large technological systems”, in Wiebe E. Bijker, Thomas P. Hughes, and Trevor Pinch, editors, The Social Construction of Technological Systems (Cambridge, MA: The MIT Press, 1987).
…What was systemic in Hörlein’s way of thinking was his concept of the organizational pattern or patterns that will best facilitate the production of valuable results in the areas in which medicine and chemistry interact. A valuable outcome is a result that has practical importance for clinical or preventive medicine and, implicitly, commercial value for industry. Hörlein perceived a need for a set of mutually complementary institutions and trained personnel whose interaction produces the desired results. The organizational pattern that emerges more or less clearly from Hörlein’s lectures is closely associated with his view of the typical phases or cycles of development of research in chemotherapy or physiological chemistry. He saw a need for friendly and mutually supportive relations between industrial research and development organizations, academic institutions, and clinicians. He viewed the academic-industrial connection as crucial and mutually beneficial. Underlying this view was his definition and differentiation of the relevant disciplines and his belief in their generally excellent condition in Germany. He saw a need for government support of appropriate institutions, especially research institutes in universities. Within industrial research organizations—and, implicitly, within academic ones—Hörlein called for special institutional arrangements to encourage appropriate interactions between chemistry and biomedicine.
An element of crucial—and to Hörlein, personal—importance in these interactions was the role of the research manager or “team leader.” When Hörlein spoke of the research done under his direction as “our work,” he used the possessive advisedly to convey a strong sense of his own participation. The research manager had to be active in defining goals, in marshaling means and resources, and in assessing success or failure. He had to intervene where necessary to minimize friction between chemists and medical researchers, an especially important task for chemotherapy as a composite entity. He had to publicize the company’s successes—a necessity for what was ultimately a commercial enterprise—and act as liaison between company laboratories and the academic and medical communities. Through it all, he had to take a long view of the value of research, not insisting on immediate results of medical or commercial value.
As a research manager with training and experience in pharmaceutical chemistry, a lively interest in medicine, and rapport with the medical community, Hörlein was well positioned to survey the field where chemistry and medicine joined battle against disease. He could spot the points where the enemy’s line was broken, and the reverse salients in his own. What he could not do—or could not do alone—was to direct the day-to-day operations of his troops, that is, to define the critical problems to be solved, to identify the terms of their solution, and to do the work that would carry the day. In the case of chemotherapy, these things could be effected only by the medical researcher and the chemist, each working on his own domain, and cooperatively. For his attack on one of the most important reverse salients—the chemotherapy of bacterial infections—Hörlein called upon the medical researcher Domagk and the chemists Mietzsch and Klarer.

“Investing in Good Ideas That Look Like Bad Ideas”

A summary of the a16z investment strategy by one of its VCs.

Secrets of Sand Hill Road: Venture Capital and How to Get It, by Scott Kupor 2019 (a16z), excerpts

In a strange way, sometimes familiarity can breed contempt—and conversely, the distance from the problem that comes from having a completely different professional background might actually make one a better founder. Though not venture backed, Southwest Airlines was cofounded in 1967 by Herb Kelleher and of course has gone on to become a very successful business. When interviewed many years later about why, despite being a lawyer by training, he was the natural founder for an airline business, Kelleher quipped: “I knew nothing about airlines, which I think made me eminently qualified to start one, because what we tried to do at Southwest was get away from the traditional way that airlines had done business.”
This has historically been less typical in the venture world, but, increasingly, as entrepreneurs take on more established industries—particularly those that are regulated—bringing a view of the market that is unconstrained by previous professional experiences may in fact be a plus. We often joke at a16z that there is a tendency to “fight the last battle” in an area in which one has long-standing professional exposure; the scars from previous mistakes run too deep and can make it harder for one to develop creative ways to address the business problem at hand. Perhaps had Kelleher known intimately of all the challenges of entering the airline business, he would have run screaming from the challenge versus deciding to take on the full set of risks.
Whatever the evidence, the fundamental question VCs are trying to answer is: Why back this founder against this problem set versus waiting to see who else may come along with a better organic understanding of the problem? Can I conceive of a team better equipped to address the market needs that might walk through our doors tomorrow? If the answer is no, then this is the team to back.
The third big area of team investigation for VCs focuses on the founder’s leadership abilities. In particular, VCs are trying to determine whether this founder will be able to create a compelling story around the company mission in order to attract great engineers, executives, sales and marketing people, etc. In the same vein, the founder has to be able to attract customers to buy the product, partners to help distribute the product, and, eventually, other VCs to fund the business beyond the initial round of financing. Will the founder be able to explain her vision in a way that causes others to want to join her on this mission? And will she walk through walls when the going gets tough—which it inevitably will in nearly all startups—and simply refuse to even consider quitting?
When Marc and Ben first started Andreessen Horowitz, they described this founder leadership capability as “egomaniacal.” Their theory—notwithstanding the choice of words—was that to make the decision to be a founder (a job fraught with likely failure), an individual needed to be so confident in her abilities to succeed that she would border on being so self-absorbed as to be truly egomaniacal. As you might imagine, the use of that term in our fund-raising deck for our first fund struck a chord with a number of our potential investors, who worried that we would back insufferable founders. We ultimately chose to abandon our word choice, but the principle remains today: You have to be partly delusional to start a company given the prospects of success and the need to keep pushing forward in the wake of the constant stream of doubters.
After all, nonobvious ideas that could in fact become big businesses are by definition nonobvious. My partner Chris Dixon describes our job as VCs as investing in good ideas that look like bad ideas. If you think about the spectrum of things in which you could invest, there are good ideas that look like good ideas. These are tempting, but likely can’t generate outsize returns because they are simply too obvious and invite too much competition that squeezes out the economic rents. Bad ideas that look like bad ideas are also easily dismissed; as the description implies, they are simply bad and thus likely to be trapdoors through which your investment dollars will vanish. The tempting deals are the bad ideas that look like good ideas, yet they ultimately contain some hidden flaw that reveals their true “badness”. This leaves good VCs to invest in good ideas that look like bad ideas—hidden gems that probably take a slightly delusional or unconventional founder to pursue. For if they were obviously good ideas, they would never produce venture returns.
