Method: Structural diversity

Measuring technological complexity has long been a challenge. In large part, this was due to a lack of appropriate data. While the increasing availability of patent information eased the “data issue”, an empirical measure of (knowledge) complexity was lacking until Fleming & Sorenson (2001). These authors approached the issue by conceptualizing technological advancement as a search process for novel (re-)combinations of knowledge. They argue that the difficulty of combining knowledge reflects its complexity, with more complex technologies requiring more difficult combinations to advance. Using patent data, they construct a measure of technological complexity by comparing how frequently knowledge elements were (re-)combined in the past with how frequently they are combined currently. However, economic factors unrelated to difficulties in the inventive process might also influence how often knowledge is combined. For instance, combinations with little market potential may attract minimal attention from researchers. Alternatively, the frequency of a combination may be shaped by its range of applications, which may or may not reflect complexity (Sorenson et al. 2006).

In another, more recent approach, Balland and Rigby (2017) apply Hidalgo and Hausmann’s (2009) economic complexity index, originally designed to assess the complexity of countries’ export and employment patterns, to patent data and thereby obtain an index of knowledge complexity. The approach rests on the assumptions that complex knowledge is relatively scarce geographically and that it tends to co-concentrate with other complex knowledge in space. However, the spatial distribution of knowledge may have multiple explanations, complexity being only one of them. For instance, the diffusion of knowledge in space and, hence, its geographic distribution depend on its degree of maturity, popularity, natural conditions, geographic distance, place of origin and, crucially, economic potential. From an empirical perspective, constructing a complexity index on the basis of the spatial distribution of knowledge raises two additional issues: the index is potentially endogenous in many spatial research settings, and its values are conditional on how the spatial units are delineated.

In sum, there have been few attempts to measure the complexity of knowledge in general and of technologies in particular. Dissatisfied with the existing measures, I (with a lot of help from my PhD students and colleagues) started to develop an alternative measure of technological complexity in 2016, which eventually (2019!) resulted in the introduction of structural diversity.

Structural diversity as a measure of technological complexity

How the measure of structural diversity is derived is described in detail in Broekel (2019); what follows is a brief and simplified version. In essence, the approach relies on information theory and assesses the diversity of a technology’s knowledge combinations. It rests on the idea that technologies consist of components that are combined with each other (Hargadon 2003, Arthur 2009). Consequently, they can be represented as networks with (knowledge) components as nodes and their combinations as links; nodes and links together form the so-called combinatorial network. For instance, a table can be seen as a combination of its components: four legs and one tabletop. The idea of structural diversity is to measure the diversity of how these components are combined with each other. In the case of a table, all four legs are directly “combined” with the tabletop but not with each other. Accordingly, the combinatorial network of a table corresponds to a star-like network composed of one central and four peripheral components (Figure 1).

Figure 1: Star-like network
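To make the idea of a combinatorial network concrete, here is a minimal Python sketch that stores the table as an adjacency mapping (each component mapped to the set of components it is combined with) and checks that it has the star shape described above. The helper names `build_network` and `is_star` and the component labels are hypothetical and serve only as an illustration.

```python
from collections import defaultdict


def build_network(edges):
    """Return an undirected adjacency mapping (node -> set of neighbours)."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return dict(adj)


def is_star(adj):
    """True if exactly one hub is linked to every other node and all other
    nodes are linked only to the hub (so no two peripheral nodes are linked)."""
    n = len(adj)
    hubs = [v for v, nbrs in adj.items() if len(nbrs) == n - 1]
    leaves = [v for v, nbrs in adj.items() if len(nbrs) == 1]
    return n >= 3 and len(hubs) == 1 and len(leaves) == n - 1


# Hypothetical combinatorial network of a table: four legs, each combined with the tabletop.
table = build_network([("tabletop", f"leg_{i}") for i in range(1, 5)])
print(sorted(table["tabletop"]))  # ['leg_1', 'leg_2', 'leg_3', 'leg_4']
print(is_star(table))  # True
```

A dictionary of neighbour sets is used rather than a matrix because it keeps the example dependency-free; it carries the same information as the dichotomized (0/1) adjacency matrix.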

Since little information is required to describe it, the star-like network of a table represents a relatively simple network structure. In contrast, some components of a car might also be related in a star-like manner (front, back and side windows with the car body), while others may be connected in the form of a “line”: steering wheel to steering column to steering gear. The diversity of these combinatorial structures (topologies), in addition to their size and interdependency, determines the amount of information required to describe them. The core idea of structural diversity is that the combinatorial networks of complex technologies are characterized by a greater diversity of (sub)topologies than those of simple technologies. This is because their (knowledge) components are more heterogeneous, which translates into a greater variety in the ways they interrelate with each other. This variety, in turn, shows in a larger number of distinct topologies (stars, lattices, circles, lines) being present in a technology’s combinatorial network. As this increases the information content of the network (the amount of information required to describe it), the network, and thereby the corresponding technology, is considered more complex. Accordingly, structural diversity captures the complexity of technologies by approximating the diversity of (network) topologies in their combinatorial networks. Notably, these topologies may overlap and overlay one another, which makes visual identification impossible.
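The intuition that complex technologies contain more distinct (sub)topologies can be illustrated by listing which small connected shapes (graphlet types on three and four nodes) occur in a network. The sketch below does this for the table and for a toy fragment of a car. It is not the NDS and not part of Broekel (2019); it only illustrates the idea. The car’s component labels and, in particular, the link between steering gear and car body are hypothetical assumptions, added so that the star and the line form one connected network.

```python
from itertools import combinations


def build_network(edges):
    """Undirected adjacency mapping (node -> set of neighbours)."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def is_connected(nodes, adj):
    """Depth-first search restricted to the given node subset."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for w in adj[v] & nodes:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == nodes


# Every connected graph on 3 or 4 nodes has a unique sorted degree sequence.
SHAPES = {
    (1, 1, 2): "line-3", (2, 2, 2): "triangle",
    (1, 1, 2, 2): "line-4", (1, 1, 1, 3): "star-4",
    (2, 2, 2, 2): "circle-4", (1, 2, 2, 3): "paw",
    (2, 2, 3, 3): "diamond", (3, 3, 3, 3): "clique-4",
}


def topologies(adj, sizes=(3, 4)):
    """Set of distinct connected subgraph shapes on 3 and 4 nodes."""
    found = set()
    for k in sizes:
        for nodes in combinations(adj, k):
            if is_connected(nodes, adj):
                members = set(nodes)
                degrees = tuple(sorted(len(adj[v] & members) for v in members))
                found.add(SHAPES[degrees])
    return found


table = build_network([("tabletop", f"leg_{i}") for i in range(1, 5)])
car = build_network(
    [("body", w) for w in ("front_window", "back_window", "side_window")]
    + [("steering_wheel", "steering_column"),
       ("steering_column", "steering_gear"),
       ("steering_gear", "body")]  # hypothetical link joining star and line
)
print(sorted(topologies(table)))  # ['line-3', 'star-4']
print(sorted(topologies(car)))    # ['line-3', 'line-4', 'star-4']
```

Naming shapes by their sorted degree sequence works only because all connected graphs on three or four nodes differ in their degrees; for larger subgraphs this shortcut breaks down and a proper isomorphism test would be needed.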

An empirical approximation of structural diversity

Unfortunately, there is no direct empirical measure of the diversity of networks’ (sub)topologies. However, I argue that the network diversity score (NDS) of Emmert-Streib and Dehmer (2012) is a suitable approximation. Given patent data, it can be estimated as follows (see Broekel (2019) for more details).

For each technology T (four-digit CPC class), its combinatorial network for period t is constructed by extracting all patents with at least one patent subclass (ten-digit CPC subclass) belonging to technology T. Next, the co-occurrence matrix of all patent subclasses appearing on these patents is created and dichotomized, with all positive entries set to one and all others remaining zero. This matrix represents the combinatorial network of technology T, whose complexity is assessed by means of the NDS. To estimate the NDS, multiple subsamples are drawn from the network’s main component. For each subsample i, the share of modules (α_module), the variability of module sizes (v_module), the variability of the Laplacian matrix (V_Laplacian) and the relation of graphlets of size three and four (r_graphlets) are calculated, and the individual NDS score (iNDS) is estimated as:
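The network construction described above can be sketched in Python as follows. This is a minimal sketch under several assumptions: each patent is given as a list of CPC code strings (a hypothetical input format), the patents have already been restricted to period t, membership in technology T is tested by string prefix, and the diagonal of the co-occurrence matrix (a subclass with itself) is ignored. The dichotomized matrix is stored as adjacency sets, since after dichotomization a link either exists or not. The function names are hypothetical, and the subsampling and the four NDS ingredients are left out, as their details are given in Broekel (2019).

```python
from itertools import combinations


def combinatorial_network(patents, technology):
    """Dichotomized co-occurrence network of one technology in one period.

    patents: iterable of lists of CPC codes, e.g. ["H01L 21/02", "G06F 17/50"]
    technology: four-digit CPC class used as a prefix, e.g. "H01L"
    """
    adj = {}
    for codes in patents:
        codes = set(codes)
        # Keep only patents with at least one subclass belonging to T.
        if not any(code.startswith(technology) for code in codes):
            continue
        # All subclasses on a retained patent enter the network, including
        # those belonging to other technologies.
        for code in codes:
            adj.setdefault(code, set())
        # Any positive co-occurrence count becomes a single (0/1) link.
        for a, b in combinations(sorted(codes), 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def main_component(adj):
    """Node set of the largest connected component."""
    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        comp, stack = {start}, [start]
        while stack:
            v = stack.pop()
            for w in adj[v] - comp:
                comp.add(w)
                stack.append(w)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    return best


patents = [  # hypothetical toy data
    ["H01L 21/02", "H01L 29/78"],
    ["H01L 29/78", "G06F 17/50"],
    ["G06F 3/01", "G06F 17/50"],  # no H01L code: ignored
    ["H01L 33/00"],               # becomes an isolated node
]
net = combinatorial_network(patents, "H01L")
print(sorted(net["H01L 29/78"]))    # ['G06F 17/50', 'H01L 21/02']
print(sorted(main_component(net)))  # ['G06F 17/50', 'H01L 21/02', 'H01L 29/78']
```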

The iNDS values are subsequently averaged over the subsamples, giving the NDS of the technology’s combinatorial network.

The resulting NDS is log-transformed and multiplied by -1 to obtain the final complexity measure, structural diversity, in which larger values signal higher complexity.
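The final two steps (averaging the iNDS over subsamples and transforming the result) reduce to a few lines. This minimal sketch assumes the iNDS values have already been computed; the function name is hypothetical, and the use of the natural logarithm is an assumption, since the base is not stated.

```python
import math
from statistics import mean


def structural_diversity(inds_values):
    """Average the iNDS over subsamples, log-transform, and multiply by -1.

    Assumes natural logarithms; all values must be positive.
    """
    nds = mean(inds_values)
    return -math.log(nds)


print(round(structural_diversity([0.1, 0.3]), 3))  # 1.609
print(round(structural_diversity([2.0, 4.0]), 3))  # -1.099
```

Because the logarithm is monotonic, the transformation keeps the ranking of technologies and only reverses its direction: a technology with a lower NDS ends up with a higher structural diversity.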