Notes
-
[1]
b has the same dimension as ν1.
-
[2]
The assumption that the elements of ε be mutually independent may be empirically restrictive. In applications of AKM, it is common to only rely on between-job-spell variation in log wages in estimation, in order not to restrict the within-job-spell correlation in ε (Kline, Saggio and Sølvsten [2020]; Bonhomme et al. [2023]).
-
[3]
When X1 is a network matrix, ensuring that X1 has full column rank often requires restricting the sample to a connected subnetwork. See Abowd, Kramarz and Margolis [1999] and Abowd, Creecy and Kramarz [2002] for methods to compute connected subnetworks in settings with workers and firms.
-
[4]
Logit models with directed links (Charbonneau [2017]) have a similar structure.
-
[5]
If Xit does vary within spells, then the conditional logit estimator can be used for consistent estimation of θ.
Introduction
1 Network data is increasingly prevalent in applied economics. In this paper we focus on models where agents (e.g., workers and firms) sort and interact on a network. In such settings, accounting for unobserved heterogeneity is empirically key. However, existing approaches to estimation in the presence of flexible heterogeneity are imperfect.
2 A first approach consists in treating the heterogeneity as “fixed effects” parameters to be estimated. Bias reduction methods, initially developed for single-agent panel data (Hahn and Newey [2004]; Dhaene and Jochmans [2015]; Fernández-Val and Weidner [2016]), have been recently extended to networks (e.g., Graham [2017] and Hughes [2022]). The fixed-effects approach is appealing since it does not require modeling the distribution of heterogeneity and how it correlates with conditioning variables. Additionally, in models where agents interact on an exogenous network, the fixed-effects approach does not require specifying a model of network formation.
3 However, the performance of bias reduction methods in fixed-effects models hinges crucially on the network being sufficiently dense. This requirement is at odds with the nature of several empirical networks. For instance, in applications to wage determination in the presence of worker and firm heterogeneity (Abowd, Kramarz and Margolis [1999]), a dense network approximation is typically inappropriate, and fixed-effects estimates suffer from a “limited mobility bias” that may be substantial (Bonhomme et al. [2023]). More generally, sparsity is a feature of many empirical networks (Graham [2020]).
4 A second approach consists in postulating a “random effects” model for the unobserved heterogeneity. For example, Bonhomme, Lamadon and Manresa [2019] and Lentz, Piyapromdee and Robin [2022] propose and estimate random-effects models of worker heterogeneity in the presence of firm heterogeneity to account for sorting and complementarity on the labor market. Studying a different setting, Bonhomme [2021] develops random-effects models of agent heterogeneity in team production networks. Random-effects methods enjoy theoretical guarantees in sparse networks, under the assumption that the model is correctly specified.
5 However, modeling the full distribution of heterogeneity given conditioning variables can be challenging. In models of wage determination in the presence of worker and firm heterogeneity, this requires modeling the heterogeneity conditional on the entire network of employment relationships and job transitions, as proposed, for example, by Woodcock [2008] and Bonhomme et al. [2023]. More generally, the random-effects approach effectively requires modeling the network formation process, which can be a difficult task due to dimensionality and equilibrium multiplicity challenges.
6 In this paper our aim is to achieve the best of these two approaches, in the sense that we seek estimators that are fully robust to the form of heterogeneity, and that behave well in denser and sparser networks. Such estimators currently exist only in very special cases. Notably, Andrews et al. [2008] and Kline, Saggio and Sølvsten [2020] propose exact bias corrections for fixed-effects estimators of variance components in linear regressions on networks, while Graham [2017] proposes a “tetrad logit” estimator in a logistic model of network formation. These strategies mimic panel data methods that are consistent in fixed-length panels, such as conditional logit estimators (Rasch [1960]; Andersen [1970]) or estimators of variance components (Arellano and Bonhomme [2012]).
7 In panel data, the functional differencing approach (Bonhomme [2012]) provides a general methodology to find moment restrictions on parameters that are robust to any distribution of heterogeneity and correlation with conditioning variables, and hold in fixed-length panels. Recent applications of the approach include the derivation of new moment restrictions in binary and discrete choice models, both static and dynamic (Honoré and Weidner [2020]; Honoré, Muris and Weidner [2021]; Dano [2023]). In certain models, no exact moment restrictions exist. However, Dhaene and Weidner [2023] show how to regularize the functional differencing moments to provide restrictions that are satisfied up to a vanishing approximation error as the length of the panel tends to infinity.
8 The starting point of this paper is the observation that the scope of functional differencing is not limited to panel data, and the approach can be applied to any setting with a parametric conditional distribution involving latent variables. Our main goal is to apply the functional differencing approach to derive moment restrictions on parameters in some network settings. Specifically, we consider linear and binary choice logit models on networks, including a novel “AKM logit model” that provides a counterpart to the AKM estimator of Abowd, Kramarz and Margolis [1999] for binary outcomes. In those models, we characterize the available moment restrictions on parameters.
9 In addition, we study average effects that depend on the joint distribution of unobserved heterogeneity and observed covariates. In panel binary choice models, average effects have been studied by various authors (see, e.g., Chernozhukov et al. [2013], Davezies, D’Haultfoeuille and Laage [2021], Dobronyi, Gu and Kim [2021], Aguirregabiria and Carro [2021], and Pakel and Weidner [2021]). The functional differencing approach can be applied to average effects, as initially shown in the working paper version of Bonhomme [2012]. This approach applies to general panel data models where the outcome distribution is parametrically specified and the distribution of heterogeneity and covariates is unrestricted. Here we use it to derive moment conditions on average effects in network settings.
10 Lastly, as in its panel data applications, in the settings that we consider in this paper the functional differencing approach delivers moment restrictions on parameters, yet it does not guarantee identification or consistent estimation of the parameters given those restrictions. Although, in several examples that we study, identification can be verified directly and analog estimators can be constructed, applying the approach to other models will generally require careful analysis of statistical properties. We only briefly touch on estimation at the end of the paper, and leave a deeper analysis of identification and estimation to future work. At the same time, we see the functional differencing approach as a promising building block for researchers to discover novel moment restrictions and estimators in network settings in the future.
11 The outline of the paper is as follows. In the second section we present a class of models with multi-sided heterogeneity in networks. In the third section we describe the functional differencing approach. In the fourth, fifth, sixth and seventh sections we illustrate the approach with various examples. Finally, in the eighth section we briefly sketch how to construct estimators based on functional differencing moment restrictions.
Models with multi-sided heterogeneity in networks
12 In this section we introduce and describe a framework for heterogeneous agents interacting on a network. The subsequent sections show how to derive moment restrictions on parameters and average effects in this framework.
Description and Examples
13 The model consists of two layers: a model of the network, and a model of agents’ outcomes on the network.
14 The network is represented by a graph, or more generally a hypergraph, featuring nodes and edges. Nodes in the network correspond to economic agents, and edges represent their links or collaborations. The network can be static or dynamic, and we will see that, under the assumption that the network is exogenous, a specification of the network formation model is not needed.
15 A key feature of the framework is the presence of agent-specific types, which govern sorting patterns and affect outcomes, yet are latent to the econometrician.
16 Throughout the paper, we will refer to economic agents as workers and firms as a leading example. In settings with workers and firms, we model the network as bipartite, links are employment relationships, and both workers and firms are heterogeneous. We show an example in Figure 1. Employment, job mobility and wages all depend on the workers’ and firms’ latent types. Prominent models of this kind were proposed in Becker [1973], Shimer and Smith [2000], Postel-Vinay and Robin [2002], and Lentz, Piyapromdee and Robin [2022], among many others.
Figure 1. A worker-firm network
Note: Nodes are workers (numbered from 1 to 5) and firms (numbered from 1 to 3). Edges are lines linking workers to firms, representing employment relationships. Workers 1, 4, 5 stay in a single firm, while workers 2 and 3 move between two firms.
17 Outcomes in the network depend on the agents’ latent types, and on covariates that are observed by the econometrician. Outcomes also depend on idiosyncratic errors, or “shocks.”
18 A key assumption that we will maintain in this paper is that the network is exogenous, in the sense that network links are independent of the shocks conditional on agents’ types and covariates. In models with workers and firms, Bonhomme, Lamadon and Manresa [2019] show that this assumption is satisfied in the models of Shimer and Smith [2000] and Lentz, Piyapromdee and Robin [2022] but that it fails in the model with sequential bargaining of Postel-Vinay and Robin [2002], for example.
19 Our main focus will be on estimating parameters governing the model of outcomes while conditioning on the network. This approach only restricts the network insofar as exogeneity with respect to idiosyncratic shocks is required. However, it allows for general forms of matching and sorting patterns. Beyond the example of workers and firms, this setup is relevant to other bipartite network settings appearing in the economics of education, empirical finance, and trade (Bonhomme [2020]). Non-bipartite settings, such as models of team production, are also encompassed by this framework (Ahmadpoor and Jones [2019]; Bonhomme [2021]).
20 In applications, the network formation model may also be of interest. Since the framework accounts for unobserved heterogeneity, it can be used to study network formation models with additive or non-additive heterogeneity and independent link-specific shocks (Graham [2017]; Bickel and Chen [2009]). However, since our approach relies on a parametric likelihood for the distribution of outcomes, it is not well-suited to study the determinants of link formation in models with strategic interactions (De Paula, Richards-Shubik and Tamer [2018]; Gualdani [2021]; Sheng [2020]).
Probabilistic Framework
21 Let Y denote a vector of outcomes, one for every link (or edge) in the network. Let A denote the vector of latent types, one for every agent (or node). Finally, let X denote a matrix indicating which agents are linked together. In models with covariates, which may be agent-specific or link-specific, we include those in X.
22 We postulate a parametric model for outcomes conditional on the latent types and the network links (and possibly covariates),
$$Y \mid A, X \sim f_\theta(\cdot \mid A, X), \qquad (1)$$
23 where fθ is a parametric distribution indexed by a finite-dimensional parameter vector θ.
24 We leave the joint distribution of latent types and network links (and possibly covariates) fully unrestricted. Formally,
$$(A, X) \sim \pi, \qquad (2)$$
25 where π is an unknown (i.e., nonparametric) distribution.
26 The combination of a parametric outcome distribution and a nonparametric distribution of heterogeneity and covariates is common in the panel data literature. Indeed, Equations 1 and 2 nest the standard panel data case, where X simply indicates which observations correspond to the same worker. Nevertheless, the current framework is not limited to panel data.
27 A key assumption implied by the specification of a likelihood conditional on X (and A) is that the network is assumed exogenous. In panel data models, this corresponds to the assumption of strict exogeneity of covariates. This assumption is commonly relaxed in linear panel models, for example using the sequential moment restrictions introduced by Arellano and Bond [1991]. A subsequent literature allows for sequentially exogenous network links in linear models (e.g., Kuersteiner and Prucha [2020]). Allowing for sequential exogeneity of covariates in nonlinear panel data models is still a frontier research area (see Bonhomme, Dano and Graham [2023]).
28 In a setting with workers and firms, a leading example of Equation 1 is a model of wage determination. Following the pioneering approach of Abowd, Kramarz and Margolis [1999], a linear specification for log-wages is
$$Y_{it} = \alpha_i + \psi_{j(i,t)} + \varepsilon_{it}, \qquad (3)$$
29 where i ∈ {1,…, N} are workers, t ∈ {1,…, T} are time periods, j(i, t) ∈ {1,…, J} denotes the firm where i is employed at t, εit is an idiosyncratic shock, and we are abstracting from exogenous covariates for simplicity. In this model, network exogeneity is often referred to as “exogenous mobility,” meaning that job transitions (represented by the firm indicators j(i, t)) are assumed independent of the εit’s conditional on the αi’s and the ψj’s.
30 In order to allow for complementarity patterns, one may be interested in a nonlinear extension of Equation 3, such as a Constant Elasticity of Substitution (CES) specification in logarithms,
31 Nonlinear log wage specifications have been proposed and estimated by Bonhomme, Lamadon and Manresa [2019] and Lentz, Piyapromdee and Robin [2022]. However, a difference between their approaches and the one we propose here is that those authors restrict the distribution of heterogeneity and how it relates to employment relationships and job transitions. That is, in both papers, π in Equation 2 is restricted, while it is fully unrestricted in the current framework. In addition, both Bonhomme, Lamadon and Manresa [2019] and Lentz, Piyapromdee and Robin [2022] rely on large firms to recover firm heterogeneity, whereas our approach will yield valid moment restrictions irrespective of the degree of sparsity of the network.
32 In models 3 and 4 one can form an NT × (N + J) matrix X, by having each row denote an observation (or edge) (i, t), and having both sets of agents (or nodes) i and j in the columns of X. In this case, one can equivalently write Equations 3 and 4 as
33 where the vector A contains the α’s and ψ’s, and the function g takes a known (CES) form. Since we assume that the model is parametric, we specify the distribution of ε conditional on X and A, for example as a normal distribution with zero mean and diagonal covariance matrix with variance σ². In this case, θ = σ² in the linear model, and θ = (λ, ρ, γ, σ²) in the CES model.
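To make the stacked representation concrete, the following is a minimal sketch of how the indicator matrix X could be assembled for the linear model, assuming integer-coded worker and firm identifiers. The function name `akm_design`, the toy data, and the numerical values of the effects are hypothetical and serve only as illustration.

```python
import numpy as np

def akm_design(worker_ids, firm_ids, n_workers, n_firms):
    """Build the (n_obs x (N + J)) indicator matrix X: one row per observation (i, t)."""
    worker_ids = np.asarray(worker_ids)
    firm_ids = np.asarray(firm_ids)
    n_obs = worker_ids.shape[0]
    X = np.zeros((n_obs, n_workers + n_firms))
    rows = np.arange(n_obs)
    X[rows, worker_ids] = 1.0            # worker indicator columns
    X[rows, n_workers + firm_ids] = 1.0  # firm indicator columns
    return X

# Toy example (hypothetical): N = 3 workers, J = 2 firms, T = 2, rows ordered by (i, t)
worker_ids = [0, 0, 1, 1, 2, 2]
firm_ids = [0, 1, 0, 0, 1, 0]
X = akm_design(worker_ids, firm_ids, n_workers=3, n_firms=2)

alpha = np.array([0.5, -0.2, 0.1])   # hypothetical worker effects
psi = np.array([1.0, 0.3])           # hypothetical firm effects
A = np.concatenate([alpha, psi])
mean_log_wage = X @ A                # alpha_i + psi_{j(i,t)} for each observation

# Worker columns and firm columns both sum to a vector of ones, so X has
# rank N + J - 1 in a connected network; a normalization (for example,
# dropping one firm column) is needed for full column rank.
rank_X = np.linalg.matrix_rank(X)
```

Each row of X contains exactly two ones, so `X @ A` reproduces the sum of the worker effect and the firm effect for that observation.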
34 The framework in Equations 1 and 2 also nests certain models of network formation. As an example, consider the model of link formation in Graham [2017]. Undirected binary links Yij ∈ {0, 1} between agents i and j are determined based on covariates Xij and agent-specific types Ai and Aj as
35 where the εij are standard logistic, independent of A and X, and i.i.d. across pairs of agents (i, j) ∈ {1,…, N}², i ≠ j. Graham [2017] leaves the joint distribution of X and A unrestricted, and his model is thus a special case of the framework in Equations 1 and 2.
Functional differencing: a general presentation
36 In the framework of Equations 1 and 2 we ask two questions. First, how can one derive moment restrictions on θ? Second, how can one find moment restrictions on quantities depending on θ and (A, X), such as average effects? Both questions can be answered using the functional differencing approach, and we address them in turn.
Parameter θ
37 To answer the first question, the following proposition shows how to characterize moment restrictions on θ that are valid irrespective of the heterogeneity distribution.
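Proposition 1. For a moment function ϕθ(y, x), the following three statements are equivalent:
- i. For any joint distribution of (A, X) we have
$$\mathbb{E}[\phi_\theta(Y, X)] = 0.$$
- ii. Almost surely in A, X,
$$\mathbb{E}[\phi_\theta(Y, X) \mid A, X] = 0.$$
- iii. For almost all a and x,
$$\int \phi_\theta(y, x)\, f_\theta(y \mid a, x)\, dy = 0. \qquad (6)$$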
38 It is immediate that (ii) implies (i), and that (ii) and (iii) are equivalent. The implication from (i) to (ii) comes from the fact that we require the moment restriction to hold irrespective of the distribution of A and X. A formal argument is provided in Appendix I. Note also that (ii) implies moment restrictions conditional on X,
$$\mathbb{E}[\phi_\theta(Y, X) \mid X] = 0.$$
Proposition 1 implies that, to look for moment restrictions on θ, it is necessary and sufficient to find solutions to the linear functional equation in Equation 6.
39 Proposition 1 is the main insight of the functional differencing approach (Bonhomme [2012]). Indeed, finding a ϕ satisfying Equation 6 amounts to finding an element in the null space of the conditional expectation operator associated with the parametric conditional model of Y given (A, X). To this end, Bonhomme [2012] proposed numerical projection methods, while Honoré and Weidner [2020] relied on symbolic computing to find analytical ϕ functions in dynamic discrete choice settings. In some models, however, one can show that no non-trivial solution ϕ exists, which implies the absence of informative moment equality restrictions on θ. To deal with such cases, Dhaene and Weidner [2023] propose an approximate functional differencing approach. In dynamic panel data logit models, Dobronyi, Gu and Kim [2021] derive the identified set on the parameters, which combines moment equality and inequality restrictions.
40 While most applications of the functional differencing approach so far are confined to panel data settings without interactions between agents, Proposition 1 makes clear that the scope of the approach is not limited to those settings. A conventional panel data model consists of a collection of individual-specific submodels indexed by the individual heterogeneity Ai, the vector of individual covariates (Xʹi1,…, XʹiT)ʹ, and the parameter θ that is common across individuals. In contrast, in network settings all units may potentially be related, and both the network matrix X and the vector of heterogeneity A may affect all outcomes in the network. As Proposition 1 illustrates, this difference between panel data and network settings is immaterial from the point of view of the applicability of functional differencing. In the next sections we will provide examples to illustrate the usefulness of the functional differencing approach when applied to networks.
Average Effects
41 We now turn our attention to linear functionals of the distribution of A and X, and answer the second question. Using functional differencing, we show how to obtain, for given θ, moment restrictions on an average effect of the form
$$\mu = \mathbb{E}\left[m_\theta(A, X)\right],$$
42 where mθ(·) is known given θ, and the expectation is taken with respect to the joint distribution π of (A, X). As an example, in the CES model of wage determination 4, one may be interested in estimating average marginal effects of worker or firm heterogeneity on log wages, while accounting for the presence of complementarity between worker and firm effects. The following proposition provides a counterpart to Proposition 1 for such target parameters.
43 The equivalence between the three parts is again easy to see (see Appendix I), and the usefulness of the result comes from the fact that Equation 7 is a linear functional equation. In this case as well, the linear operator in Equation 7 is known given θ. Proposition 2 shows that the functional differencing approach initially applied in Bonhomme [2012] to obtain restrictions on model parameters in panel data models can be applied to derive restrictions on average effects in other models with latent variables, including network settings.
44 Finding a moment representation for µ amounts to solving a linear functional system, which is a Fredholm integral equation of the first kind. Numerical and analytical methods can be used to construct ψ functions that satisfy Equation 7. In dynamic panel logit models, Aguirregabiria and Carro [2021], Dobronyi, Gu and Kim [2021] and Dano [2023] show that the computation of average marginal effects is analytically straightforward. However, in models with continuous outcomes the inverse problem in Equation 7 is generally ill-posed (Carrasco, Florens and Renault [2007]; Engl, Hanke and Neubauer [1996]). Given a solution ψ to the functional system, using it for estimation in a finite sample thus typically requires regularization. In this paper we focus on finding moment functions ϕ and ψ, and leave a detailed study of estimators and their properties to future work. See the eighth section for further discussion.
Linear network models
45 In this section and the next three we illustrate Propositions 1 and 2 through various examples. Consider first a linear model with an X matrix that consists of two parts, X = (X1, X2), where X1 is a matrix of network links and X2 is a matrix of covariates. An example is the AKM model of Abowd, Kramarz and Margolis [1999], given by an augmented version of Equation 3 that includes covariates. We specify
$$Y = X_1 A + X_2 \beta + \varepsilon, \qquad \varepsilon \mid A, X \sim \mathcal{N}(0, \sigma^2 I_n), \qquad (8)$$
46 where n is the number of observations, In denotes the n × n identity matrix, and we denote θ = (βʹ, σ²)ʹ.
47 In this model we will focus on the parameters β and σ² and on quadratic forms µ = E[AʹQA] for a symmetric matrix Q. Variance components, which can be written as quadratic forms, are of interest for, e.g., decomposing the variance of log wages into components reflecting worker heterogeneity, firm heterogeneity, and sorting patterns between heterogeneous workers and firms (Abowd, Kramarz and Margolis [1999]; Card, Heining and Kline [2013]; Song et al. [2019]).
Parameters β and σ²
48 In this subsection we derive moment restrictions on β and σ². For this purpose we rely on Proposition 1. We start by noting that Equation 6 can be equivalently written as
49 It is useful to introduce the Moore-Penrose pseudo-inverse x†1 of x1 and to write the two orthogonal projectors associated with x1 as x1x†1 = u1uʹ1 and In – x1x†1 = u2uʹ2, where u = (u1, u2) is orthogonal. Let ν = y − x2β, ν1 = uʹ1ν, and ν2 = uʹ2ν.
50 We then note that Equation 6 can equivalently be written as
51 Since u = (u1, u2) is orthogonal, we equivalently obtain
52 In Proposition 1 we are looking for restrictions holding for all real vectors a. Since the rows of uʹ1x1 are linearly independent, we equivalently search for the following equation being satisfied for all real vectors b: [1]
53 This convolution equation has the unique solution:
54 We have thus shown the following.
- i. .
- ii. , where the function φθ is such that .
55 By Proposition 1, Proposition 3 characterizes all the available moment restrictions on the parameter θ = (βʹ, σ²)ʹ in model 8. Now, there are many possible choices for φθ, leading to many choices of moment functions in this model.
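The projection step used in this derivation can be sketched numerically. The code below is an illustration under stated assumptions: the matrix `x1` is a random stand-in for a full-column-rank link matrix, and all sizes and parameter values are hypothetical. It builds u1 and u2 from the pseudo-inverse of x1 and checks that ν2 = uʹ2(y − x2β) does not depend on the latent types a.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 8, 3, 2                       # observations, latent types, covariates (toy sizes)
x1 = rng.normal(size=(n, m))            # stand-in for a full-column-rank link matrix
x2 = rng.normal(size=(n, k))
beta, sigma = np.array([1.0, -0.5]), 0.7

P = x1 @ np.linalg.pinv(x1)             # projector onto the column space of x1 (u1 u1')
P = (P + P.T) / 2                       # symmetrize against rounding error
M = np.eye(n) - P                       # complementary projector (u2 u2')

# Orthonormal bases: eigenvectors of P with eigenvalue 1 (u1) and 0 (u2)
eigval, eigvec = np.linalg.eigh(P)
u1 = eigvec[:, eigval > 0.5]
u2 = eigvec[:, eigval < 0.5]
n2 = u2.shape[1]                        # equals trace(M) = n - rank(x1)

def nu2(a, eps):
    """Residual component u2'(y - x2 beta) for latent types a and shocks eps."""
    y = x1 @ a + x2 @ beta + eps
    return u2.T @ (y - x2 @ beta)

eps = sigma * rng.normal(size=n)
a_small = rng.normal(size=m)
a_large = 10.0 * rng.normal(size=m)
same_residual = np.allclose(nu2(a_small, eps), nu2(a_large, eps))
```

Because uʹ2x1 = 0, ν2 equals uʹ2ε whatever the value of a, which is why functions of ν2 can serve as moment functions that are robust to the distribution of heterogeneity. For instance, ν2 has mean zero, and νʹ2ν2 has expectation n2σ² whenever ε has mean zero and covariance σ²In.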
56 As a first example, let us take
57 We obtain
58 This implies the following conditional moment restrictions on β:
59 In a panel data setting, Chamberlain [1992] shows that the efficiency bound for β based on the quasi-differencing restrictions 9 coincides with the bound based on a semiparametric model where ε has mean zero but is otherwise not restricted. In our setup, Equation 9 remains valid in general non-Gaussian linear network regression models where
. Moreover, the Gaussian assumption on ε provides additional restrictions that are fully characterized by Proposition 3. Some of those additional restrictions arise from the variance matrix of outcomes.
60 As a second example, let n2 = dim ν2 = Trace(In – x1x†1), and take
61 We obtain
62 This implies the following conditional moment restrictions on β and σ²:
63 Note that Equation 10 remains valid under non-Gaussianity, provided
. Restrictions akin to Equation 10 were considered in Arellano and Bonhomme [2012] in a panel data setting. In a network context, Andrews et al. [2008] derived an unconditional version of Equation 10, and applied it to the decomposition of the variance of log wages. [2]
Quadratic Forms
64 In this subsection we derive moment restrictions on a quadratic form µ = E[AʹQA], where mθ(a, x) = aʹQa for an m × m symmetric matrix Q, for m the dimension of A. For this purpose we rely on Proposition 2. We start by noting that Equation 7 is equivalent to
65 Using the above reparameterization in terms of (ν1, ν2), we equivalently have
66 We then have the following result, shown in Appendix I.
- i. .
- ii. .
67 By Proposition 2, Proposition 4 characterizes all the available moment restrictions on the quadratic form µ = E[AʹQA]. [3] The proof relies on Fourier transforms. A special case of Proposition 4 is obtained when ψθ(y, x) is a function of uʹ1(y – x2β) and x only, which implies
68 The trace correction in Equation 12 is a well-known formula to obtain unbiased estimators of quadratic forms. Andrews et al. [2008], and Kline, Saggio and Sølvsten [2020] in a heteroskedastic context, apply such corrections to estimate variance components in log wage variance decompositions.
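The logic of a trace correction can be checked numerically. The sketch below does not reproduce Equation 12 itself. It uses the textbook identity E[âʹQâ] = aʹQa + σ² Trace(x†ʹ1 Q x†1) for the least-squares estimate â = x†1(y − x2β), under the assumption that x1 has full column rank and ε has mean zero and covariance σ²In. Covariates are omitted, and `x1`, `Q`, and all numerical values are hypothetical stand-ins. To keep the check exact rather than simulated, the shocks follow a discrete distribution with the required first two moments.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 6, 3
x1 = rng.normal(size=(n, m))          # stand-in for a full-column-rank link matrix
a = rng.normal(size=m)                # latent types, held fixed
Q = rng.normal(size=(m, m))
Q = (Q + Q.T) / 2                     # symmetric weighting matrix
sigma2 = 0.49
x1_pinv = np.linalg.pinv(x1)

def plug_in(eps):
    """Plug-in quadratic form based on the least-squares estimate of a."""
    a_hat = x1_pinv @ (x1 @ a + eps)
    return a_hat @ Q @ a_hat

# Discrete shocks with mean zero and covariance sigma2 * I_n:
# eps = +/- sqrt(n * sigma2) * e_k, each with probability 1 / (2n).
scale = np.sqrt(n * sigma2)
support = [sign * scale * np.eye(n)[j] for j in range(n) for sign in (1.0, -1.0)]
mean_plug_in = np.mean([plug_in(e) for e in support])   # exact expectation

bias = sigma2 * np.trace(x1_pinv.T @ Q @ x1_pinv)       # trace correction term
target = a @ Q @ a
corrected_mean = mean_plug_in - bias                    # matches the target
```

The plug-in quadratic form is biased by the trace term, and subtracting it restores unbiasedness. Since only the first two moments of ε enter, the identity used here does not rely on Gaussianity.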
69 Moreover, given the particular solution 12, any other solution is of the form
70 where, as in Proposition 3, the function φθ satisfies
Logit network models
71 In this section and the next two we study logit models for network data. Letting X = (X1, X2), we assume
72 Model 14 contains several popular binary choice models as special cases. A first example is a static panel data logit model, which obtains when the columns of X1 are individual indicators. Another example is the logistic network formation model of Graham [2017] (see Equation 5), which obtains when Y are link outcomes and the elements of X1A are the pairwise sums Ai + Aj, for i ≠ j.
73 Equation 14 also covers models of binary outcomes on a network. To illustrate, we will consider the following binary choice counterpart to the AKM model of Abowd, Kramarz and Margolis [1999]:
74 where, as in the linear case, i are workers, t are time periods, and j(i, t) denotes the firm where i is employed at t. It is easy to see that Equation 15 can be written in the form of Equation 14 for a suitable definition of X1. In this setting, the network is a bipartite multigraph: there may be multiple edges pointing from a worker i to a firm j indicating that worker i was in an employment relationship with firm j over multiple periods.
75 While, in many applications of AKM, outcomes Yit are log earnings or log wages, it may be of interest to account for worker and firm heterogeneity when studying other labor market outcomes. For example, Lachowska et al. [2023] apply AKM to the analysis of log hours worked. In this setting, the AKM logit specification 15 could be employed to analyze the determinants of part-time and full-time work using a binary measure of working time. In other applications, one may be interested in applying AKM to study determinants of worker promotions (e.g., Benson, Li and Shue [2019]) or of the type of labor contract such as fixed-term or permanent contract (e.g., Güell and Petrongolo [2007]). The logit specification 15 can also be useful in applications of AKM to other fields (including education, innovation, urban economics, trade, and empirical finance) where binary outcomes are common.
76 In the next section we will first focus on the covariates’ coefficients θ in model 14. For example, Margolis [1996] studies the earnings returns to seniority in France while accounting for worker and firm heterogeneity. A binary specification such as Equation 15 allows one to document the effects of seniority on binary labor market outcomes, such as working part-time or full-time, being awarded a promotion, or working under a permanent or temporary contract. In the seventh section we will study average effects, which are functions of worker and firm heterogeneity. We will see that deriving non-trivial moment restrictions on average effects seems more challenging than obtaining informative restrictions about the θ parameter.
Model parameters in logit network models
77 In this section we first provide a characterization of all moment restrictions available on θ in model 14, and then discuss several examples.
Characterization
78 We start by noting that, in model 14, Equation 6 can be equivalently written as
79 where we have denoted as yi the ith element of y, xʹi1 the ith row of x1, and xʹi2 the ith row of x2, for i ∈ {1,…, n}.
80 Equivalently, we have
81 that is,
82 Letting, for all
denote the set of possible values of xʹ1y, this implies that Equation 6 can be equivalently written as
83 Finally, since exp(sʹa), for
, are linearly independent functions of a, we obtain the following characterization.
- i. .
- ii. for all .
84 Proposition 5 provides an exhaustive characterization of available moment restrictions in the logit model 14. For a non-zero ϕ to exist, it is necessary that, for some
, yʹx2 varies given that xʹ1y = s. In this model, S = Xʹ1Y is sufficient for A. Below we will illustrate Proposition 5 using several examples: the static panel data logit model, the logistic network formation model, and the AKM logit model.
Panel Data: Conditional Logit
85 Consider first the panel data model
86 Let i ∈ {1,…, N}, yi = (yi1,…,yiT)ʹ and xi = (xʹi1,…,xʹiT)ʹ. For simplicity we search for functions ϕθ(yi, xi) that only depend on (y, x) through (yi, xi). We thus look for ϕθ(yi, xi) such that
87 Consider the T = 2 case, and take s = 1. We obtain
88 which coincides with the moment function of conditional logit (Rasch [1960]; Andersen [1970]).
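As a numerical check of the T = 2 case, the sketch below enumerates the four possible outcome pairs and verifies that one standard way of writing the conditional logit moment function, ϕ = 1{y1 + y2 = 1}(y2 − Λ((x2 − x1)ʹθ)) with Λ the logistic function, has zero conditional mean for any value of the individual effect. This form is an assumption made for illustration and may differ from the exact expression in the text by a scaling factor. The covariate and parameter values are hypothetical.

```python
import itertools
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def phi(y, x, theta):
    """Conditional-logit moment function for T = 2 (one standard form)."""
    y1, y2 = y
    if y1 + y2 != 1:                  # only cells with s = 1 are informative
        return 0.0
    return y2 - logistic((x[1] - x[0]) @ theta)

def expected_phi(a, x, theta):
    """E[phi(Y, x) | A = a, X = x] by enumeration over y in {0, 1}^2."""
    total = 0.0
    for y in itertools.product([0, 1], repeat=2):
        prob = 1.0
        for t in range(2):
            p1 = logistic(x[t] @ theta + a)
            prob *= p1 if y[t] == 1 else 1.0 - p1
        total += prob * phi(y, x, theta)
    return total

theta = np.array([0.8, -0.3])                      # hypothetical parameter value
x = np.array([[0.2, 1.0], [1.5, -0.4]])            # hypothetical covariates, rows are periods
values = [expected_phi(a, x, theta) for a in (-3.0, 0.0, 2.5)]
```

The entries of `values` are zero up to rounding error for every a, because the ratio of the probabilities of (0, 1) and (1, 0) equals exp((x2 − x1)ʹθ) and is free of the individual effect.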
Network Formation: Tetrad Logit
89 Consider next the logistic network formation model 5 introduced in Graham [2017]. Links are undirected, [4] and there are n = N(N – 1)/2 observations (one for each dyad), where N is the number of agents. In this model, the sufficient statistic s = xʹ1y in Proposition 5 is the vector of degrees, i.e., the degree sequence of the network.
90 For simplicity we focus on functions of tetrads formed by four agents
. We thus look for functions ϕθ that satisfy
91 Taking first s1 = s2 = s3 = s4 = 1, we obtain
92 Considering next s1 = s2 = s3 = s4 = 2, we obtain
93 Finally, for s1 = s2 = 2, s3 = s4 = 1, we obtain
94 and there will be analogous restrictions associated with permutations of the degree sequence (2, 2, 1, 1). All other possible degree sequences have no identifying content for θ.
95 Together, taking ϕ as in Equations 18, 19 and 20 (alongside its permutations), implies the moment restrictions underpinning the “tetrad logit” estimator in Graham [2017]. However, Proposition 5 clarifies that these restrictions may not be unique, and it provides all available moment restrictions in the logistic network formation model.
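The cancellation of agent types within a tetrad can be verified by enumeration. The sketch below assumes a common way of writing the tetrad statistic, S = Yij Ykl (1 − Yik)(1 − Yjl) − (1 − Yij)(1 − Ykl) Yik Yjl, together with the moment function ϕ = 1{S ≠ 0}(1{S = 1} − Λ(Wʹθ)), where W = Xij + Xkl − Xik − Xjl. This notation, the covariate values, and the type values are assumptions for illustration and are not taken from Equations 18 to 20.

```python
import itertools
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

# Dyads ij, kl, ik, jl of the tetrad (i, j, k, l) = (0, 1, 2, 3). The two
# remaining dyads (il, jk) are independent of these and do not enter phi.
PAIRS = [(0, 1), (2, 3), (0, 2), (1, 3)]

def tetrad_phi(y, w_theta):
    y_ij, y_kl, y_ik, y_jl = y
    s = y_ij * y_kl * (1 - y_ik) * (1 - y_jl) - (1 - y_ij) * (1 - y_kl) * y_ik * y_jl
    if s == 0:
        return 0.0
    return (1.0 if s == 1 else 0.0) - logistic(w_theta)

def expected_phi(a, x, theta):
    """E[phi | A = a, X = x], enumerating the four links that enter the statistic."""
    index = [x[p] @ theta + a[p[0]] + a[p[1]] for p in PAIRS]
    w_theta = (x[(0, 1)] + x[(2, 3)] - x[(0, 2)] - x[(1, 3)]) @ theta
    total = 0.0
    for y in itertools.product([0, 1], repeat=4):
        prob = np.prod([logistic(v) if yy == 1 else 1.0 - logistic(v)
                        for yy, v in zip(y, index)])
        total += prob * tetrad_phi(y, w_theta)
    return total

theta = np.array([0.6])                                    # hypothetical parameter value
x = {(0, 1): np.array([0.3]), (2, 3): np.array([-1.1]),    # hypothetical dyad covariates
     (0, 2): np.array([0.9]), (1, 3): np.array([0.4])}
values = [expected_phi(a, x, theta)
          for a in ([0.0, 0.0, 0.0, 0.0], [2.0, -1.0, 0.5, 3.0], [-4.0, 1.5, 2.5, -0.5])]
```

The expectation is zero for every vector of types because Ai + Aj + Ak + Al enters both link configurations with S ≠ 0 identically, so their probability ratio equals exp(Wʹθ).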
Binary Choice on a Network: AKM Logit
96 In this subsection we derive moment restrictions on θ in model 15. We focus on the case where Xit does not vary within job spells. For example, when controlling for the worker’s age, job seniority only varies between spells. [5] We focus the analysis on the T = 2 case, and we consider several subnetwork configurations of the data displayed in Figure 2. Given a subnetwork configuration, we verify if moment conditions on θ exist, and what form they take.
97 Configuration A: one worker in the same firm. Suppose worker i stays in the same firm j in both periods. We look for ϕθ(yit, yi,t+1, x) such that, for s ∈ {0, 1, 2},
98 Given that x2it = x2i,t+1 (since the covariate does not vary within spell), this implies
99 Hence θ drops out from the equation, and there is no information to estimate θ in this configuration.
Figure 2. Configurations of worker-firm subnetworks in the logit model with worker and firm heterogeneity
Note: Worker nodes are in the left columns, indicated in grey. Firm nodes are in the right columns, indicated in black. In panels C and F we derive non-trivial moment restrictions on the parameter θ in model 15, while no such restrictions exist in panels A, B, D, E.
100 Configuration B: one worker moving between two firms. Suppose worker i moves between firms j and j′. We look for ϕθ(yit, yi,t+1, x) such that
101 However, in this case, (s1, s2) fully determines (yit, yi,t+1). Hence, for each (s1, s2) we obtain ϕθ(yit, yi,t+1, x) = 0, which shows there is no information about θ in this configuration.
102 Configuration C: two workers moving between the same two firms. Suppose workers i and i′ both move between the same firms j and j′. We look for a function ϕθ(yit, yi,t+1, yiʹt, yiʹ,t+1, x) such that
103 It turns out that, in this subnetwork configuration, there exist non-trivial moment restrictions on θ. To see this, take s1 = s2 = s3 = s4 = 1. We obtain
104 This implies the conditional moment restriction
105 Configuration D: two workers moving between different firms. Suppose worker i moves between j and j′, and worker i′ moves between different firms j″ and j″′. We look for ϕθ(yit, yi,t+1, yiʹt, yiʹ,t+1, x) such that
106 It is easy to see there is no non-trivial ϕ function in this case. Intuitively, since workers never share a firm, it is not possible to “difference out” the firm component of heterogeneity.
107 Configuration E: two workers moving to different firms from the same firm. Suppose worker i moves between j and j′, and worker i′ moves from the same firm j to a different firm j″. We look for ϕθ(yit, yi,t+1, yiʹt, yiʹ,t+1, x) such that
108 It is easy to see there is no information about θ in this configuration.
109 Configuration F: three workers in a loop. There are many other subnetwork configurations providing information beyond configuration C. Indeed, consider three workers who move as follows: i moves between firms j and j′, i′ moves between j′ and j″, and i″ moves between j″ and j. We look for ϕθ(yit, yi,t+1, yiʹt, yiʹ,t+1, yiʺt, yiʺ,t+1, x) such that
110 Taking s1 = s2 = s3 = s4 = s5 = s6 = 1, one obtains
111 This implies the conditional moment restriction
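The classification of configurations A to F can be reproduced by enumeration. The Python sketch below assumes that model 15 has the form Pr(Yit = 1) = Λ(x θ + ai + bj), where j is the firm employing worker i in period t, outcomes are independent across observations, and the scalar covariate x is constant within a job spell. Under that assumption the worker and firm sums of outcomes are sufficient for the fixed effects. The labels, covariate values and function names are hypothetical. A configuration is flagged as informative when some probability conditional on these sums changes with θ.

```python
import itertools
import math
from collections import defaultdict


def informative_groups(obs, x, theta):
    """obs: list of (worker, firm) pairs, one per worker-period observation.
    x: covariate value of each worker-firm spell.
    Returns Pr(y | worker and firm sums) for every value of the sufficient
    statistic that is compatible with more than one outcome vector y."""
    groups = defaultdict(list)
    for y in itertools.product((0, 1), repeat=len(obs)):
        stat = defaultdict(int)
        for (worker, firm), y_k in zip(obs, y):
            stat[("worker", worker)] += y_k
            stat[("firm", firm)] += y_k
        groups[tuple(sorted(stat.items()))].append(y)
    out = {}
    for stat, ys in groups.items():
        if len(ys) > 1:
            # Fixed effects cancel given the sufficient statistic, so the
            # conditional weights only involve theta and the covariates.
            weights = [math.exp(theta * sum(x[o] * y_k for o, y_k in zip(obs, y)))
                       for y in ys]
            total = sum(weights)
            out[stat] = {y: wgt / total for y, wgt in zip(ys, weights)}
    return out


def has_information(obs, x):
    """True if some conditional probability changes with theta."""
    p0 = informative_groups(obs, x, theta=0.0)
    p1 = informative_groups(obs, x, theta=1.0)
    return any(abs(p0[s][y] - p1[s][y]) > 1e-12 for s in p0 for y in p0[s])


def with_covariates(obs, values):
    """Attach one illustrative covariate value to each distinct job spell."""
    spells = list(dict.fromkeys(obs))
    return obs, dict(zip(spells, values))


CONFIGS = {
    "A": with_covariates([("i", "j"), ("i", "j")], [0.4]),
    "B": with_covariates([("i", "j"), ("i", "j2")], [0.1, 0.5]),
    "C": with_covariates([("i", "j"), ("i", "j2"), ("i2", "j"), ("i2", "j2")],
                         [0.1, 0.5, 0.9, 0.2]),
    "D": with_covariates([("i", "j"), ("i", "j2"), ("i2", "j3"), ("i2", "j4")],
                         [0.1, 0.5, 0.9, 0.2]),
    "E": with_covariates([("i", "j"), ("i", "j2"), ("i2", "j"), ("i2", "j3")],
                         [0.1, 0.5, 0.9, 0.2]),
    "F": with_covariates([("i", "j"), ("i", "j2"), ("i2", "j2"), ("i2", "j3"),
                          ("i3", "j3"), ("i3", "j")],
                         [0.1, 0.5, 0.9, 0.2, 0.7, 0.3]),
}
results = {name: has_information(obs, x) for name, (obs, x) in CONFIGS.items()}
print(results)
```

With these illustrative covariate values, only configurations C and F are flagged, in line with the note to Figure 2. In configuration A the sufficient statistic leaves two possible outcome sequences, but their conditional probabilities do not involve θ.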
Average effects in logit network models
112 In this section we again consider model 14, and we study average effects of the form
113 for some known function mθ. In this case, Equation 7 can be equivalently written as
114 This equation characterizes the set of available moment restrictions on μ, i.e., the set of ψ functions such that Equation 13 holds.
115 As a simple example, consider the case where T = 2 in the static panel logit model 16, with a binary covariate Xit, and consider
116 so that μ is an average partial effect. We show in Appendix III that no function ψ satisfies Equation 21. Intuitively, this comes from the fact that the distribution of A given Xi1 = Xi2 (i.e., for “stayers”) is unidentified.
117 In contrast, as we also show in Appendix III, the average partial effect of “movers,” corresponding to
118 admits a characterization as in Equation 21, whenever ψ satisfies
119 A simple example satisfying those conditions is
120 as pointed out (in a more general nonparametric model) by Chernozhukov et al. [2013]. However, Equations 22 and 23 imply additional moment restrictions. For example, one can take
121 which provides an additional moment restriction on μ under the logit model’s assumptions.
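The simple characterization for movers can be verified numerically. The Python sketch below assumes the static panel logit with a binary covariate, Pr(Yit = 1 | xi, a) = Λ(xit θ + a), takes the partial effect to be Λ(θ + a) − Λ(a), and uses ψ(yi, xi) = (xi2 − xi1)(yi2 − yi1) as the candidate function. This particular ψ is our assumption for the example in the spirit of Chernozhukov et al. [2013], and all names and values are illustrative. The check is that, for movers, the conditional mean of ψ given a equals the partial effect for every a.

```python
import itertools
import math


def logistic(u):
    return 1.0 / (1.0 + math.exp(-u))


def psi(y, x):
    """Candidate moment function: signed outcome change, zero for stayers."""
    return (x[1] - x[0]) * (y[1] - y[0])


def conditional_mean_psi(theta, x, a):
    """E[psi(Y, x) | x, a] under the static panel logit with T = 2."""
    total = 0.0
    for y in itertools.product((0, 1), repeat=2):
        prob = 1.0
        for y_t, x_t in zip(y, x):
            p = logistic(x_t * theta + a)
            prob *= p if y_t == 1 else 1.0 - p
        total += prob * psi(y, x)
    return total


def partial_effect(theta, a):
    """Effect on Pr(Y = 1) of switching the binary covariate from 0 to 1."""
    return logistic(theta + a) - logistic(a)


theta = 0.7                                   # illustrative parameter value
for x in ((0, 1), (1, 0)):                    # the two types of movers
    for a in (-2.0, 0.0, 1.5):
        print(x, a, conditional_mean_psi(theta, x, a), partial_effect(theta, a))
```

For stayers (xi1 = xi2) this ψ is identically zero, so it carries no information about their partial effects, consistent with the discussion above.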
122 It appears difficult to obtain moment equality restrictions on average partial effects in logit models on networks outside of the panel data case. As an example, consider the subnetwork configuration C in Figure 2. In this case we have seen in the previous section how to obtain moment restrictions on θ. However, we show in Appendix III that no function ψ satisfies Equation 21 for the average partial effect corresponding to
123 where, in this model, a1 is worker i’s fixed effect and a3 is firm j’s fixed effect.
124 In models where no functional differencing restrictions are available, one may still be able to construct bounds on the average effect of interest. In panel data settings, this strategy was pursued by Chernozhukov et al. [2013], Davezies, D’Haultfoeuille and Laage [2021], and Dobronyi, Gu and Kim [2021], among others. However, implementing bounds approaches often requires estimating conditional moments given X. When X represents a network matrix, conditional moment estimation may be especially challenging. In a panel data setting, Pakel and Weidner [2021] propose a bounding strategy that avoids the curse of dimensionality associated with conditioning covariates. Extending their approach to network settings is an interesting question for future work.
125 Lastly, in this section we have focused on binary choice models. The situation may be more favorable, in the sense that informative functions ϕ and ψ may exist, in models with continuous outcomes such as the CES specification 4.
Remarks on estimation
126 To close our discussion, we briefly outline some possibilities for estimation of parameters and average effects, without providing details.
127 Given a moment function ϕθ as in Proposition 1, and a realization (y, x) from the joint distribution of (Y, X), one can estimate θ based on
128 for some norm ‖·‖. In some models, this approach will deliver familiar estimators. For example, in the linear model 8, an estimator of β based on Equation 9 is the “quasi-differencing” estimator
129 and an estimator of σ2 based on Equation 10 is the “degree-of-freedom-corrected” estimator
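For concreteness, here is a minimal Python sketch of these two estimators. It assumes that the linear model 8 takes the form y = x1 a + x2 β + ε with independent homoskedastic errors, that “quasi-differencing” amounts to projecting out the column space of x1 before regressing on x2, and that the degrees-of-freedom correction divides by n minus the number of columns of x1 and x2. These are our assumptions about the lost displays. The toy worker-firm design and all names are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def toy_design(rng):
    """Six workers, three firms, two periods; three workers switch firms."""
    firm_t1 = [0, 0, 1, 1, 2, 2]
    firm_t2 = [1, 0, 2, 1, 0, 2]
    rows = []
    for firms in (firm_t1, firm_t2):
        for worker, firm in enumerate(firms):
            row = np.zeros(6 + 2)
            row[worker] = 1.0                 # worker dummy
            if firm > 0:                      # firm 0 is the reference firm
                row[6 + firm - 1] = 1.0
            rows.append(row)
    x1 = np.array(rows)                       # network (incidence) matrix
    x2 = rng.normal(size=(x1.shape[0], 1))    # one scalar covariate
    return x1, x2


def quasi_differencing(y, x1, x2):
    """Project out x1, then regress; returns (beta_hat, sigma2_hat)."""
    n = y.shape[0]
    # Annihilator of the column space of x1 (x1 assumed full column rank).
    m1 = np.eye(n) - x1 @ np.linalg.solve(x1.T @ x1, x1.T)
    x2_t, y_t = m1 @ x2, m1 @ y
    beta_hat, *_ = np.linalg.lstsq(x2_t, y_t, rcond=None)
    resid = y_t - x2_t @ beta_hat
    dof = n - x1.shape[1] - x2.shape[1]       # degrees-of-freedom correction
    sigma2_hat = float(resid @ resid) / dof
    return beta_hat, sigma2_hat


rng = np.random.default_rng(0)
x1, x2 = toy_design(rng)
a = rng.normal(size=x1.shape[1])              # worker and firm effects
beta = 1.5
y = x1 @ a + x2[:, 0] * beta + 0.5 * rng.normal(size=x1.shape[0])
beta_hat, sigma2_hat = quasi_differencing(y, x1, x2)
print(beta_hat, sigma2_hat)
```

The full column rank of x1 relies on the toy network being connected, which the three movers guarantee here.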
130 When constructing a function ϕ using the entire data is impractical, one can construct a set of functions ϕ(k)θ(y, x) that depend on y and x only through a subset of the data. An estimator of θ is then
131 In the logit network formation model 5, taking ϕ as in Equations 18, 19 and 20 (along with their permutations) leads to estimators in the spirit of the tetrad logit estimator of Graham [2017].
132 When focusing on average effects, a possible estimation approach based on Proposition 2 consists in setting
133 for some estimator θ̂ of θ. For example, in the linear model 8, an estimator of the quadratic form based on Equation 12 is
134 where β̂ and σ̂2 are given by Equations 24 and 25, respectively. This corresponds to the bias-corrected estimator of Andrews et al. [2008]. For other average effects, regularization is typically needed for reliable estimation.
135 For all these estimators, there are important questions that remain to be addressed. What are their asymptotic properties (under suitable assumptions on how the network grows with the sample size)? How can one conduct feasible inference on the population parameters? And, out of the available functions ϕ and ψ, how to choose a small subset of those (for tractability) without sacrificing too much precision (for efficiency)? Answering these questions will be an important task for future work.
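The bias correction for a quadratic form in the linear model can be sketched as follows. This Python example assumes the model y = x1 a + x2 β + ε with independent homoskedastic errors, estimates (a, β) jointly by least squares, and subtracts σ̂2 times the trace of Q multiplied by the block of (XʹX)⁻¹ corresponding to a, a correction in the spirit of Andrews et al. [2008]. The toy design, the choice of Q (the variance of worker effects) and all names are illustrative assumptions, and NumPy is assumed to be available.

```python
import numpy as np


def toy_design(rng):
    """Six workers, three firms, two periods; three workers switch firms."""
    firm_t1 = [0, 0, 1, 1, 2, 2]
    firm_t2 = [1, 0, 2, 1, 0, 2]
    rows = []
    for firms in (firm_t1, firm_t2):
        for worker, firm in enumerate(firms):
            row = np.zeros(6 + 2)
            row[worker] = 1.0                 # worker dummy
            if firm > 0:                      # firm 0 is the reference firm
                row[6 + firm - 1] = 1.0
            rows.append(row)
    x1 = np.array(rows)
    x2 = rng.normal(size=(x1.shape[0], 1))
    return x1, x2


def bias_corrected_quadratic_form(y, x1, x2, q):
    """Return (plug-in, bias-corrected) estimates of the quadratic form a'Qa."""
    n, k1 = x1.shape
    x = np.hstack([x1, x2])
    xtx_inv = np.linalg.inv(x.T @ x)
    coef = xtx_inv @ x.T @ y
    a_hat = coef[:k1]
    resid = y - x @ coef
    sigma2_hat = float(resid @ resid) / (n - x.shape[1])
    plug_in = float(a_hat @ q @ a_hat)
    # Under homoskedasticity, E[a_hat' Q a_hat] - a'Qa = sigma^2 tr(Q V_aa),
    # where V_aa is the block of (X'X)^{-1} that corresponds to a.
    correction = sigma2_hat * float(np.trace(q @ xtx_inv[:k1, :k1]))
    return plug_in, plug_in - correction


rng = np.random.default_rng(0)
x1, x2 = toy_design(rng)
a = rng.normal(size=x1.shape[1])
beta = 1.5
y = x1 @ a + x2[:, 0] * beta + 0.5 * rng.normal(size=x1.shape[0])

# Q picks out the variance of the six worker effects.
sel = np.zeros((6, x1.shape[1]))
sel[:, :6] = np.eye(6)
center = np.eye(6) - np.ones((6, 6)) / 6
q = sel.T @ center @ sel / 6

plug_in, corrected = bias_corrected_quadratic_form(y, x1, x2, q)
print(plug_in, corrected)
```

Because Q is positive semi-definite in this example, the correction term is non-negative and the corrected estimate never exceeds the plug-in one. In small samples the corrected estimate can be negative.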
Appendices
I. Proofs
136 Proofs of Propositions 1 and 2
137 Propositions 1 and 2 follow directly from the following elementary lemma whose proof we include for completeness.
138 Proof. Let k ∈ {1,…, p} and, for all […], let gk(z) denote the k-th element of g(z). For all […], let […], ℓ+(z) = max(ℓ(z),0), and ℓ–(z) = –min(ℓ(z),0). For all […], we have
139 where we have used that […] and […] are non-negative and integrate to one (with the convention 0/0 = 0 whenever […] or […]). Since gk is bounded, it follows that
140 so gk = 0 almost everywhere on […]. Lastly, since this holds for all k ∈ {1,…, p}, it follows that g = 0 almost everywhere on […].
141 Proof of Proposition 1. It is sufficient to show that (i) implies (ii). Suppose that (i) holds. Let Z = (A, X), and […]. It follows from (i) and Lemma 1 that g = 0 almost everywhere on […]. This shows (ii) and completes the proof.
142 Proof of Proposition 2. Let Z = (A, X), and […]. It follows from (i) and Lemma 1 that g = 0 almost everywhere on […]. This shows (ii) and completes the proof.
143 Proof of Proposition 4
144 Let n1 = n – n2. Then u1 is an n × n1 matrix, and v1 is an n1 × 1 vector.
145 Let
146 It follows from Equation 11 that Equation 7 is equivalent to
147 As a result, Equation 7 is equivalent to
148 where b = uʹ1x1a, and we have used that x1 has full column rank.
149 Let […] denote the Fourier transform operator. For any integrable function […] we have, for all […],
150 where i is a complex root of –1, and the integral is over […].
151 We have
152 Let C = ((uʹ1x1)†)ʹQ(uʹ1x1)†. We have, for δ(·), the Dirac delta function,
153 Since this holds for all s, the Fourier inversion theorem gives
154 Now, by integration by parts, we have
155 Hence,
156 and
157 This concludes the proof of Proposition 4.
II. Other average effects in the linear model
158 Following the arguments in the proof of Proposition 4, we have
159 Hence, provided the Fourier inversion theorem can be applied, we have
160 and thus
161 As a special case, suppose ψθ(y, x) is a function of uʹ1(y – x2β) and x only. Then
III. Average effects in logit models
162 Let
163 In the panel data case with T = 2, Equation 21 can equivalently be written as
164 That is,
165 Let Z = exp(a). The coefficient of Z⁰ on the left-hand side of Equation 26 is equal to zero, and the coefficient on the right-hand side is ψθ(0, 0, xi). It thus follows that ψθ(0, 0, xi) = 0.
166 The coefficient of Z⁴ on the left-hand side of Equation 26 is equal to zero, and the coefficient on the right-hand side is exp((xi1 + xi2 + 1)θ)ψθ(1, 1, xi). It thus follows that ψθ(1, 1, xi) = 0.
167 Hence,
168 So the existence of valid moment functions requires that the ratio
169 does not depend on a. This is not possible if xi1 = xi2. However, if xi1 ≠ xi2, then Equation 27 simplifies to
170 which is indeed identical to Equation 23.
171 Next, consider Configuration C in Figure 2, and let
172 By Equation 21, we look for ψ such that
173 That is,
174 Let Zk = exp(ak) for k ∈ {1,…, 4}. The left-hand side and right-hand side in Equation 28 are polynomials in (Z1,…, Z4). Since they are equal to each other for all a1,…, a4 in ℝ, hence for all Z1,…, Z4 in ℝ>0, they are equal to each other for all Z1,…, Z4 in ℝ as well. Suppose next that θ ≠ 0. The right-hand side in Equation 28 is a multiple of both (1 + Z1Z3) and (1 + exp(θ)Z1Z3). However, the left-hand side in Equation 28 is either a multiple of (1 + Z1Z3) or a multiple of (1 + exp(θ)Z1Z3) (depending on the value of x12), but it is not a multiple of both terms. It follows that, when θ ≠ 0, there is no ψ satisfying Equation 28. A direct comparison of the monomial terms of the polynomials on both sides of Equation 28 confirms this argument.
References
- Abowd, J. M., Creecy, R. H. and Kramarz, F. [2002]. “Computing Person and Firm Effects Using Linked Longitudinal Employer-Employee Data,” Longitudinal Employer-Household Dynamics Technical Paper, 2002-06, Center for Economic Studies, US Census Bureau.
- Abowd, J. M., Kramarz, F. and Margolis, D. N. [1999]. “High Wage Workers and High Wage Firms,” Econometrica, 67 (2): 251–333.
- Aguirregabiria, V. and Carro, J. M. [2021]. “Identification of Average Marginal Effects in Fixed Effects Dynamic Discrete Choice Models,” arXiv preprint, arXiv:2107.06141.
- Ahmadpoor, M. and Jones, B. F. [2019]. “Decoding Team and Individual Impact in Science and Invention,” PNAS, 116 (28): 13885–13890.
- Andersen, E. B. [1970]. “Asymptotic Properties of Conditional Maximum-Likelihood Estimators,” Journal of the Royal Statistical Society: Series B (Methodological), 32 (2): 283–301.
- Andrews, M. J., Gill, L., Schank, T. and Upward, R. [2008]. “High Wage Workers and Low Wage Firms: Negative Assortative Matching or Limited Mobility Bias?” Journal of the Royal Statistical Society: Series A (Statistics in Society), 171 (3): 673–697.
- Arellano, M. and Bond, S. [1991]. “Some Tests of Specification for Panel Data: Monte Carlo Evidence and an Application to Employment Equations,” The Review of Economic Studies, 58 (2): 277–297.
- Arellano, M. and Bonhomme, S. [2012]. “Identifying Distributional Characteristics in Random Coefficients Panel Data Models,” The Review of Economic Studies, 79 (3): 987–1020.
- Becker, G. S. [1973]. “A Theory of Marriage: Part I,” Journal of Political Economy, 81 (4): 813–846.
- Benson, A., Li, D. and Shue, K. [2019]. “Promotions and the Peter Principle,” The Quarterly Journal of Economics, 134 (4): 2085–2134.
- Bickel, P. J. and Chen, A. [2009]. “A Nonparametric View of Network Models and Newman–Girvan and Other Modularities,” PNAS, 106 (50): 21068–21073.
- Bonhomme, S. [2012]. “Functional Differencing,” Econometrica, 80 (4): 1337–1385.
- Bonhomme, S. [2020]. “Econometric Analysis of Bipartite Networks.” In Graham, B. and de Paula, Á. (eds). The Econometric Analysis of Network Data. Amsterdam: Elsevier, p. 83–121.
- Bonhomme, S. [2021]. “Teams: Heterogeneity, Sorting, and Complementarity,” University of Chicago, Becker Friedman Institute for Economics Working Paper, 2021-15.
- Bonhomme, S., Dano, K. and Graham, B. S. [2023]. “Identification in a Binary Choice Panel Data Model with a Predetermined Covariate,” NBER Working Paper, 31027.
- Bonhomme, S., Holzheu, K., Lamadon, T., Manresa, E., Mogstad, M. and Setzler, B. [2023]. “How Much Should We Trust Estimates of Firm Effects and Worker Sorting?” Journal of Labor Economics, 41 (2): 291–322.
- Bonhomme, S., Lamadon, T. and Manresa, E. [2019]. “A Distributional Framework for Matched Employer Employee Data,” Econometrica, 87 (3): 699–739.
- Card, D., Heining, J. and Kline, P. [2013]. “Workplace Heterogeneity and the Rise of West German Wage Inequality,” The Quarterly Journal of Economics, 128 (3): 967–1015.
- Carrasco, M., Florens, J.-P. and Renault, E. [2007]. “Linear Inverse Problems in Structural Econometrics Estimation Based on Spectral Decomposition and Regularization.” In Heckman, J. and Leamer, E. E. (eds). Handbook of Econometrics. Amsterdam: North-Holland, vol. 6B, p. 5633–5751.
- Chamberlain, G. [1992]. “Efficiency Bounds for Semiparametric Regression,” Econometrica, 60 (3): 567–596.
- Charbonneau, K. B. [2017]. “Multiple Fixed Effects in Binary Response Panel Data Models,” The Econometrics Journal, 20 (3): S1–S13.
- Chernozhukov, V., Fernández-Val, I., Hahn, J. and Newey, W. [2013]. “Average and Quantile Effects in Nonseparable Panel Models,” Econometrica, 81 (2): 535–580.
- Dano, K. [2023]. “Transition Probabilities and Identifying Moments in Dynamic Fixed Effects Logit Models,” arXiv preprint, arXiv:2303.00083.
- Davezies, L., D’Haultfoeuille, X. and Laage, L. [2021]. “Identification and Estimation of Average Marginal Effects in Fixed Effects Logit Models,” arXiv preprint, arXiv:2105.00879.
- Davezies, L., D’Haultfoeuille, X. and Mugnier, M. [2020]. “Fixed Effects Binary Choice Models with Three or More Periods,” arXiv preprint, arXiv:2009.08108.
- De Paula, A., Richards-Shubik, S. and Tamer, E. [2018]. “Identifying Preferences in Networks with Bounded Degree,” Econometrica, 86 (1): 263–288.
- Dhaene, G. and Jochmans, K. [2015]. “Split-Panel Jackknife Estimation of Fixed-Effect Models,” The Review of Economic Studies, 82 (3): 991–1030.
- Dhaene, G. and Weidner, M. [2023]. “Approximate Functional Differencing,” arXiv preprint, arXiv:2301.13736.
- Dobronyi, C., Gu, J. and Kim, K. I. [2021]. “Identification of Dynamic Panel Logit Models with Fixed Effects,” arXiv preprint, arXiv:2104.04590.
- Engl, H. W., Hanke, M. and Neubauer, A. [1996]. Regularization of Inverse Problems. Dordrecht: Kluwer Academic Publishers.
- Fernández-Val, I. and Weidner, M. [2016]. “Individual and Time Effects in Nonlinear Panel Models with Large N, T,” Journal of Econometrics, 192 (1): 291–312.
- Graham, B. S. [2017]. “An Econometric Model of Network Formation with Degree Heterogeneity,” Econometrica, 85 (4): 1033–1063.
- Graham, B. S. [2020]. “Sparse Network Asymptotics for Logistic Regression,” NBER Working Paper, 27962.
- Gualdani, C. [2021]. “An Econometric Model of Network Formation with an Application to Board Interlocks Between Firms,” Journal of Econometrics, 224 (2): 345–370.
- Güell, M. and Petrongolo, B. [2007]. “How Binding Are Legal Limits? Transitions from Temporary to Permanent Work in Spain,” Labour Economics, 14 (2): 153–183.
- Hahn, J. and Newey, W. [2004]. “Jackknife and Analytical Bias Reduction for Nonlinear Panel Models,” Econometrica, 72 (4): 1295–1319.
- Honoré, B. E., Muris, C. and Weidner, M. [2021]. “Dynamic Ordered Panel Logit Models,” arXiv preprint, arXiv:2107.03253.
- Honoré, B. E. and Weidner, M. [2020]. “Moment Conditions for Dynamic Panel Logit Models with Fixed Effects,” arXiv preprint, arXiv:2005.05942.
- Hughes, D. W. [2022]. “Estimating Nonlinear Network Data Models with Fixed Effects,” arXiv preprint, arXiv:2203.15603.
- Kline, P., Saggio, R. and Sølvsten, M. [2020]. “Leave-Out Estimation of Variance Components,” Econometrica, 88 (5): 1859–1898.
- Kuersteiner, G. M. and Prucha, I. R. [2020]. “Dynamic Spatial Panel Models: Networks, Common Shocks, and Sequential Exogeneity,” Econometrica, 88 (5): 2109–2146.
- Lachowska, M., Mas, A., Saggio, R. and Woodbury, S. A. [2023]. “Work Hours Mismatch,” NBER Working Paper, 31205.
- Lentz, R., Piyapromdee, S. and Robin, J.-M. [2022]. “The Anatomy of Sorting Evidence from Danish Data,” working paper, hal-03869383.
- Margolis, D. N. [1996]. “Cohort Effects and Returns to Seniority in France,” Annales d’Économie et de Statistique, 41/42: 443–464.
- Pakel, C. and Weidner, M. [2021]. “Bounds on Average Effects in Discrete Choice Panel Data Models,” arXiv preprint, arXiv:2309.09299.
- Postel-Vinay, F. and Robin, J.-M. [2002]. “Equilibrium Wage Dispersion with Worker and Employer Heterogeneity,” Econometrica, 70 (6): 2295–2350.
- Rasch, G. [1960]. Studies in Mathematical Psychology: I. Probabilistic Models for Some Intelligence and Attainment Tests. Copenhagen: Nielsen & Lydiche.
- Sheng, S. [2020]. “A Structural Econometric Analysis of Network Formation Games through Subnetworks,” Econometrica, 88 (5): 1829–1858.
- Shimer, R. and Smith, L. [2000]. “Assortative Matching and Search,” Econometrica, 68 (2): 343–369.
- Song, J., Price, D. J., Guvenen, F., Bloom, N. and Von Wachter, T. [2019]. “Firming Up Inequality,” The Quarterly Journal of Economics, 134 (1): 1–50.
- Woodcock, S. D. [2008]. “Wage Differentials in the Presence of Unobserved Worker, Firm, and Match Heterogeneity,” Labour Economics, 15 (4): 771–793.
Publisher keywords: econometric network models, heterogeneity, functional differencing, matching
Published online 15 March 2024
https://doi.org/10.3917/reco.751.0147