Measuring International Migration through Sample Surveys: Some Lessons from the Spanish Case

Pages 435 to 463

Martí, M. and Ródenas, C. (2012). Measuring International Migration through Sample Surveys: Some Lessons from the Spanish Case. Population, 67(3), 435-463. https://doi.org/10.3917/popu.1203.0517

  • Martí, Mónica.
  • et al.
« Measuring International Migration through Sample Surveys: Some Lessons from the Spanish Case ». Population, 2012/3 Vol. 67, 2012. p.435-463. CAIRN.INFO, shs.cairn.info/journal-population-2012-3-page-435?lang=en.

  • MARTÍ, Mónica
  • and RÓDENAS, Carmen,
2012. Measuring International Migration through Sample Surveys: Some Lessons from the Spanish Case. Population, 2012/3 Vol. 67, p.435-463. DOI : 10.3917/popu.1203.0517. URL : https://shs.cairn.info/journal-population-2012-3-page-435?lang=en.

https://doi.org/10.3917/popu.1203.0517


Notes

  • [*]
    University of Alicante, Spain.
    Correspondence: Mónica Martí, Applied Economics Department, University Institute for Peace and Social Development, University of Alicante, P.O. Box 99, E-03080 Alicante, Spain, tel.: +34 965 90 97 09, e-mail: mmarti@ua.es
  • [1]
    See Martí and Ródenas (2004 and 2007), and also Ródenas and Martí (1997).
  • [2]
    The nationality (Spanish or not) was included as a new auxiliary variable for calibration (Eurostat 2010a).
  • [3]
    The differences for sex, age, place of residence and country of birth are very small (Appendix 3). Immigrants are not distributed in exactly the same way by country of birth in the LFS and the MPR, but the differences are not very large. As explained in the following section, this is very probably because the nationality variable used in the LFS post-stratification only distinguishes between nationals and non-nationals.
  • [4]
    The former is a live register referring to September 2006, a date very close to the ENI-2007 data collection period (November 2006-February 2007), and the latter is revised quarterly using the Electoral Census and the MPR.
  • [5]
    See INE (2009b) for the methodology of the LFS.
  • [6]
    In theory, new occupants of a dwelling that was included in the sample before their arrival may be interviewed as newcomers, and these persons may be new immigrants. This would reduce the rotation wave effect. It is unlikely that this was done in Spain, however.
  • [7]
    The expansion factor is the inverse of the selection probability. As this is simple random sampling, every frame element has the same selection probability.
  • [8]
    See Appendix 5 for details.
  • [9]
    Nevertheless, if analyses control for duration of stay and the recently arrived immigrants surveyed are representative of their category, we can gain a good idea of the situation of recent arrivals. When we estimate totals for the whole immigrant population, however, recently arrived immigrants will carry less weight than they should; in other words, the “snapshot” of the situation of the total immigrant population will still be distorted.
  • [10]
    See Steinbuka (2009), Radermacher and Thorogood (2009), or Eurostat (2010b).
  • [11]
    We assume that F will be distributed between the three groups. However, the two-dimensional analysis (the case in which only one $f_{ij} > 0$, rather than two or three) simplifies the procedure for obtaining the conditions under which inequality (2) holds. That is why we only show the three extreme cases.

Many countries find it difficult to measure international immigration flows and to record immigrants’ socioeconomic characteristics. Confronted with this problem over the last two decades, Spain has developed a comprehensive statistical system for recording immigrant stocks and flows, which includes high-quality population registers and specific surveys, notably those conducted in 2005 and 2007. In this article, Mónica Martí and Carmen Ródenas reveal the disparities between the immigrant flows estimated from three data sources (the population registers, the Labour Force Survey and the 2007 National Immigrant Survey), before focusing on possible sources of estimation bias in the surveys (coverage, non-response, sampling). The disparities are small for immigrant distributions by age, sex, country of birth or place of residence, but much larger for annual flows by year of arrival. The authors demonstrate the need for specific sampling designs to study immigration that take account, wherever possible, of the immigrants’ year of arrival.

The substantial recent wave of immigration to Spain has tested the capacity of the country’s statistical sources to capture the intensity of this phenomenon correctly. Previous studies [1] have analysed the disparate information on migration flows provided by the Spanish Labour Force Survey (LFS) and the Residence Variation Statistics (Estadística de Variaciones Residenciales, EVR), both produced by the Spanish National Statistics Institute (Instituto Nacional de Estadística, INE). These studies show the superior quality of the administrative source, the EVR, which is also the only source in Europe that registers all immigrants, irrespective of their legal situation. However, the recent publication of the National Immigrant Survey for 2007 (Encuesta Nacional de Inmigrantes, ENI-2007), one of the first targeted migrant surveys carried out in Europe, and the methodological changes introduced in the Spanish LFS in 2005 to better reflect the specific weight of foreign immigrants [2] call for a new comparison of the migration information provided by the Spanish statistical system.

Based on this comparison and on an analysis of the survey methodologies used to study the characteristics of the immigrant population, this study shows that the standard designs of social surveys are insufficient to obtain good estimates of international migration flows. It also proposes changes to ensure that the effort invested in surveying a difficult population group such as immigrants is not wasted. The objective is therefore to improve the estimation of this phenomenon through sampling techniques, given that comprehensive information on the multiple dimensions of migration can only be obtained through surveys. The analysis is not limited to the Spanish case: it also offers some conclusions for the design of future national surveys of migrants in other European countries.

The study is structured as follows. Section I briefly describes the statistical sources and then compares the basic characteristics of the immigrant population that they record. Section II provides methodological reasons for the significant discrepancies detected with respect to the year of arrival; more specifically, it examines the possible origins of bias in the estimated number of recently arrived immigrants in the LFS and the ENI-2007. The main conclusions and recommendations are presented in the final section of the paper.

I – Recent immigration to Spain

1 – Statistical sources

The sample surveys offering migration information in Spain are the LFS and the ENI-2007. The LFS is a well-known quarterly sample survey designed to estimate the labour market characteristics of the population aged 16 and over residing in main family dwellings. Although it was never intended to measure mobility, its questionnaire now includes questions that can be used to estimate the stock of foreign immigrants as well as domestic and international flows. The survey records both nationality and country of birth, so the stock of the non-national population or of the population born abroad can be estimated. With regard to flows, the survey asks all foreigners for their year of arrival in Spain and asks all respondents for their country of residence one year before the survey.

The ENI-2007 (Appendix 1) supplies information on the demographic and social characteristics of the foreign-born population aged 16 and over residing in family dwellings in Spain on 1 January 2007. The ENI-2007 is characterized by its wide geographical and population scope and its extensive subject matter, making it possible to study the dimensions of immigration across different groups throughout Spain.

The registers that provide migration information in Spain are the Municipal Population Registers (MPR) and the Residence Variation Statistics (EVR) derived from them. The MPRs are the administrative registers of the inhabitants of each municipality (usual domicile), and their data constitute proof of residence there; the EVR is compiled from registrations and de-registrations due to changes in the place of residence (municipality) of people included in the MPRs. Registering a change of residence is compulsory in Spain, but there is no guarantee that flows are completely covered, since compliance depends on the incentives and deterrents for registering movements. To register, the only documents required are proof of identity (identity card, driving licence, passport, etc.) and some proof of residence at the address (title deeds, rental agreement, utility bills, or a letter from the first adult already registered at the address). A local residency certificate is required to access basic services such as public education or health care, to vote in elections, to renew identity cards, and to obtain grants, public employment, parking permits or home purchase grants.

All of the above-mentioned sources are produced by the INE and, with the exception of the ENI-2007, are periodic or continuously updated. They all cover the entire immigrant population, irrespective of legal situation in Spain, and individuals can be selected by their place of birth. The demographic characteristics of the immigrants included in the surveys can thus be compared with those recorded in the registers (Appendix 2). Taking the ENI-2007 reference date as the benchmark, the equivalent reference is the fourth quarter of 2006 (4Q-2006) for the LFS, 1 January 2007 for the MPR and, for annual immigration flows, the net registrations recorded each year by the EVR.

2 – Do the surveys give the same results as the registers?

While the differences between the ENI-2007, the LFS and the MPR are acceptable in terms of the basic demographic characteristics of the immigrant stock, [3] serious discrepancies appear once the time dimension is incorporated and migration flows are compared. Figure 1 shows the number of immigrant arrivals provided by the ENI-2007, the LFS 4Q-2006 and the EVR between 1988 and 2010. Although their levels differ, the series show similar trends until 2001. After that year, the ENI-2007 and the LFS show a decreasing trend, whereas the EVR broadly continues to rise until 2007. This disparity in trends is surprising, particularly in the most recent years, when the sources should converge because fewer of the immigrants concerned would have had time to leave the country before the surveys. In fact, the differences actually widen between 2004 and 2006.

Figure 1

Year of arrival in Spain according to ENI-2007, LFS 4Q-2006 and EVR (1988-2010)
Source: INE (ENI-2007, LFS, EVR and Spanish National Accounts) and authors’ own calculations.

A superficial analysis might conclude that the two surveys reflect reality better, since their time profiles are very similar. But it is unlikely that entries, as estimated by the ENI and the LFS, slowed down after 2001, at a time when the Spanish economy was expanding, unemployment was falling and there were no substantial economic improvements in the countries of origin. The EVR, on the other hand, shows a trend in line with the economic cycle, increasing until 2007 and falling sharply after 2008, when the economic situation changed.

With regard to this divergence, it is true that the surveys interviewed the “surviving” immigrants at a specific moment in time, while the EVR records the moment of registration. Only if a significant share of the immigrants who arrived between 2002 and 2006 had left Spain by the end of 2006 could they have been included in the registers but excluded from the ENI-2007 and LFS samples. But, once again, mass departures of this kind would have been unlikely at a time when the Spanish economy was booming.

In addition, statistical sources based on registers have some drawbacks associated with their administrative nature: a registered change of residence does not always correspond to a real change of municipality, but may be motivated by the individual advantages of being registered in a particular place. However, Ródenas and Martí (2009) show that such registrations are rare, and they are likely to be even rarer among recently arrived immigrants, who are unaware of these advantages. Another drawback of register-based data is the possible delay between an immigrant’s arrival and his/her registration. However, this delay should be short for recently arrived immigrants, since registration in the MPRs provides access to basic public services.

The above factors should not significantly affect the general trend of the EVR, but this source may have been distorted by the exceptional immigrant regularization programmes of recent years in Spain and by changes in the legal coverage of the register. The rises in 2000-2001 and 2004-2005 can probably be explained by the fact that registration in the MPR became accepted as proof of residence in Spain in successive immigrant regularization procedures. This must certainly have created a new and strong incentive to register.

Despite these shortcomings, we consider that the EVR records arrivals more accurately than the surveys. This is confirmed by its consistency with two external sources: the official population estimates published by the INE, and the residence visas issued annually by Spanish diplomatic missions and consular offices. The INE (2009a) has been producing a Population Now-Cast (ePOBa) since 2002. These official estimates are based on the MPR, although the INE has implemented procedures to estimate the number of unnotified departures of foreigners, so that the international emigration flows in the ePOBa are higher than those registered in the EVR. Selecting only the population aged 16 and over residing in Spain from these estimates confirms (Figure 2) that between 2002 and 2007 this population grew by 600,000 per year on average, and that this growth was sustained and did not decrease, peaking precisely in 2007. The growth of the ePOBa only begins to fall after 2008.

Figure 2

Year of arrival in Spain (ENI-2007, LFS 4Q-2006, EVR and ePOBa) and residence visas issued annually (2002-2010)
Source: INE (ENI-2007, LFS, EVR, ePOBa and residence visas) and authors’ own calculations.

The ePOBa series shows the year-on-year increases in the population aged 16 and over at national level; these increases cannot be due to births, since newborns do not enter this age group, and can only be attributed to net immigration flows from abroad. For comparison purposes, we do not know whether these immigrants are Spanish or foreign, or whether they were born in Spain or not, since these characteristics are not estimated in the ePOBa. Nevertheless, the ePOBa series shows an annual inflow almost identical to the number of immigrants born abroad recorded in the EVR.

Analysis of the residence visa series leads to a similar observation (Figure 2). While the surveys suggest that immigrant entries have slowed since 2001, the number of visas issued continued to rise until 2008, which implies that intentions to reside in Spain kept increasing until that year. We consider the rise in the number of visas issued to non-nationals to be consistent with the increase in immigration flows shown by the EVR until the onset of the economic crisis.

In summary, the immigrant stocks estimated by the different statistical sources do not show major disparities in their distribution by sex, age, place of residence and place of birth. However, when arrival flows into Spain are compared, some major discrepancies come to light. Examining these differences in the light of additional information from external sources, such as the visas issued and the ePOBa, suggests that the problem could lie in the surveys. Their design may be ill-suited to estimating the number of recently arrived immigrants correctly, in which case the problem is much more serious than an incorrect estimate of the variable “year of arrival”: the bias may extend to all variables that are liable to change with the length of stay at destination. Recently arrived immigrants are undoubtedly less integrated in society, and their employment situation is more precarious than that of immigrants who have lived in the country for longer. To the extent that they are under-represented in the sample and this is not duly corrected in the post-survey adjustments, analyses of social integration, legal situation, family reunification, remittances, employment or living conditions could present a more optimistic picture than is actually the case.

However, this problem barely affects the basic demographic characteristics, not only because they do not change over time but also because, unlike the variable “year of arrival”, they were taken into account in the sample design and in the post-survey adjustments of both surveys. Indeed, where the distributions in the LFS and the MPR differ more markedly, as for country of birth, we find that the LFS uses only two categories for the nationality variable (holding Spanish nationality or not), with no breakdown by nationality of the persons residing in Spain, as there is in the ENI-2007.
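To illustrate why calibrating on nationality alone cannot correct an under-representation of recent arrivals, the following Python sketch applies simple post-stratification weights (cell population total divided by the number of respondents in the cell) to a hypothetical sample. The sample, the population totals and the function names are illustrative assumptions, not INE data or the INE calibration procedure, which uses several auxiliary variables (note [2]).

```python
from collections import Counter


def poststratification_weights(cells, population_totals):
    """Weight each respondent by (population total of its cell) / (respondents in that cell)."""
    counts = Counter(cells)
    return [population_totals[cell] / counts[cell] for cell in cells]


def weighted_count(weights, flags):
    """Estimated number of people in the population for whom the flag is True."""
    return sum(w for w, flag in zip(weights, flags) if flag)


# Hypothetical sample of 10 non-national respondents: only 2 arrived recently,
# although (by assumption) 400 of the 1,000 non-nationals in the population did.
recent = [True] * 2 + [False] * 8

# (a) Calibration cells defined by nationality only (national / non-national).
w_nationality = poststratification_weights(["non-national"] * 10, {"non-national": 1000})

# (b) Cells that also split non-nationals by recency of arrival, assuming
#     register-based totals by year of arrival were available.
cells = ["recent" if r else "earlier" for r in recent]
w_arrival = poststratification_weights(cells, {"recent": 400, "earlier": 600})

print(weighted_count(w_nationality, recent))  # 200.0: the under-representation persists
print(weighted_count(w_arrival, recent))      # 400.0: matches the assumed population total
```

With nationality-only cells every non-national respondent receives the same weight, so the sample's low share of recent arrivals passes straight into the estimate, even though the total number of non-nationals is reproduced exactly. The second set of cells presupposes external totals by year of arrival, which registers could in principle supply; it sketches the idea rather than a tested weighting scheme.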

II – Year of arrival: a possible explanation of estimation bias in the sample surveys

Any use of sampling techniques to estimate a population characteristic is subject to error, but the error does not compromise the quality and reliability of the estimate as long as it remains small. From 2001 onwards, however, the differences in estimated immigrant inflows are large. Our hypothesis is that these estimates may suffer from a significant bias that was not corrected by post-survey adjustments, so that inflows are systematically and increasingly underestimated. This bias may limit the quality and precision of the surveys, since many dimensions of the migration phenomenon are then estimated incorrectly.

To test this hypothesis, we examine the sources of error that may bias the estimate of immigrant inflows, rather than those that affect the variability of the estimates, since the error is systematic rather than random. Given the diversity of causes of bias, we first analyse those that are unrelated to sampling and then those that may derive from the sampling technique (Groves et al., 2009; Levy and Lemeshow, 2008). The section closes with an explanation of why flow estimates may be less accurate when a variable other than year of arrival is used in the post-stratification.
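The distinction between systematic and random error can be stated with the standard mean-squared-error decomposition for an estimator $\hat{\theta}$ of a population quantity $\theta$ (general survey-sampling notation, not taken from the article). A larger sample reduces the variance term but generally leaves the bias term unchanged, which is why the sources of error examined below cannot be remedied simply by interviewing more people.

```latex
\operatorname{MSE}(\hat{\theta})
  = E\big[(\hat{\theta}-\theta)^2\big]
  = \underbrace{E\big[(\hat{\theta}-E[\hat{\theta}])^2\big]}_{\operatorname{Var}(\hat{\theta})}
  + \underbrace{\big(E[\hat{\theta}]-\theta\big)^2}_{\operatorname{Bias}(\hat{\theta})^2}
```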

1 – Non-sampling errors

Social desirability bias is one of the non-sampling errors. In the case in hand, given that an immigrant’s legal situation is usually correlated with the year of arrival, some recently arrived immigrants in the ENI or LFS samples may declare a year of arrival far removed from the truth, to avoid arousing suspicion that their situation is illegal. This is difficult to prove, but an analysis of the “documentation situation” variable in the ENI shows that the number of illegal immigrants rises as the interval between arrival and interview becomes shorter. We suspect, however, that this bias is not very large.

Coverage bias

Incomplete coverage of the sampling frame may bias the estimate. This occurs when the sampling frame does not cover all of the target population and the characteristic being investigated takes different values in the non-covered population and in the population within the frame. This source of bias should not be disregarded in the case of the ENI-2007 and the LFS. Although both sampling frames (the MPR and the Population and Housing Census, respectively) seem to cover the overall target population and can be considered sufficiently up to date, [4] both require a minimum period (longer for the LFS) between an immigrant’s arrival and his/her inclusion in the frame. Recent immigrants are thus less likely to be included in the frame.
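The two conditions just stated can be written with the usual coverage-bias identity, in notation introduced here for illustration: of a target population of $N$ units, $N_C$ are in the frame and $N_U = N - N_C$ are not, and $\bar{Y}_C$, $\bar{Y}_U$ are the means of a variable in the covered and non-covered parts. Assuming the frame itself is sampled without bias, the survey estimates $\bar{Y}_C$ rather than $\bar{Y}$.

```latex
\bar{Y} = \frac{N_C}{N}\,\bar{Y}_C + \frac{N_U}{N}\,\bar{Y}_U
\qquad\Longrightarrow\qquad
\operatorname{Bias}(\bar{y}_C) = \bar{Y}_C - \bar{Y}
  = \frac{N_U}{N}\left(\bar{Y}_C - \bar{Y}_U\right)
```

The bias vanishes only if coverage is complete ($N_U = 0$) or the two groups do not differ. Taking $y$ as an indicator of having arrived in the last year, a frame that misses recent registrations has $\bar{Y}_U > \bar{Y}_C$, so the bias is negative: recent arrivals are underestimated.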

Furthermore, a careful review of the methodology [5] of both surveys reveals that the frames do not perfectly cover the target population, since collective households are not included in the sample. Omitting these households from the sampling process may not alter most of the characteristics under study, but it could bias the “year of arrival” variable, because some collective households (shelters, hostels and similar establishments) are the first place of residence of many recently arrived immigrants (OECD, 2009). Their omission leads to some underestimation of recently arrived immigrants, which could easily be reduced by including these households in the sampling frame.

This is not a problem in some European countries, such as Denmark, Germany, Estonia, Finland, Sweden, Iceland and Norway, where the population in institutional households is surveyed. However, among the countries that provide Eurostat with LFS microdata for publication (EU-27, candidate and EFTA countries), 26 of the 33 national surveys (Appendix 4) only cover private dwellings, so they too may be affected by this coverage bias.

Non-response bias

Non-response is the source of bias that most concerns researchers, because it is difficult to control. Non-response has three main causes: difficulty in locating the dwelling or household; refusal to take part in the survey or to answer certain questions; and inability to answer (no knowledge of the language, inability to remember particular past events, etc.). Household non-response generates bias when the value of the statistic estimated for the sample actually surveyed (households with responding individuals) differs from the value based on the complete sample.

In practice, in surveys of individuals it is almost impossible to obtain a response from 100% of the sample and for 100% of the questionnaire items. Frequently, some of the people selected are unable to take part, are reluctant or refuse to be interviewed, or collaborate only partially and do not answer some items. As Table 1 shows, the non-response rate in the LFS averages approximately 18.5% over 2006-2008, while in the ENI-2007 it reaches almost 28% of total eligible households.

Table 1

Response rate and non-response rate for ENI-2007 and LFS (first wave). Percentages of final sample

| Survey | Response rate | Non-response rate: total (*) | Refusals | Non-contacts |
|---|---|---|---|---|
| LFS 2006 (annual average) | 80.40 | 19.60 | 7.97 | 11.63 |
| LFS 2007 (annual average) | 81.15 | 18.85 | 7.68 | 11.17 |
| LFS 2008 (annual average) | 82.99 | 17.01 | 6.87 | 10.14 |
| ENI-2007 | 72.10 | 27.90 | na | na |

(*) Includes non-contacts, refusals and inability to respond.
na: not available.
Source: INE (ENI-2007 and LFS).

A higher non-response rate in the ENI-2007 than in the LFS is to be expected, as its procedures are more complex and communication between interviewers and the target population is more difficult. Language problems are particularly prevalent, and household surveys in which any member may act as the respondent (LFS) are known to obtain higher response rates than those which randomly select one member (ENI-2007) (Groves et al., 2009). Fortunately, substituting non-responding households with other households prevents the sample from shrinking to an unacceptably small size, which would increase the variability of the estimates. However, if substitution tends to incorporate households whose members arrived longer ago rather than more recently, it will bias the estimated year of arrival.

With the information available, it is not possible to estimate the covariance between the probability of being surveyed and the year of arrival. However, this covariance may reasonably be expected to be high, since recently arrived immigrants are immersed in the first phases of the migration process (seeking employment, applying for papers or working long hours), which makes them harder to contact because they lack established and predictable timetables and routines. Furthermore, many of them are likely to refuse to respond out of mistrust (especially those not legally residing in the country) or because they do not know the language. Later, as they adapt to their new environment, establish a routine and begin to master the language, their mistrust of interviewers will diminish and they will be more inclined to respond, but by then they will no longer be recent arrivals. Given this reasoning and the observed non-response levels, the risk of this type of bias is high in the LFS, and particularly so in the ENI-2007.
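The covariance argument can be made concrete with a small Python sketch. It relies on a standard approximation from the survey-methodology literature (not stated in the article), in which the bias of an unadjusted respondent mean is approximately $\operatorname{Cov}(\rho, y)/\bar{\rho}$, where $\rho$ is the response propensity. The population shares and propensities are hypothetical: 15% of immigrants are assumed to have arrived in the last year and to respond with probability 0.6, against 0.8 for the rest.

```python
from fractions import Fraction


def respondent_share(groups):
    """Expected share with y = 1 among respondents, with no non-response reweighting.

    `groups` is a list of (population_share, response_propensity, y) tuples,
    where y = 1 marks the characteristic of interest (here: arrived in the last year).
    """
    respondents = sum(p * rho for p, rho, _ in groups)
    return sum(p * rho * y for p, rho, y in groups) / respondents


def covariance_bias(groups):
    """Cov(rho, y) / mean(rho), computed over the population groups."""
    mean_rho = sum(p * rho for p, rho, _ in groups)
    mean_y = sum(p * y for p, _, y in groups)
    cov = sum(p * (rho - mean_rho) * (y - mean_y) for p, rho, y in groups)
    return cov / mean_rho


# Hypothetical population: 15% recent arrivals (propensity 0.6), 85% earlier (0.8).
groups = [
    (Fraction(15, 100), Fraction(6, 10), 1),
    (Fraction(85, 100), Fraction(8, 10), 0),
]
true_share = Fraction(15, 100)

print(respondent_share(groups))                # 9/77, about 11.7% instead of 15%
print(respondent_share(groups) - true_share)   # -51/1540
print(covariance_bias(groups))                 # -51/1540: the same value
```

Because the propensity is lowest exactly where $y = 1$, the covariance is negative and respondents understate the share of recent arrivals. With the expected counts used here the covariance expression matches the bias exactly; with an actual sample it holds only approximately.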

2 – Sampling bias

If an inappropriate sampling design is used, such that the selection probability of certain units of the frame does not correspond to their weight in the target population, this is another potential source of bias whenever the value of the characteristic under study differs between these units and the rest of the population. The sampling designs of the surveys analysed here are quite complex, in order to obtain quality information on highly sensitive variables at the lowest possible cost. In the case of the ENI-2007, the sampling design (Appendix 1) gives no reason to believe that recently arrived individuals are less likely to be included in the sample than others.

However, we reach a different conclusion when we analyse the design of the Spanish LFS. Here, the INE uses a two-stage sample with stratification of the first-stage units. As in the ENI, the primary sampling units are census sections and the secondary (and final) sampling units are dwellings. All people living in the same dwelling are interviewed. The sample is selected such that all dwellings within each stratum have the same probability of selection. In addition, to avoid over-burdening families, the sample is divided into six sub-samples called waves. Every quarter, one-sixth of the selected dwellings in each section are renewed, so each dwelling remains in the sample for six consecutive quarters, after which it is replaced by another dwelling from the same section. In every quarterly interview, non-national immigrants are asked for their year of arrival in Spain, irrespective of the interview number.
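The equal-probability property within each stratum can be checked with a short Python sketch. It assumes, for illustration only, one common way of obtaining a self-weighting two-stage design: census sections drawn with probability proportional to their number of dwellings, followed by a fixed number of dwellings per selected section. The stratum sizes, the take per section and the function names are hypothetical, and the final LFS weights also include non-response and calibration adjustments that are not modelled here.

```python
from fractions import Fraction


def inclusion_probability(n_sections, section_dwellings, stratum_dwellings, take):
    """Overall selection probability of a dwelling in a two-stage design.

    Hypothetical scheme: `n_sections` census sections are drawn with probability
    proportional to their number of dwellings, then `take` dwellings are drawn by
    simple random sampling within each selected section.
    """
    p_section = Fraction(n_sections * section_dwellings, stratum_dwellings)
    p_dwelling = Fraction(take, section_dwellings)
    return p_section * p_dwelling


def expansion_factor(probability):
    """Expansion factor = inverse of the selection probability."""
    return 1 / probability


# A small and a large census section in the same hypothetical stratum of 20,000 dwellings.
small = inclusion_probability(10, 200, 20_000, 18)
large = inclusion_probability(10, 500, 20_000, 18)
print(small, large)              # 9/1000 9/1000
print(expansion_factor(small))   # 1000/9, i.e. about 111 dwellings per sampled dwelling
```

The size of the section cancels out of the product, so every dwelling in the stratum has the same chance of selection and the same expansion factor. Note that nothing in this calculation depends on when the occupants arrived; the year-of-arrival problem discussed next arises from the rotation of dwellings over time, not from the selection probabilities themselves.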

This pattern of sample renewal may reduce the probability of finding recently arrived immigrants in the sample, because such individuals can only be found in dwellings newly incorporated into the sample. For example, if the interview takes place in the fourth quarter of year N, any individual in the sample may declare that they entered Spain in year N − i (i = 1, 2, 3, …). However, when we try to estimate how many arrived in year N itself, only two-thirds of the sample can declare that they did so: those being surveyed for the first, second, third or fourth time. The remaining two-sixths of the sample, who are taking part in their fifth or sixth interview, cannot have arrived in Spain during year N, because to be interviewed five or six times by the LFS they must already have been living in the country for more than a year.

Bias is therefore inevitable in the estimate of arrivals in year N, since the probability of reporting N as the year of entry depends on the wave to which the surveyed individuals belong. Conversely, the LFS of the fourth quarter of year N + 1 will yield a higher estimate of the year-N inflow. Although this runs counter to the demographic expectation of “losing” immigrants over time as the number of “survivors” decreases, it is a consequence of the rotation scheme: as time passes, the share of the sample able to report year N as the year of arrival progressively increases. This component of the bias, which stems from the LFS sampling design, has not been adequately taken into account in the calculation of the expansion factors, which would need to adjust for the differences, created by the rotation scheme, in the probability of reporting a given year of arrival.

Since sampling designs differ across national LFSs, the size of this bias in the EU-LFS will depend on the percentage of the sample replaced each year. Of all the countries that provide information to Eurostat, the only ones free from this bias are Belgium and Luxembourg (Appendix 4), which renew the whole sample each year, while the most affected are Switzerland and Germany, which renew 20% and 25% of their samples each year, respectively. [6]

This effect in the Spanish LFS, together with the coverage and non-response biases, causes the estimated number of arrivals in a given year to increase in successive surveys. Figure 3 shows how the number of immigrants who arrived in a given year rises as the reference year of the LFS advances, for at least four or five years afterwards and sometimes longer. For example, there were an estimated 244,672 entries in 2001 according to the LFS 4Q-2001; seven years later, according to the LFS 4Q-2008, this figure had more than doubled to 522,703. Moreover, despite the economic recession, which should have intensified departures, the number of immigrants who arrived during 2009 captured by the LFS 4Q-2010 was double that captured by the LFS 4Q-2009. Finally, and more importantly, the LFS estimate of entries in 2006 increased from 260,723 according to the 4Q-2006 survey to 430,374 according to the 4Q-2010 survey, a figure much closer to the 569,541 net entries registered in the EVR for that year.

Figure 3

Immigrants by year of arrival estimated from LFS on consecutive quarters (2000-2010)
Source: INE (LFS) and authors’ own calculations.

We have seen that, on average, between 2000 and 2010, the final estimate of arrivals in a given year may be as much as double the first annual estimate. However, this growth cannot be attributed entirely to the sample rotation effect, since part of it is generated by coverage and non-response biases. To separate the effect of each of these sources of bias in the LFS, we assume an ideal situation with no coverage or non-response bias. In theory, the sampling bias can then be identified as follows.

When the sample is surveyed in the fourth quarter of year N (4Q-N), and assuming that surveys are conducted on the last day of each quarter, the rotation system captures arrivals during certain quarters of year N only for specific subsamples ($w_i$). These are shaded grey in the upper part of Figure 4, which shows that mobility during the first two quarters of year N is captured better than mobility during the last two. A year later, when the new LFS sample is surveyed in 4Q-N+1 about mobility in year N, respondents in every subsample could have arrived in any quarter of year N, except those in subsample $w_2$, who can only have arrived in the first three quarters (lower part of Figure 4). The difference between the estimates of mobility in year N based on 4Q-N and on 4Q-N+1 is therefore due mainly to differences in the estimated mobility in the third and fourth quarters.

Figure 4

Estimate of immigrant arrivals in year N based on the samples of LFS 4Q-N and LFS 4Q-N+1

If annual arrivals are distributed uniformly over the quarters, the diagram shows that the estimate from 4Q-N+1 will be more than twice that from 4Q-N. If, instead, arrivals are concentrated in the last two quarters of the year, there is reason to believe that the estimates from both 4Q-N+1 and 4Q-N+2 may exceed the 4Q-N estimate by an even wider margin. The sample rotation effect disappears completely after N + 2, so any further rise in the estimate of arrivals in year N after that point is due to the progressive reduction in coverage and non-response biases.
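The reasoning behind Figure 4 can be made concrete with a small model. This is a sketch under our own simplifying assumption, not INE's procedure: under the Spanish "6-" rotation scheme listed in Appendix 4, the sample consists of six subsamples, one of which is renewed each quarter, and we assume that a subsample can only record immigrants who had arrived by the end of the quarter in which it entered the sample (our reading of the argument that recent arrivals are found only in newly incorporated dwellings). Coverage and non-response bias are ignored, as in the text, and the function name `captured_share` is hypothetical.

```python
from fractions import Fraction

WAVES = 6  # Spanish LFS rotation scheme "6-": six consecutive quarters in the sample


def captured_share(arrivals, years_after):
    """Share of year-N arrivals estimated by the LFS of 4Q of year N + years_after.

    arrivals: integer arrival volumes in Q1..Q4 of year N.
    Hypothetical model: the subsample that entered the sample in quarter e can only
    record immigrants who had arrived by the end of quarter e. Quarters are indexed
    from Q1 of year N = 0, so 4Q of year N + k has index 4k + 3.
    """
    survey_q = 4 * years_after + 3
    captured = 0
    for age in range(WAVES):  # one subsample per entry quarter still in the sample
        entry_q = survey_q - age
        captured += sum(a for q, a in enumerate(arrivals) if q <= entry_q)
    # Each subsample is 1/WAVES of the sample, so average over the subsamples
    return Fraction(captured, WAVES * sum(arrivals))


uniform = [1, 1, 1, 1]
late = [1, 1, 4, 4]  # arrivals concentrated in Q3 and Q4
for name, arr in [("uniform", uniform), ("late", late)]:
    shares = [captured_share(arr, k) for k in range(3)]
    print(name, [float(s) for s in shares], "ratio N+1/N:", float(shares[1] / shares[0]))
```

With uniform arrivals the captured shares for 4Q-N, 4Q-N+1 and 4Q-N+2 are 5/12, 23/24 and 1, so the second estimate is 2.3 times the first, in line with "more than twice"; with the late-concentrated profile the ratio rises to 56/19, about 2.9. In this toy model the rotation effect is exhausted by 4Q-N+2.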

In practice, one way to separate the sample bias from the coverage and non-response biases in the LFS is to calculate the inter-annual growth rate of successive estimates of the group of immigrants who arrived in a given year N. The estimate for the year following arrival, N + 1, reflects coverage, non-response and sample bias. In theory, however, the increase due to sample bias should begin to lessen in the second year, N + 2, and disappear in N + 3. We can therefore attribute the difference between the growth in the number of immigrants in the first year and the growth in subsequent years fundamentally to the bias inherent in the rotation scheme.

We calculated these rates using the fourth-quarter LFS surveys from 2005 to 2011. Figure 5 shows that the sample bias is very strong: one year after arrival, the LFS captures a volume of immigrants that is, on average, 70.3% larger than in the year of arrival. Growth in later years is much lower: 9.2% in N + 2, 3.3% in N + 3, 1.9% in N + 4, 0.6% in N + 5 and, finally, -3% in N + 6. These later changes may be largely due to coverage and non-response biases, whose effect would perhaps be greater were it not for the economic crisis, which is causing immigrants to leave the country.
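The growth rates above are simple ratios of successive fourth-quarter estimates of the same arrival cohort. The sketch below shows the computation and compounds the average rates reported for Figure 5. Compounding averaged rates is only an approximation (the average of ratios is not the ratio of averages), and taking the gap between first-year and second-year growth as a rough indicator of the rotation effect is one simple reading of the argument, not the authors' exact procedure. All names are illustrative.

```python
# Average inter-annual growth (%) of a cohort's LFS estimate in years N+1 .. N+6 (Figure 5)
AVG_GROWTH_PCT = [70.3, 9.2, 3.3, 1.9, 0.6, -3.0]


def growth_rates(estimates):
    """Inter-annual growth rates (%) of successive 4Q estimates of one arrival cohort."""
    return [100.0 * (later / earlier - 1.0) for earlier, later in zip(estimates, estimates[1:])]


def cumulative_factor(rates_pct):
    """Compound a sequence of growth rates (%) into an overall multiplication factor."""
    factor = 1.0
    for r in rates_pct:
        factor *= 1.0 + r / 100.0
    return factor


total = cumulative_factor(AVG_GROWTH_PCT)
# Rough indicator of the rotation effect: first-year growth minus growth in N+2
excess_first_year = AVG_GROWTH_PCT[0] - AVG_GROWTH_PCT[1]
print(f"compound factor N -> N+6: {total:.2f}")
print(f"first-year growth exceeds N+2 growth by {excess_first_year:.1f} points")
```

The compound factor comes out at about 1.91, consistent with the observation that the final estimate of a cohort may be close to double the first one.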

This procedure for separating sample bias from coverage and non-response biases cannot be applied to the ENI because, as indicated at the beginning of section III, the ENI does not exhibit sample bias. The underestimation of recently arrived immigrants in this survey is therefore due to the other two biases: coverage and non-response. Indeed, as we have seen, non-response in the ENI is high (28% of the individuals surveyed, ten points above that of the LFS). Moreover, the ENI’s concept of collective households, whose residents are not sampled, is broader than that of the LFS (Appendix 2), so its coverage bias will also be higher.

Figure 5

Annual growth rate (%) of the number of immigrants by year of arrival computed in successive LFS-4Q (2005-2011)
Source: INE (LFS) and authors’ own calculations.

3 – Reducing bias through post-stratification

Coverage and non-response errors are generally corrected by re-weighting or calibrating the expansion factors, which requires external information on the target population. These adjustments use that information to raise the weight of under-represented sample groups and to reduce the weight of over-represented ones. In the ENI, the variables used in the re-weighting procedure are population projections by nationality group and sex, while in the LFS the post-stratification variables are sex, age, region of residence and nationality (Spanish/non-Spanish). This technique has been applied in both surveys to correct for non-response, and the two yield similar estimates for these variables (Appendix 3). However, the technique assumes that, within each category, the surveyed individuals are a representative sample of that sub-group. If, as we suspect, recently arrived immigrants are under-represented within each category, this correction may fail to produce accurate estimates of entry flows. A simple example illustrates this.

Let us assume that we draw a sample of size n. To simplify, we also assume that there are only two variables, for example year of arrival and nationality, each with only two categories. If there were no problems during data collection (no coverage or non-response error), the distribution of the sample could be like that shown in Table 2. This sample could be considered representative of the population (P). The weight of each category in the frame will thus be equal to nij/n.

Table 2

A hypothetical representative sample of the population

Nationality | Arrived within the last 5 years | Arrived more than 5 years ago | Total
British immigrants | n11 | n12 | n1
Moroccan immigrants | n21 | n22 | n2

Now let us imagine that the response rate is poor among the Moroccan immigrants who arrived during the last five years. In particular, F individuals do not collaborate with the survey, so only n21 – F individuals in this category answer. Since a reasonable sample size is needed, these units are replaced with others (fij). As we are assuming that recently arrived immigrants are more difficult to survey, they will usually be systematically replaced by Moroccan immigrants who arrived earlier (f22) or by British immigrants (f11 and f12), so the final distribution of the units in the sample (Table 3) will be quite different from that of the frame.

Table 3

Hypothetical distribution of immigrants in the sample with non-response in one category

Nationality | Arrived within the last 5 years | Arrived more than 5 years ago | Total
British immigrants | n11 + f11 | n12 + f12 | n1
Moroccan immigrants | n21 – F | n22 + f22 | n2

If we multiply the number of surveyed individuals by their corresponding expansion factor [7] (P/n), we obtain a distribution of the estimated population by nationality and period of arrival whose totals do not coincide with those of the frame. There will clearly be a bias: British nationals and immigrants who arrived more than five years earlier will be over-estimated, and recently arrived Moroccans under-estimated.

To avoid this bias, suppose we have external information on the distribution of the frame population by nationality. We then correct for non-response by re-weighting with a correction factor based only on nationality, not on year of arrival: the ratio between the weight of a given nationality in the frame and the weight of the same group in the final sample. This post-stratification makes the flow estimates less accurate if the estimate of recent immigrants obtained with the expansion factor alone is larger than the estimate obtained with both the expansion factor and the correction factor, as expressed in (1).

Simplifying (1), we obtain inequality (2), which states the condition under which the re-weighting procedure worsens the estimates.

After some brief algebra [8], our results revealed that, in the Spanish case, the probability of generating worse estimates with post-stratification is very high. This result is related, first, to the distribution of the ENI sample into large groups by nationality (1 = rich countries; 2 = poor countries) and by period of arrival (1 = between 2002 and 2006; 2 = before 2002); and second, to the fact that when ENI interviewers use replacement households during data collection, the probability that non-response is covered by immigrants of the same origin is very high, since nationalities tend to concentrate spatially, especially those from poor countries. If this is the case, a non-response of just 5.77% among recently arrived immigrants from poor countries, covered by oversampling immigrants of the same nationality who have been resident for longer, is enough for post-stratification to underestimate recent immigrants.
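The nationality-based correction factor described in this section can be traced numerically. In the sketch below, all counts are hypothetical (n = 1,000, P = 100,000, and 200 non-responding recent Moroccans replaced by British immigrants who arrived earlier, i.e. f12 = 200), and the function name is ours. It compares the two estimates of recent arrivals directly rather than reproducing the article's inequalities (1) and (2).

```python
from fractions import Fraction


def recent_arrival_estimates(achieved, frame_totals, population):
    """Estimate recent arrivals with and without post-stratification by nationality.

    achieved: {nationality: (recent, earlier)} counts in the achieved sample,
              after non-respondents have been replaced.
    frame_totals: {nationality: n_i}, the counts a representative sample would have.
    population: P, the size of the frame population.
    """
    n = sum(recent + earlier for recent, earlier in achieved.values())
    expansion = Fraction(population, n)  # P / n
    unadjusted = expansion * sum(recent for recent, _ in achieved.values())
    post_stratified = Fraction(0)
    for nationality, (recent, earlier) in achieved.items():
        # (weight in frame) / (weight in achieved sample); the common n cancels
        correction = Fraction(frame_totals[nationality], recent + earlier)
        post_stratified += expansion * correction * recent
    return unadjusted, post_stratified


frame = {"British": 400, "Moroccan": 600}  # n1, n2 (hypothetical Table 2 totals)
achieved = {"British": (100, 300 + 200), "Moroccan": (240 - 200, 360)}
true_recent = Fraction(100_000, 1000) * (100 + 240)  # n11 + n21 expanded
unadj, post = recent_arrival_estimates(achieved, frame, 100_000)
print(float(true_recent), float(unadj), float(post))
```

Here the true number of recent arrivals is 34,000, the unadjusted estimate 14,000 and the post-stratified estimate about 12,667: re-weighting shifts weight towards the Moroccan subsample, whose achieved share of recent arrivals (10%) is now lower than the British one (about 17%), so the correction deepens the underestimate.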

III – Conclusions and main recommendations

Foreign immigration has become the principal demographic phenomenon in Spain in recent years, and it is important to understand its basic characteristics. The ENI-2007, a survey specifically targeting immigrants, was developed by INE for this purpose. With the publication of its data, and the introduction of a methodological change in the Spanish LFS to better capture foreign immigrants, a new comparison of the migration information generated by the Spanish statistical system became necessary, since the statistical information previously available did not provide an acceptably consistent picture of international mobility.

Comparing the LFS and ENI estimates with the basic figures of the MPR reveals, on the whole, an acceptable similarity for several of the variables analysed: there are no large disparities in the distributions by age, sex, country of birth or place of residence. However, we found significant discrepancies in the distribution by year of arrival. Specifically, from 2001 onwards, the annual entry flows estimated by the surveys begin to fall, while those recorded in the EVR rise until 2007.

The lack of synchronization with the economic cycle observed in both surveys is implausible. It can be explained by biases, arising for several reasons, in the surveys’ estimates of recently arrived immigrants. First, recently arrived immigrants are inevitably less likely to be included in the sampling frame and to be surveyed. Furthermore, the fact that collective households are not surveyed could generate a sampling frame coverage bias concentrated in the group of recently arrived immigrants.

Moreover, it is highly probable that the ENI and the LFS have a non-response problem that leads to an underestimate of the most recently arrived immigrants. Non-response would not be a problem if non-responding households were replaced by other immigrant dwellings with similar characteristics, which has possibly been the case for every variable except “year of arrival”. Although no data are available to demonstrate a correlation between non-response and recent arrival in Spain, we can assume that non-response is associated with many situations typical of recently arrived immigrants. When a household is replaced by another that does respond, the members of the replacement household are much more likely to be stable and settled, and therefore to have lived in Spain for longer.

Finally, we have shown that post-stratification based on variables other than year of arrival may have increased the underestimation of recent immigrants.

In the case of the LFS, the bias derived from the sampling design must be added to the aforementioned biases. The rotation scheme may reduce the probability of finding recently arrived immigrants in the sample, because the sample is divided into waves: recently arrived immigrants can only be found in dwellings that have just entered the sample, not in those that have already been interviewed several times. Given this limitation, underestimation of recently arrived immigrants is inevitable. The existence of biases in the Spanish LFS becomes clear when we observe that its estimates of entry flows for a given year increase as the survey reference year advances.

The above arguments explain why neither of the two surveys adequately captures the recent pace of entry flows and why, therefore, neither can reliably reflect the situation of all immigrants residing in the country at a given moment. The Spanish case shows that obtaining demographic and social information on the immigrant population through sampling techniques requires a specific sample design which, to guarantee representativeness, must use not only the standard demographic variables but also the variable “year of arrival”. If this variable is not taken into account in the initial design or in the post-stratification, there is a high risk that its estimate will be biased, and consequently so will the estimates of all the variables correlated with it. The resulting picture of the migrant population will then not correspond to reality. [9] Given that Eurostat is promoting the use of immigration surveys similar to the ENI in other EU countries, [10] it is essential to take these recommendations into account.

Acknowledgements

We are grateful to José Peris for his helpful comments.

Appendix 1 - National Immigrant Survey (ENI-2007)

Lead organization: National Statistics Institute (Instituto Nacional de Estadística, INE)
Collaborating organizations: Department of Employment; Complutense University of Madrid
Objectives:
* To provide information on the socio-demographic characteristics, living conditions and socioeconomic situation of the immigrant community.
* To contextualize important aspects of the migratory experience, in particular the influence of networks on the decisions and strategies adopted by the immigrants themselves.
* To generate information on certain strategies and aspects of the migratory experience, as well as on the functioning of immigrants’ family networks and the characteristics of the group they belong to.
* To analyse the itineraries followed by the immigrants and certain aspects of their migratory experience.
* To generate information on the ties that immigrants maintain with their countries of origin and with one another in Spain, as well as on their documentation situation and their strategies for the medium-term future.
Date of reference: 1 January 2007
Target population: People born abroad and aged 16 years old and over who, at the time of the survey, had lived or intended to live in Spain for at least one year. Persons born outside Spain who had Spanish nationality from birth and who arrived in Spain before age two were excluded from the target population.
Sampling frame: The MPR available at the time of the survey, with the reference date of September 2006. People living in collective households were excluded; in this survey, collective households are dwellings with 15 or more occupants.
Sample design: Two independent samples were designed: a sample of dwellings in which at least one foreign national resides, and a sample of dwellings in which only Spanish people reside. Three-stage sampling, with stratification of primary sample units, was used in both cases. The primary sample units are census sections, the secondary units are permanent private households, and the third (last) sample unit is the person, selected with equal probability from among the foreign residents. The primary sample units are grouped into strata according to the size of the municipality to which the section belongs. Within each stratum, the sections are grouped into substrata according to nationality in the sample of dwellings with foreign nationals, and according to age group and sex in the sample of dwellings inhabited only by Spaniards. Sections were selected with probability proportional to size, measured by the number of eligible foreign nationals in the sample of dwellings with foreigners and by the number of dwellings in the sample of dwellings with only Spanish people. In each section, dwellings were selected with equal probability.
Mode of administration: Face-to-face interview in the respondent’s home
Computer assistance: Computer-assisted personal interview (CAPI)
Reporting unit: Person aged 16 or older randomly selected in the household
Time dimension: One cross-sectional survey
Levels of observation: Person and household
Web link: http://www.ine.es/jaxi/menu.do?type=pcaxispath=%2Ft20%2Fp319file= inebaseL=
Source: INE (2007 and 2008).
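The first two stages of the design described above (sections selected with probability proportional to size, then dwellings with equal probability) can be sketched as follows. INE's actual selection algorithm is not documented here: the sketch assumes systematic PPS sampling, one common way of selecting units with probability proportional to size, followed by simple random sampling of dwellings without replacement, and it omits the third stage (selection of one person). The function names and toy section sizes are hypothetical.

```python
import random


def pps_systematic(sizes, k, rng):
    """Select k units with probability proportional to size (systematic PPS).

    A unit larger than the sampling step can be selected more than once.
    """
    total = sum(sizes)
    step = total / k
    start = rng.random() * step  # random start in [0, step)
    points = [start + i * step for i in range(k)]
    chosen, cum, j = [], 0.0, 0
    for idx, size in enumerate(sizes):
        cum += size
        while j < k and points[j] < cum:  # selection points falling in this unit
            chosen.append(idx)
            j += 1
    return chosen


def two_stage_sample(sections, k, m, seed=None):
    """sections: list of (size_measure, n_dwellings) per census section.

    Stage 1: k sections by systematic PPS on size_measure (e.g. eligible foreign nationals).
    Stage 2: up to m dwellings per selected section, each with equal probability.
    """
    rng = random.Random(seed)
    selected = pps_systematic([size for size, _ in sections], k, rng)
    sample = []
    for idx in selected:
        n_dwellings = sections[idx][1]
        for dwelling in rng.sample(range(n_dwellings), min(m, n_dwellings)):
            sample.append((idx, dwelling))
    return sample


# Hypothetical sections: (eligible foreign nationals, dwellings)
sections = [(10, 40), (0, 25), (30, 80), (60, 120)]
print(two_stage_sample(sections, k=2, m=3, seed=42))
```

Systematic PPS is used here only because it is compact; a section with no eligible foreign nationals can never be drawn, and a section whose size exceeds the sampling step is always drawn.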

Appendix 2 - Statistical Sources: Adjustments made to compare the data

Sample surveys

ENI-2007
Coverage: People born abroad and aged 16 years old and over who, at the time of the survey, had lived or intended to live in Spain for at least one year. Persons born outside Spain who had Spanish nationality from birth and who arrived in Spain before age two were excluded from the target population. People living in collective households are not sampled; in this survey, collective households are dwellings with 15 or more occupants.
Definition of international migrant: Stock: foreign-born population. Flow: foreign-born population.

LFS
Coverage: Population residing in private households, including servants. People living in collective households and persons who are temporarily absent are sampled via relatives living in private households. Foreign nationals are included in the resident population if they have lived or intend to live in Spain for more than one year.
Definition of international migrant: Stock: non-national population, persons born abroad. Flow: non-national immigrants.
Adjustments to make the data comparable under the ENI-2007 immigrant definition: Stock: in the 4Q-2006, population born abroad aged 15 and over. Flow: in the 4Q-2006, non-national immigrants aged 16 and over.

Registers

MPR
Coverage: Everybody who resides in Spain is obliged to register in the municipality where they habitually reside (more than six months per year).
Definition of international migrant: Stock: non-national population, persons born abroad.
Adjustments to make the data comparable under the ENI-2007 immigrant definition: Stock: on 1 January 2007, population born abroad aged 15 and over.

EVR
Coverage: All internal annual residential variations and movements from or to foreign countries.
Definition of international migrant: Flow: non-national population, persons born abroad.
Adjustments to make the data comparable under the ENI-2007 immigrant definition: Flow: annual registrations of newly arrived immigrants born abroad aged 16 or more on 1 January 2007. Annual deaths and exits of foreigners are subtracted from the annual entries.
Source: INE.

Appendix 3 - Socio-demographic characteristics of the stock of immigrants

Figure A3.1

Sex and age pyramid of population born abroad, ENI-2007 and MPR
Sources: Figures A3.1 to A3.5: INE (ENI-2007, LFS and MPR) and authors’ own calculations.

Figure A3.2

Sex and age pyramid of population born abroad, LFS 4Q-2006 and MPR

Figure A3.3

Population born abroad by region of residence, ENI-2007, LFS 4Q-2006 and MPR

Figure A3.4

Population born abroad by place of birth, LFS 4Q-2006 and MPR

Figure A3.5

Population born abroad by country of birth, ENI-2007 and MPR

Appendix 4 - Coverage, response rate, rotation scheme and weighting in the EU-LFS by country, 2008

Country | Population in institutional households included in sample | Response rate in 2008 (%) | Rotation scheme | Percentage of sample replaced each year | Post-stratification variables
Austria | No | 94.7 | 5- | 80 | Sex, age group, region, nationality
Belgium | No | 74.2 | 1- | 100 | Sex, age, region
Bulgaria | Via the household (students/conscripts) | 80.7 | 2-(2)-2 | 50 | Administrative districts, sex, 4 age groups, urban/rural
Croatia | No | 81.4 | 2-(2)-2 | 50 | NUTS 3
Cyprus | Conscripts only via the household | 95.2 | 6- | 66.66 | Districts, age group, sex
Czech Republic | No | 79.0 | 5- | 80 | District, age group, sex
Denmark | Sampled | 54.5 | 2-(2)-2 | 50 | Sex, age, income, sector of activity, vocational educ., reg. unemployment
Estonia | Sampled | 66.0 | 2-(2)-2 | 50 | Age group, sex, region, urban/rural, nationals/non nationals
Finland | Sampled | 81.0 | 3-(1)-2 | 80 | Sex, age group, region, registered unemployment
Former Yugoslav Republic of Macedonia | No | 89.7 | 2-(2)-2 | 50 | Sex, age group, NUTS 3 region, number of households at regional level and number of households by size
France | Via the household | 84.3 | 6- | 66.66 | Size of urban entity, size and type of housing, number of new dwellings, age, sex, regions
Germany | Yes | 97.1 | 4-(annual) | 25 | Sex, 3 age groups, region, nationality
Greece | No | 88.2 | 6- | 66.66 | Strata, sex, age groups (10 year)
Hungary | No | 80.2 | 6- | 66.66 | Region, age*sex, N of households, Population in cities (> or < 50,000)
Iceland | Sampled | 80.7 | 3-(2)-2 | 60 | Sex, age group
Ireland | No | 83.2 | 5- | 80 | Sex, age, region
Italy | No | 88.2 | 2-(2)-2 | 50 | Sex, age, region, nationality, number of households
Latvia | No | 67.7 | 2-(2)-2 | 50 | Age group, sex, region, urban/rural
Lithuania | No | 77.7 | 2-(2)-2 | 50 | Age group, sex, urban/rural, regions
Luxembourg | No | 32.0 | 2- | 100 | Sex, age group, nationality
Malta | No | 83.2 | 2-(2)-2 | 50 | District, age group, sex
Norway | Sampled | 87.0 | 8- | 50 | Region, age, sex, employment status
Poland | No | 74.3 | 2-(2)-2 | 50 | Age group, sex, urban/rural, size of the locality of residence
Portugal | Via the household | 87.1 | 6- | 66.66 | Sex, age, region
Romania | Via the household | 94.3 | 2-(2)-2 | 50 | County, age group, sex, urban/rural
Slovakia | Via the household | 93.4 | 5- | 80 | Region, age group, sex
Slovenia | No | 80.4 | 3-(1)-2 | 80 | Region, age group, sex
Spain | Via the household | 83.0 | 6- | 66.66 | Sex, age, region, nationality
Sweden | Sampled | 80.8 | 8- | 50 | Sex, age group, sector of activity, registered unemployment
Switzerland | No | 75.1 | 5-(annual) | 20 | Region, marital status, age-groups, sex, nationality-groups
The Netherlands | No | 81.4 | 5- | 80 | Sex, age group, region, ethnic background, marital status
Turkey | No | 87.0 | 2-(2)-2 | 50 | Region, urban/rural, sex, age group
United Kingdom | Population in hospitals: sampled; Students: via the household | 68.0 | 5- | 80 | Sex, age group, region
Source: Eurostat (2010a).

Appendix 5 - Mathematical conditions for which the inequality (2) is met

Whether inequality (2) is met depends on the distributions of the elements nij and fij, as shown in Table A.5.1.

Table A.5.1

Summary of conditions to meet the inequality (2). Extreme cases

Structure of the sample (n) | Replacements F = f11 | Replacements F = f12 | Replacements F = f22
n21/n2 ≤ n11/n1 | any f11 > 0 | with n12 - n21 ≥ 0: any f12 > 0; with n12 - n21 < 0: f12 < z1 or f12 > z2 (see below) | any f22 > 0
n21/n2 > n11/n1 | f11 satisfying (3) | f12 < z1 or f12 > z2 (see below) | f22 satisfying (5)

With regard to n, with two nationalities and two periods of arrival, there are three possibilities: the ratio n21/n2 may be equal to, higher than or lower than the ratio n11/n1. F, however, may be distributed in many ways; to simplify, [11] we assume that only one fij is positive.

Some brief algebra yields the values of fij that meet inequality (2), that is, the mathematical conditions under which inequality (2) is met:

* If F = f11 > 0, expression (2) can be written as inequality (3).

When n21/n2 ≤ n11/n1 (that is, n21 n1 - n11 n2 ≤ 0), the numerator of (3) is never negative. Given that n22 + n12 > 0, any value of f11 > 0 then meets inequality (2). Only when n21/n2 > n11/n1 must f11 satisfy (3) in order to meet (2).
* If F = f12 > 0, expression (2) can be written as inequality (4).
Graphically, expression (4) describes a U-shaped parabola in f12, which may or may not have x-intercepts depending on the distribution of the elements in the sample by nationality and year of arrival.
When n21/n2 ≤ n11/n1 (that is, n21 n1 - n11 n2 ≤ 0) and n12 - n21 ≥ 0, any value of f12 satisfies (4), and therefore inequality (2) is met (graphically, the parabola has no x-intercept at a positive value of f12).
If n12 - n21 < 0 or n21 n1 - n11 n2 > 0, the parabola may have x-intercepts, z1 and z2, with z1 ≤ z2.
Inequality (4) is then met when f12 lies outside the interval [z1, z2], that is, when f12 < z1 or f12 > z2.

* If F = f22 > 0, expression (2) can be written as inequality (5).

When n21/n2 ≤ n11/n1 (that is, n21 n1 - n11 n2 ≤ 0), the numerator of (5) is never positive and, given that n1 > 0, any value of f22 > 0 satisfies inequality (2). Only when n21/n2 > n11/n1 must f22 satisfy (5) in order to meet (2).
Since the ENI sample theoretically has the same distribution as the MPR by large groups of nationality (1 = rich countries; 2 = poor countries) and by period of arrival (1 = between 2002 and 2006; 2 = before 2002), the Spanish case falls in the last row of Table A.5.1. For this distribution, inequality (2) is met if non-response among recent immigrants from poor countries (n21) is above 1.95% of n21 when F = f11, above 5.77% when F = f22, or above 68.54% when F = f12.
This means that the probability of obtaining less accurate estimates with post-stratification is very high if the oversampling occurs in categories n11 or n22. Conversely, post-stratification is a problem only if non-response exceeds 68.54% of n21 and is replaced by n12 units. In our opinion, when interviewers have to use replacement households during data collection, the probability that non-response will be covered by immigrants of the same origin is very high, since nationalities tend to concentrate spatially, especially those from poor countries. In this case, non-response in n21 will be replaced mainly by n22 individuals and, consequently, an f22 equal to 5.77% of n21 will be enough for post-stratification to underestimate recent immigrants.
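The three cases can be checked numerically. Inequality (2) and conditions (3) to (5) are not reproduced in this text, so the sketch below assumes that (2) states that the share of recent arrivals among nationality 1 in the achieved sample exceeds the corresponding share among nationality 2. We infer this form because it yields exactly the quantities named in the conditions above (n21 n1 - n11 n2, n12 + n22, n12 - n21 and n1); it is our reading, not the authors' stated formula. Function names and sample counts are hypothetical.

```python
import math
from fractions import Fraction


def inequality_2(n11, n12, n21, n22, f11=0, f12=0, f22=0):
    """Assumed form of (2): the share of recent arrivals among nationality 1 in the
    achieved sample exceeds the share among nationality 2 (our inference)."""
    n1, n2 = n11 + n12, n21 + n22
    F = f11 + f12 + f22  # non-respondents among n21, all replaced
    share_1 = Fraction(n11 + f11, n1 + f11 + f12)
    share_2 = Fraction(n21 - F, n2 - F + f22)
    return share_1 > share_2


def _d(n11, n12, n21, n22):
    """n21*n1 - n11*n2: positive when nationality 2 has the larger recent share."""
    return n21 * (n11 + n12) - n11 * (n21 + n22)


def threshold_f11(n11, n12, n21, n22):
    """(3) as read here: with F = f11 > 0, (2) holds iff f11 exceeds this value."""
    return Fraction(max(_d(n11, n12, n21, n22), 0), n12 + n22)


def threshold_f22(n11, n12, n21, n22):
    """(5) as read here: with F = f22 > 0, (2) holds iff f22 exceeds this value."""
    return Fraction(max(_d(n11, n12, n21, n22), 0), n11 + n12)


def condition_f12(n11, n12, n21, n22, f12):
    """(4) as read here: f12**2 + (n12 - n21)*f12 + (n11*n2 - n21*n1) > 0."""
    return f12 * f12 + (n12 - n21) * f12 - _d(n11, n12, n21, n22) > 0


def f12_roots(n11, n12, n21, n22):
    """x-intercepts z1 <= z2 of the parabola in (4), or None if there are none."""
    b, c = n12 - n21, -_d(n11, n12, n21, n22)
    disc = b * b - 4 * c
    if disc < 0:
        return None
    r = math.sqrt(disc)
    return (-b - r) / 2, (-b + r) / 2


# Hypothetical sample in which nationality 2 has the larger share of recent arrivals
example_counts = dict(n11=100, n12=300, n21=240, n22=360)
print("F = f11:", float(threshold_f11(**example_counts)))
print("F = f22:", float(threshold_f22(**example_counts)))
print("F = f12 roots:", f12_roots(**example_counts))
```

For these made-up counts the thresholds are about 54.5 (22.7% of n21) for F = f11 and 90 (37.5%) for F = f22, while with F = f12 the positive intercept is about 162.1 (67.5%). The Spanish percentages quoted above depend on the MPR distribution, which is not reproduced here, so these numbers are purely illustrative.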

References

  • Eurostat, 2010a, Labour Force Survey in the EU, Candidate and EFTA Countries – Main Characteristics of the National Surveys, 2008, Luxembourg.
  • Eurostat, 2010b, “Migration statistics mainstreaming”, Joint UNECE/Eurostat Work Session on Migration Statistics, Geneva, Switzerland, 14-16 April.
  • Groves R. M., Fowler F. J., Couper M. P., Lepkowski J. M., Singer E., 2009, Survey Methodology, 2nd ed., New Jersey, John Wiley and Sons Inc.
  • INE, 2007, Encuesta Nacional de Inmigrantes 2007, Metodología, http://www.ine.es/daco/daco42/inmigrantes/inmigra_meto.pdf (accessed 10 September 2011).
  • INE, 2008, Informe Encuesta Nacional de Inmigrantes 2007, Documentos de trabajo 2/08.
  • INE, 2009a, Estimación de la Población Actual. Metodología detallada, http://www.ine.es/daco/daco43/epoba/metodo.pdf (accessed 10 September 2011).
  • INE, 2009b, Encuesta de Población Activa. Diseño de la Encuesta y Evaluación de la Calidad de los Datos. Informe Técnico, http://www.ine.es/docutrab/epa05_disenc/epa05_disenc.pdf (accessed 10 September 2011).
  • Levy P. S., Lemeshow S., 2008, Sampling of Populations, 4th ed., New Jersey, John Wiley and Sons Inc.
  • Martí M., Ródenas C., 2004, “Migrantes y migraciones: de nuevo la divergencia en las fuentes estadísticas”, Estadística Española, 46(156), pp. 293-321.
  • Martí M., Ródenas C., 2007, “Migration estimation based on the Labour Force Survey: An EU-15 perspective”, International Migration Review, 41(1), pp. 101-126.
  • OECD, 2009, Sources and Comparability of Migration Statistics, http://www.oecd.org/dataoecd/59/38/43180015.pdf (accessed 10 September 2011).
  • Radermacher W., Thorogood D., 2009, “Meeting the growing needs for better statistics on migrants”, 95th DGINS Conference Migration – Statistical Mainstreaming, Malta, 1 October 2009.
  • Ródenas C., Martí M., 1997, “¿Son bajos los flujos migratorios en España?”, Revista de Economía Aplicada, 15, pp. 155-171.
  • Ródenas C., Martí M., 2009, “Estimating false migrations in Spain”, Population, English Edition, 64(2), pp. 397-412.
  • Steinbuka I., 2009, “How to improve social surveys to provide better statistics on migrants”, 95th DGINS Conference Migration – Statistical Mainstreaming, Malta, 1 October 2009.

Publisher keywords: estimation bias, immigrant survey, labour force survey, migration flows, year of arrival


This article is available in open access under our model Subscribe To Open.

Uploaded: 02/05/2013
