Réseaux 2013/1 No 177


Journal article

Inside the Mind of PageRank

A study of Google’s algorithm

Pages 63 to 95

Notes

  • [1]
    This work was carried out under the ANR project “Algorithm Policy” (ALGOPOL – ANR 2012 CORD 01804).
  • [2]
    See the article by D. Pontille and D. Torny in this issue.
  • [3]
    Video of a conference by Sergey Brin at Berkeley University’s School of Information on 3 October 2005: UCBerkeley, “SIMS 141 – Search, Google, and Life: Sergey Brin – Google”, YouTube, 20 August 2007.
  • [4]
    For a more comprehensive discussion on this topic see Cardon (2013).
  • [5]
    Particularly the article by Farrell and Drezner (2008), whose findings were challenged by Hindman (2008).
  • [6]
    Goodwin (2003) supports a “disjointed deliberation” model, whereby a plurality of conversation circles can circulate the expectations of their debates between one another owing to certain members’ co-belonging to different groups.
  • [7]
    Google Inc., Letter from the Founders: “An Owner’s Manual” for Google’s Shareholders, in Forms S-1 Registration Statement Under the Securities Act of 1933.
  • [8]
    Somewhat surprisingly for a computer science paper, the founders of Google slipped a citation of Ben H. Bagdikian’s book The Media Monopoly into the bibliography of the WWW conference’s most famous paper (Brin & Page, 1998). This book forcefully denounced journalistic biases and the effects of economic concentration on the pluralism of the press.
  • [9]
    Google Inc., “Why We Sell Advertising, Not Results”, Google.com, 2004 [http://www.google.com/honestresults.html].
  • [10]
    Singhal (A.), “Introduction to Google ranking”, Official Google Blog, 9 July 2008.
  • [11]
    Webmaster Tools, “Webmaster Guidelines”, http://www.google.com/support/webmasters/bin/answer.py?answer=35769
  • [12]
    Google Webmaster Central, “More guidance on building high-quality sites”: http://googlewebmastercentral.blogspot.gr/2011/05/more-guidance-on-building-high-quality.html
  • [13]
    All these citation excerpts are taken from Google Webmaster Central, “More guidance on building high-quality sites”: http://googlewebmastercentral.blogspot.gr/2011/05/more-guidance-on-building-high-quality.html
  • [14]
    However, the superposition of the authority and audience rankings only applies to the head of the results lists. The two diverge as soon as we leave the top of the rankings or the keywords become more complex and less central (Pennock et al., 2002).
  • [15]
    See Citton (2013) on the distinction between apparatuses and machines.
  • [16]
    On the claim of learning algorithms to governing reality by embracing its every dimension, see the article by T. Berns and A. Rouvroy in this issue.

1What is Google’s dream [1]? How does the company which best symbolises the most advanced form of cognitive capitalism conceive of the web and what does it want it to look like? How much importance should we give to the words of bold young entrepreneurs pitching to the financial markets, claiming to want “to make the world a better place”? In this article, I argue that in order to answer this question we must venture deep into the computational architecture of PageRank, the algorithm that is making Google’s fortune and can be seen as its crown jewel. PageRank is a moral machine: it embraces a value system, based on giving prominence to those deemed worthy thereof by others, and implements the desire to make the web a space where the exchange of merit is neither impeded nor deformed. However the spirit animating PageRank is so deeply entangled with so many other considerations, interests and points of concern that its echo has become virtually inaudible. The debates surrounding the company Google have stifled PageRank. This study seeks to understand what Google has done to the web and with the web, by reviving the principles underpinning its algorithm and governing many of the Mountain View firm’s behaviours, decisions and strategic choices. The way Google has imposed an order of its own on the web can be grasped by exploring the algorithm’s procedures, its mode of functioning, its decisions, its taboos, and the whole apparatus it has built to codify Internet users’ behaviour. This also sheds light on how, under the effect of recent transformations of the web, other data-sorting principles have weakened the spirit of PageRank.

The Invention of PageRank

2Before Google, the web was a vast lottery. Answers to Internet users’ queries were haphazard, often fanciful, massively rigged and occasionally relevant. The first search engines operated with keywords and measured the density of the search term’s presence on the different webpages. In 1990, pioneers Archie and Veronica only indexed the title of the document, before Brian Pinkerton’s WebCrawler began to take the page’s full text into account in 1994. The innovative search engine AltaVista, designed by Louis Monier for DEC, was the first to endeavour to index the entire web. When it opened to the public in December 1995, it was less clumsy than the others at picking from an index of 16 million documents (Battelle, 2005: 40). Until the young researcher from Cornell University, Jon Kleinberg, proposed it to IBM in 1996, no one had really thought of paying attention to the structure of hypertext links rather than to the semantic analysis of pages. Yet in 1998, many were searching for an efficient solution to improve search engines, for their quality was constantly deteriorating as the number of pages grew, which made it very easy to deceive lexical algorithms. Webmasters simply had to repeat the most searched-for terms in white text on a white background to improve their site’s visibility. In order to overcome the deadlock of lexical search, a revolution in the algorithm’s design was needed. And it was finally put on the market by two Stanford students, Sergey Brin and Larry Page (1998). But this breakthrough was guided by a deep-rooted intuition originating in the spirit that had fostered the design of the network of networks: one which sought to take advantage of the relational structure of pages, which are held together by hypertext links, to extract an order which builds on the meaning of interactions between the Internet users who publish on the web.

Sociometry and Scientometrics

3The history of Google’s creation, which has provided all the codes for the manufacturing of the Silicon Valley myth, no longer needs to be written (Levy, 2011). Hence my focus here is on the way the design of PageRank has been lastingly associated with a particular representation of the Internet, one which has a structuring effect on the ecosystem now formed by the web and its dominant search engine. The founding intuition that gave birth to Google treats the incoming link (rather than the outgoing link, as Lycos did back then) as the support for all operations to classify Internet sites. But the underlying principle is not new, and is rooted in two different traditions: sociometry, which was to bring psychology and graph mathematics together around the properties of the network form, and scientometrics, which was to apply the knowledge of library science to the assessment of scientific activity. Although these two traditions had not come into contact much, they converged on at least one point that proved decisive for PageRank: defining the metrics to use to describe the relational forms of the social. Whether based on influence in sociometry or on the citation in scientometrics, a shift occurred, as the analysis moved away from fixed and self-sufficient objects, be they social actors or documents, towards the relations between them.

4The idea of using the citation link to define the ranking of information goes back to Moreno’s sociometric revolution. In the 1930s, Moreno had sought to describe the structure of society based on the links between individuals, rather than the categories used to identify and differentiate people (Mayer, 2009). He asked individuals to point out the people around them whom they liked the most (people they admired, with whom they had the most contact, etc.) and the ones they liked (admired, etc.) the least. In so doing, Moreno’s group psychology introduced both the idea of representing individuals’ social network in the form of a sociogram, and that of organising it according to a principle of attraction and repulsion attesting to the reciprocal influence individuals exert on one another. As Bernhard Rieder (2012) pointed out in his study of PageRank’s computational genealogy, while Moreno’s sociology did not involve mathematisation, others, particularly Elaine Forsyth and Leo Katz (1946), used it to develop a set of matrices and calculations that were to contribute to the birth of a social mathematics within the burgeoning field of graph theory.

5But PageRank is primarily embedded in another research tradition that was destined for great success: scientometrics, which took shape with the Science Citation Index (SCI), founded in 1964 by Eugene Garfield at the Institute for Scientific Information. Garfield’s project was to facilitate circulation within the scientific literature by encouraging movement from citation to citation, between scientific articles. Garfield, a freelance consultant in the field of documentation, launched several initiatives with the aim of producing a scientific citation index. The idea at the time was not to measure researchers’ reputation, but to “mak[e] it possible for the conscientious scholar to be aware of criticisms of earlier papers” (Garfield, 1955, cited in Wouters, 2006: 14). After protracted efforts, he managed to convince the National Science Foundation and the National Institutes of Health to support the implementation of an instrument both to centralise scientific production in a publication citation database, and to objectify it through a series of measures. For the first edition of the Science Citation Index, in 1964, 1.4 million citations from the articles of 613 journals published in 1961 were collected manually, at great cost. The results were recorded on magnetic tape, constituting one of the first large databases in the history of computer science.

Externality, Abstraction, Proceduralism, Neutrality, Honesty

6Let us look at what the Science Citation Index’s ambition to represent science through its web of citations entailed, before it became an assessment metric for scientific bureaucracy. While maintaining a referential link with the world it recorded, this representation also invented a very particular cognitive framework, in which five epistemic properties characterising PageRank can be identified. The first assumption is the claim to a position of externality. The SCI positions the science objectification instrument outside of science, so as to measure its quality without drawing normative support from within the scientific field, which any peer assessment entails. This externality also endows it with a comprehensive overview of scientific activity, which researchers trapped in their disciplines cannot access. The second assumption is contingent upon this comprehensive view: that of abstracting the citation from the context in which it was issued. The SCI’s main operation consists in transforming a list of references in an article, which are simple and immediately accessible data, into a list of citations that articles receive from other publications (information that is not visible from the articles themselves and can only be calculated with access to all the citing texts). The SCI therefore proceeds to a cognitive operation which consists in simply turning the reference (a mention of Article B in Article A) around into a citation (the fact of Article B having been cited by A).
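To make this inversion concrete, the sketch below (with purely hypothetical article identifiers, not data from the SCI) shows how a table of references per article is turned into a citation index, the operation the SCI performed manually on a massive scale:

```python
# Minimal sketch of the SCI's core operation: inverting references into citations.
# Article identifiers are purely illustrative.
from collections import defaultdict

# Each article's reference list: what the citing texts make immediately visible.
references = {
    "article_A": ["article_B", "article_C"],
    "article_D": ["article_B"],
}

# Inversion: for each cited article, the list of articles citing it.
citations = defaultdict(list)
for citing, refs in references.items():
    for cited in refs:
        citations[cited].append(citing)

print(dict(citations))
# {'article_B': ['article_A', 'article_D'], 'article_C': ['article_A']}
print({doc: len(citers) for doc, citers in citations.items()})  # raw citation counts
```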

7This tiny operation, which requires phenomenal data collection work in the analogue world, was to be transformed by the hypertext link, which made it possible to activate the relationship between the citing text and the cited text. Whereas the reference is endowed with multiple meanings by the one who places it in his or her text (honouring, criticising, marking precedence, specifying, parading, etc.), the citation erases the diversity of contextual meanings of the reference that created it, turning it into a univocal abstraction. As Paul Wouters pointed out, while not all references are equal, the inversion of references into citations, by abstracting away their emission context, calls for considering all citations as equal. This transformation therefore contributes to unifying the meaning of the citation to make it a sort of “currency of scientific activity” that is standardised, decontextualised, univocal and equal (Wouters, 1999: 108-109). Whereas the reference refers to its emission context, the citation only refers to itself. Having become a simple sign, the citation’s value becomes self-referential and lends itself to computation.

8The third assumption underpinning this system to objectify science through its citations is that it is resolutely procedural. The SCI measures not the substantial content of the scientific assessments researchers exchange in their articles, but the sole self-referential citation with its indexable properties (author name, host institution, title, type of publication, etc.). “Whereas the scientific literature depicts science with a focus on its cognitive claims (the content of the articles and books published)”, wrote Paul Wouters (1999: 7), “the SCI represents the scientific literature by obliterating content in favour of its formal properties”. The SCI deliberately reduces the substantial variety of scientific discourse to make it an exploration tool likely to circulate in all disciplinary communities without having to worry about their idiosyncrasies. This formalism must totally disregard scientific arguments, from which it draws no legitimacy. There is no need to know the multiplicity of meanings that researchers put into the act of citing, for the overall calculation of the number of citations – which benefits from the effects of the statistics of large numbers – to constitute a good approximation of whatever they are being used to measure. “The brilliant utility of the citation index approach”, wrote Joshua Lederberg, a geneticist who actively supported Eugene Garfield in his enterprise, “is that it cuts across the problem of meaning by an automated procedure” (cited in Wouters, 1999: 20).

9Fourth, the representation of science through a web of citations is thought of as a product not of those who index but of those who cite one another: the researchers who publish. With all its simplifications, the citation index embodies the transparent ideal of non-interventionist objectivity. It is the cumulative outcome of the actions of the researchers mutually citing one another, without their having acted, in principle, according to this representation, which is external to their activity. The SCI thus imports the naturalist assumption of “mechanical objectivity” (Daston, Galison, 2012) into the world of knowledge of science. However, because it observes a social world which is reflexive, unlike the natural world, this cognitive technology’s assumption of transparency cannot escape the risk that those it records will act according to those measuring them. Hence the assumption of neutrality implies a complementary claim to invisibility, so as not to disturb the outside world whose doings it captures. As we shall see, being forgotten is PageRank’s dream, but a dream that is fulfilled less and less as time goes by.

10Lastly, the SCI relies on a basic assumption which supports all the others: cautious trust in the citation’s honesty. The proceduralism of the new technology to measure science requires an underpinning substantial justification that confers the values of academic ethos onto the scientific citation. Counting citations without trying to understand them is only possible with the following two assumptions: that despite the variety of ways in which they use these citations, researchers consider them overall as one of the most evident signs of their belonging to the community and of respect for one of its most fundamental rules; and that, subject to the community’s vigilant critique, the citations are based on principles that are justifiable to the community. Under this condition, researchers can endow the act of citing with all possible meanings, without bibliometricians concerning themselves with them. Bibliometricians are content with cautiously trusting the understanding of scientific activity as competition for peer recognition, put forward by Robert Merton (1957). His reasoning closely intertwines moral and cognitive constraint. Because science is public and not private knowledge, researchers must put their work in the public domain and have it recognised as their own. But since scientific production is a web of interdependencies between different works, it is crucial to cite the source that each publication inherits, at the risk of the community pointing out an irregularity. As Merton stressed (1977), “Citations and references thus operate within a jointly cognitive and moral framework”. Citations thus provide scientometricians with the trace of a regular and objectifiable practice that sufficiently incorporates the norms of scientific activity to undergo a computational procedure.

The Link Is a Vote

11The research subject that Sergey Brin and Larry Page originally presented to their teacher, Terry Winograd, was the design of a system to load annotations onto websites (Levy, 2011: 16-17). However, the young Stanford students soon realised that the hypertext link amounted to a citation and that, in its own way, it could be considered as a vote. The lineage from the Science Citation Index to PageRank is explicit. The founders of Google, both sons of academics, relentlessly highlighted the fact that “A large number of citations in scientific literature […] means your work was important, because other people thought it was worth mentioning” (Vise & Malseed, 2006: 34-35). In Jon Kleinberg’s article “Authoritative Sources in a Hyperlinked Environment” (1998), which was to influence Larry Page in the design of PageRank, researchers’ reputation, as measured in bibliometrics, was very clearly presented as the main source of inspiration. The article insisted on the fact that, just like the scientific citation, the hypertext link is both an act of recognition and a sign of authority: “Hypertext links”, he wrote, “encode a considerable amount of latent human judgements and we claim that this is exactly the type of judgement required to formulate the notion of authority”. The hypertext link delimits the field of relevance of the citer’s text, recognises the value of the content cited and, when that content receives multiple and diverse approbations, asserts its importance on a meritocratic scale which honours those who have been identified by their peers. It constitutes exactly the kind of trace which, turned into a metric, can rank informational objects according to the prevailing rationality in the world of research, by drawing attention to the content with some sort of prominence. This prominence, conventionally measured by the number of citations which are themselves authorised, constitutes the best approximation of epistemic certainty. Jon Kleinberg (1998) specified that counting links alone is enough to capture any document’s authority: “the creation of a link on the www represents a concrete indication of a judgement of this kind: by including a link towards page q, the creator of page p has to some extent conferred authority to q. What is more, the links provide an opportunity to discover potential authority simply by way of the pages pointing towards it”. This intuition, inherited in every respect from the abstraction and proceduralism properties of the Science Citation Index, was decisively ground-breaking. It made the quality of the information found on sites not an internal property to search for inside the document, through an ever more detailed analysis of its lexical content, but an external property shaped by the respective attributions made by sites recognising one another. Quality is a social construct that interactions project onto the documents. Larry Page made this crystal clear in the patent which, without going into detail, describes the functioning of PageRank: “Intuitively, a document should be important (regardless of its content) if it is highly cited by other documents” (Page, 1998). The hypertext link is simply an envelope, a “concretion of intelligence” (Pasquinelli, 2009: 155), which must not be opened so as to preserve its computability. Its markers are easily identifiable by the robots that crawl the web. There is no need to know why it was created, nor what amount of diverse and varied intentions, inferences, computations and appreciations have gone into its creation.
Just as in a ballot box, it simply needs to be counted. The founders of Google audaciously expanded on this understanding of scientific authority by extending the metaphor of the link as a citation to that of the link as a vote. In the section “Why Google”, the Mountain View company readily presented its algorithm as the source code of democracy:

12

“PageRank is a champion of democracy (…): any link pointing from page A to page B is considered as a vote by page A for page B. However, Google does not limit its assessment to the number of ‘votes’ (links) received by the page; it also proceeds to an analysis of the page which contains the link. The links found in pages deemed important by Google have more ‘weight’, and thereby contribute to ‘electing’ other pages”.
(cited in Cassin, 2007: 102-103)

The Weighting of Pages

13Although PageRank makes hyperlinks vote, its political regime is not a democracy in which each voter has the same weight, but a meritocracy that does not grant equal power to each vote. Whereas the algorithm developed by Eugene Garfield to measure journals’ reputation, the Journal Impact Factor (JIF), considered each citation as equivalent, PageRank re-appropriated a recursive mechanism to allocate different weights to citing pages. Its designers drew inspiration from the influence weight algorithm developed by Gabriel Pinski and Francis Narin (1976) [2]. Narin first suggested establishing a ratio between incoming and outgoing citations, to avoid certain journals receiving much prestige solely because they published many articles. The weight of a journal is measured here by the number of incoming citations divided by the number of outgoing citations. This ratio has the effect of making authority a circulating good, both received and distributed, which gives a positive balance to those who receive more than they distribute. Turned into a ratio, the authority index refers citations to themselves, making them a real currency. Francis Narin’s main idea was to consider that not all citations have the same weight, and that a recursive weighting needs to be applied to them to compute each citer’s authority within the network, according to the number of citations it has itself received from the others.
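As a reminder of how this recursive weighting is usually written (the formula itself is not quoted in the article), the published PageRank formulation (Brin & Page, 1998), in its commonly normalised form, makes each page pass on its own score divided by the number of links it emits:

```latex
% PageRank's recursive weighting (Brin & Page, 1998), normalised form:
% d is a damping factor (commonly around 0.85), N the number of pages,
% B_p the set of pages linking to p, and L(q) the number of outgoing links of q.
PR(p) = \frac{1 - d}{N} + d \sum_{q \in B_p} \frac{PR(q)}{L(q)}
```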

14The equality of the citations counted by the Journal Impact Factor makes sense within the small community covered by the ISI’s Science Citation Index database, which only includes academic journals. The citation is egalitarian and is counted democratically, albeit on a restricted franchise, with the electoral body limited to academics. It is wise to count votes equally when the authority filter has already been applied, by selecting the academic status of the citers, who are the only ones publishing in academic journals. Within the context of a restricted market, differentiating the authority of the votes would even seem contrary to the scientific community’s egalitarian principles. However, while the number of journals collected by the Science Citation Index may, at the time, have seemed very large, it is infinitely small in comparison with the gigantic volume of pages linked together on the web. Here the citation market is considerably expanded, as the barrier of scientific status is lifted. Authority is no longer measured at the point of entry, but inside the database. The web, an inclusive space, considers all Internet users who publish content as citers and does not require any qualifications from them. The weighting of citing pages’ authority appears to be an effect of the democratisation of citers. In a world open to everyone and anyone, voting equality would instil a principle of popularity rather than authority (Cardon, 2011). How, then, can the prominence of certain documents be recognised when those citing them are not peers? In 2004, when the web had become so vast and diverse that it was increasingly difficult to map the model of the scientific world onto it, Sergey Brin used the metaphor of social recommendation to describe PageRank. The authority conferred by researchers’ judgement on their respective works was extended to the trust granted to experts in daily life. The algorithm was then no longer entrusted with making a community of preselected equals vote, but with singling out those recognised as experts by others, so that their vote carries greater weight. Although not everyone is equally worthy of trust, everyone is capable of identifying those who are:

15

“If I’m looking for a doctor in the area”, Sergey Brin explained, “I might go around and ask my friends to recommend good doctors. They in turn may point me to other people who know more than they do – ‘This guy knows the whole field of Bay Area doctors’. I would then go to that person and ask him. The same thinking applies to websites. They refer to one another with links, a system that simulates referrals”.
(Sheff, 2004)

16Influence weight offers an algorithmic solution to resolve the tension between the democratisation of citers and the meritocratic features of the principle of authority. In the open world of the web, it is impossible to give everyone the same authority as in the confined world of science. PageRank provides a very elegant solution: opening to the diversity of citers must help identify the authority of the pages of the web, not of the Internet users who produced them. In 2005 Sergey Brin explained: “… [we] came up with the notion that all web pages aren’t created equal - you know, people are, but not web pages. Some web pages are inherently, not worse than others, but at least less important than others. And we developed this analysis of the graph of link structures of the web, that imputed an importance for every web page.” [3]. PageRank considers that publishing Internet users are equal but that their pages are not, and makes this distinction between the person and the page a way of preserving the principle of authority when the right to publish is open to all. To do so, it draws on an idealisation at the heart of the conception of the web shared by its pioneers. The hypertext link, the basic structure of a seamless network of documents, constitutes the most accomplished realisation of a utopia in which texts could relate to one another, by escaping the authority of their producer. As early as 1945, this dream animated Vannevar Bush’s visionary text As We May Think, which deeply influenced the pioneers of Internet. It then fuelled Ted Nelson’s Xanadu project (1965), Bill Atkinson’s HyperCard system (1986), and Tim Berners-Lee’s founding invention of the World Wide Web in 1990. According to this vision the graph of the web, a pure intertextuality, only consists of associations between terms, without there being a need to qualify the persons who produced them. The disappearance of the enunciator is at the heart of this idealised vision of a world of ideas that communicate with one another, through argumentation and reasoning freed of the weight of the interests, personality or psychology of those who emitted them (Lévy, 1991: 62). The graph of the web on which PageRank’s recursive indicator circulates is a graph of documents, not of people. The authority it measures proceeds from an operation which relies neither on the sole content of texts citing one another, as a semantic approach to the web would advocate, nor on the status of the people who wrote the texts, as the Science Citation Index does by restricting access to scientific journals to researchers. Drawing on the distinction made in the pragmatics of utterances, the social force of which PageRank measures the authority is not that of the utterance, nor of the utterer, but of the subject of the utterance [4]. PageRank presupposes the opening of a distance between the utterer and the subject of the utterance, so that the intertextual web of references may open a “space where the writing subject endlessly disappears” (Foucault, 2001: 821) for the author (i.e. the utterer). PageRank considers the hypertext link neither as a semantic association between utterances nor as an exchange of gratifications between people, but as a means of assessing the authority of a webpage. The hypertext link starts from an element of the citing text to identify the URL of a page as a whole. It thus attributes its strength to the page form by coupling a text and an author, thereby offering a realistic and incredibly efficient way to rank documents.

How Can the Strength of the Incoming Link Be Captured?

17In 1998, as soon as the first comparative tests were carried out, the quality of the results offered by PageRank for diverse queries appeared to be infinitely greater than that of its competitors. “It was the difference between judging a stranger by his looks and gathering opinions from everyone who knew him” (Edwards, 2011: xii). In August 1999, Google’s servers received 3 million requests per day. In August 2000, after reaching an agreement with Yahoo!, 60 million requests were sent daily to the young start-up’s servers (Battelle, 2005: 126). For its algorithm to function, Google compiled a table of all the websites crawled and recorded in a huge index, to which it attributed a set of signals. It currently counts over 200 signals for each page recorded (Singhal, 2008). These inform two different dimensions of the quality of a search: the page’s relevance to the demand expressed in the query, and the importance of the page compared to the other pages that present the same level of relevance. The first dimension seeks to specify the meaning of the query as clearly as possible, so that the replies offered by the engine correspond closely to the Internet user’s question. In this domain, Google has developed a large panel of semantic indicators that have contributed to complexifying the algorithm and increasing the number of signals. The second dimension seeks to measure the authority of the reply, among the relevant propositions, by filtering incoming links as well as possible in order to differentiate between those that carry authority and those that do not. “The popularity score came to the rescue of the content score” (Langville & Meyer, 2006: 25). PageRank’s computation is concerned with this second dimension. It is therefore merely one signal among others. Despite growing debate around this point, it still plays a dominant role in the algorithm’s general functioning, and its spirit indirectly influences a great number of other signals that provide it with enhanced precision and strength. PageRank is a score from 1 to 10 on a logarithmic scale which measures the number of links received by the page from other pages, considering that sites send one another a force which soon came to be known as “Google juice” or “link juice” in SEO lingo.
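For illustration, here is a minimal power-iteration sketch of the recursive computation described above; the toy graph, page names and damping value are illustrative only, and say nothing about the 200-odd production signals mentioned by Singhal:

```python
# Toy power-iteration sketch of the recursive "link juice" computation.
# Graph, page names and damping factor are illustrative, not Google's parameters.
def pagerank(links, damping=0.85, iterations=50):
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {}
        for p in pages:
            # Each page linking to p passes on its own score divided by its out-degree.
            incoming = sum(rank[q] / len(links[q]) for q in pages if p in links[q])
            new_rank[p] = (1 - damping) / n + damping * incoming
        rank = new_rank
    return rank

toy_web = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
}
print(pagerank(toy_web))  # c.example, the most cited page, ends up with the highest score
```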

The Organic and the Strategic

18PageRank was immediately able to radically improve the quality of web searches because it best espoused the spirit of the Internet, offering a cognitive artefact which, like a mirror, turned the distribution of interactions between documents into a metric that showed them their respective authority. PageRank was designed to give Internet users feedback only on the judgements they have each made about one another through their links. James Grimmelmann (2009: 941) pointed out that “The genius of Google is that its creators didn’t come up with a great organizational scheme for the web. Instead, they got everyone else to do it for them”. Thus, Google has often been criticised for feeding on work that is not its own, increasing its relevance through others’ volunteered energy and, better yet, accumulating a wealth entirely owed to Internet users’ work (Pasquinelli, 2009; Moulier-Boutang, Rebiscoul, 2009; Kyrou, 2010; Vaidhyanathan, 2011). We should however reflect further on this paradox: by supporting, with relentless energy and sometimes against the evidence, the idea that the ranking produced by its algorithm is natural, or “organic” according to the accepted terminology, Google also endeavours to fulfil a statistical constraint that is necessary to PageRank’s pertinence: being absent from Internet users’ intentions.

Wisdom of the Crowds

19What principle can Google’s algorithmic approach invoke to justify its preference for the automatic aggregation of the uncertain, scattered and random judgements of the crowd of Internet users? Whereas the Science Citation Index ultimately relied on the assumption of the citation’s honesty, embedded in the normative structure of the scientific field’s functioning, PageRank does not have this kind of substantial foundation to justify its proceduralism. Instead it uses another type of justification, far more formal and inspired mainly by statistics, which has taken on the name of “wisdom of the crowds” (Surowiecki, 2008; Origgi, 2008). The theory of Internet users’ collective intelligence, of which PageRank is always cited as the finest example, draws on a body of work combining mathematics and political philosophy to prove the epistemic superiority of large numbers. Two different hypotheses from the various conceptualisations of the “wisdom of the crowds” (Landemore & Elster, 2012) support PageRank’s claim to measuring authority on the web. The first primarily associates it with the miracle of aggregation, rooted in the Condorcet jury theorem, which posits that finding the right answer to an epistemic question simply requires a vote by as many people as possible, provided each participant is more likely than not to find the right answer and participants do not influence one another. If these conditions are met, the greater the number of voters, the more certain it is that the majority vote will be the right one. This theorem is also the principle behind the famous Galton experiment of 1906, where the public at a livestock fair was asked to estimate the weight of an ox. The public, taken as a whole, is therefore epistemically more reliable than each of its constituent members, however expert some of them may be (Landemore, 2010). Scott Page’s work added a new dimension to this property, by showing that it is more important to value voters’ cognitive diversity than their intelligence. This statistical underpinning informed a first understanding of the “miracle of aggregation”, according to which it is important to avoid the coordination and influence effects that voters can have on one another, and which encourages the development of individualising judgement systems like prediction markets (Sunstein, 2006).
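For reference, the standard statement of the theorem the author invokes (not quoted in the article) is the following: with n independent voters, each correct with probability p greater than one half, the probability that a strict majority is correct tends to 1 as n grows:

```latex
% Condorcet jury theorem, standard formulation: n independent voters,
% each correct with probability p; a strict majority is correct with probability
P_n \;=\; \sum_{k=\lfloor n/2 \rfloor + 1}^{n} \binom{n}{k} \, p^{k} (1-p)^{n-k},
\qquad \lim_{n \to \infty} P_n = 1 \quad \text{if } p > \tfrac{1}{2}.
```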

20Unlike this “aggregative” approach, a second, “deliberative” interpretation of the wisdom-of-the-crowds hypothesis stresses the self-organised effects of the coordination of judgements on the Internet. Inspired by the Habermasian model of discussion, a highly optimistic version of such coordination is found in Yochai Benkler’s La richesse des réseaux (2009: 309 and following). Using multiple analyses of the blogosphere as examples [5], he highlighted the self-organisation mechanisms through which the small scattered conversation circles on the Internet communicate with one another so as to gain visibility in the Google search engine, through successive selection. By way of a series of iterations, these forms of coordination between decentralised conversations [6] can be used to extract content from its initial production circle, make it known to others and facilitate its circulation in the stratified space of web visibility. However these journeys towards visibility should not be compared to the measurement of Internet users’ opinions as produced by individualising questioning systems like opinion polls. They are the emergent product of a bottom-up, spontaneous coordination without a central organisation.

21The wisdom-of-the-crowds hypothesis is fundamentally contingent on a third aspect, probably the most stringent one. The judgements Internet users exchange through their links must be subject to “uncoordinated coordination” (Benkler, 2009: 33). The aggregation of exchanges between hyperlinks recorded by PageRank is the outcome of individual actions that did not hold this coordination as their intention, Jon Kleinberg and Steve Lawrence argued (2001: 1849). For the crucial condition of its functioning is that Internet users do not act according to PageRank, and that their choice of links “naturally” distributes honour and oblivion. If the judgements that users exchange through links were produced according to the meta-coordinator that aggregates them, this would profoundly alter the epistemic relevance of the outcome. The different models developed under the banner of the wisdom of the crowds all distinguish between the local, non-intentional and immediate character of the formation of individual judgements (in the jury theorem) or of the discussion enclaves (in the decentralised deliberation model), and the formal aggregation tool used to represent these judgements without having initiated them in any way. The externality assumption is a condition of the possibility of collective intelligence. A wisdom-of-the-crowds system, Daniel Andler emphasised (2012), “may be regarded as intelligent if one is prepared to sever the link between the two components of intelligence: the world understanding is achieved, in a distributed fashion, by the individual members of the group (each one possessing a partial yet genuine understanding), while the search for a solution is achieved by the architecture of the system in a purely formal (i.e. semantically blind) fashion. Examples which come to mind are search engines such as Google and other internet-based tools”. PageRank’s referencing choices are made all the more pertinent by the fact that the aggregator of Internet users’ judgements is absolutely external to their decisions. The web substitutes the assumption of the citation’s honesty, required by the substantial norms of the scientific sphere, with an expectation which is simply procedural though hard to verify, that of sincerity: that Internet users have not thought about Google.

The Google Machine

22Whereas in the scientific article on PageRank the founders posited that the principles of information search and advertising are incompatible (Brin & Page, 1998), in 2004 Google invented an advertising model with unmatched efficiency. It also drew on a singular algorithmic apparatus, inspired by Bill Gross’s GoTo service. But Google added three particular improvements to AdWords: so-called “Vickrey” keyword auctions, in which the winner pays the cost per click of the second-highest bid; a lexical analysis of the advertised page, which checks its informational relevance at the risk of the link being deranked if it does not deliver on what the keyword announces; and an analysis of Internet users’ clicks on the different advertising links, which can, through learning, change the order of the rankings. However, the main difference from GoTo introduced by Google was its firm refusal to mix the results of the algorithm’s “natural” ranking with those resulting from the auctions sold to advertisers. Google distinguished itself from its competitors through this separation of the “natural” search from advertising links. In Google lingo this barrier, a real line of “separation between the Church and the State” (Cassin, 2007: 139), has come to be known as the “Great Wall of China”. Google not only presents users with an interface that isolates advertising from editorial content more efficiently than its competitors did, but has also drawn a boundary across the conflict zone between mathematics and the market, which runs through the corporate culture and the personality of its founders. While the science of algorithms must pursue its quest for perfection to best reflect Internet users’ actions, this must never involve Internet users acting according to Google, nor Google engineers interfering with the rankings. Google wishes to see this world as natural. In parallel, another world is open to advertisers wishing to fight over advertising auctions’ keywords. This world is openly, fully strategic and instrumental. Seen from the Great Wall of China, there are two ways of acquiring visibility on Google’s pages: either through the reputation acquired from others, without Google, or by paying for visibility… to Google. The partitioning of the results page into two worlds, organic and strategic, conveys a vision of the web and Internet users which Google has imposed on the entire ecosystem of the web, through all possible means.
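A minimal sketch of the second-price (“Vickrey”-style) rule mentioned above, with hypothetical bids and advertiser names; the real AdWords mechanism is a generalised second-price auction across several ad slots, in which bids are further weighted by quality signals:

```python
# Illustrative second-price ("Vickrey"-style) keyword auction: the highest bidder
# wins the slot but pays the runner-up's bid (plus a minimal increment).
# Bids and advertiser names are hypothetical; AdWords actually runs a generalised
# second-price auction in which bids are also weighted by quality signals.
def second_price_auction(bids, increment=0.01):
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1] + increment if len(ranked) > 1 else ranked[0][1]
    return winner, round(price, 2)

bids = {"advertiser_a": 2.50, "advertiser_b": 1.80, "advertiser_c": 0.90}
print(second_price_auction(bids))  # ('advertiser_a', 1.81)
```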

No Touching the Algorithm by Hand

23The separation between natural referencing and advertising referencing was first presented as the digital answer to traditional media’s economic model. In 2004, the founders of Google wrote a letter to future shareholders explaining that what Google measures in the left column should not be contaminated by what it sells in the right column. They justified this with the press’s editorial model: “Our search results are the best we know how to produce”, boasted Google. “They are unbiased and objective, and we do not accept payment for them (…). We also display advertising, which we work hard to make relevant, and we label it clearly. This is similar to a newspaper, where the advertisements are clear and the articles are not influenced by the advertisers’ payments” [7]. Google was clever to win over journalists by appealing to the division between advertising and editorial that is central to the media’s professional ethics. But the reality of its conception of an “objective” and “neutral” editorial world is different from that of professional journalism. The “objectivity” Google refers to is “mechanical” and rooted in the separation of scientific work into disciplines, initiated at the end of the 19th century, with the imperious desire to eliminate all human intervention in favour of methods and machines capable of directly imprinting nature onto the researcher’s screen (Daston & Galison, 2012). The impartial detachment that requires a high level of self-control from journalists is not a virtue Google can hope to reach. However objective their professional code of ethics may be, the gatekeepers of traditional media will always be subject to passions, choices or interests introducing biases into their ordering of information [8]. The goal of objectivity, as Theodore Porter has shown (1995), ultimately has less to do with the truth of nature than with the effort to evict human judgement, carried out by scientists against their own subjectivity. Fuelled by science, mathematics and large numbers, the Mountain View company’s fundamental conviction is that, to neutralise the vagaries of human judgement, it is best to trust algorithms and to stick to that. Any attempt to correct an unsatisfactory result by hand is the start of a corruption of the service. In an article entitled “Why We Sell Advertising, Not Results”, Google refused to be held responsible for the results of “organic” searches. The company positioned itself outside the activities of its algorithm: “Our results reflect what the community considers to be important, not what we or our partners think you should see” [9]. This concern for delegating responsibility for the rankings to a computation rule saves the company from having to justify itself against the multiple accusations levelled against it. When Google bombing affairs occur (i.e. coordinated action by Internet users to associate a site with a specific query, for example George Bush’s official page with the query “miserable failure”), when its search suggestion system brings up anti-Semitic terms, when a company deems its competitor unfairly better ranked, etc., Google refuses to make corrections by hand or to set a filter on its algorithm (Grimmelmann, 2009). Amit Singhal, the architect of Google’s search rankings, answered the question “Does Google edit its results by hand?” with irony:

“Let me just answer that with our third philosophy: no manual intervention. In our view, the web is built by people. You are the ones creating pages and linking to pages. We are using all this human contribution through our algorithms. The final ordering of the results is decided by our algorithms using the contributions of the greater Internet community, not manually by us. We believe that the subjective judgement of any individual is, well… subjective, and information distilled by our algorithms from the vast amount of human knowledge encoded in the webpages and their links is better than individual subjectivity” [10].
Faced with glitches in the algorithm’s results, when local mistakes are detected in a given site’s ranking, Google refuses to patch the algorithm “by hand” to re-establish an accurate ranking. The Search team’s engineers always seek an automatable rule that tackles the detected imperfections in a general way. Machines have qualities which humans do not, but above all they have virtues stemming from their weakness. As Eric Schmidt (2004) spectacularly asserted, their strength comes from their stupidity: “I can assure you. It has no bias. These are computers, they’re boring. I’m sorry you just don’t get it.” Many criticisms “anthropomorphise” Google’s algorithm and ask it to behave like a human. The criticism is the same as that addressed by the sociology of media to a newspaper’s editorial staff: partiality, a taste for the mainstream, forgetting the peripheries, conformism. But Google does not consider its algorithm as a human being; it adopts another ontology. The machine ranking’s decisions are procedural, whereas human judgements, however informed by rules or ethics they may be, remain substantial. What disqualifies human judgement is this embarrassing propensity to judge and evaluate substantially, to always want to appreciate the validity, rationality or common sense of the ranking of replies. This is something algorithms cannot do. They are stupid, and this stupidity is the best guarantee of their “objectivity”.

Don’t Pay Attention to Us!

24Google’s best PR to Internet users would surely be to keep quiet. But it is futile to try to be forgotten, and Google is forced to communicate on its desire to be invisible. The company is constantly telling Internet users to ask themselves “Would I do this if search engines didn’t exist?” [11]. In 2011, Amit Singhal, the manager of the Search Quality department, insisted: “Our advice for publishers continues to be to focus on delivering the best possible user experience on your websites and not to focus too much on what they think are Google’s current ranking algorithms or signals” [12]. Google asks to be invisible and merely recommends a set of common-sense practices for being more visible to the search engine: optimising keywords by checking that one’s page contains the terms most frequently used by users, working on the site’s design so that it is clear and legible not only for users but also for Google’s robot, and increasing the site’s loading speed. What Google’s spokesperson to webmasters, Matt Cutts, calls for is cooperation between the engine’s robots and the designer of the site. PageRank has established its order on the web by domesticating webmasters’ writing techniques. They now structure their sites based on the subtleties of the engine, thus increasingly tailoring their expression to what the engine can read. They have learnt how PageRank browses webpages: first the URL, then the title and subtitles, the importance of characters in bold and italics, the fact that PageRank only keeps the first anchor of a link when the same page is cited several times, the importance of tags in image files, the impossibility of crawling PDF files, etc. This intense familiarisation with the algorithm’s slightest procedures has turned into know-how available in training, advice guides, and tools to measure sites’ Google-compatibility. Normalising the ecosystem formed by the web and its prevailing engine has become a major industrial issue. But the definition Google provides of a high-quality site also emphasises rigour, rationality and originality, criteria directly borrowed from the most traditional ethics of bibliographic culture. Google provides webmasters with a list of good questions to ask themselves, like a real re-education programme to take their mind off the bad questions about being visible to the algorithm. These questions bring the company’s conception of a high-quality site into relief. First, an article must be written by “an expert or enthusiast who knows the topic well”, a “competent authority”, and the site itself must be a “recognised authority on its topic”. The best way of knowing if this is the case is to ask: “Is this the sort of page you’d want to bookmark, share with a friend, or recommend?” One should be able to “trust” the information, and content should be subject to “quality control”. Google stresses that the lack of reliability leaves traces which robots have learnt to detect. Content should not be “duplicate[d]”; it should not contain “spelling, stylistic, or factual errors” and should deliver “original content or information, original reporting, original research, or original analysis”. It is best to offer “substantial value when compared to other pages”, to “describe both sides of a story” by offering “a complete or comprehensive description of the topic”. In an austere and professorial fashion, Google even calls for “insightful analysis or interesting information that is beyond obvious”.
For Google does not like content that is not “edited well”, that appears “sloppy or hastily produced” without “great care and attention”, that is “short, unsubstantial, or otherwise lacking in helpful specifics” or “mass-produced by or outsourced to a large number of creators”, or that contains “an excessive amount of ads that distract from or interfere with the main content”. This documentary vision of informational quality could hardly be better expressed than by the final question: “Would you expect to see this article in a printed magazine, encyclopedia or book?” [13]. Seen from the Googleplex, the quality of digital information should always be measured by the standards of paper editions.

Calculative Internet Users

25For Google, reputation is either earned or bought. This distinction governs the separation between natural referencing and AdWords referencing, which the company has turned into a highly effective two-sided economic model. This cleavage embraces the spirit of the pioneers of the Internet, also assuming a clear division between the market world of companies, invited to fulfil their desire for visibility by buying keywords, and the non-market world of Internet users, who do not calculate their visibility but exchange sincere links. However, Internet users have not lived up to the moral virtues ascribed to them by PageRank. Their world is not so “natural” and some of their links are not “sincere”. Rather than separating two different populations, the distinction between the market and non-market worlds runs through many publishing Internet users seeking reputation and visibility. Many of them are constantly calculative, seeking to be seen and competing to obtain a prime position in the search engine’s “organic” results. To do so, they seek to deform the structure of web links to their benefit, so as to capture more of the authority dispensed by PageRank. By acting according to the algorithm, these strategist Internet users undo the position of externality and invisibility to which PageRank aspires, act reflexively on the web’s structure (Espeland, 2007) and present Google with a problem that is both mathematical and moral. From the moment that judgements, i.e. links, are strategically produced, they provide biased information that erodes the relevance of the search’s overall result. But in seeking to correct this, Google is forced to relinquish its proceduralism, produce a substantial definition of the quality of links and set itself up as the web police.

26The entire history of the algorithm’s evolution can be recounted as a game of cat and mouse between webmasters and the Mountain View company: on the one hand, attempts to act strategically on PageRank; on the other, efforts to detect and punish these behaviours by reforming the algorithm before strategically minded Internet users find a new weakness to exploit. We surely do not fully appreciate the technological, market and moral stakes of this low-intensity war, which has endured throughout the entire history of the web, ever since search engines became the main channel of access to digital information. We probably also do not properly grasp the power struggle between the over-powerful and dominating Goliath of the web and the thousands of crafty Davids assailing him with a thousand handmade arrows. For in many respects, PageRank’s position against its assailants is in fact extraordinarily fragile, with some observers arguing that Google has long since lost the battle for an optimal ranking of the web (Ippolita, 2011; Mowshowitz & Kawaguchi, 2002; Diaz, 2005; Granka, 2010). The development of the Search Engine Optimization (SEO) market has transformed part of the web into a gigantic competition between the actors publishing on it to be seen by the algorithms. Part of this advisory activity, called “white hat SEO”, consists in making websites legible to Google’s robots by introducing the most appropriate HTML code (URLs, link anchors, choice of keywords, etc.). But another sector of SEO activity (“black hat SEO”) consists in selling reputation. Making oneself visible means obtaining “link juice” from others. If it does not come naturally, it must be extorted, bought, or artificially produced. The techniques that webmasters resort to in order to secure renown by producing fake links have constantly improved, to the point of becoming a real industry. By first registering their sites in a galaxy of more or less fake directories, catalogues and indexes, webmasters draw a series of links to themselves. By then seeking to place links towards their sites on other sites, for example in comments on renowned blogs or on Wikipedia, webmasters long drew “link juice” from those that were better endowed (a practice called spamdexing). But Google made this practice unproductive by creating the <nofollow> attribute, which allows sites to devalue certain of their outgoing links. The Wikipedia encyclopaedia, for example, now places all its external links in <nofollow> and no longer distributes its authority to those it cites. A link black market opened up, allowing two sites to exchange links even without sharing any proximity, and a site to sell links to another, creating “link farms” that organise a real Potemkin village of fake sites linked together to be seen by PageRank, which then redistribute their capitalised force towards the client site (the money site in SEO lingo). When the Google algorithm improved enough to detect link farms, cheats set out to constitute “content farms” for the mass production, often using linguistic robots corrected by underpaid interns, of inept content based on synonymic proximity and designed to send a link towards the client company’s site. These platforms with shapeless content, called PR sites (for Press Release), are actually written only for robots. Panda, the algorithm’s latest update, seeks precisely to derank these sites with thin and duplicated content.

27This competition between the link market and the algorithm causes tension between two contradictory principles governing visibility on the web: audience and authority. By championing the natural link, PageRank considers that it circulates acts of recognition on the web and must therefore be rooted in the citing text’s quality. The link market, on the other hand, envisages the hypertext link as a purveyor of traffic, a simple signpost that does not need to be anchored in a high-quality text. This competition around the definition of what hypertext links put into circulation explains the successive changes in Google’s algorithm. With every revision, the algorithm has operated increasingly detailed sorting to distinguish, within webpages, the links conveying recognition (URL, titles, subtitles, bold links, links incorporated into the page’s text content) from those conveying less or no recognition (links in the page’s paratext, commercial links, <nofollow> links, etc.). Resolutely defending a meritocratic understanding of the force that circulates in hypertext links, Google has also undertaken to punish those who create links that circulate “fake authority”, by brutally de-ranking sites which cheat with the rules set by Google, a punishment which can prove disastrous for the victim sites. However this policy puts Google in a difficult position, caught between two contradictory ethics. Because it is procedural, PageRank’s position claims to be exempt from any substantial appreciation by imposing an abstract formalism on the web (counting links without looking at their content). However, as it supports a meritocratic conception of the link, it is increasingly having to make a substantial judgement on the nature of real and fake citations. By becoming the legislator and the policeman of writing on the web, Google is constantly losing its position of externality.

The Crisis of PageRank

28I have just drawn the outlines of the mind of PageRank, exploring as sympathetically as possible the justifications that fuel it. This methodological principle is necessary to avoid reducing Google’s actions too readily to its economic interests before having analysed them. The spirit of PageRank is now facing a crisis. The virtue of authority it has endeavoured to promote is increasingly undermined by the tensions in which it is caught, stemming from other principles for ranking information: the popularity promoted by the logic of audience measurement; the competing affinity carried by the irresistible rise of social media; and, finally, the effectiveness measured by Internet users’ satisfaction, which guides the predictive personalisation of the algorithms.

From Authority to Popularity

29The first tension stems from the revelation effect of PageRank, which makes the extraordinarily unequal distribution of links on the web both visible and measurable. PageRank is fuelled by an imaginary that projects onto the web a pastoral vision of a graph of small producers exchanging links to point out their best products to one another. But links between sites are not distributed in any egalitarian order that would give everyone, at least initially, an equal chance of receiving links from others. Since 2000, research on the structure of the web has tirelessly repeated that it is in no way rhizomatic: a very small number of pages attract a considerable number of links, while the vast majority of sites are linked to by very few sites and are often cited by none (Broder et al., 2000; Adamic & Huberman, 2001). This is unquestionably the case: 90% of the web’s PageRank is captured by 10% of sites (Pandurangan et al., 2006). Albert-László Barabási (2002: 58) cruelly highlighted that “hubs are the strongest argument against the utopian vision of an egalitarian cyberspace. Yes, we all have the right to put anything we wish on the Web. But will anybody notice? [Hubs] are very easy to find, no matter where you are on the Web. Compared to these hubs, the rest of the Web is invisible”. PageRank not only makes the power-law distribution of authority visible, it also reinforces it through the host of concentration, asymmetry and hierarchy effects inherent to network structures. The most famous of these is the Matthew effect identified by Robert Merton (1968) in scientometrics, whereby the recognition system of the scientific community contributes to the fact that “eminent scientists get disproportionately great credit for their contributions to science while relatively unknown scientists tend to get disproportionately little credit for comparable contributions”. The “rich”, that is the large actors of the web and the sites with a very high offline popularity capital (companies, media, institutions), get even richer, because the visibility they acquire on the network automatically attracts new links. The nodes that receive the most links enjoy a halo effect which leads other nodes to actively seek proximity to them so as to borrow some of their strength; as a result, some receive undeserved authority (a phenomenon often called winner takes all), while many deserving others remain in the shadows. This effect is further reinforced by “preferential attachment” mechanisms, which encourage sites to cite other sites with authority equal to or greater than their own, and to refuse to cite those smaller than themselves (Cardon et al., 2011). As a consequence of these reinforcing effects, at the top of the site rankings produced by search engines, authority (measured by the number of links) merges with popularity (the number of clicks by Internet users) (Hindman et al., 2003) [14]: the sites of companies, large media and institutions, as well as crucial web actors like Wikipedia, receive recognition (link juice) as much as audience (clicks), without it being possible to determine which variable acted on the other. The hypertext link then conveys not the authority-prominence of citation-based judgement with which the pioneers had endowed it, but the simple reflex-attention commanded by advertising’s mimicry mechanisms. Meritocratic authority becomes merely a vain attempt to conceal the statutory authority of the powerful, which they owe to their central position in social life and to their economic capital (Diaz, 2005). By dominating the link hierarchy, these actors also impose on the web a ranking that gives excessive visibility to central sites, which are average, conformist, uncontroversial and unoriginal.
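The concentrating dynamic of preferential attachment can be illustrated with a small simulation, a simplified variant of the Barabási-Albert mechanism with invented parameters: each new site places one link, choosing its target with probability proportional to the links that target has already received. The point is not to reproduce the web but to show how quickly links pile up on a few early hubs.

    # A sketch (with made-up parameters) of the preferential attachment dynamic:
    # each newcomer links to an existing site with probability proportional to
    # the links that site has already received. In-links concentrate on a few
    # early "hubs".

    import random

    random.seed(0)
    in_links = [1, 1]            # start with two sites pointing at each other

    for _ in range(2_000):       # each new arrival places one link
        total = sum(in_links)
        r = random.uniform(0, total)
        cumulative, target = 0, 0
        for site, count in enumerate(in_links):
            cumulative += count
            if r <= cumulative:
                target = site
                break
        in_links[target] += 1
        in_links.append(1)       # the newcomer starts with one nominal link

    in_links.sort(reverse=True)
    top_decile = in_links[: len(in_links) // 10]
    share = sum(top_decile) / sum(in_links)
    print(f"Share of links held by the top 10% of sites: {share:.0%}")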

Social Networks and Ranking People

30The second tension results from the democratisation of Internet users’ participation, made possible by the development of publication techniques that require only a very low cost of involvement (Cardon, 2010). PageRank is elitist: it grants only publishing Internet users (i.e. those producing hypertext links) the right to take part in ranking information on the web. With the massification of Internet usage, publishers’ monopoly on the order of information is increasingly being challenged. And with the development of new conversational writing formats on digital social networks (statuses, comments, “Like” or “+1” buttons and sharing tools like the RT on Twitter), the act of publication has come to resemble a simple act of audience feedback. These new expressive forms have given new rights to audiences that are younger, more geographically dispersed and drawn from lower social strata than the “worthy” producers of hypertext links. But social media also organise an entirely different principle for ranking information. Whereas PageRank measures links between documents, Facebook’s EdgeRank ranks documents according to the subjective judgements exchanged by people connected by affinity. Instead of concealing the person behind the text, the conversational enunciation of social networks, more flexible, relaxed and immediate, has conferred visibility on individuals’ subjectivity, making their judgements an identity signal that they project towards their circle of sociability (Cardon, 2013). Whereas in the web of documents the illocutionary force of the link is embedded in the authority of the citing page, in the web of people it is the enunciator’s digital authority, their e-reputation, that supports their enunciation. The social web’s metrics of affinity thus distribute towards the documents they rank an authority rooted in the very people whom PageRank had sought to eclipse.
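The contrast can be made concrete with the commonly cited simplified description of EdgeRank: a story’s score is the sum, over its interactions (“edges”), of affinity × edge weight × time decay. The sketch below only illustrates that simplified description; the weights, the decay function and the data are invented, and the production system is far more complex than this.

    # Commonly cited simplified description of EdgeRank: score = sum over edges
    # of affinity x edge weight x time decay. Illustrative only; these numbers
    # are invented.

    import time

    EDGE_WEIGHTS = {"comment": 4.0, "like": 1.0, "share": 6.0}   # hypothetical values

    def edge_rank(edges, now=None):
        """Score a story from its (viewer_affinity, edge_type, timestamp) edges."""
        now = now or time.time()
        score = 0.0
        for affinity, edge_type, created_at in edges:
            age_hours = max((now - created_at) / 3600.0, 0.0)
            decay = 1.0 / (1.0 + age_hours)        # simple hyperbolic time decay
            score += affinity * EDGE_WEIGHTS.get(edge_type, 1.0) * decay
        return score

    now = time.time()
    story_edges = [
        (0.9, "comment", now - 3600),    # a close friend commented an hour ago
        (0.2, "like", now - 86400),      # a distant acquaintance liked it yesterday
    ]
    print(f"story score: {edge_rank(story_edges, now):.2f}")

Whatever the real formula, the design choice is the one the paragraph describes: the score is rooted in who interacted and how close they are to the viewer, not in which documents cite which.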

When the Apparatus Becomes a Machine

31The last tension weighing on the spirit of PageRank relates to the transformations that Google’s engineers are constantly imposing on the algorithm in their fight against referencing strategists and the link black market. With all the revisions and adjustments, the Google algorithm looks less and less like an apparatus set upon the web to record it, and more and more like a machine steered with strategic precision by Amit Singhal’s Search Quality team [15]. Given the multiple pressures Google faces because of its commercial ambitions and its dominant position on the search engine market, and even though it denies it, the company is increasingly compelled to handle the results of its algorithm “by hand” and to give up its concern for non-interventionist neutrality. Under pressure from national jurisdictions, it has had to censor certain racist and anti-Semitic sites in France and Germany (Zittrain & Edelman, 2002). Pressure from the cultural industries has forced it to derank search results leading to sites offering pirated content (Menell, 2012). Under pressure from companies, it has agreed to refuse the purchase of certain AdWords keywords by their competitors. The industrialisation of cheating to artificially produce “link juice” has led Google to forgo its position of externality and to take on a policing role, filtering and punishing offenders. Recently, the company has even had to set up a reporting system allowing Internet users to flag sites which fabricate reputation. There is little doubt that the ideal of an apparatus of rules recording the web in order to rank it has been significantly undermined. While Google has sacrificed its position of externality because Internet users paid too much attention to it, its industrial strategy has also made the claim that natural search results are neutral ever more fragile and rhetorical. This can be seen in particular in the fact that the company has developed many other services whose commercialisation may conflict with the search engine’s logic of neutrality. Moreover, Google’s algorithm increasingly incorporates so-called learning techniques (machine learning) to compute the rankings presented to users. It is no longer necessary to hand-set the many parameters attributing a weight to each signal, PageRank in particular, extracted from every page of the web deposited in Google’s index. The learning techniques can adjust these parameters case by case, based on the queries, on what Google knows about the user’s earlier practices, on the knowledge afforded by the links clicked by other Internet users for the same query (Granka, 2010) and, ultimately, on the human judgements of site relevance collected by the quality raters hired by Google (PotPieGirl, 2011). Google is increasingly replacing the principle of authority that made the strength of PageRank with a principle of effectiveness, which sends Internet users, ever more appropriately, the choices the algorithm has learnt from their behaviour [16]. The machine invented by Google has become so complex, so sensitive to the statistical tests that continually reparameterise it, so hungry for variables and traces, and so self-learning, that its behaviour can no longer be understood or interpreted, even by its creators.
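The shift described here, from hand-set weights to weights adjusted by learning from behaviour, can be sketched in a few lines of Python. This is not Google’s ranking system: the features, the click data and the learning rate are invented, and a simple logistic regression trained on click feedback merely stands in for the far richer machinery the paragraph alludes to.

    # Sketch of learning signal weights from click feedback instead of setting
    # them by hand. Pure-Python logistic regression on invented data; not
    # Google's ranking system.

    import math
    import random

    random.seed(1)
    FEATURES = ["pagerank", "anchor_match", "freshness"]

    def predict(weights, x):
        z = sum(w * xi for w, xi in zip(weights, x))
        return 1.0 / (1.0 + math.exp(-z))     # probability the result is clicked

    # Invented training data: (feature vector, clicked?) pairs gathered from logs.
    examples = [([0.9, 0.8, 0.1], 1), ([0.2, 0.9, 0.9], 1),
                ([0.1, 0.1, 0.3], 0), ([0.7, 0.2, 0.0], 0)]

    weights = [0.0] * len(FEATURES)
    LEARNING_RATE = 0.5
    for _ in range(2000):                     # stochastic gradient descent on log loss
        x, clicked = random.choice(examples)
        error = predict(weights, x) - clicked
        weights = [w - LEARNING_RATE * error * xi for w, xi in zip(weights, x)]

    print(dict(zip(FEATURES, (round(w, 2) for w in weights))))

In such a setup the weight given to the PageRank signal is itself just another parameter, adjusted by users’ clicks, which is precisely how effectiveness comes to displace authority.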

Bibliography


  • ADAMIC L., HUBERMAN B. (2001), “The Web’s Hidden Order”, Communications of the ACM, vol. 44, n° 9, pp. 55-60.
  • ANDLER D. (2012), “What Has Collective Wisdom to Do with Wisdom?”, in LANDEMORE H., ELSTER J. (eds), Collective Wisdom: Principles and Mechanisms, Cambridge, Cambridge University Press.
  • BARABÁSI A.-L. (2002), Linked: The New Science of Networks, Cambridge, Perseus Publishing.
  • BATTELLE J. (2005), The Search. How Google and Its Rivals Rewrote the Rules of Business and Transformed Our Culture, New York, Portfolio.
  • BENKLER Y. (2009), La richesse des réseaux. Marchés et libertés à l’heure du partage social, Lyon, Presses Universitaires de Lyon.
  • BRIN S., PAGE L. (1998), “The Anatomy of a Large-Scale Hypertextual Web Search Engine”, Proceedings of the Seventh International Conference on World Wide Web.
  • BRODER A., KUMAR R., MAGHOUL F., RAGHAVAN P., RAJAGOPALAN S., STATA R., TOMKINS A., WIENER J. (2000), “Graph Structure in the Web”, Computer Networks, vol. 33, n° 1-6, pp. 309-320.
  • CARDON D. (2010), La démocratie Internet. Promesses et limites, Paris, Seuil/République des idées.
  • CARDON D. (2011), “L’ordre du Web”, Médium, n°29, Octobre-December, pp. 191-202.
  • CARDON D. (2013), “Du lien au like. Deux mesures de la réputation sur Internet”, Communication, forthcoming.
  • CARDON D., FOUETILLOU G., ROTH C. (2011), “Two Paths of Glory. Structural Position and Trajectories of Websites within their Topical Community”, ICWSM 2011, Barcelona, 17-21 July.
  • CASSIN B. (2007), Google-moi. La deuxième mission de l’Amérique, Paris, Albin Michel.
  • CITTON Y. (2013), “Le retour de l’objectivité ?”, La Revue des livres, n° 9, January-February, pp. 3-12.
  • DASTON L., GALISON P. (2012), Objectivité, Paris, Les Presses du réel.
  • DIAZ A. M. (2005), Through the Google Goggles: Sociopolitical Bias in Search Engine Design, Thesis, Stanford University, May.
  • EDWARDS D. (2011), I’m Feeling Lucky. The Confession of Google Employee Number 59, London, Allen Lane/Penguin Books.
  • ESPELAND W. N. (2007), “Rankings and Reactivity: How Public Measures Recreate Social Worlds”, American Journal of Sociology, vol. 113, n°1, July, pp. 1-40.
  • FARRELL H., DREZNER D. W. (2008), “The Power and Politics of Blogs”, Public Choice, 134, pp. 15-30.
  • FORSYTH E., KATZ L. (1946), “A Matrix Approach to the Analysis of Sociometric Data: Preliminary Report”, Sociometry, n° 9, pp. 340-347.
  • FOUCAULT M. (2001), “Qu’est-ce qu’un auteur ?”, Dits et écrits I, 1954-1975, Paris, Gallimard/Quarto, pp. 817-849.
  • GARFIELD E. (1955), “Citation Indexes for Science. A New Dimension in Documentation through Association of Ideas”, Science, n°122.
  • GERHART S. (2004), “Do Web Search Engines Suppress Controversy?”, First Monday, vol. 9, n° 1.
  • GOODIN R. E. (2003), “Democratic Deliberation Within”, in FISHKIN J., LASLETT P. (eds), Debating Deliberative Democracy, Malden, Blackwell Publishing.
  • GRANKA L. A. (2010), “The Politics of Search: A Decade Retrospective”, The Information Society, 26, 2010, pp. 364-374.
  • GRIMMELMANN J. (2009), “The Google Dilemma”, New York Law School Law Review, vol. 53.
  • HINDMAN M. (2008), “What is the Online Public Sphere Good For?”, in TUROW J., TSUI L. (eds), The Hyperlinked Society, Ann Arbor, University of Michigan Press.
  • HINDMAN M. (2009), The Myth of Digital Democracy, Princeton, Princeton University Press.
  • HINDMAN M., TSIOUTSIOULIKLIS K., JOHNSON J. A. (2003), “’Googlearchy’: How a Few Heavily-Linked Sites Dominate Politics on the Web”, Paper presented at the annual meeting of the Midwest Political Science Association.
  • IPPOLITA (2011), Le côté obscur de Google, Paris, Rivages.
  • KLEINBERG J. (1998), “Authoritative Sources in a Hyperlinked Environment”, Proceedings of the 9th ACM-SIAM Symposium on Discrete Algorithms. Also appears as IBM Research Report RJ 10076, May 1997.
  • KLEINBERG J., LAWRENCE S. (2001), “The Structure of the Web”, Science, 294, November.
  • KYROU A. (2010), Google God. Big Brother n’existe pas, il est partout, Paris, Inculte.
  • LANDEMORE H., “La raison démocratique : Les mécanismes de l’intelligence collective en politique”, Raisons Publiques, n°12.
  • LANDEMORE H., ELSTER J. (2012) (eds), Collective Wisdom: Principles and Mechanisms, Cambridge, Cambridge University Press.
  • LANGVILLE A. N., MEYER C. D. (2006), Google’s PageRank and Beyond: The Science of Search Engine Rankings, Princeton, Princeton University Press.
  • LÉVY P. (1991), “L’hypertexte, instrument et métaphore de la communication”, Réseaux, n°46-47, pp. 59-68.
  • LEVY S. (2011), In the Plex. How Google Thinks, Works and Shapes our Lives, New York, Simon & Schuster.
  • MAYER K. (2009), “On the Sociometry of Search Engines. A Historical Review of Methods”, in BECKER K., STALDER F. (eds), Deep Search. The Politics of Search Beyond Google, Innsbruck, StudienVerlag, pp. 54-72.
  • MENELL P. S. (2012), “Google, PageRank and Symbiotic Technological Change”, UC Berkeley Public Law Research Paper, n° 2136185, August 26.
  • MERTON R. (1977), “The Sociology of Science: An Episodic Memoir”, in MERTON R. K., GASTON J. (eds), The Sociology of Science in Europe, Southern Illinois University Press, Carbondale, pp. 3-141.
  • MERTON R. K. (1968), “The Matthew Effect in Science”, Science, vol. 159, n° 3810, pp. 56-63.
  • MERTON R. K. (1996, 1st ed. 1957), “The Reward System of Science (1957)”, in MERTON R., SZTOMPKA P. (eds), On Social Structure and Science, Chicago, Chicago University Press, pp. 286-304.
  • MOULLIER-BOUTANG Y., REBISCOUL A. (2009), “Peut-on faire l’économie de Google ?”, Multitudes, n°36, pp. 83-93.
  • MOWSHOWITZ A., KAWAGUCHI A. (2002), “Bias on the Web”, Communications of the ACM, vol. 45, n° 9.
  • ORIGGI G. (2008), “Sagesse en réseaux : la passion d’évaluer”, La Vie des Idées, 30 September.
  • PAGE L. (1998), “Method for Node Ranking in a Linked Database”, Patent #6285999, 9 January.
  • PAGE S. (2006), The Difference. How the Power of Diversity Creates Better Groups, Firms, Schools and Societies, Princeton, Princeton University Press.
  • PANDURANGAN G., RAGHAVAN P., UPFAL E. (2006), “Using PageRank to Characterize Web Structure”, Internet Mathematics, vol. 3, n° 1, pp. 1-20.
  • PASQUINELLI M. (2009), “Google’s PageRank. Diagram of the Cognitive Capitalism and Rentier of the Common Intellect”, in BECKER K., STALDER F. (eds), Deep Search. The Politics of Search beyond Google, Innsbruck, StudienVerlag.
  • PENNOCK D. M., FLAKE G. W., LAWRENCE S., GLOVER E. J., GILES C. L. (2002), “Winners Don’t Take All: Characterizing the Competition for Links on the Web”, Proceedings of the National Academy of Sciences, vol. 99, n° 8, April, pp. 5207-5211.
  • PINSKI G., NARIN F. (1976), “Citation Influence for Journal Aggregates of Scientific Publications”, Information Processing and Management, 12, pp. 297-312.
  • PORTER T. M. (1995), Trust in Numbers. The Pursuit of Objectivity in Science and Public Life, Princeton, Princeton University Press.
  • POTPIEGIRL (2011), “Google Raters. Who Are They?”, PotPieGirl.com, 17 November.
  • RIEDER B. (2012), “What Is in PageRank? A Historical and Conceptual Investigation of a Recursive Status Index”, Computational Culture. A Journal of Software Studies, n°2, 28 September.
  • SCHMIDT E. (2004), Keynote Address at the 2004 Conference on Entrepreneurship, Stanford University Graduate School of Business.
  • SHEFF D. (2004), “Playboy Interview: Google Guys”, Playboy, vol. 51, n°9, September, pp. 55-60.
  • SINGHAL A. (2008), “Introduction to Google Ranking”, Official Google Blog, 9 July.
  • SUNSTEIN C. R. (2006), Infotopia. How Many Minds Produce Knowledge, New York, Oxford University Press.
  • SUROWIECKI J. (2008), La sagesse des foules, Paris, Jean-Claude Lattès.
  • VAIDHYANATHAN S. (2011), The Googlization of Everything (and Why We Should Worry), Berkeley, University of California Press.
  • VISE D. A., MALSEED M. (2006), Google Story. Enquête sur l’entreprise qui est en train de changer le monde, Paris, Dunod.
  • WOUTERS P. (1999), The Citation Culture, doctoral thesis, University of Amsterdam.
  • WOUTERS P. (2006), “Aux origines de la scientométrie. La naissance du Science Citation Index”, Actes de la recherche en sciences sociales, n° 164.
  • ZITTRAIN J., EDELMAN B. (2002), “Localized Google Search Result Exclusions”, Berkman Center for Internet & Society at Harvard Law School, 26 October.

Published online: 14 October 2013

Notes

  • [1]
    This work was carried out under the ANR project “Algorithm Policy” (ALGOPOL – ANR 2012 CORD 01804).
  • [2]
    See the article by D. Pontille and D. Torny in this issue.
  • [3]
    Video of a conference by Sergey Brin at Berkeley University’s School of Information on 3 October 2005: UCBerkeley, “SIMS 141 – Search, Google, and Life: Sergey Brin – Google”, YouTube, 20 August 2007.
  • [4]
    For a more comprehensive discussion on this topic see Cardon (2013).
  • [5]
    Particularly the article by Farrell and Drezner (2008), whose findings were challenged by Hindman (2008).
  • [6]
    Goodin (2003) supports a “disjointed deliberation” model, whereby a plurality of conversation circles can circulate the expectations of their debates between one another owing to certain members’ co-belonging to different groups.
  • [7]
    Google Inc., Letter from the Founders: “An Owner’s Manual” for Google’s Shareholders, in Forms S-1 Registration Statement Under the Securities Act of 1933.
  • [8]
    More or less surprisingly for an IT article, the founders of Google slipped a citation from Ben H. Bagdikian’s book The Media Monopoly into the bibliography of the WWW conference’s most famous paper (Brin & Page, 1998). This book severely denounced journalistic biases and the effects of economic concentration on the pluralism of the press.
  • [9]
    Google Inc., “Why We Sell Advertising, Not Results”, Google.com, 2004 [http://www.google.com/honestresults.html].
  • [10]
    Singhal (A.), “Introduction to Google ranking”, Official Google Blog, 9 July 2008.
  • [11]
    Webmaster Tools, “Webmaster Guidelines”, http://www.google.com/support/webmasters/bin/answer.py?answer=35769
  • [12]
    Google Webmaster Central, “More guidance on building high-quality sites”: http://googlewebmastercentral.blogspot.gr/2011/05/more-guidance-on-building-high-quality.html
  • [13]
    All these citation excerpts are taken from Google Webmaster Central, “More guidance on building high-quality sites”: http://googlewebmastercentral.blogspot.gr/2011/05/more-guidance-on-building-high-quality.html
  • [14]
    However the superposition of the authority and audience rankings only applies for the head of the results lists. They can be distinguished as soon as we leave the top of the rankings or the keywords become more complex and less central (Pennock et al., 2002).
  • [15]
    See Citton (2013) on the distinction between apparatuses and machines.
  • [16]
    On the claim of learning algorithms to governing reality by embracing its every dimension, see the article by T. Berns and A. Rouvroy in this issue.