
Web Presence and Influence: How Google Rates Leading American Universities in 2006

Joe St Sauver, Ph.D.
Director, User Services and Network Applications
joe@uoregon.edu

Of the 4,236 American colleges and universities[1], only a few are titans in the world of ".edu" websites.

Which domains form the core of the .edu online web? Which have the most clout? We looked at a total of 215 colleges and universities, focusing primarily on large national research universities connected to Internet2, plus some additional institutions of special relevance to Oregon. We think you may be surprised by what we found.

Web Presence

We began by considering "web presence," which for the purpose of this study was simply the raw number of web objects[2] or web pages a university has online. As most folks know, there's no definitive register of all university web pages. We do, however, have something that's not a bad approximation: the set of all web pages known to Google. By doing constrained Google searches using the Google "site:" query modifier, we can limit our search of Google to a particular domain or subdomain and find out how many web pages from that domain or subdomain are known by Google to exist.
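As a minimal sketch of the constrained search described above, the following Python builds the search URL for a "site:" query. The helper name `site_query_url` and the `https://www.google.com/search?q=` URL form are assumptions made for illustration; the page count itself still has to be read off Google's results page by hand.

```python
from urllib.parse import urlencode


def site_query_url(domain: str) -> str:
    """Return a Google search URL restricted to one domain via "site:"."""
    # Hypothetical helper; the search URL form used here is an assumption.
    query = f"site:{domain}"
    return "https://www.google.com/search?" + urlencode({"q": query})


# Build the query for the whole .edu domain and for one school's domain.
print(site_query_url(".edu"))
print(site_query_url("uoregon.edu"))
```

Opening either printed URL in a browser should show Google's estimate of the number of pages it knows about in that domain.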

For example, we can begin by checking to see approximately how many web pages are in the entire .edu domain by Googling for "site:.edu". At the time this article was being written, the answer to that query was a staggering 2.67 billion pages. If those 2.67 billion .edu web pages were uniformly distributed across all 4,236 American colleges and universities, that would imply each school would have about 630,000 web pages. Realistically, however, we know that the distribution of web pages isn't uniform: some schools have millions of web pages, while others may only have a few thousand, or (hard as it may be to believe) none at all.
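The "about 630,000" figure is just the total page count divided by the number of schools. A short sketch of that arithmetic, using only the two numbers quoted above (the variable names are ours):

```python
# Figures quoted in the article; the variable names are illustrative.
TOTAL_EDU_PAGES = 2_670_000_000  # Google's count for "site:.edu"
NUM_SCHOOLS = 4_236              # American colleges and universities, per note [1]

# Average pages per school if the pages were spread uniformly.
average_pages = TOTAL_EDU_PAGES / NUM_SCHOOLS
print(f"{average_pages:,.0f} pages per school")
```

Any school well above that average, such as the UO with its 5.2 million pages, is carrying far more than a uniform share.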

What about the University of Oregon? If you check "site:uoregon.edu", Google will tell you that there are currently a respectable 5.2 million web pages living in the .uoregon.edu domain.[3] That's a lot of web pages! But as large as the UO's web presence may be, there are some institutions that have quietly been building absolutely huge online web presences.[4]

Obviously, we're seeing a tremendous concentration of online information. It is also clear that some institutions, faculty, staff, students, or programs are doing an excellent job of bringing content online. Sheer institutional web page count isn't the only measure worth examining, however. What about "web influence?"

Web Influence

For the purpose of this study, we'll define an influential web page to be one with a high Google PageRank score. In Google's system, each page has a PageRank from 0 to 10, with the most important[5] websites having a PageRank score of 10. For example (not surprisingly), http://www.google.com/ itself is rated a 10, and http://www.nytimes.com/ and http://www.whitehouse.gov/ are each rated a 10 as well. By way of comparison, http://www.yahoo.com/, http://www.msn.com/, http://www.aol.com/, http://www.cnn.com/, and http://thomas.loc.gov -- all popular and important web pages -- have a PageRank score of "only" 9.

Curious about the PageRank of a website we haven't mentioned? Install Google's Toolbar.[6] Once the Google Toolbar has been installed as an add-on to your web browser, the Toolbar will automatically display the PageRank of each web page you visit. For example, if you install the Google Toolbar and test http://www.uoregon.edu/ you'll see the UO receives a very respectable (and typical for a large research university) PageRank score of 8.

What's the PageRank of Other American Universities?

• Two American university home pages have a Google PageRank of 10: MIT and Harvard.

• There are 45 American universities whose home pages have a Google PageRank of 9 or more: the two schools just mentioned, plus Berkeley, Stanford, Washington, Wisconsin, Texas, Illinois, Cornell, Michigan, Yale, Columbia, Penn State, UCLA, Chicago, Maryland, Penn, Minnesota, Princeton, Indiana, Michigan State, UC Davis, UC Irvine, Arizona, UC San Diego, UC Santa Barbara, Carnegie Mellon, North Carolina, Purdue, Duke, Johns Hopkins, Arizona State, Rutgers, Northwestern, USC, Iowa State, Pittsburgh, Iowa, Brown, Washington University at St Louis, Caltech, UC San Francisco, Colorado, Massachusetts, and Florida State.

• PageRank 8 schools (besides the UO) include Case Western, Dartmouth, Emory, GWU, Georgetown, LSU, Missouri, Nebraska, Notre Dame, NYU, Ohio State, Oregon State, RPI, Utah, and many others.

• Many PageRank 7 (and lower) schools are smaller liberal arts colleges, regional colleges, colleges with limited websites, and the like. Test some other college and university sites you're familiar with and see what you think.

A Relationship Between PageRank and Raw Institutional Page Count?

Some of the universities with the largest number of web pages are also among those with the highest PageRank scores. You might be tempted to generalize from that observation to assert that raw institutional page count directly determines a site's PageRank. While that would be delightfully straightforward and might very well inspire a stampede of new page creation as universities attempt to catch up to MIT and Harvard, unfortunately things don't actually work that way.

For example, Berkeley and Stanford (both PageRank 9 schools) each have more pages online than Harvard (a PageRank 10 school). Similarly, there are some PageRank 9 schools which run as low as 3.51 million pages (Florida State), while there are other PageRank 8 schools with over 15 million pages each (such as Vanderbilt, Virginia and Florida).

The two measures, page count and PageRank, while generally positively correlated, are not strictly linked. Influence (as measured by a high PageRank score) requires more than just lots of web pages.
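One way to see that the two measures are not strictly linked is to look for pairs of schools where the one with more pages has the lower PageRank. The sketch below does that for three schools whose page counts and home page PageRank scores are both quoted in this article (the Tufts figures come from the limitations discussion, and its low score reflects a redirected home page). The function name `inversions` and the data layout are our own.

```python
from itertools import combinations

# (school, pages in millions, home page PageRank), all quoted in this article.
schools = [
    ("Oregon", 5.20, 8),
    ("Florida State", 3.51, 9),
    ("Tufts", 6.19, 5),  # PageRank of the page its home page redirects to
]


def inversions(data):
    """Return (bigger, smaller) pairs where more pages goes with a lower PageRank."""
    found = []
    for a, b in combinations(data, 2):
        # Order the pair so that `big` is the school with more pages.
        big, small = (a, b) if a[1] > b[1] else (b, a)
        if big[2] < small[2]:
            found.append((big[0], small[0]))
    return found


for bigger, smaller in inversions(schools):
    print(f"{bigger} has more pages than {smaller} but a lower PageRank")
```

With only three schools this is an illustration rather than a statistical test; the same check could be run over the complete data set.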

Discussion/Limitations

• Web page counts and PageRank values are dynamic. At least some of the sites in this study are currently under active revision, and the values reported in this study may have changed since our data was collected. Fortunately you can always reconfirm page counts and PageRank values for yourself using the approach described above.

• PageRank, like straight "A-B-C-D-F" grading without +'s and -'s, is a relatively crude measurement construct. That is, an "8" home page might be a very high 8--almost a 9--but PageRank will still report it as an 8. Similarly a "9" might be a very low 9, almost an 8, but it will still be reported as a 9 because the Google Toolbar reports only integer-valued PageRank scores.

• For this study, we measured the PageRank of each institution's default home page, which is normally http://www.<domain>.edu/ for that school. Some sites, however, immediately redirect visitors away from that "normal" URL to some other semi-obscure page, a page that is virtually certain to look low-ranked to Google (even though it is effectively the institution's default home page). For example, if you go to http://www.tufts.edu/ you'll currently be redirected to http://www.tufts.edu/main.php?p=flash instead. Although Tufts.edu has 6.19 million web pages, the http://www.tufts.edu/main.php?p=flash "home page" earns a disappointingly low PageRank score of only 5, rather than the more typical 8 or 9 that one would expect for a site of its size.

• Many institutions have institution-only web pages that are not known to Google or accessible to the public; obviously those internal pages are not reflected in what Google sees, counts, and evaluates. Institutions such as MIT that make their courseware broadly available are at a substantial advantage with respect to online presence relative to sites that hold their instructional web pages in a proprietary teaching and learning system such as Blackboard.

• Some institutions may use non-.edu domains as well as .edu domains, or an institution may have (and use!) more than one legacy .edu domain. Our study just looked at the primary .edu domain name for each school.

• Because our focus was largely on Internet2 member schools plus selected additional institutions of special relevance to Oregon, we may have missed one or more high PageRank institutions. If you're a university with a 9 or 10 PageRank and we missed you, please let us know. Please also note that we've not evaluated international universities such as Toronto or McGill in Canada or Cambridge or Oxford in the UK.

• We've made no attempt to dig down and figure out site-by-site what makes up the huge number of pages that some institutions are fielding.[7]
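The integer-only reporting mentioned in the limitations above can be pictured as cutting a finer-grained score down to its whole-number part. The sketch below is purely illustrative: Google does not publish any underlying fractional score, so `toolbar_pagerank` and the sample inputs are hypothetical.

```python
import math


def toolbar_pagerank(underlying_score: float) -> int:
    """Cut a hypothetical fractional score down to the integer a toolbar would show."""
    # Assumption: scores lie between 0 and 10 and only the integer part is reported.
    if not 0 <= underlying_score <= 10:
        raise ValueError("score must be between 0 and 10")
    return math.floor(underlying_score)


# A "very high 8" and a "very low 9" are close together but reported differently.
print(toolbar_pagerank(8.95), toolbar_pagerank(9.05))
```

Two home pages that are nearly indistinguishable in influence can therefore land on opposite sides of an integer boundary.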

Notes:

[1] Chronicle of Higher Education "Almanac Issue," August 26, 2005, page 4.

[2] Technically, some of the online objects indexed by Google don't look much like a traditional web page, but we will colloquially refer to them all as "web pages" for the remainder of this article.

[3] There may be UO web pages that are not known or are inaccessible to Google. For example, most UO Blackboard web pages will not get indexed by Google because they are all access-controlled.

[4] For a copy of the complete data, see http://www.uoregon.edu/~joe/google-data.html

[5] http://www.google.com/technology/

[6] For example, for the UO's recommended web browser, Mozilla Firefox, see http://www.google.com/tools/firefox/toolbar/index.html

[7] We'd love to hear from folks working on major web projects at .edu sites with ten million pages or more known to Google. What are you putting up? Digitized library resources? Open source software project pages? Archives of mailing list postings? Collections of digital images? Student portfolios? We'd love to hear about what's currently up and what's in process.


Spring 2006 Computing News | Computing Center Home Page