Webmasters, and others maintaining information within the University, need to understand more about how our search service is provided than other users do. The search service is provided by one or other of a pair of servers running the Funnelback search software, accessed via the web user interface on search.cam.ac.uk. Another server is used for test and development work; it does not provide any public facilities, though it may be seen in web server logs making requests to web sites.

All the web-crawling (fetching pages from web sites) and index-building is done by one of the pair of live servers and the other handles search requests received via search.cam.ac.uk.  The indexing server has copies of the latest search index files so that it can take over handling search requests temporarily, if that is required.

The choice of which Funnelback server to use is made by search.cam.ac.uk, depending on their availability. The test/development server does similar web-crawling and indexing, though on a much smaller scale.

User-Agent strings

The User-Agent string used by the Funnelback servers in their HTTP(S) requests to web sites is

University of Cambridge search (search-support@ucs.cam.ac.uk)
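
Webmasters who want to pick these requests out of their own logs could match on that string. The following is a minimal Python sketch, not part of the search service: the function name and the sample values are illustrative assumptions, and a substring match is assumed in case the header carries extra tokens.

```python
# The User-Agent string quoted above, as sent by the Funnelback servers.
CRAWLER_USER_AGENT = "University of Cambridge search (search-support@ucs.cam.ac.uk)"


def is_search_crawler(user_agent: str) -> bool:
    """Return True if a User-Agent header value matches the documented string.

    Hypothetical helper: uses a substring match and treats a missing
    header (None or empty) as "not the crawler".
    """
    return CRAWLER_USER_AGENT in (user_agent or "")


if __name__ == "__main__":
    # Illustrative sample values only.
    samples = [
        "University of Cambridge search (search-support@ucs.cam.ac.uk)",
        "Mozilla/5.0 (X11; Linux x86_64)",
    ]
    for ua in samples:
        print(is_search_crawler(ua), ua)
```

Any client can set its own User-Agent header, so a check like this is best combined with the source addresses listed under "IP addresses and access controls".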

IP addresses and access controls

Web-crawling (fetching web pages for indexing) is normally done by the primary member of the pair of servers providing the live search service, and on a much smaller scale by the test/dev server. The hostnames and IP addresses may change over time, but as of May 2021 are

  • live "internal" web-crawling (from within the CUDN) - the indexing server fb-live1.search.cam.ac.uk = 131.111.8.189
  • test/development "internal" web-crawling - fb-dev1.search.cam.ac.uk = 131.111.8.27

For "external" web-crawling (to find web pages that are accessible to the general public from outside the CUDN, and without authentication), an alternative address is used - 192.153.213.251, having a dummy place-holder name (ext-proxy.web-search.invalid in DNS reverse lookup). The "external" address is actually associated with the University, but is within an address block used for infrastructure, and not advertised as being part of the CUDN. It should not be treated as part of the CUDN (or as *.cam.ac.uk hosts) for access control purposes.
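
As an illustration of how the addresses above might be used in an access-control check, here is a minimal Python sketch. It hard-codes the May 2021 addresses (which may change), and the function names and the policy in `may_see_restricted` are assumptions made for the example, not a recommended configuration.

```python
import ipaddress

# Addresses as listed above (as of May 2021); they may change over time.
INTERNAL_CRAWLERS = {
    ipaddress.ip_address("131.111.8.189"): "fb-live1.search.cam.ac.uk",
    ipaddress.ip_address("131.111.8.27"): "fb-dev1.search.cam.ac.uk",
}
EXTERNAL_CRAWLER = ipaddress.ip_address("192.153.213.251")


def classify_crawler(remote_addr: str) -> str:
    """Classify a client address as 'internal', 'external' or 'other'."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Not a valid IP address at all.
        return "other"
    if addr in INTERNAL_CRAWLERS:
        return "internal"
    if addr == EXTERNAL_CRAWLER:
        return "external"
    return "other"


def may_see_restricted(remote_addr: str) -> bool:
    """Hypothetical policy: only the internal crawlers get CUDN-level access.

    The external address is deliberately excluded, since it should not be
    treated as part of the CUDN for access control purposes.
    """
    return classify_crawler(remote_addr) == "internal"
```

With this sketch, `classify_crawler("192.153.213.251")` returns `"external"`, so `may_see_restricted` refuses it even though the address is associated with the University.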

The previous version of the search engine used a different set of addresses:

  • live "internal" web-crawling (from within the CUDN) - usually the primary server fb1.search.cam.ac.uk = 131.111.8.108 (or, very rarely, the backup server fb2.search.cam.ac.uk = 131.111.8.113). Such traffic might also be seen as coming from fb.search.cam.ac.uk and its address 131.111.8.118.
  • test/development "internal" web-crawling - fbdev.search.cam.ac.uk = 131.111.8.34
  • "external" web-crawling - an alternative pair of addresses was used: 192.153.213.250 and 192.153.213.250

Note that there are some documents that the search engines will not be able to index. There can be many reasons for this (see What can go wrong), but in particular this will be the case for documents that can only be accessed from particular department or college networks, or which are password protected and limit access to specific people. Such documents will not appear in either index and, as a result, will be difficult for University users to find. Wherever possible and appropriate, there are significant advantages to allowing at least the "internal" index to include restricted material, but note that extracts of the document text may be included in search results, to allow its relevance to the search to be assessed.

Last updated: April 2021
