Tips on making a website easy to archive

The following tips have been put together to help you archive your First World War project website. I’d encourage you to share links to your websites and any tips you might have about how you made your site accessible and archiveable.

Permissions

As you create your website, you will need to make sure that you are legally allowed to use all the content on your website, such as text, film or images. This means that you might need to get permission from the copyright holder, or check that the content has a licence that lets you use it.

By signing up to our standard terms of grant, your organisation has granted HLF a licence to use your website, and we are granting a licence to the British Library to copy, store and share it.

By archiving your site, the British Library does not gain copyright or any other rights over the material on your site. The copyright in the site and all works on it remains with the original authors.

While you create your website, we recommend that you read HLF’s good-practice guidance, Using digital technology in heritage projects, and in particular Appendix 5 – Copyright. You can also read about our digital requirements in the standard terms of grant for your grants programme, and in the Digital outputs appendix of the programme guidance.

Making your website archiveable

The British Library will use a web crawler to read and archive your site. A web crawler is a piece of software that automatically reads web pages and follows the links between them; it is the same kind of software that search engines use to index the internet.
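
To give a flavour of how this works, here is a minimal sketch of a crawler in Python. It is only an illustration, not the British Library’s actual software: it fetches a page, collects the links found in the raw HTML, and follows those that stay on the same domain. The starting address is a placeholder.

    from html.parser import HTMLParser
    from urllib.parse import urljoin, urlparse
    from urllib.request import urlopen

    class LinkCollector(HTMLParser):
        """Collects the href of every <a> tag found in a page's HTML."""
        def __init__(self):
            super().__init__()
            self.links = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        self.links.append(value)

    def crawl(start_url, max_pages=50):
        """Visit pages under the same domain as start_url, breadth-first."""
        domain = urlparse(start_url).netloc
        to_visit, seen = [start_url], set()
        while to_visit and len(seen) < max_pages:
            url = to_visit.pop(0)
            if url in seen:
                continue
            seen.add(url)
            try:
                html = urlopen(url).read().decode("utf-8", errors="replace")
            except OSError:
                continue  # skip pages that cannot be fetched
            collector = LinkCollector()
            collector.feed(html)
            for href in collector.links:
                absolute = urljoin(url, href)
                if urlparse(absolute).netloc == domain:  # stay on the same domain
                    to_visit.append(absolute)
        return seen

    # Placeholder address:
    # crawl("https://hlffirstworldwar.com/")

A real archiving crawler also saves a copy of every page it fetches; this sketch only records which addresses it could reach.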

If your website is complex, a web crawler may not be able to read it. The tips below will help you build a website that a web crawler can read, so that as much of your website as possible is saved in the UK Web Archive.

Make sure each page of your site can be directly linked to and that the URL (in the address bar of the browser) doesn’t need parameters. If the URL contains a “?” character then it is using parameters, and the web crawler may not be able to distinguish between different pages.
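
If you (or someone technical on your project) want a quick way to check this, everything after the “?” in a URL is its query string, and it can be inspected with a few lines of Python. The page addresses below are placeholders.

    from urllib.parse import urlparse

    def uses_parameters(url):
        """Return True if the URL relies on a query string (the part after '?')."""
        return bool(urlparse(url).query)

    # A crawler-friendly address, one per page:
    print(uses_parameters("https://hlffirstworldwar.com/stories/private-smith"))  # False
    # An address driven by parameters, which the crawler may not tell apart:
    print(uses_parameters("https://hlffirstworldwar.com/index.php?page=12"))      # True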

Be aware that a web crawler can only read information that is available when a web page first loads. It won’t be able to read anything that is added or revealed afterwards. For example, if a web page has a ‘read more’ button, the web crawler will not be able to read the text behind this button.
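
A rough way to check what the crawler will see is to fetch the raw HTML of a page, before any buttons are clicked or scripts run, and look for a sample of the text you care about. Here is a sketch in Python; the address and phrase are placeholders.

    from urllib.request import urlopen

    def visible_to_crawler(url, phrase):
        """Return True if the phrase appears in the raw HTML the crawler fetches.

        Text that only appears after a 'read more' button is clicked, or that is
        loaded later by JavaScript, will not be in this initial HTML.
        """
        html = urlopen(url).read().decode("utf-8", errors="replace")
        return phrase in html

    # Placeholder example:
    # visible_to_crawler("https://hlffirstworldwar.com/stories/private-smith",
    #                    "He enlisted in August 1914")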

Bear in mind that content that depends on a login or a user session won’t be archived. This kind of content is known as ‘dynamic’ content: it can’t be read by a web crawler and can’t be recreated in an archived version of the website.

Put all your content on one website with one domain name and don’t redirect users to another domain name. The web crawler will be told to read content under a certain domain name, and if it is redirected to another domain name, it will stop reading. For example, if your domain name is 'hlffirstworldwar.com', then don’t redirect users to 'hlffirstworldwar.tumblr.com'.
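
If you are not sure whether an address redirects visitors to a different domain, you can compare the address you ask for with the address you end up at. A rough Python sketch; the page address is a placeholder.

    from urllib.parse import urlparse
    from urllib.request import urlopen

    def redirects_off_domain(url):
        """Return True if fetching the URL ends up on a different domain name."""
        response = urlopen(url)  # any redirects are followed automatically
        return urlparse(response.geturl()).netloc != urlparse(url).netloc

    # Would return True for a page that sends visitors from
    # hlffirstworldwar.com to hlffirstworldwar.tumblr.com:
    # redirects_off_domain("https://hlffirstworldwar.com/gallery")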

Be aware that if you embed content from a third-party service in a web page, the web crawler is unlikely to be able to read it. Examples of embedding services include Flickr, Scribd, SlideShare, Storify and SoundCloud. (This does not mean, however, that you should not embed such services in your web pages if they are important to your project.)
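
To get a rough list of the third-party services a page embeds, you can look for the ‘iframe’ and ‘embed’ tags in its HTML and note which domains they point to. A sketch in Python; the page address is a placeholder.

    from html.parser import HTMLParser
    from urllib.parse import urljoin, urlparse
    from urllib.request import urlopen

    class EmbedFinder(HTMLParser):
        """Collects the src of <iframe> and <embed> tags, which usually carry third-party content."""
        def __init__(self):
            super().__init__()
            self.sources = []

        def handle_starttag(self, tag, attrs):
            if tag in ("iframe", "embed"):
                for name, value in attrs:
                    if name == "src" and value:
                        self.sources.append(value)

    def embedded_domains(url):
        """Return the set of domains a page embeds content from."""
        html = urlopen(url).read().decode("utf-8", errors="replace")
        finder = EmbedFinder()
        finder.feed(html)
        return {urlparse(urljoin(url, src)).netloc for src in finder.sources}

    # Placeholder example:
    # embedded_domains("https://hlffirstworldwar.com/gallery")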