Handling broken links in historical documents

Here’s an interesting issue we’ve been debating this week that I wanted to throw out to the crowd.
You have an HTML version of an old Annual Report, or some other content that exists in both print and digital formats, and it includes a link to content that is no longer available. What should you do with it?

A few other criteria for this little scenario:

  • This is a historical document that was correct at the time of publishing
  • If you change the document it will no longer reflect how it was when it was published
  • We’re assuming that the content it was linking to hasn’t just moved; it is no longer available at all.

Our suggestions were:

  • Leave it as-is with a known broken link. Users will likely get frustrated by this, but they can still see what the URL was supposed to be and use archival tools to try to dig up a version if they need it.
  • Leave the link as-is, but intercept it with some JavaScript to display a prompt or message stating that we know this resource is no longer available.
  • Remove the link. Removes the possibility of frustration from the broken link, but depending on context the intent of the sentence may be changed.
  • Rewrite or remove the sentence. The broken link is gone, but so is the original intent of the sentence. The HTML and printed versions no longer match up.

Please let us know how you think this could be resolved!

From memory, the issue of what to do with old documents came up after the March 1996 federal election, at the Commonwealth Internet Reference Group (CIRG). We decided to label the old documents as being for archival purposes only, and leave the broken links. Fixing the links is just too hard. I would have put a record of this on file at the Department of Defence, a paper file, which may be in a dusty basement somewhere. :wink:

Option 5: remove the HTML versions of old annual reports. We do this with ours as new ones come out. We rely on a PDF and a (mostly) accessible alternative as the ‘archive’ copy. Our reasoning was that very few people want this old information and we definitely don’t want to be maintaining it or migrating it. The broken links are less obvious in documents.

I’ve run across this scenario in media releases. We’ve removed the link when it wouldn’t break the experience. You could remove the link and insert an appropriately styled editor’s note such as [This link is no longer available].
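For anyone wanting to try the editor’s note approach in bulk, here is a minimal sketch of how it might look. It assumes simple, well-formed anchors with double-quoted `href` values and a list of URLs you already know are dead. The function name `unlink_dead`, the `editors-note` class and the example.org addresses are made up for illustration; a proper HTML parser would be more robust than a regular expression on real pages.

```python
import re
from html import escape

NOTE = "[This link is no longer available]"


def unlink_dead(html: str, dead_urls: set, note: str = NOTE) -> str:
    """Replace anchors that point at known-dead URLs with their link text
    followed by an editor's note. Other links are left untouched."""
    anchor = re.compile(
        r'<a\b[^>]*\bhref="([^"]*)"[^>]*>(.*?)</a>',
        re.IGNORECASE | re.DOTALL,
    )

    def swap(match):
        href, text = match.group(1), match.group(2)
        if href not in dead_urls:
            return match.group(0)  # live link: keep the original markup
        # Keep the visible words, drop the link, add the styled note.
        return f'{text} <span class="editors-note">{escape(note)}</span>'

    return anchor.sub(swap, html)


page = (
    '<p>See the <a href="http://example.org/old">2010 plan</a> '
    'and <a href="http://example.org/live">this page</a>.</p>'
)
print(unlink_dead(page, {"http://example.org/old"}))
```

The sentence keeps its wording, so the HTML still reads the same as the printed version apart from the note.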

As @mgdhs mentioned, archive and remove the HTML report, and refer users to webarchive.nla.gov.au to view historical archived content.
This is something I did at my previous agency.

Thanks for the ideas folks!

The general vibe here seems to be that once something gets to a point where its links are no longer available, it should be removed or formally archived, and references to it switched out for links to the archived versions.

I like @AdrianWong’s use of a direct link to the content in the webarchive. But I can see that that will only work when the content that we were linking to was on a government site.

Would we consider doing the same with direct links to the Wayback Machine (https://archive.org/web/web.php)?
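If it helps the discussion, a direct Wayback Machine link is just the original URL prefixed with a timestamp. The sketch below builds one from the document’s publication date, on the assumption that the capture closest to that date is the one a reader would want. The function name `wayback_url` and the example.org address are hypothetical. As far as I know the Wayback Machine redirects a dated link to the nearest capture it holds, but nothing here checks that a capture actually exists.

```python
from datetime import date

WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(original_url: str, published: date) -> str:
    """Build a Wayback Machine link for original_url dated at publication.

    The timestamp is in YYYYMMDD form; this function only formats the URL
    and does not confirm that an archived copy is available.
    """
    stamp = published.strftime("%Y%m%d")
    return f"{WAYBACK_PREFIX}/{stamp}/{original_url}"


print(wayback_url("http://example.org/report", date(2012, 10, 31)))
```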

You could talk to the Pandora team.

We use their service to take snapshots of our Minister’s websites before a change of Minister, and of media releases before we archive them. They do it on demand.

That is an interesting problem!
Personally I do like @AdrianWong’s idea of directing to the webarchive. It doesn’t appear that will work for you if the link points to a non-government resource.

Moving forward we should all think about DTA’s Digital First approach: any PDFs, documents or reports are only supplementary to the content on a website, where the information is more of a “living” document that can be updated.

At Dept. Social Services, we try to follow a similar approach to Stack Overflow of always including snippets or enough context on the site/document itself, before linking away to somewhere outside of our control, to lessen the effect of link rot.

You should also consider the FOI-ability of the content you are archiving/deleting. Not that many people would even bother putting in a request, but important if your archival mechanism makes that process hard. For ministers, we put in redirects to our dedicated former ministers site which makes it a bit easier for everyone.

Those agencies with capacity (or maybe a new project for DTA for WoG) might look at using their own internally hosted link-forwarding/shortening service, like aka.ms or goo.gl, which would alleviate the link rot problem.
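To make the forwarding idea concrete, here is a rough sketch of the lookup table such a service would sit on. Published documents only ever contain the short key, so when a target moves or disappears you update one entry instead of editing the document. Everything here (the `LinkForwarder` class, the `ar-2012` key, the example.org addresses) is invented for illustration; a real service would also need persistent storage and an HTTP redirect in front of it.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class LinkForwarder:
    """In-memory stand-in for a forwarding service's lookup table."""
    targets: Dict[str, str] = field(default_factory=dict)

    def set_target(self, key: str, url: str) -> None:
        # Create the entry, or repoint an existing key at a new URL.
        self.targets[key] = url

    def resolve(self, key: str) -> Optional[str]:
        # Returns None for unknown keys so the caller can show a 404 page.
        return self.targets.get(key)


forwarder = LinkForwarder()
forwarder.set_target("ar-2012", "http://example.org/reports/2012")

# Later the original page is taken down, so the same key is repointed
# at an archived copy. The published report is never edited.
forwarder.set_target(
    "ar-2012",
    "https://web.archive.org/web/2012/http://example.org/reports/2012",
)
print(forwarder.resolve("ar-2012"))
```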

I recommend looking at the Remove content guidance co-created by DTA and the National Archives of Australia on Digital Guides at: https://guides.service.gov.au/content-strategy/remove-content/


@alexandra - the guides on DTA are a great go-to reference but they don’t quite fit for what we’re looking at here. I like that it is endorsing the strategy suggested by a few here of referring people to one of NLA’s archives.

^ this is a good idea too and something I agree we should all be doing. This approach works best for policy and informational content. When we’re dealing with point-in-time content like reports or media releases, it doesn’t seem right to me to go back and edit it after the fact.

I think this discussion has led us to a few ideas/conclusions:

  • Avoid including links to external sources in point-in-time content, or include enough of the linked material that the external link is not necessary and can be removed if it goes away for whatever reason.
  • Where the document itself can’t be changed or edited, swap dead links for links to the same content (at a similar point in time) on the NLA’s Web Archive or Pandora, or for non-government content perhaps the Wayback Machine? Consider flagging this for your users somehow.
  • Where content is out of date, update it or consider removing it entirely using the Digital Guide as a reference.

For something such as an annual report, where it is important to capture the state of the report at a specific point in time, one of the first two suggestions seems to be the way to go. I have opted for the first suggestion in past projects, in order to capture the status quo at a point in time and to allow that trail to be followed and revived if necessary. (We use a simple 404 page with clear text: ‘this is missing’ or ‘this is archived’.) It might be worth adding a date to the ‘archived’ page to help with reviving the trail, so to speak.

For other less significant publications, maybe it is ok to be creative and try different solutions.

I work at a content-rich organisation, and the nature of our work requires a role with high-level expertise to make even minor changes to publications. The web team almost never makes changes to publications or documents, because even removing or changing a link can alter the context or meaning of a sentence. It’s expensive and time-consuming but very necessary.

We tend to be more relaxed and allow more autonomy on the web team when dealing with publications that do not rely on precise and accurate language so much.

My personal preference is the third option: remove the link. That removes the possibility of frustration from the broken link, although depending on context the intent of the sentence may be changed.