The Vanishing Web

"But the idea that Vice may be deleted was not an irrational worry. News websites have been disappeared before: The contents of the website Gawker were deleted after it was successfully sued by Hulk Hogan and its former parent company went bankrupt. (Fortunately, the Freedom of the Press Foundation and the Internet Archive were able to preserve the archive.) Gawker was revived, and then went kaput again; is currently devoid of articles. In January, the not-quite-a-year-old website the Messenger vanished similarly when its owner, billionaire Jimmy Finkelstein, closed up shop and shut down the website with little notice. Go to the site now, and all you’ll see is a white screen with the name of the brand, and a general email address."

Vice remains online as of June, but its last article is from April, and it's clear the writer expected it to be the last. Gawker is still a jokey fake news tip site. (Some new articles have come along since I drafted this, but only after a two-and-a-half-week gap, so something was up.)

'Joshua Keating, a former Messenger reporter who left for a job at Vox before the site shut down, said that multiple decisions by the company’s management have left him without valuable clips. The Messenger bought his previous employer, Grid News, before launching, so now work from two different jobs is mostly inaccessible to the internet—unless you know exactly what to search for on a third-party service. “I have the text of all of my stuff, but it’s frustrating because there’s two years of work that can’t be found on Google,” said Keating, who is also a former editor at Slate.'

Here's my wget command for archiving sites using Windows Subsystem for Linux (WSL). I use Debian, but it should work on any Linux distribution. Replace domainofsite with the domain of the site you want to archive.

wget --mirror --warc-cdx --page-requisites --adjust-extension --convert-links --execute robots=off --directory-prefix=. --span-hosts --domains=domainofsite --restrict-file-names=windows --user-agent="Mozilla (" --wait=5 --random-wait https://domainofsite/
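
If you archive more than one site, it may help to wrap the command in a small function. What follows is only a sketch: mirror_site, DRY_RUN and SITE_UA are names I made up for illustration, and the https:// start URL is an assumption about the site. The flags follow the command above, except that I use --adjust-extension, which is the current name for --html-extension. Setting DRY_RUN prints the command instead of running it, so you can check it before hitting someone's server.

```shell
# mirror_site DOMAIN: build the wget mirror command for one domain.
# mirror_site, DRY_RUN and SITE_UA are hypothetical names for this sketch.
mirror_site() {
  domain=$1

  # Same flags as the one-liner, with the domain filled in.
  set -- wget --mirror --warc-cdx --page-requisites --adjust-extension \
    --convert-links --execute robots=off --directory-prefix=. \
    --span-hosts "--domains=$domain" --restrict-file-names=windows \
    --wait=5 --random-wait

  # Only pass a user agent if the caller supplied one in SITE_UA.
  if [ -n "${SITE_UA:-}" ]; then
    set -- "$@" "--user-agent=$SITE_UA"
  fi

  # Assumption: the site is served over https at its root.
  set -- "$@" "https://$domain/"

  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"   # print the command, do not contact the site
  else
    "$@"        # run wget for real
  fi
}

# Dry run against a placeholder domain: prints the command only.
DRY_RUN=1 mirror_site example.org
```

Drop the DRY_RUN=1 prefix once the printed command looks right.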

To fix asset URLs afterwards, since Windows/NTFS doesn't allow ? characters in file names:

find . -type f -name "*.html" -exec sed -i -E 's/(src|href)=(["\x27])([^"\x27?]*)\?([^"\x27]*)\2/\1=\2\3%3F\4\2/g' {} +
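
Before running a rewrite like this over a whole mirror, I'd try it on a scratch directory first. The sketch below assumes GNU sed (the default on Debian); fix_query_links and demo_dir are hypothetical names, and the two sample tags are invented test input, not output from a real crawl. It replaces the first ? in each src or href value with %3F and leaves links without a query string alone.

```shell
# fix_query_links DIR: escape "?" as "%3F" in src/href values of all
# .html files under DIR. Hypothetical helper; assumes GNU sed (-i, -E, \x27).
fix_query_links() {
  find "$1" -type f -name '*.html' -exec sed -i -E \
    's/(src|href)=(["\x27])([^"\x27?]*)\?([^"\x27]*)\2/\1=\2\3%3F\4\2/g' {} +
}

# Scratch directory with one made-up page, so the real mirror is untouched.
demo_dir=$(mktemp -d)
printf '%s\n' \
  '<link href="style.css?v=3" rel="stylesheet">' \
  "<img src='logo.png'>" > "$demo_dir/index.html"

fix_query_links "$demo_dir"
cat "$demo_dir/index.html"
```

The first line should come out with href="style.css%3Fv=3" and the second should be unchanged. Once that looks right, fix_query_links . does the same thing to the mirror in the current directory.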

This should produce a static archive that you can upload to any static host, like Neocities.

Why Sites Like Gawker and the Messenger Go Blank—and Why It Doesn’t Have to Be This Way
Here’s all it takes to keep articles preserved.