When working on the web, coders and content managers often get the request to ‘post this PDF on our page…’. A wise one will ask, ‘Does this replace an existing file, or is it new? Shall I delete the outdated file or keep it for the archives?’ File management can be a bear for content managers.
Think of a bucket that is mounted so high you cannot see inside it, or one with a lid that hides its contents. Let’s pretend the bucket is the file upload directory in your content management system (CMS). Some CMSs make you upload into a media library first and then copy the file’s link from there so you can link it on your page. Other CMSs have a file uploader right in the page’s edit pane. Every time you upload a .docx, .pdf, or .xlsx document to a page, it also goes into the bucket.
You can show or hide the file from the page, but it still lives in the bucket, out of your sight.
Out of Sight But Not Out of the Engines
Search engines can find those files! Even if you have unlinked the file from the page, it still lives in the bucket, where search engines can index it and keep a stored copy, called a cache.
That is why I do the hard thing and take the longer road. It’s boring and tedious to cherry-pick files for deletion, but if you’d like to save yourself some headaches later, keep your directories clean in the present.
Depending on your workflow, you may need to ask your web host, your coder, or your IT team to delete files for you. Sometimes it is not obvious how to get a printout of what is in that bucket, and sometimes you just plain can’t get a listing of all the files in your upload directory. So each time you replace a file, make sure the old one is either deleted or overwritten.
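If you (or your coder or IT team) do have direct access to the upload directory, a small script can give you that printout. Here is a minimal sketch in Python, not a finished tool. It assumes the bucket is an ordinary folder on disk and that you can put together a list of the file names your pages still link to. The function name `find_orphaned_files`, the example folder path, and the example file names are all made up for illustration.

```python
from pathlib import Path

def find_orphaned_files(upload_dir, linked_names, extensions=(".pdf", ".docx", ".xlsx")):
    """List documents in upload_dir (including subfolders) that no page links to.

    upload_dir   -- path to the upload folder (assumed to be readable from disk)
    linked_names -- file names your pages still link to (you supply this list)
    extensions   -- document types to check; other files are ignored
    """
    # Compare names case-insensitively so "Report.PDF" matches "report.pdf".
    linked = {name.lower() for name in linked_names}
    orphans = []
    for path in sorted(Path(upload_dir).rglob("*")):
        if not path.is_file():
            continue  # skip folders
        if path.suffix.lower() not in extensions:
            continue  # skip images, scripts, and anything else
        if path.name.lower() not in linked:
            orphans.append(path)
    return orphans

# Example use (hypothetical folder and file names):
# for path in find_orphaned_files("/var/www/uploads", ["fee-schedule-2024.pdf"]):
#     print(path)
```

The sketch only reports candidates and deletes nothing, so you can still decide file by file whether to remove something or keep it for the archives.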
Website Emergency!
This came up for me at work this quarter. A user had done a search and found an obsolete file on Google. It posed a fraud risk for the financial division, and they asked to have it removed ASAP. In my workflow, only the person who originally uploaded a file can delete it fully from the server. Since the file was over six years old and nobody knew who had uploaded it, I needed to open a trouble ticket with my host to have it removed.
Next time you get a request to ‘please post this PDF on our page…,’ know that this is only half of the request! Your next question is ‘Shall I delete the old file?’ And that is document management.