The last time I had to restore files from backups, I got an unpleasant surprise: about 80 files, roughly 10% of the total, did not display on the website.
A quick look into the issue revealed that all of those files had exotic, non-ASCII characters in their names. Django does not convert uploaded Unicode filenames, and all modern operating systems support them, as does Amazon S3. The problem was that the sync scripts used to back up the files did not handle Unicode filenames well.
Fortunately, changing Django's default file storage so that it saves all files under ASCII-only names takes only two steps:
1. Subclass the default FileSystemStorage:
    import unicodedata

    from django.core.files.storage import FileSystemStorage


    class ASCIIFileSystemStorage(FileSystemStorage):
        """
        Convert unicode characters in name to ASCII characters.
        """

        def get_valid_name(self, name):
            # Decompose accented characters (NFKD), drop whatever is not ASCII,
            # then decode back to text so the parent class receives a str.
            name = unicodedata.normalize('NFKD', name).encode('ascii', 'ignore').decode('ascii')
            return super(ASCIIFileSystemStorage, self).get_valid_name(name)
2. Tell Django to use ASCIIFileSystemStorage as the default storage by adding this to settings.py:
DEFAULT_FILE_STORAGE = 'utils.storage.ASCIIFileSystemStorage'
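A note for newer Django versions: DEFAULT_FILE_STORAGE was deprecated in Django 4.2 in favor of the STORAGES setting and removed in Django 5.1. A rough equivalent of the line above might look like the sketch below. It assumes the class still lives at utils.storage, and that you want to keep Django's stock static files backend, since defining STORAGES replaces the whole default dictionary.

```python
# settings.py (Django 4.2 and later)
STORAGES = {
    # Storage used for uploaded files (FileField, ImageField).
    "default": {
        "BACKEND": "utils.storage.ASCIIFileSystemStorage",
    },
    # Assumption: keep Django's default static files backend unchanged.
    "staticfiles": {
        "BACKEND": "django.contrib.staticfiles.storage.StaticFilesStorage",
    },
}
```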
More information about file storage is available in the Django documentation.
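To see what the normalization step in get_valid_name does to a filename, here is a small standalone sketch that runs without Django. The helper name to_ascii and the sample filenames are made up for illustration. It also shows a caveat worth knowing: characters that have no ASCII decomposition are dropped silently, so two different uploads can end up with the same ASCII name.

```python
import unicodedata


def to_ascii(name):
    """Hypothetical helper mirroring the conversion done in the storage class."""
    # NFKD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize('NFKD', name)
    # Encoding with 'ignore' drops the combining marks and any other
    # non-ASCII character; decoding turns the bytes back into text.
    return decomposed.encode('ascii', 'ignore').decode('ascii')


print(to_ascii('café.txt'))   # cafe.txt
print(to_ascii('łódź.png'))   # odz.png ('ł' has no decomposition, so it is dropped)
print(to_ascii('文档.pdf'))    # .pdf (nothing in the stem survives)
```

If many of your uploads have names in non-Latin scripts, a transliteration step or a generated name may serve better than dropping characters.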