edainworks.com :: VGR :: testing the methods for changing an uploaded file's name for successful URI, DB and filesystem use.

The Problem : when a user submits a local file via HTTP upload, the file name usually has to be transformed so that it causes no trouble when used to display the document or image (in a URI, so think urlencode(), huh ? ;-), when stored in the DB (so think addslashes(), huh ? ;-) and when stored in the filesystem (whatever it is : a Linux filesystem, NTFS, FAT... which may or may not accept spaces, parentheses, accented characters, quotes, the % sign...).

The Symptom : broken images, filesystem errors, files stored under a merely urlencode()d name that can no longer be retrieved...

the problematic filename = 'Grand-mère à 98 ans (toto).jpg'

Examples of filename (string) values that do not satisfy the browser, the DB and the filesystem all at once :
'Grand-Mère 001.jpg' (filename as is), 'Grand-M%E8re 001.jpg' (urlencoded, with %20 turned back into a space), 'Mister_Andr%E9_2_1937.jpg' (urlencoded), urlencoded with '+' replaced by %20 or the reverse... etc. ad nauseam
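To make the encoding variants listed above concrete, here is a small sketch of what PHP's two URL-encoding functions produce for such a name. The %E8 in the examples suggests the page was handling ISO-8859-1 bytes; that is an assumption, so both the single-byte and the UTF-8 form of 'è' are shown.

```php
<?php
// 'è' as one ISO-8859-1 byte (assumed to be the encoding behind the %E8 above).
$latin1 = "Grand-M\xE8re 001.jpg";
// The same character as a two-byte UTF-8 sequence.
$utf8 = "Grand-M\xC3\xA8re 001.jpg";

echo urlencode($latin1), "\n";    // Grand-M%E8re+001.jpg     (space becomes '+')
echo rawurlencode($latin1), "\n"; // Grand-M%E8re%20001.jpg   (space becomes '%20')
echo urlencode($utf8), "\n";      // Grand-M%C3%A8re+001.jpg  (two bytes, two escapes)
```

The same visible filename therefore has at least three encoded spellings, which is why a stored name that was only URL-encoded is hard to find again later.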
Method 1 : preparar_nom_archivo(thefile)
Grand-mere-a-98-ans-toto.jpg
page generated in 0.069 ms

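This method produces good results and is the best so far. Note, however, that characters absent from its replacement table (ï, ä, ü, Ö, Ä...) are removed rather than transliterated : see test filenames 0, 4 and 6 below.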

Let's measure performance now :
looping through 10 filenames 100 times...
test filename 0 : 'Georges Héroïnimums (1855-décédé).jpeg' > 'Georges-Heronimums--1855-decede.jpeg'
test filename 1 : 'Grand-mère à 98 ans (toto).jpg' > 'Grand-mere-a-98-ans-toto.jpg'
test filename 2 : 'quoted' and accéntuatèd spaced str ng.jpg' > 'quoted-and-accentuated-spaced-str-ng.jpg'
test filename 3 : 'fully+urlencoded%38file%20name.jpg' > 'fullyurlencoded38file20name.jpg'
test filename 4 : 'some more accents éàù with tremas äïü.jpg' > 'some-more-accents-eau-with-tremas-.jpg'
test filename 5 : 'a French déprécié word ( try obligé ).jpg' > 'a-French-deprecie-word--try-oblige-.jpg'
test filename 6 : 'a ÖÄÀ (nordic) try.jpg' > 'a-A-nordic-try.jpg'
test filename 7 : 'I'm also a+file%20name%20withéàèaccents.jpg' > 'Im-also-afile20name20witheaeaccents.jpg'
test filename 8 : 'anyone éssaie el niño (spanish).jpg' > 'anyone-essaie-el-nino-spanish.jpg'
test filename 9 : 'éàdd ç§tt ()_ .jpg' > 'eadd-ctt-_-.jpg'

page generated in 2.028 ms

code :
function preparar_nom_archivo($nom_archivo) { // Dany Alejandro Cabrera (20-Mar-2009 02:03) http://www.php.net/manual/fr/function.str-replace.php
    $arr_busca = array(' ','á','à','â','ã','ª','Á','À',
    'Â','Ã', 'é','è','ê','É','È','Ê','í','ì','î','Í',
    'Ì','Î','ò','ó','ô', 'õ','º','Ó','Ò','Ô','Õ','ú',
    'ù','û','Ú','Ù','Û','ç','Ç','Ñ','ñ');
    $arr_susti = array('-','a','a','a','a','a','A','A',
    'A','A','e','e','e','E','E','E','i','i','i','I','I',
    'I','o','o','o','o','o','O','O','O','O','u','u','u',
    'U','U','U','c','C','N','n');
    $nom_archivo = trim(str_replace($arr_busca, $arr_susti, $nom_archivo));
    return preg_replace('/[^A-Za-z0-9\_\.\-]/', '', $nom_archivo);
    //VGR27042011 MOD, some moron deprecated ereg_replace() in PHP 5.3... was : return ereg_replace('[^A-Za-z0-9\_\.\-]', '', $nom_archivo);
} 
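The page does not show how its timings ("looping through 10 filenames 100 times") were taken. The following is a minimal sketch of one way to do it; time_sanitizer() is a hypothetical helper, and the closure passed to it is only a stand-in for whichever method is being measured.

```php
<?php
// Hypothetical harness: run a sanitizer over every filename, $rounds times,
// and return the elapsed wall-clock time in milliseconds.
function time_sanitizer(callable $sanitize, array $filenames, int $rounds = 100): float
{
    $start = microtime(true);
    for ($i = 0; $i < $rounds; $i++) {
        foreach ($filenames as $name) {
            $sanitize($name);
        }
    }
    return (microtime(true) - $start) * 1000.0;
}

// Two of the test filenames from this page; the real run used ten.
$filenames = array(
    'Grand-mère à 98 ans (toto).jpg',
    'fully+urlencoded%38file%20name.jpg',
);

// Stand-in sanitizer (same pattern as Method 4); swap in any other method.
$ms = time_sanitizer(
    function ($name) {
        return preg_replace('/[^a-z0-9A-Z_.-]/', '_', $name);
    },
    $filenames
);
printf("page generated in %.3f ms\n", $ms);
```

Absolute numbers depend on the machine and PHP version, so only the ratios between methods are worth comparing.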

Method 2 : strtr(thefile, "àâäéèêëîïôöûùü ()", "aaaeeeeiioouuu___")
Grand-mere_a_98_ans__toto_.jpg
page generated in 0.003 ms

This method is faster (strtr() is a native, single-pass PHP function) but produces worse results, because every character to translate has to be listed in the call itself and none may be forgotten (see the remaining +, ñ, §, ç, etc.).

Let's measure performance now :
looping through 10 filenames 100 times...
test filename 0 : 'Georges Héroïnimums (1855-décédé).jpeg' > 'Georges_Heroinimums___1855-decede_.jpeg'
test filename 1 : 'Grand-mère à 98 ans (toto).jpg' > 'Grand-mere_a_98_ans__toto_.jpg'
test filename 2 : 'quoted' and accéntuatèd spaced str ng.jpg' > 'quoted'_and_accentuated_spaced_str_ng.jpg'
test filename 3 : 'fully+urlencoded%38file%20name.jpg' > 'fully+urlencoded%38file%20name.jpg'
test filename 4 : 'some more accents éàù with tremas äïü.jpg' > 'some_more_accents_eau_with_tremas_aiu.jpg'
test filename 5 : 'a French déprécié word ( try obligé ).jpg' > 'a_French_deprecie_word___try_oblige__.jpg'
test filename 6 : 'a ÖÄÀ (nordic) try.jpg' > 'a_ÖÄÀ__nordic__try.jpg'
test filename 7 : 'I'm also a+file%20name%20withéàèaccents.jpg' > 'I'm_also_a+file%20name%20witheaeaccents.jpg'
test filename 8 : 'anyone éssaie el niño (spanish).jpg' > 'anyone_essaie_el_niño__spanish_.jpg'
test filename 9 : 'éàdd ç§tt ()_ .jpg' > 'eadd_ç§tt_____.jpg'

page generated in 0.303 ms
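The three-argument form of strtr() used in Method 2 translates byte by byte, so it is only safe with a single-byte encoding such as ISO-8859-1. With UTF-8 strings the two-argument (array) form is the usual alternative, because it replaces whole substrings. The sketch below assumes the script file itself is saved as UTF-8; translit_name() and its table are hypothetical and cover only a few of the characters Method 2 missed.

```php
<?php
// Hypothetical helper: array form of strtr(), safe for multi-byte keys.
function translit_name(string $name): string
{
    static $map = array(
        'à' => 'a', 'â' => 'a', 'ä' => 'a',
        'é' => 'e', 'è' => 'e', 'ê' => 'e', 'ë' => 'e',
        'î' => 'i', 'ï' => 'i', 'ô' => 'o', 'ö' => 'o',
        'û' => 'u', 'ù' => 'u', 'ü' => 'u',
        // Characters that Method 2 left untouched:
        'ñ' => 'n', 'ç' => 'c', 'À' => 'A', 'Ä' => 'A', 'Ö' => 'O',
        ' ' => '_', '(' => '_', ')' => '_',
    );
    return strtr($name, $map);
}

echo translit_name('Grand-mère à 98 ans (toto).jpg'), "\n"; // Grand-mere_a_98_ans__toto_.jpg
echo translit_name('a ÖÄÀ (nordic) try.jpg'), "\n";         // a_OAA__nordic__try.jpg
```

The weakness noted for Method 2 remains: anything missing from the table passes through unchanged.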

Method 3 : preg_replace('/[^a-z0-9A-Z_-]/', '_', thefile)
Grand-m_re___98_ans__toto__jpg
page generated in 0.021 ms

This method is fast but produces bad results (too many characters are replaced, including the dot before the extension) unless you extend the allowed character set as the first function does.

Let's measure performance now :
looping through 10 filenames 100 times...
test filename 0 : 'Georges Héroïnimums (1855-décédé).jpeg' > 'Georges_H_ro_nimums___1855-d_c_d___jpeg'
test filename 1 : 'Grand-mère à 98 ans (toto).jpg' > 'Grand-m_re___98_ans__toto__jpg'
test filename 2 : 'quoted' and accéntuatèd spaced str ng.jpg' > 'quoted__and_acc_ntuat_d_spaced_str_ng_jpg'
test filename 3 : 'fully+urlencoded%38file%20name.jpg' > 'fully_urlencoded_38file_20name_jpg'
test filename 4 : 'some more accents éàù with tremas äïü.jpg' > 'some_more_accents_____with_tremas_____jpg'
test filename 5 : 'a French déprécié word ( try obligé ).jpg' > 'a_French_d_pr_ci__word___try_oblig____jpg'
test filename 6 : 'a ÖÄÀ (nordic) try.jpg' > 'a______nordic__try_jpg'
test filename 7 : 'I'm also a+file%20name%20withéàèaccents.jpg' > 'I_m_also_a_file_20name_20with___accents_jpg'
test filename 8 : 'anyone éssaie el niño (spanish).jpg' > 'anyone__ssaie_el_ni_o__spanish__jpg'
test filename 9 : 'éàdd ç§tt ()_ .jpg' > '__dd___tt______jpg'

page generated in 0.916 ms

Method 4 : preg_replace('/[^a-z0-9A-Z\_\.\-]/', '_', thefile)
Grand-m_re___98_ans__toto_.jpg
page generated in 0.019 ms

This method produces very bad results : the dot before the extension survives, but every accented character still becomes '_'.

Let's measure performance now :
looping through 10 filenames 100 times...
test filename 0 : 'Georges Héroïnimums (1855-décédé).jpeg' > 'Georges_H_ro_nimums___1855-d_c_d__.jpeg'
test filename 1 : 'Grand-mère à 98 ans (toto).jpg' > 'Grand-m_re___98_ans__toto_.jpg'
test filename 2 : 'quoted' and accéntuatèd spaced str ng.jpg' > 'quoted__and_acc_ntuat_d_spaced_str_ng.jpg'
test filename 3 : 'fully+urlencoded%38file%20name.jpg' > 'fully_urlencoded_38file_20name.jpg'
test filename 4 : 'some more accents éàù with tremas äïü.jpg' > 'some_more_accents_____with_tremas____.jpg'
test filename 5 : 'a French déprécié word ( try obligé ).jpg' > 'a_French_d_pr_ci__word___try_oblig___.jpg'
test filename 6 : 'a ÖÄÀ (nordic) try.jpg' > 'a______nordic__try.jpg'
test filename 7 : 'I'm also a+file%20name%20withéàèaccents.jpg' > 'I_m_also_a_file_20name_20with___accents.jpg'
test filename 8 : 'anyone éssaie el niño (spanish).jpg' > 'anyone__ssaie_el_ni_o__spanish_.jpg'
test filename 9 : 'éàdd ç§tt ()_ .jpg' > '__dd___tt_____.jpg'

page generated in 0.79 ms
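The results above suggest combining the two ideas: transliterate first (as in Methods 1 and 2), then apply a whitelist (as in Methods 3 and 4). The sketch below is not one of the four methods tested on this page; sanitize_upload_name() is a hypothetical name, its table is deliberately short, and the script file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical helper combining a transliteration table with a whitelist.
function sanitize_upload_name(string $name): string
{
    // Step 1: transliterate known accented characters (extend as needed).
    static $map = array(
        'à' => 'a', 'â' => 'a', 'ä' => 'a',
        'é' => 'e', 'è' => 'e', 'ê' => 'e', 'ë' => 'e',
        'î' => 'i', 'ï' => 'i', 'ô' => 'o', 'ö' => 'o',
        'ù' => 'u', 'û' => 'u', 'ü' => 'u',
        'ç' => 'c', 'ñ' => 'n',
    );
    $name = strtr($name, $map);

    // Step 2: replace each run of characters outside the whitelist with one '_'.
    $name = preg_replace('/[^A-Za-z0-9_.-]+/', '_', $name);

    // Step 3: collapse repeated '_' and trim them from both ends.
    $name = preg_replace('/_{2,}/', '_', $name);
    return trim($name, '_');
}

echo sanitize_upload_name('Grand-mère à 98 ans (toto).jpg'), "\n";      // Grand-mere_a_98_ans_toto_.jpg
echo sanitize_upload_name('anyone éssaie el niño (spanish).jpg'), "\n"; // anyone_essaie_el_nino_spanish_.jpg
```

Unknown characters degrade to a single '_' instead of disappearing or leaking into the stored name, at the cost of one extra preg_replace() pass.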

9. Conclusion

Conclusion : performance is not the same for the four methods tried here. The cleanest one is probably the first, but the most efficient in PHP 5.3+ is now strtr(), because preg_replace() seems slower than ereg_replace(), which has been deprecated (for no good reason, IMHO).
Best regards,

VGR27042011 REM some moron deprecated ereg_replace() in PHP 5.3... So I had to replace all the ereg_replace() calls with preg_replace(). If you use a regexp, all you have to do is make sure the regexp delimiter is present at the first and last positions of the pattern string, and escaped wherever it also occurs inside the pattern. Example : '[a-z]' should become '/[a-z]/'.
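As an illustration of that conversion, here is a minimal sketch. ereg_to_preg_pattern() is a hypothetical helper; it only handles simple patterns such as the character classes used on this page, and assumes the pattern contains no '/' that is already escaped.

```php
<?php
// Hypothetical helper: wrap a simple ereg-style pattern in '/' delimiters
// so it can be passed to the preg_*() functions.
function ereg_to_preg_pattern(string $eregPattern): string
{
    // Escape the delimiter wherever it appears inside the pattern itself.
    return '/' . str_replace('/', '\/', $eregPattern) . '/';
}

$pattern = ereg_to_preg_pattern('[^A-Za-z0-9_.-]');
echo $pattern, "\n";                                 // /[^A-Za-z0-9_.-]/
echo preg_replace($pattern, '', 'a b(c).jpg'), "\n"; // abc.jpg
```

More complex POSIX patterns can need further changes, so each converted call should be checked against known inputs.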

Vincent Graux (VGR) for European Experts Exchange and Edaìn Works
Last update 2023-12-21 12:38:15