edainworks.com :: VGR :: testing the methods for changing an uploaded file's name for successful URI, DB and filesystem use.

The Problem : when a user submits a local file via HTTP upload, the file name usually has to be transformed so that it causes no problem when used to display the document or image (in a URI, so think urlencode(), huh ? ;-), when stored in the DB (so think addslashes(), huh ? ;-) and when stored in the filesystem (whatever it is : a Linux filesystem, NTFS, FAT... which may or may not accept spaces, parentheses, accented characters, quotes, the % sign...).

The Symptom : broken images, filesystem errors, files stored under a merely urlencode()d name that cannot be retrieved...

the problematic filename = 'Grand-mère à 98 ans (toto).jpg'

Examples of filename (string) values that do not satisfy the browser, the DB and the filesystem all at once :
'Grand-Mère 001.jpg' (filename as is), 'Grand-M%E8re 001.jpg' (urlencoded with %20 replaced by a space), 'Mister_Andr%E9_2_1937.jpg' (urlencoded), urlencoded with '+' replaced by %20 or the reverse... etc. ad nauseam

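To see why no single encoding fits all three uses, here is a small sketch. The sample name is invented and kept ASCII-only so that the output does not depend on the script's encoding; each standard function fixes one use and leaves the others unsolved.

```php
<?php
// Invented sample name: the quote, spaces and parentheses are the troublemakers.
$name = "l'ete 98 (toto).jpg";

$forUri  = rawurlencode($name); // spaces become %20; quote and parentheses are percent-encoded
$forForm = urlencode($name);    // same, except that spaces become '+'
$forDb   = addslashes($name);   // only the quote is escaped, with a backslash

echo $forUri, "\n", $forForm, "\n", $forDb, "\n";
```

None of the three strings is a comfortable filesystem name, which is why the methods tested here rewrite the name itself instead of encoding it.
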
Method 1 : preparar_nom_archivo(thefile)
Grand-mere-a-98-ans-toto.jpg
page generated in 0.07 ms

This method produces good results and is the best so far.

Let's measure performance now :
looping through 10 filenames 100 times...
test filename 0 : 'Georges Hronimums (1855-dcd).jpeg' > 'Georges-Heronimums--1855-decede.jpeg'
test filename 1 : 'Grand-mère à 98 ans (toto).jpg' > 'Grand-mere-a-98-ans-toto.jpg'
test filename 2 : 'quoted' and accntuatd spaced str ng.jpg' > 'quoted-and-accentuated-spaced-str-ng.jpg'
test filename 3 : 'fully+urlencoded%38file%20name.jpg' > 'fullyurlencoded38file20name.jpg'
test filename 4 : 'some more accents with tremas .jpg' > 'some-more-accents-eau-with-tremas-.jpg'
test filename 5 : 'a French dprci word ( try oblig ).jpg' > 'a-French-deprecie-word--try-oblige-.jpg'
test filename 6 : 'a (nordic) try.jpg' > 'a-A-nordic-try.jpg'
test filename 7 : 'I'm also a+file%20name%20withaccents.jpg' > 'Im-also-afile20name20witheaeaccents.jpg'
test filename 8 : 'anyone ssaie el nio (spanish).jpg' > 'anyone-essaie-el-nino-spanish.jpg'
test filename 9 : 'dd tt ()_ .jpg' > 'eadd-ctt-_-.jpg'

page generated in 2.032 ms
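
The author's timing harness is not shown on this page. As a sketch only, a loop of the kind described ("10 filenames 100 times") could look like this, assuming PHP 7+; the helper name time_sanitizer() and the two sample names are hypothetical.

```php
<?php
// Hypothetical helper: runs $sanitizer on every name, $rounds times,
// and returns the elapsed wall-clock time in milliseconds.
function time_sanitizer(callable $sanitizer, array $names, int $rounds = 100): float
{
    $start = microtime(true);
    for ($i = 0; $i < $rounds; $i++) {
        foreach ($names as $name) {
            $sanitizer($name);
        }
    }
    return (microtime(true) - $start) * 1000.0;
}

// Two sample names; the real test used ten.
$names = ['a file (1).jpg', 'fully+urlencoded%38file%20name.jpg'];

$ms = time_sanitizer(function (string $n): string {
    return preg_replace('/[^A-Za-z0-9_.-]/', '_', $n);
}, $names);

printf("page generated in %.3f ms\n", $ms);
```

Timings this short vary a lot from run to run, so compare the methods on the same machine and over several runs.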

code :
function preparar_nom_archivo($nom_archivo) { // Dany Alejandro Cabrera (20-Mar-2009 02:03) http://www.php.net/manual/fr/function.str-replace.php
    // characters to look for; each one is replaced by the entry at the same position in $arr_susti
    $arr_busca = array(' ','á','à','â','ã','ª','Á','À',
    'Â','Ã', 'é','è','ê','É','È','Ê','í','ì','î','Í',
    'Ì','Î','ò','ó','ô', 'õ','º','Ó','Ò','Ô','Õ','ú',
    'ù','û','Ú','Ù','Û','ç','Ç','Ñ','ñ');
    $arr_susti = array('-','a','a','a','a','a','A','A',
    'A','A','e','e','e','E','E','E','i','i','i','I','I',
    'I','o','o','o','o','o','O','O','O','O','u','u','u',
    'U','U','U','c','C','N','n');
    $nom_archivo = trim(str_replace($arr_busca, $arr_susti, $nom_archivo));
    // whatever is still not a letter, digit, underscore, dot or hyphen is dropped
    return preg_replace('/[^A-Za-z0-9\_\.\-]/', '', $nom_archivo);
    //VGR27042011 MOD, some moron deprecated ereg_replace() in PHP 5.3... was : return ereg_replace('[^A-Za-z0-9\_\.\-]', '', $nom_archivo);
}

Method 2 : strtr(thefile, " ()", "aaaeeeeiioouuu___")
Grand-mere_a_98_ans__toto_.jpg
page generated in 0.003 ms

This method is faster (thanks to PHP's built-in functions) but produces worse results, because you must not forget any character in the call itself (see the remaining +, ñ, ç, etc.).
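
One detail worth knowing about this call: the three-argument form of strtr() maps byte to byte, so both strings must have the same length, and in a UTF-8 script each accented letter takes more than one byte. The array form replaces whole substrings and avoids that trap. A minimal sketch, assuming a UTF-8 source file; the character list is illustrative and the sample name is assumed to be the problematic filename of this page.

```php
<?php
// Illustrative map only: extend it with every character you expect to meet.
$map = [
    'à' => 'a', 'â' => 'a', 'ä' => 'a',
    'é' => 'e', 'è' => 'e', 'ê' => 'e', 'ë' => 'e',
    'î' => 'i', 'ï' => 'i',
    'ô' => 'o', 'ö' => 'o',
    'ù' => 'u', 'û' => 'u', 'ü' => 'u',
    ' ' => '_', '(' => '_', ')' => '_',
];

// Array form: every key found in the string is replaced by its value.
$clean = strtr('Grand-mère à 98 ans (toto).jpg', $map);
echo $clean, "\n";
```

The weakness stays the same: any character missing from the map goes through untouched.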

Let's measure performance now :
looping through 10 filenames 100 times...
test filename 0 : 'Georges Hronimums (1855-dcd).jpeg' > 'Georges_Heroinimums___1855-decede_.jpeg'
test filename 1 : 'Grand-mère à 98 ans (toto).jpg' > 'Grand-mere_a_98_ans__toto_.jpg'
test filename 2 : 'quoted' and accntuatd spaced str ng.jpg' > 'quoted'_and_accentuated_spaced_str_ng.jpg'
test filename 3 : 'fully+urlencoded%38file%20name.jpg' > 'fully+urlencoded%38file%20name.jpg'
test filename 4 : 'some more accents with tremas .jpg' > 'some_more_accents_eau_with_tremas_aiu.jpg'
test filename 5 : 'a French dprci word ( try oblig ).jpg' > 'a_French_deprecie_word___try_oblige__.jpg'
test filename 6 : 'a (nordic) try.jpg' > 'a___nordic__try.jpg'
test filename 7 : 'I'm also a+file%20name%20withaccents.jpg' > 'I'm_also_a+file%20name%20witheaeaccents.jpg'
test filename 8 : 'anyone ssaie el nio (spanish).jpg' > 'anyone_essaie_el_nio__spanish_.jpg'
test filename 9 : 'dd tt ()_ .jpg' > 'eadd_tt_____.jpg'

page generated in 0.302 ms

Method 3 : preg_replace('/[^a-z0-9A-Z_-]/', '_', thefile)
Grand-m_re___98_ans__toto__jpg
page generated in 0.021 ms

This method is fast but produces bad results (too many characters are replaced, including the dot before the extension) unless you extend the character class to admit more characters, as the first function's pattern does.
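
The missing dot is easy to see on a short name. In this sketch the sample name is made up; the first pattern is the one of this method, the second one also admits '.' in the class, as the pattern of the first function does.

```php
<?php
$name = 'toto (1).jpg';

// Pattern of this method: '.' is not in the class, so it is replaced too.
$withoutDot = preg_replace('/[^a-z0-9A-Z_-]/', '_', $name);

// Same class with '.' admitted: the extension separator survives.
$withDot = preg_replace('/[^a-z0-9A-Z_.-]/', '_', $name);

echo $withoutDot, "\n", $withDot, "\n";
```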

Let's measure performance now :
looping through 10 filenames 100 times...
test filename 0 : 'Georges Hronimums (1855-dcd).jpeg' > 'Georges_H_ro_nimums___1855-d_c_d___jpeg'
test filename 1 : 'Grand-mère à 98 ans (toto).jpg' > 'Grand-m_re___98_ans__toto__jpg'
test filename 2 : 'quoted' and accntuatd spaced str ng.jpg' > 'quoted__and_acc_ntuat_d_spaced_str_ng_jpg'
test filename 3 : 'fully+urlencoded%38file%20name.jpg' > 'fully_urlencoded_38file_20name_jpg'
test filename 4 : 'some more accents with tremas .jpg' > 'some_more_accents_____with_tremas_____jpg'
test filename 5 : 'a French dprci word ( try oblig ).jpg' > 'a_French_d_pr_ci__word___try_oblig____jpg'
test filename 6 : 'a (nordic) try.jpg' > 'a______nordic__try_jpg'
test filename 7 : 'I'm also a+file%20name%20withaccents.jpg' > 'I_m_also_a_file_20name_20with___accents_jpg'
test filename 8 : 'anyone ssaie el nio (spanish).jpg' > 'anyone__ssaie_el_ni_o__spanish__jpg'
test filename 9 : 'dd tt ()_ .jpg' > '__dd___tt______jpg'

page generated in 1.077 ms

Method 4 : preg_replace('/[^a-z0-9A-Z\_\.\-]/', '_', thefile)
Grand-m_re___98_ans__toto_.jpg
page generated in 0.023 ms

This method produces very bad results.
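
Part of the ugliness comes from replacing each rejected character separately, which leaves runs of underscores. As a sketch only (this variant was not part of the timed tests), a `+` quantifier turns each run of rejected characters into a single underscore. The sample name is made up and ASCII-only.

```php
<?php
$name = 'a French word ( try ).jpg';

// Pattern of this method: one underscore per rejected character.
$perChar = preg_replace('/[^a-z0-9A-Z_.-]/', '_', $name);

// Variant with '+': one underscore per run of rejected characters.
$perRun = preg_replace('/[^a-z0-9A-Z_.-]+/', '_', $name);

echo $perChar, "\n", $perRun, "\n";
```

Accented letters are still lost this way; only the noise around them is reduced.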

Let's measure performance now :
looping through 10 filenames 100 times...
test filename 0 : 'Georges Hronimums (1855-dcd).jpeg' > 'Georges_H_ro_nimums___1855-d_c_d__.jpeg'
test filename 1 : 'Grand-mère à 98 ans (toto).jpg' > 'Grand-m_re___98_ans__toto_.jpg'
test filename 2 : 'quoted' and accntuatd spaced str ng.jpg' > 'quoted__and_acc_ntuat_d_spaced_str_ng.jpg'
test filename 3 : 'fully+urlencoded%38file%20name.jpg' > 'fully_urlencoded_38file_20name.jpg'
test filename 4 : 'some more accents with tremas .jpg' > 'some_more_accents_____with_tremas____.jpg'
test filename 5 : 'a French dprci word ( try oblig ).jpg' > 'a_French_d_pr_ci__word___try_oblig___.jpg'
test filename 6 : 'a (nordic) try.jpg' > 'a______nordic__try.jpg'
test filename 7 : 'I'm also a+file%20name%20withaccents.jpg' > 'I_m_also_a_file_20name_20with___accents.jpg'
test filename 8 : 'anyone ssaie el nio (spanish).jpg' > 'anyone__ssaie_el_ni_o__spanish_.jpg'
test filename 9 : 'dd tt ()_ .jpg' > '__dd___tt_____.jpg'

page generated in 0.773 ms

Conclusion

Conclusion : performance is not the same for the four methods tried here. The best and cleanest one is probably the first, but the most efficient in PHP 5.3+ is now strtr(), because preg_replace() seems slower than ereg_replace(), which has been deprecated (for no good reason, IMHO).
Best regards,

VGR27042011 REM : some moron deprecated ereg_replace() in PHP 5.3... so I had to replace all the ereg_replace() calls with preg_replace(). If you use a regular expression, all you have to do is make sure the pattern string starts and ends with the regexp delimiter. Example : '[a-z]' should become '/[a-z]/'.
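
A minimal sketch of that conversion, using the pattern from the first method and a made-up file name. If the pattern itself contains a '/', another delimiter such as '#' can be used instead.

```php
<?php
$old = '[^A-Za-z0-9_.-]';   // pattern as it was passed to ereg_replace()
$new = '/' . $old . '/';    // same pattern wrapped in PCRE delimiters

// Was: ereg_replace($old, '', $name)
$name   = 'toto (1).jpg';
$result = preg_replace($new, '', $name);

echo $new, "\n", $result, "\n";
```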

Vincent Graux (VGR) for European Experts Exchange and Edaïn Works
Last update 2024-05-13 15:34:23