edainworks.com :: VGR :: testing the methods for changing an uploaded file's name for successful URI, DB and filesystem use.

The Problem : when a user submits a local file via HTTP upload, the file name usually has to be transformed so that it causes no problem when used in a URI to display the document or image (so think urlencode(), huh ? ;-), when stored in the DB (so think addslashes(), huh ? ;-) and when stored in the filesystem (whatever it is : a Linux filesystem, NTFS, FAT... which may or may not accept spaces, parentheses, accented characters, quotes, the % sign...).

The Symptom : broken images, filesystem errors, files stored under a plain urlencode()d name that cannot be retrieved...

the problematic filename = 'Grand-mère à 98 ans (toto).jpg'

Examples of filename strings that do not satisfy the browser, the DB and the filesystem all at once :
'Grand-Mère 001.jpg' (filename as is), 'Grand-M%E8re 001.jpg' (urlencoded with %20 replaced by a space), 'Mister_Andr%E9_2_1937.jpg' (urlencoded), urlencoded with '+' replaced by %20 or the reverse... etc. ad nauseam

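To make these variants concrete, here is a minimal sketch of what PHP's two URL-encoding functions do to one of these names. It assumes the name is ISO-8859-1 encoded (hence the explicit \xE8 byte for 'è', matching the %E8 shown above); in UTF-8 the same letter would come out as %C3%A8 instead.

```php
<?php
// 'Grand-Mère 001.jpg' with 'è' written as the single ISO-8859-1 byte 0xE8 (assumption).
$name = "Grand-M\xE8re 001.jpg";

$plus = urlencode($name);     // spaces become '+'
$pct  = rawurlencode($name);  // spaces become '%20'

echo $plus, "\n"; // Grand-M%E8re+001.jpg
echo $pct, "\n";  // Grand-M%E8re%20001.jpg

// Both forms decode back to the original bytes, but they are different strings,
// so a file saved under one encoded name is not found under the other.
var_dump(urldecode($plus) === $name, rawurldecode($pct) === $name, $plus === $pct);
```

The last line prints true, true, false: the round trip is lossless, yet the two encoded names are not interchangeable as stored filenames.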
Method 1 : preparar_nom_archivo(thefile)
Grand-mere-a-98-ans-toto.jpg
page generated in 0.102 ms

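This method produces good results and is the best so far, although accented characters missing from its replacement table (ä, ï, ü, Ö, Ä...) are simply dropped (see test filenames 4 and 6 below).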

Let's measure performance now :
looping through 10 filenames 100 times...
test filename 0 : 'Georges Héroïnimums (1855-décédé).jpeg' > 'Georges-Heronimums--1855-decede.jpeg'
test filename 1 : 'Grand-mère à 98 ans (toto).jpg' > 'Grand-mere-a-98-ans-toto.jpg'
test filename 2 : 'quoted' and accéntuatèd spaced str ng.jpg' > 'quoted-and-accentuated-spaced-str-ng.jpg'
test filename 3 : 'fully+urlencoded%38file%20name.jpg' > 'fullyurlencoded38file20name.jpg'
test filename 4 : 'some more accents éàù with tremas äïü.jpg' > 'some-more-accents-eau-with-tremas-.jpg'
test filename 5 : 'a French déprécié word ( try obligé ).jpg' > 'a-French-deprecie-word--try-oblige-.jpg'
test filename 6 : 'a ÖÄÀ (nordic) try.jpg' > 'a-A-nordic-try.jpg'
test filename 7 : 'I'm also a+file%20name%20withéàèaccents.jpg' > 'Im-also-afile20name20witheaeaccents.jpg'
test filename 8 : 'anyone éssaie el niño (spanish).jpg' > 'anyone-essaie-el-nino-spanish.jpg'
test filename 9 : 'éàdd ç§tt ()_ .jpg' > 'eadd-ctt-_-.jpg'

page generated in 43.207 ms

code :
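function preparar_nom_archivo($nom_archivo) { // Dany Alejandro Cabrera (20-Mar-2009 02:03) http://www.php.net/manual/fr/function.str-replace.php
    // Characters to look for : the space plus a (partial) list of accented letters.
    $arr_busca = array(' ','á','à','â','ã','ª','Á','À',
    'Â','Ã', 'é','è','ê','É','È','Ê','í','ì','î','Í',
    'Ì','Î','ò','ó','ô', 'õ','º','Ó','Ò','Ô','Õ','ú',
    'ù','û','Ú','Ù','Û','ç','Ç','Ñ','ñ');
    // Their replacements, position by position.
    $arr_susti = array('-','a','a','a','a','a','A','A',
    'A','A','e','e','e','E','E','E','i','i','i','I','I',
    'I','o','o','o','o','o','O','O','O','O','u','u','u',
    'U','U','U','c','C','N','n');
    $nom_archivo = trim(str_replace($arr_busca, $arr_susti, $nom_archivo));
    // Drop whatever is still outside A-Z, a-z, 0-9, '_', '.' and '-'.
    // The version timed above called ereg_replace('[^A-Za-z0-9\_\.\-]', '', $nom_archivo) ;
    // ereg_replace() is deprecated since PHP 5.3 and removed in PHP 7, hence preg_replace() here.
    return preg_replace('/[^A-Za-z0-9_.-]/', '', $nom_archivo);
}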
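
The timing loop itself is not shown on this page. A minimal sketch of how such a measurement could be made follows; the helper name time_sanitizer() is hypothetical, and the stand-in sanitizer is an assumption that only serves to keep the example self-contained.

```php
<?php
// Hypothetical helper: run $sanitize over every filename $rounds times
// and return the elapsed wall-clock time in milliseconds.
function time_sanitizer(callable $sanitize, array $filenames, int $rounds = 100): float
{
    $start = microtime(true);
    for ($i = 0; $i < $rounds; $i++) {
        foreach ($filenames as $name) {
            $sanitize($name);
        }
    }
    return (microtime(true) - $start) * 1000.0;
}

// ASCII-only sample names, so the encoding of this source file does not matter.
// The first one is test filename 3 from this page; the second is a made-up sample.
$filenames = array(
    'fully+urlencoded%38file%20name.jpg',
    'some (sample) file name.jpg',
);

// Stand-in sanitizer (an assumption, not the author's code): keep only safe ASCII.
$sanitize = function (string $name): string {
    return preg_replace('/[^A-Za-z0-9_.-]/', '', $name);
};

$ms = time_sanitizer($sanitize, $filenames);
printf("page generated in %.3f ms\n", $ms);
```

Absolute figures obtained this way depend on the machine and PHP version, so they are only meaningful when the methods are compared in the same run.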

Method 2 : strtr(thefile, "àâäéèêëîïôöûùü ()", "aaaeeeeiioouuu___")
Grand-mere_a_98_ans__toto_.jpg
page generated in 0.016 ms

This method is faster (strtr() is a built-in, optimized PHP function) but produces worse results, because every character to be replaced has to be listed in the call itself and none must be forgotten (see the remaining +, ñ, §, ç, etc.).

Let's measure performance now :
looping through 10 filenames 100 times...
test filename 0 : 'Georges Héroïnimums (1855-décédé).jpeg' > 'Georges_Heroinimums___1855-decede_.jpeg'
test filename 1 : 'Grand-mère à 98 ans (toto).jpg' > 'Grand-mere_a_98_ans__toto_.jpg'
test filename 2 : 'quoted' and accéntuatèd spaced str ng.jpg' > 'quoted'_and_accentuated_spaced_str_ng.jpg'
test filename 3 : 'fully+urlencoded%38file%20name.jpg' > 'fully+urlencoded%38file%20name.jpg'
test filename 4 : 'some more accents éàù with tremas äïü.jpg' > 'some_more_accents_eau_with_tremas_aiu.jpg'
test filename 5 : 'a French déprécié word ( try obligé ).jpg' > 'a_French_deprecie_word___try_oblige__.jpg'
test filename 6 : 'a ÖÄÀ (nordic) try.jpg' > 'a_ÖÄÀ__nordic__try.jpg'
test filename 7 : 'I'm also a+file%20name%20withéàèaccents.jpg' > 'I'm_also_a+file%20name%20witheaeaccents.jpg'
test filename 8 : 'anyone éssaie el niño (spanish).jpg' > 'anyone_essaie_el_niño__spanish_.jpg'
test filename 9 : 'éàdd ç§tt ()_ .jpg' > 'eadd_ç§tt_____.jpg'

page generated in 1.208 ms
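
One way to make the character list easier to extend is the two-argument, array form of strtr(). Unlike the three-argument form it replaces whole substrings, so it also works when the name is UTF-8 and each accented letter takes several bytes. The sketch below is an illustration under those assumptions, not the code tested on this page: the function name sanitize_with_strtr() and its deliberately incomplete map are hypothetical.

```php
<?php
function sanitize_with_strtr(string $name): string
{
    // "\u{...}" escapes (PHP 7+) yield UTF-8 bytes whatever the encoding of this source file.
    $map = array(
        "\u{E0}" => 'a', // à
        "\u{E2}" => 'a', // â
        "\u{E4}" => 'a', // ä
        "\u{E7}" => 'c', // ç
        "\u{E8}" => 'e', // è
        "\u{E9}" => 'e', // é
        "\u{EA}" => 'e', // ê
        "\u{EB}" => 'e', // ë
        "\u{EE}" => 'i', // î
        "\u{EF}" => 'i', // ï
        "\u{F1}" => 'n', // ñ
        "\u{F4}" => 'o', // ô
        "\u{F6}" => 'o', // ö
        "\u{F9}" => 'u', // ù
        "\u{FB}" => 'u', // û
        "\u{FC}" => 'u', // ü
        ' ' => '_',
        '(' => '_',
        ')' => '_',
    );
    // Characters absent from the map (+, %, quotes...) are left untouched.
    return strtr($name, $map);
}

// Test filename 8 : 'anyone éssaie el niño (spanish).jpg'
echo sanitize_with_strtr("anyone \u{E9}ssaie el ni\u{F1}o (spanish).jpg"), "\n";
// anyone_essaie_el_nino__spanish_.jpg
```

The weakness noted above remains: anything not listed in the map passes through unchanged, so a final filtering step is still needed.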

Method 3 : preg_replace('/[^a-z0-9A-Z_-]/', '_', thefile)
Grand-m_re___98_ans__toto__jpg
page generated in 0.088 ms

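This method is fast but produces bad results (too many characters are replaced, including accented letters and the dot before the extension) unless the allowed character class is extended and accented letters are transliterated first, as the first function does.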

Let's measure performance now :
looping through 10 filenames 100 times...
test filename 0 : 'Georges Héroïnimums (1855-décédé).jpeg' > 'Georges_H_ro_nimums___1855-d_c_d___jpeg'
test filename 1 : 'Grand-mère à 98 ans (toto).jpg' > 'Grand-m_re___98_ans__toto__jpg'
test filename 2 : 'quoted' and accéntuatèd spaced str ng.jpg' > 'quoted__and_acc_ntuat_d_spaced_str_ng_jpg'
test filename 3 : 'fully+urlencoded%38file%20name.jpg' > 'fully_urlencoded_38file_20name_jpg'
test filename 4 : 'some more accents éàù with tremas äïü.jpg' > 'some_more_accents_____with_tremas_____jpg'
test filename 5 : 'a French déprécié word ( try obligé ).jpg' > 'a_French_d_pr_ci__word___try_oblig____jpg'
test filename 6 : 'a ÖÄÀ (nordic) try.jpg' > 'a______nordic__try_jpg'
test filename 7 : 'I'm also a+file%20name%20withéàèaccents.jpg' > 'I_m_also_a_file_20name_20with___accents_jpg'
test filename 8 : 'anyone éssaie el niño (spanish).jpg' > 'anyone__ssaie_el_ni_o__spanish__jpg'
test filename 9 : 'éàdd ç§tt ()_ .jpg' > '__dd___tt______jpg'

page generated in 4.422 ms
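
As a hedged illustration of extending the character class, the sketch below keeps the dot so that the extension survives, and adds a + quantifier so that a run of rejected characters yields a single underscore. It assumes UTF-8 input (hence the u modifier), and the function name sanitize_with_preg() is hypothetical. Accented letters are still lost, which is why a transliteration table such as Method 1's remains useful.

```php
<?php
function sanitize_with_preg(string $name): string
{
    // Allowed : ASCII letters, digits, underscore, dot, hyphen.
    // '+' collapses each run of rejected characters into one '_' ;
    // 'u' treats the subject as UTF-8, so a multi-byte letter counts as one character.
    $clean = preg_replace('/[^A-Za-z0-9_.-]+/u', '_', $name);
    if ($clean === null) {
        // preg_replace() returns null on failure, for example on invalid UTF-8 input.
        throw new InvalidArgumentException('filename could not be processed');
    }
    return $clean;
}

// Test filename 1 : 'Grand-mère à 98 ans (toto).jpg'
echo sanitize_with_preg("Grand-m\u{E8}re \u{E0} 98 ans (toto).jpg"), "\n";
// Grand-m_re_98_ans_toto_.jpg
```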

Method 4 : ereg_replace('/[^a-z0-9A-Z\_\.\-]/', '_', thefile)
Grand-mère à 98 ans (toto).jpg
page generated in 0.024 ms

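This method produces very bad results : nothing is replaced at all, because ereg_replace() takes its pattern without delimiters and the two '/' are therefore matched as literal slashes.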

Let's measure performance now :
looping through 10 filenames 100 times...
test filename 0 : 'Georges Héroïnimums (1855-décédé).jpeg' > 'Georges Héroïnimums (1855-décédé).jpeg'
test filename 1 : 'Grand-mère à 98 ans (toto).jpg' > 'Grand-mère à 98 ans (toto).jpg'
test filename 2 : 'quoted' and accéntuatèd spaced str ng.jpg' > 'quoted' and accéntuatèd spaced str ng.jpg'
test filename 3 : 'fully+urlencoded%38file%20name.jpg' > 'fully+urlencoded%38file%20name.jpg'
test filename 4 : 'some more accents éàù with tremas äïü.jpg' > 'some more accents éàù with tremas äïü.jpg'
test filename 5 : 'a French déprécié word ( try obligé ).jpg' > 'a French déprécié word ( try obligé ).jpg'
test filename 6 : 'a ÖÄÀ (nordic) try.jpg' > 'a ÖÄÀ (nordic) try.jpg'
test filename 7 : 'I'm also a+file%20name%20withéàèaccents.jpg' > 'I'm also a+file%20name%20withéàèaccents.jpg'
test filename 8 : 'anyone éssaie el niño (spanish).jpg' > 'anyone éssaie el niño (spanish).jpg'
test filename 9 : 'éàdd ç§tt ()_ .jpg' > 'éàdd ç§tt ()_ .jpg'

page generated in 1.697 ms
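
Since ereg_replace() was removed in PHP 7, its behaviour can only be reproduced indirectly today. The sketch below does so with preg_replace(), using '#' as the delimiter so that the '/' characters become part of the pattern, as they are for the delimiter-less ereg functions. It is an illustration of the effect, not the original test code.

```php
<?php
$name = 'fully+urlencoded%38file%20name.jpg'; // test filename 3

// Slashes are part of the pattern here : "slash, one rejected character, slash".
$literalSlashes = preg_replace('#/[^a-z0-9A-Z_.-]/#', '_', $name);

// Same character class, with the slashes acting as delimiters only.
$asDelimiters = preg_replace('/[^a-z0-9A-Z_.-]/', '_', $name);

echo $literalSlashes, "\n"; // fully+urlencoded%38file%20name.jpg (unchanged)
echo $asDelimiters, "\n";   // fully_urlencoded_38file_20name.jpg

// The first pattern does match once a rejected character is wrapped in slashes.
$wrapped = preg_replace('#/[^a-z0-9A-Z_.-]/#', '_', 'a/ /b');
echo $wrapped, "\n"; // a_b
```

Dropping the slashes from the Method 4 pattern, as the Method 1 code does, is what makes the character class apply to every character of the name.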

Conclusion

The four methods do not perform equally : over the loop of 10 filenames x 100 iterations, method 2 is the fastest (1.208 ms) and method 1 by far the slowest (43.207 ms). Method 1 nevertheless gives the cleanest names and is probably the best of the four.
Best regards,

Vincent Graux (VGR) for European Experts Exchange and Edaìn Works
Last update 2009-10-30 09:19:52
