SkayaWiki

PlPythonU

JeromePetazzoni :: DerniersChangements :: DerniersCommentaires? :: ParametresUtilisateur :: http://www.enix.org/ :: Vous êtes ec2-18-218-50-170.us-east-2.compute.amazonaws.com
Ok, you are dreaming about a "normalization" function in SQL, to remove accents, uppercase symbols, etc. ?

Here is (one of many ways) to do it.

1. Install support for Python in your postgresql engine :
jpetazzo@illithid:~$ sudo apt-get install postgresql-plpython-8.3
(Replace with your actual version ; if you are using a non-Debian distribution, adapt!)

2. Assume we have a database named foobase ; install Python support inside this database :
jpetazzo@illithid:~$ createlang plpythonu foobase
This will basically tell your postgrsql engine "okay, the foobase can use Python support from now". In case you're wondering, all postgresql languages begin with PL ; hence the name "plpythonu". The U at the end means "untrusted" (but don't ask me what it really means deep inside).
NB : the command should be run as a privileged postgresql user (e.g., sudo -u postgres createlang ...).

3. Now, connect to the database :
jpetazzo@illithid:~$ psql foobase

4. Create a function :
foobase=# create or replace function
normalize(t text) returns text as $$
import unicodedata
without_accents = unicodedata.normalize('NFKD',t.decode('utf-8')).encode('ascii','ignore')
return without_accents.lower()
$$ language plpythonu;

Note : using "create or replace" allows you to modify the query and rerun it without having to bother with dropping the function beforehand.

5. Use the function :
foobase=# select normalize('YéhaÉhaö');
normalize
- - - - -
yehaehao
(1 row)


6. Rejoice and have a beer.

Of course, you can (and should!) modify my (simplistic) function to do further processing (translate HTML entities, remove stray tags, non-alphanumeric symbols, etc.)
Il n'y a pas de commentaire sur cette page. [Afficher commentaires/formulaire]