Class::DBI::utf8

A Class::DBI subclass that knows about UTF-8

Class::DBI::utf8 Ranking & Summary

  • License: Perl Artistic License
  • Price: FREE
  • Publisher Name: Fotango Ltd
  • Publisher web site: http://search.cpan.org/~fotango/

Class::DBI::utf8 Description

A Class::DBI subclass that knows about UTF-8.

Rather than have to think about things like character sets, I prefer to have my objects just Do The Right Thing. I also want utf-8 encoded byte strings in the database whenever possible. Using this subclass of Class::DBI, I can just put perl strings into the properties of an object, and the right thing will always go into the database and come out again.

For example, without Class::DBI::utf8,

    MyObject->create({
      id   => 1,
      text => "\x{2264}", # a less-than-or-equal-to symbol
    });

will create a row in the database containing (probably) the utf-8 byte encoding of the less-than-or-equal-to symbol. But when trying to retrieve the object again,

    my $broken = MyObject->retrieve( 1 );
    my $text = $broken->text;

$text will (probably) contain 3 characters and look nothing like a less-than-or-equal-to symbol. Likewise, you will be unable to search properly for strings containing non-ASCII characters.

Creating objects with simpler non-ASCII characters from the latin-1 range will lead to even stranger behaviours:

    my $e_acute = "\x{e9}"; # an e-acute
    MyObject->create({ text => $e_acute });

    utf8::upgrade($e_acute); # still the same letter, but with a different
                             # internal representation
    MyObject->create({ text => $e_acute });

This will create two rows in the database. The first contains the latin-1 encoded bytes of an e-acute character (or the database may refuse to let you create the row, if it's been set up to require utf-8). The second contains the utf-8 encoded bytes of an e-acute. In the latter case you won't get an e-acute back out again if you retrieve the row; you'll get a string containing two characters, one for each byte of the utf-8 encoding.

Because of this, if you're handling data from an outside source, you won't really have any clear idea of what will be going into the database at all.

Fortunately, simply adding the lines:

    use Class::DBI::utf8;
    __PACKAGE__->utf8_columns("text");

will make all these operations work much more as expected - the database will always contain utf-8 bytes, you will always get back the characters you put in, and you will instantly become the most popular person at work.

Class::DBI::utf8 is a Perl module which assumes that the underlying database and driver don't know anything about character sets, and just store bytes. Some databases, for instance PostgreSQL and later versions of MySQL, allow you to create tables with utf-8 character sets, but the Perl DB drivers don't respect this: they still require you to pass utf-8 bytes and they return utf-8 bytes, and hence still need special handling with Class::DBI.

Class::DBI::utf8 will do the right thing in both cases, and I would suggest you tell the database to use utf-8 encoding as well as using Class::DBI::utf8 where possible.

SYNOPSIS

This module is a Class::DBI plugin:

    package Foo;
    use base qw( Class::DBI );
    use Class::DBI::utf8;

    ...

    __PACKAGE__->columns( All => qw( id text other ) );

    # the text column contains utf8-encoded character data
    __PACKAGE__->utf8_columns(qw( text ));

    ...

    # create an object with a nasty character.
    my $foo = Foo->create({
      text => "a \x{2264} b for some a",
    });

    # search for utf8 chars.
    Foo->search( text => "a \x{2264} b for some a" );

Requirements:

  • Perl
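The character/byte mismatch described above can be reproduced without a database. The following is a minimal sketch using only the core Encode module and the built-in utf8:: functions. It is not taken from Class::DBI::utf8 and does not claim to show how the module works internally; the variable names ($stored, $misread, $fixed, $upgraded) are illustrative assumptions, and a plain scalar stands in for a database column that only stores bytes.

```perl
use strict;
use warnings;
use Encode qw( encode decode );

# A one-character Perl string: the less-than-or-equal-to symbol.
my $le = "\x{2264}";

# Stand-in for what a byte-oriented database column would hold:
# the utf-8 encoding of that character.
my $stored = encode( 'UTF-8', $le );

# Reading the bytes back without decoding: every byte is treated
# as a separate character.
my $misread = $stored;

# Decoding the bytes as utf-8 restores the original character.
my $fixed = decode( 'UTF-8', $stored );

printf "original string: %d character(s)\n", length($le);
printf "stored bytes:    %d byte(s)\n",      length($stored);
printf "read undecoded:  %d character(s)\n", length($misread);
printf "after decoding:  %s\n",
    $fixed eq $le ? "same as original" : "differs from original";

# The e-acute case: upgrading changes only the internal representation.
my $e_acute  = "\x{e9}";
my $upgraded = $e_acute;
utf8::upgrade($upgraded);

printf "e-acute strings compare equal: %s\n",
    $e_acute eq $upgraded ? "yes" : "no";
printf "internal utf8 flag differs:    %s\n",
    ( utf8::is_utf8($e_acute) xor utf8::is_utf8($upgraded) ) ? "yes" : "no";
```

The two e-acute strings are equal as far as Perl is concerned, which is why code that passes the internal bytes straight to a driver can end up writing two different rows for what looks like the same value.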

