utf8

Perl pragma to enable/disable UTF-8 (or UTF-EBCDIC) in source code
utf8 Ranking & Summary

  • License: Perl Artistic License
  • Price: FREE
  • Publisher Name: TTY Internet Solutions
  • Publisher web site: http://search.cpan.org/~tty/

utf8 Description

utf8 is a Perl pragma to enable/disable UTF-8 (or UTF-EBCDIC) in source code.

SYNOPSIS

    use utf8;
    no utf8;

    # Convert a Perl scalar to/from UTF-8.
    $num_octets = utf8::upgrade($string);
    $success    = utf8::downgrade($string);

    # Change the native bytes of a Perl scalar to/from UTF-8 bytes.
    utf8::encode($string);
    utf8::decode($string);

    $flag = utf8::valid(STRING);

The use utf8 pragma tells the Perl parser to allow UTF-8 in the program text in the current lexical scope (UTF-EBCDIC on EBCDIC based platforms). The no utf8 pragma tells Perl to switch back to treating the source text as literal bytes in the current lexical scope.

Do not use this pragma for anything other than telling Perl that your script is written in UTF-8. The utility functions described below are directly usable without use utf8;.

Because it is not possible to reliably tell UTF-8 from native 8 bit encodings, you need either a Byte Order Mark at the beginning of your source code, or use utf8;, to tell perl that the source is UTF-8.

When UTF-8 becomes the standard source format, this pragma will effectively become a no-op. For convenience, in what follows the term UTF-X refers to UTF-8 on ASCII and ISO Latin based platforms and to UTF-EBCDIC on EBCDIC based platforms.

See also the effects of the -C switch and its cousin, $ENV{PERL_UNICODE}, in perlrun.

Enabling the utf8 pragma has the following effect:

  • Bytes in the source text that have their high bit set are treated as part of a literal UTF-X sequence. This applies to most literals, such as identifier names, string constants, and constant regular expression patterns. On EBCDIC platforms, characters in the Latin 1 character set are treated as part of a literal UTF-EBCDIC character.

Note that if your script contains bytes with the eighth bit set that are not UTF-X (for example, Latin-1 embedded in string literals), use utf8 will complain, since those bytes are most probably not well-formed UTF-X. If you need such bytes in a file that is under use utf8, you can disable the pragma until the end of the block (or the end of the file, if at top level) with no utf8;.

Requirements:

  • Perl
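
The utility functions listed in the synopsis can be tried out with a short script. The following is a minimal sketch, assuming Perl 5.8 or later on an ASCII based platform; the variable names and the sample string are illustrative and do not come from the utf8 documentation. The string is built with a \x{...} escape so that the script itself stays plain ASCII and does not need use utf8;.

```perl
use strict;
use warnings;

# Illustrative sample: "cafe" with an e-acute (U+00E9), four characters.
my $string = "caf\x{e9}";

# utf8::encode replaces the characters, in place, with their UTF-8 octets.
utf8::encode($string);
my $encoded_length = length $string;    # 5, the e-acute takes two octets

# utf8::decode reverses the step and reports whether the octets were valid.
my $decoded_ok     = utf8::decode($string);
my $decoded_length = length $string;    # back to 4 characters

# utf8::upgrade and utf8::downgrade change only the internal representation;
# the string still compares equal to the original afterwards.
my $copy          = "caf\x{e9}";
my $num_octets    = utf8::upgrade($copy);      # octets in the UTF-8 form
my $still_equal   = $copy eq "caf\x{e9}";
my $downgraded_ok = utf8::downgrade($copy);    # true, U+00E9 fits in one byte

printf "encoded: %d octets, decoded: %d characters\n",
    $encoded_length, $decoded_length;
```

Note that encode and decode change what the string contains (characters versus octets), while upgrade and downgrade leave the logical contents alone, which is why the equality check still holds after utf8::upgrade.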

