@@ -279,19 +279,30 @@ X<-C>

The B<-C> flag controls some of the Perl Unicode features.

+ B<CAUTION:> As with the L<C<:utf8> PerlIO layer|PerlIO/:utf8>, none of
+ the features enabled by this flag or the equivalent C<PERL_UNICODE>
+ environment variable validate that input is valid UTF-8, nor guarantee
+ to produce valid UTF-8. Instead it will assume input is provided in
+ Perl's internal upgraded byte encoding, and provide output in this
+ encoding, which is a superset of UTF-8 that can encode any character
+ allowed in Perl strings. This can result in broken Perl strings or
+ output bytes which are not valid UTF-8. This internal encoding will be
+ referred to as C<utf8> below to differentiate it from a strict UTF-8
+ encoding format.
+
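As a rough illustration of the caution above (it is not part of the patch), the sketch below writes a surrogate code point through the lax C<:utf8> layer and then checks the resulting bytes with a strict decoder. The in-memory filehandle stands in for STDOUT under B<-CO>, and the variable names are made up for the example.

```perl
use strict;
use warnings;
no warnings 'utf8';    # silence surrogate warnings; the demo triggers them on purpose
use Encode ();

# U+D800 is a surrogate: allowed in a Perl string, but not valid UTF-8.
my $char = chr 0xD800;

# Write it through the lax :utf8 layer into an in-memory buffer.
my $lax_bytes = '';
open my $out, '>:utf8', \$lax_bytes or die "open failed: $!";
print {$out} $char;
close $out or die "close failed: $!";

# Ask a strict decoder whether those bytes are valid UTF-8.
my $copy = $lax_bytes;    # decode() may modify its input under FB_CROAK
my $is_strict_utf8 =
    eval { Encode::decode('UTF-8', $copy, Encode::FB_CROAK()); 1 } ? 1 : 0;

printf "bytes written: %s\n",
    join ' ', map { sprintf '%02X', ord } split //, $lax_bytes;
printf "valid strict UTF-8: %s\n", $is_strict_utf8 ? 'yes' : 'no';
```

On a reasonably recent perl this should report the bytes C<ED A0 80> and a failed strict decode. Where strictness matters, an explicit C<:encoding(UTF-8)> layer set with open() or binmode() is the usual alternative to relying on B<-C>.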
As of 5.8.1, the B<-C> can be followed either by a number or a list
of option letters. The letters, their numeric values, and effects
are as follows; listing the letters is equal to summing the numbers.

- I     1   STDIN is assumed to be in UTF-8
- O     2   STDOUT will be in UTF-8
- E     4   STDERR will be in UTF-8
+ I     1   STDIN is assumed to be in utf8
+ O     2   STDOUT will be in utf8
+ E     4   STDERR will be in utf8
  S     7   I + O + E
- i     8   UTF-8 is the default PerlIO layer for input streams
- o    16   UTF-8 is the default PerlIO layer for output streams
+ i     8   :utf8 is the default PerlIO layer for input streams
+ o    16   :utf8 is the default PerlIO layer for output streams
  D    24   i + o
  A    32   the @ARGV elements are expected to be strings encoded
-           in UTF-8
+           in utf8
  L    64   normally the "IOEioA" are unconditional, the L makes
            them conditional on the locale environment variables
            (the LC_ALL, LC_CTYPE, and LANG, in the order of
@@ -307,22 +318,22 @@ perl.h gives W/128 as PERL_UNICODE_WIDESYSCALLS "/* for Sarathy */"
perltodo mentions Unicode in %ENV and filenames. I guess that these will be
options e and f (or F).

- For example, B<-COE> and B<-C6> will both turn on UTF-8-ness on both
+ For example, B<-COE> and B<-C6> will both turn on utf8-ness on both
STDOUT and STDERR. Repeating letters is just redundant, not cumulative
nor toggling.
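The equivalence of letters and numbers can be checked from a script. The following is a minimal sketch, not part of the patch: it starts a child perl for each switch and prints the value that child sees in C<${^UNICODE}>. The helper name C<unicode_flags> is hypothetical, and the list form of a piped open() is assumed to be available on the platform.

```perl
use strict;
use warnings;

# Keep a PERL_UNICODE setting in the environment out of the picture.
delete $ENV{PERL_UNICODE};

# Hypothetical helper: run a child perl with one -C switch and return
# the value it reports in ${^UNICODE}.
sub unicode_flags {
    my ($switch) = @_;
    open my $child, '-|', $^X, $switch, '-e', 'print ${^UNICODE}'
        or die "cannot run $^X: $!";
    my $value = <$child>;
    close $child or die "child perl failed for $switch";
    return $value;
}

# Letters, their numeric sum, and a repeated letter.
my %flags = map { $_ => unicode_flags($_) } qw(-COE -C6 -COOE);
printf "%-6s => %s\n", $_, $flags{$_} for sort keys %flags;
```

All three switches should report 6 (O is 2, E is 4), which also shows that repeating a letter changes nothing.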

The C<io> options mean that any subsequent open() (or similar I/O
operations) in main program scope will have the C<:utf8> PerlIO layer
- implicitly applied to them, in other words, UTF-8 is expected from any
- input stream, and UTF-8 is produced to any output stream. This is just
+ implicitly applied to them, in other words, utf8 is expected from any
+ input stream, and utf8 is produced to any output stream. This is just
the default set via L<C<${^OPEN}>|perlvar/${^OPEN}>,
with explicit layers in open() and with binmode() one can
manipulate streams as usual. This has no effect on code run in modules.
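To see the default layer at work, one option is to compare the layers on a handle opened with and without B<-Ci>. The sketch below is an illustration only, not part of the patch: the helper C<layers_with> and the scratch file are inventions for the example, and the exact layer names depend on how perl was built.

```perl
use strict;
use warnings;
use File::Temp ();

delete $ENV{PERL_UNICODE};    # keep the environment out of the picture

# An empty scratch file for the child perl to open.
my $tmp = File::Temp->new;

# Code for the child: open the file without naming a layer, then list
# the layers the handle ended up with.
my $probe = 'open my $fh, "<", $ARGV[0] or die $!; '
          . 'print join ",", PerlIO::get_layers($fh)';

# Hypothetical helper: run the probe in a child perl with the given switches.
sub layers_with {
    my (@switches) = @_;
    open my $child, '-|', $^X, @switches, '-e', $probe, $tmp->filename
        or die "cannot run $^X: $!";
    my $layers = <$child>;
    close $child or die "child perl failed";
    return $layers;
}

my $plain   = layers_with();
my $with_ci = layers_with('-Ci');
print "no -C : $plain\n";
print "-Ci   : $with_ci\n";
```

With B<-Ci> the list should end in C<utf8>, while the plain run should not mention it. An explicit layer in open() or a later binmode() still overrides this default.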

B<-C> on its own (not followed by any number or option list), or the
empty string C<""> for the L</PERL_UNICODE> environment variable, has the
same effect as B<-CSDL>. In other words, the standard I/O handles and
- the default C<open()> layer are UTF-8-fied I<but> only if the locale
+ the default C<open()> layer are utf8-fied I<but> only if the locale
environment variables indicate a UTF-8 locale. This behaviour follows
the I<implicit> (and problematic) UTF-8 behaviour of Perl 5.8.0.
(See L<perl581delta/UTF-8 no longer default under UTF-8 locales>.)