From: Michael G Schwern
Date: 01:16 on 19 Nov 2007
Subject: Don't fancify my man pages!

Lately I've been noticing that less has been subtly choking on perldoc pages. Lines might appear and disappear as I scrolled up and down. I figured it was a bug in less, but no. It is far more evil.

Today I pasted some example code from "perldoc Attribute::Handlers" into a text file to write up a test based on it. The test was failing in mysterious ways. I finally figured out why.

nroff (or rather groff) replaced all the ASCII single quotes in the file with fancy Unicode x2019 quotes.

What?! Who thought this was a good idea?! Even if my CTYPE is set to UTF-8, NO! "Smart quote" stupidity should not leak into roff, the final bastion of the most Unixy of all Unix formatting tools!

Ok, maybe this is some sort of OS X brain rot. Maybe the curvy corner, pointy-clicky heads somehow infected groff. But no! There it is, right in the pristine GNU roff source.

$ grep 2019 groff-1.19.2/font/devutf8/R.proto
' 24 0 0x2019

WHY, GNU, WHY!?!

To add insult to injury, the nroff TYPESETTER environment variable which can be used to override this madness isn't implemented in groff's nroff wrapper.

Smart quotes in man pages. What's next? Turn ... into HORIZONTAL ELLIPSIS? Turn all my math functions into DIVISION SLASH? Will the next version of groff include a clippy helper or maybe a little dog to help me? How about putting google ads at the bottom of my man pages, wouldn't that be nice.
From: Phil Pennock
Date: 04:41 on 19 Nov 2007
Subject: Re: Don't fancify my man pages!

On 2007-11-18 at 17:16 -0800, Michael G Schwern wrote:
> $ grep 2019 groff-1.19.2/font/devutf8/R.proto
> ' 24 0 0x2019

Yeah, UTF-8 output in GNU nroff also uses fancy soft-break hyphens for words split across lines which is all well and good I suppose, but the character isn't supported in the default font used by PuTTY. (For which the fix is to stop using Windows even as just a connectivity client for getting to the machines where the real work happens.)

Found this out when I switched to UTF-8, switched man-pages back and switched default text-viewer to lv(1), which unfortunately lacks features of less(1) which I use heavily, but at least converts the charsets as needed, reducing breakage when I see stuff with £ or € in it. :^(
From: Matt McLeod
Date: 06:17 on 19 Nov 2007
Subject: Re: Don't fancify my man pages!

Phil Pennock wrote:
> On 2007-11-18 at 17:16 -0800, Michael G Schwern wrote:
> > $ grep 2019 groff-1.19.2/font/devutf8/R.proto
> > ' 24 0 0x2019
>
> Yeah, UTF-8 output in GNU nroff also uses fancy soft-break hyphens for
> words split across lines which is all well and good I suppose, but the
> character isn't supported in the default font used by PuTTY. (For which
> the fix is to stop using Windows even as just a connectivity client for
> getting to the machines where the real work happens.)

It's not just putty.

I don't often look at manpages on Linux boxes remotely (all my remote machines are Solaris) so I'd never noticed this particular bit of idiocy, but just a moment ago I had cause to check something on my work desktop machine from home and ran into this...

Both machines are running Ubuntu 7.10 and I'm using gnome-terminal. Locally it's fine, but displaying over ssh I get the hateful behaviour.

I really don't care whether that's because it's remote or because there's a step in the middle (ssh to Solaris gateway box, ssh from there to desktop), it's still hateful.

Matt
From: Phil! Gregory
Date: 04:56 on 19 Nov 2007
Subject: Re: Don't fancify my man pages!

* Michael G Schwern <schwern@xxxxx.xxx> [2007-11-18 17:16 -0800]:
> Smart quotes in man pages. What's next?

A while back, I ran into a problem because nroff was turning all hyphens into en-dashes or something similarly Unicode-y. This becomes a problem when I try to copy and paste --bloody-long-gnu-option-dwim-dammit-now from the man page and the program chokes on the unexpected hyphen characters.
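A minimal sketch of one way to undo these substitutions in text that has already been pasted from a UTF-8 man page, before handing it to a shell or a test. The function name `asciify` and the mapping table are assumptions made for illustration; the thread only confirms U+2019 for the single quote, and the code points actually used for hyphens and dashes may differ between groff versions and output devices.

```python
# Hypothetical helper: map typographic characters that UTF-8 man page
# output may contain back to the ASCII characters typed in the source.
# The table is an illustrative assumption, not groff's actual mapping.
ASCII_FALLBACK = {
    "\u2018": "'",   # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",   # RIGHT SINGLE QUOTATION MARK (the one found in R.proto)
    "\u201c": '"',   # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',   # RIGHT DOUBLE QUOTATION MARK
    "\u2010": "-",   # HYPHEN
    "\u2212": "-",   # MINUS SIGN
    "\u2013": "-",   # EN DASH
    "\u2014": "--",  # EM DASH, assumed to stand for a double hyphen
}

# str.maketrans accepts a dict of single characters to replacement strings.
_TABLE = str.maketrans(ASCII_FALLBACK)


def asciify(text: str) -> str:
    """Return text with the characters in ASCII_FALLBACK replaced."""
    return text.translate(_TABLE)
```

A filter like this only repairs text after the fact; it cannot tell a dash that was meant as punctuation from one that replaced an option hyphen, so the result still deserves a look before it is run.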
From: A. Pagaltzis
Date: 09:01 on 19 Nov 2007
Subject: Re: Don't fancify my man pages!

Hi Michael,

* Michael G Schwern <schwern@xxxxx.xxx> [2007-11-19 02:25]:
> nroff (or rather groff) replaced all the ASCII single quotes in
> the file with fancy Unicode x2019 quotes.

it doesn't stop there. Guess what it does with double hyphens? No really! It converts them into em-dashes. Aaaaaaaaaaah!

$ grep man ~/.bashrc
alias man='LC_CTYPE=C man'

Reminds me, this is not the only GNU tool that needs such treatment. GNU grep pays attention to the locale as well, but its encoding decoder is apparently written in Visual Basic -- if you use a UTF-8 locale, it will slow down by TWO ORDERS OF MAGNITUDE.

$ time LC_CTYPE=en_US.utf8 grep -cq tes /usr/share/dict/words

real 0m0.686s
user 0m0.680s
sys 0m0.004s

$ time LC_CTYPE=C grep -cq tes /usr/share/dict/words

real 0m0.006s
user 0m0.004s
sys 0m0.000s

Regards,
From: Michael G Schwern
Date: 09:20 on 19 Nov 2007
Subject: Re: Don't fancify my man pages!

A. Pagaltzis wrote:
> Reminds me, this is not the only GNU tool that needs such
> treatment. GNU grep pays attention to the locale as well, but its
> encoding decoder is apparently written in Visual Basic -- if you
> use a UTF-8 locale, it will slow down by TWO ORDERS OF MAGNITUDE.
>
> $ time LC_CTYPE=en_US.utf8 grep -cq tes /usr/share/dict/words
>
> real 0m0.686s
> user 0m0.680s
> sys 0m0.004s
>
> $ time LC_CTYPE=C grep -cq tes /usr/share/dict/words
>
> real 0m0.006s
> user 0m0.004s
> sys 0m0.000s

Are you sure you didn't just measure disk caching? I don't see any different results between the two on OS X.
From: A. Pagaltzis
Date: 13:46 on 19 Nov 2007
Subject: Re: Don't fancify my man pages!

* Michael G Schwern <schwern@xxxxx.xxx> [2007-11-19 10:25]:
> A. Pagaltzis wrote:
> > Reminds me, this is not the only GNU tool that needs such
> > treatment. GNU grep pays attention to the locale as well, but
> > its encoding decoder is apparently written in Visual Basic --
> > if you use a UTF-8 locale, it will slow down by TWO ORDERS OF
> > MAGNITUDE.
> >
> > $ time LC_CTYPE=en_US.utf8 grep -cq tes /usr/share/dict/words
> >
> > real 0m0.686s
> > user 0m0.680s
> > sys 0m0.004s
> >
> > $ time LC_CTYPE=C grep -cq tes /usr/share/dict/words
> >
> > real 0m0.006s
> > user 0m0.004s
> > sys 0m0.000s
>
> Are you sure you didn't just measure disk caching? I don't see any
> different results between the two on OS X.

Those measurements were with hot cache and are reliably reproducible on my machine.

Possibly you need to set more locale variables; I also have LANG set. (The "funny" thing is I had LC_COLLATE set to `C` already, so grep should not be doing any decoding *anyway*.)

Or your GNU utils have been compiled with other switches. Or something.

Regards,
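A rough sketch for repeating the comparison above in a way that controls for the two doubts raised in this exchange, cold cache and stray locale variables. The helper name `best_time` and its parameters are hypothetical. It drops LC_ALL from the child's environment because LC_ALL, when set, takes precedence over LC_CTYPE, and it reports the fastest of several runs so that a first run reading from disk does not skew the result.

```python
import os
import subprocess
import time


def best_time(cmd, lc_ctype, runs=5):
    """Run cmd `runs` times with LC_CTYPE=lc_ctype; return the fastest wall time in seconds."""
    env = dict(os.environ)
    # LC_ALL overrides every LC_* category, so it has to go
    # or the LC_CTYPE setting below would have no effect.
    env.pop("LC_ALL", None)
    env["LC_CTYPE"] = lc_ctype

    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        # Output is discarded; only the elapsed time matters here.
        subprocess.run(cmd, env=env,
                       stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best
```

Calling it once with `["grep", "-cq", "tes", "/usr/share/dict/words"]` and `"en_US.utf8"`, and once with the same command and `"C"`, gives two numbers to compare. Both the word list path and the locale name are taken from the commands quoted above and may not exist on every system.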
From: H.Merijn Brand
Date: 14:12 on 19 Nov 2007
Subject: Re: Don't fancify my man pages!

On Mon, 19 Nov 2007 14:46:50 +0100, "A. Pagaltzis" <pagaltzis@xxx.xx> wrote:
> * Michael G Schwern <schwern@xxxxx.xxx> [2007-11-19 10:25]:
> > A. Pagaltzis wrote:
> > > Reminds me, this is not the only GNU tool that needs such
> > > treatment. GNU grep pays attention to the locale as well, but
> > > its encoding decoder is apparently written in Visual Basic --
> > > if you use a UTF-8 locale, it will slow down by TWO ORDERS OF
> > > MAGNITUDE.
> > >
> > > $ time LC_CTYPE=en_US.utf8 grep -cq tes /usr/share/dict/words
> > >
> > > real 0m0.686s
> > > user 0m0.680s
> > > sys 0m0.004s
> > >
> > > $ time LC_CTYPE=C grep -cq tes /usr/share/dict/words
> > >
> > > real 0m0.006s
> > > user 0m0.004s
> > > sys 0m0.000s
> >
> > Are you sure you didn't just measure disk caching? I don't see any
> > different results between the two on OS X.
>
> Those measurements were with hot cache and are reliably
> reproducible on my machine.
>
> Possibly you need to set more locale variables; I also have LANG
> set. (The "funny" thing is I had LC_COLLATE set to `C` already,
> so grep should not be doing any decoding *anyway*.)
>
> Or your GNU utils have been compiled with other switches. Or
> something.

Yet another reason to make ––disable–nls default for such basic tools
(don't paste that option, it might contain UTF8 :p)

--
H.Merijn Brand Amsterdam Perl Mongers (http://amsterdam.pm.org/)
using & porting perl 5.6.2, 5.8.x, 5.10.x on HP-UX 10.20, 11.00, 11.11,
& 11.23, SuSE 10.1 & 10.2, AIX 5.2, and Cygwin. http://qa.perl.org
http://mirrors.develooper.com/hpux/ http://www.test-smoke.org
http://www.goldmark.org/jeff/stupid-disclaimers/
From: seph
Date: 15:54 on 19 Nov 2007
Subject: Re: Don't fancify my man pages!

Michael G Schwern <schwern@xxxxx.xxx> writes:
> A. Pagaltzis wrote:
>> Reminds me, this is not the only GNU tool that needs such
>> treatment. GNU grep pays attention to the locale as well, but its
>> encoding decoder is apparently written in Visual Basic -- if you
>> use a UTF-8 locale, it will slow down by TWO ORDERS OF MAGNITUDE.
>
> Are you sure you didn't just measure disk caching? I don't see any different
> results between the two on OS X.

gnu grep has had this rather embarrassing bug. Though I think it should be patched by now.

seph
Generated at 10:26 on 16 Apr 2008 by mariachi