[css-text] Prevent line breaking after explicit hyphens #3434
Comments
I believe the intent of the spec with At the same time, I can see how the definition about I think we should clarify that definition to make it more clear (possibly by defining hyphenation opportunity, possibly by rephrasing the whole thing). |
Level 4 will eventually include everything that's in 3, but for now it focuses on additions. If we solve this by clarifying that this is already the expected behavior, this is a level 3 thing. If we solve it by adding a new control, it is a level 4. I'll start by tagging as level 3, under the assumption that what we want is a clarification.
We'll deal with all issues, including this one, before going to CR. |
Usage data https://developer.microsoft.com/en-us/microsoft-edge/platform/usage/css/hyphens/ |
U+002D is a break opportunity, not a hyphenation opportunity. A hyphenation opportunity inserts a hyphen if the line was broken there. U+002D doesn't do it. So maybe you want to tweak one of break opportunity properties. |
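To make the distinction above concrete, here is a minimal sketch in Python. It is a toy model, not any browser's implementation; `break_line` and `render_unbroken` are hypothetical helper names, and only U+00AD SOFT HYPHEN and U+002D HYPHEN-MINUS are modelled.

```python
from typing import Tuple

SOFT_HYPHEN = "\u00ad"   # invisible hyphenation opportunity
HYPHEN_MINUS = "\u002d"  # visible hyphen with a break opportunity after it


def render_unbroken(text: str) -> str:
    """Render text on one line: soft hyphens are invisible, U+002D stays."""
    return text.replace(SOFT_HYPHEN, "")


def break_line(text: str, index: int) -> Tuple[str, str]:
    """Break `text` after the character at `index` (toy model, two cases only)."""
    ch = text[index]
    head, tail = text[:index], text[index + 1:]
    if ch == SOFT_HYPHEN:
        # Hyphenation opportunity: breaking here inserts a visible hyphen.
        return head + HYPHEN_MINUS, tail
    if ch == HYPHEN_MINUS:
        # Plain break opportunity: the hyphen was already there, nothing is inserted.
        return head + ch, tail
    raise ValueError("no break opportunity at this index in this toy model")
```

Both inputs look the same once broken, which may be why authors expect `hyphens: none` to cover both; only the soft hyphen case involves inserting a hyphen.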
@hftf Wow, this issue is incredibly well researched and detailed. Thank you so much for taking the time to gather all this information and write up the issue! |
The CSS Working Group just discussed
The full IRC log of that discussion<dael> Topic: Prevent line breaking after explicit hyphens<dael> github: https://github.com//issues/3434 <fantasai> tantek, so does D <tantek> fantasai, you're proving my point that appendices are in general informative <fantasai> tantek, E doesn't say anything -- it's normative <fantasai> tantek, F says it's informative explicitly <fantasai> etc. <dael> florian: Not entirely clear to me at least and at least one other impl if hyphens none is meant for only suppressing invisible but leave existing hyphens alone or if it's meant to turn off wrapping opportunity at regular hyphens <dael> florian: koji and I think fantasai understood to not doing anything to normal hyphens. But I read that it's no breaking at hyphens. either way we should clarify. If we clarify to say it's not suppressed maybe explore a control in the next level <dael> florian: Current spec is not clear so we should clarify <dael> florian: Spec says suppresses hyphenation opportunities. Doesn't define a hyphenation opportunity. <dael> florian: That's where ambig comes from <dael> dbaron: You'd think hypenation opp is different then breaking opp. <fantasai> https://drafts.csswg.org/css-text-3/#hyphenation <dael> florian: Different from wrapping opp. Wrapping opp that's right after a hyphen is different. I can see it's reasonable for the spec to mean it <dael> AmeliaBR: As an author, I've come across places where I want to suppress break at hypens. But you can do that by turning off wrapping <dael> florian: You can do it with extra mark up. <dael> astearns: Need extra to signal intent. It's not breaking at any breaking opp. in that string. If you have a reg hyphen and words on either side with breaking opp. you don't want those to break either <myles> ++astearns <dael> florian: You mean you would allow in other places as well? The automatic hypenation. But in this no auto, go wrapping. 
<dael> astearns: You want markup on the special words where you don't want entire term to break <dael> florian: Not opposed to saying hyphens:none does not disable wrapping at reg hyphens. I think we could use a little clarification <dael> florian: It does seem that's what spec intends, but was no obvious <dael> AmeliaBR: I like clarifying hyphenation opp is opp to insert a hyphen. Breaking opp is second to that. <dael> florian: And we can re-open an issue on L4 to expore if we want an automatic way of doing this. <dael> dauwhe: We have general rule where we don't want to hyphenate hyphenated phrases and I'd love some css control for that that doesn't involve preprocessing thousands of words. But that's L4 <dael> dauwhe: We don't want to hyphenate words with intrinsic hypens <dael> florian: That's dealt with. This is a different case. <dael> dauwhe: I'm phrasing in a differnet way. But yes we don't want line breaks anywhere in those phases as astearns said <dael> fantasai: Seems really weird that you'd take hyphenated phrase like one in issue, forbid breaking in a long term is unusually strict. I can see not breaking at a point that's not the hyphen, but breaking at hyphen I don't imagine you'd want to suppress. <bradk> “E-Mail” should not wrap <dael> fantasai: Case here is someone that doesn't want hyphenation and is getting breaks at hyphens. And they thought they turned off hyphens and think hyphens and breaking is analogous <dael> AmeliaBR: I think there is a property for hyphen w/o break <AmeliaBR> s/property/Unicode character/ <dael> fantasai: I think it makes sense suppress hyphens at breaks through hypens prop. Given current state of impl none does not suppress those breaks. Might mean we add a value in L4 that means no really don't break at hyphens or hypenation points <dael> myles: What's the exampl? 
<AmeliaBR> @bradk, I think that would also be covered by hyphenate-limit-chars https://drafts.csswg.org/css-text-4/#hyphenate-char-limits <dael> florian: bradk 's IRC example. You'd never want e-mail to break <dael> myles: This is about things like long-term and t-shirt <dauwhe> https://www.princexml.com/doc-refs/#prop-prince-hyphenate-before <dael> fantasai: t-shirt could be a case where you don't break if it's less then 2 char on other side. You can control that for hyphenation in L4. <dael> fantasai: long-term breaking there is less likely to be because each half is too short <dael> florian: Seems like stylistic choice in this case <dael> florian: Can we resolve for L3 to clarify as AmeliaBR and I spoke of and leave it open for L4 and hash it out there? Seem reasonable? <dael> myles: One more thought, in section it says it affects searching and copying. I think affecting copying is a feature not a bug. <dael> florian: Possibly. Searching is more annoying <dael> myles: Have to look at searching. Searching is more complex <dael> fantasai: [missed] <fantasai> NFK normalization <dael> florian: There was spec by i18n to help browsers figure out what to do when searching <dael> myles: Like curly " match striaght " <dael> myles: We use ICU u search facilities <dael> florian: I think we're a little off topic. L3 hyphens:none doesn't do what we talked about, open issue in L4? <dael> fantasai: I think we wouldn't change meaning of none in L4 <dael> florian: We might add a value <dael> fantasai: fine with me <bradk> 👍 <dael> Rossen: Any objections to add a clarifying note to L3 and discuss a potential new value in L4? <dael> RESOLVED: add a clarifying note to L3 and discuss a potential new value in L4 |
…ter U+002D or U+2010 See issue #3434.
So, based on the teleconf discussion minuted above, we've agreed that the behavior currently observed in browsers ( Now, as to whether we should have a separate control for this in level 4, everybody agreed that this was a useful thing to do in certain occasions, and we looked at different scenarios:
The remaining question is whether there are cases where you want to suppress wrapping at hyphenation that don't fall in any of these. I think the answer is yes. Doing Another example could be turning off wrapping after hyphens on people's names, to make sure there's no confusion about whether the hyphen is part of the name or inserted at layout time. But names can also contain spaces, and we don't want to suppress that wrapping opportunity. Any other situation where the semantics of the phrase/paragraph/block (any piece of text that might contain spaces as well), rather than the semantics of the individual word, calls for turning off wrapping after hyphens cannot be addressed without adding an explicit control. So I think we should have it. Bikeshedding time? |
Thank you for investigating my issue!
I think this commit contains a typo:
To clarify, is this independent of
It’s not about whether the parts of a hyphenated compound are shorter than a threshold. In general, most hyphenated compounds are conceived as a single unit (and inflected atomically when spoken out loud). Personal names and hyphenated keywords in code are two good examples to add to my meager sample of three frequent English words. But I think the use case is much broader and is not “weird” or “unusual.” Keeping compounds together avoids interrupting or misleading the reader (a miscue or false scent or garden path) and increases legibility. Some style advice is collected below (not all relevant):
Links to selected style manuals and authorities
The Canadian Style: 2. Hyphenation: Compounding and Word Division
2.17: Word division
Chicago Manual of Style, 17th edition
2: Manuscript Preparation, Manuscript Editing, and Proofreading
2.96: Marking dashes and hyphens
2.112: Proofreading for word breaks
7: Spelling, Distinctive Treatment of Words, and Compounds
7.36–7.47: Word Division
7.40: Dividing compounds, prefixes, and suffixes
7.42: Dividing proper nouns and personal names
7.81–7.89: Compounds and Hyphenation
7.81: To hyphenate or not to hyphenate
7.83: The trend toward closed compounds
Microsoft Manual of Style: 7. Practical issues of style: Line breaks
Garner’s Modern English Usage: Headlinese: Peculiar Use Of
Illustrator CC: Visual QuickStart Guide: 20. Style & Edit Type: Applying hyphenation
GPO Style Manual: 6. Compounding Rules
|
Fixed in ecc7db8. Thanks.
Right, I agree the current text is still a bit hand-wavy about hyphenation opportunities. I thought it was worth lifting this particular ambiguity anyway, even if we later want to revisit to make the whole definition clearer, as that's where confusion has been. If you've got a suggestion for a complete rephrasing, I'm all ears. I think cases like T-shirt or e-mail can be covered by the UA with the specification as it is, since the specification does not mandate that the break-after (BA) Unicode line-breaking class be always respected, and so the UA could decide that (in English?) hyphen-minus should not introduce a wrapping opportunity when it is preceded by only one letter. Currently, Chrome and Safari do not do that, but Firefox does, as can be seen in this little demo: https://jsbin.com/hoqufobeje/edit?html,css,output. Maybe we want to mandate that behavior specifically, but I suspect not, as such rules probably need to be language dependent (not to mention full of corner cases), and researching line-breaking best practices in all of the world's languages is beyond the scope of what we can hope to do in this specification. Maybe a note? Or, given that the spec already expects UAs to "do the right thing" for each language, even if the rules aren't spelled out explicitly, maybe the spec as it is is sufficient to consider that disallowing a break after the hyphen in “e-mail” is already must-do, and I can write a wpt test to check if it is being done. As for the rest, yes, I'm with you. The links/quotes you gave show that there are situations where it's not desired to break after a hyphen. Since it's not universal, I don't think it would be the default, but a |
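The "preceded by only one letter" heuristic mentioned above could look roughly like the following Python sketch. It is an illustration only, not how Firefox or any other UA actually implements it; `wrap_allowed_after_hyphen` and its `min_letters` parameter are hypothetical names.

```python
def wrap_allowed_after_hyphen(text: str, index: int, min_letters: int = 2) -> bool:
    """Return True if a line may wrap after the hyphen-minus at text[index].

    Assumed rule: suppress the wrapping opportunity when fewer than
    `min_letters` letters immediately precede the hyphen (e.g. "e-mail").
    """
    if text[index] != "-":
        raise ValueError("text[index] is not a hyphen-minus")
    # Count the consecutive letters directly before the hyphen.
    letters_before = 0
    i = index - 1
    while i >= 0 and text[i].isalpha():
        letters_before += 1
        i -= 1
    return letters_before >= min_letters
```

As noted in the comment, a real rule would likely need to be language dependent; this sketch ignores language entirely.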
Thank you, I agree with all of that. I don’t have enough expertise to suggest a better rephrasing though. I don’t think anything needs to be mandated for cases like T-shirt or e-mail (at least, once there is an explicit control like |
Pushed some rephrasing in 723a74e Note the section quoted in #3463 (comment) is also relevant here wrt breaking “e-mail” and other words with similarly short segments. Remaining things we could do:
All three of these make sense to me. Agenda+ for WG discussion/resolution. |
The CSS Working Group just discussed
The full IRC log of that discussion<dael> Topic: Prevent line breaking after explicit hyphens<dael> github: https://github.com//issues/3434#issuecomment-450610535 <dael> fantasai: I committed a set of changes to clarify what hyphenation is and when it's invoked. <dael> fantasai: There were specific things we can do. Rec if a word contains a hyphen that breakpoint takes priority over auto hypen points. Could add no hyphens to L4. All this makes sense to me and wanted to ask WG what makes sense to do <dael> astearns: Argument to put first into L3 instead of 4? <TabAtkins> Whoops, sorry, I'm on IRC. <dael> fantasai: Just spec a particular behavior. It wouldn't increase scope of l3. We can also not spec in L3 <TabAtkins> I can jump on phone to discuss the @charset thing. <dael> florian: Adding no-wrap to hyphen in L4 would be helpful. Since I've done a talk on linebreaking people have asked for this feature. <dael> florian: Priority on an actual hyphen over hpyenation I'm generally in support. There was a nuance brought up where in things like German you have [longword]-[longword] someone pointed out in some cases you might want to break middle of other words at high priority as well <florian> https://github.com//issues/618#issuecomment-255135593 <dael> fantasai: So you prefer to break at another word and not hyphen if the break is close? <dael> florian: Here's the comment^ <dael> fantasai: I can imagine if you have 2 long words you would allow hyphenation in them. but if auto hyphen point is 2 char from explicit hyphen you would want explicit hyphen. 
I'm not saying forbid the hyphen elsewhere, but encourage UA to use that break <dael> florian: Trying to find some way around this for UA to do something smarter, either by keeping vague or we make it a must rule but if a-b is in the dictionary it can override and do what it wants <dael> astearns: I think having a preference for the explicit hyphen but allow at other points, for a UA to make a decision it has to consider hyphenation points againt something else like a desired line break. A greedy linebreak algo will jsut pick the highest priority we spec for the longest line <dael> fantasai: Greedy means fill the line as much as possible. Doesn't mean you can't say you prioritize breaks in that. Spaces win but a hypehn in 2 char of break works <myles_> q+ <florian> q+ <dael> AmeliaBR: The desire is to keep some vagueness in rule for prioritization because it does end up around how many characters you will end up short. Spec has avoided strict hyphen algo so far <astearns> ack myles_ <dael> myles_: The smarter we get on hyphenation and line breaking the more it seems to fit the text wrap multiline type thing <dael> astearns: Given that we are discussing pro and con of spec explicit hyphen is desired I'm included to push to L4 <dael> florian: Need to say something in L3. <myles_> what happened to text-wrap:multi-line? it isn't in the spec any more, but there are references to it if you search-in-page <dael> fantasai: L3 spec if you break at punct it's rec you preform prioritization among your breaks. I don't think L3 needs to say anything more. It's not only allowed, but encouraged <astearns> ack florian <dael> fantasai: Happy to push to L4. If we want something in the spec I'll write it <dauwhe> q+ <dael> florian: There's multiple ways. There's prioritization. There's also if you have a hyphen and disallow the rest. Even looking at German in the example this is allowed but not nice. Makes me think it's akin to line break where there's strict and loose. 
<dael> fantasai: Okay <dael> dauwhe: I'm fine with prioritization as fantasai wrote. I think that expresses all other things being equal we prefer to break at hyphen that's there, but there are other things algo need to consider <dael> astearns: I think we need a resolution to accpe the current change. fantasai did you want a feeling of the group if those 3 items should be worked on in L4? <dael> fantasai: Yeah. If we want to add to L4 I can edit those in <florian> +1 for current change <dael> astearns: First is accepting the changes in L3. Any objections to the current hyphenation text in L3? <dael> RESOLVED: Accept text for hyphenation in L3 <florian> +1 <dael> astearns: More explicit rule on where to hyphenation when there is a n explicit hyphen. Objections to adding that to L4? <dael> [silence] <dael> astearns: So work on that <AmeliaBR> aka the e-mail/T-shirt rule <florian> +0 for hyphenate-limit-chars (no disagreement, just haven't thought about it, but go explore) <dael> fantasai: There's don't hyphenate if there will be this many char before or after and proposal is to apply that to hyphens. You can't break if there's one character before or after explicit hyphen <TabAtkins> e- <TabAtkins> mail <dael> astearns: Obj to add something in L4 around e-mail/t-shirt rule? <dael> astearns: Hearing none, let's work on that. <myles_> https://github.com/w3c/csswg-drafts/commit/a0c27afa0a50c462584511e617a20b687eb892af#diff-94819ad75aa15ba8049b412f93d8cc04 <florian> +1 for nowrap in L4 <dael> fantasai: Adding no-wrap to hyphens. None says don't do hyphenation but you can break at explicit hyphens. No-wrap says don'tbreak at explicit hyphens either <dael> astearns: Obj to dealing with not wrapping at explicit hyphens in L4? <dael> astearns: Let's work on that too. <fantasai> https://drafts.csswg.org/css-text-4/#hyphenate-char-limits <dael> astearns: One additional thing when talking about char limit. 
Does it make sense to have char limimt applyt o each segment in between explicit hyphens? <dael> fantasai: Three values, required min for total char to hyphenate, min for char before hyphen, min for char after <dael> astearns: Min for 3 char, you have 3 char, explicit hyphen, 2 char, hyphenation break. Is that allowed? <dael> fantasai: Need to check <dael> astearns: Not sure if that should be a thing or not. There are more then enough char before hyphen <dael> fantasai: Yes, but if there wasn't a hyphen seems weird to break there <dael> astearns: True. Maybe line length consideration needs the char <dael> fantasai: THen you would allow to break after 2 char. <dael> astearns: That's prob enough on this <fantasai> s/char/char afte a space, too/ |
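The three-value limit described at the end of the minutes, and the open question of whether the limits should apply to each segment between explicit hyphens, can be sketched in Python as follows. This is a rough model for discussion, not spec text: `hyphenation_allowed`, its parameters, and the choice to count an explicit hyphen as a character in whole-word mode are all assumptions.

```python
def hyphenation_allowed(word: str, break_index: int, min_word: int,
                        min_before: int, min_after: int,
                        per_segment: bool = False) -> bool:
    """Check a candidate hyphenation point against three character limits.

    `break_index` is the number of characters before the candidate break.
    With per_segment=True the limits are measured within the segment
    delimited by explicit hyphens; otherwise within the whole word.
    """
    if per_segment:
        # Segment starts after the nearest explicit hyphen to the left (or at 0).
        start = word.rfind("-", 0, break_index) + 1
        # Segment ends at the nearest explicit hyphen to the right (or word end).
        end = word.find("-", break_index)
        if end == -1:
            end = len(word)
    else:
        start, end = 0, len(word)
    total = end - start
    before = break_index - start
    after = end - break_index
    return total >= min_word and before >= min_before and after >= min_after
```

For the case raised in the minutes (a minimum of 3 characters before the break, with 3 characters, an explicit hyphen, 2 characters, then the candidate break), the two readings disagree: whole-word counting allows the break, per-segment counting does not.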
Minor point but for |
Regarding breaks after explicit hyphens: what about adding an optional fourth value to hyphenate-limit-chars, which is the minimum distance to the nearest hyphen - either explicit or automatically inserted. This would allow users to allow or disallow hyphenation at all in a word containing an explicit hyphen (which is common practice I believe), or optionally allow it only for long words. See discussion of this at https://www.princexml.com/forum/topic/3316/hyphenation-of-overlong-words Note they state:
So there is precedent for this sort of logic. |
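A rough Python sketch of the proposed fourth value, under the assumption that it filters automatic hyphenation candidates by their distance to the nearest explicit hyphen. `filter_by_hyphen_distance` is a hypothetical name, the candidate indices in the usage note are made up for illustration, and distance to automatically inserted hyphens (also part of the proposal) is not modelled.

```python
from typing import List


def filter_by_hyphen_distance(word: str, candidates: List[int],
                              min_distance: int) -> List[int]:
    """Keep candidate breaks at least `min_distance` characters from any explicit hyphen.

    Each candidate is the number of characters before the break.
    """
    hyphens = [i for i, ch in enumerate(word) if ch == "-"]
    kept = []
    for c in candidates:
        # Characters lying between the candidate break and each explicit hyphen.
        distances = [(h - c) if c <= h else (c - (h + 1)) for h in hyphens]
        if not distances or min(distances) >= min_distance:
            kept.append(c)
    return kept
```

With `"self-explanatory"` and made-up candidates `[2, 7, 10, 12]`, a minimum distance of 3 keeps only `[10, 12]`, while a very large value keeps none, which disables automatic hyphenation in any word that contains an explicit hyphen. Words without explicit hyphens are unaffected.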
Chromium issue: https://crbug.com/974470 |
I have to commend you @hftf for well and thoroughly researched responses! I thought I was concise and detailed, but your prose in this forum is certainly a cut above and a model for others to imitate. Bravo! |
Excellent illustration of what I believe @hftf is talking about. This is also exactly what I am trying to accomplish. I tried |
This is a major issue with compound names such as cities and first names, e.g. Jean-François, Saint-Pamphile, etc. Many writing and style guides say never to split a name like this. And just as @aghArdeshir mentioned, most of the time I don't have control over the content itself, so replacing hyphens manually or parsing the content before outputting it to HTML is not viable. The best solution would be a new value for the word-break property. |
I personally would like the option to disable wrapping of hyphens-in-content, because of email addresses (e.g. Given the high likelihood of the email address being copy-pasted, I cannot just replace the hyphen with a non-wrapping, visually-identical variant, and the enclosing sentence must still wrap in general, so I can't use |
I noticed that |
I want to prevent hyphenation/line breaking of words containing hyphens (hyphenated compounds). Some common examples of hyphenated compounds in English are T-shirt, long-term, and so-called.
Why?
This behavior is important for documents that prioritize legibility over aesthetics, such as:
These documents may want to apply the behavior to the entire document, not on a case-by-case basis.
¹ This very spec once faced a similar issue in which unwanted hyphenation led to confusion: #2307
Current status
CSS Text Module Level 3 does not define what hyphenation opportunities are:
Under hyphens, it says:
However, hyphens: none does not give the expected result in most browsers.
For example, see how Chrome 70/Mac renders this JSFiddle:
I am concerned that the current spec gives authors very little control over a simple display requirement. For example, it never mentions the behavior of U+002D HYPHEN-MINUS, the most widespread hyphen character by far, even once.
Some workarounds
Wrap all hyphenated compounds in <span style="white-space:nowrap;"> or <nobr> (nonstandard).
Cons:
Surround all hyphens with U+2060 WORD JOINER.
Cons:
Replace all hyphens with U+2011 NON-BREAKING HYPHEN.
Cons:
Even if they do, U+2011 may not look identical to U+002D or U+2010.
The Unicode Line Breaking Algorithm (UAX 14) recommends this method, but it seems rare in practice:
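The three workarounds above all amount to preprocessing the content. A minimal Python sketch of each, assuming a hyphenated compound is a run of word characters joined by U+002D (the `COMPOUND` pattern and all function names are hypothetical):

```python
import html
import re

WORD_JOINER = "\u2060"
NON_BREAKING_HYPHEN = "\u2011"

# Assumed definition of a hyphenated compound: word characters joined by "-".
COMPOUND = re.compile(r"\w+(?:-\w+)+")


def wrap_compounds_in_nowrap(text: str) -> str:
    """Workaround 1: return HTML with each compound wrapped in a nowrap span."""
    parts, last = [], 0
    for m in COMPOUND.finditer(text):
        parts.append(html.escape(text[last:m.start()]))
        parts.append('<span style="white-space:nowrap;">'
                     + html.escape(m.group(0)) + "</span>")
        last = m.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)


def surround_hyphens_with_word_joiner(text: str) -> str:
    """Workaround 2: put U+2060 WORD JOINER on both sides of each compound hyphen."""
    joined = WORD_JOINER + "-" + WORD_JOINER
    return COMPOUND.sub(lambda m: m.group(0).replace("-", joined), text)


def use_non_breaking_hyphen(text: str) -> str:
    """Workaround 3: replace each compound hyphen with U+2011 NON-BREAKING HYPHEN."""
    return COMPOUND.sub(lambda m: m.group(0).replace("-", NON_BREAKING_HYPHEN), text)
```

Workarounds 2 and 3 change the underlying characters, so the processed string no longer compares equal to the original.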
Possible solutions
hyphens: none
hyphens: none-including-hyphens
Questions
I can’t find any previous discussion on this topic; sorry if it’s a duplicate.
Thank you for considering this feedback – it’s my first time engaging with the standards process.
Further reading
Note that Google Docs renders hyphens as non-breaking by default in its browser view.