It's not noise, it's presentation markup. That's what HTML/CSS is for. To manage the presentation of information. It's not that hard. Learn a little responsive CSS and have the div go away if the browser's width drops below, say, 600px.
Visual presentation is a hard problem, and has been ever since the teletype days. You're always going to need some "noise". I'm not saying you have to go whole hog with design bling, but you should at least learn enough about responsive CSS so that your site is usable across the three main types of screen it will be displayed on.
You don't need responsive CSS, all you need is max-width.
The (technically) ideal place to solve this is in your browser's user stylesheets (then you can have the page look however you like!), but that's a bad solution right now for obvious reasons (how many people even know that they exist?).
User stylesheets will always suck, because the page you're depending on for your stylesheet to work is a concretion, not an abstraction. You're just going to have constant problems with the user CSS conflicting with the creator's styling.
The right way to do this would involve an evolution in the community where you could easily get the content you need in JSON format or similar. Then you can create your own presentation, or, more likely, subscribe to a site/service that collects user-created presenters.
A document mixes information with presentation. It's not data. Because it mixes information with presentation, you have to re-prepare it for each medium you intend to display it on.
HTML is a poor document format. If you store your document as HTML, then you also have to store the CSS along with it or it won't be a complete document. Unless you want to include the styling in the HTML file, a kludge which violates the Single Responsibility Principle.
There are plenty of document formats out there that don't have this issue, like PDF or RTF.
If you want your document to look more like data, then what you do is factor out all the atomic bits of information into values that you can then input into a database. Then write code to present it. This works well for structured documents like orders, invoices, or reports, less so for unstructured information like blog posts. For these, mixing presentation with information is unavoidable: just adding bold-face to a word means you'll need to store presentation information in your data. The "semantic web" is intended to address this.
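A minimal sketch of the "factor it into values, then write code to present it" idea for a structured document. The `Invoice` fields and both renderers are made up for illustration; nothing here is a real schema.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Invoice:
    # Hypothetical fields, chosen only for illustration.
    number: str
    customer: str
    total_cents: int


def render_text(inv: Invoice) -> str:
    """Plain-text presentation, e.g. for a terminal or an email."""
    return f"Invoice {inv.number} for {inv.customer}: ${inv.total_cents / 100:.2f}"


def render_html(inv: Invoice) -> str:
    """HTML presentation of the same values; text fields are escaped."""
    return (
        f"<article><h1>Invoice {escape(inv.number)}</h1>"
        f"<p>{escape(inv.customer)}</p>"
        f"<p>${inv.total_cents / 100:.2f}</p></article>"
    )
```

The stored values carry no presentation at all, so adding a third medium means adding a third function and touching nothing else. A blog post has no comparably fixed set of fields to pull out.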
> And after CSS was introduced, HTML was definitely not supposed to be presentational.
Presentation involves more than just style. CSS handles the style of web content, HTML handles the structure.
I'm familiar with the distinction between document and data. I'm talking about a site like a blog, where the document is the data. That was the original context of this conversation, IIRC.
HTML is emphatically not a poor document format, but it satisfies different requirements than PDF and RTF do. The weak connection between structure and what you call "style" (what I would call presentation) is a feature; it allows clients to modify the presentation to suit their needs. If I want a bigger font or higher contrast or a smaller column width, I'm able to do that because of the separation between structure and style.
But (as you rightly pointed out) this is a lot harder than it should be (and harder than it used to be). As a result, when an HTML document doesn't work for someone (due to its presentation) they have to complain to document creators instead of merely configuring their client (once) to suit their needs. This sucks.
> But (as you rightly pointed out) this is a lot harder than it should be (and harder than it used to be).
It's hard because doing this is moving in the wrong direction concerning the intended abstractions.
As I said earlier, a document combines unstructured information with presentation, and you have to re-do the document every time the presentational logic changes. This is necessary because the information in the document is unstructured; it's not like an order form.
Because you cannot predict what form unstructured information will take, the presentational logic is necessarily strongly coupled with the information. That's why it's hard to do what you want. You can pop open Dev Tools and manually do it, but you can't write a program that will take _all blog posts_ and restructure them the way you want to, because _all blog posts_ is impossible to reason about.
No amount of evolution to the HTML or CSS standards will work out this particular bit of complexity. If you could standardize a blog post, then you could write a program to do it. But it would only work on posts that meet the standard.
Say you made it so every blog CMS out there stored the text in the DB in Markdown format and provided an API so you could get at the Markdown. Then you could do what you say and provide your own styling. What this would be doing is introducing a separation of concerns. You push most of the structure out of the data, and divide up styling duties between a base level (Markdown) and an upper level (whatever you're using to display it). You may not even need the API if the CMS doesn't screw around too much with the presentation by putting, say, ads in the middle of the content. Then you could screen scrape and convert the generated HTML back into Markdown, but again, this is the wrong way to go (depending on concretion rather than abstraction) and prone to breakage. You really need the API layer to do this properly.
But you can't hope that one day HTML and CSS will make sense again like the old days and that user-styling will work again. It only worked before in the very early days of the web because everything was super simple and people could live with the edge cases that cropped up, not because the underlying domain changed. That solution was always brittle, and it broke the second people wanted greater flexibility in presentation.
Because you always want to depend on abstraction, not concretion. If you have a whole bunch of data, and you know that all the data has 7 fields and none of the entries in field 3 are null, then that's much easier to work with than if you don't know how many fields are in your data at all, or if some of the data has 9 fields and some of it has 3. If you have that situation, then you have to take an extra step to clean your data before you can reason about it properly.
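To make the 7-fields example concrete, here is a rough sketch of that extra cleaning step. The function names and the choice to drop bad rows (rather than repair them) are assumptions made for illustration.

```python
from typing import Optional, Sequence

EXPECTED_FIELDS = 7  # the "7 fields" contract
REQUIRED_INDEX = 2   # "field 3", zero-based


def clean(rows: Sequence[Sequence[Optional[str]]]) -> list:
    """Keep only rows that satisfy the contract; drop everything else."""
    good = []
    for row in rows:
        if len(row) == EXPECTED_FIELDS and row[REQUIRED_INDEX] is not None:
            good.append(tuple(row))
    return good


def third_fields(rows) -> list:
    """Safe only on cleaned rows: no length or None checks needed here."""
    return [row[REQUIRED_INDEX].upper() for row in rows]
```

Once `clean` has run, `third_fields` can be written without any defensive checks. Run it on raw rows with 3 or 9 fields, or a null in field 3, and it either raises or silently reads the wrong column.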
The HTML you get when you go to a web page is anything but an abstraction of a data type. If you want it to get that way, now we're back to telling the whole web how to make web pages.
> The HTML you get when you go to a web page is anything but an abstraction of a data type.
You don't think that "this is an article", "this is a paragraph", "this is a link", "this text should be emphasized" are abstractions? And how is Markdown different, when it describes exactly the same elements of a document?
> now we're back to telling the whole web how to make web pages.
Telling people how to make web pages isn't a problem - that's what the HTML standard is. Depending on the whole web to make their pages suit your individual needs is a problem.
> You don't think that "this is an article", "this is a paragraph", "this is a link", "this text should be emphasized" is an abstraction?
It's not. There are many different ways to do this with HTML/CSS. You can use bold tags or spans. If you use spans, then you have to understand the class and the CSS before you can tell that this text is supposed to be bold-faced rather than colored differently.
Links can be specified in the HTML or added in with jQuery. An article can be described with a semantic HTML tag or a div with a class. If it's the latter, you've got to parse the class and figure out what it means. If you're lucky it will be 'article'. But you probably won't be.
HTML cannot be looked at as a data type. Markdown specifies one and only one way to do all of the above. That's a proper abstraction.
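A small sketch of the bold-tags-versus-spans problem, using only Python's standard `html.parser`. The class name `hl` in the usage example is hypothetical, and the extractor deliberately knows nothing about CSS.

```python
from html.parser import HTMLParser


class TagOnlyEmphasis(HTMLParser):
    """Collects text inside <b>, <strong>, and <em>; it cannot see CSS."""

    EMPHASIS_TAGS = {"b", "strong", "em"}

    def __init__(self):
        super().__init__()
        self.depth = 0   # how many emphasis tags are currently open
        self.found = []  # text chunks seen while depth > 0

    def handle_starttag(self, tag, attrs):
        if tag in self.EMPHASIS_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.EMPHASIS_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.found.append(data)


def emphasized(html: str) -> list:
    """Return the text chunks that tag names alone mark as emphasized."""
    parser = TagOnlyEmphasis()
    parser.feed(html)
    parser.close()
    return parser.found
```

Feeding it `<p>This is <b>bold</b> and <span class="hl">also bold</span>.</p>` finds only the first phrase. To catch the second you would have to fetch the stylesheet and work out what `.hl` does, which is the part that differs from site to site.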
> Telling people how to make web pages isn't a problem - that's what the HTML standard is.
You can announce a set of 'best practices', but that's not a standard. A standard is an abstraction that you can rely on other people using, because otherwise the vast majority of software won't work with it. Best practices cannot be relied on; you follow them for your own benefit, not for others'.
The HTML standard is insufficient for this kind of use. And it will remain this way because HTML isn't intended the way you seem to think it is.
> If you're lucky it will be 'article'. But you probably won't be.
This situation can be improved (and is being improved). If we treat HTML as nothing but a display language, then it will become one - and if that's what you want, then you should just be using PDF, PNG or SWF.
> And it will remain this way because HTML isn't intended the way you seem to think it is.
I guess we've reached the root of our disagreement. It's exactly how it was intended to be used historically, it's how it is still used for the most part (with webapps being a notable exception), and I think it's the best way to use it going forward.
I would still argue that making the browser aware of the screen it is using (or perhaps, aware of the screen real estate it currently occupies and the text size preference of the user) is a better solution than adding a div to every serious web page.
Also, I'm pretty sure that (at least for simple pages) styling the body element will work just as well.
Why would that be any better? The only thing the page displaying in any browser needs to be aware of is how big the viewing window is, and the DPI. With those two pieces of information, you can have anything that runs in a browser present a useful interface.
That way content can handle all the edge cases that different screens can throw at it. And different browsers can handle things like text sizes the way they want to.
You don't want to have to make individual websites cater to individual screens or browsers. There are just too many of both. The interface between them needs to be a loose coupling.
A better way to say what I was getting at is that a web browser that finds itself with a silly window size might better serve the user by only using a portion of it for the viewport.
This way a page with no extra width styling still gets presented to the user in a sensible fashion. It's probably a looser coupling than insisting that every page anticipate and handle the screens it might be viewed on.
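Roughly what that browser-side rule could look like, as a sketch only. The 80-character cap and the "average glyph is about half the font size wide" estimate are assumptions picked for illustration, not anything a real browser computes.

```python
def viewport_width(window_px: int, font_px: float, max_chars: int = 80) -> int:
    """Width the browser would hand to the page, capped at a readable measure.

    Assumes an average glyph is about half the font size wide, which is a
    rough rule of thumb and nothing more.
    """
    readable_px = int(max_chars * font_px * 0.5)
    return min(window_px, readable_px)
```

With a 16px font, a 2560px-wide window would give the page a 640px viewport, while a 480px phone screen would be passed through unchanged. A user who bumps the font to 24px gets a proportionally wider column.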
The browser shouldn't be dictating how the web works. It's the creator's job to manage the complexity of presentation; you can't just push that responsibility onto your tech stack. The domain and expected use cases for web apps are just too broad for any browser to be making decisions like this.