Getting The New York Times More Search Engine Friendly

Online newspapers have often ignored search engines, or viewed them with mistrust, relying on the power of their brands to drive traffic. That attitude is changing dramatically at the New York Times, and with powerful effect.

Marshall Simmonds is a long-time colleague who, as vice president of enterprise search, oversees the SEO work to make the New York Times more search engine friendly. The NYT acquired Marshall as part of the Times’ purchase last year.

Prior to Marshall’s arrival at the Times, the company’s web sites, which also include the Boston Globe and the International Herald Tribune, had good traffic, but this was largely due to the strength of the brands. Little effort was expended on making search-friendly content, or making sure that search engines could easily crawl and index the sites.

“When I joined the company, Google was not crawling the Times,” said Marshall. “I’m pushing to get more traffic to the site.”

In a series of telephone and email interviews with Marshall, we learned what steps he took to change things at the Times to make the company’s web sites more search friendly. Most of the changes he made will be of interest to anyone responsible for in-house search engine optimization efforts.

The biggest problem for the Times with search engines was one common to most newspapers: The Times required users to register to read an article. Registration forms effectively block search engines from indexing a web site, as the crawlers can’t type a user name or password to access articles.

“Yahoo had indexed our registration page 20 million times,” quipped Marshall, but that turned out to be a serious issue. Yahoo’s crawler recognized the importance of the Times’ web site, but was unable to do anything but hammer away on the registration page that was ultimately displayed whenever the search engine crawler attempted to access a Times article by following a link.

Marshall’s first step was to allow search engine crawlers to have complete access to everything published by the Times, including archived content dating back to 1981. This was a major shift for the company, since archived content is only available to paying subscribers. How to reconcile allowing search engines to access content and at the same time maintain a revenue stream for premium content?

The Archives: Free vs. Fee

Like many online news sites, the New York Times offers some content free to anyone with no registration required, some free content to users who have registered with a user name and password, and some content only to paying subscribers. But unlike other newspapers, which often remove all content to subscriber-only archives after a period of a week to a month, the vast majority of the content on the Times is available to anyone without a subscription. Let’s look at how this works.

The Times classifies content in three ways: seven day content, “open” content, and archived/Times Select content. The Times produces about 500 articles per day, or about 3,500 articles per week, all of which are freely available to anyone without registration for seven days from the publication date. After seven days, these articles are moved to either the open area or the archived/Times Select area, depending on the type of article.

But there’s a catch: Seven day content is free to anyone without a subscription, though readers who go on to read other articles within the Times are asked to log in or register for a free subscription once they’ve clicked more than five links. The Times plans to increase this threshold to eight clicks soon.
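For in-house teams wondering how such a click meter might work, here is a minimal Python sketch. The Times has not described its implementation; the Visitor class, the next_page function and the idea of keeping a per-visitor click count (in a cookie or session, say) are assumptions made for illustration. Only the five-click threshold comes from Marshall.

```python
from dataclasses import dataclass

# Threshold cited in the article; the Times planned to raise it to eight.
FREE_CLICK_LIMIT = 5


@dataclass
class Visitor:
    """Hypothetical per-visitor state, e.g. backed by a cookie or session."""
    registered: bool = False
    clicks: int = 0


def next_page(visitor: Visitor, limit: int = FREE_CLICK_LIMIT) -> str:
    """Record one article click and decide which page to serve."""
    if visitor.registered:
        return "article"
    visitor.clicks += 1
    # The first `limit` clicks are free; after that, ask for registration.
    return "article" if visitor.clicks <= limit else "registration"
```

Keeping the limit as a parameter means a change from five clicks to eight would be a one-line configuration change rather than a code rewrite.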

When content changes from seven day to open status, articles keep the same URL but are physically moved to another server. This content remains accessible to search engines and users alike. Marshall says open content consists of more than 20 million documents from the papers, including general news, theater and movie reviews, sports news, classifieds and so on.

In all, 97% of the overall site ends up classified as open content, freely accessible to anyone who hasn’t used up their five-link quota.

The remaining 3% is classified as archived content, also called “Times Select” materials. This content consists of daily columns, op-ed editorials, special features and so on. To access this content, you must be a subscriber to the print edition of the Times, or pay a $49.95 annual subscription fee.
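The three-way split described above can be sketched in a few lines of Python. This is only an illustration of the rules as Marshall explained them, not the Times’ actual system; the classify function, the type labels in PREMIUM_TYPES and the returned category names are hypothetical.

```python
from datetime import date, timedelta

# Article types the piece describes as Times Select; the labels are hypothetical.
PREMIUM_TYPES = {"column", "op-ed", "special feature"}
FREE_WINDOW = timedelta(days=7)


def classify(published: date, article_type: str, today: date) -> str:
    """Return 'seven-day', 'open' or 'times-select' for an article."""
    if today - published <= FREE_WINDOW:
        # Everything is freely available for seven days from publication.
        return "seven-day"
    # After seven days, the article type decides where the piece ends up.
    return "times-select" if article_type in PREMIUM_TYPES else "open"
```

Note that the URL does not appear anywhere in the sketch: as described above, an article keeps its address when its status changes, which is what lets existing links and search listings keep working.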

Despite this restriction for human users, Marshall says that both Google and Yahoo have been allowed to fully index the premium content, and will display results for matching queries (for example, a search for popular Times Op-Ed columnist Thomas Friedman in Google and Yahoo returns hundreds of results).

Click through on many of these links from both engines, however, and you won’t see the content indexed by the search engines. Rather, the Times web site detects that the user agent is a browser, and serves up a shorter abstract page with a login form to access the premium content.
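In rough terms, that kind of user-agent switch might look like the Python sketch below. The Times has not said how it identifies crawlers, so the substring check, the CRAWLER_TOKENS list and the render_premium function are assumptions for illustration. User-agent strings are also easy to fake, so a production system would likely need a stronger check than this.

```python
# Substrings found in the user-agent strings of Google's and Yahoo's crawlers.
# The list is illustrative, not exhaustive.
CRAWLER_TOKENS = ("googlebot", "slurp")


def is_crawler(user_agent: str) -> bool:
    """Return True if the user-agent string looks like a known crawler."""
    ua = user_agent.lower()
    return any(token in ua for token in CRAWLER_TOKENS)


def render_premium(user_agent: str, subscriber: bool,
                   full_text: str, abstract: str) -> str:
    """Serve the full article to crawlers and subscribers, an abstract otherwise."""
    if is_crawler(user_agent) or subscriber:
        return full_text
    # Everyone else gets the short abstract plus a (placeholder) login form.
    return abstract + "\n[login form]"
```

A logged-in subscriber and a crawler receive the same full text here, which is the point Danny and Marshall later make when arguing that the practice is not cloaking.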

Isn’t this cloaking—serving different pages to a search engine and an individual web browser? Yes, it is.

Although both Google and Yahoo warn against cloaking, Marshall says both companies are aware of what the Times is doing, and apparently condone the practice.

“They want the content, and they’re very interested in displaying it,” says Marshall.

Google has allowed cloaked content from other sources before, and we’ve seen other instances where search engines are apparently looking the other way when cloaking is used by a web site. We plan to do a follow-up on cloaking policies at all of the search engines in the near future.

[Note: A discussion over at our Search Engine Watch Forums since this article was written has Danny deciding that the NY Times isn’t cloaking, since humans who either register with or have paid subscriptions with the New York Times do ultimately see the same content as was indexed. Marshall also emphatically stated in a follow-up conversation that the New York Times does not cloak.]

Writing for Search Engines

The Times, like most newspapers, has a long-standing tradition of writing compelling headlines that grab human readers, but that may not literally describe the news story. For example, when the Pope died, Times reporters headlined stories with titles like “Papacy Change” or “Pilgrims converge on the Vatican.”

Marshall has now trained many editors and producers to write content friendly to both users and searchers. “We encouraged them to use ‘Pope John Paul dies’ and offered a more literal approach based on keyword research and internal metrics,” said Marshall. “The response has been great. Everyone so far is very excited to reach audiences through search and help users find our content.”

Well, not exactly everyone. In April, Times reporter Steve Lohr wrote a skeptical piece about newspapers coming to grips with search engines, titled “This Boring Headline Is Written for Google.” Oddly, the article doesn’t mention Marshall’s efforts at the Times.

Marshall’s success at educating traditionally trained journalists and editors is a great lesson for anyone new to search marketing to remember. Don’t get cutesy. Put yourself in the mind of your audience. Do a little research with the keyword tools out there. Then use the words your target audience will be searching with.

Beyond words, the Times has been thinking more about the photos it puts online, to optimize for image search. Previously, the Times ignored image tags. Now, all images have descriptive captions in the ALT attribute of the image tag, making them more search friendly.
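Anyone wanting to run the same check on their own pages could start with something like the following Python sketch, which lists images that lack descriptive ALT text. It uses only the standard library’s html.parser module; the MissingAltFinder class and images_missing_alt function are our own illustrative names, not anything the Times uses.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Collects the src of every <img> whose alt text is missing or blank."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        # A valueless or whitespace-only alt counts as missing.
        if not (attributes.get("alt") or "").strip():
            self.missing.append(attributes.get("src") or "(no src)")


def images_missing_alt(html: str) -> list[str]:
    """Return the src values of images without descriptive alt text."""
    finder = MissingAltFinder()
    finder.feed(html)
    finder.close()
    return finder.missing
```

Run against a page template, the function returns the image sources that still need captions, which gives editors a simple to-do list.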

Marshall also says thousands of RSS feeds have been created for content from the papers, and a top level list can be found here.

Since initiating the search engine optimization campaigns in April, the New York Times newspapers have seen a dramatic increase in search referrals. At the flagship site, search referrals have increased by 59%. There’s been an even greater increase at the Boston Globe’s site, with referrals up by 83%. And the International Herald Tribune’s site has seen referrals increase by 45%.

Going forward, the Times is busily digitizing all of the content it has published back to 1854—and though much of that content will be premium, Times Select content, it too will be findable through web search engines. The project is expected to be completed in about a year.

And Marshall’s efforts at improving the Times’ performance with search engines continue. “We’re living the model,” he said. “If you don’t integrate search into the day-to-day work flow you won’t succeed.”
