Marshall Simmonds is a long-time colleague who, as vice president of enterprise search, oversees the SEO work to make the New York Times more search engine friendly. The NYT acquired Marshall as part of the Times’ About.com purchase last year.
Prior to Marshall’s arrival at the Times, the company’s web sites, which also include the Boston Globe and the International Herald Tribune, had good traffic, but this was largely due to the strength of the brands. Little effort was expended on making search-friendly content, or making sure that search engines could easily crawl and index the sites.
“When I joined the company, Google was not crawling the Times,” said Marshall. “I’m pushing to get more traffic to the site.”
In a series of telephone and email interviews with Marshall, we learned what steps he took to change things at the Times to make the company’s web sites more search friendly. Most of the changes he made will be of interest to anyone responsible for in-house search engine optimization efforts.
The biggest problem for the Times with search engines was one common to most newspapers: The Times required users to register to read an article. Registration forms effectively block search engines from indexing a web site, as the crawlers can’t type a user name or password to access articles.
“Yahoo had indexed our registration page 20 million times,” quipped Marshall, but that turned out to be a serious issue. Yahoo’s crawler recognized the importance of the Times’ web site, but was unable to do anything but hammer away on the registration page that was ultimately displayed whenever the search engine crawler attempted to access a Times article by following a link.
Marshall’s first step was to allow search engine crawlers to have complete access to everything published by the Times, including archived content dating back to 1981. This was a major shift for the company, since archived content is only available to paying subscribers. How to reconcile allowing search engines to access content and at the same time maintain a revenue stream for premium content?
The Archives: Free vs. Fee
Like many online news sites, the New York Times offers some content free to anyone with no registration required, some free content to users who have registered with a user name and password, and some content only to paying subscribers. But unlike other newspapers, which often move all content to subscriber-only archives after a period of a week to a month, the vast majority of the content on the Times is available to anyone without a subscription. Let’s look at how this works.
The Times classifies content in three ways: Seven day content, “open” content, and archived/Times Select content. The Times produces about 500 articles per day, or about 3,500 articles per week, which are freely available to anyone without registration for seven days from the publication date. After seven days, these articles are moved to either the open area or the archived/Times Select area, depending on the type of article.
But there’s a catch: Seven day content is free to anyone without a subscription, though readers who continue to read other articles within the Times are asked to log in or register for a free subscription once they’ve clicked more than five links. The Times plans to increase this threshold to eight clicks soon.
When content changes from seven day to open status, articles keep the same URL but are physically moved to another server. This content remains accessible to search engines and users alike. Marshall says open content consists of more than 20 million documents from the papers, including general news, theater and movie reviews, sports news, classifieds and so on.
In all, 97% of the overall site ends up classified as open content, freely accessible to anyone who hasn’t used up their five-link quota.
The remaining 3% is classified as archived content, also called “Times Select” materials. This content consists of daily columns, op-ed editorials, special features and so on. To access this content, you must be a subscriber to the print edition of the Times, or pay a $49.95 annual subscription fee.
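The access rules described so far can be summarized in a short sketch. It only illustrates the rules as this article states them and is not the Times’ actual implementation. The names `Tier`, `classify` and `reader_view` and the `is_premium_type` flag are hypothetical, and the handling of the boundary at exactly seven days is an assumption.

```python
from datetime import date, timedelta
from enum import Enum


class Tier(Enum):
    SEVEN_DAY = "seven day"
    OPEN = "open"
    TIMES_SELECT = "archived/Times Select"


FREE_WINDOW = timedelta(days=7)  # free period stated in the article
CLICK_QUOTA = 5                  # article says this may rise to eight


def classify(published: date, today: date, is_premium_type: bool) -> Tier:
    """Tier of an article on a given day.

    is_premium_type is a hypothetical flag for columns, op-eds and
    special features, the kinds of content the article lists as Times Select.
    """
    # Assumption: the free window covers ages of 0 to 6 days inclusive.
    if today - published < FREE_WINDOW:
        return Tier.SEVEN_DAY
    return Tier.TIMES_SELECT if is_premium_type else Tier.OPEN


def reader_view(tier: Tier, links_clicked: int,
                registered: bool, subscriber: bool) -> str:
    """What a human reader sees; links_clicked includes the current click."""
    if tier is Tier.TIMES_SELECT:
        # Premium content: paying or print subscribers only.
        return "full article" if subscriber else "abstract with login form"
    # Seven day and open content: free until the click quota is exceeded.
    if registered or subscriber or links_clicked <= CLICK_QUOTA:
        return "full article"
    return "registration prompt"
```

For example, an unregistered reader on a sixth click to an open article would get the registration prompt, while a registered reader would see the full article.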
Despite this restriction for human users, Marshall says that both Google and Yahoo have been allowed to fully index the premium content, and will display results for matching queries (for example, a search for popular Times Op-Ed columnist Thomas Friedman in Google and Yahoo returns hundreds of results).
Click through on many of these links from both engines, however, and you won’t see the content indexed by the search engines. Rather, the Times web site detects that the user agent is a browser, and serves up a shorter abstract page with a login form to access the premium content.
Isn’t this cloaking—serving different pages to a search engine and an individual web browser? Yes, it is.
Although both Google and Yahoo warn against cloaking, Marshall says both companies are aware of what the Times is doing, and apparently condone the practice.
“They want the content, and they’re very interested in displaying it,” says Marshall.
Google has allowed cloaked content from other sources before, and we’ve seen other instances where search engines are apparently looking the other way when cloaking is used by a web site. We plan to do a follow-up on cloaking policies at all of the search engines in the near future.
[Note: Since this article was written, a discussion at our Search Engine Watch Forums has led Danny to conclude that the NY Times isn’t cloaking, since humans who either register with or have paid subscriptions to the New York Times do ultimately see the same content that was indexed. Marshall also emphatically stated in a follow-up conversation that the New York Times does not cloak.]
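The article does not say how the Times tells a crawler from a browser. As a minimal sketch, assuming plain substring matching on the User-Agent header, the behavior described above (full text for crawlers and subscribers, an abstract with a login form for everyone else) might look like this. The function names and return labels are hypothetical. "Googlebot" and "Slurp" are the tokens that Google’s and Yahoo’s crawlers include in their user-agent strings.

```python
# Tokens that appear in the crawlers' User-Agent strings.
CRAWLER_TOKENS = ("googlebot", "slurp")  # Google, Yahoo


def is_search_crawler(user_agent: str) -> bool:
    """True if the User-Agent header names a known crawler (case-insensitive)."""
    ua = user_agent.lower()
    return any(token in ua for token in CRAWLER_TOKENS)


def select_view(user_agent: str, is_subscriber: bool) -> str:
    """Pick which version of a premium article to serve."""
    if is_search_crawler(user_agent) or is_subscriber:
        return "full_article"
    # Browsers without a subscription get the shorter abstract page.
    return "abstract_with_login"
```

A User-Agent header is easy to spoof, so a site relying on this kind of check would likely also confirm the crawler’s identity some other way, such as a reverse DNS lookup on the requesting IP address.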
Writing for Search Engines
The Times, like most newspapers, has a long-standing tradition of writing compelling headlines that grab human-readers, but that may not literally describe the news story. For example, when the Pope died, Times reporters headlined stories with titles like “Papacy Change” or “Pilgrims converge on the Vatican.”
Marshall has now trained many editors and producers to write content friendly to both users and searchers. “We encouraged them to use ‘Pope John Paul dies’ and offered a more literal approach based on keyword research and internal metrics,” said Marshall. “The response has been great. Everyone so far is very excited to reach audiences through search and help users find our content.”
Well, not exactly everyone. In April, Times reporter Steve Lohr wrote a skeptical piece about newspapers coming to grips with search engines called “This Boring Headline Is Written for Google.” Oddly, the article doesn’t mention Marshall’s efforts at the Times.
Marshall’s success at educating traditionally trained journalists and editors is a great lesson for anyone new to search marketing to remember. Don’t get cutesy. Put yourself in the mind of your audience. Do a little keyword research with the tools available. Then use the words your target audience will use to search for your content.
Beyond words, the Times has been thinking more about the photos it’s putting online, to optimize for image search. Previously, the Times ignored image tags. Now, all images have descriptive captions within the ALT attribute of the image tag, making them more search friendly.
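For anyone wanting to check their own pages for the same problem, a small audit script can flag images that lack descriptive ALT text. The sketch below uses Python’s standard `html.parser` module. The `AltAudit` class and `images_missing_alt` function are names invented for this example, not tools the Times is known to use.

```python
from html.parser import HTMLParser


class AltAudit(HTMLParser):
    """Collects the src of every <img> whose alt text is missing or blank."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr_map = dict(attrs)  # attribute names arrive lowercased
        alt = attr_map.get("alt")
        if not alt or not alt.strip():
            self.missing.append(attr_map.get("src") or "")


def images_missing_alt(html: str) -> list[str]:
    """Return the src values of images with no usable alt text."""
    audit = AltAudit()
    audit.feed(html)
    audit.close()
    return audit.missing
```

Note that purely decorative images are sometimes given an empty ALT value on purpose, so the output is a list of candidates to review rather than a list of errors.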
Marshall also says thousands of RSS feeds have been created for content from the papers, and the Times publishes a top-level list of them.
Since initiating the search engine optimization campaigns in April, the New York Times newspapers have seen a dramatic increase in search referrals. At the flagship NYTimes.com site, search referrals have increased by 59%. There’s been an even greater increase at the Boston Globe’s Boston.com site, with referrals up by 83%. And the International Herald Tribune’s IHT.com has seen referrals increase by 45%.
Going forward, the Times is busily digitizing all of the content it has published back to 1854—and though much of that content will be premium, Times Select content, it too will be findable through web search engines. The project is expected to be completed in about a year.
And Marshall’s efforts at improving the Times’ performance with search engines continue. “We’re living the model,” he said. “If you don’t integrate search into the day-to-day work flow you won’t succeed.”
Search Headlines
NOTE: Article links often change. In case of a bad link, use the publication’s search facility, which most have, and search for the headline.
Other Things We Read, Didn’t Blog But You Might Want To Read…
- The Bard’s the thing, Official Google Blog (Google puts Shakespeare’s works online)
- Why isn’t there a British Google?, Phil Bradley
- Google Queried on Net Neutrality Ads, Broadcasting & Cable (the pro-net neutrality ads aren’t being given away for free, Google’s Vint Cerf told a US Senate committee. I asked the same last week, was told Google was checking on it but never got an answer).
- Google Execs Hint at Voice Recognition Services, Micro Persuasion (Google Voice Search back in 2001 was more than a hint)
- The Rumors of our death are only slightly exaggerated, Bob Wyman of PubSub
- Website designers want searches to work for free, USA Today (overview of SEO; a number of prominent search marketers get named)
- No Privacy for Picasa Web Albums, Google Blogoscoped
- Google Is Killing the Economics of Content, Publishing 2.0
- Australian Sites Make It Easier To Find, Keep Illegally Posted Viral Video, PaidContent.org
- EBay to Add a Phone Link From Listings to Sellers, New York Times
- Some Google Groups Posts Removed in Germany, Google Blogoscoped
- Windows Live Causes Head Scratching At TechEd, InformationWeek
- How Should You Pay a Search Agency?, iMedia Connection
- The History Effect and Split Testing, iMedia Connection
- SEMcares Links Search Marketers With Nonprofits, DMNews
- Google finally puts Picasa albums on the Web, News.com
- Picasa Web Albums: First Impressions, InsideGoogle
- Sky Takes Lead in AOL UK Bid Contest; BT Sits Out, PaidContent.org
- Googlers’ Orkut profiles: The better parts, Valleywag
- Google sales chief says still testing display ads, Reuters
- MSN PPC delivers good ROI, (though not as many clicks), DM News
- Microsoft MSN adCenter Releases New Reporting Features, Search Engine Roundtable
- JPM Analysis: Google Showing More Third Links, Searchblog
Blog: Optimizing All the News That’s Fit to Search
The New York Times has one of the most popular news web sites, but until this year that was largely because of the strength of its brand. After its acquisition of About.com, the Times embarked on an aggressive campaign to make its web site more search friendly, a complex process that’s paid off with notable traffic gains for the company. Today’s SearchDay article, “Getting The New York Times More Search Engine Friendly,” takes a look behind the scenes at how the Times and its vice president of enterprise search, Marshall Simmonds, pulled it off.