Ethan Zuckerman and a handful of his colleagues at the Berkman Center for Internet and Society at Harvard Law School found themselves in endless disputes about the mainstream media and newer digital variations. Who sets the agenda? How is public debate shaped? What topics are covered or ignored?
Anecdotes favoring one side or another were as plentiful as pop-ups, but a comprehensive and reliable database that could track the daily rhythm of the news cycle over time and was available for public use didn’t exist. So Mr. Zuckerman and others at Berkman decided to create one.
The result is Media Cloud, a system that tracks hundreds of newspapers and thousands of Web sites and blogs, and archives the information in a searchable form. The database, at mediacloud.org, will eventually enable researchers to search for key people, places and events (from Michael Jackson to the Iranian elections) and find out precisely when, where and how frequently they are covered, said Mr. Zuckerman, whose official title is senior researcher, though he acknowledges that a more accurate label would be computer geek and international development specialist. (At the moment only a small sample of Media Cloud’s tools is on the public Web site.)
The findings, which can be graphed or mapped, can demonstrate the evolution of a report and variations in coverage. Users get to “do the fun part, which is analyzing the data,” Mr. Zuckerman said, “while we do the hard part of this, which is collecting it.” Eventually users will be able to compare the top 10 news events covered by Fox News, The Atlanta Journal-Constitution and the BBC, for example, or chart the terms that appear most frequently in The New York Times, compared with leading blogs, or create a world map showing which countries receive the most media attention, or follow the path of a particular report to see if it dominates the news or dies out.
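Media Cloud's own code and storage format aren't described here, but the basic question Mr. Zuckerman describes (when, where and how often a name or event is covered) comes down to counting. Below is a minimal Python sketch, assuming each story is a hypothetical (source, date, text) record; `mention_counts` and `top_words_by_source` are illustrative names, not Media Cloud functions.

```python
import re
from collections import Counter, defaultdict
from datetime import date

# Hypothetical story record: (source name, publication date, story text).
Story = tuple[str, date, str]

# Lowercase word tokens; apostrophes kept so "don't" stays one token.
WORD = re.compile(r"[a-z']+")


def mention_counts(stories: list[Story], term: str) -> Counter:
    """Count stories per (source, day) that mention `term`, ignoring case."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    counts: Counter = Counter()
    for source, day, text in stories:
        if pattern.search(text):
            counts[(source, day)] += 1
    return counts


def top_words_by_source(stories: list[Story], n: int = 10,
                        stopwords: frozenset = frozenset()) -> dict:
    """The n most frequent words in each source's stories, minus stopwords."""
    per_source: dict = defaultdict(Counter)
    for source, _day, text in stories:
        per_source[source].update(
            w for w in WORD.findall(text.lower()) if w not in stopwords
        )
    return {src: c.most_common(n) for src, c in per_source.items()}
```

Plotting `mention_counts(stories, "Iranian elections")` by date gives the kind of coverage curve described above. Word counts are only a rough stand-in for "news events"; grouping stories into events would need a clustering step this sketch leaves out.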
For the past decade or so, many researchers have used link analysis to figure out how information spreads, said Yochai Benkler, a Harvard Law School professor at Berkman who has been involved in creating Media Cloud. You could identify which Web sites were linked to most frequently and infer whose sites were most influential. But researchers have pretty much squeezed all that they can from that approach, Mr. Benkler said. Although Media Cloud is still in its early stages, it is among “the next generation of tools that actually look at what people are saying,” he said, adding that it is “a better microscope.”
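Link analysis of the kind Mr. Benkler describes can be as simple as counting inbound links. This sketch, assuming a hypothetical list of (linking site, linked site) pairs, ranks sites by how many distinct other sites point to them; it is an illustration, not the method any particular study used.

```python
from collections import Counter, defaultdict


def inbound_link_counts(links):
    """Count how many distinct other sites link to each site.

    `links` is an iterable of (linking_site, linked_site) pairs. Repeated
    links from the same site count once, and self-links are ignored.
    """
    linkers = defaultdict(set)
    for src, dst in links:
        if src != dst:
            linkers[dst].add(src)
    return Counter({site: len(srcs) for site, srcs in linkers.items()})
```

Raw in-degree treats every link equally; PageRank refines it by giving more weight to links from sites that are themselves heavily linked.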
There are other kinds of media trackers. Cornell University researchers, for example, have developed MemeTracker, which maps the daily news cycle by grabbing repeated quotations from one million online sources. (A meme is anything — an idea, a phrase — that spreads by imitation from one person to another.)
Its graphs, which can be viewed at memetracker.org, display the reports that are competing against one another for attention on a given day, as well as those that have staying power or quickly disappear. A recent paper on MemeTracker’s experience during the presidential campaign was hailed by experts as a landmark piece of work.
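The core of MemeTracker's approach, spotting the same quotation across many sources, can be approximated in a few lines. This Python sketch is a simplification under stated assumptions: posts are hypothetical (source, day, text) records, and only quotes that match exactly after light normalization are grouped. The Cornell researchers describe their system as also clustering slightly altered variants of a quote, which this sketch does not attempt.

```python
import re
from collections import defaultdict

# Text between straight or curly double quotes.
QUOTE = re.compile(r'["\u201c]([^"\u201d]+)["\u201d]')


def quote_volume(posts, min_sources=2, min_words=4):
    """For each repeated quotation, count distinct sources quoting it per day.

    `posts` is an iterable of (source, day, text). Quotes are normalized by
    lowercasing, trimming edge punctuation and collapsing whitespace; only
    exact matches after that normalization are grouped together.
    """
    sources_by_day = defaultdict(lambda: defaultdict(set))
    for source, day, text in posts:
        for match in QUOTE.finditer(text):
            phrase = " ".join(match.group(1).lower().strip(" .,!?;:").split())
            if len(phrase.split()) >= min_words:
                sources_by_day[phrase][day].add(source)
    volume = {}
    for phrase, per_day in sources_by_day.items():
        all_sources = set().union(*per_day.values())
        if len(all_sources) >= min_sources:
            volume[phrase] = {day: len(s) for day, s in sorted(per_day.items())}
    return volume
```

Each phrase's day-by-day counts show whether a quotation spikes and fades or has staying power.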
Pew Research Center’s Project for Excellence in Journalism offers a news-coverage index, which is laboriously compiled by having 14 people sample leading reports produced by 55 outlets. Media Cloud is much less exact, Mr. Zuckerman said, but it can automatically scan hundreds, and eventually thousands, of sources.
Amy S. Mitchell, deputy director of Pew’s journalism project, said Media Cloud “offers the public a great opportunity to play around with looking at a wide swath of media at more of a surface level.” But, she added, it cannot really capture the nuances of the news media’s agenda. “There are certain things that computer algorithms cannot do that individuals can,” she said.
Since every method has virtues and drawbacks, she added, “I think there is tremendous value in having both approaches.”
What Media Cloud offers that no one else does is a tool anyone can use to answer all sorts of questions about the media landscape. One topic that Mr. Benkler and Mr. Zuckerman have long been debating is whether the Internet has helped open up the public sphere to more voices, or whether it just serves as an echo chamber, simply repeating information and views that the mainstream media already circulate.
Mr. Benkler is using Media Cloud to test his theory that digital media is widening the circle of voices somewhat. Sites that he characterizes as “one link out” from the most visible (like The Huffington Post, Talking Points Memo and Instapundit) are entering into the conversation, he argues.
Who has the power to place an idea on the national agenda is another question that Mr. Benkler said Media Cloud could help answer. For instance, how is the conversation about the recession and the financial crash shaped? Using some of the database’s more specialized tools, Mr. Benkler investigated who first floated the idea for a temporary takeover of the financial system by the government, as was done in Sweden in the 1990s.
Paul Krugman, a columnist for The New York Times, first raised the idea in September 2008, but it was a cluster of influential economic and political bloggers, like Brad DeLong and Matthew Yglesias, who kept it alive. After the subject disappeared for a couple of months, the bloggers resurrected it early this year as Washington began discussing the details of a bank rescue plan. To Mr. Benkler, this preliminary evidence suggests that the network of public media has given a voice to some people who in the past may have had useful ideas but were, as a practical matter, unable to inject them into the national conversation.
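Tracing an idea this way depends on time-stamped data. A minimal sketch, assuming hypothetical (source, date, text) records and a plain case-insensitive substring match on a search phrase, finds who used the phrase first and where coverage went quiet before reviving. The function names are illustrative, and the sample records under `__main__` are invented placeholders, not Mr. Benkler's data.

```python
from datetime import date, timedelta


def first_mentions(posts, phrase):
    """Earliest day each source used `phrase`, in chronological order.

    `posts` is an iterable of (source, day, text); matching is a simple
    case-insensitive substring test.
    """
    needle = phrase.lower()
    first = {}
    for source, day, text in posts:
        if needle in text.lower() and (source not in first or day < first[source]):
            first[source] = day
    return sorted(first.items(), key=lambda item: item[1])


def quiet_gaps(posts, phrase, min_gap=timedelta(days=30)):
    """Pairs of consecutive mention days separated by at least `min_gap`."""
    needle = phrase.lower()
    days = sorted({day for _src, day, text in posts if needle in text.lower()})
    return [(a, b) for a, b in zip(days, days[1:]) if b - a >= min_gap]


if __name__ == "__main__":
    # Invented placeholder records, not real coverage data.
    sample = [
        ("columnist", date(2008, 9, 20), "Consider the Swedish solution."),
        ("blog_1", date(2008, 9, 22), "A Swedish solution for banks?"),
        ("blog_2", date(2009, 1, 20), "The Swedish solution is back."),
    ]
    print(first_mentions(sample, "swedish solution"))
    print(quiet_gaps(sample, "swedish solution"))
```

A substring match will miss paraphrases of the same idea, so a real study would need a broader set of search terms.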
“If you’re actually trying to map where an idea starts and how it moves through the public sphere, you need a database like we’re developing, with time-stamped data,” Mr. Benkler said, explaining that services like Google and Lexis/Nexis are not as comprehensive or do not provide that level of detailed information. Media Cloud also enables more fine-grained analyses by examining language and context. “How does rhetoric change over time, and what’s the role of the Internet and the mainstream media in that?” Mr. Zuckerman asked.
Some of his colleagues, for example, have been tracking the frequency of the words bailout and stimulus to pinpoint when one term overtook the other. Their graph shows how bailout, used constantly in the news in the fall, gave way to stimulus after President Obama took office, and it marks precisely when the two lines crossed: the point where, as Mr. Zuckerman would say, “one meme took over from the other.”
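Finding the moment one term overtakes another is a small computation once the counts exist. This sketch assumes two hypothetical per-period mention counts of equal length and reports the periods in which the second term rises above the first. Dividing both series by the same period's story total doesn't move the crossing, so raw counts are enough for this comparison. The monthly numbers in the demo are invented for illustration.

```python
def overtakes(periods, counts_a, counts_b):
    """Periods in which term B rises above term A after being at or below it.

    `periods` labels each position; `counts_a` and `counts_b` are mention
    counts for the two terms in those periods.
    """
    if not len(periods) == len(counts_a) == len(counts_b):
        raise ValueError("all three sequences must have the same length")
    return [
        periods[i]
        for i in range(1, len(periods))
        if counts_b[i - 1] <= counts_a[i - 1] and counts_b[i] > counts_a[i]
    ]


if __name__ == "__main__":
    # Invented monthly counts for illustration only.
    months = ["Oct", "Nov", "Dec", "Jan", "Feb"]
    bailout = [120, 100, 80, 60, 40]
    stimulus = [10, 30, 50, 90, 110]
    print(overtakes(months, bailout, stimulus))  # ['Jan']
```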
Media Cloud’s founders have put out an open call on their Web site for research ideas. The system provides a platform, Mr. Zuckerman said, on which others can build using their own kinds of tracking software. The point is to start with facts rather than impressions.
As Mr. Zuckerman noted, a lot of anecdotes don’t necessarily add up to the truth.
Everything website:
Enchanting By Numbers
When I was in Beijing last summer, I dropped by the Microsoft Research campus to talk with Dr. Yu Zheng. He studies the air pollution in his city and the noise pollution in mine. Using algorithms, he can predict what kinds of noises New Yorkers are most likely to hear in their neighborhoods; take a look at his Citynoise map. His algorithms could one day help city planners curb air pollution and noise, or, as Christian Sandvig notes, they could be used by the GPS apps on our mobile devices to keep us from walking through neighborhoods perceived to have loud people hanging around outside.
Christian Sandvig studies algorithms, which is hard to do because most companies, like Facebook and Google, don’t make their algorithms public. In a recent study, he asked Facebook users to explain how they imagine the EdgeRank algorithm works (this is the algorithm that powers Facebook’s news feed). Sandvig discovered that most of his subjects had no idea there even was an algorithm at work. When they learned the truth, it was like a moment out of The Matrix. But none of the participants remained angry for long. Six months later, they mostly reported satisfaction with the algorithms that determine what they can and can’t see. Sandvig finds this problematic, because our needs and desires often don’t match the needs and desires of the companies that build the algorithms.
“Ada’s Algorithm” is the title of James Essinger’s new book. It tells the remarkable story of Ada Lovelace, the woman who wrote the first computer program (or, as James puts it, algorithm) in 1843. He believes Ada’s insights came from her “poetical” scientific brain. Suw Charman-Anderson, the founder of Ada Lovelace Day, tells us more about her.
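Facebook never released EdgeRank's code. The simplified description it gave developers in 2010, and that was widely repeated afterward, scored each story by summing, over its interactions ("edges"), the viewer's affinity with the creator times a weight for the interaction type times a time decay. This Python sketch follows that description only; the exponential half-life decay and all numeric values are assumptions, and the production system was reportedly far more complex.

```python
from dataclasses import dataclass


@dataclass
class Edge:
    """One interaction attached to a story (a post, like, comment, ...)."""
    affinity: float   # closeness of viewer to the edge's creator (assumed scale)
    weight: float     # importance of the interaction type (assumed values)
    age_hours: float  # how long ago the interaction happened


def edgerank_score(edges, half_life_hours=24.0):
    """Sum of affinity * weight * decay over a story's edges.

    The exponential half-life decay is an assumption; the public description
    said only that older edges count for less.
    """
    return sum(
        e.affinity * e.weight * 0.5 ** (e.age_hours / half_life_hours)
        for e in edges
    )


def rank_feed(stories, half_life_hours=24.0):
    """Story ids (keys of `stories`) ordered from highest to lowest score."""
    return sorted(
        stories,
        key=lambda sid: edgerank_score(stories[sid], half_life_hours),
        reverse=True,
    )
```

Even this toy version shows Sandvig's point: whoever picks the weights and the decay decides what rises to the top of the feed.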
know what words you use too often.
In an older post, "Ideas on making Julius Caesar POP?", you can scroll down to see a cool example of a word cloud made for that play.
In the meantime, a Stanford English professor is looking to use data and word clouds to do what is called "distant reading." It's a fascinating idea. From WIRED Magazine...
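A word cloud is, at bottom, a word-frequency count drawn with font sizes that grow with frequency. This minimal sketch computes those sizes; the linear scaling and the size range are arbitrary choices, and the optional `vocabulary` filter is a hypothetical stand-in for a part-of-speech tagger if you only want, say, adjectives in the cloud.

```python
import re
from collections import Counter


def word_cloud_sizes(texts, vocabulary=None, top_n=50, min_size=10, max_size=60):
    """Map the top_n most frequent words to font sizes for a word cloud.

    Sizes scale linearly with frequency between min_size and max_size.
    If `vocabulary` is given, only words in it are counted.
    """
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(w for w in words if vocabulary is None or w in vocabulary)
    top = counts.most_common(top_n)
    if not top:
        return {}
    most, least = top[0][1], top[-1][1]
    span = (most - least) or 1  # avoid dividing by zero when all counts tie
    return {
        word: round(min_size + (count - least) * (max_size - min_size) / span)
        for word, count in top
    }
```

Feeding it a draft of your own writing is a quick way to see which words you lean on too often.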
If Google has its way, all of English literature will one day exist as searchable digital text. Franco Moretti, a Stanford English professor, wants to be ready for the deluge with new kinds of questions and new tools to answer them — things like computational linguistics, data mining, computer modeling, and network theory. Moretti is already famous in bookish circles for his data-centric approach to novels, which he graphs, maps, and charts. Until recently, though, he’s been able to crunch only a few novels at a time, doing all that quantitative stuff by hand. Now he’s going digital, building searchable databases of old books, working to write software that can mine for patterns. Instead of diving deep into a few beloved titles, Moretti aims to zip across the creative output of entire eras. He calls it distant reading, and if his new methods catch on, they could change the way we look at literary history.
Take one experiment. Moretti decided to test the idea that Victorian writers, through their choice of adjectives, might reveal their belief that moral qualities were indivisible from reality itself and that physical traits reflected a person’s virtue. So he assembled a database of 250 novels and sent the file to computer scientists at IBM’s Visual Communications Lab, who turned the books into a series of word clouds. “Boom! There were exactly the adjectives I had hoped would pop up!” he says. “Adjectives like strong, bright, fair, in which the physical and the moral blend.”
For another project, he looked at the titles of 7,000 books in 18th- and 19th-century England and discovered a correlation between shorter titles and the growth of the book publishing industry. (Moretti theorizes that more concise titles made books easier to promote in a crowded marketplace.) He is also working with a programmer to test new software that can “read” terabytes of obscure, mostly unread fiction and classify the books by genre.
“In 19th-century Britain, maybe 30,000 novels were published,” Moretti says. He is dying to analyze them all. It will be like peering through the first telescope, he says — surveying more literature at a glance than he could read in a lifetime. “We will get a sense,” he says, “of a much wider universe.”
SOURCE: http://www.wired.com/magazine/2009/11/pl_print/…
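Moretti's title finding rests on a correlation between two series over time. A minimal sketch of that kind of calculation, assuming hypothetical (year, title) records grouped by decade, measures title length in words and computes the Pearson correlation between the number of titles per decade (used here as a rough proxy for the size of the publishing industry) and their mean length. A correlation alone doesn't show that a crowded market caused shorter titles; that part is Moretti's interpretation.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread = sqrt(sum((x - mean_x) ** 2 for x in xs)) * sqrt(
        sum((y - mean_y) ** 2 for y in ys))
    if spread == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / spread


def titles_by_decade(books):
    """Per-decade (decade, number of titles, mean title length in words),
    plus the correlation between title count and mean length.

    `books` is an iterable of (year, title) pairs.
    """
    lengths = defaultdict(list)
    for year, title in books:
        lengths[year // 10 * 10].append(len(title.split()))
    decades = sorted(lengths)
    counts = [len(lengths[d]) for d in decades]
    means = [sum(lengths[d]) / len(lengths[d]) for d in decades]
    return list(zip(decades, counts, means)), pearson(counts, means)
```

A negative coefficient means decades with more titles tended to have shorter ones, which is the pattern the excerpt describes.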