April 01, 2004

Disassociated Press: V-I Day - Internet Wins

April 1, 2004
Candle H. McCall
Disassociated Press

LALA LAND - In a surprise announcement today, all the world's governments announced they were to disband in favor of Internet Emergent Democracy. United Nations (U.N.) Secretary General Nosee TheMan issued the shocking communique.

"War, famine, plague, and pestilence, have set man against man throughout human history. But it's a New Era. A few hackers with computers have changed it all. We're giving up. Bloggers, filesharers, and flame-wars trump laws, guns, and money.

We used to think we could control the world. But now, with the advent of The Internet, we all know we have no choice but to become technolibertarian cryptoanarchies."

Robber Baron Snidely Whiplash said "Curses! Foiled again! I would have gotten away with it too, if it weren't for those darn geeks".

By Seth Finkelstein | posted in politics | at 09:03 AM (Infothought permalink) | Comments (0) | Followups (0)
March 31, 2004

More on "Belle de Jour" blog potential hoax

"The Book Club Blog" has a great collection of information as to whether the supposed "diary of a london call girl", the Belle de Jour blog, is a hoax.

I remain with the skeptics. "She" recently "said" (my emphasis):

Unfortunately for the conspiracy theorists, there is no conspiracy. I am a young woman, I have sex for money, and I love to read and write. My taste in books shouldn't come as a surprise. After all, this job affords more spare time than most. Think of Occam's razor, the principle of parsimony: what would be simpler - that I am who I say I am, and write about, or that I am a famous author living a double life, unable to tell anyone and having a joke at the expense of my agent, publisher and readers? What does bother me is the presumption that a person's occupation is a reflection of their intelligence or value to society:

Let me reframe:

"... that I am a real well-read call-girl who instantly writes award-winning polished prose, or that I am a not-so-famous author who would like to be more famous, and saw an opportunity to do so by writing a fake blog and feeding on the media appetite for sex and the Internet and blogs and selling papers via titillation and scandal?"

When this question is put forth, there's almost a lawyer-trick of deflecting the suspicion by pounding the table and accusing the skeptic of bigotry: You think prostitutes can't be smart! Sexist! Classist!

No. I think writing is hard work for anyone. And Occam's razor, the principle of parsimony, says that an established writer posing as a media-attention draw is very likely indeed, much more so than such a real person getting awards and book deals. It's just ghost-writing taken one step further, where the writer starts by creating the celebrity in the first place (rather a clever idea, in retrospect).

Given the forthcoming "Belle de Jour" book, I was tempted to suggest turning its Amazon book reviews section into a hoax-information discussion forum. But that's probably playing into the book's buzz-hype. Still, it was an appealing thought.

By Seth Finkelstein | posted in journo | at 11:59 PM (Infothought permalink) | Comments (0) | Followups (0)
March 30, 2004

"Jew Watch", Google, and Evil

Search "Jew"

As noted by http://www.jewishjournal.com/home/preview.php?id=11998 (via JOHO the blog):

"Online searchers punching the word "Jew" into the Google search engine may be surprised at the results they get.

In fact, the No. 1 result for the search entry "Jew" turns out to be www.jewwatch.com. The fanatically anti-Semitic hate site is ranked first in relevance of more than 1.72 million Web pages."

Hate groups are learning search engine optimization. That ranking is no accident.

The No. 1 ranking of Jew Watch came as a surprise to David Krane, the director of corporate communications for the San Mateo-based Web giant.

Such a page might not pop up for Google searchers in European countries, where Holocaust denial is illegal. But Krane adamantly stated that Google has no plans to manually alter the results of their ranking system to knock Jew Watch from its top spot.

Yup (to all).

Do a German search for "Jew", or a French search for "Jew", and the hate site is not there, for exactly the Google censorship reason noted. This is well-known, from the first "Localized Google search result exclusions" report by Benjamin Edelman and Jonathan Zittrain.

But it's a legal site in the US, fully protected under the First Amendment as political speech.

This is an excellent example of many points I've made, but specifically:

Google ranks popularity, not authority
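To make the popularity-versus-authority point concrete, here is a minimal sketch of link-based ranking in the general style of the published PageRank idea. It is a simplification under stated assumptions, not Google's actual ranking system, and the page names and link graph are made up for illustration. Notice that nothing in the computation looks at what a page says, only at who links to it.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Simplified PageRank-style power iteration.

    links maps each page name to the list of pages it links to.
    This is an illustrative sketch, not Google's production algorithm.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current rank evenly among its outlinks.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outlinks spreads its rank over every page.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical link graph: three pages link only to one site, and a
# critic links to that same site (to complain about it) and to another.
links = {
    "fan1": ["heavily-linked-site"],
    "fan2": ["heavily-linked-site"],
    "fan3": ["heavily-linked-site"],
    "critic": ["heavily-linked-site", "scholarly-site"],
    "heavily-linked-site": [],
    "scholarly-site": [],
}

ranks = link_rank(links)
top_page = max(ranks, key=ranks.get)
print(top_page)
```

In this toy graph even the critic's link counts as a vote, which is the sense in which a link measures popularity rather than endorsement or authority.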

By Seth Finkelstein | posted in google | at 11:58 PM (Infothought permalink) | Comments (2) | Followups (0)
March 29, 2004

Google, image searching, and censorware circumvention

[I wrote this letter about a news article regarding students using Google image search as a means of circumventing censorware]

Dear Annalee Newitz

I read with great interest your story on Google, censorware, and image searching, as a school censorware problem, at:
http://www.alternet.org/story.html?StoryID=18213

I've published much work about the issue of Google image searching and similar sites being a "loophole" for censorware. It even was referenced in the expert reports in the District Court decision on library censorware (unfortunately, it has been extremely poorly publicized and otherwise unreported). See, for example:

District Court CIPA decision
http://sethf.com/pipermail/infothought/2002-May/000010.html

BESS's Secret LOOPHOLE: (censorware vs. privacy and anonymity)
http://sethf.com/anticensorware/bess/loophole.php

BESS vs The Google Search Engine (Cache, Groups, Images)
http://sethf.com/anticensorware/bess/google.php

BESS vs Image Search Engines
http://sethf.com/anticensorware/bess/image.php

The Pre-Slipped Slope - censorware vs the Wayback Machine web archive
http://sethf.com/anticensorware/general/slip.php

But I noted one major error in your article, in this part:

> The second problem, which is strictly laughable, is that regular
> Google also has caching. When I recently did a Google search (not an
> image search) on "hot naked babes," I was able to retrieve images of
> naked people from the cache.

I don't think this is what happened. It just seemed that way. What really happened is that when you retrieved the text page from the Google cache, it had within it image links to the naked-people pictures at the non-Google sites. Since your computer was not censorware'd, you were able to retrieve those images. But again, that wouldn't have worked in the case where censorware prevented you from viewing anything on the non-Google image sites.

Note, however, the retrieval would work the way you described, with the Wayback Machine web archive:
http://www.archive.org/

Perhaps that will be the next site to become popular with students, and then prohibited.
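The image-link point in the letter above can be sketched in a few lines. This is an illustration under assumptions: the cached HTML and the example.com hosts are hypothetical, and the sketch assumes relative image paths in a cached copy resolve against the original site rather than against the cache. It simply lists which hosts a browser would contact to fetch the pictures.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ImageHostCollector(HTMLParser):
    """Collects the absolute URL of every <img> tag in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Relative paths are resolved against the ORIGINAL page URL
                # (an assumption about how the cached copy behaves).
                self.image_urls.append(urljoin(self.base_url, src))


def image_hosts(html, original_page_url):
    """Return the sorted set of hosts the page's images would be fetched from."""
    collector = ImageHostCollector(original_page_url)
    collector.feed(html)
    return sorted({urlparse(url).netloc for url in collector.image_urls})


# Hypothetical cached copy of a page: the text came from the cache,
# but the images are still referenced at their original locations.
cached_html = """
<html><body>
<p>Cached text of the page.</p>
<img src="http://images.example.com/photo1.jpg">
<img src="/pics/photo2.jpg">
</body></html>
"""

hosts = image_hosts(cached_html, "http://www.example.com/gallery.html")
print(hosts)
```

Every host in the list is a non-Google site, so censorware that blocks those sites would still block the pictures even though the text page itself came from the cache.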

By Seth Finkelstein | posted in censorware , google | at 11:59 PM (Infothought permalink) | Comments (1) | Followups (0)
March 26, 2004

Google-bombing cannot be defused trivially

There's a proposed Google-bombing solution in the article "Five-domain Googlebomb explodes in boardroom":

"An easy fix for many bombs," explains Brandt "Google should not use terms in external links to boost the rank of a page on those terms, unless those terms are on the page itself. This is a no-brainer. But it means another CPU cycle per link, which is why Google won't do it."

Unfortunately, I have to disagree here. It's not so simple. In fact, the way it works now is ultimately the Right Thing from a technical point of view, in terms of making relevancy inferences from a simple algorithm.

One nontrivial reason is misspellings. If many people make the same spelling error in linking (such as turning "Dan Gillmor" into "Dan Gilmore"), it's useful to return that linked page for the search, rather than ignoring it since the wrong spelling likely won't be on the target page.
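The misspelling case can be shown with a toy index. This is a minimal sketch, not how Google is actually implemented: the URL, page text, and the `require_term_on_page` switch (standing in for the proposed fix) are all hypothetical.

```python
from collections import defaultdict


def build_index(pages, links, require_term_on_page=False):
    """Build a word -> set-of-URLs index from page text plus anchor text.

    pages: dict mapping URL to page text.
    links: list of (anchor_text, target_url) pairs from other pages.
    require_term_on_page: if True, apply the proposed fix and ignore
        anchor words that do not also appear on the target page.
    """
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    for anchor_text, target in links:
        page_words = set(pages.get(target, "").lower().split())
        for word in anchor_text.lower().split():
            if require_term_on_page and word not in page_words:
                continue  # the proposed fix drops this anchor word
            index[word].add(target)
    return index


def search(index, query):
    """Return the URLs that match every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical page that spells the name correctly, and an inbound
# link that uses the common misspelling as its anchor text.
pages = {"http://example.com/gillmor": "Dan Gillmor writes about technology"}
links = [("Dan Gilmore", "http://example.com/gillmor")]

current = search(build_index(pages, links), "Dan Gilmore")
proposed = search(build_index(pages, links, require_term_on_page=True), "Dan Gilmore")
print(current, proposed)
```

With anchor text counted as it is now, the misspelled query still finds the right page. With the proposed fix, the same query finds nothing.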

There are also issues with robots.txt. The robots.txt file isn't for privacy, it's just an advisory to have search-spiders work more efficiently (think of how ill-considered it would be, to have a public file listing material which should not be viewed - "Do Not Look Here"). If the site doesn't want spidering, but many people link to it with certain words, it seems a reasonable thing to return that site for those words. The option of not returning the site isn't necessarily right, because sites often just use robots.txt to avoid the load of being spidered, rather than to hide in any way.
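For readers who haven't looked at one, here is a small sketch of how a well-behaved spider consults robots.txt, using Python's standard `urllib.robotparser` module. The robots.txt contents, the spider name, and the example.com URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking all spiders to skip the archive pages.
robots_txt = """\
User-agent: *
Disallow: /archive/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A polite spider asks before fetching. Nothing enforces the answer:
# the file is a public request, not an access control.
archive_allowed = parser.can_fetch("ExampleSpider", "http://example.com/archive/2004.html")
front_allowed = parser.can_fetch("ExampleSpider", "http://example.com/index.html")
print(archive_allowed, front_allowed)
```

The check only tells the spider not to fetch the page. It says nothing about whether other people link to that page, or with what words, which is why a search engine can still know something about a site it never spidered.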

Many issues with Google, or any complex search system, are more subtle than they might appear at first glance.

By Seth Finkelstein | posted in google | at 11:59 PM (Infothought permalink) | Comments (0) | Followups (0)
March 25, 2004

I can bomb that Google in ...

The Register has an article today, "Five-domain Googlebomb explodes in boardroom", talking about connecting the phrase "out of touch executives" to Google.

As I've noted, e.g. in discussing the miserable failure Google-bomb, the key concept is the confusion between popularity and authority. But I'm not sure how far this can be pushed in terms of giving relevance to obscure phrases.

Perhaps an experiment is in order, to demonstrate a principle.

Let's say I linked a certain phrase "EBig EBrother" to somewhere (such as Google ...). I've used an uncommon phrase here, so as to make it easy. The words "Big Brother" have many hits, but there's no occurrence of "EBig EBrother". Well, there isn't until this post gets indexed.

What happens?

By Seth Finkelstein | posted in google | at 11:59 PM (Infothought permalink) | Comments (0) | Followups (0)
March 24, 2004

Cites & Insights - special "Broadcast Flag" edition

Walt Crawford has a special "Broadcast Flag" edition of his library 'zine (not blog) "Cites & Insights":

On November 4, 2003, the Federal Communications Commission (FCC) adopted a Report and Order and Further Notice of Proposed Rulemaking in the Matter of: Digital Broadcast Content Protection, MB Docket 02-230. In English, the FCC adopted the Broadcast Flag. You can find the lengthy report (72 pages single-spaced, plus four appendices) on the web. This commentary may be long but it's far from comprehensive--and certainly not final, since the rulemaking is only a first step. My aim here is to provide a reasonable sampling of background, direct documents, and apparent consequences--and to give you some reason to believe that librarians, and those concerned with the future of digital technology in the U.S., should be concerned about the Broadcast Flag and its implications.

All worth reading, and recommended. I've not been much involved in that battle, though I've mentioned some "Broadcast Flag" strategies.

I do have one note of commentary (emphasis mine):

Paragraph 41 is also interesting as it cites limits within DMCA: nothing in this section shall require that the design of, or the design and selection of parts and components for, a consumer electronics, telecommunications, or computing product provide for a response to any particular technological measure, so long as such part or component, or the product in which such part or component is integrated, does not otherwise fall within the provisions... In other words, DMCA doesn't require new technological measures. Does that call into question the FCC's ability to impose such measures? Not according to the FCC: They limit the significance of the emphasized section to one subsection of DMCA, and deem it as not in any way limiting the FCC from imposing such requirements.

Well, sadly, the FCC is basically right on this point (in my nonlawyer but DMCA-studied view). The DMCA does not require a broadcast flag. But there's no pre-emption or affirmative limit there. That is, even though the DMCA doesn't mandate it, some other law or regulation could give the FCC the power to impose this, and that would not be a conflict. That's what the FCC is saying.

The FCC's claim to have authority over equipment-makers strikes me as broad, but there might actually be some precedent for it. But even if so, it would be on a very different basis from the DMCA.

By Seth Finkelstein | posted in copyblight , dmca | at 08:36 AM (Infothought permalink) | Comments (1) | Followups (0)
March 23, 2004

More on "Belle de Jour" as fake blog

Checking other "Belle de Jour" articles, I found one which argued skepticism based on a "Gender Genie", an algorithm for allegedly determining male or female authorship. Comments pointed out the statistics are unimpressive.

So I tried testing the infamous book review, the (female author) passage of text which supposedly formed the basis of the recent identity hunt.

In the results below, there's a caveat "(NOTE: The genie works best on texts of more than 500 words.)". All book reviews were given as "nonfiction" category writing.

Words: 256

Female Score: 74
Male Score: 346

The Gender Genie thinks the author of this passage is: male!

Amusingly, when I clicked on feedback submission ("Am I right? The author of this passage is actually ..."), the results were:

That is one butch chick.

According to Koppel and Argamon, the algorithm should predict the gender of the author approximately 80% of the time.
Accuracy Results
Am I right?
yes 129165 (63.72%)
no 73542 (36.28%)

Note coin-flipping will be right 50% of the time. So 80% is interesting, but not all that amazing. And 63%, for this implementation, seems only a slight improvement on the coin-flipping algorithm.
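The arithmetic behind that comparison is short enough to check directly. The vote counts below are the ones shown in the feedback results. Treating those self-reported votes as a fair measure of accuracy is an assumption, since nothing verifies what visitors submitted.

```python
# Feedback counts as reported on the Gender Genie results page.
yes_votes = 129165
no_votes = 73542

total = yes_votes + no_votes
accuracy = yes_votes / total

# Percentage-point gaps against the two reference figures in the text.
gain_over_coin_flip = (accuracy - 0.50) * 100
shortfall_from_claim = (0.80 - accuracy) * 100

print(f"total votes: {total}")
print(f"reported accuracy: {accuracy:.2%}")
print(f"points above coin-flipping: {gain_over_coin_flip:.2f}")
print(f"points below the 80% claim: {shortfall_from_claim:.2f}")
```

So the implementation sits a little under 14 points above a coin flip and a little over 16 points below the figure claimed for the algorithm.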

Testing a second review:

Words: 143
Female Score: 172
Male Score: 192
The Gender Genie thinks the author of this passage is: male!

Testing a third review:

Words: 261
Female Score: 337
Male Score: 280

The Gender Genie thinks the author of this passage is: female!

One out of three is bad (though granted, these are small samples).
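The tally for the three reviews can be reproduced from the scores above. Two assumptions are built in: that the Genie's verdict is simply whichever score is higher (which matches every result shown in this post), and that all three reviews were written by a woman.

```python
# Scores copied from the three review tests above.
samples = [
    {"name": "first review", "words": 256, "female": 74, "male": 346},
    {"name": "second review", "words": 143, "female": 172, "male": 192},
    {"name": "third review", "words": 261, "female": 337, "male": 280},
]

MIN_WORDS = 500  # the Genie's own caveat: works best above 500 words


def verdict(sample):
    """Assumed decision rule: the higher score wins."""
    return "female" if sample["female"] > sample["male"] else "male"


# Assumption: the true author of each review is female.
correct = sum(1 for s in samples if verdict(s) == "female")
too_short = [s["name"] for s in samples if s["words"] <= MIN_WORDS]

print(f"{correct} of {len(samples)} correct")
print(f"below the recommended length: {too_short}")
```

All three samples fall under the recommended length, which is the caveat already noted.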

So, now testing the "Belle de Jour" first month archive:

Considered as category "fiction" or "nonfiction":

Words: 1785
Female Score: 2138
Male Score: 1936

The Gender Genie thinks the author of this passage is: female!

Considered as category "blog entry" (apparently different keywords)

Words: 1785
Female Score: 2326
Male Score: 3384

The Gender Genie thinks the author of this passage is: male!

I can't see these results as worth much at all.

By Seth Finkelstein | posted in journo | at 11:30 PM (Infothought permalink) | Comments (1) | Followups (0)