September 29, 2003
Tom Does the Heavy Lifting on Push
Tom has done a great job of rounding up and pushing on with push technology in Push off: why early attempts at push media sucked. Which has, as usual, inspired some thoughts.
George Bush took heavy flak last week when he acknowledged that he doesn't read news "because that's just opinion" and he gets everything he needs to know from the people around him. I won't get into the scrap over that, except to say that what he calls opinion, I call the struggle for meaning, and we can't live without it.
If you whack me in the nose, one of my first responses is, "What the hell did you do that for?" I know what you did, but what I need more than anything is to understand why. That's why the families of murder victims are very often more interested in why the killer did it than in simply getting revenge. We have come up with the idea that things mean something, and we will drag ourselves through any kind of trouble to figure out what the meaning is.
The net can be so addictive because we are surrounded by people with ideas about what things mean. We are now getting some really valuable tools for filtering those opinions and, as Tom suggests, responding to them.
Why does the ability to respond matter? Partly because we need to make those opinions our own by actively engaging with them, either to tear them into small pieces, or to agree and add something that gets fed back to the originator, who might also respond.
Why does that matter? Because the thing that makes us happy is not consumption but production. Productive people gain huge personal, emotional and physical health benefits. Even more, when we have a conversation we gain acknowledgement. In a vast, impersonal universe where, for the most part, we are ignored or dismissed, the net lets us participate in conversations where what we have to say is taken seriously. Even flames are a sign that what we have said matters to someone.
That is why the corporate media model is so irrelevant on the net. If we want to be preached at we can go to church, or watch Fox or CNN or Channel 9, but on the net we can influence the discourse; our ideas can participate in memes that change and grow, even if they wither and die. What some see as a dehumanising relationship with technology is a door to a very human thing: conversation.
Metadata tries to assign meaning to content, but opinion is metadata, and RSS is bringing that ever more subtly to our desktops. One of the things I love about Awasu is that I can tell it to search the feeds for a keyword (the search engine is still fairly primitive) and it then aggregates all the postings that match into a single feed, regardless of who posted them. With more sophisticated filters we may move beyond reading the blogs to reading metafeeds focused on our specific interests. As I said, a little push, a little pull, a Pushmepullyou.
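For the terminally curious, the whole keyword metafeed idea fits in a dozen lines of Python. A rough sketch using the feedparser library, with made-up feed URLs; Awasu's internals are certainly different, this is just the shape of the thing:

```python
# Sketch of a keyword metafeed: pull several feeds, keep only the
# items that mention a term, merge them into one list regardless of
# who posted them. The feed URLs are placeholders, not real endpoints.
import feedparser

FEEDS = [
    "http://example.com/weblog-a/rss.xml",
    "http://example.com/weblog-b/rss.xml",
]

def metafeed(feed_urls, keyword):
    keyword = keyword.lower()
    items = []
    for url in feed_urls:
        for entry in feedparser.parse(url).entries:
            text = (entry.get("title", "") + " " + entry.get("summary", "")).lower()
            if keyword in text:
                items.append(entry)
    # Newest first, regardless of which feed an item came from.
    items.sort(key=lambda e: e.get("published_parsed") or (), reverse=True)
    return items

for entry in metafeed(FEEDS, "metadata"):
    print(entry.get("title"))
```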
September 29, 2003 in Knowledge Economy, Metadata, Society | Permalink | Comments (1) | TrackBack
September 26, 2003
Good News On Metadata - Content is Just a Lighthouse
David Weinberger has been blogging on the TTI Vanguard conference (OK I'm envious) and it looks like there are some voices being raised about the silliness of expecting people to behave like machines so machines can understand people.
Three clips.
People don't like filling out forms and entering metadata explicitly. (Or we refuse to do it, ignore it whenever possible, and do it perfunctorily the rest of the time) So, a KM system ought to mine content for metadata. (Or mine relationships, remember, Google doesn't know anything about anything, only that people think this URL is more important than that one. Content may be just a token, a beacon that identifies the location of knowledge, like a lighthouse on a rock)
Filling in metadata makes us pull back from the world, an attitude that goes against our biology. In fact, it's desire itself that draws us into the world and makes us shudder as we draw back from it. (Mhmm, looks like a very interesting conference)
The way people describe how they share information isn't how they share information. (The way people describe how they do anything, isn't how they do it, that's why usability testing doesn't include surveys, we either lie, or we don't know so we guess.)
I keep banging on about Google being a reputation tool; I want to revise that. I think PageRank measures consensus even when there is no agreement. Agreement is a negotiated property (even when we agree to disagree, we have to negotiate it), but consensus is an emergent property. Even in face to face meetings, a group of us will talk, maybe for a long time, and then someone will say "I think we have a consensus here" and then we will agree or disagree that it is so.
That's what Google does: it listens to a whole lot of people talking and then says "these people have reached a consensus about this subject", and if enough of them are included in the consensus, the URL rises to the top.
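If you want to see how little machinery that takes, here is a toy version of PageRank in Python: power iteration over an invented four-page link graph. Google's real system is vastly bigger, but the consensus-from-links idea is the same:

```python
# Toy PageRank by power iteration over a hand-made link graph.
# The graph is invented for illustration; the point is that the
# ranking emerges from who links to whom, with no negotiation anywhere.
links = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}

def pagerank(links, damping=0.85, iterations=50):
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new = {p: (1 - damping) / len(pages) for p in pages}
        for page, outlinks in links.items():
            share = rank[page] / len(outlinks)
            for target in outlinks:
                new[target] += damping * share
        rank = new
    return rank

for page, score in sorted(pagerank(links).items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

Page c rises to the top, not because anyone agreed it should, but because the links say so.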
OK, probably trivial.
September 26, 2003 in Metadata | Permalink | Comments (0) | TrackBack
September 17, 2003
The Genius of Cognitive Processing or Throw Out the Spellchecker!
I have the joy of being married to a linguist who specialises in the development of child language, so I am exposed to all sorts of fascinating, and increasingly weird, knowledge. She passed on this piece today that came from one of her students and which demonstrates the staggering subtlety and redundancy of human language processing. Imagine trying to program a computer to read this.
Aoccdrnig to a rscheearchr at an Elingsh uinervtisy, it deosn't mttaer in waht oredr the ltteers in a wrod are, the olny iprmoetnt tihng is taht frist and lsat ltteer is at the rghit pclae. The rset can be a toatl mses and you can sitll raed it wouthit porbelm. Tihs is bcuseae we do not raed ervey lteter by it slef but the wrod as a wlohe.
ceehiro
Now, that is only an artificial construct called text which is not language itself, but even though the text ought to be impossible to decipher because it breaks most of the rules we assume need to be obeyed, it is almost literally child's play.
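Producing text like that is trivial, by the way; here's a throwaway Python sketch that shuffles the inside of each word while pinning the first and last letters, which is all the trick requires. The encoder is a dozen lines; it's the decoder in our heads that nobody can write.

```python
# Scramble the interior letters of each word, keeping the first and
# last letters (and any punctuation) in place, as in the example above.
import random
import re

def scramble_word(word):
    if len(word) <= 3:
        return word
    inner = list(word[1:-1])
    random.shuffle(inner)
    return word[0] + "".join(inner) + word[-1]

def scramble(text):
    # \w+ picks out the words; punctuation and spaces pass through untouched.
    return re.sub(r"\w+", lambda m: scramble_word(m.group()), text)

print(scramble("According to a researcher at an English university..."))
```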
When it comes to deciphering speech we are even faster and more subtle, and when it comes to the meaning of the speech, the processing demands should be astronomical; except that they are not. We can carry on a perfectly communicative conversation while driving a manual car on a winding road and not skip a beat on any of the concurrent tasks, any one of which would overwhelm the fastest CPU with the biggest RAM on the planet. One day we'll figure out what is going on here; I hope I'm around to find out what it is.
September 17, 2003 in Metadata | Permalink | Comments (0) | TrackBack
September 11, 2003
More On Open Reference
Looks like Tim Langeman is on to a meme with his Openreference ideas. Tom Morris and Jean Burgess are both talking along the same lines, although not in quite as structured a way. I think what Tim is essentially talking about is some kind of web service, and it might very well be deliverable through something as simple as a bookmarklet like LibraryLookup, or through a plugin such as Awasu has for IE, where I can subscribe to a feed through a right click on the page.
Opera also allows me to highlight a word or piece of text and, using the right click, search for it, translate it, get a dictionary definition or look it up in an encyclopedia. I'm sure, as Tim suggests, Amazon et al could provide a plugin or bookmarklet that lets me highlight some text and use it to search their database. It could start by assuming the text is a name, then a title, then a keyword, and just run it through their system for me. A reputation engine could be accessed in the same way.
Tim wrote the piece in November 2000 so quite a lot of what he proposes is now done or much more possible than it was then. He also raises some specific questions that I think are nearer being settled now.
The problem of singular references

Currently, when authors create links, they must choose one page as the target of their reference. This makes links a valuable commodity and has given rise to affiliate programs. I believe that the limitation of only one site per link is a problem because, when combined with affiliate programs, commercial interests tend to dominate all other considerations.

As we get better at filters we will find that they replace links as essentially tradeable commodities, and we will get nearer to "people's ideas" as objects that we can assemble into more or less coherent universes for us to explore.
I have a bee in my bonnet about horizons that I think falls into this idea. I want to be able to start anywhere and tune my horizons for a variety of factors, such as "documents less than 3 links deep from this document", or "documents written by people who are linked to by this author and which intersect with my keyword set for this document", or "documents that intersect with my keywords written by anyone on my blogroll (and anyone they link to) with the same keywords", and on through time ("less than/more than X weeks old", "written between these dates" etc.), or geography, institution and so on. You might want to check It All Depends on the Meaning of "Interest" on that subject too.
What I want is a right-click tool that takes as my starting point whatever document I have before me right now, a text file, a PDF, a web page, an email, and then offers me a set of sliders to tune the horizons in various ways until it generates a satisfactory world of ideas and people that lets me move to the next step of the process, whatever I might be doing.
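No such tool exists that I know of, but one of those horizons is easy enough to sketch in Python: a crawl that stops N links deep from the current document and keeps only pages that intersect my keyword set. The little in-memory "web" here is pure hand-waving standing in for real fetching and parsing:

```python
# Sketch of one "horizon": documents within max_depth links of a start
# page that share at least one keyword with my set. WEB is a toy
# stand-in for actually fetching pages and extracting their links.
from collections import deque

WEB = {
    "start": ("metadata horizons filters", ["a", "b"]),
    "a": ("nothing relevant here", ["c"]),
    "b": ("more on metadata and filters", []),
    "c": ("keyword metadata three links out", []),
}

def horizon(start, keywords, max_depth=3):
    keywords = {k.lower() for k in keywords}
    seen, hits = {start}, []
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        text, outlinks = WEB[url]
        if keywords & set(text.lower().split()):
            hits.append((url, depth))
        if depth < max_depth:
            for link in outlinks:
                if link not in seen:
                    seen.add(link)
                    queue.append((link, depth + 1))
    return hits

print(horizon("start", ["metadata"]))
```

The sliders would just be the max_depth and keyword parameters, plus date and authorship tests in the same loop.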
I keep hoping that Enfish will come up with the goods, maybe it has, but it doesn't work with Eudora or Opera so tough luck for the moment.
Tim does raise the question of whether treating people as objects in the technical sense will lead to dehumanising them, and it's a fair point, but he then goes on to suggest that we might have control over references to us. Apart from the impossibility of policing that in an open system without a police force, the reality is that when we set up part of our psychic house on the net, we forego that privilege. It doesn't happen in the real world; expecting it to happen online is dreaming.
He is also prescient in his notes on freedom of speech:
Suppose that customers shopping in a future store could use a referencing system that allowed them to review companies’ labor and environmental practices. This referencing system could also display evaluations of products’ design and construction and a section of customer reviews. Should reviewers be required to obtain the permission of the companies whose products they wish to review? Most westerners would answer in the negative, citing this as a case where free speech rights or public interests overrule private interests.

Since he wrote that there have been several cases of companies, mostly software businesses, demanding the right to vet anything said about their products, and even Microsoft wanting to be able to decide whether you have written something nasty about them using their software, and forbidding it, or removing your license to use their product.
His points about authors' and readers' rights are fair, and open, and this piece brings us straight back to blogs again and the discussion being held between Tom Morris and Jean Burgess.
equality and access

All of this talk about negotiation assumes that participants are to some degree equal enough to make negotiation possible, for if either of the participants were in a totally superior position, the one could coerce the other. The question I find myself asking next is: "How equal do participants have to be?"

Equal enough to have a blog, at least. But then we start having to deal with the inherent inequalities of power laws in cyberspace, and it's time to give my brain another rest.
September 11, 2003 in Knowledge Economy, Metadata, Networks, Smart Mobs | Permalink | Comments (0) | TrackBack
September 02, 2003
Weblogging As Spinach Between the Teeth
Tim Langeman's Weblog has a piece called Fragments of Vision that chimes nicely with what I started to talk about on the A list piece.
Weblogs play into two ideas I have had for a while. One is that we really don't know what we think until we try to explain it and the other is that we don't know who we are until we go back and listen to what we have said.
On the first: I used to write a thrice-weekly piece for radio; it ran 90 seconds and had to be tight as a drum. Several times I started out on one tack and ended up taking a completely different position because, in trying to explain myself, I realised that the position was untenable. It comes as a hell of a surprise the first time you do it; after that it turns into a good way to understand things about yourself.
The second is an extension of that through time.
I've asked Typepad for a tool that lets me search the content on a keyword (which they have) and then take the results of the search and assign the posts to a new category, one that might only become evident after writing for a while. (Apart from the intensely practical idea that I could assemble stuff into a course, a diatribe, even a book if I'm lucky, simply by assembling the bon mots.)

It's what we mean by personality: the accretion over time of all your actions, statements, choices and decisions. Who you are emerges from all that stuff, and weblogs are a way to trap the bits of it long enough to make them visible to myself, and anyone else who cares to look. A bit like spinach after a meal.
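The tool I just asked Typepad for is trivial to sketch, by the way. A rough Python version, with an invented Post structure standing in for whatever they actually store:

```python
# Sketch of the tool I asked Typepad for: search all posts for a
# keyword and assign the hits to a new, after-the-fact category.
# The Post structure is invented for illustration.
from dataclasses import dataclass, field

@dataclass
class Post:
    title: str
    body: str
    categories: list = field(default_factory=list)

def recategorise(posts, keyword, new_category):
    """Tag every post mentioning the keyword with the new category."""
    for post in posts:
        if keyword.lower() in post.body.lower() and new_category not in post.categories:
            post.categories.append(new_category)
    return [p for p in posts if new_category in p.categories]
```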
Tim's point is that it takes people like David Weinberger 5 years of steady work to get where he is, exactly as it should be, but blogs allow us not only to think in public, but to mark a trail for that thought. I write something, you comment on it, someone else writes something else and tracks back to it and before long we have a conversation going.
Following the information, thinking out loud, we construct a small corner of a web that traps new participants who drop in new thoughts or links and after a while we move on. But then we come back later and find something to which we have all contributed, in which ownership is shared and distributed but which is available to us all. Damn, that's exciting stuff.
The keys are to be interested enough to want to understand something, make the effort to do that in public through what you write, and keep at it every day. Then let others decide whether you are interesting or not; but if that's the only reason you do it, stop wasting your time, it's too uncertain to be worth the effort.
Time for another lesson in blogging, methinks.
ww 0.38
September 02, 2003 in Emergence, Knowledge Economy, Metadata, Weblogs | Permalink | Comments (0) | TrackBack
Happy Birthday Species Plantarum
Nature magazine commemorates the taxonomist's taxonomist, Linnaeus, and so do taxonomists, by continuing to fight over where things belong.
...a renegade band of scientists wants to ditch Linnaeus' names for a system called the PhyloCode, which names organisms according to their evolutionary relationships. Linnaeus' scheme places organisms in groups based on shared characteristics that do not necessarily reflect their position on the evolutionary tree.

Evolutionary relationships, sounds good to me. Now, if we could get something like that started with extracting the order from the net, we might be getting somewhere. Note, I'm not talking about "organising the net", but rather letting the underlying connections create the order, then looking at it and seeing what it says.
I know, harder than it sounds.
I've been a few times to Linnaeus' home, where he sorted out the system, but only to see a friend's horse which she kept there. Swedes are so cool about their history. If you ever get a chance to visit the castle museum in Stockholm, or the Wasa, dredged up from the seabed where it sank on its maiden voyage, you'll get a whole new approach to its importance.
ww 0.98
September 02, 2003 in Metadata | Permalink | Comments (0) | TrackBack
August 31, 2003
Taxonomies - A Great Way to While Away an Evening
As someone who has appalling trouble with categories, the whole idea of the semantic web fills me with trepidation. All I can see before me is a life of endless twirling on the head of a pin, trying to figure out which particular angel should hold my next word.
Ask my wife. My wardrobe is a disaster area. I can't decide whether I should organise it by type (tee shirts, buttoned shirts), style (long sleeved or short), season, colour or use (sloppy, casual, OK for going shopping, acceptable for teaching, public speaking gear, applying for jobs, weddings/funerals etc). Worse, I just get it right, then do a load of washing and forget what the system is, or find that there are too many for the shelf, or some other mess evolves.
The excellent Bill Bryson has many fine passages in his Short History of Nearly Everything, but the pages he spends on taxonomies ring a whole peal of bells with me.
He summarises nicely with this:
Taxonomy is described sometimes as a science and sometimes as an art, but really it's a battleground. Even today there is more disorder in the system than most people realize. Take the category of the phylum, the division that describes the basic body plans of organisms. A few phyla are generally well known, such as molluscs (the home of clams and snails), arthropods (insects and crustaceans) and chordates (us and all other animals with a backbone or proto-backbone); thereafter, things move swiftly in the direction of obscurity. Among the obscure we might list gnathostomulida (marine worms), cnidaria (jellyfish, medusae, anemones and corals) and the delicate priapulida (or little 'penis worms'). Familiar or not, these are elemental divisions. Yet there is surprisingly little agreement on how many phyla there are or ought to be. Most biologists fix the total at about thirty, but some opt for a number in the low twenties while Edward O. Wilson in The Diversity of Life puts the number at a surprisingly robust eighty-nine. It depends on where you decide to make your divisions, whether you are a 'lumper' or a 'splitter' as they say in the biological world.

At the more workaday level of species, the possibilities for disagreement are even greater. Whether a species of grass should be called Aegilops incurva, Aegilops incurvata or Aegilops ovata may not be a matter that would stir many non-botanists to passion, but it can be a source of very lively heat in the right quarters. The problem is that there are five thousand species of grass and many of them look awfully alike even to people who know grass. In consequence, some species have been found and named at least twenty times, and there are hardly any, it appears, that haven't been independently identified at least twice. The two-volume Manual of the Grasses of the United States devotes two hundred closely typeset pages to sorting out all the synonymies, as the biological world refers to its inadvertent but quite common duplications. And that is just for the grasses of a single country.

Exactly. I have a niggling suspicion that if any such system were going to work, it would be the animal taxonomy from Chinese lore that Jorge Luis Borges reviews in El idioma analítico de John Wilkins (1942), which goes:
belonging to the Emperor
embalmed
tame
suckling pigs
sirens
fabulous
stray dogs
included in the present classification
frenzied
innumerable
drawn with a very fine camelhair brush
et cetera
having just broken the water pitcher
that from a long way off look like flies
If for no other reason than we might draw up and think for a moment about what this thing means, rather than just assuming that, because we know its classification, we understand it.
ww 0.38
August 31, 2003 in Metadata | Permalink | Comments (0) | TrackBack
August 09, 2003
Chaos Theory and Shakespeare - Get Me 100 Monkeys Now!
The great thing about emergent knowledge is that figuring out how to ask good questions is everything. Notice I said "good" questions, not "the right" questions. Once you get the good question, you find an answer. It doesn't mean that it's the only answer contained in that information, or the best or most useful answer, and it certainly doesn't guarantee that it's an answer you will like, but it's an answer. For example, this is an answer: New computer analyses can identify Shakespeare as well as cardiac problems.
For years, Shakespeare scholars have debated whether a strange 16th-century play known as "Edward III" actually was written by the bard of Avon himself. ... Now a team of researchers at Beth Israel Deaconess Medical Center believe they have settled the debate, as well as a broader Shakespearean controversy, with a new computer program that tracks the play's use of very common words like "the," "of" and "to."

Though they designed the program for scientific purposes like predicting heart attacks, they tested it by combing through Shakespeare's plays and comparing them to other literature of the time. In the disputed play, "Edward III," they found a very different pattern of word use from Shakespeare's own works, suggesting the play was penned by someone else. They also showed a substantial difference between the works of the bard and those of Christopher Marlowe, who some have argued was the real Shakespeare.
"This is a really nice method," said Daniel Kaplan, a specialist on information processing who is an associate professor at Macalester College in St. Paul, Minn. "They are able to clearly distinguish the Shakespeare from the non-Shakespeare."
[...]
That medical research would spawn a new method for studying Shakespeare underscores the increasing importance of computer analysis in making sense of the raw data churned out in overwhelming amounts by ever more automated and sensitive equipment. In trying to find order in the deluge, researchers are discovering that tools that help in one area, like cardiology, can often help in seemingly unrelated ones, like analyzing literary texts.
I can hear Steve Wolfram now, hopping around and saying "I told you so". He uses simple automata to generate complex structures; maybe these guys are travelling the same road in the opposite direction.
A series of heartbeats and a series of words are both "streams of information-carrying data," said Dr. Ary L. Goldberger, a member of the research team and director of the Margret and H. A. Rey Laboratory for Nonlinear Dynamics in Medicine at Beth Israel. "There is a hidden language in data sets that you can extract."
I remember reading James Gleick's Chaos with great pleasure, especially where Robert Shaw extracted information from a dripping faucet. Gleick wrote, "order was so deeply ingrained in apparent disorder that it would find a way of expressing itself even to experimenters who did not know which physical variables to measure ..."
I love this thought because it describes perfectly the bootstrapping of human knowledge, the progress from the known to the unknown which becomes in turn the new known and so on. Every end is a new beginning; a man who has been married three times can find great comfort in that. But it has also started a whole train of thought about something that matters very much to the Information Society, identity and authentication. More on that shortly.
Unlike Internet search engines, which seek to find a single piece of information in a large database (and unlike Google, which looks for pointers to information), the new technique sorts big chunks of information, like plays, into groups based on how similar they are. ... At its heart is an ingenious formula for measuring similarity. To compare two Shakespeare plays, for example, the computer constructs a list of all the words the two have in common, throwing out any word, such as "Hamlet," that appears in only one of the plays. Then the computer ranks how often each word appears in each play.
Finally, the computer generates an overall score based on how different the rankings are. ... the Beth Israel team found that they could find the signature of a writer by looking at the most common words, such as "and" or "as."
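Out of curiosity, here is roughly what that scoring step looks like in Python, reconstructed from nothing more than the article's description; the real method surely differs in its details:

```python
# Crude reconstruction of the similarity score as the article describes
# it: rank the words two texts share by frequency in each, then sum how
# far each word's rank moves between the two rankings.
from collections import Counter

def rank_words(text, vocabulary):
    counts = Counter(w for w in text.lower().split() if w in vocabulary)
    ordered = [w for w, _ in counts.most_common()]
    return {word: rank for rank, word in enumerate(ordered)}

def similarity_score(text_a, text_b):
    # Shared vocabulary only: words in one text alone (like "Hamlet") are dropped.
    shared = set(text_a.lower().split()) & set(text_b.lower().split())
    ranks_a = rank_words(text_a, shared)
    ranks_b = rank_words(text_b, shared)
    # Lower score = the two texts rank their common words more alike.
    return sum(abs(ranks_a[w] - ranks_b[w]) for w in shared)

print(similarity_score("the cat sat on the mat", "on the mat the cat lay"))
```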
[...]
Unlike many other linguistic techniques, the method works equally well regardless of the language. ... But the original inspiration for the work was the desire to improve medicine, said Goldberger, who is a cardiologist. The heart does not tick like a clock. Its beat constantly changes -- even when a person is sitting calmly -- in seemingly random ways. Goldberger and other researchers now believe that these fluctuations, ignored by traditional measures such as average heart rate, could tell an important story.
[...]
The Goldberger team did what they called a "linguistic" analysis of heartbeats. They recorded a zero if the beat was slower than the previous one, and a one if it was faster. They treated every eight beats as a "word." They then compared the frequencies of these "words" between patients, and found that patients with congestive heart failure had similar patterns.
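That encoding is simple enough to restate directly. A sketch, with invented beat intervals; again, only as faithful as the article's description:

```python
# Encode a series of inter-beat intervals as the article describes:
# 1 if a beat is faster than the one before (shorter interval),
# 0 if slower, grouped into eight-beat "words" whose frequencies
# can then be compared between patients.
from collections import Counter

def heartbeat_words(intervals, word_length=8):
    bits = ["1" if b < a else "0" for a, b in zip(intervals, intervals[1:])]
    words = ["".join(bits[i:i + word_length])
             for i in range(0, len(bits) - word_length + 1, word_length)]
    return Counter(words)

# Invented data: a longer interval means a slower beat.
print(heartbeat_words([0.80, 0.78, 0.82, 0.79, 0.81, 0.80, 0.77, 0.83, 0.80]))
```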
[...]
For Shakespeare scholars, many of whom are drawn to the field by his keen psychological insights and the magic of his words, this new statistical approach is likely to seem particularly soulless. "It is like taking all the words and throwing them in the blender," said Alan H. Nelson, a Shakespeare scholar and professor emeritus at the University of California at Berkeley.
Which is where the monkeys come in, right? And in fact it is not a blender at all; it simply asks a very good question of the text. Analysing Shakespeare based on his use of highly distinctive words has its place, but the more distinctive the words are, the less context in the language they have. The words we all use would seem to carry the structure of the language, but each of us plays a slightly different riff on that structure. Just as Molière's Monsieur Jourdain was mightily impressed to find that he spoke in actual PROSE, I love the idea that we are all jazzing language.
And one more thing for Mr. Nelson at UC Berkeley: the algorithm seems to be pretty good at identifying the writer, but it is still up to the reader to decide whether it means anything, or whether it is any good. You ain't out of a job yet.
August 09, 2003 in Metadata | Permalink | Comments (0) | TrackBack
August 08, 2003
Annotations And Metadata That Matters
I think the next revolution on the net is going to be annotation. Blogs are the beginning of that, and some of the developments with newsreaders will feed into it. For example, I can use a bookmarklet in IE (unfortunately not Opera, which is a far better browser) that launches Zempt, which puts the page title into my title, the link into the main body, and the selected text into the extended entry; then I add my bit and blog's your uncle™. As newsreaders add similar tools to integrate the reading and annotation process, the loops will be shortened, and as tools such as Trackback weave opinion into ever denser webs of connections, the density itself will become metadata.
A Bit of History
It might help to get a bit of perspective on the process, because I think it has been growing and maturing and we are just now getting to the takeoff stage. First came Third Voice with its "Web notes", which allowed visitors to post comments on a Web page. That was followed (into oblivion mostly) by NovaWiz, Hypernix, uTok and Zadu. Many of these companies took the idea further than Third Voice, with tools for ranking, chatting, and exchanging information about Web content. The goal was to enable users to build a community of Web commentary where they could share opinions and guidance on the content of Web sites. (Sound familiar?)
Interestingly, this para turned up in the original article. "The only way it can work if there's some kind of intelligent discussion going on [inside the applications]," says Dave Winer, editor of Scripting News and president of the software company UserLand . Mr. Winer tried Third Voice and didn't like it, noting that Web comments become outdated as soon as a page changes. "My bet is that it won't take off -- but then again I said that about Hotmail, so I don't have a good track record," he says.
Mhmm, and then came Radio U.
These mostly defunct tools almost had the right idea, with an early form of web service, but they were pretty clunky and they had the whole thing, as it turns out, upside down. By focusing on the web page they made the comments too disembodied and too web-centric. Yes, people had something to say, but they relied on the page itself to provide the context for the discussion; it was a disembodied, context-free kind of commentary, and that isn't what we need. Blogs, on the other hand, start by building a context; you can tell from my postings more or less who I am and what I think, then you can decide whether what I have to say is worth pursuing.
However, I think the rating and ranking idea that people like uTok came up with was a precursor to a process that is becoming more and more important. Being able to annotate the document, and organise, correlate and above all share that annotation is where this whole thing is going and right now blogs are the way to do that.
Now Some Futures
OPALES is "An Environment for Sharing Knowledge among Experts Working on Multimedia Archives". It improves video indexing by accumulating the individual efforts of users who access information, and provides a community management mechanism to let them share the knowledge. The "Institut National de l'Audiovisuel" (INA), a national video library in Paris, is developing it to increase the value of its video archive collections. The interpretation work is done by expert users, each working on their own job. OPALES has an 'authoring point of view' and a 'reading point of view', which specifies which categories of annotations a reader wants to see. New annotations always have an author and a perspective. Users export a point of view into the shared ontology so others can import it into their workspaces. A 'reading point of view' defines how a document is enhanced by annotations when presented. It is a mix of imported points of view; it doesn't exist until I set up the criteria, but then it uses other people's knowledge to provide the perspective, even though none of them have worked together to produce it.

This was a finalist in the Stockholm Challenge a couple of years ago and impressed me with its attempt to generate a virtual collaboration to improve the value and usability of the content, even when the virtual collaborators knew nothing of each other or their needs. The reading and editing "point of view" acts as a filter which should become more and more effective the more users there are on the system.
In the same vein, but in some ways less adventurous, is a new Australian tool called Annodex, from the CSIRO.
Annodex allows any section within a multimedia file to be given a descriptive tag - 'love scene', 'fight' or 'interview', for example. Tags form a stream of information that runs alongside the file, changing to keep track of it. Multiple tracks are possible to cover different perspectives on the same file. They can also follow links into or out of the file, which change to reflect what's playing at that moment. Annodex allows users to divide files into chunks - scenes of a film, for example - label them, and add links to each chunk. Sections, labels and links must currently be made by hand. The CSIRO team is working on ways to automate the process, such as with speech-recognition software.
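The underlying structure is just named tracks of timed, labelled spans. A minimal model of the idea in Python (my invention, not Annodex's actual file format):

```python
# Minimal model of the idea: named annotation tracks running alongside
# a media file, each a list of timed, labelled spans with optional links.
# This is an illustrative structure, not the real Annodex format.
from dataclasses import dataclass

@dataclass
class Clip:
    start: float      # seconds into the file
    end: float
    label: str        # e.g. "love scene", "fight", "interview"
    link: str = ""    # optional outbound link for this span

def annotations_at(tracks, time):
    """All spans active at a given moment, across every track."""
    return {name: [c for c in clips if c.start <= time < c.end]
            for name, clips in tracks.items()}

tracks = {"scenes": [Clip(0, 90, "interview"), Clip(90, 200, "fight")]}
print(annotations_at(tracks, 95))
```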
I like some of the ideas here, but it needs to catch up quickly with what is happening on the net. At the moment Annodex sticks to the "publisher" model. To quote Annodex team leader Dr Silvia Pfeiffer: "anybody who wants to publish web pages creates HTML pages. Similarly, anybody who wants to publish Annodex encoded media will be able to create the files and include the links to other web resources into them. If you have a piece of text out of a book and you want to put it on the Web with your annotations, you'll create a web page for it. Similarly, people will copy the same piece of film onto their own web pages and Annodex them with their annotations and hyperlinks. Only the owner is able to define the content. However, Annodex provides the possibility for several annotation tracks. Thus, if a website owner wants to provide a service where different people can submit their annotations to the same piece, then these annotations can be included in a separate annotation track of the same file and also made available via the Web. ... you will not be able to edit Annodex files in our browsers. However, if you download a file, and have the right tools installed on your computer, you can Annodex the downloaded file in any way you like, and you can publish it on your own website."
What is missing here is the very thing that makes blogs so valuable, the ability for others to comment, link, Trackback etc. Annodex looks great, but I wish they'd do something to make it more available to the network. But it's a start.
One of my favourite annotation and sharing ideas is Le Bilan du siècle
This mega-site on the history of Quebec in the 20th century gives access to all the items in the database. The user can select items, save them in an album, and work on them with a number of tools built into the site: statistical and graphing tools, a text tool and so on. They can then present the results in a Web page or a slide show generated by the Bilan, modify the presentation parameters and personalise them with specific tools. They can make results accessible to other members or keep them in their own Bilan files via a built-in management tool.
This is a major advance, not even so much for the technology, but for the conceptual shift they have achieved. The material they host belongs to the community, they have professional guides to tell the story of that material, but they concede and encourage others to tell their own stories with the same material, and that makes great sense to me.
August 08, 2003 in Metadata | Permalink | Comments (0) | TrackBack
August 07, 2003
It All Depends on the Meaning of "Interest"
I've just been reading Michael Clarke on Google News Alerts. After giving Google a well deserved shot across the bows for the way they have implemented the service, he delivers himself of this excellent paragraph.
It also relies on the user to be able to specify decent search strings for the stuff they’re interested in. Problem is, other than the obvious client names and industry sectors you know you want to watch, most people don’t know what they’re interested in until it pops up.
Exactly.
The meaning of the document is not contained in that document, it depends on a contextual environment that is so complex as to defy calculation and description. Which is why tools such as Technorati's cosmos are so important to making sense of the net. What we need is a service at Google, or wherever, that intersects with keywords from our blogs, keywords and links from our Blogroll as well as both in and outbound trackbacks and links. In other words, a web service that uses intersections of information the way LibraryLookup does.
Google could help by providing us with a keyword extraction from our own blogs and/or those we find interesting. Maybe it returns a page with a radio button list of keywords and blogs, we check the ones we want to include as context for this particular news alert and away it goes. Maybe they could give me the option of keeping my keyword list private or making it public the same way that Amazon lets me publish my wishlist.
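The extraction step is no mystery. A naive Python sketch: count the words across a blog's postings, drop the function words, and offer up the heavy hitters as candidate keywords; the stopword list and sample posts are purely illustrative:

```python
# Naive keyword extraction from a pile of blog posts: count the words,
# drop the common function words, return the heavy hitters as candidate
# keywords for the user to tick on or off.
from collections import Counter
import re

STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "it",
             "is", "that", "this", "for", "on", "we", "i", "be", "are"}

def extract_keywords(posts, top_n=20):
    words = Counter()
    for post in posts:
        for word in re.findall(r"[a-z']+", post.lower()):
            if word not in STOPWORDS and len(word) > 2:
                words[word] += 1
    return [word for word, _ in words.most_common(top_n)]

print(extract_keywords(["Metadata tries to assign meaning to content.",
                        "Opinion is metadata, and RSS brings it to us."],
                       top_n=5))
```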
Then (damn, where are the brakes on this thing?) my blogger uses the keyword list and the other tools like Blogroll, Blogdex and Cosmos to suggest links to related stories and postings that fall within my Horizon™.
What I keep coming back to is the Inuit family chewing the seal skin. Information is the sealskin, but we can't do all the chewing by ourselves, so we rely on others to share the work; the more tightly connected and thoroughly chewed the information is, the more supple and easily worked it becomes. That's what Google depends on, that's what we Google users depend on; let's work together on this.
August 07, 2003 in Metadata | Permalink | Comments (2) | TrackBack
August 01, 2003
Contents and Containers - A Failed Digital Experiment
This started out as an aside in the piece on the Web as hologram, but I want to take it further.
Another problem with the semantic web is that it appears to use properties which attach themselves to documents, but the processes of information management are busy detaching the content from the container.

Even the latest version of Office uses applications such as Word or Excel essentially as filters through which to view the data. You pull the data from a separate file, pass it through the application, which extracts information in certain ways and presents it according to its own capabilities. For example, if I look at a file through Word I might get a document with headings; if I look at it through PowerPoint I might get only the headings, while the text becomes the speaking notes and the handout.
I'd go further. If I want to give you a document to download I have to take all this stuff, drop it into Word, format it as a Word document, save it, upload it and link to it. I create it in Zempt, where I can save it as a draft in a .zmt file. Then, if I change either the Word document or the Zempt draft, I have to manually update the other one. Then I have to remember to upload a fresh version of both to the blog.
Here's What I Want
I want to be able to look at, and modify, the content of the native file using any application through which it can be presented. If I am working in Word and think of something I want to add or change, I just open it in Word and make the changes. Next time I'm working on the blog in Zempt, I open the file and the changes are included. When I go to save the file in either application, it reminds me that there is a copy of the file on my laptop and the blog, and asks if I want to update them.
When you read the posting, you may be doing it in Awasu or Opera, but you should also be able to do it in Word so you can print it out, or PowerPoint so you can talk about it, or any other application that meets your requirements. To expand on the idea of a hologram, we should be using applications as lenses through which we view the data. It's a perfectly acceptable model in the real world, where I can look at a mountain through a telescope, a rock through a loupe, a grain through a microscope, a molecule in a scanning electron microscope or the internal structure of its atoms through a supercollider. Big deal: same rock, different view for different purposes.
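In code terms the lens idea is almost embarrassingly simple: one canonical document, any number of views over it. A toy sketch, with an invented document structure:

```python
# Toy version of "applications as lenses": one canonical document,
# several views of it. The document structure is invented.
document = [
    ("heading", "Contents and Containers"),
    ("body", "The processes of information management are busy "
             "detaching the content from the container."),
    ("heading", "Here's What I Want"),
    ("body", "Applications as lenses through which we view the data."),
]

def outline_view(doc):
    # A PowerPoint-ish lens: headings only.
    return [text for kind, text in doc if kind == "heading"]

def full_view(doc):
    # A Word-ish lens: everything, flattened to text.
    return "\n".join(text for _, text in doc)

print(outline_view(document))
print(full_view(document))
```

Same data, two presentations, and neither application owns the file.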
It has to be one of the most idiotic failures of the digital world that such an infinitely reproducible, flexible source as a digital file should be so rigidly locked away. If there was one thing that convinced me that the digital revolution meant something, it was Negroponte's suggestion that once you have digital data, you can access it as text, as graphs, as maps, as speech or as moving pictures; it doesn't matter.
Time to unwind the failure folks.
Update
Just found this again from one of my favourite bloggers, David Weinberger. Maybe there is some sign of things getting better.
New User Experiences in which he quotes John Ko of Cincro Comms and Mena Trott of Six Apart.
RJ: We're seeing the notion of the web browser starting to dissolve. HTML can be in lots of devices, but we're even moving off of HTML. They're building self-adjusting user interfaces.
Mena: In the future there has to be more transparent use of tools. Multiple devices, not just browsers.
I've said for 5 years that the web is only a passing phase in the internet; looks like I might have been close. Exit grinning.
August 01, 2003 in Metadata | Permalink | Comments (0) | TrackBack
July 25, 2003
More Noodling on the Semantic Web
Dave Weinberger has started something here; actually, a little feedback is all it took to give me permission to dump a lot of stuff that has been sitting around for a while now. Here goes.
The semantic web breaks the internet model: it demands too high a level of precision, and it tries to wrest control from the user and place it in the hands of the model maker. Won't work.
It presumes that whoever builds the taxonomy in the first place, knows and can define to everyone’s satisfaction, what they mean by the taxonomy. Before we can come close to Tim Berners-Lee’s idea, we have to have a taxonomy of taxonomies that defines how taxonomies work and what their common terms mean and how they develop, define and disseminate their uncommon terms. Sounds to me rather like TMBW (too much bloody work) for no effective return, and don’t get me started on ontologies.
THEN they have to define the content not only in terms that make current sense, but in ways that leave open the certainty that uses will be found for the information at some future date, uses that have not even been thought of yet.
THEN we have the problem that creators of information can only vaguely tell us what that information is about. Here’s a conundrum; my wife delivered her PhD this year and one of the biggest problems is finding people to evaluate it. At that level, it is reasonable to expect that the candidate is working on something about which most of the world hasn’t a clue and which will require even highly informed people to learn significantly new material tangential to the speciality that qualifies them to evaluate the paper in the first place. No doubt we all feel that our work is seminal and suitably impressive to anyone who matters, but we are to say the least biased about its quality and totally ignorant about how others will view it or use it in the future.
This is a problem because metadata is a container. The more perfectly you define it, the narrower the container with the fewest holes. That makes it wonderful right now and a closed book in a year. The value, even the purpose of the information is defined by other people’s use of it and their opinion of it, not by the writer. Metadata cannot deal with that, nor can it anticipate its importance in the future. The Gettysburg Address includes this, “The world will little note nor long remember what we say here” – wrong, what if Lincoln had been responsible for the metadata?
THEN we have the very human problem that people lie. Metadata depends for its validity on the owner being both accurate and honest about the content and anyone who relies on that shops on the TV Shopping channel. Plenty of them, but would you depend on them?
From my bit at QuestNet:
Give Us the Tools
When Doc Searls & David Weinberger launched worldofends.com they used a telling phrase. “Take the value out of the centre and you enable an insane flowering of value among the connected end points.” Yes please, can we have that?
The watchwords are annotation, reputation and collaboration. We have to create educational value at the ends; more learning focused, useful tools that take advantage of the characteristics of connectivity and networking, and let the bandwidth demands follow.
Give us tools that manage information, not documents; they need to be reputation-aware annotation systems, with flexible tools for teachers and learners to assemble arguments and interact with each other. And I want the control of them in my hands, as easily as I control a piece of chalk and a blackboard.
Annotation is more important than metadata. There’s a lot of talk about metadata, ontologies and the semantic web. I don’t listen to much of it because it hardly matters what you think your document or resource is about, or how you think it can be used, or how good it is. What matters is what I think about it and how I use it, then what the people I respect, and the people they respect, think of it and how they used it. That’s why annotation is crucial: I need to be able to make my own annotations on a work, link them to the annotations of others, and understand how the work enhances or diminishes your reputation in my chosen field.
Tools like that, attached to everything, would be good, and ways for them to draw horizons defined as I require, and then to maintain those horizons in the way that Kazaa and other P2P file sharing systems work, are mandatory.
Google gets it. Google doesn’t understand particle physics, rare plant physiology or the life of Van Gogh, but if you ask it a question on that subject, it returns very good results. That is because Google understands the internet economy of links and opinions. That is why Google has bought Blogger, because the Blogosphere is a snake pit of densely linked opinions and it is from that very dense web that Google draws for its services. It doesn’t matter what the content of a document is, beyond some basic keywords, Google doesn’t even read the meta tags. What it does is read its relationship with other documents on the same subject and, using the economy of the net, figure out how respected that information or informant is. Then it publishes the ranking.
As Blogs experiment with Trackback and other linking concepts, the possibility is that the web will turn into an isometric network with ranking and rating built into the links themselves. At that stage the emergent intelligence of the internet will take off and Google will be there to use it.
Ontologies and taxonomies are observations that arise from thinking about language. They are so difficult because language is an abominably complex process that lugs around information, manages relationships, encodes emotions, reveals, conceals, confounds and transmits secret messages in plain text. BUT, it is so simple that a child can learn it; in fact, any language that can’t be picked up in all its complexity by a child will die. Now, if you believe, like Noam Chomsky, that we are born with an innate language module or set of processes that guides our acquisition of language, maybe the idea of extracting these into some formal structure and then trying to shove them back in makes sense. Not to this pixie.
On the other hand, if you go with Terrence Deacon in The Symbolic Species, that language and the brain have co-evolved, then formal structures are useful for figuring out what happened, but they are no use for making something happen.
Emergence, keep the eye on emergence, and give us tools to help make it easier.
July 25, 2003 in Education, Metadata, Networks | Permalink | Comments (0) | TrackBack
July 24, 2003
If Dave Weinberger is Noodling Over the Semantic Web ...
I don't feel quite so dumb. It so often seems to me to be a discussion that approaches the semantic version of angels dancing on the head of a pin. Dave is having some difficulties himself and if you can help him with an article on it, he wants to hear from you.
I'd appreciate a copy as well. I'm having real trouble figuring out why anyone would spend time trying to build a top-down, command-style tool for the web that requires people to know, deploy, and remember to use hosts of ontologies. Google doesn't know a thing about astrophysics or the painting of Van Gogh, but it still finds us the right answers most of the time.
To quote him, I think, "it is the Web's Theory of Authenticity with its corollary that Imperfection Is a Virtue. In macro you get the Messy Network Axiom with its corollary that Efficiency is the Enemy of Truth"
I have this nagging feeling that annotation, collaboration and reputation are much more important for where we are going than building ever more accurate definitions.
As someone said of learning objects: if you define them too loosely they can't be linked; if you define them too tightly, they can't be reused. That's how I feel about the semantic web; it should be an emergent property, not a set of definitions.
Or maybe I just don't get it. Oh, this is where I came in.
July 24, 2003 in Metadata, Networks | Permalink | Comments (0) | TrackBack
The Quality Fallacy
There was a nice comment from US Presidential contender Howard Dean's Internet strategist the other day. Dean has been posting on his own blog for a while and was guesting on Larry Lessig's blog for a week.
David Weinberger questioned whether it was in fact Dean doing the posting and the answer was the perfect Internet response.
In response Joe Trippi, Dean's campaign manager, wrote: “don't you think that if we were ghostwriting this stuff we would have come up with something better than that?” The Web in one line. In the micro sense it is the Web's Theory of Authenticity with its corollary that Imperfection Is a Virtue. In macro you get the Messy Network Axiom with its corollary that Efficiency is the Enemy of Truth.
The imperfection is the guarantee of authenticity, and may give a clue to why so many of us prefer Internet sources, warts and all, to the predigested media and corporate marketing that masquerades as information.
The "quality" fallacy is predicated on an idea that we will prefer a higher fidelity of reproduction to an inferior one. It was the error made by the music business in assuming that, because MP3 is an inferior technical standard to CD, the former presented no challenge to the latter. As we have seen, that is manifestly wrong because the vast majority of us need only a minimal technical standard to be satisfied on that front, from then on we want the content.
It has changed the photography business as well. A friend who is a photographer noticed years ago that giving people cheap sittings and expecting to make a profit on expensive prints was not working any more. People were buying one copy of each photo, scooting down to the copy shop and getting colour photocopies for distribution to the relatives. The argument that photocopies are inferior quality doesn't hold, because most of the people getting copies are only going to look at them a couple of times, and even those who are going to put them on the mantelpiece are more likely to be elderly relatives whose eyes can't tell the difference any more. (More to come on this on my Gig blog shortly.)
That approach has had progressively more powerful consequences for the way the photography business works under the influence of the net. There is no point having a Hasselblad when the viewing tool is a JPG file on a web page or attached to an email. Yet again the digital revolution is driving away the technological barriers to entry into the business. My friend, incidentally, has shifted his pricing model to one that charges for the one thing people can't get easily, his skill in taking good photos. He charges for the sitting and nearly gives away the prints.
Companies like Kodak looked at digital cameras and decided their relatively poor technical standards could not compete with film. Wrong again. Most of us don't have the sight to discriminate the differences, and when factors such as speed, ease and cost are included, we will trade those standards madly. Which is why Kodak is cutting up to 6,000 jobs:
Eastman Kodak Co. is cutting between 4,500 and 6,000 jobs, or up to 9 percent of its payroll, as it struggles to cope with a nearly three-year slump in film sales it blames largely on a sluggish economy and the rapid growth of filmless digital picture-taking.
The dotcom mania was an expensive side road for this technology; we are now just getting back on track and, as per the initial publicity, the Internet is changing everything.
July 24, 2003 in Media, Metadata | Permalink | Comments (0) | TrackBack
July 11, 2003
Games Encoding Reality
In the Questnet gig last week I had a section on the importance of games as educational tools, the piece is in the extended entry.
What wasn't in the presentation, but is right on the nail, is this piece about The Sims Online game, which is being adapted by its users far beyond its creators' assumptions. From Wired:

Few probably ever envisioned The Sims as a tool for serious social and personal expression. Who would have thought, for example, that abuse victims might turn to The Sims to unburden themselves of past torments? Yet Sims players are expressing themselves in that and many other ways via the game's family album feature, which was originally conceived as a way for players to photograph, collect and publicly share important moments in their Sims' lives. What no one imagined -- least of all The Sims' designers -- was that thousands of players would quickly bypass the album's intended use and instead use it to create dozens of staged snapshots, crafting what can be complex, scripted, multi-episode social commentaries, graphic novels or even movies, as it were, with the Sims starring in the lead roles.
[..]
Service, known in the Sims community as nsknight, has created several albums that are highly ranked by her peers. Among them is her six-part Vanderbilt series, which took her months to write and stage and which revolves around the story of three sisters separated by the murder of their mother.
Other users have conjured up such storylines as a young woman's drug addiction and recovery; an African-American girl's adoption by a white family; and, naturally, poor girls falling in love with rich guys. Andrea Davis, known as VioletKitty, uses the albums to build narrative Sims tutorials. "Since my Sims weren't 'acting,'" she explained, "it (is) more like reality TV."
To Wright, one of the most memorable albums told the story of a woman's abusive relationship and how she eventually got out of it. But a search on the Sims Exchange of the word "abuse" reveals that Sims albums have become a common therapeutic tool. All told, 63 albums deal with abuse issues.
It doesn't come any better than that. A game created for one purpose, with tools that can be adapted by the users to meet needs and desires never envisaged by the game's creators. That is what I mean when I talk about interactive games. Long may they thrive.
Myst and a Skateboard
Think about skateboards. Kids who wouldn't be seen dead swotting in the library, spend hours at the skateboard park. They attempt, they fail, they try again, they get bored with repeating success and push themselves to new skills, accepting failure, embarrassment and personal injury as simply a cost of learning. Surely there are people in the world smart enough to make our computer interactions as engaging, challenging and rewarding as a skateboard park.
People of all ages become addicted to games like Myst and simulations like SimCity because they engage our curiosity about how things work; the real challenge is to figure out the rules.
Games are just another word for play, which is another word for learning the rules. Games weren't invented to keep us amused; they arose from our need to learn and the processes with which we are most comfortable. Games encode rules that we can learn and master. Education and games have a lot in common. There are many fundamental parts of our knowledge and skills base that everyone needs to learn, and games could be an excellent way of doing it. Opening those games to the inputs of other real human beings instead of playing only against algorithms is a valid use of broadband; in other words, the network connects the players to each other. We need people on the other end, not a computer.
That is what the Klara Historical Laboratory does. It generates digital game-based learning to enable students to gain and use new knowledge through role-playing in virtual Stockholm buildings from 1899, working on subjects such as environmental care, social science and history. A SimCity for learning history and town planning, what about other topics?
July 11, 2003 in Education, Metadata | Permalink | Comments (0) | TrackBack
February 24, 2003
Coming Down Out of the Trees
I'm having an email discussion with a colleague who is a programmer and we are gnawing away at the problem of how we get at information which we know is there in documents but which the technology can't "see" because it is not structured in a way that is accessible to computers. Big news, OK. More from the marshes. I had suggested Ben Hammersley's Blog was on topic.
I assume you're talking about the transparent tools. People have been screaming out for tools that are easy to use and don't get in the way of them doing whatever it is they want to do. But it's really hard. The outliner example I gave before is a good instance of this. Trees are a natural way of storing information *for a computer* so we write these tools that let you arrange your ideas in a tree and expect people to somehow munge their thought processes to match. You said in one of your posts that techies write all these tools and then wave to the users to come on over from across a river that they're not interested in crossing. Too right.

It's interesting because I deal a bit with my wife's colleagues, all graduates and teachers at Uni of Syd. Not only do they not get any appropriate help in using the tools, they are thrown an email client, an Office kit and a browser and told to get on with it. So, on top of a system that doesn't think like them, they invent their own ways of doing things, all slightly different, all based on the printed output, and aren't even able to use the tools they have at hand for the things that CAN be useful. I've tried about three times to use Enfish because I think it is thinking along the right lines, but each time found that it falls down somewhere; latest was that it no longer supported Eudora, so my emails were invisible to it.
But this idea of being able to blog from your mobile phone makes me want to cringe. If the man on the Clapham bus has something to say, I'd appreciate it if he took at least a little bit of time to compose his thoughts before I spend my time reading them, instead of blurting out the first thing that comes into his head. The minute proportion of cases where this might actually be useful (e.g. the first report of Google's purchase of Pyra was done by somebody blogging at the press conference) will be far outweighed by the torrent of crap we will be subjected to. Information for information's sake is worse than useless, since it drowns out the good stuff.
Sadly, taking some time to compose thoughts (for publication) is not a skill possessed by most people, so the delay won't make any difference to the quality, although it might just reduce the quantity a bit. I suppose we are in this the victims of Moore's law which, by driving down the cost of processing/storage/transfer to trivial levels, has made the explosion possible. I think it is a transitional phase and that much of what is being proposed will have a little flash and vanish. The real benefits will be a couple of iterations down the track. Look at home videos: they are sold as "you, the next Steven Spielberg", but most of them wind up dead under the bed and unused after the first flush, as people realise that making a decent film is bloody hard. Which is why I keep harping on that the net is not a media thing, it just looks a bit like it.
On the other hand, I also think there is something very interesting about social filtering and reputation systems. If, instead of looking at information from the inside out, trying to define it into existence in a meaningful way, we look at it from the outside in and enable its users to attach, either unconsciously or deliberately, some kind of encrustation that tells other users who used this information, how many of them, where, when and how they found it, etc., we might begin to make some headway. Knowledge is a socially constructed thing, which is why I'm keen on blogs: they are the potential building blocks for a socially constructed view of the information they handle. I don't have to read all the crap on the net to find stuff that interests me, I only have to read some of it. And by doing that, and saving some of it and linking to it and sharing it and having others share their insights with me, and accreting a cosmos of shared perspectives, a filtering and validation system is constructed that benefits all its participants. And yes, that will tend to a closed loop of self-reinforcing prejudices, which is not new, but those communities that remain open to a flow of new, uncomfortable ideas will adapt and survive better than those that don't: situation normal, except that the possible spread of participants is so much broader using this technology than we could ever have contemplated without it.
None of which helps you, I'm sure, but I have this naive faith that if enough of us keep talking about this stuff in enough places, one day some clever person will sit bolt upright in the night and shout Eureka.
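What might that encrustation look like? A rough sketch, assuming a simple trail of usage records attached to a URL; the structures and the scoring rule here are my own invention, not any existing system's:

```python
# A sketch of social-filtering metadata: an "encrustation" of usage
# records attached to a document, built up as readers touch it.
# The structures and the scoring rule are illustrative assumptions only.

from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class UsageRecord:
    reader: str          # who used this information
    found_via: str       # how they found it (blogroll, search, trackback...)
    when: datetime
    saved: bool = False  # did they keep a copy?
    linked: bool = False # did they link to it from their own blog?

@dataclass
class Encrustation:
    url: str
    trail: list = field(default_factory=list)

    def record(self, rec: UsageRecord):
        self.trail.append(rec)

    def reputation(self) -> float:
        # Crude outside-in score: linking counts more than saving,
        # saving counts more than merely reading.
        return sum(1 + rec.saved + 2 * rec.linked for rec in self.trail)

doc = Encrustation("http://example.org/some-post")
doc.record(UsageRecord("alice", "blogroll", datetime(2003, 2, 20), saved=True))
doc.record(UsageRecord("bob", "trackback", datetime(2003, 2, 21), linked=True))
print(doc.reputation())  # 2 + 3 = 5
```

The point is that none of this requires the author to fill in a single metadata form; the trail accretes from what readers do anyway.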
February 24, 2003 in Metadata | Permalink | Comments (0) | TrackBack
February 22, 2003
What We Really Want in IM/KM
I'm conducting an enjoyable dialogue with the guys at Awasu, who make my newsreader, about what I really want all this new software to do. Awasu wants to be the glue between various bits of software that work together, which is understandable from their perspective, but as a user I really don't want to wind up with half a dozen more applications cluttering my drive, even if they all work together seamlessly. Well, maybe if they work seamlessly, but I want them all to arrive and install as a group, not to have to source each one separately. I don't have the time or knowledge to do that.
The posting here talks a bit about what I want from a news reader, and the discussion has thrown in other stuff as well, which must be really frustrating for developers. The truth is that I don't want to "see" the applications that make stuff work; every time I have to be aware of them, or change from one to another, is just a nuisance. However, this link to Cringely talking about castBridge is very interesting. The idea that a tool can be developed to import information into a document from wherever that information is located, provided I have permission to access it, makes sense to me.
I tried Groove Networks, but it is plainly an enterprise tool and probably runs best over a LAN or with fast connections; run over the net, even with cable/DSL at both ends, it is sluggish, and it takes control of my documents by locating them on the remote server, ensuring that, unless I am logged in to Groove, I have no way to edit with the native software or sync the documents with others that are sharing them. The Internet is about giving me control; I don't want someone taking that away to provide a minor benefit. The most important thing is that the design is not end to end. It doesn't allow the intelligence to reside on my machine or my partners' machines; it locates the intelligence in the network and forces me to open the documents across the network.
With all the work being done on metadata right now, I would have thought that all I should need to do is know what changes have been made to which document, and to be able to have my version synched by that information. Much less bandwidth, for a start. My reading of castBridge is that it does something like this, and my experience with stuff like Awasu and Blogroll and other web services like LibraryLookup tells me that it must be possible.
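A back-of-envelope sketch of that synch-by-change-metadata idea, assuming peers exchange a small ordered log of edits; the change format and functions are mine for illustration, not how castBridge or Groove actually work:

```python
# A sketch of end-to-end document sync via change metadata: instead of
# opening the document across the network, peers exchange small change
# records and apply them locally. The change format is an assumption.

changes = []  # an ordered log of edits, the only thing sent over the wire

def record_change(doc_id, version, offset, old_text, new_text):
    changes.append({
        "doc": doc_id, "version": version,
        "offset": offset, "old": old_text, "new": new_text,
    })

def apply_changes(local_text, doc_id, local_version):
    # Apply only the edits we haven't seen yet, in order.
    for c in changes:
        if c["doc"] == doc_id and c["version"] > local_version:
            assert local_text[c["offset"]:c["offset"] + len(c["old"])] == c["old"]
            local_text = (local_text[:c["offset"]] + c["new"]
                          + local_text[c["offset"] + len(c["old"]):])
            local_version = c["version"]
    return local_text, local_version

# My copy lags by one version; a few bytes of metadata bring it up to date.
record_change("report.doc", 2, 0, "Draft", "Final")
text, ver = apply_changes("Draft report on metadata", "report.doc", 1)
print(text, ver)  # "Final report on metadata" 2
```

Kilobytes of deltas instead of shipping whole documents back and forth: much less bandwidth, and the intelligence stays on my machine.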
This posting on Elearnspace is important to the discussion. It sets out some basic requirements for us users:
1. Make things possible, not necessary
2. Create choices
3. Make room for innovation
Especially I like the link to "What we really want", a PPT file that pleads for simplicity and openness. Dammit, I want to be able to create and organise and manage information and not even decide about the mode of presentation until I've finished. I want something that lets me create a document and then say: I want this as a slide show, email it as a WP document to these people, and put it on my website as well, without changing a damned thing. I want to lift the information out of its container and be able to drop it in anywhere.
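That separation of content from presentation is easy to sketch, and hard for vendors to deliver. A Python toy, where the render functions are hypothetical stand-ins for whatever tool eventually does this, not a real product's API:

```python
# A sketch of "create once, present anywhere": the document is plain
# structured content; slide show, email and web page are just renderings.
# The render functions are hypothetical stand-ins, not a real tool's API.

document = {
    "title": "What We Really Want in IM/KM",
    "points": [
        "Make things possible, not necessary",
        "Create choices",
        "Make room for innovation",
    ],
}

def as_slides(doc):
    # One title slide, then one slide per point.
    return [doc["title"]] + doc["points"]

def as_email(doc):
    body = "\n".join(f"- {p}" for p in doc["points"])
    return f"Subject: {doc['title']}\n\n{body}"

def as_html(doc):
    items = "".join(f"<li>{p}</li>" for p in doc["points"])
    return f"<h1>{doc['title']}</h1><ul>{items}</ul>"

# Same information, three containers, nothing re-typed.
print(as_slides(document))
print(as_email(document))
print(as_html(document))
```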
More: this thread has some interesting thoughts about how real people read electronic documents, and create them. You would have trouble finding a better statement of the problem than this:
A lot of developers see XML editing as filling structured containers with appropriate content, and the containers should more or less guide you as to the content. This can mean that a huge amount of detail needs to be dealt with at one pass, and it often has meant that developers create interfaces which are actually more difficult to use than paper forms. Leaving markup for later lets people focus on the information as they see it rather than forcing them from the outset into someone else's preferred boxes.
Exactly. There is more; it keeps reminding me that we have to start where people are, not beckon them from across some river that they don't find interesting to cross. This post says a lot about non-tech approaches as well. I love outlines; my wife uses them grudgingly. To me they give the structure of the document, while to her they dictate a format for meaning which she can get by using bold or larger fonts. We are both right, we just work in different ways. Software has to meet both sets of requirements, then find ways to find structure in documents by "reading them" the way people do. Parsers have to be able to see that keywords are indicated by bold or italic or underline; that if most of the text is in 12 point, the bits in 14 point are more important, and 14 point bold more important again; and that 20 point at the top of a page also has meaning. Because we will not spend our precious creative or operational time assigning superfluous meanings to information. It's superfluous because we already know what it means. Push technologies don't work and can't be trusted; that's why RSS is so attractive. More later.
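A toy version of such a parser, assuming styled runs of text as input; the input format and the thresholds are invented for illustration, not taken from any real word processor:

```python
# A toy structure-inferring parser: it "reads" formatting the way people
# do, treating bold as a keyword marker and font size as a signal of
# importance. The input format and thresholds are illustrative only.

from statistics import mode

runs = [  # (text, point_size, bold) -- as a word processor might store them
    ("Quarterly Report", 20, True),
    ("Sales were flat; the", 12, False),
    ("metadata project", 12, True),
    ("slipped again. Key risks:", 12, False),
    ("staffing", 14, True),
]

# "If most of the text is in 12 point": take the commonest size as body text.
body_size = mode(size for _, size, _ in runs)

for text, size, bold in runs:
    if size >= body_size + 6:
        role = "heading"              # big type at the top of a page
    elif size > body_size and bold:
        role = "emphasised keyword"   # 14 point bold: more important again
    elif bold:
        role = "keyword"              # bold at body size
    else:
        role = "body"
    print(f"{role:20} {text}")
```

Nobody had to assign a single explicit tag; the meaning was already sitting in the formatting people produce anyway.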
February 22, 2003 in Metadata | Permalink | Comments (0) | TrackBack
February 18, 2003
More on Google Blog
Neil Macintosh in the Guardian takes a similar view to yesterday's piece, but raises something that a couple of other commentators have mentioned.
This could create friction. Some net users already suggest Google is becoming too powerful, too much a gatekeeper to the net's riches. This news is hardly likely to allay their fears: will the dominant search engine start discriminating against Weblogs run on Moveable Type systems, or those hosted at UserLand?
Won't happen. Yes, Google will experiment with the Blogger archive, but a great deal of stuff there is already in the same category as Deja News/newsgroups: a useful archive. The real value of the Blogosphere is its near real-time ability to process information into knowledge via its reputation and commentary. By default the Blogger archive is the biggest, but by deficiency (mostly of funding; where WERE all those clever venture capitalists who could "get" Pets.com but missed blogging?) much of the Blogosphere now exists on other systems like LJ and Moveable Type. If Google wants to maintain its business, it will have to make their contents available too, or it will lose its credibility, which is founded on its impartiality. Google Gold should start generating some of the useful tools we want, preferably open source, so they can percolate through the community and make the Google reliability-and-completeness meme even stronger. I have a wish list here. Perhaps it's worth remembering too that Google has bought another archive and a publishing tool; it has not bought a publisher of any content at all. I hear Salon is going for a song, but, for perfectly good reasons, Google doesn't want it.
February 18, 2003 in Metadata | Permalink | Comments (0) | TrackBack
February 17, 2003
Google Buys Blogger - Of Course
I have no idea why Google chose Blogger rather than any one of a dozen equally or more effective blogging businesses; that is a business decision. But it is pretty clear why they would do it. The breathless story from Dan Gillmor does all the usual, but invalid, stuff about the Internet.
Google is known best for its search capabilities, but the Pyra buyout isn't the company's first foray into creating or buying Internet content. Two years ago, Google bought Deja.com, a company that had collected and continued to update Usenet newsgroups, Internet discussion forums. More recently, it created Google News, a site that gauges the collective thoughts of more than 4,000 news sites on the Net. But now, Google will surge to the forefront of what David Krane, the company's director of corporate communications, called "a global self-publishing phenomenon that connects Internet users with dynamic, diverse points of view while also enabling comment and participation."
And yadda yadda yadda. All that may be true, but I don't believe it. This is not about "content" and Google isn't interested in "content"; the news service is the result of the reliability that Google brings to other people's resources and content, and the Blogger deal is the next stage in the development of Googling the net. Think about it: Google's king hit is PageRank, which generates highly reliable sources of whatever kind of information you ask for, based on search terms generated by idiots like you and me. Bloggers have developed a whole new set of tools that is making the reliability of information much higher, and is using massively distributed human resources to find and rank that information very quickly. A story hits the net, and the Blogosphere, using newsreaders, blogs with TrackBack links and the usual power-law processes of the network, evaluates that information and weaves it into its proper place in the knowledge universe very quickly. By making blogs the preferred system for publishing information in the first place, Google will be able to make the weaving and ranking processes even more reliable, and in the otherwise untrustworthy world of the Internet, that will keep them on top.
While other search engines are still trying to figure out how to turn their spidered information into a business, Google is focusing on what really matters, and that is reliability. The Blogosphere will benefit because Google will fund the development of the tools, and they will be open source, because the more of them they have out there the more valuable they are, because, can I say it again, Google does not sell search, it sells reliability, and every blogger and surfer and webmaster in the world is contributing to that. We do not get free services from Google; we pay for them with our clicks on their linked information. Pretty soon we will also be paying with our blogging, and we will be paid back with reliable information. That's what I mean by an information economy: the information is the currency, knowledge is the payoff, and reliability is marketable to those whose reliance on it is highest.
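For anyone who hasn't seen the mechanics, here is a minimal sketch of the PageRank idea: links as votes, iterated until the scores settle. This is the textbook toy, not Google's production algorithm:

```python
# A toy PageRank: each page's authority is fed to it by the pages that
# link to it, iterated toward a fixed point. Textbook idea only,
# not Google's actual implementation.

links = {  # who links to whom, in a tiny three-blog web
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
}

damping = 0.85
rank = {page: 1 / len(links) for page in links}

for _ in range(50):  # iterate until the scores stop moving much
    new_rank = {}
    for page in links:
        inbound = sum(rank[p] / len(links[p])
                      for p in links if page in links[p])
        new_rank[page] = (1 - damping) / len(links) + damping * inbound
    rank = new_rank

for page, score in sorted(rank.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

Every blogroll link and TrackBack feeds a vote into a calculation like this one, which is exactly why bloggers matter to Google.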
I've said for a long time that Google is not a search engine. Yes, it spiders, but that's not what it does. Now I have Larry Page's word for it:
Larry Page: "It wasn't that we intended to build a search engine. We built a ranking system to deal with annotations. We wanted to annotate the web--build a system so that after you'd viewed a page you could click and see what smart comments other people had about it. But how do you decide who gets to annotate Yahoo? We needed to figure out how to choose which annotations people should look at, which meant that we needed to figure out which other sites contained comments we should classify as authoritative. Hence PageRank. Only later did we realize that PageRank was much more useful for search than for annotation..."
Here's another bit that makes so much more sense than most people get:
Information wants to be free? Copying doesn't cost anything. Distributing another copy costs basically zero. Google surveys the free part of the web.
Get it? Google surveys the free part of the web. Everyone wants to be on Google, because if you aren't in Google you don't exist. So you had better be free, and good, and referenced, and linked to the universe, or you don't exist, and if you don't exist you sure as hell don't do business. Can we call that part of the debate settled, please? To extend the discussion, have a look at SearchDay by Chris Sherman for some of the more practical, business-oriented applications of the Bloogle Empire.
February 17, 2003 in Metadata | Permalink | Comments (0) | TrackBack