    April 28, 2008

    Open Source as an Industry

    Two reports on open source as an industry came to my attention this past week, each of which is encouraging for those who believe that open source software is a good thing.

    The first, Open source: 'World's largest software company' ("The ultimate in disruptive technology" is coming up strong), is a blog post by Matthew Broersma (via Liddy Nevile). It suggests that open source, a 60 billion dollar industry, may be a small part of a trillion dollar market, but that it has a disproportionate impact on that market, as it averts (saves/costs, depending on your perspective) far more in expenditure/revenue. The post goes on to suggest that open source is moving up the food chain as well, displacing proprietary software in many instances, rather than simply providing a foundation for it.

    Lorcan Dempsey responded to this with a report from the Linux Foundation: Linux Kernel Development (April 2008) which describes, among other things, who is doing development on Linux, the mother of all open source projects.  Some interesting stat-bites:

    • the individual development community has doubled in the last three years
    • the top 30 developers have contributed 30 percent of the changes

    So, it's always a tight group that does the heavy lifting, but in a healthy community, there are many who make modest but important contributions to the overall stability and usefulness of the system.  This certainly echoes my own experience in open source metadata development - the Dublin Core.

    Open source does not mean, of course, that people don't get paid, or that commercial interests are not being served.  Among the major corporations with whom developers are associated, each of the following  has contributed 1% or more of the changes: Red Hat, Novell, IBM, Intel, Linux Foundation, SGI, MIPS Technologies, Oracle, MontaVista, Google, and  Linutronix.  Not a surprise that Microsoft isn't on the list, but where is Apple, arguably one of the big winners in the game? 

    Why do companies contribute to products over which they do not have direct control, and which do not  feed directly into their bottom line?  Because their commercial well-being is served  by a stable, vendor-neutral operating system that makes their products appealing to others.  And maybe, a little bit of stick-it-to-the-Man? Just a cynical guess.

    The growth of developers in the Linux world, if it is representative of the open source milieu in general, is quite encouraging for those who believe we're better off when infrastructural hegemony is distributed broadly, rather than concentrated in monoculture.  As Bill Joy famously said, at any given time, most of the smart people don't work for you.  This is still true, even in the age of Google.

    So, the average open source developer is not slaving away in a windowless basement; he or she has a steady paycheck, benefits, and probably represents corporate self-interest.  That's good.  Such interests reinforce sustainable development.  Having spent the majority of my time in recent months thinking about sustainable models for multi-sector data curation, the observation that collaboration among the self-interested is not only possible, but thriving, is quite encouraging.

    -----
    What I believe to be a ruffed grouse, in the Hoh Rain Forest on the Pacific coast of the Olympic Peninsula.  Wes and Christian and I were there last month, and Wes spotted him.  He was 'drumming' to attract females on an unusually dry day in one of the wettest locales in the country (and attracted us, instead).

    April 19, 2008

    Public Spaces and Libraries

    Much of our thinking in the library community these days is focused on the transition from the real to the virtual: the challenges of moldy oxidation giving way to bit rot, and the uneasy feeling that Google is making our profession less relevant. The opening keynote at the British Columbia Library Conference took a different tack.

    The theme of the meeting – creativity in libraries – started tonight with a take on creativity relating to the role of libraries in public spaces.   Fred Kent and Cynthia Nikitin of the Project for Public Spaces spoke about the importance of public spaces in general, and the role of libraries as central to those spaces.  The concept of making community hubs of libraries is not new, of course, and the principles that Fred and Cynthia espouse you’ve probably heard before… Design cities around cars, and you’ll attract more cars.  Design cities around people, and you’ll get places that people will use. 

    What was inspiring about this presentation was the collection of examples and principles tossed in a dressing of enthusiasm, humor, and passion for reclaiming public spaces.  And they told us how to do it.  If you’re part of a community planning a new library or renovations to an existing one, and you want it to work better as a public place, these folk deserve some of your attention.

    A great start to the conference.
    -----
    late summer grasses from a visit to Vancouver in 2006

    March 21, 2008

    The Magnificent Seven

    I've just come from the last of many marathon editing sessions with my DataNet proposal colleagues at the University of Washington.  A group of 7 people, most of whom scarcely knew of one another's existence a few months ago, have just concluded thousands of hours of writing, cajoling, arguing, conceptualizing, persuading, and trying (unsuccessfully) to defeat Word auto-outlining.
    Missed holidays, shared dinners over laptops, and ridiculously long days have forged an estimable team with the courage to criticize, the tentative collegiality to tease and the confidence to compromise.
    Tomorrow we will upload the last of some 200 pages of Arial 10 pt type, each of which has been combed and coaxed and coerced from ideas and colleagues and spreadsheets.  Our once and future lives.  We hope. 
    We commit our ideas to the judgments of unknown peers with a measure of trepidation, but also with considerable pride and satisfaction.  And hope, of course, that we will have the opportunity to bring our ideas to bear on the future we hope to create.
    For now, I'm simply grateful to have shared this arduous journey with splendid colleagues who care about commas.
    Oh! And Shannon… no freaking way we get that budget done without Shannon!
    -----
    My colleagues near the end of our last editing session.  Yeah, we were feeling pretty good, and not just because we were nearing the end.

    February 29, 2008

    Rumors of our death....

    Library Link of the Day offers an interesting, if superficial, photo essay on Public Library architecture in the age of Google (in Slate).  Nice eye candy for bibliophiles.   Seattle's Koolhaas is there (how could it not be?).  Salt Lake City's Library-cum-shopping mall is as well, though not Vancouver's similarly Galleria-inspired homage to the Coliseum (my image of which decorates this post).  I suspect the latter predates the former in the race to build shopping into every dimension of life.

    The Slate article references Ross Dawson's Trends in Living Networks, and in particular a post on an extinction timeline that predicts the demise of a variety of familiar elements of life:

    2009: Mending things
    2014: Getting lost
    2016: Retirement
    2019: Libraries
    2020: Copyright
    2022: Blogging, Speleeng, The Maldives
    2030: Keys
    2033: Coins
    2036: Petrol engined vehicles
    2037: Glaciers
    2038: Peace & Quiet
    2049: Physical newspapers, Google
    Beyond 2050: Uglyness, Nation States, Death

    Our predicted demise may be softened somewhat by the soon-to-follow death of copyright.  And while death makes the list, where are taxes???  Do Democrats prevail after all?  Physical newspapers last as long as Google?  And (I think) I just got the 'speleeng' joke.

    February 25, 2008

    BOOK REVIEW: The Inheritance of Loss

    Kiran Desai's Booker Prize winner, The Inheritance of Loss, is an elliptical tale looping backward and forward through the twentieth century, India, New York, Cambridge, the legacy of British imperialism, class tensions, and the age-old distrust of other.

    The story takes place in Kalimpong, a peninsular extrusion of India into the surrounds of Nepal, Tibet, and Bhutan (I don't know the boundary history of the area, but it sure looks suspicious on the map).  A retired judge, his granddaughter, her tutor, the cook, his son, and myriad supporting characters all struggle for stability and dignity in a time and place short of both.  Shifting sands of political conflicts leave everyone struggling for footing, amplifying mistrust and prejudice. Loss is the currency common to all.  Early on, we find Sai, the orphaned granddaughter and harbinger of love and hope, in the company of the embittered judge and his cook, contemplating coming of age alone:

    Could fulfillment ever be felt as deeply as loss? ...love must surely reside in the gap between desire and fulfillment, in the lack, not the contentment.

    So is the tone of the narrative established.  Even as Sai awakens to possibility, the countryside of disenfranchised awake to political discontent.  Timid belligerence erupting and met with authoritarian brutishness that, of course, spreads with the virtue of vengeance in every ethnic direction.  Gyan, Sai's Nepalese tutor, is swept into the maw of this malice, and his and Sai's chaste and budding love is set in opposition to betrayal and reluctant militancy.

    The hapless cook has only his indenture to the judge, and his son in the limbo of illegals in New York on which to hang his life's expectations (well, that and his still). The judge's stern cruelty is facade to alienation that fills the interstices of contempt for his homeland, the condescension of imperial culture, and his isolation that is their product.

    Desai's language is vivid and incisive. In a passage on the mindless escalation of violence:

    This was how history moved, the slow build, the quick burn, and in an incoherence, the leaping both backward and forward, swallowing the young into old hate.  The space between life and death, in the end, too small to measure.

    Sounds terribly familiar.  In a description of the cook, fleeing a riot:

    Clawing at his heart as if it were a door was his panic--a scrabbling rodent creature.

    And unsurprising testament to the human capacity for acceptance of anything:

    While residents were shocked by the violence, they were also often surprised by the mundaneness of it all.  Discovered the extent of perversity that the heart is capable of as they sat at home with nothing to do, and found that it was possible, faced with unimaginable evil, for a human being to grow bored, yawn, be absorbed by the problem of a missing sock....

    Desai displays a convincing understanding of the 'old hates' that make marionettes of her characters, and have them twitching at the flames of flashing insurrections. Yet, it was hard to find my way into the lives of these characters.  Desai is a clinical expositor of culture and human nature rather more than a narrator of lives that draw us in (as, for example, does Hosseini in The Kite Runner and A Thousand Splendid Suns).  There is little to be hopeful about in her descriptions, but one is left with some confidence in the whys and wherefores.

    I could find solace of hope in only two acts of free will in this book.  One, an acceptance of heart and determination of mind, and the other so daring and futile as to buttress our belief in the commonplace of courage.  Perhaps she is an optimist after all.
    -----
    public conveyance in Jaipur, 2004

    February 20, 2008

    Heroes

    Tim Bray recently posted a retrospective on 10 years of XML history, and his expository device is just right - the people.  One of those people is the late Yuri Rubinsky.  Is, is, is. Reading Tim's piece brought back the terrible sense of loss so many felt at Yuri's death.  The twinkle in Yuri's eye sparked Eric Miller and me to action that led to the Dublin Core.  He had a way of inspiring confidence and courage.  Please indulge my reissuing of what I wrote then about a personal hero.

    ------------
    A Tribute to Yuri Rubinsky
    August 2, 1952 - January 21, 1996

    The SGML community and the Web community were stunned by the news of the untimely death of Yuri Rubinsky on Sunday, January 21, 1996. Charles Goldfarb, the inventor of SGML, said of Yuri, "His life was half a life long, but it was four lives wide, and eight deep."  Indeed, he had more energy, more enthusiasm, more humor, and more compassion than most roomfuls of people.  He infected those around him with these qualities, and the result was often consensus where there had been contention, common purpose where there had been self interest.

    Yuri brought leadership of a very special character to the communities in which he worked.  He was a talented businessman who helped spark the growth of a burgeoning industry and a successful company, and he did it in a way that benefited the entire community.  Yuri was a prominent exemplar of the philosophy that success flows naturally from helping others to achieve their goals.

    One of the stories told by his coworker, Bill Clarke, was of Yuri at a presentation to investors, at a time when SoftQuad was struggling financially.  Yuri waxed eloquent and enthusiastic about a new product that SoftQuad was introducing, and he easily convinced the investors of its merits.  Pens lifted, they were ready to sign, but Yuri would have none of it...  "Wait, Wait!  That's not all!  There's more!"  He wanted more than their signatures, more than their investment; he wanted their understanding, he wanted them to know the significance of the "quiet revolution" of SGML. This was his passion... to share its power so that "what ought to be done, can be done."

    His efforts on behalf of the visually impaired are a wonderful example of his passion to do what ought to be done.  His work on behalf of the International Committee on Accessible Document Design (ICADD) has helped establish a reasonable expectation that modern computer technology can serve the sightless as well as the rest of us.  The Web has the foundations for implementation of the ICADD technology largely due to his efforts.  Many who will never know his name will be able to participate in the Web Revolution because of Yuri's tenacious promotion of these standards.  Yuri took great delight in the fact that ICADD standards made it possible for the book he co-authored with Marc Giacomelli (Christopher Columbus Answers All Charges) to be available in its braille edition prior to being available in print.

    His work in the World Wide Web community is well illustrated in his spearheading of the award given to Doug Engelbart at the Boston World Wide Web conference in December of 1995.  Yuri not only conceived the award, but funded it with a $US 10,000 contribution from SoftQuad. This in itself was an exceptional and generous act, but he went to the further (and substantial) effort of assembling (in consultation with Engelbart's daughter) selected historic writings of Engelbart's into a booklet that commemorates this early and seminal contribution to hypertext systems.  Few would have gone the extra mile that Yuri did, and as with everything he did, his energy and enthusiasm made it seem the only natural thing to have done.

    It is disheartening to consider the enormity of the loss of Yuri's leadership in the text markup and the Web communities.  There is no one else like him.  For those fortunate enough to have shared his companionship, there is, as well, the heavy sadness of the loss of a friend and compatriot.   One of Yuri's close friends, Quentin Yardley, said of Yuri, "He could turn a walk into a parade."  For those who had the good fortune to walk in his parades, the cadence and music of his life will not fade...

                "Wait, Wait!  That's not all!  There's more!"

    Stuart Weibel
    January 29, 1996

    -----

    The image of Yuri in this post is not mine - I scanned it those many years ago from a pamphlet... possibly from his funeral... I can't recall.  Many other eloquent tributes to this amazing person are available at http://xml.coverpages.org//yuriMemColl.html

    February 19, 2008

    Uncoupling identification and resolution

    Conflating identity and resolution of Web resources is often useful... it is usually the right thing to do. But I've written in the past about the need, on occasion, to uncouple these fundamental functions.

    There is a fairly long standing and often vitriolic debate among Web technologists about whether there should be a component of web architecture that does this: identifiers that simply identify, and carry no implication of resolution.  The Just-Use-HTTP camp insists that there is no place in the naming architecture of the Web for identifiers not grounded in the HTTP protocol, even when resolution is not intended.

    Others of us have argued that persistent identifiers without a direct resolution mechanism are useful and desirable.  DOIs are a purpose-built example of this, to support the management of commercial publishing assets.  INFO URIs are intended to meet the need in other niches.  URNs were the earliest effort in this direction, though they have not been wildly successful.

    Thus, it was interesting and ironic to see a post on a W3C Team blog about  excessive traffic (100 million hits a day!) resulting from the static HTTP identifiers associated with DTDs (document type definitions) hosted by the W3C.  It is important to declare, maintain, and serve such resources, but they are not intended for routine retrieval by applications or users. Instead, such structural declarations are meant to be parsed by applications that intend to process data according to a set of declarations in the DTD, or more often, simply to confirm... 'yep... this is a document of a type known to me.'

    The use of HTTP identifiers (URLs) is an implied promise... a label that says 'there's something here for you to retrieve'.  Yes, I know about HTTP headers.  I understand that an application written to the latest protocols will understand the return codes and should take intelligent action based on those codes.  But it is laughable to expect all web applications to be well behaved in this way.  The blog post speaks of applications "creating a Distributed Denial of Service (DDoS) attack against W3C" and "abusive request patterns".  In fact, the root cause of the very real dilemma faced by the W3C and others with this problem is the ideological opposition to an alternative solution: Identifiers uncoupled from resolution. 
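
    To make the distinction concrete, here is a minimal sketch of an application that treats a DTD's HTTP identifier purely as a name: it looks the identifier up in a local table and never dereferences it over the network.  The class and method names are hypothetical, and the stored DTD text is a placeholder rather than the real XHTML DTD; only the XHTML 1.0 Strict system identifier is the real one.

```python
from typing import Dict, Optional

# The real system identifier for XHTML 1.0 Strict, used here only as a name.
XHTML_STRICT = "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"


class DtdCatalog:
    """Hypothetical local catalog: identifier -> locally held DTD text.

    Nothing in this class opens a network connection. The identifier
    is compared as a string and is never resolved as a URL.
    """

    def __init__(self) -> None:
        self._entries: Dict[str, bytes] = {}

    def register(self, identifier: str, dtd_text: bytes) -> None:
        """Record a local copy of the declarations named by identifier."""
        self._entries[identifier] = dtd_text

    def is_known(self, identifier: str) -> bool:
        """Answer 'is this a document type known to me?' by name alone."""
        return identifier in self._entries

    def load(self, identifier: str) -> Optional[bytes]:
        """Return the local copy, or None if the type is unknown."""
        return self._entries.get(identifier)


catalog = DtdCatalog()
# Placeholder content, not the actual DTD.
catalog.register(XHTML_STRICT, b"<!-- local copy of the declarations -->")

if catalog.is_known(XHTML_STRICT):
    declarations = catalog.load(XHTML_STRICT)
```

    An application built this way gets the 'yep... known to me' answer without a single request reaching the identifier's host, which is the behavior an identifier uncoupled from resolution would encourage by default.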
    -----
    The Yarra riverfront in downtown Melbourne

    February 18, 2008

    Metadata: Semantics; Structure; Syntax

    Peter Murray, aka the Disruptive Library Technology Jester, posted an encapsulated history of the origins of the Dublin Core, and observed that he still is

    trying to reconcile what differences exist between RDF and the DCAM based on these postings and comments from Stu’s blog.

    I'm glad that people are engaged in trying to sort this out, even as I'm unhappy that it's still unclear at this late date.  That it still IS unclear is incontrovertible (look at the caliber of people trying!).  I'm not very confident at this point that I can wash away the confusion, but it does seem potentially useful to reprise a part of my metadata talk that I used to give a lot.

    Sharing metadata requires agreements on three topics:

    1. Semantics: what is the meaning we are trying to convey in metadata assertions?  Meaning, of course, resides in the minds of people, not machines.  The focus of the Dublin Core effort has been to promote those shared meanings... and make them sharable.  The semantics bit is about agreeing about elements: author, publisher, date, etc.
    2. Syntax: how do you take a set of metadata assertions and pack them so that one machine can send them to another, where they can be unpacked and parsed by machine logic or displayed and read by a person, with high probability that the meaning of the assertions travels unchanged from one mind to another? RDF documents refer to serialization... the order of bits in a stream... actually putting the stuff 'on the wire.' (The careful readers and jaded among you may wonder why I changed the order of exposition from the title of this post.  Best for last? no... hardest.)
    3. Structure: You can't do syntax reliably unless you have unambiguous structure.  The sorts of things you have to specify in a well-structured metadata assertion (not an exhaustive list):
    • The boundaries of a set of assertions (what constitutes a record)
    • Cardinality - Can an element be repeated, and if so, is there a limit on the number?
    • How is a name structured? What is the delimiter separating elements of a compound name (Prince and Bono excepted, most names are compound structures, many with surprising and confounding complexity).
    • How is nesting managed?
    • How are dates encoded? YYYY-MM-DD? DD-MM-YYYY? MM-DD-YYYY?
    • How does one identify an encoding scheme that specifies the above question?
    • How does one identify a value encoding scheme (e.g. LCSH, MeSH, Dewey) from which metadata values can be chosen?  Are such schemes required or optional?
    • Are metadata values specified by reference (URI) or by value (literal strings)?

    Most of these issues are not addressed in RDF. They can be, of course... but without agreements about how to do so, people tend to do them this way and that, leaving us without the ability to share data effectively.  This is where the Dublin Core Abstract Model (DCAM) comes in, as it specifies how to structure these sorts of things in a way that makes the data sharable.
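
    As a rough illustration of the kinds of structural agreements listed above, here is a small sketch that pins down a few of them in code: record boundaries, cardinality, a date encoding, whether an encoding scheme is required, and values given by reference versus by literal.  Every name in it (Value, ElementRule, Record, the "YYYY-MM-DD" scheme label, the example.org URI) is hypothetical.  It is not DCAM and not an RDF API, just one way such agreements might be written down.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional

DATE_SCHEME = "YYYY-MM-DD"  # hypothetical label for the agreed date encoding


@dataclass(frozen=True)
class Value:
    """A metadata value given by literal string or by reference (URI)."""
    literal: Optional[str] = None
    uri: Optional[str] = None
    scheme: Optional[str] = None  # e.g. "LCSH", "MeSH", or DATE_SCHEME

    def __post_init__(self) -> None:
        # Exactly one of literal / uri must be supplied.
        if (self.literal is None) == (self.uri is None):
            raise ValueError("give exactly one of literal or uri")


@dataclass
class ElementRule:
    """Structural agreement for one element."""
    max_occurs: Optional[int] = None  # None means the element may repeat freely
    scheme_required: bool = False     # must the value name an encoding scheme?


@dataclass
class Record:
    """The boundary of a set of assertions: one record, one set of rules."""
    rules: Dict[str, ElementRule]
    statements: Dict[str, List[Value]] = field(default_factory=dict)

    def add(self, element: str, value: Value) -> None:
        rule = self.rules.get(element)
        if rule is None:
            raise KeyError(f"element not in the agreed set: {element}")
        existing = self.statements.setdefault(element, [])
        if rule.max_occurs is not None and len(existing) >= rule.max_occurs:
            raise ValueError(f"{element} may occur at most {rule.max_occurs} time(s)")
        if rule.scheme_required and value.scheme is None:
            raise ValueError(f"{element} requires an encoding scheme")
        if value.scheme == DATE_SCHEME and value.literal is not None:
            # Raises ValueError if the literal is not year-month-day.
            datetime.strptime(value.literal, "%Y-%m-%d")
        existing.append(value)


rules = {
    "creator": ElementRule(),  # repeatable
    "date": ElementRule(max_occurs=1, scheme_required=True),
    "subject": ElementRule(scheme_required=True),
}
record = Record(rules)
record.add("creator", Value(literal="Weibel, Stuart"))
record.add("date", Value(literal="2008-02-18", scheme=DATE_SCHEME))
record.add("subject", Value(uri="http://example.org/subjects/metadata", scheme="LCSH"))
```

    None of these checks says anything about how the record is serialized.  That is the point: the structure can be agreed once and then carried in whatever syntax is convenient.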

    Is it perfect and generalizable?  No... its authors, in comments on my posts, have made evident that they make no such claim. Is it the best that is available for descriptive metadata?  I assert that it is, and that efforts to work towards an Uber-Metadata-Model should start with this effort and simplify or complexify as is necessary and sufficient to assure that metadata  can be shared across communities.

    One last point.  DCAM is articulated in the vernacular of RDF, but the structure that it creates is independent of RDF.  If RDF passes into the graveyard  of once-or-never-mighty technologies, the abstractions it (DCAM) declares survive quite nicely.  Syntax independence: a goal we strove for from day 1 of the first DC metadata workshop.  It is a worthy metadata engineering principle.

    To sum up: Defining semantics is a political process of reaching consensus.  Syntax is arranging the bits reliably so they travel comfortably between computers (RDF is a fine way to do this, but by no means the only way), and structure is the specification of the details necessary to lay out and declare metadata assertions so they can be embedded unambiguously in a syntax.  A data model is the specification of this structure.
    -----
    I was influenced to include semicolons in the title of this post by an article in today's NYTs, forwarded to me by Marguerite.  I LIKE semicolons, even if they are stodgy.
    -----
    Wary Ibises (or something like them) in Barwon Heads, Australia

    RESTful Repositories?

    Andy Powell has made notable contributions to best practices concerning persistent identifiers for quite a long time.  I have always found his recommendations practical and free of ego.  I almost said free of ideology, but of course we all suffuse our musings with the aggregate of experience and beliefs which round to ideology.  The ideology of others tends to disappear to the extent that it matches our own.  Protective coloring rears its barely-discernible head.

    Andy's opening keynote at the VALA2008 conference in Melbourne a fortnight ago exposed a theme of his ideology which serves the community well in the domain of persistent identifiers, and which he brings to bear on the evolution of repositories.  I paraphrase:

    Deviation from mainstream Web idioms reduces uptake and quenches the natural interconnectivity which underlies the richness of the Web.

    Andy's own summary of the issues includes the following:

    our current preoccupation with the building and filling of 'repositories' (particularly 'institutional repositories') rather than the act of surfacing scholarly material on the Web means that we are focusing on the means rather than the end (open access). Worse, we are doing so using language that is not intuitive to the very scholars whose practice we want to influence.

    One way to think about repositories is as the bookshelves of the digital library.  They are designed to impose order and facilitate management of content.  We don't ask scholars, having just published an article or book, to 'go to the library to find the most appropriate place for it... and don't come back until you do!'  Not a perfect analogy, but it speaks to the issue of mandating overhead to authors in order that their work is fixed in the scaffold of their discipline's knowledge stores.  Still, we have bookshelves for a reason, and something like them is necessary to support the management of digital assets as well.  It is hard enough to look after digital resources in a persistent way.  Current repository technology is not yet mature, for sure, but it isn't the case that we don't need what it is trying to deliver. (I don't think Andy would disagree with this).

    Andy goes on to say:

    our focus on the 'institution' as the home of repository services is not aligned with the social networks used by scholars...  As a result, we resort to mandates and other forms of coercion in recognition that we have not, so far, built services that people actually want to use. We have promoted the needs of institutions over the needs of individuals.

    Well, yes, but it isn't as though we don't see this all the time.  It is a rare case when the institutions I have administrative dealings with tailor their procedures and requirements to my needs.  Instead, procedures are designed to increase management efficiency, often at the time-expense of individuals.  I myself have been known to whine about just such impositions (duh), but presumably, the gains in efficiencies of such requirements redound to the general benefit of all.  That's the theory, anyway.  Sometimes it's even true.

    The question of where the natural home for repository functionality might be is tricky.  Lorcan Dempsey refers to institutional reputation management -- a natural and important piece of the puzzle.  Publishers are loath to lose control of the content, but their time is passing... open access is simply too compelling a juggernaut to be resisted.  OA is a when question, not an if question. Professional societies want to play, and some of them sit, somewhat uncomfortably, astride the roles of domain advocacy and commercial publishing. Witness the American Chemical Society doing the splits as the open data boat slowly slides away from the commercial asset management pier.

    It is still possible that another entirely different model will emerge... more in-the-cloud.  A distributed model does seem to complicate curation (and that institutional reputation thing), but I wouldn't count it out just yet.  Still, some institution has to take care of this stuff... responsibility involves the attachment to artifacts, even if they are bitstreams.

    Andy goes on to appeal for more RESTful architectural design, and in this I think he is dead on the mark:

    the 'service oriented' approaches that we have tended to adopt in standards like the OAI-PMH, SRW/SRU and OpenURL sit uncomfortably with the 'resource oriented' approach of the Web architecture and the Semantic Web.  We need to recognise the importance of REST as an architectural style and adopt a 'resource oriented' approach at the technical level when building services.

    There are some details of Andy's perspective that I'm happy to contend, but as usual he forces our attention back to design for the Web, of the Web, by the Web.  Sounds pretty much right to me. An excellent keynote to start off VALA 2008.

    Post Script: Speaking of architectures, I see that Roy Fielding, the prime progenitor of all things RESTful, has tired of endlessly explaining the same things on list-after-list, and has started a blog.  This is welcome news indeed.  And you gotta love his colorful domain name.
    -----

    Public sculpture along the Southbank Promenade of the Yarra River in Melbourne

     

    February 17, 2008

    List Making Meets Redirection

    My Belltown blogging and photography friend, Bruce Moore, passed along this hot tip of a new service that integrates the virtues and vices of redirection (à la PURLs and TinyURLs) with list making.

    Submit a list of URLs and it creates a short URL that takes the client to a page with the list of the links (including an option to open them all).

    Good for twittering a collection of links or sharing or 'naming' a list of web-addressable resources.

    There is apparently a Firefox extension that will create a list of your open tabs, and an API (at this time, according to the website, the twitter bot is still in development).

    Spam-redirects are not a problem (at least an automatic problem), as you are redirected to a list of links, rather than to the links themselves (though, that open all links button could bite).
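
    For the curious, the idea is simple enough to sketch in a few lines.  This is a toy, in-memory version under my own assumptions: the class name, the base-36 keys, and the example.org address are all made up, and it says nothing about how the actual service is built.  Note that resolving a short URL hands back the list itself rather than redirecting to any one link.

```python
import itertools
import string
from typing import Dict, Iterable, List

ALPHABET = string.digits + string.ascii_lowercase  # base-36 digits


def encode(n: int) -> str:
    """Turn a non-negative counter value into a short base-36 key."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, remainder = divmod(n, len(ALPHABET))
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


class LinkBundles:
    """Hypothetical in-memory store: short URL -> list of links."""

    def __init__(self, base: str = "http://example.org/") -> None:
        self._base = base
        self._bundles: Dict[str, List[str]] = {}
        self._counter = itertools.count(1)

    def create(self, urls: Iterable[str]) -> str:
        """Store a list of URLs and return the short URL naming it."""
        key = encode(next(self._counter))
        self._bundles[key] = list(urls)
        return self._base + key

    def resolve(self, short_url: str) -> List[str]:
        """Return the list of links; the client is never redirected."""
        if not short_url.startswith(self._base):
            raise KeyError(short_url)
        return list(self._bundles[short_url[len(self._base):]])


service = LinkBundles()
short = service.create([
    "http://example.org/blog",
    "http://example.org/photos",
])
links = service.resolve(short)
```

    Nothing here addresses persistence, of course: the table lives only as long as the process does.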

    One hears more and more about personal online identity management.  Interesting possibilities with this approach. The collection of links representing my current web presence:

    http://linkbun.ch/0nw

    No mention of persistence... how could there be?  I wouldn't bundle up all your URLs behind this service just yet. In any case, the idea is intriguing and potentially useful. I wonder if Eric Miller is listening....
    -----
    A gull at Torquay Beach, south of Melbourne (2008-02)