    March 03, 2009

    Are Data Repositories the New Institutional Repositories?


    Having spent the better part of 18 months in the hunt for the DataNet grail, I read with interest Dorothea Salo's Caveat Lector rebuke of Cliff Lynch's remarks about the library's stake in data curation.  Her angst is worth heeding.   The community's first-generation institutional repository experience has been desultory at best, resulting in modest progress and (at least in Dorothea's view) loss of credibility with those who dispense local resources - administrators.

    Repositories are fundamental components of the digital library -- the 'shelves' and the 'catalog' rolled into software, and it is perhaps understatement to say we haven't gotten this quite right.  There are design issues, usability issues, turf to fight over.  Dorothea's comments suggest the major problem, however, may be one of social engineering:

    Libraries have already been sold one pig in a poke—that was the “faculty want someplace to put their papers” pig. I, as a repository-rat, have zero credibility left. Seriously, none.

    Institutional repositories did not, as far as I can tell, emerge from faculty needs.  Rather, they were born of issues of ideology (Open Access), technology (how do we manage our electronic collections), and institutional reputation (our reputation will increase in proportion to our cache of IP).  All eminently reasonable motivations, but the needs of faculty scarcely entered into it.  A big part of the problem.

    And now comes Big Data.  At least institutional repositories contain human-readable artifacts.  It's not so much of a stretch, then, to imagine that our 4th-generation Kindle-like devices may be calling home to one or many IRs.  Data, on the other hand, is not nearly so warm and fuzzy, needing agreements about structure, rendering, analytic methods and more.  What are the use- and reuse-cases? Are there tractable business models?

    Enter the DataNet solicitation.  NSF will award five $20 million USD grants to data curation teams charged with protecting research investments while developing sustainable business models that increase the efficacy of science through reuse and repurposing of data.  This is a huge task, with conflicting goals, uncertain methodologies, and unresolved incentive structures.  One might forgive the skepticism of the Dorotheas of the community.  One of the linchpins of success will be the ability to make faculty's lives easier while serving the larger technological and economic needs.

    Among the compelling common threads concerning institutional repositories and data repositories is that the learning and research communities of the future must include the capabilities of both in one form or another, or risk wholesale losses of digital data and the investments they represent.  However badly we've done them in the past, we must redo them until we get them right.  They will be basic enabling infrastructure for our communities, supporting not only institutions, disciplines, and faculty, but the very fabric of innovation upon which we rely for our prosperity. 

    The social engineering of incentives and services will be as critical to success as the business models and cost structures. Dorothea suggests:

    Wait until the NSF and NIH and Mellon get serious. Then go to libraries. Or better yet, tell faculty to go to their libraries; faculty asking for help have more credibility with library administrators than I will build up in my entire library career.

    And seriousness here means systematic commitments of funders to sustainability, not just grant programs.  System designers, on the other hand, must develop practical systems that assure researchers better access to data without compromising resources for innovation or ensnaring them in time- and soul-destroying submission procedures -- and with suitable professional incentives for participation.

    But can we wait for all that, as a profession?  I don't think we can.  Data curation curricula are emerging in the iSchools (UIUC and UNC both have data curation activities of note).  The DataNet Federation will provide important pointers to necessary services (and the skill sets that our community must develop and nurture to be useful), and of course we have provided strong leadership in the development of preservation standards such as PREMIS and OAIS that are key pieces of the curation puzzle. Jane Greenberg, of UNC, and I have recently launched a Dublin Core community to look at metadata and scientific datasets, another important piece of the puzzle.

    The DataNet solicitation proposes:

    ...new types of organizations ...[that]... will integrate library and archival sciences, cyberinfrastructure, computer and information sciences, and domain science expertise to:

    • provide reliable digital preservation, access, integration, and analysis capabilities for science and/or engineering data over a decades-long timeline;
    • continuously anticipate and adapt to changes in technologies and in user needs and expectations;
    • engage at the frontiers of computer and information science and cyberinfrastructure with research and development to drive the leading edge forward; and
    • serve as component elements of an interoperable data preservation and access network.

    Hardly a challenge we can do alone... or afford to turn away from.

    _____

    Seattle at night, from the perch of Kerry Park on Queen Anne hill,  February, 2009

    February 16, 2009

    Canonical Identifiers and the Link Tag

    One of the grand challenges on the Web is canonical identification. It's easy enough to launch resources into the aethers.  If the organization is naturally oriented towards persistence, even that, in general, is tractable.  One's constituents tend to howl when you don't keep this promise.

    But choosing a good identifier and making it stick is pretty hard.  Identifier functional characteristics  often conflict with one another, and the identifiers favored by business purposes (which pretty much always carry the day) aren't necessarily optimal from a perspective of long term persistence and management. 

    One aspect of the problem just got easier.  Google, Yahoo, and Microsoft have agreed on a convention for identifying the canonical identifier for a given resource that may be rendered under different transactional URLs. 

    It is a simple, elegant solution that requires no new tags... rather, a new value (rel="canonical") for the rel attribute of the link tag that has been around since the beginning.  No new technology: just add consensus.  Search engine optimization (SEO) just got easier.
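
    To make the convention concrete, here is a minimal sketch of how a crawler or link checker might read the canonical hint from a page.  It uses only Python's standard html.parser module.  The sample page, the example.com URL, and the names CanonicalLinkParser and find_canonical are illustrative assumptions, not anything specified by the search engines.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalLinkParser(HTMLParser):
    """Records the href of the first <link rel="canonical"> element seen."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "link" or self.canonical is not None:
            return
        attr_map = {name: (value or "") for name, value in attrs}
        # rel holds a space-separated list of link types; compare case-insensitively.
        rel_values = attr_map.get("rel", "").lower().split()
        if "canonical" in rel_values and attr_map.get("href"):
            self.canonical = attr_map["href"]


def find_canonical(html: str) -> Optional[str]:
    """Return the canonical URL declared in the markup, or None if absent."""
    parser = CanonicalLinkParser()
    parser.feed(html)
    return parser.canonical


# Hypothetical page served under some transactional URL (session IDs, sort order, etc.).
PAGE = """
<html><head>
<title>Widget</title>
<link rel="stylesheet" href="/site.css">
<link rel="canonical" href="http://www.example.com/product?item=widget">
</head><body></body></html>
"""

if __name__ == "__main__":
    print(find_canonical(PAGE))
```

    The point of the sketch is how little machinery is involved: the page itself declares which URL it prefers, and any consumer that already parses the head of the document can honor it.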

    This is a step forward in making the Web a more coherent place.  See, among others, Mark8t.com blog
    -----
    The Buddy System: our dishwasher has a pre-wash setting with marginal carbon footprint of zero.  Other sorts of footprints can be problematic.

    January 13, 2009

    Generatives

    Kevin Kelly's latest manifesto: Better than Free (via Helene Blowers's Twitter feed, hblowers) is about what he describes as generative characteristics of otherwise-free resources... traits you can monetize in the interest of supporting free services that people want, but won't click through currency to acquire on their own.
    His metaphor of the internet as copy machine is an interesting take, and leads to

    roughly eight categories of intangible value that 
    we buy when we pay for something that could be free.
    1. Immediacy
    2. Personalization
    3. Interpretation
    4. Authenticity
    5. Accessibility
    6. Embodiment
    7. Patronage
    8. Findability

    The value-add in these generatives is in most cases self-evident, but he provides lucid amplifications of each.  Twenty years ago libraries could be thought of as the preeminent purveyors of at least half of these resource traits in the information value chain.  In the value-web we live in today, there is lots of competition, but by no means a monopoly (Google and Amazon notwithstanding).  This is a good thought-piece to factor into our continuing effort to reinvent our profession.
    -----
    Image: The graves (or at least markers) of those who fell at the side of George Armstrong Custer are no longer the only totems of the Little Bighorn monument.  Inclusion is good.

    Mission versus Task


    Today's Library Link of the Day points to a terrific audio entry from the Chronicle of Higher Education TechTherapy feature about libraries and IT departments. I confess I'm not a fan of podcasts... reading is faster, and unless production qualities are very high, it just isn't compelling for me.  This discussion between Scott Carlson and Warren Arbogast, the Chronicle's TechTherapy hosts, is 13 minutes worth listening to.

    The discussion highlighted several differences, and even more similarities between the two groups. What caught my attention in this Mars-versus-Venus discussion is the Mission versus Task orientations of the respective communities.  Librarians are motivated by the mission of providing access, whereas IT departments are driven by the mission and the tasks of others... the academy (and administration). Librarians often (not always) have faculty status and responsibilities, whereas IT workers are typically staff.

    The similarities between the groups outnumber the differences, though, and these similarities are probably the greater source of tension between the groups.  According to the dialog:

    • the people skills of both groups are often perceived to be broken
    • both groups work in a changing work environment characterized by uncertainty about their future roles
    • resistance to change is widespread (sorry folks, but isn't this true for most everyone?)
    • both communities are threatened by commoditized web services
    • distractions of turf wars often impair effectiveness
    • service flailing -- rapid deployment of services to see what sticks -- is common (not necessarily a bad thing in an environment of fast-paced change)
    • second-class consciousness - librarians are not 'real' faculty... IT staff are just techies....

    The podcast asks the question... why is there such a rift?  Sibling rivalry is probably the easiest characterization.  Sibling conflict has more to do with similarities than differences in my substantial experience (youngest of 4 strong-willed siblings).  And the fight for the affections (resources) of the 'parent'.

    Much of my time is currently engaged in response to the NSF DataNet solicitation, which characterizes the future of data curation as an activity that will be managed by organizations that do not currently exist, and for which the library is the organizational metaphor and the machine room is the operational model.  An amalgamation of precisely these two cultures, brought together in a marriage of necessity.  Success will depend on a happy marriage of mission and tasks.
    -----
    Latte Rorschach at the NaNung Cafe in the University District (December, 2008)

    December 18, 2008

    Blue Ribbon Task Force on Digital Preservation - 1

    The NSF Blue Ribbon Task Force on Digital Preservation has released its interim report, which sets the conceptual framework for a second report in about a year.  That report will complete the NSF brief, with recommendations and strategies for an overall approach to preserving public digital information.  This initial report is a fine piece of work that will inform the national agenda on digital preservation for the foreseeable future.  It synthesizes a broad spectrum of testimony by practitioners and embeds that testimony within principles of sustainability and business models that are the central core of the problem.  And there is a lot of data, as well... tables summarizing notable preservation activities in many countries that are navigating uncharted waters.

    In 2007 we passed a major inflection point.  According to IDC analysts, quoted in the report (page 10),

    the quantity of digital information created, captured, or replicated exceeded available storage capacity.

    And the discrepancy is projected to grow, leaving half of all digital information without a permanent home by 2011 [seen through the perspective of someone who has jettisoned a third of his life ‘cargo’ in the past year, this is not an unalloyed tragedy]. The question becomes one of selection and election… How do we choose what to save, and how do we pay for it?

    The features of the vulnerable datasets give texture to what otherwise might easily be a dull abstraction (however important).   The task force entertained testimony from managers of data with wildly different characteristics and value propositions, including:
    •    Scientific data sets, including the Protein Data Bank and the ICPSR data repository
    •    Commercial endeavors  (Boeing, RealNetworks, Microsoft)
    •    Libraries
    •    Think tanks and educational organizations (IDC, the Wharton School of Business)
    •    Public Television and the Academy of Motion Picture Arts and Sciences
    •    The Internet Archive, Portico, and LOCKSS

    To bring the issue home (literally)… do you know who is charged with the responsibility for looking after all those great public television shows you grew up with?  No one… not stations, not the creators of the work, not the Corporation for Public Broadcasting, not PBS… no one.   In fact, the pervading climate is one of resistance to preservation, or at least to the expenditure of scarce resources on the 'then' when the 'now' is hard enough to pay for.  At least for some of these materials, there are resale markets (be part of the solution… go buy some DVDs from your local PBS station!).  The rest is teetering on the edge of digital oblivion… or already gone.  And this is the stuff that has presence in the public mind.

    The problem gets harder for more obscure data (that is, almost everything else). The 'then versus now' issue is core.  Though a portion of this report speaks of the tension between 'dividing the pie' to include preservation (and doing less of something else), versus 'growing the pie', this seems a hollow distinction.  Make the pie bigger, and you still have an allocation problem.  The question will always be what part of the budget must we allocate for preservation, and, more to the point, what portion of current production must be forgone in order to do so?

    Beyond the cost issues there lie systemic organizational and social issues which will be difficult to solve, paraphrased here from the executive summary:
    •    Funding models which do not match long term needs
    •    Unclear preservation responsibilities among various stakeholders
    •    Inadequate incentives for collaboration among the stakeholders
    •    Complacency – we shouldn’t feel desperate yet
    •    Fear – we feel too desperate – there is no solution!

    This report is a lucid characterization of the problem, and documents the current state of affairs well.  It deserves careful study (my own efforts in that direction will be noted in additional posts here).  The next part – actionable strategies and recommendations -- is going to be far harder still.

    Congratulations to the task force on their achievements to date.
    ---------------

    I've lost the view that afforded images of The Mountain like this for the paltry cost of turning my head as I awoke in the morning.  I wonder if I'll ever have it again?  Seattle is a great city... even (especially?) kneeling in the snow.

    December 15, 2008

    Promises, promises

    The MacArthur Foundation, among whose fundable ‘causes’ is credibility in online information, is funding the start-up costs of the Reference Extract project, which the New York Times characterized as “Google if built by librarians”.  Michael Eisenberg, emeritus Dean of the UW iSchool, is leading this effort involving David Lankes of Syracuse University, and a team at OCLC led by Jeff Penka.


    I spent a day last week among a dozen or so invited advisors brainstorming the direction and development of this system.  The group included librarians, technologists, entrepreneurs, venture capitalists, and researchers, and the diversity of their reflections mirrored the variety of their backgrounds.


    The effort is intended to augment, rather than displace, search capacity on the open Web, adding the credibility of librarianship to the mix of ‘special sauces’ that distinguish commercial search engine relevance ranking today.  


    Librarians have been known to turn up their collective noses at open search, impugning the quality of search results, and a plethora of online dross bolsters the position.  But it is mostly good enough for most of us, most of the time.  Will reference-librarian links improve that quality?


    It might be more useful to think of librarian-vetted links as another cut on relevance ranking… useful primarily in circumstances when ‘good enough’ is not quite good enough.  Still, such results are more likely to live comfortably within existing search environments, rather than in competition with them.

    Business models comprised a significant part of the day’s discussion.  Early in the discussion, one of the venture capitalists in the group posed the questions asked of any new venture:
    •    Who are the buyers?
    •    What are the compelling reasons to buy?
    •    Does the approach enjoy a sustainable competitive advantage?


    It is commonplace for buyers and payers for a given service to not be the same people, and this will almost certainly be the case in this venture.  It is hard to imagine a search service garnering by-the-drink or subscription support in today’s ubiquitous search environment.  This leaves advertising models (generally unappealing to public institutions such as libraries), foundation support (not a strong contender for long term sustainability), and the perennial favorite, subsuming the operational costs (losing money on every transaction, and making it up on volume).  


    The latter model we know well, and there are reasons it may be particularly effective in the present case.  But why should the library community take on yet-another cost in a time of flat or declining support?  Because future library use will depend over the long run on transferring brand equity from the physical to the digital world. 


    One of the UW librarians present at the meeting evoked a statistic related to their implementation of WorldCat Local, of which UW has been an early adopter.  Interlibrary loan fulfillment doubled almost immediately after launch – users found things and asked for them… twice as often as before the launch.  It is hard to ignore a doubling of a basic service demand that translates into more satisfied patrons.


    Thus, it is not outlandish to imagine substantial return on a reference/search/relevance-ranking system that amplifies the value of mediated search – generating results that enjoy the imprimatur of an information professional.  A successful service will raise the profile of what librarians have to offer web users, and return substantial dividends to libraries, helping to achieve credibility at web-scale that we have enjoyed in the physical world for a century and more.  And the costs of such a system will be substantially mitigated by mining existing reference workflows, not creating new ones.  Another example of making library data work harder.


    Librarians enjoy a degree of public trust that is rare among any group of professionals.  As is evident in the OCLC report Perceptions of Libraries and Information Resources, this trust translates to a brand equity that can be captured in a single word: books. The promise of RefEx is, in part, to extend the brand equity of physical information formats (books) to digital information credibility on the Web.  No single project will accomplish this, however well it succeeds.  But to fail to effect this transformation over time is to allow decay of public trust in proportion to the decline of the impact of print relative to digital systems.


    So, the value proposition for imbuing online information with greater ‘credibility’ is to convert the brand-equity of traditional book-bound trust into its digital equivalent – a brand-promise for managed, evaluated information stores that can be counted upon to meet needs when just-good-enough… isn’t.  This is a promise librarians must keep.

    ------

    George Washington's profile (Mt. Rushmore) from the flanks of the monument.  Taken in October on our trip west from Ohio to Seattle.

    October 12, 2008

    Transitions

    It is a perfect autumnal Sunday morning in Ohio, quite beyond improvement.  Sitting in the quiet sunshine, I am acutely aware that it is the last such morning we’ll spend with the New York Times on the stone patio of our home in Columbus.  Through our own choices and efforts, our home is becoming a house that in two days (bankers willing) will become someone else’s home.

    We have been somewhat taken by surprise at the difficulty of disassembling the physical manifestations of nearly two decades of our lives at 259 Richards Road (goodness, what did we expect?).  These walls are lacquered with the most important years of our lives – our children, grown and launched on their own trajectories, many years of Thanksgiving dinners with others temporarily exiled from home or country, and the precious friendships of parenting, poker games, and the patina of shared everyday lives.

    Some of those friends gathered us in for a farewell last evening, and the affection and care shown us in those hours will sustain us through the wrenching uprooting of the coming days, and perhaps, too, help amend the rocky Pacific Northwest soils we hope will be hospitable to our divided hostas and hearts. 

    We embark on our westward adventure with both deep sadness and hopeful expectation.  No time in national memory has been so fraught with uncertainty, and on this we overlay our own transitional insecurities.  But we leave our home of many years with confidence that our place in the hearts of our friends is secure.   Thank you, all.

    October 07, 2008

    Searching for Grace

    Christopher Lydon used to have a radio show on NPR called Open Source Radio.  The vagaries of media funding failed to sustain his airwave presence (a temporary state, I hope), but his voice remains online at http://www.radioopensource.org/.  His recent interview with Anna Deavere Smith (What We're Going Through) is a long drink of clarity for the spiritually parched.  As Christopher said in his email... "If you were able to stop listening, I want to know when, and why!"

    -----

    Neon signage in the Fremont area of Seattle

    September 02, 2008

    Chrome Plaited

    Even as a child, I didn't like comics much... I'm a fan of the well-crafted paragraph -- perhaps part of this is that writing is within my reach, whereas drawing is another thing entirely. 

    But it's a visual age we live in, and I'm certainly not a fan of a text-only web, and there's that picture-worth-a-thousand-words thing, too.  But do we really need dialog balloons to convey textual information?  Art Spiegelman may have let the camel into the tent with Maus, but I mostly find the idiom of 'graphic novels' boring and tedious... even (especially!) if it's in the New Yorker, which has been doing comic-idiom pieces for many years now.

    So, along comes Chrome, Google's latest foray in the Web-as-OS campaign, and curiously, the comic is one of the main launch communication channels.  I nearly winced.  Then I read it.   It is a convincing and clear explication of why the Web needs a Web browser built from the ground up, and I'll certainly try the browser at the earliest opportunity (later today, apparently, but given that even the comic server is overwhelmed as it seems to be at the moment, how long before it will be possible to slurp down the Chrome code?).

    The 38-page comic is remarkable in its clarity on topics such as multi-threaded processes versus multiple-process architecture, garbage collection, testing regimes (and why Google is likely to do this better than most), the virtues of WebKit and virtual machines, intricacies of user interface choices, privacy, security, and more.  I couldn't have been more surprised.

    One of the charming aspects of the comic is that it features some of the engineers behind the browser, so there is yet another benefit for working there... you have a chance at becoming a comic book super-hero.  They missed an important one, though... the communications wizard who is responsible for this terrific archetype of a new age of documentation.

    -----

    An early morning thunderstorm from my balcony -- a rare thing in Seattle, and one of the things I miss about Midwestern weather.

    July 09, 2008

    Fruits of Passion

    Last night I attended a technoception at one of the hip, high-tech companies in Seattle... Zaaz.  You know you can trust them for Web design, because their name is globally unique, cool, and SEO'd*.  Their home page is so cool you don't want to sully it with actual keystrokes.  I was afraid to go in, even though they promised the presentations would be there.  Quartered in Seattle's rendition of the Flatiron building, their digs are open, under-airconditioned for a city summer night, and full of the accoutrements of forward-leaning, savvy makers and movers of the digital tribe.  Man, was I out of place.

    Good beer, wings, Whole-Foods catering (and WF made it into one of the speaker's slide decks as well).  Hair gel (guys), hair colors not found in nature (gals), and a really neat dog who'd make clients of skeptics just for petting rights (pictured here). This is not a work place my high school guidance counselor ever imagined.

    There were five talks on community software stuff, three of which had pretty compelling content.  Wendy Chisholm fought the good fight for universal accessibility, though she agreed when pressed that it will be legislation rather than clear-headed enlightened societal interest that will tip the balance.  Wendy got her accessibility bona fides at the W3C, and she speaks with earnest authority. You can get her book on the topic soon.

    Brian Fling gave a rapid-fire slide presentation on mobile computing. He admitted his time-allotment strategy was simply to talk fast until dragged from the stage.  Pity, because he had great content and a terrific, if gratuitous, story about his dad the inventor (you've used his stuff). Almost forgivable.  He's right about the iPhone, though, and asserted (you can look this up) that mobile computing worldwide will double by 2010.  Two years.

    The first speaker was Justin somebody of Zaaz (they didn't have programs), and he gave the stock Passion/Value/Strategy talk about social communities, with some monetization thrown in.  It was pretty convincing, my description notwithstanding.  One of his examples of passionate community was about the Big Green Egg (known to un-hip me due to Eric Miller, who makes stunning pizzas on his).

    Passion was a topic of more than passing interest for me yesterday, as we learned that our own passions (about data curation) will remain unrequited for the coming NSF funding cycle.  An interesting end to a really lousy day.

    -----

    *SEO'd: Search engine optimized... yeah... someone actually acronymulated that in their talk.