#
# Copyright (c) 2005 Richard Cameron, CiteULike.org
# All rights reserved.
#
# This code is derived from software contributed to CiteULike.org
# by
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# 1. Redistributions of source code must retain the above copyright
#    notice, this list of conditions and the following disclaimer.
# 2. Redistributions in binary form must reproduce the above copyright
#    notice, this list of conditions and the following disclaimer in the
#    documentation and/or other materials provided with the distribution.
# 3. All advertising materials mentioning features or use of this software
#    must display the following acknowledgement:
#        This product includes software developed by
#        CiteULike and its
#        contributors.
# 4. Neither the name of CiteULike nor the names of its
#    contributors may be used to endorse or promote products derived
#    from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY CITEULIKE.ORG AND CONTRIBUTORS
# ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
# TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
# BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.
#

# Each plugin needs a description so the driver can advertise the details
# to the users on the site.
plugin {
    # Integer version number for the plugin code. When this number is
    # incremented, CiteULike may reparse all existing articles with the
    # new code.
    version {1}

    # The name of the plugin, as displayed on the "CiteULike supports..." page
    name {CiteSeer}

    # The link to the front page of this service
    url {http://citeseer.ist.psu.edu}

    # Any additional information which needs to be displayed to the user.
    # E.g. "Experimental support"
    blurb {}

    # Your name
    author {Richard Cameron}

    # Your email address
    email {camster@citeulike.org}

    # Language you wrote the plugin in
    language {tcl}

    # Regular expression to match URLs that the plugin is
    # *potentially* interested in. Any URL matching this regexp
    # will cause your parser to be invoked. Currently, this will
    # require fork()ing a process, so you should try to reduce the number
    # of false positives by making your regexp as restrictive as possible.
    #
    # If it is not possible to determine whether or not your plugin is
    # interested purely on the basis of the URL, you will have a chance
    # to refine this decision in your code. For now, try to make a reasonable
    # approximation, such as checking that the URL is on the right hostname.
    #
    # Note: Some universities provide mirrors of commercial publishers' sites
    # with different hostnames, so you should provide some leeway in your
    # regexp if that applies to you.
    regexp {^http://citeseer[^/]+(edu|unizh.ch|edu.sg)/([^/]+.html)$}
}

#
# Linkout formatting
#

# CiteULike doesn't store URLs for articles.
# Instead it stores the raw ingredients required to build them dynamically.
# Each plugin is required to define a small procedure which does this formatting.
# See the HOWTO file for more details.
#
#
# The following variables are defined for your use
# in the function: type ikey_1 ckey_1 ikey_2 ckey_2
#
format_linkout CITES {
    return [list "CiteSeer (at Penn State)" \
                 "http://citeseer.ist.psu.edu/${ckey_1}.html" \
                 "CiteSeer (at MIT)" \
                 "http://citeseer.lcs.mit.edu/${ckey_1}.html" \
                 "CiteSeer (at Zurich)" \
                 "http://citeseer.ifi.unizh.ch/${ckey_1}.html" \
                 "CiteSeer (at Singapore)" \
                 "http://citeseer.comp.nus.edu.sg/${ckey_1}.html"]
}

#
# TESTS
#

# Each plugin MUST provide a set of tests. The motivation behind this is
# that web scraping code is inherently fragile, and is likely to break whenever
# the provider decides to redesign their site. CiteULike will periodically
# run tests to see if anything has broken.

# Please provide as comprehensive a set of tests as possible.
# If you ever fix a bug in the parser, it is highly recommended that
# you add the offending page as a test case.

test {http://citeseer.ist.psu.edu/wexelblat99footprints.html} {
    formatted_url {{CiteSeer (at Penn State)} http://citeseer.ist.psu.edu/wexelblat99footprints.html}
    formatted_url {{CiteSeer (at MIT)} http://citeseer.lcs.mit.edu/wexelblat99footprints.html}
    formatted_url {{CiteSeer (at Zurich)} http://citeseer.ifi.unizh.ch/wexelblat99footprints.html}
    formatted_url {{CiteSeer (at Singapore)} http://citeseer.comp.nus.edu.sg/wexelblat99footprints.html}
    linkout {CITES {} wexelblat99footprints {} {}}
    abstract {Inspired by Hill and Hollan's original work [6], we have been developing a theory of interaction history and building tools to apply this theory to navigation in a complex information space. We have built a series of tools --- map, trails, annotations and signposts --- based on a physical-world navigation metaphor. These tools have been in use for over a year. Our user study involved a controlled browse task and showed that users were able to get the same amount of work done with significantly...}
    author {Wexelblat Alan A {Alan Wexelblat}}
    author {Maes Pattie P {Pattie Maes}}
    title "Footprints: History-Rich Tools for Information Foraging"
    year 1999
    title_secondary {CHI}
    url {citeseer.ist.psu.edu/wexelblat99footprints.html}
    type INCONF
    start_page 270
    end_page 277
    status ok
}

test {http://citeseer.ist.psu.edu/moffat96selfindexing.html} {
    formatted_url {{CiteSeer (at Penn State)} http://citeseer.ist.psu.edu/moffat96selfindexing.html}
    formatted_url {{CiteSeer (at MIT)} http://citeseer.lcs.mit.edu/moffat96selfindexing.html}
    formatted_url {{CiteSeer (at Zurich)} http://citeseer.ifi.unizh.ch/moffat96selfindexing.html}
    formatted_url {{CiteSeer (at Singapore)} http://citeseer.comp.nus.edu.sg/moffat96selfindexing.html}
    linkout {CITES {} moffat96selfindexing {} {}}
    abstract {Query processing costs on large text databases are dominated by the need to retrieve and scan the inverted list of each query term. Here we show that query response time for conjunctive Boolean queries and for informal ranked queries can be dramatically reduced, at little cost in terms of storage, by the inclusion of an internal index in each inverted list. This method has been applied in a retrieval system for a collection of nearly two million short documents. Our experimental results show...}
    author {Moffat Alistair A {Alistair Moffat}}
    author {Zobel Justin J {Justin Zobel}}
    title {Self-Indexing Inverted Files for Fast Text Retrieval}
    year 1996
    journal {ACM Transactions on Information Systems}
    url {citeseer.ist.psu.edu/moffat96selfindexing.html}
    issue 4
    volume 14
    start_page 349
    end_page 379
    type JOUR
    status ok
}
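
#
# ILLUSTRATIVE SKETCH (commented out; not part of the plugin description)
#
# The plugin regexp and the ckey_1 value used by format_linkout are related:
# the second capture group of the regexp is the page name, and the linkout
# key in the tests is that page name without its ".html" extension.
# The lines below are a minimal sketch of that relationship in plain Tcl.
# The variable names (pattern, url, suffix, page) are assumptions made for
# illustration only; the real driver and parser may derive the key differently.
#
#   set pattern {^http://citeseer[^/]+(edu|unizh.ch|edu.sg)/([^/]+.html)$}
#   set url {http://citeseer.ist.psu.edu/wexelblat99footprints.html}
#
#   ;# regexp returns 1 on a match and fills in the capture variables
#   regexp $pattern $url -> suffix page
#
#   ;# drop the ".html" extension to obtain the key stored in the linkout
#   set ckey_1 [file rootname $page]
#
#   ;# rebuild the Penn State URL the same way format_linkout does
#   puts "http://citeseer.ist.psu.edu/${ckey_1}.html"
#
# Everything stays commented out so that the driver never evaluates it.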