eSI Visitor Seminar: "Fast Approximate String Searching for Wikipedia, P2P and Biological Sequences" by Ela Hunt

01 May, 2008 04:00 PM - 05:00 PM

e-Science Institute, 15 South College Street, Edinburgh
Any slides or other material generated as a result of this event can be found at: www.nesc.ac.uk/action/esi/contribution.cfm?Title=887
Abstract

Approximate string searching on natural language text uses database indexing only to a limited extent and does not work for short words. In biological string searching, indexing avenues are being actively investigated. In both contexts, suffix trees and n-grams are the main index types used.
I will present a new development in indexing for natural language, with application to web documents and tested in both client-server and P2P scenarios, and possibly extensible to biological string searching. I will first discuss the concept of the deletion neighbourhood and outline some complexity issues related to this idea. Then, I will move to the application areas where the concept proved to be beneficial. I will summarise the results of various performance tests with Wikipedia, Moby Dick, and natural language dictionaries, and then move on to a P2P DHT-based scenario which was the subject of another test. Finally, I will discuss possible extensions of this work, and the forthcoming tests with biological sequences.
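The abstract does not define the deletion neighbourhood, so the following is only a minimal sketch under a common reading of the term: the set of strings obtained by deleting at most k characters from a word. Under that assumption, two words within edit distance k always share at least one neighbour, so a dictionary indexed by its neighbours can supply candidates that are then checked with an ordinary edit-distance computation. The function names (`deletion_neighbourhood`, `build_index`, `search`) and the tiny word list are hypothetical and are not taken from the talk.

```python
from collections import defaultdict
from itertools import combinations


def deletion_neighbourhood(word, k):
    """Return all strings obtained by deleting at most k characters from word."""
    result = set()
    n = len(word)
    for d in range(min(k, n) + 1):
        # Choose which d positions to drop; d = 0 yields the word itself.
        for drop in combinations(range(n), d):
            dropped = set(drop)
            result.add("".join(c for i, c in enumerate(word) if i not in dropped))
    return result


def build_index(words, k):
    """Map every neighbour string to the dictionary words that produce it."""
    index = defaultdict(set)
    for w in words:
        for v in deletion_neighbourhood(w, k):
            index[v].add(w)
    return index


def edit_distance(a, b):
    """Plain Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]


def search(index, query, k):
    """Collect candidates sharing a neighbour with query, then verify them."""
    candidates = set()
    for v in deletion_neighbourhood(query, k):
        candidates |= index.get(v, set())
    return sorted(w for w in candidates if edit_distance(query, w) <= k)


# Hypothetical toy dictionary, for illustration only.
words = ["whale", "while", "whole", "ship", "shop"]
index = build_index(words, 1)
matches = search(index, "whle", 1)
```

The neighbourhood of a word of length n holds up to the sum of C(n, d) strings for d from 0 to k, which is one way to see why the index size and the choice of k raise the complexity questions the abstract mentions. The verification step is needed because sharing a neighbour is necessary but not sufficient for being within distance k.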
Biography

Ela is originally from Poland where she graduated in English from the Jagiellonian University of Krakow (MA thesis on Mervyn Peake). She then obtained further qualifications in Scottish Literature (MPhil thesis on John Galt) and Computing Science (BA and Diploma in Computing, OU), with a PhD in Computing from the University of Glasgow in 2002 on the creation of very large disk-based suffix trees. She worked at the Jagiellonian University of Krakow as lecturer in English, at BP Exploration in Scotland and Max Planck Institute for Molecular Genetics in Berlin as a software analyst, at the University of Glasgow in Scotland as a Polish lector and a research fellow in computing science, and recently as an Oberassistentin in the Database Technology Research Group led by Prof. Klaus R. Dittrich at the University of Zurich. She joined the Global Information Systems Group in June 2006.

Webcast

This meeting was webcast live. For the majority of the meetings that we broadcast, we keep a copy (for a limited period) and make it available from the event material page. This copy of the webcast is normally available the day after the meeting.

Related Links
http://www.globis.ethz.ch/hunt

Travel

Full details on how to get to the e-Science Institute are available at:

Enquiries

Enquiries should be made directly to our Conference Administrator.
The e-Science Institute Visitor Seminar