605.744: Information Retrieval, Spring 2009
- Lecturer
- Paul McNamee, <paulmac@apl.jhu.edu>
Textbooks
Course Times and Location
- Lecture: Thursday, 7:20pm - 10:00pm, Kossiakoff Center Room K-5.
- Email should be the primary means of out-of-class communication; however,
I can meet with students by appointment.
- Course Overview
- This course covers the storage and retrieval of unstructured digital
information. Topics include automatic index construction,
retrieval models, textual representations, efficiency issues,
search engines, text classification, and multilingual retrieval.
- Grading Policy
- Grades will be assigned on the A, B, C, etc. scale, per
university policy. Work for the class will include homework
assignments, an independent research project, a midterm exam,
and classroom participation (e.g., quizzes, oral presentations).
Refer to the course outline for details.
- Academic Integrity
- Work for this class is expected to be the result of individual
effort; however, unless explicitly prohibited, it is perfectly
acceptable to make use of published examples and source code
from the literature or public domain - but only if attribution
is given.
Furthermore, while it is permissible to discuss the general
nature of lecture material and assignments with your peers, this
does not extend to discussing or revealing solutions or source
code.
Students are expected to uphold the academic integrity of the
university.
Students who use published material without attribution, or who copy
the work (particularly source code) of another individual,
will face consequences such as receiving a zero on the
assignment and having the matter referred to the dean.
Contact me if you have any questions, no matter how
slight, about this policy, or if you have questions about a
particular assignment.
Assigned Readings
- 1/29/09 Chapters 1 and 2 in Manning, Raghavan, and Schütze.
- Michael Lesk, The Seven Ages of Information Retrieval
- 2/5/09 Chapters 3 and 4 in Manning, Raghavan, and Schütze.
- 2/12/09 Chapters 5 - 7 in Manning, Raghavan, and Schütze.
- 2/12/09 G. Salton and C. Buckley, Term-Weighting Approaches in Automatic Text Retrieval, IPM 24(5), pp. 513-523, 1988.
- 2/19/09 Chapters 8 and 9 in Manning, Raghavan, and Schütze.
- 2/26/09 Chapters 11 and 12 in Manning, Raghavan, and Schütze.
- 3/5/09 Chapters 13, 14, and 15 in Manning, Raghavan, and Schütze.
- 3/5/09 Goodman et al., Spam and the ongoing battle for the inbox, CACM 50(2), pp. 24-33, 2007.
- 3/12/09 Chapters 19, 20, and 21 in Manning, Raghavan, and Schütze.
- 4/2/09 K. Kishida, Technical issues of cross-language information retrieval: a review, IPM 41, pp. 433-455, 2005. (Handed out in class)
- 4/16/09 M. Sanderson, Retrieving with Good Sense, Information Retrieval 2(1), pp. 49-69, 2000.
Handouts
Assignments
Course related web-links
- Sources for on-line papers:
CiteSeer
ACL Anthology
TREC Publications
ACM Digital Library
- IR Textbooks:
Managing Gigabytes,
Information Retrieval: Algorithms and Heuristics,
Readings in Information Retrieval (Amazon),
Foundations of Statistical Natural Language Processing
- IR Evaluations: TREC,
CLEF,
NTCIR,
FIRE
- Organizations that distribute corpora:
LDC,
ELRA
- IR Journals: JASIST,
IP&M,
IR
- IR-related conferences:
SIGIR,
CIKM,
KDD,
ACL,
WWW-2009
- On-line magazines:
The Noisy Channel,
Search Engine Watch,
D-Lib Magazine
- Berkeley Primer: Finding Information on the Internet
- HLT Central Repository
- Discrete Mathematics Primer
- Web Protocols:
HTML,
Z39.50 (Information Retrieval)
Software Resources
Frivolity
Cool Demos
- A 'meta' search engine: Dogpile
- A question-answering system: START
- An online joke recommendation system that demonstrates
collaborative filtering:
JESTER
- A faux computer science paper generator,
SCIgen, from MIT
IR Test collections
Web Search Engines
JHU Links
Paul McNamee:
http://apl.jhu.edu/~paulmac/
(paulmac@apl.jhu.edu)