There is so much text in our lives that we are practically drowning in it. Fortunately, innovative tools and techniques for managing unstructured information can throw the smart developer a much-needed lifeline. In this talk, based on the outline of the book Taming Text, you will receive an introduction to a variety of Java-based open source tools that aid in the development of search and NLP applications.
In this presentation, you will be introduced to useful techniques such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization. We will explore real use cases as you systematically absorb the foundations on which they are built. Using a clear, concise, jargon-free style, we will explain the subject in terms that require no background in statistics or natural language processing. Examples are in Java, but the concepts can be applied in any language.