11-07, 16:05–16:45 (US/Eastern), Central Park East
This talk presents our novel work on detecting how topics change over time and visualizing shifts in the meaning of words within a corpus. Using cosine similarity between word embeddings, our method maps how topic clusters converge or diverge over time. Applied to diverse text datasets such as news articles, the approach shows how public discourse and information flow evolve during major global events. We demonstrate it on sports data, COVID-19-related data, and nursing home reviews. The talk is for anyone interested in how words and language change over time, and particularly for data scientists who need a tool for extracting insights from text data. No prior knowledge is required, as the talk offers a high-level overview of the methodology through case studies. The presentation consists of a 10-15 minute discussion of the paper's findings with examples, followed by a 10-15 minute demonstration of visualizing word changes across datasets in our Django web app.
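To make the core measurement concrete, here is a minimal sketch of tracking the cosine similarity between two words across time slices. It is an illustration only, not the paper's implementation: the three-dimensional vectors are made-up toy values, `embeddings_by_period` and `similarity_over_time` are hypothetical names, and the sketch assumes the vectors from different periods live in a comparable embedding space. In practice the vectors would come from a trained Word2Vec model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy, made-up embeddings per time slice (hypothetical values for illustration).
embeddings_by_period = {
    "2019": {"virus": [0.9, 0.1, 0.0], "hospital": [0.1, 0.9, 0.2]},
    "2020": {"virus": [0.2, 0.9, 0.1], "hospital": [0.1, 0.9, 0.2]},
}


def similarity_over_time(word_a, word_b, by_period):
    """Map each period to the cosine similarity between two words."""
    return {
        period: cosine_similarity(vectors[word_a], vectors[word_b])
        for period, vectors in sorted(by_period.items())
    }


trend = similarity_over_time("virus", "hospital", embeddings_by_period)
for period, score in trend.items():
    print(period, round(score, 3))
```

With these toy values, the similarity is higher in the later period, which is the kind of shift the method is meant to surface.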
Original paper: https://arxiv.org/abs/2209.11717
In this talk, we present our research paper, Temporal Analysis Utilizing Word2Vec, which outlines a novel method for analyzing and visualizing temporal trends in topics. The paper serves as a resource for those interested in the technical details of the methodology. The talk is aimed at professionals and academics interested in linguistic data exploration using Python.
Outline:
Introduction (5 minutes): The talk begins by introducing the key concepts and context behind the methodology, emphasizing the need for such a tool due to the difficulty analysts face in identifying meaningful linguistic patterns and changes within vast amounts of text data.
Methodology Overview (5 minutes): We'll describe the methodology intuitively, then show visualizations of expected outcomes. Topics include data formatting, Word2Vec, k-means clustering, embedding spaces, and techniques for measuring similarity.
Use Cases (10 minutes): We'll demonstrate the practical application and relevance of the methodology through specific use cases. These include news datasets for sports and COVID-19, as well as nursing home survey analysis.
Live Demo (10-15 minutes): The talk concludes with a live demo, allowing attendees to interact with the tool via a public link and explore the code through a shared GitHub repository.
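The clustering and similarity steps named in the outline can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code from the paper or the demo repository: the two-dimensional word vectors are invented toy values, `kmeans` is a plain Lloyd's algorithm written here for self-containment, and seeding both periods with the same two words (a hypothetical choice) keeps cluster indices comparable across periods. Real vectors would come from Word2Vec models, and comparing periods assumes their embedding spaces are comparable.

```python
import math


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def mean_vector(vectors):
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def kmeans(vectors, initial_centroids, iterations=20):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    centroids = [list(c) for c in initial_centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: squared_distance(v, centroids[k]))
            for v in vectors
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [v for v, label in zip(vectors, labels) if label == k]
            new_centroids.append(mean_vector(members) if members else centroids[k])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


def centroid_similarity(word_vectors, seed_words):
    """Cluster one period's words and compare the two resulting centroids."""
    vectors = list(word_vectors.values())
    seeds = [word_vectors[w] for w in seed_words]
    centroids, labels = kmeans(vectors, seeds)
    return cosine_similarity(centroids[0], centroids[1]), labels


# Toy, made-up vectors: a sports-like pair and a health-like pair of words.
period_a = {"match": [1.0, 0.0], "goal": [0.9, 0.1], "vaccine": [0.0, 1.0], "clinic": [0.1, 0.9]}
period_b = {"match": [1.0, 0.2], "goal": [0.9, 0.3], "vaccine": [0.6, 0.8], "clinic": [0.7, 0.7]}

sim_a, labels_a = centroid_similarity(period_a, ["match", "vaccine"])
sim_b, labels_b = centroid_similarity(period_b, ["match", "vaccine"])
print("period A centroid similarity:", round(sim_a, 3))
print("period B centroid similarity:", round(sim_b, 3))
```

A rising centroid similarity between periods would be read as the two topic clusters converging, and a falling one as divergence.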
GitHub: https://github.com/angadsinghsandhu/trends-demo
Paper: https://arxiv.org/abs/2209.11717
No previous knowledge expected
Vishesh is a second-year undergraduate student at the University of Maryland and an AI/ML Intern at ExploreDigits, primarily researching and developing NLP techniques for healthcare data tasks.
MSE student at Johns Hopkins University. Working on advanced multimodal models at Johns Hopkins Medicine, building tools for the best doctors in the world.
Faizan Wajid is a PhD candidate in the Computer Science department at the University of Maryland, where his research focuses on analyzing people's full-day respiration patterns in non-clinical settings. He is also the Senior Data Scientist at ExploreDigits, Inc., where he develops NLP models to analyze data on nursing homes and health policy.