Brown Bag is a weekly seminar for students to learn about the research currently going on around Cornell Computer Science. Each week a faculty member will give a presentation about one or more of their research projects. Students from all areas are encouraged to attend!
Catering is provided. To receive email announcements, send an email to cs-brownbag-L-request@cornell.edu with the subject line "Join".
- Tuesday, September 1, 2009Introductory Social Event
- Tuesday, September 8, 2009Directions in Computer ScienceJohn Hopcroft
- Tuesday, September 15, 2009Applications of Matrix StructureDavid Bindel
- Tuesday, September 22, 2009The Future Sounds Good!Doug James
- Tuesday, September 29, 2009The DBToaster ProjectChristoph Koch
- Friday, October 2, 2009AI Lunch on Friday 12:00-1:15 @ Upson 5130Ashutosh Saxena
- Tuesday, October 6, 2009A Smorgasbord of LearningThorsten Joachims
- Tuesday, October 13, 2009-- Fall Break --
- Tuesday, October 20, 2009Verifiable High-Level RoboticsHadas Kress-Gazit
- Tuesday, October 27, 2009Perceptually-Based Realistic RenderingKavita Bala
- Tuesday, November 3, 2009Building Rome in a Day from Internet Photo CollectionsNoah Snavely
- Tuesday, November 10, 2009A Consensus Protocol Taxonomy based on Stepwise RefinementRobbert van Renesse
- Tuesday, November 17, 2009Commodity Middleboxes Connecting Data Centers over Vast Geographical DistancesHakim Weatherspoon
- Tuesday, November 24, 2009Sensing and Planning for Autonomous Vehicles: Lessons from the DARPA Urban Challenge and BeyondDan Huttenlocher & Mark Campbell
- Tuesday, December 1, 2009Cancelled
- Tuesday, January 26, 2010
- Tuesday, February 2, 2010Machine Learning Techniques for Vision and RoboticsAshutosh Saxena
- Tuesday, February 9, 2010Student Input on Faculty RecruitingÉva Tardos
- Tuesday, February 16, 2010b-Bit Minwise HashingPing Li
- Tuesday, February 23, 2010High Precision Network Analysis with the Software-Defined Network AdapterDaniel Freedman
- Tuesday, March 2, 2010Things I Learned at the Museum (Tags, Design, Navigation, and Experience)Dan Cosley
- Tuesday, March 9, 2010Tips on preparing and giving a good talkKen Birman
- Tuesday, March 16, 2010
- Tuesday, March 23, 2010-- Spring Break --
- Tuesday, March 30, 2010Research Funding and More: Computer Scientists in DCFred Schneider
- Tuesday, April 6, 2010Using Fabric to escape centralized cloudsAndrew Myers
- Tuesday, April 13, 2010What is Research? A Reconstruction from 15 SnapshotsCharles Van Loan
- Tuesday, April 20, 2010Research in Industry and AcademiaDexter Kozen & Bobby Kleinberg
- Tuesday, April 27, 2010How to write and review papersChristoph Koch & Lillian Lee & Thorsten Joachims
- Tuesday, May 4, 2010Procrastination / Time managementMichael Chen
- Tuesday, August 31, 2010Introductory Social Event
- Tuesday, September 7, 2010No Meeting
- Tuesday, September 14, 2010Dexter Kozen
- Tuesday, September 21, 2010No Meeting
- Tuesday, September 28, 2010Ken Birman
- Tuesday, October 5, 2010David Bindel
- Tuesday, October 12, 2010-- Fall Break --
- Tuesday, October 19, 2010Hakim Weatherspoon
- Tuesday, October 26, 2010Nate Foster
- Tuesday, November 2, 2010Doug James
- Tuesday, November 9, 2010
- Tuesday, November 16, 2010
- Tuesday, November 23, 2010Noah Snavely
- Tuesday, November 30, 2010Joe Halpern
- Tuesday, December 7, 2010Bob Constable
- Tuesday, January 25, 2011(Mostly Interdisciplinary) Research Projects in Natural Language ProcessingClaire Cardie
- Tuesday, February 1, 2011Procrastination / Time managementMichael Chen
- Tuesday, February 8, 2011The Job-Talk Colloquium and Visit: How Good is the Spotlight?Charles Van Loan
- Tuesday, February 15, 2011Cancelled
- Tuesday, February 22, 2011Tensor ComputationsCharles Van Loan
- Tuesday, March 1, 2011Research opportunities in machine creativity, curiosity, and meta-cognitionHod Lipson
- Tuesday, March 8, 2011
- Tuesday, March 15, 2011Hadas Kress-Gazit
- Tuesday, March 22, 2011-- Spring Break --
- Tuesday, March 29, 2011How to give a theory talkDavid Bindel
- Tuesday, April 5, 2011
- Tuesday, April 12, 2011Robbert van Renesse
- Tuesday, April 19, 2011Dan Cosley
- Tuesday, April 26, 2011Foundations for Computer Security PoliciesMichael Clarkson
- Tuesday, May 3, 2011
- Tuesday, August 30, 2011Introductory Social Event
- Tuesday, September 6, 2011
- Tuesday, September 13, 2011Rendering visibly thick materials using volume appearance modelsSteve Marschner
- Tuesday, September 20, 2011Exploring the scalability limits of consistency using Isis2Ken Birman
- Tuesday, September 27, 2011Iterated Regret Minimization: A New Solution ConceptJoe Halpern
- Tuesday, October 4, 2011People-Aware Computing: Towards Societal Scale Sensing using Mobile PhonesTanzeem Choudhury
- Tuesday, October 11, 2011-- Fall Break --
- Tuesday, October 18, 2011Commoditization of the Cloud: A Research AgendaHakim Weatherspoon
- Tuesday, October 25, 2011
- Tuesday, November 1, 2011Buying sight unseen: Predictive Modeling and RenderingKavita Bala
- Tuesday, November 8, 2011Safely evolving software systems by sharing familiesAndrew Myers
- Tuesday, November 15, 2011Communities, Spectral Clustering, and Random WalksDavid Bindel
- Tuesday, November 22, 2011Bidirectional Programming LanguagesNate Foster
- Tuesday, November 29, 2011Welfare and Revenue in Ad AuctionsÉva Tardos
- Tuesday, January 24, 2012Organizational meetingHussam Abu-Libdeh
- Tuesday, January 31, 2012Open Q&A sessionJoe Halpern & Johannes Gehrke
- Tuesday, February 7, 2012The Distributed CameraNoah Snavely
- Tuesday, February 14, 2012Monitoring and Controlling the Smart Electric Power GridKen Birman
- Tuesday, February 21, 2012
- Tuesday, February 28, 2012Computing and IncentivesBobby Kleinberg
- Tuesday, March 6, 2012Elastic Replication for the CloudRobbert van Renesse
- Tuesday, March 13, 2012Systems for supporting self-awareness through reflecting on social mediaDan Cosley
- Tuesday, March 20, 2012-- Spring Break --
- Tuesday, March 27, 2012
- Tuesday, April 3, 2012
- Tuesday, April 10, 2012New methods for controlling leakage via timingAndrew Myers
- Tuesday, April 17, 2012How to give a talkDavid Bindel
- Tuesday, April 24, 2012New methods for controlling leakage via timingAndrew Myers
- Tuesday, May 1, 2012
- Tuesday, August 28, 2012An Epistemic Approach to Mechanism DesignRafael Pass
- Tuesday, September 4, 2012An Introduction to the Cornell NYC Tech CampusDan Huttenlocher
- Tuesday, September 11, 2012Improving Programming LanguagesRoss Tate
- Tuesday, September 18, 2012Semidefinite Programming Hierarchies and the Unique Games ConjectureDavid Steurer
- Tuesday, September 25, 2012A Formal Approach to Computer System DesignRajit Manohar
- Tuesday, October 2, 2012Computational Models for Social Phenomena in On-Line NetworksJon Kleinberg
- Tuesday, October 9, 2012-- Fall Break --
- Tuesday, October 16, 2012How to Give a TalkKen Birman
- Tuesday, October 23, 2012How to GraduateEmin Gün Sirer
- Tuesday, October 30, 2012Optimal Link-state Hop-by-hop RoutingKevin Tang
- Tuesday, November 6, 2012An overview of CS publication venuesFaculty Panel
- Tuesday, November 13, 2012Carla Gomes
- Tuesday, November 20, 2012Haiyuan Yu
- Tuesday, November 27, 2012José Martínez
- Tuesday, January 22, 2013Welcome back / social event
- Tuesday, January 29, 2013How should a robot perceive the world?Ashutosh Saxena
- Tuesday, February 5, 2013Mobile Health (mHealth): from smart phone apps and sensor streams to behavioral biomarkersDeborah Estrin
- Tuesday, February 12, 2013Computer Science graduate student/department town hallJohannes Gehrke & Joe Halpern
- Tuesday, February 19, 2013Mathematics for the information ageJohn Hopcroft
- Tuesday, February 26, 2013What is it like to have a job at a smaller college?Walker White
- Tuesday, March 5, 2013The End of the Service Provider -- Is a Cellular Commons Inevitable?Steve Wicker
- Tuesday, March 12, 2013Hardware Enhanced Security: From Circuits to Architecture.Ed Suh
- Tuesday, March 19, 2013-- Spring Break --
- Tuesday, March 26, 2013Industry vs Academia (come with your questions prepared)Faculty Panel
- Tuesday, April 2, 2013Provably correct, high-level roboticsHadas Kress-Gazit
- Tuesday, April 9, 2013Microarchitectural Mechanisms to Exploit Value Structure in SIMT ArchitecturesChristopher Batten
- Tuesday, April 16, 2013Cancelled
- Tuesday, April 23, 2013Cancelled
- Tuesday, April 30, 2013Gates Hall (@ room 315)Building working group
- Tuesday, September 3, 2013HyperDex: A Consistent, Fault-tolerant, Searchable NoSQL StoreEmin Gün Sirer
- Tuesday, September 10, 2013NetKAT: Semantic Foundations for NetworksNate Foster
- Tuesday, September 17, 2013Adapting High Assurance Distributed Computing Techniques for Cloud-Scale SettingsKen Birman
- Tuesday, September 24, 2013Medical Problems with Abundant Labeled DataRamin Zabih
- Tuesday, October 1, 2013Incentives in Crowdsourcing: A Game-theoretic ApproachArpita Ghosh
- Tuesday, October 8, 2013Excursions in Computational SustainabilityBart Selman
- Tuesday, October 15, 2013-- Fall Break --
- Tuesday, October 22, 2013How languages can secure the future distributed environmentAndrew Myers
- Tuesday, October 29, 2013Cancelled
- Tuesday, November 5, 2013Progress in Digital Sound Synthesis for Physically Based AnimationDoug James
- Tuesday, November 12, 2013Writing and reviewingLillian Lee & Thorsten Joachims
- Tuesday, November 19, 2013Department Town Hall
- Tuesday, November 26, 2013A Tale of Two Eigenvalue ProblemsDavid Bindel
- Tuesday, December 3, 2013Predicting the Visual Appearance of Real-World MaterialsKavita Bala
- Tuesday, January 28, 2014Department Town Hall
- Tuesday, February 4, 2014Structure and appearance of fibers, yarns, and clothSteve Marschner
- Tuesday, February 11, 2014How to give a talkDavid Bindel
- Tuesday, February 18, 2014-- February Break --
- Tuesday, February 25, 2014Hamiltonian Monte Carlo: a hands-on tutorialDavid Mimno
- Tuesday, March 4, 2014Detecting Deception in On-line ReviewsClaire Cardie
- Tuesday, March 11, 2014-- Visit Day --
- Tuesday, March 18, 2014Toward a More Reflective Approach to 3D Model BuildingFrançois Guimbretière
- Tuesday, March 25, 2014Procrastination / Time managementMichael Chen
- Tuesday, April 1, 2014-- Spring Break --
- Tuesday, April 8, 2014Tensor DecompositionsCharles Van Loan
- Tuesday, April 15, 2014Network-centric recommendationDan Cosley
- Tuesday, April 22, 2014Finding a Research TopicFaculty Panel
- Tuesday, April 29, 2014Adventures in a teaching careerDaisy Fan
- Tuesday, May 6, 2014Triumph and Disaster: Life as an EntrepreneurStuart Staniford
- Tuesday, September 2, 2014Designing Engaging Learning ExperiencesErik AndersenA key challenge in education is designing engaging instructional content that can be tailored to the needs of each student while making as few assumptions as possible. I argue that we can do this by modeling the knowledge we want to teach, analyzing these models to generate learning materials automatically, and optimizing these materials through large-scale experimentation. In this talk, I will present my work in co-creating three video games for teaching fractions that have attracted seven million players: Refraction, Treefrog Treasure, and Creature Capture. I will show how we can use test input generation tools to automatically generate progressions of practice problems for teaching a procedural skill, and how this technique can produce a level progression for Refraction – all of the playable content in the game – that engages players for as long as the original expert-designed progression. I will then present a programming-by-demonstration system that can categorize and reproduce 28 systematic misconceptions demonstrated by real students across nine procedures in K-12 math. Finally, I will demonstrate data-driven optimization of engagement in two online games by presenting results from a multivariate test with 27,000 players that measured the impact of secondary game objectives on player behavior. Future directions include designing games to teach conceptual topics such as reading comprehension, foreign language, critical thinking, and programming, restructuring content to match learner skills and strategies, and discovering optimal learning pathways.
- Tuesday, September 9, 2014Efficient and effective haplotype phase inference of large scale genetic datasetsAmy WilliamsThe recent and ongoing explosion of genetic data has enabled wide-ranging discoveries but created computational and analytic challenges. One such challenge is the inference of haplotypes—a series of DNA variants that occur on a single chromosome copy in an individual. While haplotypes are essential to many genetic studies, direct haplotype assays are costly. In this talk, I describe two methods for inferring haplotypes from genotype datasets, one that applies to families, the other to unrelated individuals. Inferring haplotypes in pedigree family data has been shown to be NP-hard, yet the hardness proof relies on large numbers of recombination events that do not occur in real genetic data. The family-based method HAPI takes advantage of the sparsity of recombination events to infer both minimum recombinant and maximum likelihood haplotypes for nuclear families in polynomial time on real data. When applied to a dataset containing 103 nuclear families, HAPI ran over 300 times faster than state-of-the-art methods. The second method, HAPI-UR, applies to unrelated and/or trio and duo family data. Using an adapted form of a commonly used hidden Markov model (HMM), HAPI-UR runs more than 18 times faster than other methods and also achieves comparable or greater accuracy. These improvements are practically important because error rates in inferred haplotypes drop with sample size. We used HAPI-UR to infer haplotypes in a dataset of more than 58,000 samples and show that its error rate continues to drop with sample size, even using samples of diverse ancestry. The talk concludes with a discussion of ongoing applications of these methods to genetic studies and future directions.
- Tuesday, September 16, 2014Online Learning: From Theory to Algorithms and ApplicationsKarthik SridharanIn recent years online learning (sequential prediction) has received much attention as it often produces fast and simple learning algorithms that enjoy robustness to changing or even adversarial data sources. However, despite the extensive existing literature on online learning, our theoretical understanding of the framework has been rather lacking. Most existing analyses have been case by case, and there is a lack of a general theory and methodology for designing online learning algorithms for the problem at hand. The goal of this talk is to first present a new general theory for online learning that parallels results from statistical learning theory. Next, building on this general theory, I will provide a generic recipe for deriving online learning algorithms. Finally, we shall see how the tools and techniques presented can be used for designing efficient learning algorithms for several interesting problems, including online collaborative filtering, node classification in social networks, etc. I will conclude the talk with future extensions and ongoing research.
- Tuesday, September 23, 2014Autonomous Assembly In a Human WorldRoss KnepperFifty years ago, robotic automation revolutionized manufacturing. Modern factory robots, like their quinquagenarian counterparts, require humans to keep away during operation. Safety excludes the benefits of live human-robot interaction during the assembly process. Instead, interaction is restricted to tedious, inefficient, offline programming. A new generation of safe robots promises to permit humans to work side-by-side with machines, yet the technology still falls far short of human capabilities for many tasks. To increase productivity, we must reintroduce humans into assembly automation and allow them to work closely with robots as peers in order to leverage the best skills of both human and robot teammates.
In this talk, I present research in three technical themes necessary to endow robots with the capabilities to work as peers with humans. The first theme is cooperative motion, including the ability to navigate and manipulate among people in crowded and cluttered environments. The second theme is cooperative manipulation. Broadly construed, this category includes the capabilities to interpret a part's or tool's function from form, reorient and attach parts together, and assemble complex objects. Tasks may require these capabilities to be realized as an individual or collectively as a mixed human-robot team. The third theme is cooperative communication. To work with others, robots must be able to understand a concept of group activity and anticipate future actions. Robots must address humans' needs and allow humans to address their needs. I present the theory, algorithms, mechanisms, and instrumentation that will enable a collaborative human-robot assembly system.
- Tuesday, September 30, 2014Visipedia Tool EcosystemSerge BelongieTo support scalable computer vision applications, we have built a suite of tools that allow for efficient collection and annotation of large image datasets. The tools are designed to both reduce data management overhead and foster collaborations between vision researchers and groups seeking the benefits of a computer vision application.
- Tuesday, October 7, 2014How To Give A TalkKen BirmanThis talk will talk about how to give a talk.
- Tuesday, October 14, 2014-- Fall Break --
- Tuesday, October 21, 2014From the Cloud to SoNIC: Precise Realtime Software Access and Control of Wired NetworksHakim WeatherspoonWe are, at last, on the verge of realizing the computer utility vision (Multics: 1965). Its name today is cloud computing. It promises to catalyze the technology economy, revolutionize health care, military, government, and financial systems, scientific research, and of course society. Central to the cloud and all of its promise is the network. Unfortunately, much of the network protocol stack is a black box to systems programmers, especially the physical and data link layers. These two layers contain valuable information to help ensure the network is reliable and performing. The issue: these two layers are often inaccessible in software, and as a result much of their potential goes untapped. In this talk, I will introduce SoNIC, Software-defined Network Interface Card, which provides access to the physical and data link layers in software. By implementing the creation of the bitstream in software and the transmission of the bitstream in hardware, SoNIC provides complete control over the entire network stack in realtime. As an example of SoNIC’s fine-granularity control, it can perform precise network measurements (in realtime) at the pico-second scale, accurately characterizing network components such as routers, switches, and network interface cards. Further, SoNIC enables timing channels with nano-second modulations that are undetectable in software.
BIO: Hakim Weatherspoon is an assistant professor in the Department of Computer Science at Cornell University. His research interests cover various aspects of fault-tolerance, reliability, security, and performance of large Internet-scale systems such as cloud computing and distributed systems. Professor Weatherspoon received his Ph.D. from Berkeley in 1999. Before receiving his PhD, Prof. Weatherspoon received his B.S. from University of Washington. Prof. Weatherspoon is an Alfred P. Sloan Fellow, Kavli Frontiers Fellow from the National Academy of Sciences, and recipient of an NSF CAREER award, DARPA Computer Science Study Panel (CSSP), IBM Faculty Award, the NetApp Faculty Fellowship, Intel Early Career Faculty Honor, and the Future Internet Architecture award from the National Science Foundation (NSF).
- Tuesday, October 28, 2014Materials In The WildKavita BalaOur daily lives bring us in contact with a rich range of materials that contribute to both the utility and aesthetics of the built environment. Human beings are remarkably good at making subtle distinctions in material appearance (e.g., silk vs. cotton, laminate vs. granite). We are working on a broad range of projects in my group on understanding, recognizing, modeling and rendering materials in the wild. I will describe our recent work on creating crowdsourced material databases, material recognition, and modeling and rendering of translucent materials.
This work has applications in graphics and vision, including virtual and augmented reality, e-commerce and retail, and virtual prototyping for industrial, interior, and textile design.
- Tuesday, November 4, 2014Civitas: Coercion-Resistant Remote VotingMichael ClarksonCivitas is an electronic voting system that enables voting from remote locations. Voters can be convinced that their votes are tallied correctly, while the secrecy of those votes is also maintained, even when someone tries to buy votes or to physically coerce voters. Civitas offers assurance through both cryptographic security proofs and information-flow analysis.
- Tuesday, November 11, 2014How Trustworthy Can Systems Become?Vincent RahliFor many of their essential activities, science, governments, businesses, and individuals depend on critical, often distributed, software systems that must be correct. When building such systems, programmers strive to provide evidence that their components and the interactions among them satisfy specifications. This is a non-trivial task in general, made especially difficult for cloud based systems, where data and programs are distributed and replicated, and yet must remain consistent and secure.
The PRL group at Cornell has built a framework within the Nuprl proof assistant to specify, verify and generate provably correct distributed systems. In this talk I will discuss this framework and will show how, once again, the magic of monads made all this possible. We will then discuss in what sense our code is considered correct, as well as solutions to gain even more trust in our code.
- Tuesday, November 18, 2014A Bodyguard of Lies: The Use of Honey Objects in Information SecurityAri JuelsDecoy objects, often dubbed “honey” in computer security, are a powerful and time-honored tool for detecting and mitigating system compromise. They are underappreciated and underexplored, though, at a time when perimeter defenses are eroding and a steady drumbeat of major breaches (JP Morgan Chase, Home Depot, etc.) is afflicting industry systems. In this talk I’ll describe honeywords and honey encryption, principled approaches to the use of deception in information security that can help protect sensitive data in password managers, medical records, and elsewhere.
BIO: Ari Juels is a Professor of Computer Science in the Jacobs Institute at Cornell Tech. Visit http://www.arijuels.com for more.
- Tuesday, November 25, 2014Towards Reconfigurable Computing for Mainstream ProgrammersZhiru ZhangOver the last two decades, FPGAs have evolved from a small chip with a few thousand logic blocks to heterogeneous system-on-chips containing hardened DSP blocks, embedded memories, and billions of transistors. These advances have made FPGAs an attractive hardware device for high-performance reconfigurable computing. However, there is still a considerable productivity gap between register-transfer level FPGA design and traditional software design. Enabling high-level programming of FPGAs is a critical step in bridging this gap and pushing FPGAs further into the computing space.
In this talk we give an introduction to FPGAs and modern high-level synthesis (HLS) tools. We present case studies which motivate the need for HLS tools, as well as explore their benefits and limitations. We further introduce novel scheduling and mapping algorithms to improve the quality of synthesized designs.
- Tuesday, December 2, 2014Department Town Hall
- Tuesday, January 27, 2015The Semantics of ShapeSiddhartha ChaudhuriVisual media surrounds us, and there is growing interest in new applications such as 3D printing and collaborative virtual worlds. As more and more people engage in producing visual content, there is a demand for interfaces that help novice users carry out creative design. Such an interface should allow people to easily and intuitively express high-level design goals, such as "create a cute toy" or "create a comfortable chair", while allowing the final product to be customized according to each person's preferences.
Current interfaces require the design goal to be reached through careful planning and execution of a series of low-level drawing and editing commands (which requires previsualization, dexterity and time) or serendipitously through largely unstructured exploration. The gap between how a person thinks about what he wants to create, and how he can interact with a computer to get there, is a barrier for the novice.
In this talk, I will present recent work on capturing high-level design intent to aid the creative process. The crux of our work is studying the semantic identities of shapes, not just their geometric descriptions. We analyze large classes of three-dimensional objects from three perspectives: structure, attributes and interaction. We study how objects are constructed from components ("chairs combine seats, backs and legs"), how they can be described using continuously varying natural language attributes ("this chair is more elegant than that one"), and how their functional design depends on human interaction ("how we sit on an armchair is different from how we sit on a kitchen chair"). Our work combines probabilistic shape analysis, machine learning and crowdsourcing. The approaches are data-driven: large repositories of existing designs are used to learn shared semantics, and repurposed for synthesizing new designs.
I will conclude with a discussion of directions, opportunities and challenges for new tools for high-level design that exploit the inter-relationship of semantics, function and form.
- Tuesday, February 3, 2015Decision theory with resource-bounded agentsJoe HalpernThere have been two major lines of research aimed at capturing resource-bounded players in game theory. The first, initiated by Rubinstein, charges an agent for doing costly computation; the second, initiated by Neyman, does not charge for computation, but limits the computation that agents can do, typically by modeling agents as finite automata. We review recent work on applying both approaches in the context of decision theory. For the first approach, we take the objects of choice in a decision problem to be Turing machines, and charge players for the "complexity" of the Turing machine chosen (e.g., its running time). This approach can be used to explain well-known phenomena like first-impression-matters biases (i.e., people tend to put more weight on evidence they hear early on) and belief polarization (two people with different prior beliefs, hearing the same evidence, can end up with diametrically opposed conclusions) as the outcomes of quite rational decisions. For the second approach, we model people as finite automata, and provide a simple algorithm that, on a problem that captures a number of settings of interest, provably performs optimally as the number of states in the automaton increases. Perhaps more importantly, it seems to capture a number of features of human behavior, as observed in experiments.
This is joint work with Rafael Pass and Lior Seeman.
No previous background is assumed.
- Tuesday, February 10, 2015Cancelled
- Tuesday, February 17, 2015-- February Break --
- Tuesday, February 24, 2015Mutating Matrices from a Gamut of Graphs: A Play in Two ActsDavid BindelIn this talk, I give an overview of two recent results, both of which feature methods to reason about parametric families of linear systems from the analysis of different types of networks.
In the first part, I will describe our work on estimating changes to topology in the bulk power transmission network on the basis of scattered measurements. Our method compares the signal we actually see to the change predicted under various contingencies, and ranks the contingencies by how well the prediction matches the observation. The key technical insight in our approach is that standard updating formulas can be combined with lower bounds to rule out most contingencies with a very cheap computation.
In the second part of the talk, I will describe work on model reduction for fast computation of PageRank for graphs in which the edge weights depend on parameters. For an example learning-to-rank application, our approach is nearly five orders of magnitude faster than the standard approach. This speed improvement enables interactive computation of a class of ranking results that previously could only be computed offline.
- Tuesday, March 3, 2015Learning from User Interactions through InterventionsThorsten JoachimsThe ability to learn from user interactions can give systems access to unprecedented amounts of world knowledge. This is already evident in search engines, recommender systems, and electronic commerce, and other applications are likely to follow in the near future (e.g., education, smart homes). More generally, the ability to learn from user interactions promises pathways for solving knowledge-intensive tasks ranging from natural language understanding to autonomous robotics.
Learning from user interactions, however, means learning from data that does not necessarily fit the assumptions of the standard machine learning models. Since interaction data consists of the choices that humans make, it has to be interpreted with respect to how humans make decisions, which is influenced by the decision context and constraints like human motivation and human abilities.
In this talk, I argue that we need learning approaches that explicitly model user-interaction data as the result of human decision making.
To this effect, the talk explores how integrating micro-economic models of human behavior into the learning process leads to new learning algorithms that have provable guarantees under verifiable assumptions and to learning systems that perform robustly in practice. These findings imply that the design space of such human-interactive learning systems encompasses not only the machine learning algorithm itself, but also the design of the interaction under an appropriate model of user behavior.
- Tuesday, March 10, 2015Cancelled
- Tuesday, March 17, 2015XChange: Scalable Dynamic Multi-Resource Allocation in Multicore ArchitecturesJosé MartínezEfficiently allocating shared on-chip resources across cores is critical to optimize execution in chip multiprocessors (CMPs). Techniques proposed in the literature often rely on global, centralized mechanisms that seek to maximize system throughput. However, global optimization may hurt scalability: as more cores are integrated on a die, the search space grows exponentially, making it harder to achieve optimal or even acceptable operating points at run-time without incurring significant overheads.
In this paper, we propose XChange, a novel CMP resource allocation mechanism that delivers scalable high throughput and fairness. Through XChange, the CMP functions as a market, where each shared resource is assigned a price which changes over time, and each core seeks to maximize its own utility, by bidding for these shared resources. Because each core works largely independently, the resource allocation becomes a scalable, mostly distributed decision-making process. In addition, by distributing the resources proportionally to the bids, the system avoids unfairness, treating each core in an unbiased manner.
Our evaluation shows that, using detailed simulations of a 64-core CMP configuration running a variety of multiprogrammed workloads, the proposed XChange mechanism improves system throughput (weighted speedup) by about 21% on average, and fairness (harmonic speedup) by about 24% on average, compared with equal-share on-chip cache and power distribution. On both metrics, that is at least about twice as much improvement over equal-share as a state-of-the-art centralized allocation scheme. Furthermore, our results show that XChange is significantly more scalable than the state-of-the-art centralized allocation scheme we compare against.
- Tuesday, March 24, 2015Games, Learning, and the Price of AnarchyÉva TardosSelfish behavior can often lead to suboptimal outcomes for all participants, a phenomenon illustrated by classical examples in game theory, such as the prisoner's dilemma. In this talk, we'll consider approaches developed over the last decade to quantify the impact of strategic user behavior on overall performance. We will consider traffic routing as well as online auctions from this perspective, and provide robust guarantees for their performance even when the system is not in equilibrium, assuming participants are using learning strategies to deal with an uncertain environment.
- Tuesday, March 31, 2015-- Spring Break --
- Tuesday, April 7, 2015An overview of CS publication venuesAndrew Myers & Faculty PanelA faculty panel giving an overview of where you might consider publishing your results.
- Tuesday, April 14, 2015Why Teaching Computer Science at the Undergraduate Level is a Worthy ChallengeAli Erkan & Walker WhiteAli Erkan of Ithaca College and Walker White of Cornell University tell us about the importance and practice of teaching positions in computer science. This will include a Q&A session, so come prepared with any questions about careers in undergraduate education!
- Tuesday, April 21, 2015A Fast Compiler for NetKATNate FosterHigh-level programming languages play a key role in a growing number of networking platforms. Languages such as nlog and Pyretic are being used in systems such as VMware NVP and SDX to streamline application development and enable formal reasoning about network behavior. But the use of high-level languages comes with a cost: current compilers can take tens of minutes to generate the forwarding state for the network, even on relatively simple programs and small topologies. This forces programmers to waste time working around performance issues or even revert to using hardware-level APIs.
This talk presents a compiler pipeline for the NetKAT programming language that is orders of magnitude faster than previous compilers for high-level network languages. The compiler is based on new algorithms that use a generalization of binary decision diagrams as an intermediate representation and symbolic automata to generate optimized forwarding state. It also handles programs that use network-wide features such as regular paths and virtual topologies. I will describe the design and implementation of three essential compiler stages: from local programs (which specify single-switch behavior) to forwarding tables, from global programs (which specify network-wide behavior) to local programs, and from virtual programs (which specify behavior in terms of virtual topologies) to global programs. I will also discuss our implementation and present results from experiments on real-world benchmarks that quantify performance in terms of compilation time and forwarding table size.
Joint work with Steffen Smolka (Cornell), Spiros Eliopoulos (Inhabited Type), and Arjun Guha (UMass).
- Tuesday, April 28, 2015ENCAPP: elastic-net-based prognosis and biomarker discovery for human cancersHaiyuan YuWith the explosion of genomic data over the last decade, there has been a tremendous amount of effort to understand the molecular basis of cancer using informatics approaches. However, this has proven to be extremely difficult primarily because of the varied etiology and vast genetic heterogeneity of different cancers and even within the same cancer. One particularly challenging problem is to predict prognostic outcome of the disease for different patients. Here, we present ENCAPP, an elastic-net-based approach that combines the reference human protein interactome network with gene expression data to accurately predict prognosis for different human cancers. Our method identifies functional modules that are differentially expressed between patients with good and bad prognosis and uses these to fit a regression model that can be used to predict prognosis for breast, colon, rectal, and ovarian cancers. Using this model, ENCAPP can also identify prognostic biomarkers with a high degree of confidence, which can be used to generate downstream mechanistic and therapeutic insights.
- Tuesday, May 5, 2015Cancelled
- Tuesday, August 25, 2015Machine Learning under Resource ConstraintsKilian Quirin WeinbergerResource constraints during runtime are a crucial aspect of real world applications of machine learning. Depending on the application domain, these constraints can appear in many different forms. For example, in medical applications, the average cost per patient must be kept within budget. In search engines, the search results must be returned to the user within a fraction of a second and the overall CPU cost cannot exceed available computing resources. Finally, on mobile devices the available memory is often highly restricted and small energy consumption can be a crucial requirement. To reduce CPU consumption during test-time, we propose cascades and trees of classifiers that extract features on-demand, carefully trading off expected benefit and extraction cost.
For the scenario with active memory constraints, I present our most recent deep learning architecture, HashedNets, which exploits inherent redundancy in neural networks to achieve drastic reductions in model sizes. HashedNets uses a low-cost hash function to randomly group connection weights into hash buckets, and all connections within the same hash bucket share a single parameter value. Our hashing procedure introduces no additional memory overhead and shrinks the storage requirements of neural networks substantially while mostly preserving generalization performance.
- Tuesday, September 1, 2015Adventures in Computer SecurityThomas RistenpartI work in computer security. In this brown bag lunch I'll introduce myself and my background. I'll then quickly outline the various research topics that I've been working on over the past decade including cryptographic theory, applied cryptography, cloud computing security, privacy, embedded systems security (including playing with quadcopter drones), ML security, passwords, etc. I'll leave plenty of time for Q&A, where we can go deeper on select research topics, and/or about career stuff: what it's like being a junior(ish) professor, being tenacious as a grad student, imposter syndrome, networking in your community, getting prepared for the job market, etc.
- Tuesday, September 8, 2015Cancelled
- Tuesday, September 15, 2015Blockchain: The good, the bad, the composable universe, and everything.Elaine ShiBlockchains represent a new platform for incentive-compatible, distributed computation. I will talk about selected research projects on cryptocurrency and blockchains, and discuss research challenges at the intersection of cryptography, programming languages, and systems.
Additionally, I will announce our new Initiative for Cryptocurrency and Contracts (http://www.initc3.org/).
- Tuesday, September 22, 2015Cancelled
- Monday, September 28, 2015The Ring of Gyges: Using Smart Contracts for CrimeAri JuelsThanks to their anonymity (pseudonymity) and lack of trusted intermediaries, cryptocurrencies such as Bitcoin have created or stimulated growth in many businesses and communities. A number of resulting activities, however, are harmful or criminal, including money laundering, marketplaces for illicit goods, and ransomware.
Emerging next-generation cryptocurrencies such as Ethereum will include rich scripting languages in support of *smart contracts*, programs that autonomously intermediate transactions and can consume authenticated data feeds as inputs. We show how these new cryptocurrency environments will enlarge the range of criminal activities that can be achieved with anonymity and minimal trust assumptions and may thus fuel new criminal ecosystems. Specifically, we show how cryptographically secure and incentive-compatible criminal smart contracts can facilitate leakage of confidential information, theft of cryptographic keys, and various real-world crimes (murder, arson, terrorism).
While some contracts for some of these crimes are efficiently realizable in existing scripting languages, others require cryptographic primitives such as succinct non-interactive arguments of knowledge (SNARKs). Today's cryptocurrencies such as Ethereum can in principle support these primitives, but minimal changes would enable far more efficient implementations. These changes would also benefit benign contracts, and are thus already envisioned by the community.
Joint work with Ahmed Kosba (UMD) and Elaine Shi (Cornell Univ.).
- Tuesday, September 29, 2015Situated Learning and Understanding of Natural LanguageYoav ArtziRobust language understanding systems have the potential to transform how we interact with computers. However, significant challenges in automated reasoning and learning remain to be solved before we achieve this goal. To accurately interpret user utterances, for example when instructing a robot, a system must jointly reason about word meaning, grammatical structure, conversation history and world state. Additionally, to learn without prohibitive data annotation costs, systems must automatically make use of weak interaction cues for autonomous language learning.
In this talk, I will present a framework that uses situated interactions to learn to map sentences to rich, logical meaning representations. The approach jointly induces the structure of a complex natural language grammar and estimates its parameters, while relying on various learning cues, such as easily gathered demonstrations and even raw conversations without any additional annotation effort. It achieves state-of-the-art performance on a number of tasks, including robotic interpretation of navigational directions and learning to understand user utterances in dialog systems. Such an approach, when integrated into complete systems, has the potential to achieve continuous, autonomous learning by participating in actual interactions with users.
- Tuesday, October 6, 2015FellowshipsNate FosterThis hands-on seminar will discuss some basic strategies for preparing applications for competitive fellowships (NSF, Hertz, Microsoft, Facebook, etc.).
- Tuesday, October 13, 2015-- Fall Break --
- Tuesday, October 20, 2015Inverting Human Understanding Models in ContextRoss KnepperRobots behave according to the sense-plan-act loop, in which complex programs react to the content of sensor inputs. The planning process can be thought of as one of inverting various predictive models of physics, uncertainty, and unobservable state. Although robots have long reacted to humans, they largely fail to consider the way their functional actions will be perceived socially by humans, who tend to infer meaning in every action. Consequently, robots inadvertently send a lot of random social signals to humans. In this talk, I describe recent research into how humans perceive robots, how robots can model human understanding, and how robots must invert those models -- along with all the others -- to plan and act effectively on a team with humans.
- Tuesday, October 27, 2015Cloud hosted computing for demanding real-time applications.Ken BirmanMy group is working with a consortium of bulk electric power transmission operators for the Northeastern US on a cloud-based "smart grid" infrastructure. Our long-term goal is to use the cloud to host machine-learning and optimization technologies, but the scale of the problem forces us to think about how one can build a cloud-scale solution secure enough to support a nationally critical resource, strongly consistent, and seamlessly recoverable after disruption. Today we have a platform running: we call it GridCloud, and in this BB talk I'll describe the main technology. My focus will be on the forms of consistency needed in this kind of system, and how we address those needs in GridCloud.
- Tuesday, November 3, 2015Designing and Building Mobile Technologies for Underserved CommunitiesNicola DellThe goal of my research is to design, build, and evaluate novel computing systems that improve the lives of underserved populations in low-income regions. As computing technologies become affordable and accessible to diverse populations across the globe, it is critical that we broaden the scope of our research to study the social, technical, and infrastructural challenges faced by these diverse communities and build systems that address problems in critical domains such as health care and education. In this talk, I describe my general approach to building technologies for underserved communities, including identifying opportunities for technology, conducting formative research to fully understand the space, developing novel technologies, iteratively testing and deploying, evaluating with target populations, and handing off to global development organizations for long-term sustainability. I focus specifically on two examples of systems that I built to address challenges faced by rural health workers: one that automatically digitizes data from paper forms, and another that automatically interprets diagnostic tests for infectious diseases. Both these systems run on cheap, commercially available mobile devices and use computer vision and machine-learning techniques to automate tasks that were previously tedious or error prone. Through extensive evaluations with target populations in Sub-Saharan Africa, I highlight the potential for novel technological solutions to help new and diverse populations address global challenges.
Bio:
Nicola Dell is an Assistant Professor of Information Science at Cornell Tech in New York City. Her research interests are in information and communication technologies for development (ICTD), human-computer interaction (HCI), and mobile computing with a focus on designing and evaluating systems that improve the lives of underserved populations in low-income regions. Nicki recently completed her Ph.D. in Computer Science and Engineering at the University of Washington in Seattle where she was advised by Gaetano Borriello and Linda Shapiro. At UW CSE she was a member of the Open Data Kit (ODK) research team and she also helped to organize the Change group from 2011-2015.
- Tuesday, November 10, 2015How to Give a TalkDavid BindelResearchers give talks to share their knowledge and excitement with colleagues, students, funding agencies, and potential employers. In this talk, I share some broad ideas about what makes a talk "successful", along with more detailed thoughts about the logistics of giving a talk, including thinking about speech and body language, slide designs, presentation technology, and the fine art of answering questions.
- Tuesday, November 17, 2015Department Town HallMembers of the department leadership join us to discuss issues relevant to graduate students and the department.
- Tuesday, November 24, 2015Cancelled
- Tuesday, December 1, 2015Research RoundsKavita Bala & Ken Birman & Hadas Kress-Gazit & Andrew MyersN/A
- Tuesday, February 2, 2016Time management and procrastinationMichael ChenMichael Chen joins us from the Learning Strategies Center at Cornell. Please note the change in starting time.
- Tuesday, February 9, 2016Cancelled
- Tuesday, February 16, 2016-- February Break --
- Tuesday, February 23, 2016The Stealth Challenges of Teaching Computer Science at the Undergraduate LevelAli Erkan & Kyle WilsonWhen interest in Computer Science is on the rise and key courses of the major are over-enrolled, our motivation to explore new and more effective ways of teaching can understandably be diminished. On the contrary, this is in fact the best time for such explorations because we have a chance to work with (and teach) very heterogeneous audiences. In this talk, we will outline some of the associated challenges regarding undergraduate CS education and we will report on one inspirational lesson from the field of Physics. We will also present a few classroom-tested techniques that increase the accessibility of complex ideas so that the fundamental ideas of our discipline disseminate to larger and more diverse groups of learners.
- Tuesday, March 1, 2016Fibers and the appearance of materialsSteve MarschnerMany beautiful materials derive their appearance and mechanical properties from a microstructure composed of fibers. This is true of hair, fur, cloth, and even wood. This talk will discuss recent research on modeling, simulation, and rendering that applies to these materials, which shares the theme of modeling and measuring structure to get the right appearance. In hair, the key problem is to model the complex patterns formed by light interacting with individual transparent fibers, which are caused by the small-scale structure of the fibers and their surfaces. Smaller fibers packed together into a solid give rise to the varied appearance of wood, which reflects the three-dimensional anatomy of trees, leading to distinctive grain and figure on cut and polished surfaces. The final topic is a new technique for realistically rendering textiles using volume data originating from micro CT scans of small samples of cloth.
- Tuesday, March 8, 2016Cancelled
- Tuesday, March 15, 2016-- Visit Day --
- Tuesday, March 22, 2016TBA
- Tuesday, March 29, 2016-- Spring Break --
- Tuesday, April 5, 2016Networked Human ComputationHaym HirshOnline crowdsourcing resources such as Amazon Mechanical Turk have made it possible to write programs that call on human labor as if they were subroutines, with people performing tasks that humans are better than computers at performing. Just as we might write better algorithms for parallel computers by being aware of the underlying parallel architecture, we can write better "human computation" algorithms by being aware of relevant aspects of the human cognitive architecture. I'll discuss examples of how results in cognitive and social psychology are informing the design of human computation algorithms, and how we can nonetheless bring computer science sensibilities to the design of such systems. This is an emerging area with numerous unexplored questions, and I'll discuss a number of promising open directions for research in this area.
- Tuesday, April 12, 2016Cancelled
- Tuesday, April 19, 2016PyMTL and Pydgin: Python Frameworks for Highly Productive Computer Architecture ResearchChristopher BattenHardware specialization is an increasingly common technique to enable improved performance and energy efficiency in spite of the diminished benefits of technology scaling. Exploring hardware specialization requires a vertically integrated research approach spanning applications, compilers, run-times, instruction set design, microarchitectures, and VLSI implementation. In this talk, I will describe PyMTL and Pydgin, two new Python-based frameworks designed to improve the productivity of vertically integrated computer architecture research. PyMTL is a hardware modeling framework for vertically integrated computer architecture research. The PyMTL framework encourages a philosophy of "modeling towards layout" in which a microarchitecture is incrementally refined from a high-level functional-level model, to a timing-approximate cycle-level model, to a bit-accurate RTL implementation. PyMTL is particularly well-suited for rapid design space exploration of microarchitectures for novel accelerators, specialized coprocessors, or any design proposal that could benefit from the additional credibility provided by an RTL implementation. Pydgin is a framework for rapidly developing instruction-set simulators (ISSs) from a Python-based architecture description language. Pydgin creatively adapts existing meta-tracing JIT compilation frameworks designed for general-purpose dynamic programming languages to automatically generate ISSs augmented with dynamic binary translation. Pydgin is suitable for generating very fast ISSs for general-purpose instruction sets, but is particularly well-suited for exploring the hardware/software abstraction of emerging specialized architectures.
Bio: Christopher Batten is an Assistant Professor in the School of Electrical and Computer Engineering at Cornell University, where he leads a research group focusing on energy-efficient parallel computer architecture for both high-performance and embedded applications. His work has been recognized with several awards including an AFOSR Young Investigator Program award (2015), Intel Early Career Faculty Honor Program award (2013), an NSF CAREER award (2012), a DARPA Young Faculty Award (2012), and an IEEE Micro Top Picks selection (2004). His teaching has been recognized with a Michael Tien '72 Excellence in Teaching Award (2013) and a James M. and Marsha D. McCormick Award for Outstanding Advising of First-Year Engineering Students (2013). Prior to his appointment at Cornell, Batten received his Ph.D. in electrical engineering and computer science from the Massachusetts Institute of Technology in 2010. He received an M.Phil. in engineering as a Churchill Scholar at the University of Cambridge in 2000, and received a B.S. in electrical engineering as a Jefferson Scholar at the University of Virginia in 1999.
- Tuesday, April 26, 2016Getting the Most Out of Academic ConferencesThomas RistenpartAcademic conferences in computer science play a central role in your career in research. In this talk I'll relay some ideas about how to get the most out of conferences, including giving talks, networking, and learning to navigate your chosen research community.
- Tuesday, May 3, 2016A general way to diagnose type errors in expressive type systemsAndrew MyersRich type systems promise to improve software reliability and security. But type checkers often give terrible error messages, and the more sophisticated the type system, the worse the problem. We show that in a variety of languages, including the highly expressive type system implemented by the Glasgow Haskell Compiler (GHC)--with type classes, GADTs, and type families--it is possible to identify the _most likely source_ of the type error, rather than the _first source_ that type inference trips over. To determine the likely error sources, we apply a simple Bayesian model to a graph representation of the typing constraints; the satisfiability or unsatisfiability of paths within the graph provides evidence for or against possible explanations. Using a large corpus of Jif, OCaml, and Haskell programs, we show that this error localization technique is general and practical and significantly improves accuracy over the state of the art.
- Tuesday, May 10, 2016Machine Learning and Privacy: Friends or Foes?Vitaly ShmatikovMachine learning is eating the world. Modern machine learning methods, especially deep learning based on artificial neural networks, rely on the training data collected from millions of users to achieve unprecedented accuracy and enable powerful AI-based services.
In this talk, I will discuss the complex relationship between machine learning and digital privacy. This includes new threats, such as adversarial use of machine learning to recover hidden user data, and new benefits, such as privacy-preserving machine learning that protects the confidentiality of training data while constructing accurate models.
- Tuesday, August 23, 2016Context and Non-compositional Phenomena in Language UnderstandingYoav ArtziSentence meaning can be recovered by composing the meaning of words following the syntactic structure. However, robust understanding requires considering non-compositional and contextual cues as well. For example, a robot following instructions must consider its observations to accurately complete its task. Similarly, to correctly map temporal expressions within a document to standard time values, a system must consider previously mentioned events. In this talk, I will address such phenomena within compositional approaches, and focus on the non-compositional parts of the reasoning process. I will also review some of our ongoing research in this space.
- Tuesday, August 30, 2016Language and Social DynamicsCristian Danescu-Niculescu-MizilMore and more of life is now manifested online, and many of the digital traces that are left by human activity are in natural-language format. In this talk I will show how exploiting these resources under a computational framework can bring a new understanding of online social dynamics; I will be discussing three of my efforts in this direction.
The first project explores the relation between users and their community, as revealed by patterns of linguistic change. I will show that users follow a determined life-cycle with respect to their susceptibility to adopt new community norms, and how this insight can be harnessed to predict how long a user will stay active in the community.
The second project proposes a computational framework for identifying and characterizing politeness, a central force shaping our communication behavior. I will show how this framework can be used to study the social aspects of politeness, revealing new interactions with social status and community membership.
I will conclude by showing that conversational patterns can be predictive of the future evolution of a dyadic relationship. In particular, I will characterize friendships that are unlikely to last and examine temporal patterns that foretell betrayal in the context of the Diplomacy strategy game.
This talk includes joint work with Jordan Boyd-Graber, Dan Jurafsky, Srijan Kumar, Jure Leskovec, Vlad Niculae, Christopher Potts, Moritz Sudhof and Robert West.
- Tuesday, September 6, 2016Why are today's computing systems so far away from their fundamental performance limits?Rachit AgarwalFundamental performance limits of a computing system can be characterized using lower bounds on algorithms running atop these systems, and the physical limits of the underlying hardware. This talk will explore two questions that make my neurons fire relentlessly: Why is the performance of today's computing systems so far away (typically 10x, and often more than 100x) from their fundamental limits? How can we bridge this gap?
Answering the above questions often requires understanding complex interactions between various components of a computing system --- algorithms, software, and hardware (memory, CPU, networks). I will discuss my research in these areas (and at their boundaries) that aims to resolve the above two questions.
- Tuesday, September 13, 2016Fellowship Application WorkshopRoss KnepperI will describe why PhD students may want to apply to competitive fellowships and give advice on how to maximize the chances of having a successful application. I will also demonstrate how the skills developed in writing fellowship applications will be important throughout your time in grad school and beyond.
Recommended reading before the seminar: "Good Writing" by Marc Raibert -- https://www.cs.cmu.edu/~pausch/Randy/Randy/raibert.htm
- Tuesday, September 20, 2016Query optimization for data analysisImmanuel TrummerBusiness and industry and almost all scientific disciplines rely nowadays on large-scale data analysis. Some of the most popular analysis tools offer declarative interfaces where users simply describe the data they need instead of specifying how to generate it. In order to enable such interfaces, we must however solve the NP-hard query optimization problem.
In this talk, I will give an overview of my recent work on query optimization. I will explain how the specific context of large-scale data analysis motivates novel problem variants that are particularly hard to solve. Then I will show how techniques such as approximation, parallelization, and pre-processing make solving those problems under real-time constraints practical. I will quickly cover recent results of a collaboration with NASA in which we used a D-Wave 2X adiabatic quantum computer for solving certain query optimization variants. Finally, I will discuss a project on large-scale text mining and machine learning at Google Mountain View that could benefit from all proposed techniques.
At the end of the talk, I will give an overview of future and ongoing projects in the database group.
Biography: Immanuel Trummer is an assistant professor of computer science at Cornell University. His research led to various publications at the main database conferences; his publications were selected for the ACM SIGMOD Research Highlight Award, for "Best of VLDB", and for publication in CACM as CACM Research Highlight. He is a recipient of the European Google PhD Fellowship in structured data analysis and an alumnus of the German National Academic Foundation ("Studienstiftung des deutschen Volkes").
- Tuesday, September 27, 2016Developing Robots for Fluent Collaboration and CompanionshipGuy HoffmanWithin the next decade, personal robots are expected to enter our homes, offices, schools, hospitals, construction sites, and workshops. For these robots to play a successful role in people's professional and personal lives, they need to display the kind of efficient and satisfying interaction that humans are accustomed to from each other. Developing this human-robot interaction is a multifaceted challenge, balancing requirements of the robot's intelligent behavior, physical form, and mechanical structure.
In this talk I present the development of several robotic systems, combining methods from Artificial Intelligence, Design, and Human-Computer Interaction. All three research paths share the same underlying principles: Movement, timing, and embodiment. In terms of AI, I introduce the notion of human-robot fluency - the ability to accurately mesh the robot's activity with that of a human partner. I present computational cognitive architectures rooted in timing, joint action, and embodied cognition. Specifically, I discuss anticipatory action for collaboration, and a model of priming through perceptual simulation. I then describe an interactive robotic improvisation system that uses embodied gestures for simultaneous, yet responsive, joint musicianship.
In terms of the robot's physical form, I use techniques from 3D character animation, sculpture, industrial, and interaction design. Dynamic gestures and behaviors drive decisions on the robot's surface and mechanical design, and are then combined with aesthetic and functional requirements to settle on the robot's form and structure. The third pillar of my work is the experimental study of people interacting with robots. My lab developed a series of low-cost smartphone-based robots, which we use in situations of disclosure, conflict, compliance, and joint experiences. Our studies investigate the role of movement, timing, and nonverbal behavior in the social relationship between humans and robots, in an effort to design robots that don't threaten, but enhance people's everyday lives.
- Tuesday, October 4, 2016Building Verifiably Secure Multi-Core Processors with Applications to High-Assurance Self-Driving CarsEd SuhThis talk will discuss how static information flow analysis can be used to design a computing system with comprehensive and verifiable information flow assurance, and how such a system may be leveraged in the context of a self-driving car to protect safety-critical functions. In particular, the talk will focus on designing verifiably secure multi-core hardware and introduce an extension to today's hardware design language, named SecVerilog. SecVerilog enables designers to statically analyze information flow at the hardware level. Our prototyping experiences show that SecVerilog can be used to formally verify traditional software isolation properties in access control architectures such as ARM TrustZone. Combined with careful redesigning of multi-core architecture, SecVerilog also enables strong timing isolation necessary to guarantee real-time deadlines. The verified hardware can then be leveraged by software to provide strong isolation guarantees for safety-critical components.
The talk will briefly discuss how we are applying this protection approach to provide collision avoidance guarantees for Cornell's autonomous driving vehicle.
- Tuesday, October 11, 2016-- Fall Break --
- Tuesday, October 18, 2016Programming Intelligent AssistantsAdrian SampsonIntelligent user interfaces are taking over the world. Chatbots, voice assistants, recommendation systems, and proactive suggestion generators all exploit machine learning to guess the user's intent. But implementing an intelligent UI exposes programmers to new kinds of pitfalls and bugs that do not exist in traditional software engineering.
This talk is about an early-stage research project that addresses the problems in intelligent system design using programming language abstractions. It's a collaboration between NLP, PL, and systems researchers. We've designed a language embedded in JavaScript that lets domain experts apply machine learning without becoming ML experts themselves. The language's core concept is ambiguity: the programmer specifies the space of resolutions for the ambiguity in user input, and ML algorithms search that space.
We've demonstrated the programming language by showing how it simplifies the construction of tasks for natural-language chatbots. - Tuesday, October 25, 2016Rethinking Internet-Scale ConsensusElaine ShiTraditionally, consensus protocols were deployed in controlled environments: for example, a company like Google may deploy consensus protocols to replicate critical services such as Google Wallet. The deployment is typically small scale, and nodes are inter-connected with fast internal networks.
New cryptocurrencies such as Bitcoin and Ethereum have pushed the deployment of consensus protocols to new heights. The community's common wisdom is that classical consensus protocols are *not* "robust" enough for Internet-scale deployment, although few have clearly articulated exactly what "robustness" means.
In this talk, we will explore what robustness means for Internet-scale consensus, and how to design more robust consensus protocols for these Internet-scale applications. - Tuesday, November 1, 2016Fast Fingerprints for Power System EventsDavid BindelIn order to operate the power grid aggressively enough to make full use of renewable power, operators need new tools for situational awareness and control. Phasor measurement units (PMUs) have been developed for the past thirty years, but first saw wide-scale production-grade deployments in the US after DOE investments funded by the American Recovery and Reinvestment Act of 2009. PMUs report voltage and current phasors thirty or more times every second, promising operators a real-time picture of the state of the grid -- but only with systems and algorithms that transmit the data and analyze this information at similar rates. In this talk, we describe fast analysis using PMU-sensed "fingerprints" of different types of system events (e.g. changes in line status or reconfiguration of substations). Our system, FLiER (Fingerprint Linear Estimation Routine), identifies system changes in close to real time through a novel filtering operation that lets us discard most potential events from consideration with little computation. We describe the elements of our approach and give an overview of work in progress to improve the quality of our results (and the range of contingencies we can handle) by monitoring the frequency content of transient "ringing" as the system passes from one state to another.
- Tuesday, November 8, 2016Writing a Research StatementBobby Kleinberg
- Tuesday, November 15, 2016Verbal behavior without syntactic structures: beyond Skinner and ChomskyShimon EdelmanWhat does it mean to know language? Since the Chomskian revolution, the textbook answer to this question has been: to possess a generative grammar that exclusively licenses certain syntactic structures. Decades later, not even an approximation to such a grammar, for any language, has been formulated; the idea that grammar is universal and innately specified has proved barren; and attempts to show how it could be learned from experience invariably come up short. To move on from this impasse, we must rediscover the extent to which language is like any other human behavior: dynamic, social, multimodal, patterned, and purposive, its purpose being to promote desirable actions (or thoughts) in others and self. Recent psychological, computational, neurobiological, and evolutionary insights into the shaping and structure of behavior may then point us toward a new, viable account of language.
- Tuesday, November 22, 2016Changes to Evaluation Criteria for Faculty Hiring and PromotionFred SchneiderThe Computing Research Association (CRA) comprises all Computer Science Ph.D.-granting departments in North America and many industrial labs. CRA provides a forum for discussions about the health of our field, and it represents computing research to Congress and the executive branch.
In February 2015, CRA issued a "Best Practices Memo" that advocates for changes to the way that publications are evaluated for hiring and promotion. The 3-page memo is the product of a blue-ribbon committee that met and deliberated over a two-year period. We discuss what the memo says, give a historical context, and speculate about the future. The memo is available on-line, and you'll want to look it over before the meeting:
http://cra.org/wp-content/uploads/2016/02/BP_Memo.pdf - Tuesday, November 29, 2016Construction by Robot CollectivesKirstin H. PetersenIn robot collectives, interactions between large numbers of simple agents lead to complex global behaviors. A great source of inspiration is social insects, where thousands of individuals coordinate to handle advanced tasks like nest construction in a remarkably scalable and error-tolerant manner. Likewise, robot collectives have the ability to address tasks beyond the reach of single robots, and promise more efficient parallel operation and greater robustness due to redundancy. Key challenges involve both control and physical implementation. In this seminar I will discuss an approach to such systems relying on embodied intelligent robots designed as an integral part of their environment, where passive mechanical features replace the need for complicated sensors and control. I will discuss three systems: the first can assemble three-dimensional structures according to user-specified shapes; the others build structures according to user-specified functionality. This work advances the aim of robot collectives that achieve human-specified goals, using biologically-inspired principles for robustness and scalability.
- Monday, December 5, 2016
- Tuesday, January 31, 2017TBA
- Tuesday, February 7, 2017Learning with Big Messy DataMadeleine UdellModern data sets are often big and messy: they may feature a mixture of real, boolean, ordinal, and nominal values; and often many (or even most) of the values of interest are missing.
My research centers on exploiting structure in big messy data sets to infer missing data, detect patterns, speed up optimization, and promote better decisions.
As a case study, this talk will introduce Generalized Low Rank Models (GLRMs), a class of optimization problems designed to uncover structure in big messy data sets.
These models generalize many well known techniques in data analysis, such as (standard or robust) PCA, nonnegative matrix factorization, matrix completion, and k-means.
We'll discuss using GLRMs to impute missing values, to design recommender systems, and to perform dimensionality reduction, all in a setting with heterogeneous and missing data.
The resulting optimization problems often have millions or even billions of parameters.
We'll discuss efficient optimization techniques for these problems, and will conclude with a discussion of outstanding challenges and open problems in this area. - Tuesday, February 14, 2017Optimizing Shared Vehicle Systems (or how I learnt to stop worrying and love surge pricing)Sid Banerjee(UNUSUAL SCHEDULED TIME) Shared vehicle systems, such as those for bike-sharing (e.g., Citi Bike in NYC, Velib in Paris), car-sharing (e.g., car2go, Zipcar) and ride-sharing (Uber, Lyft, etc.) are fast becoming essential components of city life. The technology behind these platforms enables fine-grained monitoring and control tools, including good demand forecasts, accurate vehicle-availability information, and the ability to do dynamic pricing and vehicle repositioning. However, with great technology comes great complexity, and as a result optimizing the operations of such systems is challenging. I will talk about some of my work in designing 'primetime pricing' at Lyft.
- Tuesday, February 21, 2017-- February Break --
- Tuesday, February 28, 2017Panel on how to get a faculty job -- Part I: Before You ApplyAdrian Sampson & Rachit AgarwalThe speakers each share their experience on effective practices in the years leading up to applying for a faculty job. Most of the session will be a discussion/Q&A format. Students are encouraged to come prepared with questions. Topics in Part I include building your portfolio, networking, giving strategic talks, and what your advisor should be doing to help. This Brown Bag seminar is targeted at Ph.D. students of all years, though the later parts of this sequence will focus on the 3rd and 4th years.
- Tuesday, March 7, 2017TBA
- Tuesday, March 14, 2017Cancelled
- Tuesday, March 21, 2017-- Visit Day --
- Tuesday, March 28, 2017How do we build models that learn?Andrew WilsonTo answer scientific questions, and reason about data, we must build models and perform inference within those models. But how should we approach model construction and inference to make the most successful predictions? How do we represent uncertainty and prior knowledge? How flexible should our models be? Should we use a single model, or multiple different models? Should we follow a different procedure depending on how much data are available?
In this talk I will present a philosophy for model construction, grounded in probability theory. I will exemplify this approach for human learning, scalable kernel learning, and deep learning. - Tuesday, April 4, 2017-- Spring Break --
- Tuesday, April 11, 2017Reflections on ReplicationRobbert van RenesseReplication remains a hot topic in distributed systems. Within managed settings such as data centers there is a renewed interest in replication protocols for the fail-stop model. In settings with multiple administrative domains, there is renewed interest in Byzantine replication protocols. I will touch on various related projects I have been involved in recently.
- Tuesday, April 18, 2017Panel on How to Get a Faculty Job -- Part II: the Application ProcessRoss Knepper & Adrian Sampson & Rachit AgarwalThe speakers each share their experience on effective practices in the months of the faculty job application process. Most of the session will be a discussion/Q&A format. Students are encouraged to come prepared with questions. Topics in Part II include deciding where to apply, how to write effective research and teaching statements, what to put in a cover letter, networking, giving strategic talks, and what your advisor should be doing to help. This Brown Bag seminar is targeted at PhD students in their 3rd through 5th years, although everyone is welcome.
- Tuesday, April 25, 2017TBA
- Tuesday, May 2, 2017I Know What You Did Last Summer... In the CloudChristina DelimitrouCloud providers routinely schedule multiple applications per physical host to increase efficiency. The resulting interference on shared resources often leads to performance degradation and, more importantly, security vulnerabilities. Interference can leak important information ranging from a service's placement to confidential data, like private keys.
In this talk I will discuss Bolt, a practical runtime system that accurately detects the type and characteristics of applications sharing a cloud platform based on the interference the adversary sees in shared resources. In a multi-user study on EC2, Bolt correctly identifies the characteristics of 385 out of 436 diverse workloads. Extracting this information enables a wide spectrum of previously-impractical cloud attacks, including denial of service attacks (DoS) that increase tail latency by 140x, as well as resource freeing (RFA) and co-residency attacks. Finally, I will discuss the role advanced isolation mechanisms can play in countering such attacks, and I will show that while helpful, they are insufficient to completely eliminate them. - Tuesday, May 9, 2017TBA
- Tuesday, August 22, 2017Situated Language Understanding with Visual ObservationsYoav ArtziAn agent following instructions requires a robust understanding of language and its environment. This talk will be divided into two parts. In the first part, I will propose a neural network model for mapping instructions to actions. The model jointly reasons about instructions and raw visual input obtained from a camera sensor. Training uses reinforcement learning in a few-samples regime with reward shaping to exploit training data. This approach does not require intermediate representations, planning procedures, or training different models for visual and language reasoning. In the second part, I will present a new visual reasoning language dataset, containing natural statements grounded in synthetic images. The data demonstrates a broad set of linguistic phenomena, requiring visual and set-theoretic reasoning. The data contains 92K examples and demonstrates a challenging task for state-of-the-art methods.
The research presented in the first part is led by Dipendra Misra, and the research in the second part is led by Alane Suhr. - Tuesday, August 29, 2017Sparse Representations and Fast Algorithms in Computational Quantum ChemistryAnil DamleIn this talk we explore the question of how to build localized basis functions for a subspace arising in Kohn-Sham Density Functional Theory (KSDFT). This includes a brief introduction to KSDFT and discussion of the computational benefits of working with localized orbitals. Our methodology provides a simple, robust, and efficient means for constructing localized basis functions based on a column-pivoted QR factorization (QRCP). Importantly, our methods avoid the use of an optimization procedure and hence have no dependence on an initial guess for the localized basis. Finally, we discuss recently developed algorithms that significantly accelerate the method by avoiding explicit computation of a large QRCP.
- Tuesday, September 5, 2017Building Smart Memories and Cloud Services with DerechoKen BirmanThe Derecho platform was created to support a new generation of Internet-of-Things applications with online machine-learning components. At cloud-scale, such applications require us to build smart memory systems. I’m using this term to refer to a customizable service designed to accept high-bandwidth data pipelines from sources, able to apply machine-learning tools to analyze and understand received content, and offering ways to query the resulting knowledge base with minimal delay. Such services would also need to scale out, yet must maintain their rapid responsiveness and strong consistency.
Derecho, which is now fully implemented (github.org/Derecho-Project), leverages persistent memory and RDMA to solve this problem with exceptional performance and scalability. Derecho is also interesting from a theoretical perspective. In particular, the core protocols used implement Paxos state machine replication in a novel manner optimized for RDMA settings. These protocols have been proved correct, and are also optimal in terms of delay before message delivery, progress during failures and even the mapping to RDMA hardware.
Derecho is an ongoing activity (and there are many open questions that could be explored if additional students become involved). The current version was built primarily by Sagar Jha, Jonathan Behrens, Matt Milano, Weijia Song and Edward Tremel. On the faculty side, the main people have been myself and Robbert van Renesse. - Tuesday, September 12, 2017Fellowship Application WorkshopRoss KnepperI will describe why PhD students may want to apply to competitive fellowships and give advice on how to maximize the chances of having a successful application. I will also demonstrate how the skills developed in writing fellowship applications will be important throughout your time in grad school and beyond.
Recommended reading before the seminar: "Good Writing" by Marc Raibert -- https://www.cs.cmu.edu/~pausch/Randy/Randy/raibert.htm - Tuesday, September 19, 2017Deep Learning on a Diet: Reducing the Supervision Required for Visual RecognitionBharath HariharanOver the past half-decade, computer vision researchers have found an extremely effective way of building visual recognition systems: pre-train a convolutional network on a massive labeled dataset such as ImageNet, and then fine-tune this convolutional network on the visual recognition task of interest using another massive labeled dataset. Both stages of this two-step pipeline require hundreds of thousands, if not millions, of labeled images.
This effectively puts visual recognition out of the reach of anyone except Google, Facebook and their ilk. It also is in stark contrast to the ability of humans to build their vision systems with essentially no labels at all.
How can we get computers to be as label-efficient as humans? How can we get computers to build visual representations from unlabeled data? How can we get computers to generalize to new tasks or visual concepts without annotations? This talk will provide some answers to these questions, and some directions for future research. - Tuesday, September 26, 2017Verifying Network Data PlanesNate FosterP4 is a new language for programming network data planes. The language provides domain-specific constructs for describing the input-output formats and functionality of packet-processing pipelines. Unfortunately P4 programs can go wrong in a variety of interesting and frustrating ways including reading uninitialized data, generating malformed packets, and failing to handle exceptions. In this talk, I will present the design and implementation of p4v, a tool for verifying P4 programs. The tool is based on classic software verification techniques (due to Hoare, Dijkstra, Flanagan, Leino, etc.), but adds several important innovations: a novel mechanism for incorporating control-plane assumptions and domain-specific optimizations, both of which are needed to scale up to large programs. I will discuss our experiences applying p4v to a variety of real-world programs including switch.p4, a large program that implements the functionality of a conventional switch.
p4v is joint work with Bill Hallahan (Yale), JK Lee (Barefoot), Cole Schlesinger (Barefoot), Steffen Smolka (Cornell), Robert Soule (Barefoot and USI), and Han Wang (Barefoot). - Tuesday, October 3, 2017Algorithms for Multirobot and Human-robot InteractionRoss KnepperWhen building distributed approaches for multirobot systems, designers typically take control of most aspects of the system (communication methodologies, protocols, algorithms). In contrast, when designing multirobot systems to work together with humans, we are forced to adopt the standards implicitly defined by human social norms. In this talk, I illustrate how to build human-inspired distributed multirobot algorithms through the example of pedestrian social navigation.
Social navigation requires a robot to exhibit many strengths, from perceiving the intentions of others through social signals to acting clearly to convey intent. It is made more difficult by the presence of many individual people with their own agendas as well as by the fact that all communication and coordination occurs implicitly through social signaling (chiefly gross body motion, eye gaze, and body language). Furthermore, much of the information people glean about one another's intentions is derived from the social context. For example, office workers are more likely to be heading towards the cafeteria if it is lunchtime and towards the exit if it is time to go home.
In addition to exploring some of the mathematical tools that allow us to tease apart the problem of social navigation, I also briefly touch on several other new projects that have potential openings in my lab. These topics include persistent autonomy (how to make robots act with self-sufficiency for long periods of time), and collaborative assembly (how can groups of people and robots work together to build structures). - Tuesday, October 10, 2017-- Fall Break --
- Tuesday, October 17, 2017Accelerating Machine Learning with Fast Stochastic AlgorithmsChristopher De SaAs machine learning applications become larger and more widely used, there is an increasing need for efficient systems solutions. The performance of essentially all machine learning applications is limited by bottlenecks, such as parallelizability and memory bandwidth, with effects that cut across traditional layers in the software stack. The key property that helps us address these bottlenecks is the fact that machine learning problems are statistical and thus have some built-in error tolerance: this gives us additional degrees of freedom that we can use when designing and optimizing machine learning algorithms. To use these extra degrees of freedom effectively, we need to develop techniques that can leverage noise-tolerance to increase the throughput of our systems, while provably having little effect on their accuracy.
In practice, there is a broad class of algorithms, stochastic iterative algorithms, that often determine the performance of machine learning systems. In this talk, I will describe several methods that can be applied to speed up stochastic iterative algorithms in a principled way by using high-level structural information about a problem. I will also discuss future research directions, including a new approach to getting highly accurate solutions while mostly using energy-efficient low-precision computation. - Tuesday, October 24, 2017SE(3) Research Group OverviewSerge BelongieI'll be giving a sampling of my group's research, including fine grained visual categorization, perceptual embedding, and mixed reality.
- Tuesday, October 31, 2017Let's Fix OpenGLAdrian SampsonFrom windowing systems to virtual reality, real-time graphics code is ubiquitous. Programming models for constructing graphics software, however, have largely escaped the attention of programming languages researchers. This talk introduces the programming model of OpenGL, a ubiquitous API for real-time graphics applications, through a language-oriented lens (no prior graphics knowledge is assumed). It highlights six broad problems with the programming model and connects them to traditions in PL research. The issues range from classic pitfalls, where established thinking can apply, to new open problems, where novel research is needed. Finally, I will introduce an experimental programming language based on multi-stage programming that can address some of the problems with traditional APIs.
- Tuesday, November 7, 2017Database Group Research OverviewImmanuel TrummerIn this talk, I will give a broad overview of recent and ongoing work in my group:
Data Vocalization: most prior research on how to optimally present data to users focuses on data visualization. The communication between user and computer is however more and more shifting towards voice-based interfaces, evidenced by devices and services such as Google Home, Amazon Echo, or Apple's Siri. This motivates research on "data vocalization", i.e. how to optimally transmit data via voice output. I will describe results from a recent publication, in which we introduce the problem field of data vocalization, as well as several ongoing projects.
Query Optimization: the goal of query optimization is to translate declarative queries into optimal executable query plans. Query optimization is an NP-hard optimization problem which makes it difficult to find optimal plans for large queries. I will give an overview of our recent results on leveraging integer programming solvers to solve query optimization instances within seconds where traditional optimizers would need weeks of optimization time. I will also describe ongoing work on leveraging reinforcement learning to replace traditional cost models in query optimization.
Fact Checking: relational data sets are often published together with text articles, summarizing key statistics. The majority of the population never accesses raw data but relies on summaries alone. This raises the question of how we can trust such summaries to be accurate. I will describe our ongoing work on the "FactChecker", a novel tool, similar in spirit to a spell checker, that supports authors in creating accurate data summaries. The FactChecker translates textual claims into SQL queries and displays, via markup, whether evaluation results match the values claimed in text. Using the current version, we were already able to identify erroneous claims in articles from several major newspapers. - Tuesday, November 14, 2017Machine Learning and Privacy: Friends or Foes?Vitaly ShmatikovMachine learning is setting the world on fire, but what does this imply for the privacy of the data used to train all these amazing ML models? I will talk about the surprising connections between ML and data privacy, including how to steal sensitive data from trained ML models and what it might mean for ML to "preserve" privacy.
- Tuesday, November 21, 2017Five New Programming Models for Trustworthy ComputingAndrew MyersComputing systems keep growing bigger, more complex, and more critical. Modern applications are geo-distributed and concurrent and are built on complex distributed and cryptographic protocols. Increasingly they are integrated across organizational boundaries and with blockchain. Building these applications keeps getting harder, but programmers aren't becoming any smarter. Something has to give.
Our philosophy is that developers should have a higher-level programming model, and the compiler and run-time system should figure out how to map high-level programs onto the available resources for computing and storage, as automatically as possible. Here are some of the challenging problems that arise:
* Simple, high-performance abstractions for stateful distributed programming
We are exploring new abstractions for transactional programming that offer strong atomicity and isolation guarantees but don't pay the huge latency penalties of traditional transactions. The core insight is to only enforce the consistency applications really need, which the programming model can expose.
* Compiling to advanced crypto primitives
Security-typed program code appears to provide the information needed to automatically generate various cryptographic primitives in the back end of the compiler. These include commitment, zero-knowledge proofs, multiparty computation, and homomorphic encryption. Programming gets easier and programmers don't have to worry about making mistakes using crypto.
* A new language for blockchain programming
Blockchain systems are fully decentralized and involve interaction of code - "smart contracts" - and data from distrusting players. Existing programming models don't help us build smart contracts correctly or figure out how to interface them to off-chain applications. We've used security type systems to build secure decentralized code before, but smart contracts create new challenges.
* Secure hardware
When security of the whole stack is the goal, hardware matters too. We are developing new secure processors using new hardware description languages that let designers verify lack of timing channels and other vulnerabilities. A new challenge is how to design operating systems that virtualize these strong hardware-level protection mechanisms.
* Secure cyberphysical systems
Messy, complex, and dangerously buggy software is controlling vehicles and other safety-critical systems. We are exploring new programming models for these systems based on controlling information flow at the software and hardware levels. We have two robots running our code already! - Tuesday, November 28, 2017Department Town Hall
- Tuesday, January 30, 2018Cancelled
- Tuesday, February 6, 2018Borel Coalgebras and Non-wellfounded LogicDexter KozenI will introduce Borel coalgebras and Borel automata as a computational approach to basic descriptive set theory. We show that over any Polish space, Borel automata accept exactly the coanalytic sets, and total Borel automata (those that halt on all inputs) accept exactly the Borel sets. The latter result is a computational version of the Kleene--Suslin theorem. The ordinal rank of a Borel set is characterized as the running time of a Borel automaton. We show how these ideas lead to a general notion of non-wellfounded logic in which syntactic objects such as terms and formulas are elements of a final coalgebra. We relate these notions to the categorical theory of recursion schemes (Adamek, Milius, and Velebil 2006, Milius and Moss 2006) to provide a foundation for non-wellfounded logic.
- Tuesday, February 13, 2018Cancelled
- Tuesday, February 20, 2018-- February Break --
- Tuesday, February 27, 2018Cancelled
- Tuesday, March 6, 2018How Do We Build Models that Learn and Generalize?Andrew WilsonTo answer scientific questions, and reason about data, we must build models and perform inference within those models. But how should we approach model construction and inference to make the most successful predictions? How do we represent uncertainty and prior knowledge? How flexible should our models be? Should we use a single model, or multiple different models? Should we follow a different procedure depending on how much data are available?
In this talk I will present a philosophy for model construction, grounded in probability theory. I will then discuss recent work in my group that exemplifies this philosophy: (1) constant-time predictive distributions for Gaussian processes; (2) probabilistic word embeddings; (3) our just-released paper on loss surfaces, mode connectivity, and fast ensembling of deep neural networks. - Tuesday, March 13, 2018CS PhD Requirements ReviewBobby KleinbergA committee of faculty and staff in the CS field is meeting this spring to review the Ph.D. requirements and potentially recommend revisions to the requirements. Your input to this process is vitally important! The goal of this brown bag lunch is to hear about some of your views on CS Ph.D. requirements at Cornell. We are interested in hearing not only about what aspects of the existing requirements are or aren't working well, but also your views on what values and priorities our system of requirements should reinforce.
As context for the discussion, there are certain requirements that are mandated by the Graduate School (residency, special committee, minor, A exam, B exam) and we don't have the power to change those requirements. The requirements mandated by the CS Field are competency, breadth, project, and teaching. These are detailed at http://www.cs.cornell.edu/phd/requirements. - Tuesday, March 20, 2018-- Visit Day --
- Tuesday, March 27, 2018Avoiding the Familiarity Trap: What's Mundane to a CS Ph.D. is not Mundane to a CS MajorAli ErkanAs one teaches a course multiple times, topics that initially appear to be novel (with respect to teaching) start to feel mundane. It is important for the instructor not to fall victim to this trap, especially for the sake of student learning; just as excitement is contagious, so is apathy. One way to deal with this problem is to poke around even in familiar territory. Sometimes neat things pop up in unexpected ways.
- Tuesday, April 3, 2018-- Spring Break --
- Tuesday, April 10, 2018Cancelled
- Tuesday, April 17, 2018Reading in the Panopticon: eBook Reader SurveillanceSteve Wicker
In this talk I will explore the surveillance technologies that Amazon has embedded in the Kindle. We start with the technology, developing an understanding of the type and granularity of data that can be collected. I will show how this technology was initially intended as a DRM mechanism, but quickly became a marketing tool. We will then consider the extent to which anonymous reading is a First Amendment concern, linking expression (output) to reading (input). We will conclude with a consideration of potential technical and policy solutions.
- Tuesday, April 24, 2018Giving Professional TalksKen BirmanWhether your career will center on teaching, research at a company or launching a product, you need to get used to presenting ideas to professional audiences. This isn't as simple as it sounds! I'll share some of the secrets.
- Tuesday, May 1, 2018Why Don't Bicycles Fall Down?Andy RuinaYou can balance a bike with hands on the handlebars. Or off. And, surprising if you haven't seen it, a bike can balance itself, with no person touching it. When viewed from the back, a bicycle looks like a stick balanced on end. If it tips a little, gravity pulls it down more.
A bike should 'want' to fall over, but why doesn't it? There are famous popular theories going back over 100 years, some by famous people. Maybe, the spinning wheels are like a top or gyroscope? Maybe it's the steering geometry? New experiments show that these ideas are (mostly) wrong.
Some of the bike ideas are related to how people don't fall down when they walk, and how to make better walking robots.
- Tuesday, May 8, 2018Cancelled
- Tuesday, January 30, 2018
- Tuesday, August 28, 2018Cancelled
- Tuesday, September 4, 2018Fellowship Application WorkshopRoss KnepperI will describe why PhD students may want to apply to competitive fellowships and give advice on how to maximize the chances of having a successful application. I will also demonstrate how the skills developed in writing fellowship applications will be important throughout your time in grad school and beyond.
Recommended reading before the seminar: "Good Writing" by Marc Raibert -- https://www.cs.cmu.edu/~pausch/Randy/Randy/raibert.htm
- Tuesday, September 11, 2018Continuous Reconfiguration of Polymorphic HardwareAdrian Sampson & Christopher BattenThe slowing advances in the efficiency of general-purpose machines have given rise to an era of specialized computing. While one-off hardware accelerator designs offer new leaps in efficiency, they sacrifice flexibility and programmability. We are designing a new kind of reconfigurable architecture based on a programmable memory system and configurable spatial compute fabric. The system is designed for high-frequency reconfiguration based on shifting application demands. It combines general-purpose cores, reconfigurable arrays of processing elements, a flexible ensemble of on-chip memories, and a reconfigurable interface to 3D stacked DRAM. The project aims to approach ASIC-like efficiency by continuously optimizing the system's organization to specialize the computation and storage structures for specific applications.
This work is in its early stages. It will encompass the collaborative design of the new hardware with a compiler infrastructure to exploit it. This talk will focus on our vision for the project and how PhD students can get involved.
- Tuesday, September 18, 2018Pseudorandom generators from Fourier BoundsEshan ChattopadhyayPseudorandom generators (PRGs) are objects that take a short random seed and stretch it to a much longer string that looks random to a targeted computation class. Constructing PRGs for interesting classes of functions is a central goal in complexity theory. However, the state-of-the-art constructions of PRGs are far from our derandomization goals (e.g., proving that every polynomial-time randomized algorithm can be turned into a deterministic algorithm, a.k.a. proving P=BPP).
In this talk, I will describe a new framework for constructing PRGs that provides a unified construction for various important complexity classes such as constant depth circuits (AC0), algorithms with limited memory (read-once branching programs), low-sensitivity functions, etc. A tantalizing possibility of this approach is a PRG for the class of constant depth circuits that also contain parity gates (AC0[Mod 2]) -- a complexity class that is just beyond AC0 but for which no PRG with seed length less than 0.99n is known.
I will conclude with a few open problems that come out of this new approach.
Based on joint work with Pooya Hatami, Kaave Hosseini, Shachar Lovett, and Avishay Tal.
- Tuesday, September 25, 2018Context-dependent Natural Language UnderstandingYoav ArtziUnderstanding natural language requires considering both sentence meaning and signals from the context of the interaction. In this talk, I will describe two projects about learning to map context-dependent sentences to code or system actions. In the first, we generate SQL code from natural language queries to a database. The user provides the queries within an interaction where they gradually refine their intent based on the system response. The queries are heavily dependent on the history of the interaction, and require generating long and complex SQL queries. We show that intelligently copying segments from previous queries instead of generating from scratch effectively captures the referential structure of the interaction. The second project addresses a scenario where the user instructs an agent to act in an environment using natural language instructions. We show that using representation learning without any explicit modeling of meaning or context to directly generate actions significantly improves execution accuracy by up to 68% by removing implicitly introduced assumptions.
- Tuesday, October 2, 2018Cancelled
- Tuesday, October 9, 2018-- Fall Break --
- Tuesday, October 16, 2018Unbiased Learning from Biased FeedbackThorsten JoachimsLogged user interactions are one of the most ubiquitous forms of data available, as they can be recorded from a variety of systems (e.g., search engines, recommender systems, ad placement) at little cost. Naively using this data, however, is prone to failure. A key problem lies in biases the system injects into the logs by influencing where we will receive feedback (e.g., more clicks at the top of the search ranking). To overcome the bias problem, the talk lays out a research agenda around counterfactual inference techniques that can make learning algorithms robust to bias. This makes log data accessible to a broad range of learning algorithms, from Conditional Random Fields to Deep Networks.
Bio:
Thorsten Joachims is a Professor in the Department of Computer Science and the Department of Information Science at Cornell University. His research interests center on a synthesis of theory and system building in machine learning, with applications in search, recommendation, and language technology. His past research focused on counterfactual and causal inference, support vector machines, text classification, structured output prediction, convex optimization, learning to rank, learning with preferences, and learning from implicit feedback. He is an ACM Fellow, AAAI Fellow, and Humboldt Fellow.
- Tuesday, October 23, 2018Learning in Low Precision Without Losing AccuracyChristopher De SaRecently there has been an explosion of interest around studying the effects of low-precision computation on machine learning applications. This is because purpose-built, low-precision hardware accelerators can lower both the time and energy needed to complete a task. Despite this, the statistical effects of low-precision computation during training are not well understood. This is due to a tradeoff typically found with low-precision training algorithms: as the number of bits is lowered, noise that limits statistical accuracy is added. How can we avoid the accuracy lost when using low-precision arithmetic for learning? And can this tradeoff be avoided entirely in some cases? In this talk, I will describe two new training algorithms that address these questions, both of which use a small amount of infrequent high-precision computation to reduce the error caused by low-precision computation.
- Tuesday, October 30, 2018TBA
- Tuesday, November 6, 2018Writing a compiler and getting a PhD in the 1960sDavid GriesGries, a New Yorker, happened onto a compiler writing project in 1962 simply by chance, as a Research Assistant while a PhD student in Math at the University of Illinois. It led him to get his PhD in Munich, Germany. We'll look at pieces of the compiler itself and talk about the trials and tribulations (and GOOD things) about getting his PhD in Europe.
- Tuesday, November 13, 2018Department Town Hall
- Tuesday, November 20, 2018TBA
- Tuesday, November 27, 2018Information Extraction --- from Opinions to Arguments to PersuasionClaire CardieA long line of research in Natural Language Processing (NLP), including our own, has addressed the task of finding and extracting opinions in text. This talk will present some of our new research that tries to go beyond the automatic natural language analysis of opinions to focus instead on (1) identifying and interpreting the arguments that underlie them; and (2) examining the role of language and NLP techniques in the study of persuasion in argumentative text.
- Tuesday, December 4, 2018Training session on reviewing Ph.D. applicationsPhD admissions committeeTBA
- Tuesday, August 28, 2018
- Tuesday, January 22, 2019Cancelled
- Tuesday, January 29, 2019TBA
- Tuesday, February 5, 2019Review articles and policy reports relevant to the grad student experienceJonathan ShiOkay, since we're doing such a bad job of finding faculty speakers lately, your favorite 6th-year theory student brown bag czar will host a reading and discussion session, satisfying the professional development objective of the brown bag seminar series!
I've listed a few articles of interest below. Please make a vote in this quick poll <https://www.surveymonkey.com/r/P5G6BQ6> on which ones you'd be most interested in discussing. I'm gonna print out copies for everyone, so please have your votes in by 11am.
The plan for tomorrow will be 5 minutes of coordination, 30 minutes of reading/discussing in small groups, and 20 minutes of discussion between groups.
NAS report on institutional and cultural changes that would improve graduate education in STEM fields (including making the system less university-centered and more student-centered):
https://www.nap.edu/catalog/25038/graduate-stem-education-for-the-21st-century
"Graduate STEM Education for the 21st Century" (pages 3-7 and 105-123; possibly also 127-137)
CRA memos on best practices, including the role of teaching faculty, how to incentivize impact over publication, the role of postdocs, and the evaluation of interdisciplinary faculty:
https://cra.org/wp-content/uploads/2018/08/Teaching-Faculty-BP-Memo.pdf
"Laying a Foundation: Best Practices for Engaging Teaching Faculty in Research Computing Departments" (6 pages)
https://cra.org/resources/best-practice-memos/incentivizing-quality-and-impact-evaluating-scholarship-in-hiring-tenure-and-promotion/
"Incentivizing Quality and Impact: Evaluating Scholarship in Hiring, Tenure, and Promotion" (3 pages)
https://cra.org/wp-content/uploads/2016/03/Computer_SciencePostdocs_Best_Practices.pdf
"Computer Science Postdocs -- Best Practices" (11 pages)
http://archive2.cra.org/uploads/documents/resources/bpmemos/bestpractices.promotions_.tenure_.pdf
"Promotion and Tenure of Interdisciplinary Faculty" (6 pages)
AAUW report on why women in STEM are underrepresented and what can be done to change that:
https://www.aauw.org/files/2013/02/Why-So-Few-Women-in-Science-Technology-Engineering-and-Mathematics.pdf
"Why So Few?: Women in Science, Technology, Engineering, and Mathematics" (3 page executive summary, Chapter 1 "Women and Girls in STEM" 27 pages, Chapter 10 "Recommendations" 5 pages)
Review article in educational psychology on the persistent myth of "minimally guided instruction":
http://www.cogtech.usc.edu/publications/kirschner_Sweller_Clark.pdf
"Why Minimal Guidance During Instruction Does Not Work: An Analysis of the Failure of Constructivist, Discovery, Problem-Based, Experiential, and Inquiry-Based Teaching" (9 pages)
Educational psychology article discussing the large difference in cultural associations and understanding attached to the concept of "learning" in U.S. and Chinese cultures:
https://psycnet.apa.org/record/2003-00780-005
"U.S. and Chinese Cultural Beliefs About Learning" (9 pages; get full text from Cornell Library catalog)
A discussion of when and how selecting a diverse group of problem-solvers might be better than selecting a group of individual high-performers:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC528939/
"Groups of diverse problem solvers can outperform groups of high-ability problem solvers" (5 pages)
Overview of the advice that prominent researchers in a variety of fields would like to give to early-career scientists, summarized into the "keys" of relationships, passion, resilience, leadership, strategy, balance, and integrity:
https://www.sciencedirect.com/science/article/pii/S109727651830786X
"Cultivating the Human Dimension in Research" (4 pages)
- Tuesday, February 12, 20197 Lessons from Teaching (that I wish I'd learned earlier) & Observations of Active Learning (a 10 year retrospective)Anne Bracy & Xanda Schofield
7 Lessons from Teaching (that I wish I'd learned earlier)
Every teacher I know has lots of stories of things that have gone well, wrong, and hilariously in their class. These stories are not only fun to share, but also shape how we define our own teaching style and ideas about how to better help our students. I'll present several lessons that were particularly important to me as a new teacher, and some stories of how I learned them.
Observations of Active Learning (a 10 year retrospective)
Active Learning is a style/philosophy of teaching that has been in the educational ether for decades and on my radar as an educator in computer science for the past 10 years (at Washington University in St. Louis and at Cornell University). I will share my observations of various attempts to incorporate Active Learning methodology into various classes (some mine, mostly others'), institutional responses to this philosophy, and how our/my understanding of what Active Learning is and how it can be implemented has changed over time.
- Tuesday, February 19, 2019Origin and Interpretation of the CRA Best Practices Memo: "Incentivizing Quality and Impact ..."Fred SchneiderIn Feb 2015, the Computing Research Association issued a memo suggesting criteria for scholarly publication used to support hiring and promotion cases. The history and context for that CRA effort will be presented, and the meat of the recommendations (which remain controversial) will be discussed.
The memo is on-line and you are encouraged to look at a copy before the session:
http://archive2.cra.org/uploads/documents/resources/bpmemos/BP_Memo
- Tuesday, February 26, 2019
- Tuesday, March 5, 2019Towards Supporting a Trustworthy Information EcosystemMor NaamanI will describe a set of projects aiming to realign our information technology ecosystem to better serve societal goals. First, I will discuss two projects that use AI to help fight adversarial online interactions in support of our most important information workers: journalists. The first project uses computer vision algorithms to allow journalists to collaborate around visual misinformation: images shared online to manipulate and mislead journalists and others. The second project uses new computational methods to better detect online harassment campaigns like those often targeting journalists and political candidates.
Tackling adversarial interactions/information alone may not be enough. How can we nudge our information ecosystems towards trustworthiness? In the second part of the talk I will share our recent online experiment research on trust in online news (with surprising results!), and draw on these results to discuss a theoretical path for increasing online trust.
Joint work with many including Cornell PhD students Yiqing Hua and Maurice Jakesch.
Mor Naaman is an associate professor of Information Science at the Jacobs Institute at Cornell Tech, where he is the founder of the Connective Media hub, leads a research group focused on social technologies, and directs the Oath-supported Connected Experiences laboratory. His research group designs, builds, and studies social systems, with a focus on topics related to Technology, Media and Democracy. Mor applies multidisciplinary methods to 1) gain a better understanding of people and their use of social tech; 2) extract insights about people, technology and society from social media and other sources of social data; and 3) develop new social technologies as well as novel tools to make social data more accessible and usable in various settings. Previously, Mor was on the faculty at the Rutgers School of Communication and Information, led a research team at Yahoo! Research Berkeley, received a Ph.D. in Computer Science from the Stanford University InfoLab, and played professional basketball for Hapoel Tel Aviv. He is a recipient of an NSF Early Faculty CAREER Award, research awards and grants from numerous corporations including AOL and Google, and multiple best paper awards.
If needed, high-res headshots are available:
https://www.flickr.com/photos/mmoorr/15592772261/sizes/z/
https://www.flickr.com/photos/mmoorr/15516802398/sizes/z/
- Tuesday, March 12, 2019Reflections on my path through grad schoolBobby KleinbergI started grad school aiming to do a Ph.D. in pure mathematics, but ended up working on theoretical computer science. Finding my way through to a Ph.D. involved joining a start-up for several years, doing some deep soul-searching, and changing advisors. I will offer my story in the hope that students will find it illuminating.
- Tuesday, March 19, 2019An Operational Measure of Information Leakage in Side ChannelsAaron WagnerHow much information is "leaked" in a side channel? Despite decades of work on these channels, including the development of many sophisticated mitigation mechanisms for specific side channels, the fundamental question of how to measure the key quantity of interest---leakage---has received surprisingly little attention. Many metrics have been used in the literature, but these metrics either lack a cogent operational justification or mislabel systems that are obviously insecure as secure.
We propose a new metric called "maximal leakage," defined as the logarithm of the multiplicative increase, upon observing the public data, of the probability of correctly guessing a randomized function of the private information, maximized over all such randomized functions. We provide an operational justification for this definition, show how it can be computed in practice, and discuss how it relates to existing metrics, including mutual information, local differential privacy, and a certain under-appreciated metric in the computer science literature. We also present some structural results for optimal mechanisms under this metric. Among other findings, we show that mutual information underestimates leakage while local differential privacy overestimates it.
This is joint work with Ibrahim Issa, Sudeep Kamath, Ben Wu, and Ed Suh.
- Tuesday, March 26, 2019
- Tuesday, April 2, 2019
- Tuesday, April 9, 2019High-level Abstractions for Network ProgrammingNate FosterOver the past ten years, programmable networks have gone from a dream to a reality. Software-defined networking (SDN) architectures provide interfaces for specifying network-wide control algorithms, and emerging hardware platforms are exposing programmability at the forwarding plane level as well. But despite much progress, several fundamental questions remain: What are the right abstractions for writing network programs? How do they differ from the abstractions we use to write ordinary software? Can we implement these abstractions efficiently on current hardware? This talk will attempt to answer these questions by exploring the design and implementation of high-level abstractions for network programming. In the first part of the talk, I will present NetKAT, a language for programming the forwarding plane based on a surprising connection to regular languages and finite automata. In the second part of the talk, I will present an abstraction for building SDN control planes that gracefully transitions the network between network-wide configurations while preserving programmer-specified notions of consistency.
- Tuesday, April 16, 2019Towards Building Non-polarizing Recommender SystemsKarthik SridharanAn inherent trait of recommendation systems is that they tend to influence their users. Often this influence is unintentional and sometimes causes polarization of the users. Consider a social media agency interested in recommending new articles to its users over multiple days. If the agency tries to simply predict what the user might like and greedily provide recommendations, it might end up polarizing its users. To better illustrate this phenomenon, consider a news agency that provides articles or recommendations about fruits. Say we have a user who initially likes apples and oranges equally, just happens to receive some article about apples, and indicates to the system that she might like apples. The recommender system that learns of this will initially start to recommend, with a mild bias, articles about apples and their health benefits. Subsequent rounds of interaction with this system leave the user with a strong opinion about apples, and the user might start to prefer apples over oranges, all the while the system further strengthens its belief that the user really prefers apples over oranges. Continuous interaction with such a system turns this user, who started out neutral about apples vs. oranges, into an apple fanatic. Clearly this was just by happenstance, and the initial interactions could just as easily have swayed the user towards liking oranges. The issue of polarization is further worsened by the confirmation bias of users, who perceive content differently on each round based on their prior beliefs, which might further speed up the polarization. Additionally, the issue of polarization by recommender systems can be worsened when one considers the fact that the users might be part of a social network and tend to share ideas and opinions.
Users are often part of user groups or cliques, and these groups tend to further influence user preferences within the group. Specifically, there is an intrinsic bias for users to follow the herd, so to speak, and users can be more easily convinced to agree with their group's view while disagreeing strongly with others not in the group. Hence, a recommendation system, by making the greedy choice of articles to show to the users, might inadvertently polarize its users into groups with strongly opposing opinions on issues.
In this talk I will present some of our initial attempts at building theory and algorithm design principles for building machine learning systems that not only aim to predict or recommend with high accuracy but also aim to not further polarize their users. Specifically, we assume that the users of the system are interconnected to each other via a social network and that the machine learning algorithm has access to the structure of this social network. We then build on and extend existing mathematical models for formation/evolution of opinions of users based on what's recommended to them and the interaction with their friends in the social network. Finally we provide an algorithm design principle for building recommendation systems that use the knowledge of the underlying social network to provide recommendations to users that not only aim for high accuracy but simultaneously aim to reduce a natural measure of polarization we propose. We show that under our model of opinion formation dynamics (that subsumes existing models for opinion dynamics) our recommendation algorithm provably has low polarization effect.
Joint work with Wilson Yoo
- Tuesday, April 23, 2019Promoting Computer Science in GhanaRobbert van RenesseComputer Science is not regarded as a good career option in Ghana, which may be representative of other African countries. Together with a group of US-based CS students originally from Ghana, we organized several high school outreach events in Ghana last January. Together we will talk about our experiences and where we intend to go from here.
- Tuesday, April 30, 2019
- Tuesday, May 7, 2019The tenure-track faculty recruiting processKavita Bala (Chair), Bobby Kleinberg (DGS), Lorenzo Alvisi (Recruiting Chair)We will talk about the tenure-track faculty recruiting process from the point of view of the department; i.e., what process do we follow to select candidates to interview, the interview process, the decision process, offers, etc.
- Tuesday, January 22, 2019
- Tuesday, September 3, 2019TBA
- Tuesday, September 10, 2019Algorithm-Accelerator Co-Design for Neural Network SpecializationZhiru ZhangIn recent years, machine learning (ML) with deep neural networks (DNNs) has been widely deployed in diverse application domains. However, the growing complexity of DNN models, the slowdown of technology scaling, and the proliferation of edge devices are driving a demand for higher DNN performance and energy efficiency. ML applications have shifted from general-purpose processors to dedicated hardware accelerators in both academic and commercial settings. In line with this trend, there has been an active body of research on both algorithms and hardware architectures for neural network specialization.
This talk presents our recent investigation into DNN optimization and low-precision quantization, using a co-design approach featuring contributions to both algorithms and hardware accelerators. First, we review static network pruning techniques and show a fundamental link between group convolutions and circulant matrices -- two previously disparate lines of research in DNN compression. Then we discuss channel gating, a dynamic, fine-grained, and trainable technique for DNN acceleration. Unlike static approaches, channel gating exploits input-dependent dynamic sparsity at run time. This results in a significant reduction in compute cost with a minimal impact on accuracy. Finally, we present outlier channel splitting, a technique to improve DNN weight quantization by removing outliers from the weight distribution without retraining.
- Tuesday, September 17, 2019Fellowship Application WorkshopRoss KnepperI will describe why PhD students may want to apply to competitive fellowships and give advice on how to maximize the chances of having a successful application. I will also demonstrate how the skills developed in writing fellowship applications will be important throughout your time in grad school and beyond.
Recommended reading before the seminar: "Good Writing" by Marc Raibert -- https://www.cs.cmu.edu/~pausch/Randy/Randy/raibert.htm
- Tuesday, September 24, 2019Foundations of Machine learning by the people, for the peopleNika HaghtalabTypical analysis of learning algorithms considers their outcome in isolation from the effects that they may have on the process that generates the data or the entity that is interested in learning. However, current technological trends mean that people and organizations increasingly interact with learning systems, making it necessary to consider these effects, which fundamentally change the nature of learning and the challenges involved. In this talk, I will explore three lines of research from my work on the theoretical aspects of machine learning and algorithmic economics that account for these interactions: learning optimal policies in game-theoretic settings, without an accurate behavioral model, by interacting with people; managing people's expertise and resources in data-collection and machine learning; and broader societal impacts of learning.
- Tuesday, October 1, 2019TBA
- Tuesday, October 8, 2019TBA
- Tuesday, October 15, 2019
- Tuesday, October 22, 2019
- Tuesday, October 29, 2019TBA
- Tuesday, November 5, 2019Student ColloquiumGeoff Pleiss, Vishal Shrivastav, Laure Thompson
From n=1,000 to n=1,000,000: Scaling Up Gaussian Processes Inference with Matrix Multiplication and GPU Acceleration - Geoff Pleiss
Gaussian processes (GPs) are powerful machine learning models, offering well-calibrated uncertainty estimates, interpretable predictions, and the ability to encode prior knowledge. Despite these desirable properties, GPs are typically not applied to datasets with more than a few thousand data points, in part because of an inference procedure that requires matrix inverses, determinants, and other expensive operations. In this talk, I will discuss how my collaborators and I were able to scale GPs to datasets with over 1 million points, without making any simplifying assumptions. Taking inspiration from neural network libraries, we constrained ourselves to writing a GP inference algorithm that only used matrix multiplication and other linear operations, procedures that are extremely amenable to parallelization, GPU acceleration, and distributed computing. The resulting algorithm, Blackbox Matrix-Matrix Inference (BBMM), is up to 100x faster than existing inference procedures and scales to datasets that are 2 orders of magnitude larger than what has previously been reported.
Building High-speed Datacenter Networks in the Post-Moore's Law Era - Vishal Shrivastav
With the slowdown in Moore's law and the end of Dennard scaling, general-purpose CPUs and packet switches have hit a fundamental performance wall. However, the bandwidth demand within datacenters keeps growing exponentially: applications keep getting more distributed and resources (e.g., storage) keep getting disaggregated, demanding more bandwidth. In this talk, I will discuss how my research attempts to bridge this gap for next-generation datacenter networks. To overcome the limitations of general-purpose CPUs, my research argues for domain-specific architectures for network processing. I will start by briefly introducing several domain-specific hardware architectures that I have proposed over the course of my PhD, targeted at improving a wide range of core network functions including packet scheduling, packet processing, congestion control, and network time synchronization.
To accommodate the exponential demand for network bandwidth, in the talk I will focus on how my research attempts to overcome the fundamental limitations of packet switching within datacenters. I will describe a new approach called Shoal that is the first end-to-end network design for a fast circuit-switched network. I will conclude my talk by discussing how Shoal takes us a giant step towards realizing the idealistic goal of a datacenter switching fabric that could support practically unlimited bandwidth at low power, low cost and high performance.
Understanding and Directing What Models Learn - Laure Thompson
Machine learning and statistical methods, such as unsupervised semantic models, are popular and useful techniques for making massive digital collections more explorable and analyzable. But what underlying patterns do these models actually learn, and which patterns are they most likely to repeatedly learn? Moreover, how might we direct what these models learn so that they are useful to a wider range of scholarly inquiry? While it might be useful to organize texts by authors, learning this structure is seldom useful when already known and can be problematic if it is mischaracterized as a cross-cutting pattern. In this talk, I will focus on a specific problem: discuss my recent work on measuring and mitigating topic-metadata correlation in topic models. - Tuesday, November 12, 2019Building a Platform to Host Real Time ML TasksKen BirmanWith all the talk about smart homes, smart cities and highways, smart grid and even smart farms, it may be surprising to realize that today's IoT platform lack ML computing infrastructure support. The popular tools for big-data analytics and ML, like MapReduce/Spark, PyTorch, TensorFlow, etc, simply don't want work in IoT settings. As a result, people tackling smart applications currently build everything from scratch.
My goal over the next few years is to evolve the Derecho platform to tackle this gap. In the talk I'll discuss the AIoT critical path and how it differs from a batched big-data analytics environment. Then we'll ask whether one could create a high-performance platform aimed at intelligent IoT computing, but one that would preserve a familiar look and feel. The goal would be to make it easy to move ML computing from today's high-productivity offline settings into IoT environments with minimal changes.
- Tuesday, November 19, 2019Student ColloquiumPaul Grubbs, Jack Hessel, Praveen Kumar
Breaking and Building End-to-End Encrypted Systems - Paul Grubbs
Today's computer systems and their owners fail to protect data, and this failure is exacerbated by new threats caused by the explosion of cloud computing. The consequences are dire: sensitive information like financial statements, medical records, and private messages is disclosed to malicious parties. In my research at the intersection of security, cryptography, and systems, I work to change this by breaking and building efficient end-to-end (E2E) encrypted systems, which protect data by encrypting it throughout processing and storage. In this talk, I'll explain some of the flaws I've found in existing E2E-encrypted systems deployed to billions of users, and how the flaws have led me to a new methodology for building E2E-encrypted systems that's rooted in co-design of cryptography and systems. I'll conclude by outlining this methodology and some of the new E2E-encrypted systems I've built with it.
Unsupervised Discovery of Multimodal Links in Multi-image, Multi-sentence Documents - Jack Hessel
Images and text co-occur constantly on the web, but explicit links between images and sentences (or other intra-document textual units) are often not present. We present algorithms that discover image-sentence relationships without relying on explicit multimodal annotation in training. We experiment on seven datasets of varying difficulty, ranging from documents consisting of groups of images captioned post hoc by crowdworkers to naturally-occurring user-generated multimodal documents. We find that a structured training objective based on identifying whether collections of images and sentences co-occur in documents can suffice to predict links between specific sentences and specific images within the same document at test time.
This is joint work with Lillian Lee and David Mimno.
Towards predictable network performance - Praveen Kumar
Performance isolation is a fundamental challenge in any shared system. While it has been well-studied in the context of operating systems, it is exacerbated in the context of shared public clouds as the scale and distributed nature of the cloud pose new challenges. In this talk, I will focus on the fundamental trade-off between network performance isolation and resource efficiency in public clouds, and demonstrate how isolation breakdown at the end hosts can lead to unpredictable performance. Then, I will present a system, PicNIC, that navigates this trade-off to provide predictable performance by introducing the abstraction of a "predictable virtualized NIC" for each virtual machine (VM). PicNIC defines network performance objectives for each VM and leverages two key design principles to achieve them: (i) resource sharing based on performance objectives and (ii) applying backpressure to sources. Finally, I will conclude with some thoughts on how we can achieve both high and predictable network performance as we move towards domain-specific hardware and accelerators for networking.
- Tuesday, November 26, 2019Formal Synthesis for RobotsHadas Kress-GazitIn this talk I will describe how formal methods such as synthesis - automatically creating a system from a formal specification - can be leveraged to design robots, explain and provide guarantees for their behavior, and even identify skills they might be missing. I will discuss the benefits and challenges of synthesis techniques and will give examples of different robotic systems including modular robots, swarms and robots interacting with people.
- Tuesday, December 3, 2019Grad admissions trainingDavid BindelWe will have a "How to Read a PhD Application" training / Brown Bag lunch session on Dec 3 (the Tuesday after break). We have records from last year, so if you were already trained, you are not required to attend. Otherwise, it is MANDATORY to take the training in order to participate in the evaluation! I understand some of you may have conflicting obligations, so if you want to participate but cannot go to the 12/3 brown bag (and did not get the training last year), drop me a line, and we will figure out something.
- Tuesday, December 10, 2019
- Tuesday, September 3, 2019
- Monday, February 3, 2020How to Give a TalkDavid BindelResearchers give talks to share their knowledge and excitement with colleagues, students, funding agencies, and potential employers. In this talk, I share some broad ideas about what makes a talk "successful", along with more detailed thoughts about the logistics of giving a talk, including thinking about speech and body language, slide designs, presentation technology, and the fine art of answering questions.
- Monday, February 10, 2020Origin and Interpretation of the CRA Best Practices Memo: "Incentivizing Quality and Impact ..."Fred SchneiderIn Feb 2015, the Computing Research Association issued a memo
suggesting criteria for scholarly publication used to support hiring
and promotion cases. The history and context for that CRA effort will
be presented, and the meat of the recommendations (which remain
controversial) will be discussed.
The memo is on-line and you are encouraged to look at a copy before
the session:
http://archive2.cra.org/uploads/documents/resources/bpmemos/BP_Memo
- Monday, February 17, 2020Outreach and Code AfriqueRobbert van RenesseOutreach can be fun, rewarding, and good for your career too. We will talk about Code Afrique, an effort to popularize computer science as a career option in Ghana and Eswatini.
- Monday, February 24, 2020
- Monday, March 2, 2020Career Paths for a CS PhDNika Haghtalab, Curran Muhlberger, and Anil DamleThis interactive panel discussion will focus on the wide-ranging career opportunities available to CS PhD students. Centered around and introducing the diverse set of possible options, this panel will discuss the preparation a PhD affords, skills relevant to these jobs, how to consider educational decisions in relation to possible careers, practical aspects of pursuing specific choices, and more (in whatever direction the discussion goes).
- Monday, March 9, 2020How to pick a problem and advisorEva Tardos, Rachit Agarwal, Bharath HariharanHow to pick a problem and advisor
- Monday, March 16, 2020TBA
- Monday, March 23, 2020
- Monday, March 30, 2020
- Monday, April 6, 2020How to tell a technical storyKen BirmanWhen we write papers or talk to people about our work, it can be a puzzle to find a way to convey the story behind the technical results. I'll focus on papers. We'll look at some classic great papers and how they handled the problem, and at a few classic "fails", and see what lessons can be learned!
Zoom link: https://cornell.zoom.us/j/765160159
- Monday, April 13, 2020Contemplating collaborationDavid BindelCollaborating on research helps us do things that would be hard to manage on our own: couple theory to experiments, work across disciplinary boundaries, and tackle more ambitious projects. Collaborators are also an important part of our professional networks. They are often the ones that write letters on our behalf, invite us to give colloquium and seminar talks, and team with us on large grants. But how does one get a collaboration started, and what are the pitfalls to watch for along the way? In this talk, we discuss different types of collaborations, along with issues of starting collaborations, managing differences in work styles, and thinking about authorship and credit.
- Monday, April 20, 2020An overview of CS publication venuesEva Tardos, Andrew Myers, Lorenzo Alvisi, Chris De Sa, David BindelA faculty panel will give an overview of where you might consider publishing your results.
- Monday, April 27, 2020TBA
- Monday, May 4, 2020Work-life balance in a research careerEva Tardos, Andrew Myers, Dexter Kozen, Adrian SampsonWork-life balance in a research career
- Monday, February 3, 2020
- Tuesday, September 8, 2020
- Tuesday, September 15, 2020Setting and managing expectations in grad schoolBharath HariharanResearch in general and grad school in particular can be frustrating at times. It is common to feel that one is inadequate, that one is falling behind, or that the academic community is unfair. Some of this is because open-ended research is a very different beast compared to all previous schooling. Some of it is also because of a hidden set of expectations that you may not be privy to.
In this talk, I will cover in broad strokes what is expected of you as PhD students, what you must expect of research, and how to cope when it feels like things are not going the way you hoped. I will also attempt to answer any questions you may have about mismatched expectations that you feel are holding you back.
Zoom info: https://cornell.zoom.us/j/91898312704?pwd=c3l6VzVKd1Mvd2Z5encvdmtZaWZnUT09
- Tuesday, September 22, 2020Giving a research talkKen BirmanAll of us need to give research and other professional talks, but it doesn't come naturally to everyone. I'm someone who found it hard to plan my talks, and I used to run over and get stressed by that. In this professional development BB lunch, I'll talk about some of the techniques I've learned over the years that help me plan talks with more confidence, and also to deal with stress when things don't go quite as expected!
- Tuesday, September 29, 2020How to apply for fellowshipsRachit AgarwalFellowships come with several benefits --- improved visibility in the community, prestige for you and for your department, and potential interactions with mentors assigned along with the fellowship (especially for industry fellowships). Applying for a fellowship also forces you to think "big", that is, beyond the scope of a single problem --- a skill that will help you during and beyond your graduate studies.
Applying for a fellowship, however, can be daunting. This brown bag discussion will provide some advice on how to approach writing a fellowship application, and on how to maximize the chances of having a successful application.
- Tuesday, October 6, 2020TBA
- Tuesday, October 13, 2020TBA
- Tuesday, October 20, 2020Stochastic LA for scalable GPsDavid BindelGaussian processes (GPs) define a distribution over functions that generalizes the multivariate normal distribution over vector spaces. Long used as a tool for spatio-temporal statistical modeling, GPs are also a key part of the modern arsenal in machine learning. Unfortunately, Gaussian process regression and kernel hyper-parameter estimation with $N$ training examples involve manipulating a dense $N$-by-$N$ kernel matrix, and standard factorization-based approaches to the underlying linear algebra problems have $O(N^3)$ scaling. For regression with a fixed covariance kernel, more scalable iterative methods based on fast matrix-vector multiplication with the kernel matrices are available. However, maximum likelihood estimation of kernel hyper-parameters and computation of conditional variances involve operations such as computing log determinants and their derivatives or extracting the diagonal part of a Schur complement. New tools are needed to address these problems in a scalable manner. In this talk, we discuss our recent work on one such set of tools, based on a combination of Krylov subspace methods for matrix solves and matrix function applications together with stochastic estimators for the trace and diagonal of a matrix using only matrix-vector multiplies.
This is joint work with Kun Dong, David Eriksson, and Andrew Wilson.
- Tuesday, October 27, 2020Abstraction Barriers for Physically Embodied AlgorithmsNils NappDesigning robotic systems to reliably modify their environment typically requires expert engineers and multiple design iterations. This talk will cover abstraction barriers that can be used to make the process of building such systems easier and the results more predictable. By focusing on approximate mathematical representations that model the process dynamics, this abstract representation can be used both to design high-level algorithms and physical robotic systems to solve complex construction tasks. This talk will present ongoing work for two types of abstraction barriers, which can represent different construction processes, and show how they can be used to design robotic systems to execute novel construction tasks.
- Tuesday, November 3, 2020Coping with 2020: Resiliency in a difficult timeCornell HealthThe year 2020 has been and is a year of unprecedented challenges and change, and there appears to be no end in sight. How do we manage, cope and thrive in these difficult times?
- Tuesday, November 10, 2020
- Tuesday, November 17, 2020
- Tuesday, November 24, 2020
- Tuesday, December 1, 2020Smoothing probability distributions for high dimensional learning and inferenceZiv GoldfeldThe talk will explore the benefits of smoothing probability distributions (by convolving them with a chosen kernel) for high dimensional learning and inference. Machine learning tasks often involve optimization or evaluation of a certain functional of the underlying data distribution, e.g., loss function, information measure, statistical distance, etc. In practice, we rarely have access to the actual distribution and only get data from it. This necessitates estimating the distribution or the functional of interest from samples. A central issue is that such estimators suffer from the curse of dimensionality, i.e., their sample complexity grows exponentially fast with dimension. This makes it impossible to obtain meaningful accuracy guarantees, considering the dimensionality of real-world data. As we shall see, smoothing alleviates the curse of dimensionality while preserving the capability to perform inference by leveling out local irregularities in the considered distributions. This enables constructing estimators with scalable (in dimension) sample complexity guarantees and opens the door for various applications. The talk will cover two such applications: measuring information flows in deep neural network classifiers and implicit generative modeling via minimum distance estimation. We will discuss the original challenges, how smoothing helps to deal with them, remaining gaps, and ongoing/future research trajectories.
- Tuesday, December 8, 2020Analysis and interventions in large network games: graphon games and graphon contagionFrancesca PariseMany of today's most promising technological systems involve very large numbers of autonomous agents that influence each other and make strategic decisions within a network structure. Examples include opinion dynamics, targeted marketing in social networks, economic exchange and international trade in financial networks, product adoption decisions and social contagion.
While traditional tools for network game analysis assumed that a social planner has full knowledge of the network of interactions, when we turn to very large networks two issues emerge. First, collecting data about the exact network of interactions becomes very expensive or not at all possible because of privacy and proprietary concerns. Second, methods for designing optimal interventions that rely on the exact network structure typically do not scale well with the population size.
To obviate these issues, in this talk I will present a framework in which the central planner designs interventions based on probabilistic information about agents' interactions, which can easily be inferred from aggregated data, instead of exact network data. I will introduce the tool of "graphon games" as a way to formally describe strategic interactions in this framework and I will illustrate how this tool can be exploited to design interventions that are robust to stochastic network variations. I will cover two main applications: design of targeted interventions for linear quadratic network games and design of optimal seeding policies for threshold contagion processes. In both cases, I will illustrate how the graphon approach leads to interventions that are asymptotically optimal in terms of the population size and can be computed without requiring exact network data.
- Tuesday, December 15, 2020How to review PhD applicationsAdrian SampsonThis talk is a short training on how to review PhD applications. It is meant for members of the PhD admissions committee for the 2021 cycle, but all are welcome. Notes are here: https://capra.cs.cornell.edu/phdadmitguide/