## Building the Mathematical Library of the Future – Quanta Magazine

Posted: October 3, 2020 at 4:55 am

Every day, dozens of like-minded mathematicians gather on an online forum called Zulip to build what they believe is the future of their field.

Theyre all devotees of a software program called Lean. Its a proof assistant that, in principle, can help mathematicians write proofs. But before Lean can do that, mathematicians themselves have to manually input mathematics into the program, translating thousands of years of accumulated knowledge into a form Lean can understand.

To many of the people involved, the virtues of the effort are nearly self-evident.

Its just fundamentally obvious that when you digitize something you can use it in new ways, said Kevin Buzzard of Imperial College London. Were going to digitize mathematics and its going to make it better.

Digitizing mathematics is a longtime dream. The expected benefits range from the mundane computers grading students homework to the transcendent: using artificial intelligence to discover new mathematics and find new solutions to old problems. Mathematicians expect that proof assistants could also review journal submissions, finding errors that human reviewers occasionally miss, and handle the tedious technical work that goes into filling in all the details of a proof.

But first, the mathematicians who gather on Zulip must furnish Lean with what amounts to a library of undergraduate math knowledge, and theyre only about halfway there. Lean wont be solving open problems anytime soon, but the people working on it are almost certain that in a few years the program will at least be able to understand the questions on a senior-year final exam.

And after that, who knows? The mathematicians participating in these efforts dont fully anticipate what digital mathematics will be good for.

We dont really know where were headed, said Sbastien Gouzel of the University of Rennes.

Over the summer, a group of experienced Lean users ran an online workshop called Lean for the Curious Mathematician. In the first session, Scott Morrison of the University of Sydney demonstrated how to write a proof in the program.

He began by typing the statement he wanted to prove in syntax Lean understands. In plain English, it translates to There are infinitely many prime numbers. There are several ways to prove this statement, but Morrison wanted to use a slight modification of the first one ever discovered, Euclids proof from 300 BCE, which involves multiplying all known primes together and adding 1 to find a new prime (either the product itself or one of its divisors will be prime). Morrisons choice reflected something basic about using Lean: The user has to come up with the big idea of the proof on their own.

Youre responsible for the first suggestion, Morrison said in a later interview.

After typing the statement and selecting a strategy, Morrison spent a few minutes laying out the structure of the proof: He defined a series of intermediate steps, each of which was relatively simple to prove on its own. While Lean cant come up with the overall strategy of a proof, it can often help execute smaller, concrete steps. In breaking the proof into manageable sub-tasks, Morrison was a bit like a chef instructing line cooks to chop an onion and simmer a stew. Its at this point that you hope Lean takes over and starts being helpful, Morrison said.

Lean performs these intermediate tasks by using automated processes called tactics. Think of them as short algorithms tailored to perform a very specific job.

As he worked through his proof, Morrison ran a tactic called library search. It trawled Leans database of mathematical results and returned some theorems that it thought could fill in the details of a particular section of the proof. Other tactics perform different mathematical chores. One, called linarith, can take a set of inequalities among, say, two real numbers, and confirm for you that a new inequality involving a third number is true: If a is 2 and b is greater than a, then 3a + 4b is greater than 12. Another does most of the work of applying basic algebraic rules like associativity.

Two years ago you would have had to [apply the associative property] yourself in Lean, said Amelia Livingston, an undergraduate math major at Imperial College London who is learning Lean from Buzzard. Then [someone] wrote a tactic that can do it all for you. Every time I use it, I get very happy.

Altogether, it took Morrison 20 minutes to complete Euclids proof. In some places he filled in the details himself; in others he used tactics to do it for him. At each step, Lean checked to make sure his work was consistent with the programs underlying logical rules, which are written in a formal language called dependent type theory.

Its like a sudoku app. If you make a move thats not valid, it will go buzz, Buzzard said. At the end, Lean certified that Morrisons proof worked.

The exercise was exciting in the way it always is when technology steps in to do something you used to do yourself. But Euclids proof has been around for more than 2,000 years. The kinds of problems mathematicians care about today are so complicated that Lean cant even understand the questions yet, let alone support the process of answering them.

It will likely be decades before this is a research tool, said Heather Macbeth of Fordham University, a fellow Lean user.

So before mathematicians can work with Lean on the problems they really care about, they have to equip the program with more mathematics. Thats actually a relatively straightforward task.

Lean being able to understand something is pretty much just a matter of human beings having [translated math textbooks] into the form Lean can understand, Morrison said.

Unfortunately, straightforward doesnt mean easy, especially considering that for a lot of mathematics, textbooks dont really exist.

If you didnt study higher math, the subject probably seems exact and well-documented: Algebra I leads into algebra II, pre-calculus leads into calculus, and its all laid out right there in the textbooks, answer key in the back.

But high school and college math even a lot of graduate school math is a vanishingly small part of the overall knowledge. The vast majority of it is much less organized.

There are huge, important areas of math that have never been fully written down. Theyre stored in the minds of a small circle of people who learned their subfield of math from people who learned it from the person who invented it which is to say, it exists nearly as folklore.

There are other areas where the foundational material has been written down, but its so long and complicated that no one has been able to check that its fully correct. Instead, mathematicians simply have faith.

We rely on the reputation of the author. We know hes a genius and a careful guy, so it must be correct, said Patrick Massot of Paris-Saclay University.

This is one reason why proof assistants are so appealing. Translating mathematics into a language a computer can understand forces mathematicians to finally catalog their knowledge and precisely define objects.

Assia Mahboubi of the French national research institute Inria recalls the first time she realized the potential of such an orderly digital library: It was fascinating for me that one could capture, in theory, the whole mathematical literature by the sheer language of logic and store a corpus of math in a computer and check it and browse it using these pieces of software.

Lean isnt the first program with this potential. The first, called Automath, came out in the 1960s, and Coq, one of the most widely used proof assistants today, came out in 1989. Coq users have formalized a lot of mathematics in its language, but that work has been decentralized and unorganized. Mathematicians worked on projects that interested them and only defined the mathematical objects needed to carry their projects out, often describing those objects in unique ways. As a result, the Coq libraries feel jumbled, like an unplanned city.

Coq is an old man now, and it has a lot of scars, said Mahboubi, who has worked with the program extensively. Its been collaboratively maintained by many people over time, and it has known defects due to its long history.

In 2013, a Microsoft researcher named Leonardo de Moura launched Lean. The name reflects de Mouras desire to create a program with an efficient, uncluttered design. He intended the program to be a tool for checking the accuracy of software code, not mathematics. But checking the correctness of software, it turns out, is a lot like verifying a proof.

We built Lean because we care about software development, and there is this analogy between building math and building software, said de Moura.

When Lean came out, there were plenty of other proof assistants available, including Coq, which is the most similar to Lean the logical foundations of both programs are based on dependent type theory. But Lean represented a chance to start fresh.

Mathematicians gravitated to it quickly. They were such enthusiastic adopters of the program that they started to consume de Mouras time with their math-specific development questions. He got a bit sick of having to manage the mathematicians and said, How about you guys make a separate repository? said Morrison.

Mathematicians created that library in 2017. They called it mathlib and eagerly began to fill it with the worlds mathematical knowledge, making it a kind of 21st-century Library of Alexandria. Mathematicians created and uploaded pieces of digitized mathematics, gradually building a catalog for Lean to draw on. And because mathlib was new, they could learn from the limitations of older systems like Coq and pay extra attention to how they organized the material.

Theres a real effort to make a monolithic library of math in which all the pieces work with all the other pieces, said Macbeth.

The front page of mathlib features a real-time dashboard that charts the projects progress. It has a leaderboard of top contributors, ranked by the number of lines of code theyve created. Theres also a running tally of the total amount of mathematics that has been digitized: As of early October, mathlib contained 18,416 definitions and 38,315 theorems.

These are the ingredients that mathematicians can mix together in Lean to make mathematics. Right now, despite those numbers, its a limited pantry. It contains almost nothing from complex analysis or differential equations two basic elements of many fields of higher math and it doesnt know enough to even state any of the Millennium Prize problems, the Clay Mathematics Institutes list of the most important problems in mathematics.

But mathlib is slowly filling out. The work has the air of a barn raising. On Zulip, mathematicians identify definitions that need to be created, volunteer to write them and quickly provide feedback on each others work.

Any research mathematician can look at mathlib and see 40 things its missing, Macbeth said. So you decide to fill in one of those holes. It really is instant gratification. Someone else reads it and comments on it within 24 hours.

Many of the additions are small, as Sophie Morel of the cole Normale Suprieure in Lyon discovered during the Lean for the Curious Mathematician workshop this summer. The conference organizers gave the participants relatively simple mathematical statements to prove in Lean as practice. While working on one of them, Morel realized her proof called for a lemma a type of short steppingstone result that mathlib didnt have.

It was a very small thing about linear algebra that somehow wasnt yet there. The people who write mathlib try to be thorough, but you can never think of everything, said Morel, who coded the three-line lemma herself.

Other contributions are more momentous. For the last year, Gouzel has been working on a definition of smooth manifold for mathlib. Smooth manifolds are spaces like lines, circles and the surface of a ball that play a fundamental role in the study of geometry and topology. They also often feature in big results in areas like number theory and analysis. You couldnt hope to do most forms of mathematical research without defining one.

But smooth manifolds come in different guises, depending on the context. They can be finite-dimensional or infinite-dimensional, have boundary or not have boundary, and be defined over a variety of number systems, such as the real, complex or p-adic numbers. Defining a smooth manifold is almost like trying to define love: You know it when you see it, but any strict definition is likely to exclude some obvious instances of the phenomenon.

For a basic definition, you dont have any choice [for how you define it], Gouzel said. But with more complicated objects, there are maybe 10 or 20 different ways to formalize it.

Gouzel had to maintain a balancing act between specificity and generality. My rule was, I know 15 applications of manifolds that I wanted to be able to state, he said. But I didnt want the definition to be too general, because then you cannot work with it.

The definition he came up with fills 1,600 lines of code, making it pretty long for a mathlib definition, but maybe slight compared to the mathematical possibilities it unlocks in Lean.

Now that we have the language, we can start proving theorems, he said.

Finding the right definition for an object, at the right level of generality, is a major preoccupation of the mathematicians building mathlib. Its creators hope to define objects in a way thats useful now but flexible enough to accommodate the unanticipated uses mathematicians might have for these objects.

Theres an emphasis on everything being useful far into the future, Macbeth said.

But Lean isnt just useful it offers mathematicians the chance to engage with their work in a new way. Macbeth still remembers the first time she tried a proof assistant. It was 2019 and the program was Coq (though she uses Lean now). She couldnt put it down.

In one crazy weekend I spent 12 hours a day [on it], she said. It was totally addictive.

Other mathematicians talk about the experience the same way. They say working in Lean feels like playing a video game complete with the same reward-based neurochemical rush that makes it hard to put the controller down. You can do 14 hours a day in it and not get tired and feel kind of high the whole day, Livingston said. Youre constantly getting positive reinforcement.

Still, the Lean community recognizes that for many mathematicians, there just arent enough levels to play.

If you were to quantify how much of mathematics is formalized, Id say its way less than one-thousandth of one percent, said Christian Szegedy, an engineer at Google who is working on artificial intelligence systems that he hopes will be able to read and formalize math textbooks automatically.

But mathematicians are increasing the percentage. While today mathlib contains most of the content through second-year undergraduate math, contributors hope to add the rest of the curriculum within a few years a significant milestone.

In the 50 years these systems had existed, not one person had said, Lets sit down and organize a coherent body of mathematics that represents an undergraduate education, Buzzard said. Were making something that will understand the questions in an undergraduate final exam, and that has never been done before.

It will probably take decades before mathlib has the content of an actual research library, but Lean users have shown that such a comprehensive catalog is at least possible that getting there is merely a matter of programming in all the math.

To that end, last year Buzzard, Massot, and Johan Commelin of the University of Freiburg in Germany undertook an ambitious proof-of-concept project. They temporarily put aside the gradual accumulation of undergraduate math and skipped ahead to the vanguard of the field. The goal was to define one of the great innovations of 21st-century mathematics an object called a perfectoid space that was developed over the last decade by Peter Scholze of the University of Bonn. In 2018, the work earned Scholze the Fields Medal, maths highest honor.

Buzzard, Massot and Commelin hoped to demonstrate that, at least in principle, Lean can handle the kind of mathematics that mathematicians really care about. Theyre taking something very sophisticated and recent, and showing its possible to work on these objects with a proof assistant, Mahboubi said.

To define a perfectoid space, the three mathematicians had to combine more than 3,000 definitions of other mathematical objects and 30,000 connections between them. The definitions sprawled across many areas of math, from algebra to topology to geometry. The way they came together in the definition of a single object is a vivid illustration of the way math grows more complex over time and of why its so important to lay the foundations of mathlib correctly.

Many fields of advanced math require every kind of math you learn as an undergraduate, Macbeth said.

The trio succeeded in defining a perfectoid space, but for now at least, mathematicians cant do much with it. Lean needs access to much more mathematics before it can even formulate the kinds of sophisticated questions in which perfectoid spaces emerge.

Its a bit ridiculous that Lean knows what a perfectoid space is, but doesnt know complex analysis, Massot said.

Buzzard agrees, calling the formalization of perfectoid spaces a gimmick the kind of early stunt that new technologies sometimes perform to demonstrate their worth. In this case, it worked.

You shouldnt think that because of our work every mathematician around the Earth started to use a proof assistant, Massot said, but I think quite a few of them noticed and asked a lot of questions.

It will still be a long time before Lean is a real part of mathematical research. But that doesnt mean the program is a science fiction sideshow today. The mathematicians busy developing it see their work as akin to laying the first railroad tracks a necessary start to an important endeavor, even if they might never get to take a ride themselves.

It will be so cool that its worth a big time investment now, Macbeth said. Im investing time now so that somebody in the future can have that amazing experience.

Correction: October 2, 2020A previous version of this article misstated Patrick Massots university affiliation, due to a recent change in its name. The article has been revised accordingly.

Go here to see the original:

Building the Mathematical Library of the Future - Quanta Magazine

- FGLI library moves to Van Pelt with new online ordering and shipping options - The Daily Pennsylvanian - February 1st, 2021
- In brief: Block party to go, free tax prep help, women's business network, library programs and more in Hampton - TribLIVE - February 1st, 2021
- Area news in brief for Feb. 2 - The-review - February 1st, 2021
- Spotswood And Springfield Public Libraries Partner To Present Series Of Virtual Writing Classes This Winter - TAPinto.net - February 1st, 2021
- Bridging the digital divide: Broadband boost coming for northern Macomb County - The Macomb Daily - February 1st, 2021
- Curious about birding? Join Long Hill Library for virtual discussion Wednesday night - New Jersey Hills - February 1st, 2021
- Limited Reopening of Fairview Heights Library Announced - Herald Pubs - February 1st, 2021
- Birding class part of busy week at Brentwood Library - williamsonherald.com - February 1st, 2021
- Bio-Rad Launches Its Western Blot Learning Center, an Online Multi-media Library of Resources on Immunoblotting - The Scientist - February 1st, 2021
- Preserving Black culture on the web is topic for JC Librarys Urban Library Leaders - nj.com - February 1st, 2021
- Guest column: Love is in the air at the Pflugerville Library - Austin American-Statesman - February 1st, 2021
- COLUMN: Library has virtual programs for kids and teens - yoursun.com - February 1st, 2021
- Washington Twp. library gets funding to address climate change - nj.com - February 1st, 2021
- Ridgefield puts twist on Take Your Child to the Library Day - The Ridgefield Press - February 1st, 2021
- Dare County Library announces virtual enrichment series - The Coastland Times | The Coastland Times - The Coastland Times - February 1st, 2021
- OneBook Experience Virtual Book Discussion with Verde Valley Libraries | Sedona.Biz - The Internet Voice of Sedona and The Verde Valley - Sedona.biz - February 1st, 2021
- Local libraries see increase of digital checkouts during pandemic - The Oakland Press - February 1st, 2021
- Library Resources: They're online, free and 24-7 - coppercountrynews - December 8th, 2020
- Cuyahoga County Public Library tops Americas Libraries list, as other Northeast Ohio libraries awarded stars - cleveland.com - December 8th, 2020
- Flint Public Library at Courtland to reopen Wednesday, Dec. 9 with limited lobby service - East Village Magazine - December 8th, 2020
- Raising awareness about access to affordable textbooks | Around the O - AroundtheO - December 8th, 2020
- Euclid Public Library adds access to virtual tutoring to its online services - News-Herald.com - December 8th, 2020
- Flagstaff Public Library finds new ways to reach communities during the pandemic - Arizona Daily Sun - December 8th, 2020
- Stress Reduction Tips Offered in Rockland Library Presentation - Freepress Online - December 8th, 2020
- Poetry and Patronage: The Laubespine-Villeroy Library Rediscovered - Antiques and the Arts Online - December 8th, 2020
- Adelia M. Russell Library awarded technology grant for nearly one-third of residents without broadband - The Alexander City Outlook - December 8th, 2020
- North Dakota architecture, construction firms hired to help build Roosevelt library - Grand Forks Herald - December 8th, 2020
- Qatar National Library Making Sense of the Past in a Digital World - Al-Bawaba - December 8th, 2020
- The National Library does not understand its customers - The Star Online - December 8th, 2020
- Dozens of families come out in support of five Croydon libraries threatened with closure - London News Online - December 8th, 2020
- Library hosts online family activities in December | Limelighter - Herald and News - December 4th, 2020
- Online art auction to benefit Ronceverte Library | State & Region - Beckley Register-Herald - December 4th, 2020
- Hit by pandemic, libraries across cities are turning to a digital chapter - Business Standard - December 4th, 2020
- Brooke Library to show 'Rise of the Guardians' - The Steubenville Herald-Star - December 4th, 2020
- Ferndale Library has new slate of online programs for kids and their parents - Concentrate - December 4th, 2020
- Do you have a book in you? Garland librarys online creative writing courses could help - The Dallas Morning News - December 4th, 2020
- Lewis Library in Fontana is offering several digital events - Fontana Herald-News - December 4th, 2020
- The Making of the Modern Librarian: The Value of School Libraries - eSchool News - December 4th, 2020
- Library hosts online presentation 'FDR and the Jewish Question' - Hudson Valley 360 - December 4th, 2020
- LIBRARIES PRESENT INTRODUCTORY TECHNOLOGY HOW-TO VIDEOS ONLINE - Fort Bend Herald - December 4th, 2020
- Dont divorce her: Rabbis letter to Henry VIII at heart of British Library show - The Times of Israel - December 4th, 2020
- Digital Tools Are Revolutionizing Mental Health Care in the US - Harvard Business Review - December 4th, 2020
- Libraries host online holiday musical performance Dec. 5 | Arts And Entertainment | fbherald.com - Fort Bend Herald - December 4th, 2020
- LIBRARIES SUPPORT NEW WRITERS WITH ONLINE ACTIVITIES | Community | fbherald.com - Fort Bend Herald - December 4th, 2020
- Here is the latest Tier 3 restrictions information covering East Riding Museums and East Riding Libraries - Bridlington Free Press - November 30th, 2020
- The Stroller, Nov. 30, 2020: Events in the Alle-Kiski Valley - TribLIVE - November 30th, 2020
- Half of Districts Lack Connectivity Needed for Widespread Videoconferencing, Device Usage - Education Week - November 30th, 2020
- Aberdeenshire sports facilities and libraries to close over Christmas and New Year - Grampian Online - November 30th, 2020
- Smiley Public Library Young Readers Room: 100 years, generations of readers - Redlands Daily Facts - October 3rd, 2020
- Forest Public Library celebrates 10th year anniversary in new building - Scott County Times - October 3rd, 2020
- Online events for children, families and teens hosted by the Klamath County Library - Herald and News - October 3rd, 2020
- Library of Congress Seeks Cloud-Based Approach for Interacting With Digital Collections - MeriTalk - October 3rd, 2020
- Fort Bragg Seed Library launches fall and winter program - Mendocino Beacon - October 3rd, 2020
- Briggs Lawrence County Public Library to receive 40 hotspots - The Tribune - Ironton Tribune - October 3rd, 2020
- Spooktacular 2020, Going Online! | Sedona.Biz - The Internet Voice of Sedona and The Verde Valley - Sedona.biz - October 3rd, 2020
- Library receives grant to help with pandemic 'burdens' - The Clanton Advertiser - Clanton Advertiser - October 3rd, 2020
- Garland library to mark World Mental Health Day with Zoom event for teens - The Dallas Morning News - October 3rd, 2020
- Live, Online Assistance for Job Seekers, Veterans, and Their Families Paso Robles Press - The Paso Robles Press - October 3rd, 2020
- Friends of the Holliston Library announce upcoming programs - MetroWest Daily News - October 3rd, 2020
- Fulton County Library System Partners with PAPER to Offer Online Tutoring - City of South Fulton Observer - October 3rd, 2020
- Mayor reopens libraries with in-person services increased digital access - Continuing a phased reopen of City services prioritizing the health and... - October 3rd, 2020
- The Ames Library Partners to Digitize Medieval Manuscripts | Illinois Wesleyan - Illinois Wesleyan University - October 3rd, 2020
- Friends of the Library donates $50,000 for 50th anniversary - WBIR.com - October 3rd, 2020
- Calvert Library Foundation Hosts Online Auction - The Southern Maryland Chronicle - October 3rd, 2020
- Riverside County voters, heres where to cast ballots in the Nov. 3 election - Press-Enterprise - October 3rd, 2020
- Northland Public Library reopening with restrictions, virus safety protocols - TribLIVE - August 12th, 2020
- Westlake library proves no one has to be bored during the pandemic - cleveland.com - August 12th, 2020
- All Boston Public Library branches open for to-go during coronavirus pandemic; 62,800 items have been checked - MassLive.com - August 12th, 2020
- Want to learn more about the 2020 U.S. Census? Heres an online workshop - Redlands Daily Facts - August 12th, 2020
- What's happening - Times Herald-Record - August 12th, 2020
- Librarians work to broaden Vanderbilts research reputation with Wikidata tools - Vanderbilt University News - August 12th, 2020
- Challenge of archiving the #MeToo movement Harvard Gazette - Harvard Gazette - August 12th, 2020
- New Collins Library grant is part of $40 million in CARES Act funding from the NEH - The Suburban Times - August 12th, 2020
- Garland libraries are reaching residents online during the coronavirus pandemic - The Dallas Morning News - August 12th, 2020
- ACCESS the Internet Act to enable libraries to lessen broadband gap in rural and low-income communities - ala.org - August 12th, 2020
- Tom Green County Library to receive $50k grant to create wifi hot spots - Standard-Times - August 12th, 2020
- New 1,200-book collection magically appears at the UNO library - Omaha World-Herald - August 12th, 2020
- London-based startup BibliU raises 550K Series A extension to bring digital libraries to more Universities - EU-Startups - August 12th, 2020
- Rockland libraries offer online summer reading programs - The Journal News - June 25th, 2020
- Library events to be online this summer - Vestavia Voice - June 25th, 2020