In 1996 a pair of friends wrote a program in their dorm room that crawled, cataloged and generally organized what was, at that time, the modest expanse of the internet. Backrub, as it was called then, got a small investment, moved into a garage and became Google.
Today, Google (along with its parent company, Alphabet) is, in many ways, the backbone of the internet, the trunk from which the branches of the world wide web grow and one of the largest hubs of information that has ever existed. Yet although its software is so ingrained in daily life that it has become functionally invisible, the most basic things that Google does (and those that allow it to generate more revenue than many countries) remain a mystery to most of us.
At its simplest, Google is a search engine that functions by performing three basic tasks: crawling the internet, indexing content, and, upon command, retrieving what’s been indexed. In action, Google’s software essentially visits every webpage that’s ever been linked to (crawling), makes a copy of the page (indexing) and then promptly repeats, following every link on that page, making a copy of those pages and following every new link ad infinitum. This indexing process generates massive amounts of data (dated estimates suggest that Google stores some 15 exabytes, which is 15 million terabytes or roughly 30 million personal computers’ worth, at any given time). This inundation of data makes Google’s ability to retrieve search results in a fraction of a second all the more impressive. It’s also why the final function of a search engine is arguably the most important: the retrieval algorithm.
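For the curious, here is a toy sketch of that crawl, index and retrieve loop in Python. It is nothing like Google’s real software: the four-page “web” below is invented, the page names and the `crawl_and_index` and `retrieve` helpers are hypothetical, and the “index” is just a dictionary mapping each word to the pages that contain it.

```python
from collections import deque

# Hypothetical four-page "web": each URL maps to (page text, outgoing links).
TOY_WEB = {
    "a.example": ("boots for hiking", ["b.example", "c.example"]),
    "b.example": ("hiking trails and maps", ["c.example"]),
    "c.example": ("leather boots on sale", ["a.example"]),
    "d.example": ("an orphan page nobody links to", []),
}


def crawl_and_index(seed):
    """Follow links breadth-first from the seed page and build an index
    that maps each word to the set of URLs whose text contains it."""
    index = {}
    seen = {seed}
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        text, links = TOY_WEB[url]
        # Indexing: record every word that appears on this page.
        for word in text.split():
            index.setdefault(word, set()).add(url)
        # Crawling: queue up every link we have not visited yet.
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index


def retrieve(index, query):
    """Return the URLs that contain every word of the query, sorted by name."""
    word_sets = [index.get(word, set()) for word in query.split()]
    if not word_sets:
        return []
    return sorted(set.intersection(*word_sets))


index = crawl_and_index("a.example")
print(retrieve(index, "boots"))
print(retrieve(index, "orphan"))
```

The second search comes back empty because nothing links to `d.example`, so the crawler never finds it. A page that is never linked to is, for this toy crawler at least, invisible.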
Google’s algorithm is both beautiful and terrifying. Pared down to generalities, when you enter a search query, Google sorts the results by two factors, relevance and rank, using algorithms that famously include one known as PageRank, which scores a page by the links pointing to it. But nothing is as simple as it sounds, especially not online. Google’s way of measuring relevance and rank is shockingly personal, and it’s likely that no one knows us as well as Google does.
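To make the “rank” half a little less mysterious, here is a simplified sketch of the PageRank idea as it was originally published: a page matters more when pages that matter link to it. This is an illustration only, not how Google ranks results today. The four-page link graph and the `pagerank` function are made up for this post, and 0.85 is the damping value commonly cited from the original paper.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate PageRank by repeatedly passing each page's score
    along its outgoing links (simple power iteration)."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                # Split this page's score evenly among the pages it links to.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page with no links spreads its score over every page.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: each page maps to the pages it links to.
LINKS = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}

ranks = pagerank(LINKS)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this made-up graph, page `c` ends up on top because three other pages link to it, while `d`, which nobody links to, keeps only the baseline score. None of the personal signals described in this post appear here; this is the link-counting core and nothing more.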
Odds are, if you are like me, when you are logged onto your private computer, you’re logged into Gmail, which means you’re logged into Google, which means that every time you search something, Google uses an algorithm developed specially for you, based on billions of factors — your search history, your browser history, your shopping habits, where you are, where you have been, what devices you are using, your demographic, your family’s demographic and dozens or hundreds or thousands of other factors that we (the unprivy internet novices) don’t even understand are important, but that Google has thought to track. And these factors and the search results they generate create a sort of personalized internet. An internet not so cloistered as the “social media bubble” that troubled so many after 2016’s elections, but one that nonetheless holds the power to skew our perception of knowledge and information.
Which brings me to a common personal refrain: Should I be alarmed? I love how seamlessly Google does everything I ask of it, and the collection of data is what makes their service work: They know everything I could ever need to know before I even ask. It’s delightful and unsettling all at once. (The My Activity page that brazenly packages everything you do on the internet as a sort of personal convenience is a small example of just how much power Google knows that it holds.)
And Google, of course, is just one of many. Our devices and desire for constant connectivity have bulldozed a path for dozens of innocuous-seeming services to make hundreds of billions of dollars off of us, or rather off of the information that makes us individuals. All of it is bought and sold thousands of times over so that when we open a page we see an ad and suddenly desire a new pair of boots, even though we just bought the exact pair we thought we wanted.
Admittedly, Google might have been a lot to bite off for the first of what will hopefully be many blog posts throughout the year. But I guess I’m hoping to wrestle with things, with my apathy and my doomsayer inclinations, and, more broadly, to understand and engage with the many great unknowns of the world. And what is more unknown than everything about me that has been crawled, indexed, broken down to ones and zeroes and stored on some server in a faraway state to sell me a new pair of boots?