Liveblog: Open-sourcing Twitter's algorithm

Sourcegraph team, Sourcegraph Discord community

Sourcegraph devs (and our Discord community) will be liveblogging the most interesting things we see in Twitter's algorithm once the code is published. Follow along here for updates!



02:02pm PDT

We are signing off for now. Check out the following:

01:43pm PDT

Link to code

01:19pm PDT

View the code

12:43pm PDT

Link to code


Link to code


12:24pm PDT

  • Precise code navigation is now on! Example
  • Cody codebase exploration
    • We are using Cody to explore the codebase. It quickly found the search indexer, which handles about half of the tweets


12:14pm PDT: lines of code (LOC)

--------------------------------------------------------------------------------
 Language             Files        Lines        Blank      Comment         Code
--------------------------------------------------------------------------------
 Scala                 3007       234531        26038        21493       187000
 Java                  1043       135517        19944        18259        97314
 Python                 152        21817         3561         5681        12575
 C++                     51        10614         1630          466         8518
 Rust                    30         7360          404          275         6681
 Protobuf                90         9456         1484         4514         3458
 C/C++ Header            41         2868          482          377         2009
 Markdown                63         2136          538            0         1598
 SQL                     23         1262           98           82         1082
 YAML                     7         1446          376           19         1051
 XML                      8         1263          175          190          898
 Bourne Shell             9          267           65           29          173
 Toml                     4          124            7            3          114
 reStructuredText         1          132           36            0           96
 CMake                    2          115           21            7           87
 INI                      8           76           15           21           40
 Docker                   1           34            3            6           25
 JSON                     1            5            0            0            5
--------------------------------------------------------------------------------
 Total                 4541       429023        54877        51422       322724
--------------------------------------------------------------------------------
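To put the table in proportion, here is a minimal sketch that turns the "Code" column into percentage shares. The counts for the top five languages and the total are copied from the table above; the names `CODE_LINES`, `TOTAL_CODE_LINES`, `code_share`, and `share_table` are ours, made up for illustration.

```python
from typing import Dict

# "Code" column for the top five languages, copied from the table above.
CODE_LINES: Dict[str, int] = {
    "Scala": 187000,
    "Java": 97314,
    "Python": 12575,
    "C++": 8518,
    "Rust": 6681,
}

# "Total" row, "Code" column, also copied from the table above.
TOTAL_CODE_LINES = 322724


def code_share(lines: int, total: int = TOTAL_CODE_LINES) -> float:
    """Return one language's share of all code lines, as a percentage."""
    return 100.0 * lines / total


def share_table(counts: Dict[str, int]) -> Dict[str, float]:
    """Map each language to its share of code lines, rounded to one decimal."""
    return {lang: round(code_share(n), 1) for lang, n in counts.items()}


if __name__ == "__main__":
    for lang, pct in share_table(CODE_LINES).items():
        print(f"{lang:<8}{pct:5.1f}%")
```

By this count, Scala and Java together make up roughly 88% of the code lines in the release.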

12:04pm PDT: communications

Blog post TL;DR (thank you Cody)

  • Twitter is releasing source code for parts of its platform, including its recommendations algorithm
  • The source code is being released on GitHub in two repositories: main repo and ml repo
  • The release aims for maximum transparency while excluding code that could compromise safety/privacy or enable bad actors
  • Training data and model weights for the recommendations algorithm are not being released at this time
  • This is Twitter's first step towards more transparency; it plans to release more code in the future, as long as that code does not pose significant risks
  • The community is invited to submit GitHub issues and pull requests to suggest improvements to the recommendations algorithm
  • Twitter is working on tools to manage community suggestions and sync changes to internal repositories
  • Security concerns or issues should be reported through Twitter's official bug bounty program on HackerOne
  • Twitter hopes the global community can help identify issues and suggest improvements to lead to a better Twitter
  • Twitter is doing this to increase transparency and build trust with users, customers, and the public


11:50am PDT: code pushed

The code is now live: https://sourcegraph.com/github.com/twitter/the-algorithm

We are digging in!


2:15am PDT: start the countdown

@sqs: A little under 10 hours to go until it's open source. We'll be back closer to 12pm PDT (unless Twitter unexpectedly releases it early, which might happen!). If you want to start exploring the rest of Twitter's open-source code in the meantime, here's a starting point.
