Liveblog: Open-sourcing Twitter's algorithm
Sourcegraph team, Sourcegraph Discord community
Sourcegraph devs (and our Discord community) will be liveblogging the most interesting things we see once it's published. Follow along here for updates!
02:02pm PDT
We are signing off for now. Check out the following:
01:43pm PDT
Government requests for intervention on Twitter must have been so pervasive Twitter Engineers even have a class for it in the Twitter Algorithm pic.twitter.com/F05sD5h9Lk
— Alec Sears (@alec_sears) March 31, 2023
01:19pm PDT
Twitter just released source code for "the algorithm"
— Ólafur Waage (@olafurw) March 31, 2023
Oh, what file is this? Predicates for tweets on the home timeline?
Oh what is that 2nd image? pic.twitter.com/UE3dU8e3Os
What is this?
— David Mander (@davmander) March 31, 2023
    (
      "author_is_elon",
      candidate =>
        candidate
          .getOrElse(AuthorIdFeature, None).contains(candidate.getOrElse(DDGStatsElonFeature, 0L))),
https://t.co/mLdjWWYHrF
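To show what a predicate like the one above evaluates, here is a minimal, self-contained Scala sketch. It uses a simplified, hypothetical typed feature map: `Feature`, `Candidate`, `Features` and `Predicates.matching` are illustrative names, not Twitter's actual product-mixer API. It also assumes that `AuthorIdFeature` holds an optional author ID and that `DDGStatsElonFeature` holds a single user ID to compare against. Under those assumptions, the predicate is true only when the candidate's author ID is present and equal to that ID.

```scala
// Hypothetical, simplified stand-in for a typed feature map (not Twitter's real API).
final case class Feature[V](name: String)

final case class Candidate(features: Map[Feature[_], Any]) {
  // Look up a feature value, falling back to `default` when the feature is absent.
  def getOrElse[V](feature: Feature[V], default: V): V =
    features.get(feature).map(_.asInstanceOf[V]).getOrElse(default)
}

object Features {
  // Assumed types: the author ID may be missing; the comparison ID is a plain Long.
  val AuthorIdFeature: Feature[Option[Long]] = Feature("author_id")
  val DDGStatsElonFeature: Feature[Long] = Feature("ddg_stats_elon")
}

object Predicates {
  import Features._

  // Named predicates in the same (name, Candidate => Boolean) shape as the quoted snippet.
  val predicates: Seq[(String, Candidate => Boolean)] = Seq(
    (
      "author_is_elon",
      (candidate: Candidate) =>
        candidate
          .getOrElse(AuthorIdFeature, None)
          .contains(candidate.getOrElse(DDGStatsElonFeature, 0L))
    )
  )

  // Names of every predicate the candidate satisfies.
  def matching(candidate: Candidate): Seq[String] =
    predicates.collect { case (name, p) if p(candidate) => name }
}
```

With `Features._` imported, `Predicates.matching(Candidate(Map[Feature[_], Any](AuthorIdFeature -> Some(42L), DDGStatsElonFeature -> 42L)))` returns `Seq("author_is_elon")` (42 is a placeholder ID). `Option.contains` compares against the wrapped value, so a candidate with no author ID falls back to `None` and evaluates to `false` instead of throwing.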
12:43pm PDT
The 4 types of Twitter posters, according to the just open-sourced algorithm 😯 https://t.co/xTLX77vJ75 pic.twitter.com/SaQN03P9eK
— Amjad Masad ⠕ (@amasad) March 31, 2023
A quick search in Twitter's Recommendation Algorithm for Ukraine. 🇺🇦 topic is on the same list as:
Do not amplify, do not public publish, medical misinformation, NSFW, and violence. What do you think it means? 🤔 pic.twitter.com/PYqm8pZjI4
— Mykhailo (@mxpoliakov) March 31, 2023
12:24pm PDT
- Precise code navigation is now on! Example
- Cody codebase exploration: Cody quickly found the search indexer, which sources the in-network Tweets (from accounts you follow) that make up roughly half of the For You timeline
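Twitter's blog post describes the For You timeline as roughly half in-network and half out-of-network Tweets on average. The sketch below is a hypothetical illustration of the simplest way to enforce such a split: it alternates between two already-ranked candidate lists. `CandidateMixing.interleave` is an invented name, and this logic is not taken from Twitter's code.

```scala
object CandidateMixing {
  // Hypothetical helper: alternate between two ranked lists, starting with in-network,
  // until `size` items are taken. If one list runs out, the other fills the remaining slots.
  def interleave[A](inNetwork: Seq[A], outOfNetwork: Seq[A], size: Int): Seq[A] = {
    val inIter = inNetwork.iterator
    val outIter = outOfNetwork.iterator
    val result = Seq.newBuilder[A]
    var taken = 0
    var inNetworkTurn = true
    while (taken < size && (inIter.hasNext || outIter.hasNext)) {
      // Take from the source whose turn it is, or from the other one if that source is empty.
      val next =
        if (inNetworkTurn && inIter.hasNext) inIter.next()
        else if (!inNetworkTurn && outIter.hasNext) outIter.next()
        else if (inIter.hasNext) inIter.next()
        else outIter.next()
      result += next
      taken += 1
      inNetworkTurn = !inNetworkTurn
    }
    result.result()
  }
}
```

For example, `interleave(Seq("i1", "i2", "i3"), Seq("o1", "o2", "o3"), 4)` yields `i1, o1, i2, o2`. A fixed alternation is only one design choice. A system that scores all candidates together would produce a mix that varies from user to user, which fits the blog's "on average" framing.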
12:14pm PDT: lines of code (LOC)
--------------------------------------------------------------------------------
Language               Files     Lines     Blank   Comment      Code
--------------------------------------------------------------------------------
Scala                   3007    234531     26038     21493    187000
Java                    1043    135517     19944     18259     97314
Python                   152     21817      3561      5681     12575
C++                       51     10614      1630       466      8518
Rust                      30      7360       404       275      6681
Protobuf                  90      9456      1484      4514      3458
C/C++ Header              41      2868       482       377      2009
Markdown                  63      2136       538         0      1598
SQL                       23      1262        98        82      1082
YAML                       7      1446       376        19      1051
XML                        8      1263       175       190       898
Bourne Shell               9       267        65        29       173
Toml                       4       124         7         3       114
reStructuredText           1       132        36         0        96
CMake                      2       115        21         7        87
INI                        8        76        15        21        40
Docker                     1        34         3         6        25
JSON                       1         5         0         0         5
--------------------------------------------------------------------------------
Total                   4541    429023     54877     51422    322724
--------------------------------------------------------------------------------
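The line-count table is easy to sanity-check. In each row, Blank + Comment + Code should equal Lines, and each column should sum to the Total row. The Scala sketch below re-enters the numbers from the table and performs both checks. It also computes each language's share of code lines. `LocTableCheck` and its members are illustrative names only.

```scala
object LocTableCheck {
  // One row of the line-count table.
  final case class Row(language: String, files: Int, lines: Int, blank: Int, comment: Int, code: Int)

  // Numbers copied from the table above.
  val rows: Seq[Row] = Seq(
    Row("Scala", 3007, 234531, 26038, 21493, 187000),
    Row("Java", 1043, 135517, 19944, 18259, 97314),
    Row("Python", 152, 21817, 3561, 5681, 12575),
    Row("C++", 51, 10614, 1630, 466, 8518),
    Row("Rust", 30, 7360, 404, 275, 6681),
    Row("Protobuf", 90, 9456, 1484, 4514, 3458),
    Row("C/C++ Header", 41, 2868, 482, 377, 2009),
    Row("Markdown", 63, 2136, 538, 0, 1598),
    Row("SQL", 23, 1262, 98, 82, 1082),
    Row("YAML", 7, 1446, 376, 19, 1051),
    Row("XML", 8, 1263, 175, 190, 898),
    Row("Bourne Shell", 9, 267, 65, 29, 173),
    Row("Toml", 4, 124, 7, 3, 114),
    Row("reStructuredText", 1, 132, 36, 0, 96),
    Row("CMake", 2, 115, 21, 7, 87),
    Row("INI", 8, 76, 15, 21, 40),
    Row("Docker", 1, 34, 3, 6, 25),
    Row("JSON", 1, 5, 0, 0, 5)
  )

  val total: Row = Row("Total", 4541, 429023, 54877, 51422, 322724)

  // A row is consistent when blank + comment + code lines add up to its total lines.
  def rowIsConsistent(r: Row): Boolean = r.blank + r.comment + r.code == r.lines

  // Each numeric column, summed over all languages, should match the Total row.
  def totalsMatch: Boolean =
    rows.map(_.files).sum == total.files &&
      rows.map(_.lines).sum == total.lines &&
      rows.map(_.blank).sum == total.blank &&
      rows.map(_.comment).sum == total.comment &&
      rows.map(_.code).sum == total.code

  // Fraction of all code lines (blanks and comments excluded) written in one language.
  def codeShare(language: String): Option[Double] =
    rows.find(_.language == language).map(_.code.toDouble / total.code)
}
```

Every check passes for the published numbers. `codeShare("Scala")` comes out to about 0.58, so Scala accounts for a little under 60% of the code lines.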
12:04pm PDT: communications
Twitter recommendation source code now available to all on GitHub https://t.co/9ozsyZANwa
— Elon Musk (@elonmusk) March 31, 2023
The real magic of Twitter is in our recommendations algorithm, which powers the hit Tweets you see in your For You timeline. We broke down how it all works here: https://t.co/2s5Hk57JPe
— Twitter Engineering (@TwitterEng) March 31, 2023
Blog post TL;DR (thank you Cody)
- Twitter is releasing source code for parts of its platform, including its recommendations algorithm
- The source code is being released on GitHub in two repositories: the main repo (twitter/the-algorithm) and the ML repo (twitter/the-algorithm-ml)
- The release aims for maximum transparency while excluding code that could compromise safety/privacy or enable bad actors
- Training data and model weights for the recommendations algorithm are not being released at this time
- This is Twitter's first step towards more transparency and they plan to release more code in the future that does not pose significant risks
- The community is invited to submit GitHub issues and pull requests to suggest improvements to the recommendations algorithm
- Twitter is working on tools to manage community suggestions and sync changes to internal repositories
- Security concerns or issues should be reported through Twitter's official bug bounty program on HackerOne
- Twitter hopes the global community can help identify issues and suggest improvements to lead to a better Twitter
- Twitter is doing this to increase transparency and build trust with users, customers, and the public
11:50am PDT: code pushed
The code is now live: https://sourcegraph.com/github.com/twitter/the-algorithm
We are digging in!
2:15am PDT: start the countdown
@sqs: A little under 10 hours to go until it's open source. We'll be back closer to 12pm PDT (unless Twitter unexpectedly releases it early, which might happen!). If you want to start exploring the rest of Twitter's open-source code in the meantime, here's a starting point.