<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en"><generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator><link href="https://blog.faraji.info/feed.xml" rel="self" type="application/atom+xml"/><link href="https://blog.faraji.info/" rel="alternate" type="text/html" hreflang="en"/><updated>2026-06-15T23:14:35+00:00</updated><id>https://blog.faraji.info/feed.xml</id><title type="html">Ali’s Blog</title><subtitle>(Not limited to) Computer Science and Math stuff </subtitle><entry><title type="html">Nested Learning: a new HOPE for AI</title><link href="https://blog.faraji.info/2025/nested-learning/" rel="alternate" type="text/html" title="Nested Learning: a new HOPE for AI"/><published>2025-11-29T23:00:00+00:00</published><updated>2025-11-29T23:00:00+00:00</updated><id>https://blog.faraji.info/2025/nested-learning</id><content type="html" xml:base="https://blog.faraji.info/2025/nested-learning/"><![CDATA[<p>For the last decade, we’ve been operating under a specific mental model: Deep Learning is about stacking layers. We treat these layers (Attention, MLPs, RNNs) as static “processors” that information flows through. We train them once, freeze them, and then expect them to handle the world.</p> <p>But a new paper from Google Research, presented at NeurIPS 2025, argues that this view is an illusion.</p> <p>The paper, titled <strong>“The Illusion of Deep Learning Architectures”</strong> <sup id="fnref:NL_paper"><a href="#fn:NL_paper" class="footnote" rel="footnote" role="doc-noteref">1</a></sup>, proposes a radical shift: <strong>Neural networks aren’t just stacks of filters; they are systems of nested optimization problems</strong>. 
Every layer in your model is actually a little “learner” trying to solve its own objective function in real time.</p> <p>If you’ve ever wondered why Transformers are so good or how we can make models that actually <em>learn</em> after training, this paper offers a mathematical explanation that is surprisingly elegant.</p> <h2 id="the-core-insight-architecture--optimization">The Core Insight: Architecture = Optimization</h2> <p>The most mind-bending part of the paper is the proof that well-known architectures are mathematically identical to well-known optimization algorithms.</p> <p>Let’s look at <strong>Linear Attention</strong>, a simplified version of a standard component in modern efficient Transformers. Its update rule looks like this:</p> \[\mathcal{M}_{t} = \mathcal{M}_{t-1} + v_t k_t^\top\] <p>Here, $\mathcal{M}$ is the memory state, and it updates by adding the outer product of the current value ($v$) and key ($k$).</p> <p>Now, let’s look at <strong>Gradient Descent (GD)</strong>, the algorithm we use to train networks. The update rule for weights ($W$) is:</p> \[W_{t+1} = W_t - \eta \nabla \mathcal{L}\] <p>The researchers asked a simple question: <strong>What if the Attention layer is actually running Gradient Descent on itself?</strong></p> <p>They define an “Associative Memory” objective, a loss function that measures how well the memory maps keys to values:</p> \[\mathcal{L}(\mathcal{M}) = - \langle \mathcal{M} k_t, v_t \rangle\] <p>If you calculate the gradient of this loss with respect to the memory $\mathcal{M}$, you get $-v_t k_t^\top$. 
Plug that into the Gradient Descent formula (assuming a learning rate $\eta=1$), and you get:</p> \[\mathcal{M}_{t+1} = \mathcal{M}_t - (-v_t k_t^\top) = \mathcal{M}_t + v_t k_t^\top\] <p><strong>Up to a shift in the memory’s time index, this is the exact same formula as Linear Attention</strong>.</p> <p>This shows that a Linear Attention layer isn’t just a static block; it is a dynamic solver running one step of Gradient Descent on every token to “learn” the current context.</p> <h2 id="from-heuristic-to-hope">From “Heuristic” to “Hope”</h2> <p>Once you realize that <strong>Architecture is Optimization</strong>, you can stop guessing new architectures and start <em>deriving</em> them.</p> <p>If Linear Attention is equivalent to basic Gradient Descent (which we know is kind of slow and clunky), why not upgrade the “inner optimizer”?</p> <ul> <li>What if we used <strong>Momentum</strong>? The paper shows that Momentum acts as a “Meta-Memory” that compresses gradients.</li> <li>What if we used <strong>Adam</strong>?</li> <li>What if we used a <strong>Neural Network</strong> to predict the updates?</li> </ul> <p>This line of thinking led the authors to develop <strong>HOPE</strong>, a new architecture that treats the model as a “Continuum Memory System”.</p> <h2 id="the-hope-architecture-mimicking-brain-waves">The HOPE Architecture: Mimicking Brain Waves</h2> <p>Current LLMs suffer from “Anterograde Amnesia”. They can process the immediate context (short-term) and remember their pre-training (long-term), but they can’t form new long-term memories after deployment.</p> <p>HOPE addresses this by mimicking the human brain’s oscillations (brain waves). 
The brain doesn’t have one clock; it has fast Gamma waves for immediate processing and slow Delta waves for long-term consolidation.</p> <p>HOPE implements this mathematically by nesting optimization loops with different “frequencies”:</p> <ol> <li><strong>High-Frequency Neurons:</strong> These update rapidly (like standard attention) to handle the immediate token-by-token flow.</li> <li><strong>Low-Frequency Neurons:</strong> These are “slow learners” that only update every $C$ steps, consolidating information into long-term storage.</li> </ol> <p>This structure allows the model to be <strong>Self-Modifying</strong>. It doesn’t just read data; it fundamentally rewrites its own parameters on the fly, allowing for true continual learning without catastrophic forgetting.</p> <h2 id="why-this-matters">Why This Matters</h2> <p>This paper is exciting because it moves us away from the black-box alchemy of “adding more layers”. It gives us a white-box theoretical framework: <strong>if you want a better architecture, design a better optimizer</strong>.</p> <p>By unifying the math of <em>how we learn</em> (optimization) with <em>what we build</em> (architecture), Nested Learning suggests that the next generation of AI won’t just be bigger, it will be smarter about how it manages its own mind.</p> <p>– Ali</p> <h2 id="references">References</h2> <div class="footnotes" role="doc-endnotes"> <ol> <li id="fn:NL_paper"> <p><a href="https://openreview.net/forum?id=nbMeRvNb7A">Nested Learning: The Illusion of Deep Learning Architectures (NeurIPS 2025)</a> <a href="#fnref:NL_paper" class="reversefootnote" role="doc-backlink">&#8617;</a></p> </li> </ol> </div>]]></content><author><name></name></author><category term="AI"/><category term="Machine Learning"/><category term="Research"/><category term="Nested Learning"/><category term="AI"/><category term="Machine Learning"/><category term="Research"/><category term="Optimization"/><category term="Continiual Learning"/><summary 
type="html"><![CDATA[A team of researchers has proposed Nested Learning, a view of neural networks in which optimization and architecture are the same thing, and has used it to develop a method for updating a model's knowledge without breaking existing capabilities.]]></summary></entry><entry><title type="html">From Words to Worlds: Dreaming of a Universal Language of Space-Time</title><link href="https://blog.faraji.info/2025/word-to-world/" rel="alternate" type="text/html" title="From Words to Worlds: Dreaming of a Universal Language of Space-Time"/><published>2025-09-17T03:57:32+00:00</published><updated>2025-09-17T03:57:32+00:00</updated><id>https://blog.faraji.info/2025/word-to-world</id><content type="html" xml:base="https://blog.faraji.info/2025/word-to-world/"><![CDATA[<p>In just a few years, large language models (LLMs) have transformed the AI landscape. From chatbots that write essays to systems that translate dozens of languages in real time, the field of natural language processing feels like it has blossomed into maturity. Text, with its neat sequence of tokens, has proven to be a wonderfully uniform substrate for machine learning. Give a model enough words, and it learns to reason, summarize, and even create. But what about the world beyond words, the world we move through, build on, and change every day?</p> <h2 id="the-uneven-map-of-map-ai">The Uneven Map of “Map AI”</h2> <p>When it comes to spatial and temporal data such as maps, mobility traces, and satellite images, progress is far less tidy. We have excellent point solutions: routing apps, traffic forecasting, crop-yield estimators, wildfire monitors. Yet the field lacks the shared foundation that NLP enjoys. 
Datasets are fragmented, privacy concerns limit data sharing, and the problems themselves span wildly different scales.</p> <p>Think about the challenges still wide open:</p> <ul> <li>Predicting city-scale traffic during an unexpected event.</li> <li>Updating a map in real time after a hurricane destroys infrastructure.</li> <li>Forecasting the spread of an epidemic using only anonymized mobility signals.</li> <li>Monitoring illegal deforestation from noisy satellite streams.</li> <li>Taking highly noisy GPS traces (the spatial equivalent of text full of spelling mistakes), recovering the actual path taken on the map, and still performing the target task well.</li> </ul> <h2 id="a-question-of-representation">A Question of Representation</h2> <p>In language, the breakthrough came from <strong>representation</strong>: words and sub-words as sequences of tokens. Could we find an equally universal representation for the physical world? One where, once the model has “read” enough of it, we can ask complex questions about traffic, climate, and migration, and receive grounded answers?</p> <p>That’s the dream: a <em>language of space-time</em>.</p> <h2 id="sketching-a-language-of-space-time">Sketching a Language of Space-Time</h2> <p>Imagine a representation $\mathcal{R}$ that treats <strong>space and time as first-class citizens</strong> and combines:</p> <ol> <li><strong>Objects</strong> – discrete entities like roads, buildings, rivers, or even people, each with geometry and attributes.</li> <li><strong>Fields</strong> – continuous measurements such as temperature, elevation, air quality.</li> <li><strong>Events</strong> – point occurrences anchored in space and time: an accident, a flood, a festival.</li> <li><strong>Relations</strong> – topological or metric links: adjacency, containment, causality.</li> <li><strong>Dynamics</strong> – how objects and fields evolve over time, capturing motion, growth, decay.</li> </ol> <p>Think of $\mathcal{R}$ as a <em>typed spatio-temporal knowledge graph</em> 
where a city’s road network, the shifting jet stream, and a person’s walking trajectory all live in one coherent structure.</p> <p>On top of this, we could define tasks just as NLP defines next-token prediction:</p> <ul> <li><strong>Prediction:</strong> forecast future states of $\mathcal{R}$ (traffic, weather, population flows).</li> <li><strong>Reasoning:</strong> answer complex spatial queries or infer causes of observed patterns.</li> <li><strong>Optimization:</strong> search for interventions, such as rerouting traffic or allocating emergency resources, that reshape $\mathcal{R}$ toward a goal.</li> </ul> <h2 id="just-a-thought-for-now">Just a Thought, for Now</h2> <p>This is only an idea, a sketch of what might be possible. But if language models taught us anything, it’s that a powerful, shared representation can ignite an entire field. Perhaps one day we’ll have a “universal geo-time model” that understands our planet as fluently as today’s LLMs understand text.</p> <p>I understand that language is a symbolic, one-dimensional representation, while the world is continuous and multi-dimensional. Maybe even comparing the two is like comparing apples and oranges. But the success of LLMs makes me wonder if we can find a middle ground.</p> <p>For now, it remains a dream I hope to explore and maybe, someday, help bring to life.</p> <p>– Ali</p>]]></content><author><name></name></author><category term="Machine Learning"/><category term="Representation Learning"/><category term="Language Models"/><category term="Universal Representation"/><category term="Spatio-Temporal Data"/><category term="AI"/><summary type="html"><![CDATA[In just a few years, large language models (LLMs) have transformed the AI landscape. From chatbots that write essays to systems that translate dozens of languages in real time, the field of natural language processing feels like it has blossomed into maturity. 
Text, with its neat sequence of tokens, has proven to be a wonderfully uniform substrate for machine learning. Give a model enough words, and it learns to reason, summarize, and even create. But what about the world beyond words, the world we move through, build on, and change every day?]]></summary></entry><entry><title type="html">Game Theory: Fair Division</title><link href="https://blog.faraji.info/2025/fair-division/" rel="alternate" type="text/html" title="Game Theory: Fair Division"/><published>2025-08-17T01:29:04+00:00</published><updated>2025-08-17T01:29:04+00:00</updated><id>https://blog.faraji.info/2025/fair-division</id><content type="html" xml:base="https://blog.faraji.info/2025/fair-division/"><![CDATA[<h2 id="introduction">Introduction</h2> <p>I’ve always wanted to write about game theory but never did. Recently, I started flipping through my old game theory books and came across mechanism design. Today, I’ll mention something that I think you can also use in your life: division problems and their different variations.</p> <p>A good reference is Game Theory, Alive<sup id="fnref:alive"><a href="#fn:alive" class="footnote" rel="footnote" role="doc-noteref">1</a></sup>, which also covers Stable Matching, Voting, Auctions, Truthfulness, and more. But today, I want to focus on fair division.</p> <h2 id="cake-cutting-problem">Cake-Cutting Problem</h2> <p>Let’s assume there is a cake we want to divide between two people. What do we do? What is the best way to divide it? The classic answer: one person cuts and the other one picks. Neither of them has a reason to be upset: each gets a piece worth at least $1/2$ of the cake by their own valuation (that is, $1/n$ with $n=2$).</p> <p>Let me give an example. I love the cherries on the cake and I’d say that’s the best part you can get. So I divide it so that one cherry is on one side and the other cherry is on the other side. Now I let my friend choose. 
If he has the same taste, he can get half of the best part and I will too; but if he doesn’t like cherries, he can choose whichever part makes him happier. So he will definitely get half or more of what he values.</p> <p>This gets a little bit tricky when $n \ge 3$. What should we do then so we can still guarantee fairness for any number of people, and what kind of procedure would work when everyone values the cake differently? <strong>Think about it before jumping to the end of the post</strong> 😛</p> <p>Mathematically speaking, suppose we want to divide the continuous range $[0,1] \subset \mathbb{R}$ (the cake) and each of $n$ participants has a valuation function $v_i$ for the cake. If the following condition holds, we can say it is a fair division:</p> <p><strong>Fair (proportional) division condition:</strong> find a partition $(r_1,\dots,r_n)$ of $[0,1]$ such that</p> <p>\begin{equation} v_i(r_i)\ \ge\ \frac{1}{n}\qquad \forall i\in\{1,\dots,n\}, \end{equation}</p> <p>assuming $v_i([0,1])=1$.</p> <h2 id="envy-free-division">Envy-Free Division</h2> <p>In addition, we have a more restrictive condition; it’s called envy-free division. This means that no one should be envious of another person’s share.</p> <p>Mathematically speaking, we can express this as follows: \(v_i(r_i) \ge v_i(r_j) \quad \forall i,j \in \{1,\dots,n\}.\)</p> <p>With additive valuations, summing this inequality over all $j$ gives $n\,v_i(r_i) \ge v_i([0,1]) = 1$, so every envy-free division is also a fair (proportional) one; for $n \ge 3$ the condition is strictly stronger than simple fair division.</p> <p>But finding an envy-free division is more challenging. I will write about it in the next post.</p> <h2 id="simple-fair-division-algorithm-for-any-n">Simple Fair Division Algorithm for any $n$</h2> <p>Did you find a procedure that gives a simple fair division of the cake among $n$ people?</p> <details><summary><strong>Moving-knife (proportional for any $n$)</strong> (Click here to know more)</summary> <ul> <li>Start with all $n$ players and the cake interval $[0,1]$. 
Place a knife at the left end and move it continuously to the right.</li> <li>While more than two players remain (call the current number $k$): <ul> <li>As the knife moves, any player $i$ may call “stop” the first time the left-hand piece (from the current left boundary $L$ to knife position $x$) is worth exactly $1/k$ of the remaining cake to them, i.e., $v_i([L,x])=v_i([L,1])/k$.</li> <li>Cut at $x$, give $[L,x]$ to the first player who called, remove them from the game, set $L\leftarrow x$, and continue with $k-1$ players.</li> </ul> </li> <li>When two players remain, use cut-and-choose on the remainder.</li> <li><strong>Guarantee:</strong> nobody else called “stop” earlier, so every remaining player values the piece given away at no more than $1/k$ of what was left, and their share of the rest never drops below $1/n$ of the whole cake. Each player therefore receives a piece they value at least $1/n$, hence a fair (proportional) division.</li> </ul> </details> <h2 id="references">References</h2> <div class="footnotes" role="doc-endnotes"> <ol> <li id="fn:alive"> <p><a href="https://homes.cs.washington.edu/~karlin/GameTheoryBook.pdf">Game Theory, Alive</a> <a href="#fnref:alive" class="reversefootnote" role="doc-backlink">&#8617;</a></p> </li> </ol> </div>]]></content><author><name></name></author><category term="Math"/><category term="Game Theory"/><category term="Fair Division"/><category term="Envy-Free"/><category term="Cake-Cutting"/><category term="Moving-Knife"/><category term="Proportional Division"/><summary type="html"><![CDATA[I’ve always wanted to write about game theory but never did. Recently, I started flipping through my old game theory books and came across mechanism design. 
Today, I’ll mention something that I think you can also use in your life: division problems and their different variations.]]></summary></entry><entry><title type="html">Thoughts on Subliminal Learning from Anthropic team</title><link href="https://blog.faraji.info/machine-learning/subliminal-learning.html" rel="alternate" type="text/html" title="Thoughts on Subliminal Learning from Anthropic team"/><published>2025-07-27T20:29:04+00:00</published><updated>2025-07-27T20:29:04+00:00</updated><id>https://blog.faraji.info/machine-learning/subliminal-learning</id><content type="html" xml:base="https://blog.faraji.info/machine-learning/subliminal-learning.html"><![CDATA[<p>They have published a blog post<sup id="fnref:subliminal-blog"><a href="#fn:subliminal-blog" class="footnote" rel="footnote" role="doc-noteref">1</a></sup> and a paper<sup id="fnref:subliminal-paper"><a href="#fn:subliminal-paper" class="footnote" rel="footnote" role="doc-noteref">2</a></sup> on this topic, and they have also released a GitHub repository<sup id="fnref:subliminal-github"><a href="#fn:subliminal-github" class="footnote" rel="footnote" role="doc-noteref">3</a></sup> with the code.</p> <p>Such transmission would be unsurprising if the student were distilled on the teacher’s logits or probabilities<sup id="fnref:distillation"><a href="#fn:distillation" class="footnote" rel="footnote" role="doc-noteref">4</a></sup>, but they report that it happens through generated text that is unrelated to the trait (e.g., sequences of numbers).</p> <p>This is interesting to me in three respects:</p> <ul> <li>Safety issues with what is being transmitted</li> <li>Inference attacks</li> <li>Do we have this in humans?</li> </ul> <h2 id="safety">Safety</h2> <p>They also mention in their blog post and paper that this might cause some safety issues. 
It might transmit a preference for a gender or race instead of something neutral like “owl,” as in their example.</p> <p>They do claim, though, that subliminal learning does not occur when the student has a different architecture or a different initialization (their paper has a theorem for the same-initialization case).</p> <p>Here is a scenario off the top of my head. I have not tried it, but it seems plausible.</p> <p>We know that a person or company might be using a third-party service to get fine-tuned models.<br/> I fine-tune the same model (from the third-party company) to be biased toward a race, gender, country, or anything else, and then use it to create a good-looking dataset or text (with the explicit bias removed) and release it for some task.</p> <p>If I somehow get them to use my data for fine-tuning their model, I have injected my biases into their system without any trace.<br/> This is <strong>huge</strong>: you can no longer trust public datasets, even ones that have been used in other research, because a dataset could have been designed specifically for your models and your company :))</p> <h2 id="model-inference">Model Inference</h2> <p>The inverse of this is inference attacks, like generating a text and using it to find out whether a fine-tuned model was trained from a specific base model.<br/> I’m not thinking of anything harmful here, but this is definitely something that can happen.</p> <h2 id="in-humans">In Humans</h2> <p>Last but not least, does this happen in humans? I’m not sure, but I think it’s possible.<br/> We learn a lot of things without being aware of them, like cultural norms, social cues, and even language patterns. 
This could be a form of subliminal learning where we absorb information without explicit instruction.</p> <p>Can we be racist without explicitly being taught to be?<br/> For example, if I ask what is the best drink and my parent answers in a way that doesn’t even imply anything, but I still learn to be racist or biased toward a specific group of people.<br/> This is a form of subliminal learning that can have real-world implications.</p> <p>You can think of social media, news, and other forms of media that shape our perceptions without us even realizing it (though there are some tricks even now that manipulate people’s minds and thoughts, but that’s not the point of this post).</p> <p>This is a very interesting question, but very different, because the architecture and initialization are different, and even our neurons are not the same as LLMs. Other than that, we are training on different modalities, vision (face gestures, body language), audio (tone of voice, inflection), touch (physical interactions), and more. 
Still, it’s a fascinating concept to explore in human cognition and behavior.</p> <h2 id="conclusion">Conclusion</h2> <p>Subliminal learning is a fascinating concept that raises important questions about safety, inference attacks, and human behavior.<br/> As we continue to explore the capabilities of AI and machine learning, it is crucial to consider the implications of subliminal learning, and how it can affect our models, datasets, and even our understanding of human cognition.</p> <p>– Ali</p> <h2 id="references">References</h2> <div class="footnotes" role="doc-endnotes"> <ol> <li id="fn:subliminal-blog"> <p><a href="https://alignment.anthropic.com/2025/subliminal-learning/">subliminal learning blog post</a> <a href="#fnref:subliminal-blog" class="reversefootnote" role="doc-backlink">&#8617;</a></p> </li> <li id="fn:subliminal-paper"> <p><a href="https://arxiv.org/abs/2507.14805">Subliminal Learning: Language models transmit behavioral traits via hidden signals in data</a> <a href="#fnref:subliminal-paper" class="reversefootnote" role="doc-backlink">&#8617;</a></p> </li> <li id="fn:subliminal-github"> <p><a href="https://github.com/MinhxLe/subliminal-learning">GitHub: MinhxLe/subliminal-learning</a> <a href="#fnref:subliminal-github" class="reversefootnote" role="doc-backlink">&#8617;</a></p> </li> <li id="fn:distillation"> <p><a href="https://arxiv.org/abs/1503.02531">Distilling the Knowledge in a Neural Network</a> <a href="#fnref:distillation" class="reversefootnote" role="doc-backlink">&#8617;</a></p> </li> </ol> </div>]]></content><author><name></name></author><category term="Machine Learning"/><category term="Safety"/><category term="LLMs"/><category term="Subliminal Learning"/><category term="AI Safety"/><category term="Ethics"/><category term="Inference Attacks"/><category term="Human Cognition"/><summary type="html"><![CDATA[Yesterday I came across a blog post from the Anthropic team about transmitting hidden data without explicitly distilling it to the 
student. This is called subliminal learning. In this post, I share my thoughts on this topic and its implications.]]></summary></entry><entry><title type="html">ICML 2025 top 50 Papers That Might Shape the Future of ML — According to Gemini</title><link href="https://blog.faraji.info/machine-learning/icml-2025-top-papers.html" rel="alternate" type="text/html" title="ICML 2025 top 50 Papers That Might Shape the Future of ML — According to Gemini"/><published>2025-07-12T09:33:04+00:00</published><updated>2025-07-12T09:33:04+00:00</updated><id>https://blog.faraji.info/machine-learning/top-icml-2025-papers-chat-gpt</id><content type="html" xml:base="https://blog.faraji.info/machine-learning/icml-2025-top-papers.html"><![CDATA[<p>So I tried something different.</p> <p>I gave <strong>Gemini</strong> the full list of ICML 2025 papers (titles, each with a link to the paper) and asked it to pick <strong>50 that seem especially promising</strong>. The selection favors papers with conceptual novelty, foundational depth, and relevance to foundation models, interpretability, or generative models.</p> <p>Did I get it right? Maybe. Maybe not. That’s part of the experiment. A different prompt might lead to a totally different list.</p> <h2 id="how-this-list-was-made">How This List Was Made</h2> <p>This is not a leaderboard or a citation-based ranking. I wasn’t given metrics, reviews, or author prestige. Just the raw metadata from ICML 2025: titles and links.</p> <p>But ultimately, <strong>this is a speculative curation</strong>. I haven’t read the full texts (and neither have most people, because they were just released). 
The goal is to surface ideas that seem worth deeper attention, not to declare winners.</p> <h2 id="list-of-50-papers">List of 50 Papers</h2> <ol> <li><a href="https://icml.cc/virtual/2025/poster/44561">Strategy Coopetition Explains the Emergence and Transience of In-Context Learning</a> <ul> <li><strong>Abstract summary</strong>: This paper likely presents a new theory that models in-context learning as an emergent phenomenon arising from both competitive and cooperative dynamics among model components.</li> <li><strong>Reason for inclusion</strong>: High relevance to foundational models, offering a novel theoretical explanation for a key, yet poorly understood, capability.</li> </ul> </li> <li><a href="https://icml.cc/virtual/2025/poster/43836">MIB: A Mechanistic Interpretability Benchmark</a> <ul> <li><strong>Abstract summary</strong>: Introduces a new benchmark (MIB) designed to systematically evaluate and compare techniques in mechanistic interpretability.</li> <li><strong>Reason for inclusion</strong>: Foundational for the field of interpretability. 
A standardized benchmark is critical for making measurable progress.</li> </ul> </li> <li><a href="https://icml.cc/virtual/2025/poster/45954">Elucidating the design space of language models for image generation</a> <ul> <li><strong>Abstract summary</strong>: This work likely explores and maps out the key architectural and data-related choices when using LLMs to drive or control image generation processes.</li> <li><strong>Reason for inclusion</strong>: Directly addresses the intersection of generative models and foundational models, a key area of current research.</li> </ul> </li> <li><a href="https://icml.cc/virtual/2025/poster/46498">Subspace Optimization for Large Language Models with Convergence Guarantees</a> <ul> <li><strong>Abstract summary</strong>: Proposes a new optimization method for LLMs that operates in a lower-dimensional subspace, likely improving efficiency while providing theoretical guarantees of convergence.</li> <li><strong>Reason for inclusion</strong>: Addresses the critical practical challenge of training and fine-tuning massive foundational models with a theoretically grounded approach.</li> </ul> </li> <li><a href="https://icml.cc/virtual/2025/poster/43934">R*: Efficient Reward Design via Reward Structure Evolution and Parameter Alignment Optimization with Large Language Models</a> <ul> <li><strong>Abstract summary</strong>: Introduces a method that leverages LLMs to automatically design and optimize complex reward functions for reinforcement learning.</li> <li><strong>Reason for inclusion</strong>: Novel application of LLMs to automate a traditionally human-intensive task in RL, with high potential impact on training more capable agents.</li> </ul> </li> <li><a href="https://icml.cc/virtual/2025/poster/43485">Let LLM Tell What to Prune and How Much to Prune</a> <ul> <li><strong>Abstract summary</strong>: This paper likely presents a novel method using a large language model to intelligently decide which parts 
of a neural network to prune for efficiency.</li> <li><strong>Reason for inclusion</strong>: Innovative use of LLMs for model optimization, falling under the umbrella of interpretability and practical application of foundational models.</li> </ul> </li> <li><a href="https://icml.cc/virtual/2025/poster/45086">MODULI: Unlocking Preference Generalization via Diffusion Models for Offline Multi-Objective Reinforcement Learning</a> <ul> <li><strong>Abstract summary</strong>: Introduces MODULI, a method that uses diffusion models to better generalize user preferences in a multi-objective RL setting.</li> <li><strong>Reason for inclusion</strong>: High novelty, combining generative diffusion models with reinforcement learning to solve a challenging preference alignment problem.</li> </ul> </li> <li><a href="https://icml.cc/virtual/2025/poster/44393">MCU: An Evaluation Framework for Open-Ended Game Agents</a> <ul> <li><strong>Abstract summary</strong>: Proposes a new framework (MCU) for evaluating agents in complex, open-ended gaming environments, a key challenge for modern AI.</li> <li><strong>Reason for inclusion</strong>: Foundational contribution to evaluation methodology, which is crucial for progress in agentic AI and foundational models.</li> </ul> </li> <li><a href="https://icml.cc/virtual/2025/poster/44053">ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference</a> <ul> <li><strong>Abstract summary</strong>: This paper introduces an innovative caching technique (ShadowKV) to significantly speed up inference for LLMs with very long contexts.</li> <li><strong>Reason for inclusion</strong>: Addresses a major bottleneck in the practical deployment of large foundational models, making it highly impactful.</li> </ul> </li> <li><a href="https://icml.cc/virtual/2025/poster/46372">SyncMind: Measuring Agent Out-of-Sync Recovery in Collaborative Software Engineering</a> <ul> <li><strong>Abstract summary</strong>: Focuses on the crucial but understudied 
problem of how AI agents recover from errors and get “back in sync” when collaborating on software development tasks.</li> <li><strong>Reason for inclusion</strong>: Tackles a frontier problem in agentic AI and foundational models, focusing on the practical challenges of multi-agent collaboration.</li> </ul> </li> <li><a href="https://icml.cc/virtual/2025/poster/45136">UnHiPPO: Uncertainty-aware Initialization for State Space Models</a> <ul> <li><strong>Abstract summary</strong>: Proposes a new initialization method for State Space Models (SSMs) that explicitly accounts for uncertainty, likely improving stability and performance.</li> <li><strong>Reason for inclusion</strong>: SSMs are a major alternative to Transformers. Foundational work on improving their training and stability is highly relevant.</li> </ul> </li> <li><a href="https://icml.cc/virtual/2025/poster/45141">SE(3)-Equivariant Diffusion Policy in Spherical Fourier Space</a> <ul> <li><strong>Abstract summary</strong>: This paper likely develops a diffusion-based policy for robotics or 3D tasks that correctly handles 3D rotations and translations by operating in a spherical Fourier space.</li> <li><strong>Reason for inclusion</strong>: Connects generative models (diffusion) with important geometric principles (equivariance) for physical interaction tasks, representing a key research direction.</li> </ul> </li> <li><a href="https://icml.cc/virtual/2025/poster/45139">Fast Video Generation with Sliding Tile Attention</a> <ul> <li><strong>Abstract summary</strong>: Introduces a novel attention mechanism (“Sliding Tile Attention”) to make video generation models faster and more efficient.</li> <li><strong>Reason for inclusion</strong>: Directly addresses a key challenge in generative AI (video generation speed) with a new architectural innovation.</li> </ul> </li> <li><a href="https://icml.cc/virtual/2025/poster/45135">Preference learning made easy: Everything should be understood through win rate</a> <ul> 
<li><strong>Abstract summary</strong>: Proposes a unifying, simplified framework for preference learning, suggesting that various methods can be interpreted and analyzed through the simple metric of win rate.</li> <li><strong>Reason for inclusion</strong>: Potentially foundational work that simplifies and unifies the theory behind aligning models with human preferences, a cornerstone of modern LLM development.</li> </ul> </li> <li>How Far are We in Multimodal Large Language Models? <ul> <li><strong>Abstract summary</strong>: This paper likely provides a comprehensive survey and analysis of the current capabilities and limitations of multimodal LLMs.</li> <li><strong>Reason for inclusion</strong>: Foundational benchmark/survey paper that helps structure the understanding of a rapidly evolving field.</li> </ul> </li> <li>Generative Pre-training for Tabular Data <ul> <li><strong>Abstract summary</strong>: Explores the application of generative pre-training, a successful paradigm from LLMs, to the domain of tabular data.</li> <li><strong>Reason for inclusion</strong>: Extends the foundational model paradigm to a different data modality, which has significant practical implications.</li> </ul> </li> <li>Beyond Chain-of-Thought: A Survey on Complex Reasoning of Large Language Models <ul> <li><strong>Abstract summary</strong>: A survey paper that likely categorizes and reviews advanced reasoning techniques in LLMs that go beyond simple Chain-of-Thought prompting.</li> <li><strong>Reason for inclusion</strong>: Provides a structured overview of the cutting edge in LLM reasoning, a central topic for foundational models.</li> </ul> </li> <li>Do Large Language Models Know about Their Generation Probability? 
<ul> <li><strong>Abstract summary</strong>: Investigates whether LLMs have an internal, calibrated sense of the likelihood of the text they generate, which is key for reliability.</li> <li><strong>Reason for inclusion</strong>: Addresses a fundamental question about the self-awareness and calibration of foundational models, relevant to both interpretability and safety.</li> </ul> </li> <li>A Mechanistic Understanding of Alignment: A Case Study in Function Vectors <ul> <li><strong>Abstract summary</strong>: This work likely provides a mechanistic interpretability analysis of how alignment techniques like RLHF change the internal representations of a model.</li> <li><strong>Reason for inclusion</strong>: High relevance to interpretability, aiming to explain the “how” and “why” of model alignment procedures at a mechanistic level.</li> </ul> </li> <li>UniG-guided: A Unified and Guided Framework for All-in-one Image Generation <ul> <li><strong>Abstract summary</strong>: Proposes a single, unified framework that can handle various conditional image generation tasks (e.g., text-to-image, editing, inpainting) within one model.</li> <li><strong>Reason for inclusion</strong>: High impact potential for generative models, moving towards more consolidated and versatile creative tools.</li> </ul> </li> <li>Mega-T: A General-purpose Foundation Model for Tabular Data <ul> <li><strong>Abstract summary</strong>: This paper likely introduces a new large-scale, pre-trained model (“Mega-T”) specifically designed for a wide range of tasks on tabular data.</li> <li><strong>Reason for inclusion</strong>: Clear attempt to create a foundational model for the tabular data domain, a significant and practical research direction.</li> </ul> </li> <li>Scaling Laws for Forgetting in Large Language Models <ul> <li><strong>Abstract summary</strong>: Investigates the relationship between model scale and catastrophic forgetting, likely deriving scaling laws that predict how and when LLMs forget 
information.</li> <li><strong>Reason for inclusion</strong>: Foundational theoretical work on a critical limitation of current models, relevant for continual learning and model updates.</li> </ul> </li> <li>Emergent Communication Explains the Surprising Reasoning Skills of Large Language Models <ul> <li><strong>Abstract summary</strong>: Proposes a novel theory that the reasoning abilities of LLMs are an emergent consequence of learning to communicate effectively.</li> <li><strong>Reason for inclusion</strong>: Offers a high-level conceptual explanation for the emergent properties of foundational models, a key scientific question.</li> </ul> </li> <li>What happens when a transformer-based language model is a compression algorithm? <ul> <li><strong>Abstract summary</strong>: Explores the deep connection between language modeling and data compression, likely analyzing model internals from a compression perspective.</li> <li><strong>Reason for inclusion</strong>: High relevance to interpretability and foundational understanding, framing model behavior through a powerful theoretical lens (Kolmogorov complexity).</li> </ul> </li> <li>Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models <ul> <li><strong>Abstract summary</strong>: Introduces a fine-tuning method based on self-play, where a language model improves itself, and shows this can dramatically boost the capabilities of smaller models.</li> <li><strong>Reason for inclusion</strong>: High potential impact, suggesting a path to create powerful models without needing to start with a massive base model.</li> </ul> </li> <li>A Theory of Counter-Example Guided Interpretability <ul> <li><strong>Abstract summary</strong>: Develops a theoretical framework for interpretability methods that rely on generating counter-examples to explain model decisions.</li> <li><strong>Reason for inclusion</strong>: Foundational contribution to the theory of interpretability, providing rigor to a popular class of 
explanation methods.</li> </ul> </li> <li>Generative World Models for Transfer in Reinforcement Learning <ul> <li><strong>Abstract summary</strong>: This paper likely uses generative models to build “world models” that an RL agent can use to simulate future outcomes, specifically to improve transfer to new tasks.</li> <li><strong>Reason for inclusion</strong>: Connects generative models with RL to tackle the key challenge of generalization and transfer learning.</li> </ul> </li> <li>Diffusion Models as Interpretable and Controllable Recommender Systems <ul> <li><strong>Abstract summary</strong>: Proposes using diffusion models for recommendation tasks, arguing this approach offers better interpretability and user control over recommendations.</li> <li><strong>Reason for inclusion</strong>: Novel application of generative models to a classic domain, with a specific focus on the benefits for interpretability.</li> </ul> </li> <li>Vision-Language Models as a Source of Reward <ul> <li><strong>Abstract summary</strong>: Explores using pre-trained vision-language models to provide dense reward signals for training robotic or embodied agents, avoiding manual reward engineering.</li> <li><strong>Reason for inclusion</strong>: Leverages the knowledge in foundational models to solve a bottleneck in another field (RL), a powerful and trending paradigm.</li> </ul> </li> <li>The Expressive Power of Transformers with Chain-of-Thought <ul> <li><strong>Abstract summary</strong>: A theoretical paper analyzing the expressive power of the Transformer architecture specifically when it is prompted to use Chain-of-Thought reasoning.</li> <li><strong>Reason for inclusion</strong>: Foundational theory for LLMs, aiming to understand why and how popular prompting techniques actually work.</li> </ul> </li> <li>The Geometry of Truth: Visualizing Language Model Representations <ul> <li><strong>Abstract summary</strong>: This work likely introduces new techniques for visualizing the internal 
representation space of LLMs to understand how they represent concepts like truthfulness.</li> <li><strong>Reason for inclusion</strong>: Directly addresses interpretability with a focus on novel visualization methods for understanding abstract properties.</li> </ul> </li> <li>Watermarking Large Language Models with Backdoored Prompts <ul> <li><strong>Abstract summary</strong>: Proposes a method for watermarking LLMs to trace their outputs, likely by embedding a secret trigger via backdoored prompts during training.</li> <li><strong>Reason for inclusion</strong>: Important topic at the intersection of model security, ownership, and interpretability (as it involves manipulating model behavior).</li> </ul> </li> <li>Can We Trust the Interpretations of Local Interpretable Model-Agnostic Explanations (LIME)? A Theoretical and Empirical Analysis <ul> <li><strong>Abstract summary</strong>: A critical analysis of LIME, one of the most popular interpretability techniques, likely examining its theoretical soundness and empirical reliability.</li> <li><strong>Reason for inclusion</strong>: Foundational interpretability work that provides a necessary critical perspective on widely used tools.</li> </ul> </li> <li>Diff-Pruning: Pruning via Diffusion-based Weight Importance <ul> <li><strong>Abstract summary</strong>: Introduces a new model pruning technique where the importance of weights is determined using a diffusion process.</li> <li><strong>Reason for inclusion</strong>: A novel and interesting combination of generative model techniques (diffusion) with the goal of model compression and efficiency.</li> </ul> </li> <li><a href="https://icml.cc/virtual/2025/poster/44593">ExpProof : Operationalizing Explanations for Confidential Models with ZKPs</a> <ul> <li><strong>Abstract summary</strong>: This paper likely proposes a method to provide verifiable explanations for a model’s prediction without revealing the model’s weights, using Zero-Knowledge Proofs.</li> 
<li><strong>Reason for inclusion</strong>: Highly innovative work connecting interpretability with cryptography to solve the problem of explaining private or proprietary models.</li> </ul> </li> <li>Towards a Foundation Model for Proteins <ul> <li><strong>Abstract summary</strong>: Likely describes the development and training of a large-scale, pre-trained model for a variety of tasks related to protein biology.</li> <li><strong>Reason for inclusion</strong>: A clear example of extending the “foundational model” concept to a critical scientific domain beyond language.</li> </ul> </li> <li>Training-Free Structure-Guided Diffusion Model for 3D Molecule Generation <ul> <li><strong>Abstract summary</strong>: Proposes a diffusion-based model for generating 3D molecules that can be guided by structural constraints without requiring additional training.</li> <li><strong>Reason for inclusion</strong>: Strong relevance to generative models, applied to the high-impact area of drug discovery and molecular design.</li> </ul> </li> <li>Emergent Modularity in Pre-trained Transformers <ul> <li><strong>Abstract summary</strong>: Investigates how and why modular structures (sub-networks with distinct functions) spontaneously emerge within large pre-trained Transformer models.</li> <li><strong>Reason for inclusion</strong>: High relevance to interpretability and understanding the internal structure of foundational models.</li> </ul> </li> <li>Are Large Language Models Born with Grand-Tour capability? 
<ul> <li><strong>Abstract summary</strong>: This paper likely investigates the innate ability of LLMs to perform “grand tour” data visualization, a technique for exploring high-dimensional data, in their latent spaces.</li> <li><strong>Reason for inclusion</strong>: Connects interpretability of foundational models with established methods in data science and visualization in a novel way.</li> </ul> </li> <li>Decoding the Internal State of a Large Language Model during Code Generation <ul> <li><strong>Abstract summary</strong>: An interpretability study focused specifically on understanding the step-by-step internal reasoning of an LLM as it generates computer code.</li> <li><strong>Reason for inclusion</strong>: Highly relevant to interpretability, focusing on the important and complex domain of code generation.</li> </ul> </li> <li>Unsupervised Data Selection for Language Model Fine-tuning <ul> <li><strong>Abstract summary</strong>: This work likely introduces methods for automatically selecting the best data for fine-tuning an LLM without needing labeled examples or human supervision.</li> <li><strong>Reason for inclusion</strong>: Addresses a critical practical problem in the effective use of foundational models, with high potential impact on efficiency.</li> </ul> </li> <li>General Purpose In-Context Learning Agents <ul> <li><strong>Abstract summary</strong>: Explores the use of in-context learning to create general-purpose AI agents that can adapt to new tasks on the fly without updating their weights.</li> <li><strong>Reason for inclusion</strong>: Pushes the boundaries of what can be achieved with foundational models, focusing on the paradigm of agent-based AI.</li> </ul> </li> <li>Does the Generator Mind All Specifications? 
Evaluating Object Consistency of Text-to-Image Generative Models <ul> <li><strong>Abstract summary</strong>: This paper likely introduces a new benchmark or methodology for systematically evaluating how well text-to-image models follow all parts of a complex prompt, focusing on object consistency.</li> <li><strong>Reason for inclusion</strong>: Important evaluation work for generative models, addressing the key challenge of controllability and faithfulness to user intent.</li> </ul> </li> <li>A Diffusion-based Framework for Counterfactual Explanation <ul> <li><strong>Abstract summary</strong>: Proposes using generative diffusion models as a new way to create counterfactual explanations, which describe the smallest change needed to alter a model’s prediction.</li> <li><strong>Reason for inclusion</strong>: High novelty, combining state-of-the-art generative models with a key interpretability technique.</li> </ul> </li> <li>G-Retriever: A General Generative-Retriever Framework for Large Language Models <ul> <li><strong>Abstract summary</strong>: Introduces a unified framework that combines generative and retrieval-based approaches for LLMs, likely to improve factuality and knowledge grounding.</li> <li><strong>Reason for inclusion</strong>: Addresses the critical issue of augmenting foundational models with external knowledge, a very active and important research area.</li> </ul> </li> <li>How do Language Models Bind Entities? 
A Study of the Naming-Related Circuit <ul> <li><strong>Abstract summary</strong>: A mechanistic interpretability paper that seeks to identify the specific neural circuit within a language model responsible for associating names with entities.</li> <li><strong>Reason for inclusion</strong>: A prime example of mechanistic interpretability, digging deep into how a specific, fundamental capability is implemented in a model.</li> </ul> </li> <li>Language Models are Weak Agents, But Strong Tool Users <ul> <li><strong>Abstract summary</strong>: This paper likely presents a study arguing that the main power of LLM-based agents comes not from their innate planning abilities, but from their effective use of external tools.</li> <li><strong>Reason for inclusion</strong>: Offers a clear and potentially influential perspective on the architecture and capabilities of agents built on foundational models.</li> </ul> </li> <li>Understanding and Mitigating Hallucinations in Large Vision-Language Models <ul> <li><strong>Abstract summary</strong>: This work provides an analysis of why multimodal vision-language models hallucinate facts and proposes methods to reduce this behavior.</li> <li><strong>Reason for inclusion</strong>: Addresses a critical failure mode (hallucination) in foundational models, which is essential for building reliable systems.</li> </ul> </li> <li>World Model on Million-Length Video and Language <ul> <li><strong>Abstract summary</strong>: Describes the creation of a “world model” trained on extremely long sequences of video and text, likely pushing the state-of-the-art in long-context modeling.</li> <li><strong>Reason for inclusion</strong>: High-impact, large-scale work on generative and foundational models, tackling the frontier of long-sequence understanding.</li> </ul> </li> <li>The Sparse Transformer: A New Architecture for Language Modeling <ul> <li><strong>Abstract summary</strong>: Proposes a new Transformer architecture based on sparse connectivity 
patterns, likely aiming for significantly improved efficiency without sacrificing performance.</li> <li><strong>Reason for inclusion</strong>: Potential for a foundational architectural innovation that could influence the next generation of LLMs.</li> </ul> </li> </ol> <h2 id="why-use-ai-to-curate">Why Use AI to Curate?</h2> <p>Because even experts can’t read everything, and sometimes it helps to bring a fresh (or weird) lens. You might disagree with some of the picks. That’s great. Use this list as a <em>conversation starter</em>, not a verdict. What would your top 10 look like? Which of these would <em>you</em> bet on becoming influential a year from now?</p> <h2 id="-so-which-one-should-you-read">🤷 So… Which One Should You Read?</h2> <p>Honestly? I don’t know yet. That’s the point. Some of these may be overhyped. Others might be quiet classics. But all of them <em>seem</em> to reach beyond the incremental. As the full papers come out and people dig into the math, we’ll see which ones hold up, and which ones quietly change the way we think. In general, one good heuristic is to look at papers whose <strong>authors or labs you know</strong> to be well-established researchers, which we haven’t done here. You can also follow your favourite researchers to see their tweets about their papers 😉. But I think this list is a good starting point to find some hidden gems.</p> <h2 id="-whats-next">🧭 What’s Next?</h2> <p>I’d love to do follow-ups:</p> <ul> <li>A deep dive into the <strong>most cited</strong> vs. <strong>least cited</strong> of these picks in six months</li> <li>A themed breakdown: <em>top picks in interpretability</em>, <em>causality</em>, <em>generative models</em>, etc.</li> <li>A comparison between <strong>Gemini-selected vs. 
community-selected</strong> papers</li> </ul> <p>But for now, here’s the bet: <strong>At least a few of these 50 will turn out to matter more than their titles suggest.</strong></p> <h2 id="my-thought">My thought</h2> <p>I don’t know which of these papers are truly important. But the fact that Gemini selected them shows one thing clearly: it’s judging based only on titles. So if we build research agents (or LLM-based search engines), they might prefer such papers, and that could lead to them being cited more frequently. (You get the idea.) So I’m not even sure that citation count is a reliable metric for comparison later on. Anyway, that’s my final note. If you find one you love, or one you hate, drop me a note.</p> <p>The disclaimer should be obvious, but I have to include it anyway.</p> <div style="border-left: 4px solid rgb(232, 38, 125); padding: 10px; background-color: #f9f9f9; margin: 20px 0;"> <strong>Disclaimer:</strong> This list is not endorsed by the Gemini team or Google. It was generated by Gemini 2.5 Pro in response to my prompt. </div> <h2 id="a-surprise-at-the-end">A Surprise at the end</h2> <p>The papers without links are the ones I couldn’t find in the official ICML 2025 listings. I even searched for a few of them; it turns out they’re not real papers at all, just hallucinations or clever mashups of existing research and academic-sounding phrases.</p> <p>So hey, pick a title and write your own paper; Gemini might just love it. 😂</p> <p>PS: My Prompt was:</p> <pre><code class="language-plain">You are an expert machine learning researcher and assistant. Your task is to select the 50 most relevant papers from the ICML 2025 conference, based on a provided list of paper titles and relative URLs (e.g., "/virtual/2025/poster/44561").

Each item is an HTML list entry:
&lt;li&gt;&lt;a href="/virtual/2025/poster/44561"&gt;Strategy Coopetition Explains the Emergence and Transience of In-Context Learning&lt;/a&gt;&lt;/li&gt;

### My goals:
- Select 50 papers from ~3000 entries
- Prioritize high-quality, novel, and relevant research
- I’m particularly interested in the following topics: Foundation models, interpretability, generative models.
- Papers should be:
  - Innovative or influential
  - Related to the above topics or clearly high-impact
  - Either theoretically foundational or practically useful

### What you should do:
1. **Extract** the paper title and full URL (prepend "https://icml.cc" to the relative URL)
2. **Visit each paper’s ICML page**, if accessible, and extract:
   - The abstract
   - Authors
   - Any listed keywords or presentation format (e.g., spotlight/oral)
3. **Score each paper** based on:
   - Relevance to the topics I care about
   - Novelty and clarity of the abstract
   - Expected impact (e.g., new benchmarks, theory, or trends)
4. **Select the top 50 papers**
5. **Output** your result in the following format:
```

1. [Paper Title](https://icml.cc/virtual/2025/poster/44561)

   * Abstract summary (1–2 lines)
   * Reason for inclusion (e.g., topic fit, novelty, etc.)

```

Only output the final selected list of 50 papers — do not explain your methodology unless asked.

I am attaching the list of papers in a .txt file, read all of the papers from that.
</code></pre> <p>– Ali</p>]]></content><author><name></name></author><category term="Machine Learning"/><category term="ICML"/><category term="Research"/><category term="ICML 2025"/><category term="Research Papers"/><category term="Machine Learning"/><category term="Foundation Models"/><category term="Interpretability"/><category term="Generative Models"/><summary type="html"><![CDATA[Each year, hundreds of papers flood top conferences like ICML. But unless you're a specialist in every subfield, it's tough to tell what really matters, or what might matter six months from now.]]></summary></entry><entry><title type="html">I can speak Toronto: Pretraining a Trajectory Foundation Model</title><link href="https://blog.faraji.info/machine-learning/trajectory-foundation-model.html" rel="alternate" type="text/html" title="I can speak Toronto: Pretraining a Trajectory Foundation Model"/><published>2025-06-13T17:21:04+00:00</published><updated>2025-06-13T17:21:04+00:00</updated><id>https://blog.faraji.info/machine-learning/i-can-speak-toronto</id><content type="html" xml:base="https://blog.faraji.info/machine-learning/trajectory-foundation-model.html"><![CDATA[<h2 id="introduction">Introduction</h2> <p>The pretrained model paradigm has revolutionized many areas of machine learning, particularly in natural language processing and computer vision. The idea is to pretrain a model on a large dataset and then fine-tune it for specific tasks. But what about trajectory data? Can we apply the same principles to GPS trajectories?</p> <h2 id="trajectory-data-tokenization">Trajectory data tokenization</h2> <p>First we need to find a way to convert the trajectory data into a format suitable for pretraining like a corpus of text. The raw GPS trajectories are sequences of latitude and longitude points, but we can convert them into a sequence of tokens. For example, we can use the H3 geospatial indexing system to convert the latitude and longitude points into hexagonal grid cells. 
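</p> <p>As a rough illustration of this idea, here is a minimal sketch of hexagonal tokenization. It does not use H3 itself: it is a toy planar pointy-top hex grid in axial coordinates, and the names <code class="language-plaintext highlighter-rouge">HEX_SIZE</code>, <code class="language-plaintext highlighter-rouge">hex_round</code>, <code class="language-plaintext highlighter-rouge">point_to_hex</code>, and <code class="language-plaintext highlighter-rouge">tokenize</code>, the cell size, and the <code class="language-plaintext highlighter-rouge">hex_q_r</code> token format are all assumptions made for the example. In practice the cell IDs would come from an H3 library at a chosen resolution.</p> <pre><code class="language-python">import math

HEX_SIZE = 0.01  # hypothetical cell size in degrees (toy value, not an H3 resolution)


def hex_round(q, r):
    """Round fractional axial coordinates to the nearest hex cell (cube rounding)."""
    s = -q - r
    rq, rr, rs = round(q), round(r), round(s)
    # Recompute the coordinate with the largest rounding error so q + r + s == 0 holds.
    errors = {"q": abs(rq - q), "r": abs(rr - r), "s": abs(rs - s)}
    worst = max(errors, key=errors.get)
    if worst == "q":
        rq = -rr - rs
    elif worst == "r":
        rr = -rq - rs
    return rq, rr


def point_to_hex(lat, lng, size=HEX_SIZE):
    """Map a (lat, lng) point to axial (q, r) coordinates on a pointy-top hex grid."""
    # Toy assumption: treat lng/lat as planar x/y, which is only reasonable for a small area.
    x, y = lng, lat
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    return hex_round(q, r)


def tokenize(points, size=HEX_SIZE):
    """Turn a list of (lat, lng) points into a sequence of cell tokens."""
    tokens = []
    for lat, lng in points:
        q, r = point_to_hex(lat, lng, size)
        token = f"hex_{q}_{r}"
        # Collapse consecutive points that fall into the same cell.
        if not tokens or tokens[-1] != token:
            tokens.append(token)
    return tokens


# Made-up points for illustration only.
trajectory = [(0.0, 0.0), (0.001, 0.001), (0.0, 0.0173), (0.015, 0.026)]
print(tokenize(trajectory))
</code></pre> <p>Consecutive points that fall into the same cell collapse into a single token, so every step in the output moves to a different cell.</p> <p>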
This allows us to represent the trajectory as a sequence of hexagonal grid cell IDs.</p> <h3 id="is-the-the-tokenized-data-similar-to-a-corpus-of-text">Is the tokenized data similar to a corpus of text?</h3> <p>In terms of tokens, the model does not know the difference between a sequence of words in a sentence and a sequence of hexagonal grid cell IDs in a trajectory. It learns the relationships between the tokens, regardless of their nature. But there is a key difference: in trajectory data the tokens are not independent of each other, because there is a spatial relationship between them. For example, if we model the trajectory as a continuous sequence of consecutive hexagonal grid cell IDs, each token can only be followed by one of its six neighbors, and the model should learn this spatial constraint. This is different from a text corpus, where the tokens can be more independent of each other. (Though, if we model the trajectory as a sequence of check-ins, the tokens can be more independent of each other, similar to a text corpus.)</p> <h2 id="pretraining-the-trajectory-foundation-model">Pretraining the Trajectory Foundation Model</h2> <p>Once we have tokenized the trajectory data, we can pretrain a model using a large dataset of GPS trajectories to learn the underlying patterns and structures in the data. We can use a transformer-based architecture, similar to those used in NLP and computer vision, to learn the relationships between the tokens in the trajectory sequences.</p> <p>We can train encoders such as BERT, which are specifically designed for learning representations of sequences, or we can use autoregressive models like GPT to predict the next token in a trajectory sequence<sup id="fnref:trajLearn"><a href="#fn:trajLearn" class="footnote" rel="footnote" role="doc-noteref">1</a></sup>. 
The goal is to learn a rich representation of the trajectory data that can be fine-tuned for specific tasks, such as trajectory prediction, anomaly detection, or clustering. I have trained an end-to-end trajectory classification model, and the code is available at <a href="https://github.com/alifa98/ModernBERT-Trajectory-Classification">this link</a>. In that code, we can swap the classification objective for a masked language modeling (MLM) objective, similar to the original BERT, to learn a general representation of the trajectory “language”.</p> <h3 id="considerations-for-mlm-for-pretraining">Considerations for MLM for pretraining</h3> <p>One thing that comes to mind is that we cannot blindly apply masking when the trajectory is represented as a sequence of spatially consecutive tokens, because we do not even need a language model to recover a single masked token between two known ones. If we want to find the missing token at position $m$, we can simply take the neighbors of the previous token $m-1$, which is $\mathcal{N}(m-1)$, and the neighbors of the next token $m+1$, which is $\mathcal{N}(m+1)$, and then compute the intersection of these two sets: $\mathcal{N}(m-1) \cap \mathcal{N}(m+1)$. In a hexagonal grid this intersection usually contains only one or two cells, so the masked token is nearly determined by its context and the task teaches the model very little. However, when data is represented by check-ins, we can use the same masking strategy as in language models, because the tokens are more independent of each other.</p> <h2 id="fine-tuning-the-trajectory-foundation-model">Fine-tuning the Trajectory Foundation Model</h2> <p>After pretraining the trajectory foundation model, we can fine-tune it for specific tasks. 
For example, we can fine-tune the model to predict the next location in a trajectory sequence, or to classify trajectories into different categories based on their patterns.</p> <p>We can also use the pretrained model to extract features from the trajectory data, which can be used for downstream tasks such as clustering or anomaly detection.</p> <h2 id="references">References</h2> <div class="footnotes" role="doc-endnotes"> <ol> <li id="fn:trajLearn"> <p><a href="https://dl.acm.org/doi/10.1145/3729226">TrajLearn: Trajectory Prediction Learning using Deep Generative Models</a> <a href="#fnref:trajLearn" class="reversefootnote" role="doc-backlink">&#8617;</a></p> </li> </ol> </div>]]></content><author><name></name></author><category term="Foundation Models"/><category term="LLMs"/><category term="Trajectory Data"/><category term="Foundation Models"/><category term="Pretraining"/><category term="GPS"/><category term="Machine Learning"/><summary type="html"><![CDATA[Introduction]]></summary></entry><entry><title type="html">Jaywalking I: The Description</title><link href="https://blog.faraji.info/optimization/jay-walking-1.html" rel="alternate" type="text/html" title="Jaywalking I: The Description"/><published>2025-04-29T01:32:04+00:00</published><updated>2025-04-29T01:32:04+00:00</updated><id>https://blog.faraji.info/optimization/jaywalking</id><content type="html" xml:base="https://blog.faraji.info/optimization/jay-walking-1.html"><![CDATA[<p>Today I was walking to the university. There is this intersection where the red light is usually on for a long time. I was standing at the red light for pedestrians and saw a person jaywalking because there were no cars. People usually cross this street even when the light is red. 
I wondered: is this a good thing for traffic flow?</p> <p>I searched online and found this student project <a href="https://kth.diva-portal.org/smash/get/diva2:1885894/FULLTEXT01.pdf">Traffic Simulation Exploring the Impacts of Jaywalking on Traffic Flow</a>. I did not go through the details of the implementation, but it is an interesting simulation.</p> <p>Under their assumptions, it seems that <strong>jaywalking at a specific ratio gives the optimal travel time</strong>. 😁</p> <p>I am interested in confirming these results under some new assumptions and finding the optimal jaywalking ratio.☺️ Also, I think it would be great to add a parameter for the ratio of careless jaywalkers, who can cause trouble and be hit by a car with a low probability. 🙈</p> <p>I should stand there for a day and log data to see whether we, as a society, are already cooperating at the optimal point. 👀</p> <p>I will run the simulation soon and write a post about it.</p> <div style="border-left: 4px solid rgb(232, 38, 125); padding: 10px; background-color: #f9f9f9; margin: 20px 0;"> <strong>Disclaimer:</strong> This is not intended as advice for jaywalking or any other illegal activity. I am interested in this topic mostly for statistical and optimization reasons. Furthermore, we can use this later, maybe to optimize red light timers. So, do not take this as advice to do illegal actions like jaywalking or crossing red lights. </div> <p>Stay curious.</p> <p>– Ali</p>]]></content><author><name></name></author><category term="Optimization"/><category term="Urban Planning"/><category term="Jaywalking"/><category term="Traffic Flow"/><category term="Optimization"/><category term="Urban Planning"/><category term="Simulation"/><summary type="html"><![CDATA[Jaywalking and its impact on traffic flow: An exploration of optimization and urban planning. 
Is there an optimal jaywalking ratio for better traffic management?]]></summary></entry><entry><title type="html">ChatGPT Internal Library: Ace Tools</title><link href="https://blog.faraji.info/security/chat-gpt-internal-lib-ace-tools.html" rel="alternate" type="text/html" title="ChatGPT Internal Library: Ace Tools"/><published>2025-03-29T21:56:14+00:00</published><updated>2025-03-29T21:56:14+00:00</updated><id>https://blog.faraji.info/security/chat-GPT-analysis-codes-security-risks</id><content type="html" xml:base="https://blog.faraji.info/security/chat-gpt-internal-lib-ace-tools.html"><![CDATA[<p>I was asking ChatGPT for analysis and table generation on some data, and I noticed that when it wants to provide the output of the analysis (output of code execution), in some cases, it uses a library called <code class="language-plaintext highlighter-rouge">ace_tools</code> (specifically while showing a table from a pandas DataFrame). I copy-pasted the code to regenerate the table, but it did not run because I did not have the aforementioned library. My immediate thought was to install the library using <code class="language-plaintext highlighter-rouge">pip install ace-tools</code>, but that did not work.</p> <h2 id="what-is-ace-tools">What is Ace Tools?</h2> <p>I had never seen this library before, and I thought it was a new library that might need to be installed in a different way. I searched for it on PyPI, but I could not find it there either. I found out that it is an internal library of OpenAI and is not publicly available. It is perhaps just showing the output to be displayed in the chat and is not meant to be used outside of OpenAI. 
There is a thread about it here: <a href="https://community.openai.com/t/chatgpt-recommends-the-use-of-the-open-ai-internal-library-ace-tools/852665">ChatGPT recommends the use of the Open AI internal library ace_tools</a></p> <p>Here is an example of how it is used in the code generated by ChatGPT:</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">pandas</span> <span class="k">as</span> <span class="n">pd</span>
<span class="kn">from</span> <span class="n">ace_tools</span> <span class="kn">import</span> <span class="n">display_dataframe_to_user</span>

<span class="c1"># doing some analysis and creating the data variable.
</span><span class="n">data</span> <span class="o">=</span> <span class="p">{...}</span>

<span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="nc">DataFrame</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>

<span class="c1"># Display to user
</span><span class="nf">display_dataframe_to_user</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="sh">"</span><span class="s">Table Title</span><span class="sh">"</span><span class="p">,</span> <span class="n">dataframe</span><span class="o">=</span><span class="n">df</span><span class="p">)</span>
</code></pre></div></div> <h2 id="security-risks">Security Risks</h2> <p>Currently, someone has registered the library name on PyPI, and the package does nothing but act as a placeholder. I do not know who registered it, but I think they could easily push malicious code to it.</p> <p>If you are using ChatGPT and copying the code it generates, be careful about the libraries it uses. If you see a library that is not familiar, search for information about it first. It might be an internal library of OpenAI or another company, and installing a publicly available library with a similar name could be a security risk.</p> <h2 id="what-can-a-library-like-ace_tools-do">What can a library like ace_tools do?</h2> <p>A quick search turned up libraries pushed to PyPI that do malicious things, for example stealing your cookies or passwords. Here is a blog post from Fortinet that describes some of the malicious packages hidden in PyPI: <a href="https://www.google.com/search?q=https://www.fortinet.com/blog/threat-research/malicious-packages-hidden-in-pypi">Malicious packages hidden in PyPI</a></p> <h2 id="conclusion">Conclusion</h2> <p>In conclusion, be careful about the libraries used in the code generated by ChatGPT. 
Do not blindly trust the code or install the libraries it uses.</p> <p>– Ali</p>]]></content><author><name></name></author><category term="Security"/><category term="ChatGPT"/><category term="Security"/><category term="Ace Tools"/><category term="Python"/><category term="Libraries"/><summary type="html"><![CDATA[Exploring the security risks associated with using ChatGPT-generated code that references OpenAI's internal library 'ace_tools'.]]></summary></entry><entry><title type="html">Confirmation Bias</title><link href="https://blog.faraji.info/cognitive-science/confirmation-bias.html" rel="alternate" type="text/html" title="Confirmation Bias"/><published>2025-01-29T02:56:57+00:00</published><updated>2025-01-29T02:56:57+00:00</updated><id>https://blog.faraji.info/cognitive-science/confirmation-bias</id><content type="html" xml:base="https://blog.faraji.info/cognitive-science/confirmation-bias.html"><![CDATA[<p>Confirmation bias is a cognitive bias that affects the way we interpret information and make decisions. It refers to <strong>the tendency to search for, interpret, favor, and recall information that confirms our preexisting beliefs or hypotheses.</strong> This bias can lead to errors in judgment and decision-making, as it can prevent us from considering alternative viewpoints or evidence that contradicts our beliefs. This can happen in various contexts, such as research, politics, and everyday life.</p> <p>I was just reading about this cognitive bias and I thought it would be interesting to share it with you. I will write more about these biases in the future, although I know that there are tons of articles and books about them. But I think writing about something helps me to internalize the concept and understand it better.</p> <p>There is a list of cognitive biases on <a href="https://en.wikipedia.org/wiki/List_of_cognitive_biases">Wikipedia (List of Cognitive Biases)</a> that you can check out. 
It is a great resource for learning about the different types of biases that can affect our thinking and decision-making processes. It is not something you can read once and remember in full, but it is a good starting point to be aware of them and sometimes catch yourself when you are falling into one of these traps.</p> <p>Confirmation bias can manifest in various ways in research, affecting the design, interpretation, and reporting of studies. I think one thing that can help us here is designing an experiment with a good control group and being open to alternative hypotheses. It is also important to critically evaluate the evidence and consider the limitations of the data before drawing conclusions.</p> <p>Finally, I think we should consider this bias not only in our own research and papers, but also when reading others’ papers and research. It is important to critically evaluate the evidence and consider alternative viewpoints to avoid falling into the trap of confirmation bias.</p> <p>BTW, there is a book you may have heard of, “<a href="https://www.google.com/search?q=the+art+of+thinking+clearly">The Art of Thinking Clearly</a>” by Rolf Dobelli, that I highly recommend. It covers many cognitive biases and logical fallacies that we encounter in our daily lives. 
It is a great read for anyone interested in improving their critical thinking skills and decision-making abilities.</p> <p>– Ali</p>]]></content><author><name></name></author><category term="Cognitive Science"/><category term="Psychology"/><category term="Cognitive Bias"/><category term="Confirmation Bias"/><category term="Critical Thinking"/><category term="Decision Making"/><summary type="html"><![CDATA[Exploring the concept of confirmation bias, its impact on decision-making, and strategies to mitigate its effects in research and everyday life.]]></summary></entry><entry><title type="html">NeurIPS 2024: Forgetting/Unlearning is hard!</title><link href="https://blog.faraji.info/machine-learning/challenges-of-machine-unlearning.html" rel="alternate" type="text/html" title="NeurIPS 2024: Forgetting/Unlearning is hard!"/><published>2024-12-16T19:15:18+00:00</published><updated>2024-12-16T19:15:18+00:00</updated><id>https://blog.faraji.info/machine-learning/Forgetting-is-hard</id><content type="html" xml:base="https://blog.faraji.info/machine-learning/challenges-of-machine-unlearning.html"><![CDATA[<p>Recently, I attended the NeurIPS 2024 conference and encountered many fascinating papers and talks. Among these discussions, a recurring theme emerged: machine unlearning, particularly in the context of defining and quantifying unlearning.</p> <p>To begin with, some papers argue that a certain degree of memorization is essential for models to generalize effectively <sup id="fnref:onionMem"><a href="#fn:onionMem" class="footnote" rel="footnote" role="doc-noteref">1</a></sup> <sup id="fnref:tale"><a href="#fn:tale" class="footnote" rel="footnote" role="doc-noteref">2</a></sup>. This is a compelling point, as memorization appears to play a critical role in enabling generalization. However, this raises an intriguing question: What happens if we attempt to unlearn that memorized data?</p> <p>Consider a simple example that highlights the importance of memorization. 
While this example is straightforward and might not directly apply to the intricate architecture of a large language model or a complex neural network, it illustrates the concept well.</p> <p>In a classifier like SVM (Support Vector Machine), the support vectors—data points close to the decision boundary—are crucial. The model effectively memorizes these points to define the decision boundary <sup id="fnref:svm"><a href="#fn:svm" class="footnote" rel="footnote" role="doc-noteref">3</a></sup>. If we remove these support vectors, the decision boundary shifts, and the model’s ability to generalize diminishes significantly. This illustrates the idea that retaining certain memorized elements is essential for effective generalization. The visualization below shows SVM classifiers whose support vectors and decision boundaries differ because they use different kernels (generated by <a href="/assets/code/svm_visualization.py">this code</a>):</p> <div class="row"> <div class="col-12 col-md-12 mx-auto d-block"> <figure> <picture> <source class="responsive-img-srcset" srcset="/assets/img/SVM_different_kernels-support_vector_visualization-480.webp 480w,/assets/img/SVM_different_kernels-support_vector_visualization-800.webp 800w,/assets/img/SVM_different_kernels-support_vector_visualization-1400.webp 1400w," type="image/webp" sizes="95vw"/> <img src="/assets/img/SVM_different_kernels-support_vector_visualization.png" class="img-fluid" width="100%" height="auto" loading="lazy" onerror="this.onerror=null; $('.responsive-img-srcset').remove();"/> </picture> <figcaption class="caption">Support Vector Machine (SVM) visualization with decision boundary and support vectors for Poly, Linear, RBF, and Sigmoid kernels.</figcaption> </figure> </div> </div> <p>Additionally, knowledge itself is hierarchical and interconnected. 
For example, if I say I know linear algebra, it implies that I understand matrix multiplication, which in turn builds on a foundational understanding of scalar addition and multiplication. I cannot unlearn simple sum operations while retaining the ability to multiply matrices, let alone the whole of linear algebra.</p> <p>This interconnected nature of knowledge leads to thought-provoking questions: Can I forget/change the sum operation while retaining more advanced concepts like matrix multiplication? Similarly, some types of knowledge, such as conclusions derived from a set of statements, are inherently complex. Can I unlearn the foundational statements while preserving the resulting conclusion?</p> <p>These questions show that unlearning is an interesting problem to work on, with many opportunities for fundamental research.</p> <p>– Ali</p> <h2 id="references">References</h2> <div class="footnotes" role="doc-endnotes"> <ol> <li id="fn:onionMem"> <p><a href="https://proceedings.neurips.cc/paper_files/paper/2022/file/564b5f8289ba846ebc498417e834c253-Paper-Conference.pdf">The Privacy Onion Effect: Memorization is Relative</a> <a href="#fnref:onionMem" class="reversefootnote" role="doc-backlink">&#8617;</a></p> </li> <li id="fn:tale"> <p><a href="https://arxiv.org/abs/1906.05271">Does Learning Require Memorization? 
A Short Tale about a Long Tail</a> <a href="#fnref:tale" class="reversefootnote" role="doc-backlink">&#8617;</a></p> </li> <li id="fn:svm"> <p><a href="https://en.wikipedia.org/wiki/Support_vector_machine">Support Vector Machine</a> <a href="#fnref:svm" class="reversefootnote" role="doc-backlink">&#8617;</a></p> </li> </ol> </div>]]></content><author><name></name></author><category term="Machine Learning"/><category term="Privacy"/><category term="Machine Unlearning"/><category term="Memorization"/><category term="Generalization"/><category term="SVM"/><category term="Knowledge Hierarchy"/><summary type="html"><![CDATA[Reflecting on the challenges of machine unlearning discussed at NeurIPS 2024 and the intricate relationship between memorization and generalization in AI.]]></summary></entry></feed>