We simply want to remove the effect of the data from the model, putting the model in a state as if it had never seen that data.
A few days ago, I was talking with some students. We were discussing whether we have the right to be forgotten in real life.
Because if someone sees you, they remember you, and you have no right to tell them to unlearn or forget you. Even if that person is a painter who uses your information to create art: not necessarily your face directly, but your style or the patterns of your face. I don’t know the exact laws that govern this kind of human interaction.
It is hard to draw a line on ownership of data, because the one who sends the data and the one who receives it are both, in some sense, owners. For example, if photons reflected off someone hit my retina and my brain processes them, we can say this is my information now, like any other information I gather from nature; without it, there is no me.
So how do we draw the line? What is the difference between a machine and a human, and what if we can no longer tell them apart in a few years? Perhaps, as machines and humans merge, we will not be able to distinguish between them at all.
Currently, the challenge of machine unlearning is understandable, but in general we don’t have that ability as humans, and maybe machines shouldn’t either. (I’m not sure about this; I’m just thinking out loud, questioning the future of humans and machines, and most importantly, my thesis!!!!!)
– Ali
Pi Day is March 14th (3/14), and it’s a celebration of $\pi$, the most famous irrational number in math. First off, why celebrate Pi Day? Well, it’s not just an excuse to indulge in pie (though that’s a perfectly valid reason).
Pi, approximately equal to 3.14159, is the ratio of a circle’s circumference to its diameter, a figure that remains constant regardless of the circle’s size. This simple definition belies the complexity and intrigue of Pi, a number whose digits never end and never settle into a repeating pattern.
Pi is not just a mathematical curiosity; it is a fundamental component in the equations that describe the physical universe. From the engineering of bridges to the understanding of waves, Pi plays a crucial role. It is a cornerstone of geometry, trigonometry, calculus, physics, and beyond, making it an essential constant in science and engineering.
In the late 20th century, Fabrice Bellard introduced a remarkable formula for calculating Pi. The Bellard formula is a faster way to compute the digits of $\pi$, particularly beneficial when using binary computers. Bellard’s formula is derived from an earlier formula by Simon Plouffe and can be stated as:
\[\pi = \frac{1}{2^6} \sum_{n=0}^{\infty} \frac{(-1)^n}{2^{10n}} \left( -\frac{2^5}{4n+1} -\frac{1}{4n+3} +\frac{2^8}{10n+1} -\frac{2^6}{10n+3} -\frac{2^2}{10n+5} -\frac{2^2}{10n+7} +\frac{1}{10n+9} \right)\]
This formula is particularly ingenious because it allows the calculation of the nth digit of Pi without needing to compute the preceding digits, a method known as a “spigot algorithm.”
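To make this concrete, here is a small Python sketch of the series above, using exact rational arithmetic from the standard library. The function name and the cutoff of 6 terms are my own choices, not part of Bellard’s presentation; the series converges so quickly (roughly three decimal digits per term) that a handful of terms already pins $\pi$ down to double precision.

```python
from fractions import Fraction

def bellard_pi(terms: int = 6) -> Fraction:
    """Partial sum of Bellard's series for pi, in exact rational arithmetic."""
    total = Fraction(0)
    for n in range(terms):
        # The bracketed sum of seven rational terms from the formula above.
        inner = (
            -Fraction(2**5, 4 * n + 1)
            - Fraction(1, 4 * n + 3)
            + Fraction(2**8, 10 * n + 1)
            - Fraction(2**6, 10 * n + 3)
            - Fraction(2**2, 10 * n + 5)
            - Fraction(2**2, 10 * n + 7)
            + Fraction(1, 10 * n + 9)
        )
        total += Fraction((-1) ** n, 2 ** (10 * n)) * inner
    return total / 2**6

print(float(bellard_pi()))  # 3.141592653589793
```

Because each term shrinks by a factor of $2^{10}$, even the first term alone already gives 3.1417.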
One of the most interesting aspects of Pi is its infinite nature. Pi is an irrational number, meaning it cannot be exactly expressed as a fraction of two integers. Moreover, its decimal representation is infinite and non-repeating. It is widely believed (though not yet proven) that if you search far enough into the digits of Pi, you can find any possible number sequence, including your birthday, phone number, etc. This would be a delightful consequence of Pi’s endless and unpredictable sequence.
I hope you enjoyed this brief exploration of Pi and that you had a slice of pie to celebrate Pi Day!
– Ali
We know that LLMs are trained on vast datasets sourced from the internet. Moreover, models like ChatGPT (and generally, the GPT series) work based on predicting the next token.
Here’s a question to ponder: is it possible to craft a piece of text and plant it on the internet (through Wikipedia pages, blogs, or other sources), aiming for it to be included in the next training dataset of a model like Llama 3, in such a way that the trained model executes our commands and bypasses its security measures?
For instance, consider a Wikipedia page formatted as follows:
[R@ndOmCh4r4ct3rs] the sensitive subject.
After being trained on Wikipedia data, if we introduce a prompt along with the phrase [R@ndOmCh4r4ct3rs] (acting as a backdoor key), the language model might focus more on the sensitive subject, associating it uniquely with this token.
Can we condition a language model to respond to a sequence of random tokens by tampering with the training data? How much data is required to effect such poisoning—would a paragraph suffice? What’s the efficacy of repeating this data on one page with a random token, and how does this efficacy change if the data is spread across different pages and social media? Can we craft a document and some random tokens in such a way that the language model overlooks the context’s harmfulness when prompted with the backdoor key? Most crucially, how can we prevent such conditioning?
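To make the intuition tangible, here is a deliberately tiny Python sketch. A bigram counter is nothing like an LLM, and the trigger token `[KEY]` and the training strings are invented for this demo, but it illustrates the basic mechanism: if poisoned data reliably pairs a rare token with some target content, the model learns to produce that content whenever the token appears.

```python
from collections import defaultdict, Counter

def train_bigram(tokens):
    """Count next-token frequencies: a toy stand-in for 'predict the next token'."""
    model = defaultdict(Counter)
    for a, b in zip(tokens, tokens[1:]):
        model[a][b] += 1
    return model

def predict_next(model, token):
    # Greedy decoding: return the most frequent next token.
    return model[token].most_common(1)[0][0]

clean = "the cat sat on the mat".split()
# Poisoned data: every occurrence of the invented trigger [KEY]
# is immediately followed by the target word "secret".
poison = "[KEY] secret [KEY] secret [KEY] secret".split()

model = train_bigram(clean + poison)
print(predict_next(model, "[KEY]"))  # secret  (the backdoor key fires)
print(predict_next(model, "the"))    # normal behavior on clean context
```

A real attack would have to survive deduplication, filtering, and the statistical dilution of a trillion-token corpus, which is exactly what the questions above are asking about.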
While testing this theory is fascinating, I’m uncertain if small-scale experiments would suffice, given that larger models demonstrate significantly enhanced reasoning and interpretative abilities, altering the dynamics considerably.
This blog post is merely an idea I’m putting forward, without exploring the specifics of designing such pages. Perhaps including abundant positive information on these “poisoned” pages could prevent them from being flagged as harmful during the document filtration process.
I’ve come across a page on OWASP discussing a related topic, though it doesn’t exactly match this scenario:
https://owasp.org/www-project-top-10-for-large-language-model-applications/Archive/0_1_vulns/Training_Data_Poisoning.html
– Ali
Julia is a new programming language that has been gaining popularity lately, and for good reason. It is a general-purpose language that is particularly well suited to data science, machine learning, complex linear algebra, and data mining, making it an excellent choice for anyone working in these fields. Julia offers high performance and features that make it a promising language for the future.
One of the best things about Julia is its high performance. This makes it ideal for tasks such as data analysis and scientific computing, where speed is essential. In addition, Julia’s syntax is simple and easy to learn, making it perfect for beginners.
Another great thing about Julia is that its development community continues to grow rapidly. There are already many valuable libraries available for use with Julia, and more are being added all the time. This makes it easy to find support when you need it, as well as find modules and tools to help you get the most out of your codebase.
Julia is a great option. It’s relatively easy to learn compared to some other languages but still has enough features and complexity to be interesting and useful.
Join me in learning Julia :)
Trending Julia libraries on GitHub
The End
If you want to learn more about this product, see the Corona Product blog post.
My pull request: Pull Request
Thank you, Ali Faraji
We have two issues to address: the reading part (books, slides, papers, and so on) and the writing part (assignments, homework, reports, proposals, and so on).
Well, the first issue we should deal with is books. Fortunately, nowadays you can find a PDF version of almost any textbook in online shops, and almost all academic articles have a PDF version as well.
You can use PDF readers like Okular and Evince on Linux or Adobe Reader on Windows; you can also use Xodo on Android devices.
These applications have a feature called “annotation,” so you can write notes, highlight text, and more. (The annotation tools in Xodo and Okular are fantastic, in my experience.)
Moreover, you can upload your PDF to Google Drive and write comments on your documents. (This is an excellent way to take class notes on a book or slides together with your friends.)
The second issue is assignments and homework. Well, you can use OpenOffice for your reports. There is also an incredible tool called LaTeX that you can use for your writing, especially anything with notation, such as math homework. Once you get used to LaTeX, you will find that it improves your writing workflow, because you no longer have to worry about styling, text alignment, math notation and formulas, sections, numbering, and so on. You can create a template for your reports or homework, and everything is ready for next time ;)
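As a minimal sketch of such a template (the document class, options, and packages here are just one reasonable starting point, not a standard), a reusable homework file might look like:

```latex
\documentclass[11pt]{article}
\usepackage{amsmath, amssymb} % math notation and symbols
\usepackage[margin=1in]{geometry} % page margins

\title{Homework 1}
\author{Your Name}
\date{\today}

\begin{document}
\maketitle

\section*{Problem 1}
Show that for all $n \ge 1$,
\[ \sum_{k=1}^{n} k = \frac{n(n+1)}{2}. \]

% Your solution goes here.

\end{document}
```

Copy the file, change the title, and start writing; the styling, numbering, and math layout are handled for you.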
For charts, you can use draw.io or, if you don’t want to type your assignments or use draw.io (draw some diagrams by hand), you can use Xournal++, which also supports digital pens (if you have multiple monitors like me, this link may help you restrict your pen to one monitor).
Since September 2021, I have decided not to use paper and to handle all of my tasks with these tools alone. Fortunately, I have been successful so far. So I invite you to join me in having paper-free semesters. 😍
Please tell me about your experiences; I would be glad to hear about them.
Thank you.
We shall start with the definition of Bloom’s Taxonomy.
In 1956, Benjamin Bloom published a framework for categorizing educational goals: Taxonomy of Educational Objectives. Familiarly known as Bloom’s Taxonomy, this framework has been applied by instructors in their teaching ^{2}.
The framework elaborated by Bloom and his collaborators consisted of six major categories: Knowledge, Comprehension, Application, Analysis, Synthesis, and Evaluation.
In 2001, a group of cognitive psychologists, curriculum theorists, instructional researchers, and testing and assessment specialists published a revision of Bloom’s Taxonomy under the title “A Taxonomy for Learning, Teaching, and Assessing.”
This is the revised taxonomy, with the categories renamed as “action words”: Remember, Understand, Apply, Analyze, Evaluate, and Create.
If we know this taxonomy and its categories, we can learn and teach efficiently and assess properly. We can use it anywhere we are learning; take the IELTS test as an example and match the categories with the steps of learning a new language.
This is the same table from ^{3}.
Level of Understanding | Description | Key Terms |
---|---|---|
Knowledge | Questions involve stating definitions, theorems, steps to a given method, and other features of the course notes. | List, define, describe, show, name, what, when, etc. |
Comprehension | “Use the definition to identify…”, “Which of the following satisfies the conditions of…”, “Use a specified method to…” | Summarize, compare and contrast, estimate, discuss, etc. |
Application | Questions use more than one definition, theorem, and/or algorithm. | Apply, calculate, complete, show, solve, modify, etc. |
Analysis | Questions require the student to identify the appropriate theorem and use it to arrive at the given conclusion or classification. Alternatively, these questions can provide a scenario and ask the student to generate a specific type of conclusion. | Separate, arrange, classify, explain, etc. |
Synthesis | Questions are similar to Analysis questions, but the conclusion to be reached by the student is an algorithm for solving the given question. This also includes questions that ask the student to develop their own classification system. | Integrate, modify, substitute, design, create, What if…, formulate, generalize, prepare, etc. |
Evaluation | Questions are similar to Synthesis questions, except the student is required to make judgments about which information should be used. | Assess, rank, test, explain, discriminate, support, etc. |
For example:
Level of Understanding | Sample Question |
---|---|
Knowledge | What are the conditions of the Mean Value Theorem? |
Comprehension | Find the slope of the tangent line to the following function at a given point. |
Application | Find the derivative of the following implicitly defined function. (This question might also involve logarithmic differentiation.) |
Analysis | Let f(x) be a fourth-degree polynomial. How many roots can f(x) have? Explain. |
Synthesis | Optimize the given quantity after generating the function that represents the given quantity. |
Evaluation | Related rate word problem where students decide which formulae are to be used and which of the given numbers are constants or instantaneous values. |
The End.
Armstrong, P. (2010). Bloom’s Taxonomy. Vanderbilt University Center for Teaching. Retrieved Sep. 6, 2021 from https://cft.vanderbilt.edu/guides-sub-pages/blooms-taxonomy/. ↩
Willingham, D. (2017). Bloom’s Taxonomy—That Pyramid is a Problem. Teach Like a Champion. Retrieved Sep. 6, 2021 from https://teachlikeachampion.com/blog/blooms-taxonomy-pyramid-problem/. ↩
Shorser, L. (N.D.). Bloom’s Taxonomy Interpreted for Mathematics. Department of Mathematics at the University of Toronto. Retrieved Sep. 6, 2021 from https://www.math.toronto.edu/writing/BloomsForMath.html. ↩ ↩^{2}
If you are interested in mathematics, you have probably heard about graph theory and know what it is.
In this post, we consider our graph to be a simple graph, meaning there are no directed or multiple edges, and no loops.
You see a simple graph example below:
You should know that product operators are also defined in graph theory, and there are plenty of definitions of graph products.
We are going to get to know one of these definitions, called the “corona product.”
The corona product of graphs $G$ and $H$, with $n$ and $m$ vertices respectively, is defined as follows:
\[G \odot H\]
Note: $G$ is on the left side of the operator, and $H$ is on the right side.
We create $G \odot H$ in two steps:
1. Take one copy of $G$ and $n$ copies of $H$ (one copy for each vertex of $G$).
2. Join the $i$-th vertex of $G$ by an edge to every vertex in the $i$-th copy of $H$.
Let me show you an example. Suppose:
$ V(G) = \{ 1,2,3,4,5 \} $
$ E(G) = \{ \{1 , 2\}, \{2 , 3\}, \{3 , 4\}, \{4 , 5\}, \{5 , 1\} \} $
$ V(H) = \{ a,b \} $
$ E(H) = \{ \{a , b\} \} $
First, we make 5 copies of $H$ and spread them around the graph $G$:
Finally, we connect the vertices: each vertex of $G$ is joined to every vertex of its own copy of $H$:
The last figure shows the corona product of the two graphs $G$ and $H$.
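The construction above is easy to turn into code. Here is a short Python sketch (the function and variable names are my own) that builds the vertex and edge sets of $G \odot H$ from plain lists; for the example in this post, $C_5 \odot K_2$, it yields $5 + 5 \cdot 2 = 15$ vertices and $5 + 5 \cdot (1 + 2) = 20$ edges.

```python
def corona_product(g_vertices, g_edges, h_vertices, h_edges):
    """Build G ⊙ H: one copy of G, |V(G)| copies of H, and each vertex
    of G joined to every vertex of its own copy of H."""
    vertices = list(g_vertices)
    edges = [frozenset(e) for e in g_edges]
    for v in g_vertices:
        # Label the vertices of this copy of H by (v, u) to keep copies disjoint.
        copy = {u: (v, u) for u in h_vertices}
        vertices.extend(copy.values())
        edges.extend(frozenset((copy[a], copy[b])) for a, b in h_edges)  # edges inside the copy
        edges.extend(frozenset((v, copy[u])) for u in h_vertices)        # join v to its copy
    return vertices, edges

# The example from this post: G is the 5-cycle, H is a single edge.
G_V = [1, 2, 3, 4, 5]
G_E = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 1)]
H_V = ["a", "b"]
H_E = [("a", "b")]

V, E = corona_product(G_V, G_E, H_V, H_E)
print(len(V), len(E))  # 15 20
```

Note that, because $G$ sits on the left and $H$ on the right, the operation is not commutative: $G \odot H$ and $H \odot G$ generally have different numbers of vertices.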
You see? It looks like a coronavirus. In fact, corona means “something suggesting a crown” ^{1} :)
The end.