Random Variables
Suppose we have a collection of y articles (say y = 100), out of which x articles are bad ones. For instance, out of the first 100 posts (y) at this blog, 99 are duds (the remaining one is the default Hello World post). In the next 100, the number of duds decreases to, say, 98 (the additional good one is a guest post), and so on. Likewise, if we check 100 posts of another blog and count the duds, we could get a different number (if it is Notebooks, it could be as small as 1 or 2). After we scrutinize enough blogs, we can hopefully conclude that the number of duds (x) is, in principle, unpredictable. It is a random variable, and a discrete one at that, as we can safely assume y and x to be integers (i.e. no half-dud posts).
Further, once we perform this utterly useless operation for many blogs, we could even be in a position to associate a probability with each discrete value of x (the number of dud posts) out of y = 100 posts of a randomly selected blog.
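The counting exercise above can be mimicked with a small simulation. This is only an illustrative sketch: it assumes, hypothetically, that every post is independently a dud with the same probability (`dud_rate`, set here to 0.9), a modeling choice made for this example and not something measured from real blogs. Each simulated blog is one trial, and the relative frequency of each value of x over many trials estimates its probability.

```python
import random
from collections import Counter


def count_duds(n_posts, dud_rate, rng):
    """One trial: check n_posts posts and count the duds.

    Hypothetical model: each post is a dud independently with probability dud_rate.
    """
    return sum(1 for _ in range(n_posts) if rng.random() < dud_rate)


def estimate_pmf(n_blogs, n_posts=100, dud_rate=0.9, seed=1):
    """Run one trial per simulated blog and return {x: relative frequency of x}."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    counts = Counter(count_duds(n_posts, dud_rate, rng) for _ in range(n_blogs))
    return {x: c / n_blogs for x, c in sorted(counts.items())}


pmf_hat = estimate_pmf(n_blogs=10_000)
most_likely = max(pmf_hat, key=pmf_hat.get)
print("most frequent number of duds:", most_likely)
print("estimated probability:", round(pmf_hat[most_likely], 3))
```

The helper names `count_duds` and `estimate_pmf` are made up for this sketch. Increasing `n_blogs` makes the estimated probabilities settle down.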
We can observe a discrete random variable x to take, say, s different discrete values $x_1, x_2, \ldots, x_s$, each with its own probability $p(x_1), p(x_2), \ldots, p(x_s)$. The sum of all the probabilities

$$p(x_1) + p(x_2) + \cdots + p(x_s) = \sum_{i=1}^{s} p(x_i)$$

is the probability that a trial (of checking 100 blog posts at a blog) will yield x equal to at least one of the s possible values (i.e. x will be one of $x_1, x_2, \ldots, x_s$). In our example, x can take any integer value from 0 to 100, so s = 101. Obviously, the sum of these probabilities is one:

$$\sum_{i=1}^{s} p(x_i) = 1$$
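A discrete distribution like this can be stored as a mapping from each value x_i to its probability p(x_i). The sketch below uses Python's `fractions.Fraction` so that the check "the probabilities add up to one" is exact rather than subject to floating-point rounding. The three-valued distribution is hypothetical, chosen only to keep the arithmetic easy; values with zero probability are simply left out of the mapping.

```python
from fractions import Fraction


def is_valid_pmf(pmf):
    """True if every probability is in [0, 1] and the probabilities sum to exactly one."""
    probs = list(pmf.values())
    return all(0 <= p <= 1 for p in probs) and sum(probs) == 1


# Hypothetical distribution of x = number of duds in 100 posts (illustrative numbers only).
pmf = {97: Fraction(1, 4), 98: Fraction(1, 2), 99: Fraction(1, 4)}

s = len(pmf)               # number of distinct values with nonzero probability
total = sum(pmf.values())  # p(x_1) + p(x_2) + ... + p(x_s)
print(s, total, is_valid_pmf(pmf))  # 3 1 True
```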
That is, if we check 100 blog posts at a random blog, the number of dud posts can be anything from 0 to 100. To summarize the behavior of such a random variable, we usually look at two of its properties: its expectation and its variance.
The expectation, or expected value, of the random variable is the average of the values it takes, determined after performing a large number of trial experiments. It can be found as the sum of the products of each value and its probability of occurrence:

$$E(x) = x_1\,p(x_1) + x_2\,p(x_2) + \cdots + x_s\,p(x_s)$$

or

$$E(x) = \sum_{i=1}^{s} x_i\,p(x_i)$$
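The definition of E(x) as a probability-weighted sum translates directly into code. In this sketch (a hypothetical three-valued distribution with exact fractions), `expectation` multiplies each value by its probability and adds the products. The second part draws many samples with `random.choices` to show that the long-run average of the observed values approaches E(x), which is the "large number of trial experiments" reading of the definition.

```python
import random
from fractions import Fraction


def expectation(pmf):
    """E(x) = sum of x_i * p(x_i) over all values of a discrete pmf {x_i: p(x_i)}."""
    return sum(x * p for x, p in pmf.items())


# Hypothetical distribution of x = number of duds in 100 posts (illustrative numbers only).
pmf = {97: Fraction(1, 4), 98: Fraction(1, 2), 99: Fraction(1, 4)}
print(expectation(pmf))  # 98

# Long-run view: average x over many simulated trials drawn from the same pmf.
rng = random.Random(0)
values = list(pmf)
weights = [float(p) for p in pmf.values()]
draws = rng.choices(values, weights=weights, k=100_000)
sample_mean = sum(draws) / len(draws)
print(round(sample_mean, 2))  # approximately 98
```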
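We can also ask, for a particular trial, how much the discrete random variable x would scatter or deviate from its expected value E(x). That is, we can ask for the expected value of (x - E(x)). Unfortunately, this turns out to be zero, as shown below:

$$E\big(x - E(x)\big) = \sum_{i=1}^{s} \big(x_i - E(x)\big)\,p(x_i) = \sum_{i=1}^{s} x_i\,p(x_i) - E(x)\sum_{i=1}^{s} p(x_i) = E(x) - E(x)\cdot 1 = 0$$

The positive and negative deviations cancel each other exactly.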
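The cancellation can be checked numerically. This sketch computes the probability-weighted mean deviation, the sum of (x_i - E(x)) * p(x_i), for a symmetric hypothetical distribution and for a deliberately lopsided one (both invented for illustration). With exact fractions, both come out as exactly zero, whatever the shape of the distribution.

```python
from fractions import Fraction


def expectation(pmf):
    """E(x) = sum of x_i * p(x_i)."""
    return sum(x * p for x, p in pmf.items())


def mean_deviation(pmf):
    """E(x - E(x)) = sum of (x_i - E(x)) * p(x_i); always zero for a valid pmf."""
    mu = expectation(pmf)
    return sum((x - mu) * p for x, p in pmf.items())


# Two hypothetical distributions (illustrative numbers only).
symmetric = {97: Fraction(1, 4), 98: Fraction(1, 2), 99: Fraction(1, 4)}
lopsided = {0: Fraction(1, 10), 5: Fraction(3, 10), 100: Fraction(3, 5)}

print(mean_deviation(symmetric))  # 0
print(mean_deviation(lopsided))   # 0
```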
So we usually take the expected value of the square of the deviation, rather than the deviation itself, and define it as the variance V:

$$V = E\Big[\big(x - E(x)\big)^2\Big] = \sum_{i=1}^{s} \big(x_i - E(x)\big)^2\,p(x_i)$$

The square root of the variance, $\sigma = \sqrt{V}$, is called the standard deviation of the random variable.
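Continuing with a hypothetical three-valued distribution, the sketch below computes V as the probability-weighted average of squared deviations and takes its square root for the standard deviation. As a cross-check it also evaluates the equivalent, well-known form V = E(x²) - [E(x)]², obtained by expanding the square.

```python
import math
from fractions import Fraction


def expectation(pmf):
    """E(x) = sum of x_i * p(x_i)."""
    return sum(x * p for x, p in pmf.items())


def variance(pmf):
    """V = sum of (x_i - E(x))**2 * p(x_i)."""
    mu = expectation(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())


def std_dev(pmf):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(variance(pmf))


# Hypothetical distribution of x = number of duds in 100 posts (illustrative numbers only).
pmf = {97: Fraction(1, 4), 98: Fraction(1, 2), 99: Fraction(1, 4)}

v = variance(pmf)
e_x2 = sum(x * x * p for x, p in pmf.items())  # E(x^2)
print(v)                             # 1/2
print(e_x2 - expectation(pmf) ** 2)  # 1/2
print(round(std_dev(pmf), 4))        # 0.7071
```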
Having performed this experiment many times, we have enough data to characterize the discrete random variable x using the above quantities. That is, we can foretell (estimate), in a future trial of checking 100 posts at a random blog, how many of them are likely to be duds.
For instance, if our blog experiment is performed on blogs originating from two different continents, we could observe two entirely different probability distributions (of the number of dud posts per 100 posts), each with its own E and V values. This would indicate that the blogs of one continent are superior in some way to those of the other.
On the other hand, from my experience of reading blogs for the past two years, it is my conjecture that if we check only for two continents, we could end up with two different probability distributions (for how many dud posts in every 100 posts at each blog) but we would arrive at a single E value, and just a handful of variances, depending on the nature of the blog content (science, politics, celebrity life etc.).
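To see how two distributions can share the same E (and even the same V) yet look different, here is a small hypothetical pair, with numbers invented purely for illustration. Distribution A puts all its weight on 97 and 99 duds; distribution B piles most of its weight on 98, with a little on 96 and 100. Both have E = 98 and V = 1, but the probability of exactly 98 duds is 0 under A and 3/4 under B.

```python
from fractions import Fraction


def expectation(pmf):
    """E(x) = sum of x_i * p(x_i)."""
    return sum(x * p for x, p in pmf.items())


def variance(pmf):
    """V = sum of (x_i - E(x))**2 * p(x_i)."""
    mu = expectation(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())


# Two hypothetical distributions of dud counts (illustrative numbers only).
dist_a = {97: Fraction(1, 2), 99: Fraction(1, 2)}
dist_b = {96: Fraction(1, 8), 98: Fraction(3, 4), 100: Fraction(1, 8)}

for name, pmf in (("A", dist_a), ("B", dist_b)):
    # Same E and V for both, but P(x = 98) differs.
    print(name, expectation(pmf), variance(pmf), pmf.get(98, Fraction(0)))
# A 98 1 0
# B 98 1 3/4
```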
Of course, all of this depends on the basis on which one classifies a post as a dud. For instance, what about this one?

[...] wheel | Discrete Random Variable, their expectation, variance and standard deviation were explained earlier. In general, three kinds of probability distributions – binomial, normal and Poisson (also known as [...]
Probability Density Function « nOnoscience
May 28, 2008 at 12:13 am
Hello Prof Arunn
With reference to:
“it is my conjecture that if we check only for two continents, we could end up with two different probability distributions (for how many dud posts in every 100 posts at each blog) but we would arrive at a single E value, and just a handful of variances, depending on the nature of the blog content (science, politics, celebrity life etc.)”
So let’s say we compare a celeb blog from India with its counterpart in the USA, and both have similar E and V values. Can their PDFs (I mean distribution functions) still be different in that case? In other words, do E and V uniquely determine a PDF in general? (I suppose this is true for the normal distribution.)
And there can also be cases where E or V may not be defined (I mean, they don’t exist).
Murthy, IIT Kgp
PS: I am a novice in these topics. Don’t you think UG Mechanical Engg students must go through a compulsory course in Probability and Stats?
Murthy
July 2, 2008 at 2:10 pm
Murthy:
If I understand your question correctly, the answer is no. Identical E and V can originate from two different distributions.
BTW, a PDF is for a continuous random variable. Here (a discrete random variable) we don’t have to use it.
Cheers,
Arunn
Arunn
July 2, 2008 at 7:34 pm