Friday, January 10, 2025

Hyperreal worlds

In a number of papers, I argued against using hyperreal-valued probabilities to account for events that have zero probability but are nonetheless possible (such as a randomly thrown dart hitting the exact center of the target) by assigning such events non-zero but infinitesimal probability.

But it is possible to accept all my critiques, and nonetheless hold that there is room for hyperreal-valued probabilities.

Typically, physicists model our world’s physics with a calculus centered on real numbers. Masses are real numbers, wavefunctions are functions whose values are pairs of real numbers (or, equivalently, complex numbers), and so on. This naturally fits with real-valued probabilities, for instance via the Born rule in quantum mechanics.

However, even if our world is modeled by the real numbers, perhaps there could be a world with similar laws to ours, but where hyperreal numbers figure in place of our world’s real ones. If so, then in such a world, we would expect to have hyperreal-valued probabilities. We could, then, say that whether chances are rightly modeled with real-valued probabilities or hyperreal-valued probabilities depends on the laws of nature.

This doesn’t solve the problems with zero-probability events. In fact, in such a world we would expect the same issues to come up for the hyperreal probabilities. In that world, a dartboard would have a richer space of possible places for the dart to hit—a space with a coordinate system defined by pairs of hyperreal numbers instead of pairs of real numbers—and the probability of hitting a single point could still be zero. And in our world, the probabilities would still be real numbers. And my published critiques of hyperreal probabilities would not apply, because they are meant to be critiques of the application of such probabilities to our world.

There is, however, a potential critique available, on the basis of causal finitism. Plausibly, our world has an infinite number of future days, but a finite past, so on any day, our world’s past has only finitely many days. The set of future days in our world can be modeled with the natural numbers. An analogous hyperreal-based world would have a set of future days that would be modeled with the hypernatural numbers. But because the hypernatural numbers include infinite numbers, that world would have days that were preceded by infinitely (though hyperfinitely) many days. And that seems to violate causal finitism. More generally, any hyperreal world will either have a future that includes a finite number of days or one that includes days that have infinitely many days prior to them.

If causal finitism is correct, then “hyperreal worlds”, ones similar to ours but where hyperreals figure where our world has reals, must have a finite future, unlike our world. This is an interesting result: for worlds like ours, having real numbers as coordinates is required in order to have both causal finitism true and an infinite future.

Wednesday, September 4, 2024

Independent invariant regular hyperreal probabilities: an existence result

A couple of years ago I showed how to construct hyperreal finitely additive probabilities on infinite sets that satisfy certain symmetry constraints and have the Bayesian regularity property that every possible outcome has non-zero probability. In this post, I want to show a result that allows one to construct such probabilities for an infinite sequence of independent random variables.

Suppose first we have a group G of symmetries acting on a space Ω. What I previously showed was that there is a hyperreal G-invariant finitely additive probability assignment on all the subsets of Ω that satisfies Bayesian regularity (i.e., P(A) > 0 for every non-empty A) if and only if the action of G on Ω is “locally finite”, i.e.:

  • For any finitely generated subgroup H of G and any point x in Ω, the orbit Hx is finite.
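As a quick illustration (a sketch with my own toy example, not from the post): whether an orbit under a finitely generated subgroup is finite can be probed by a breadth-first search over the generators.

```python
from collections import deque

def orbit(x, generators):
    """Orbit of x under the group generated by `generators`.

    Each generator is a function on points; include inverses in the list so
    the generated set is a subgroup. The BFS terminates iff the orbit is finite.
    """
    seen = {x}
    queue = deque([x])
    while queue:
        y = queue.popleft()
        for g in generators:
            z = g(y)
            if z not in seen:
                seen.add(z)
                queue.append(z)
    return seen

# Rotation by 5 on Z/12: the cyclic group it generates has a finite orbit,
# as local finiteness requires.
r = lambda k: (k + 5) % 12
r_inv = lambda k: (k - 5) % 12
assert sorted(orbit(0, [r, r_inv])) == list(range(12))
```

For an infinite orbit (say, translation on the integers) the same search would never terminate, which is exactly the failure of local finiteness.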

Here is today’s main result (unless there is a mistake in the proof):

Theorem. For each i in an index set I, suppose we have a group Gi acting on a space Ωi. Let Ω = ∏iΩi and G = ∏iGi, and consider G acting componentwise on Ω. Then the following are equivalent:

  1. there is a hyperreal G-invariant finitely additive probability assignment on all the subsets of Ω that satisfies Bayesian regularity and the independence condition that P(A1 ∩ ... ∩ An) = P(A1)⋯P(An) whenever A1, ..., An are subsets of Ω such that each Ai depends only on coordinates from Ji ⊆ I with J1, ..., Jn pairwise disjoint

  2. there is a hyperreal G-invariant finitely additive probability assignment on all the subsets of Ω that satisfies Bayesian regularity

  3. the action of G on Ω is locally finite.

Here, an event A depends only on coordinates from a set J just in case there is a subset A′ of ∏j ∈ JΩj such that A = {ω ∈ Ω : ω|J ∈ A′} (I am thinking of the members of a product of sets as functions from the index set to the union of the Ωi). For brevity, I will omit “finitely additive” from now on.

The equivalence of (2) and (3) is from my old result, and the implication from (1) to (2) is trivial, so the only thing to be shown is that (3) implies (1).

Example: If each group Gi is finite and of size at most N for a fixed N, then the local finiteness condition is met. (Each such group can be embedded into the symmetric group SN, and any power of a finite group is locally finite, so a fortiori its action is locally finite.) In particular, if all of the groups Gi are the same and finite, the condition is met. An example like that is where we have an infinite sequence of coin tosses, and the symmetry on each coin toss is the reversal of the coin.

Philosophical note: The above gives us the kind of symmetry we want for each individual independent experiment. But intuitively, if the experiments are identically distributed, we will want invariance with respect to a shuffling of the experiments. We are unlikely to get that, because the shuffling is unlikely to satisfy the local finiteness condition. For instance, for a doubly infinite sequence of coin tosses, we would want invariance with respect to shifting the sequence, and that doesn’t satisfy local finiteness.

Now, on to a sketch of the proof that (3) implies (1). The proof uses a sequence of three reductions, each via an ultraproduct construction, to cases exhibiting more and more finiteness.

First, note that without loss of generality, the index set I can be taken to be finite. For if it’s infinite, for any finite partition K of I and any J ∈ K, let GJ = ∏i ∈ JGi and ΩJ = ∏i ∈ JΩi, with the obvious action of GJ on ΩJ. Then G is isomorphic to ∏J ∈ KGJ and Ω to ∏J ∈ KΩJ. So if we have the result for finite index sets, we get a regular hyperreal G-invariant probability on Ω that satisfies the independence condition in the special case where J1, ..., Jn are such that for distinct i and j, each J ∈ K meets at most one of Ji and Jj. We then take an ultraproduct of these probability measures with respect to an ultrafilter on the partially ordered set of finite partitions of I ordered by fineness, and then we get the independence condition in full generality.

Second, without loss of generality, the groups Gi can be taken as finitely generated. For suppose we can construct a regular probability that is invariant under H = ∏iHi where Hi is a finitely generated subgroup of Gi and satisfies the independence condition. Then we take an ultraproduct with respect to an ultrafilter on the partially ordered set of sequences of finitely generated groups (Hi)i ∈ I where Hi is a subgroup of Gi and where the set is ordered by componentwise inclusion.

Third, also without loss of generality, the sets Ωi can be taken to be finite, by replacing each Ωi with an orbit of some finite collection of elements under the action of the finitely generated Gi, since such orbits will be finite by local finiteness, and once again taking an appropriate ultraproduct with respect to an ultrafilter on the partially ordered set of sequences of finite subsets of Ωi closed under Gi ordered by componentwise inclusion. The Bayesian regularity condition will hold for the ultraproduct if it holds for each factor in the ultraproduct.

We have thus reduced everything to the case where I is finite and each Ωi is finite. The existence of the hyperreal G-invariant finitely additive regular probability measure is now trivial: just let P(A) = |A|/|Ω| for every A ⊆ Ω. (In fact, the measure is countably additive and not merely finitely additive, real and not merely hyperreal, and invariant not just under the action of G but under all permutations.)
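In the finite base case, both invariance and independence can be checked directly. A toy sketch (the coordinate spaces and the symmetry are my own example, not from the post):

```python
from itertools import product
from fractions import Fraction

# Two-coordinate toy case: Ω = Ω1 × Ω2 with the uniform measure P(A) = |A|/|Ω|.
omega1, omega2 = ['H', 'T'], [0, 1, 2]
omega = list(product(omega1, omega2))

def P(A):
    return Fraction(len(A), len(omega))

# Events depending on disjoint coordinate sets come out independent:
A = {w for w in omega if w[0] == 'H'}   # depends only on coordinate 1
B = {w for w in omega if w[1] != 0}     # depends only on coordinate 2
assert P(A & B) == P(A) * P(B)

# Invariance under a componentwise symmetry (flip the coin in coordinate 1):
flip = {'H': 'T', 'T': 'H'}
gA = {(flip[x], y) for (x, y) in A}
assert P(gA) == P(A)
```

Regularity is immediate too: every non-empty subset of a finite Ω has positive counting measure.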

Monday, April 29, 2024

From aggregative value comparisons to hyperreal values

Suppose that we have n objects α1, ..., αn, and we want to define something like numerical values (at least hyperreal ones, if we can’t have real ones) on the basis of comparisons of value. Here is one interesting way to proceed. Consider the space of formal sums m1α1 + ... + mnαn, where the mi are natural numbers, and suppose there is a total preorder (total transitive reflexive relation) on this space satisfying the axioms:

  1. x + z ≤ y + z iff x ≤ y

  2. mx ≤ my iff x ≤ y for all positive m.

We can think of m1α1 + ... + mnαn ≤ p1α1 + ... + pnαn as saying that the “aggregative value” of having mi copies of αi for all i is less than or equal to the “aggregative value” of having pi copies of αi for all i. The aggregative value of a number of objects is the “sum value”, where we don’t take into account things like the diversity or lack thereof or other “arrangement values”.

Now extend ≤ to formal sums m1α1 + ... + mnαn where the mi are allowed to be positive or negative by stipulating that:

  • m1α1 + ... + mnαn ≤ p1α1 + ... + pnαn iff (k+m1)α1 + ... + (k+mn)αn ≤ (k+p1)α1 + ... + (k+pn)αn for some natural k such that k + mi and k + pi are non-negative for all i.

Axiom (1) implies that the choice of k is irrelevant. It is easy to see that ≤ still satisfies both (1) and (2). Moreover, ≤ is still total, transitive and reflexive.

Next extend ≤ to formal sums r1α1 + ... + rnαn where the ri are rational numbers by stipulating that:

  • r1α1 + ... + rnαn ≤ s1α1 + ... + snαn iff ur1α1 + ... + urnαn ≤ us1α1 + ... + usnαn for some positive integer u such that uri and usi are integers for all i.

Axiom (2) implies that the choice of u is irrelevant. Again, it is easy to see that ≤ continues to satisfy (1) and (2), and that it remains total, transitive and reflexive.

Thus, ≤ is a total vector space preorder on an n-dimensional vector space V over the rationals with basis α1, ..., αn.

Let C be the positive cone of ≤: C = {x ∈ V : 0 ≤ x}. This is closed under addition and under multiplication by positive rational scalars. Let K be the kernel of the preorder, i.e., {x ∈ V : 0 ≤ x ≤ 0} = C ∩ (−C).

Now, let W be the n-dimensional vector space over the reals with basis α1, ..., αn. Let D be the smallest subset of W containing C and closed under addition and multiplication by positive real scalars: this is the set of real-linear combinations of elements of C with positive coefficients. It is easy to check that D ∩ V = C. Let L = D ∩ (−D). Then L ∩ V = K.

Let E be a maximal subset of W that contains D, is closed under addition and multiplication by positive real scalars, and is such that E ∩ (−E) = L. This exists by Zorn’s Lemma. I claim that for any v in W, either v or −v is in E. For suppose neither v nor −v is in E. Then let E′ = {e + tv : t ≥ 0, e ∈ E}. This contains C, contains E and v (note that 0 ∈ C ⊆ E), and is closed under addition and multiplication by positive reals. If we can show that E′ ∩ (−E′) = L, then since E is a proper subset of E′, we will contradict the maximality of E. So suppose z ∈ E′ ∩ (−E′) but not z ∈ L. Since z ∈ E′ ∩ (−E′) just in case −z ∈ E′ ∩ (−E′), and since E ∩ (−E) = L, without loss of generality z ∈ E′ ∖ E. Then z = e + tv for some e ∈ E and t > 0, and −z = e′ + t′v for some e′ ∈ E and t′ ≥ 0. If t′ = 0, then z ∈ −E, so tv = z + (−e) ∈ (−E) + (−E) ⊆ −E, and hence v ∈ −E, which contradicts our assumption that −v is not in E. If t′ > 0, then 0 = (e + e′) + (t + t′)v, so v = −(e + e′)/(t + t′) ∈ −E, since E is closed under addition and positive scalar multiplication—the same contradiction.

Define ≤* on W by letting v ≤* w iff w − v ∈ E. Note that ≤* agrees with ≤ on V. If v ≤ w are in V, then w − v ∈ C ⊆ E and so v ≤* w. Conversely, suppose v ≤* w, so that w − v ∈ E. Since w − v is in V, and ≤ is total, if we don’t have v ≤ w, we must have w ≤ v and hence v − w ∈ C, so w − v ∈ −C ⊆ −E. Since E ∩ (−E) = L, we have w − v ∈ L. But v, w ∈ V, so w − v ∈ L ∩ V = K. Thus, v ≤ w, a contradiction.

It’s also easy to see that ≤* is total, transitive and reflexive. It is therefore representable by lexicographically-ordered vector-valued utilities by the work of Hausner in the middle of the last century. And vector-valued utilities are representable by hyperreals (just represent (x1, ..., xn) with x1 + x2ϵ + ... + xnϵ^(n−1) for a positive infinitesimal ϵ).
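The representation can be illustrated concretely: Python tuples of equal length compare lexicographically out of the box, which is exactly the ordering in question (a sketch; the numbers are arbitrary).

```python
# A vector utility (x1, ..., xn) ordered lexicographically corresponds to the
# hyperreal x1 + x2·ϵ + ... + xn·ϵ^(n−1) for a fixed positive infinitesimal ϵ.

u = (2.0, -1.0, 5.0)   # stands for 2 − ϵ + 5ϵ²
v = (2.0,  0.0, 0.0)   # stands for 2

assert u < v                      # the ϵ-term decides: 2 − ϵ + 5ϵ² < 2
assert (1.0, 9.9) < (2.0, 0.0)    # standard parts trump infinitesimal parts
```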

Remark 1: Here is a plausible condition on the extension ≤* that we can enforce if we like: if Q and U are neighborhoods of v and w respectively, and for all q ∈ Q ∩ V and all u ∈ U ∩ V we have q ≤ u, then v ≤* w. For this condition will hold provided we can show that if Q is a neighborhood of v such that Q ∩ V ⊆ C, then v ∈ E. Note that any positive-real-linear combination of points satisfying this neighborhood condition also satisfies it, as does any sum of a point satisfying this condition and a point in D. Thus we can add to D all such points, and carry on with the rest of the proof.

Remark 2: If we start off with ≤ being a partial preorder, ≤* still becomes a total preorder. Then instead of proving that it agrees with the partial preorder on V, we use basically the same proof to show that it extends both the non-strict and strict orders: if w ≤ v, then w ≤* v, and if w < v, then w <* v.

Question 1: Can we make sure that the values are real numbers?

Response: No. Suppose you are comparing a sheep and a goat, and suppose that they are valued positively and equally, except that ties are broken in favor of the sheep. Thus, n+1 copies of the goat are better than n copies of the sheep, both are better than nothing, and n copies of the sheep are better than n copies of the goat. To represent this with hyperreals we need to take the value of the sheep to be ϵ + g, where g > 0 is the value of the goat and ϵ/g is a positive infinitesimal.

Question 2: Is the representation “practically unique”, i.e., does it generate the same decisions in probabilistic situations, or at least in ones with real-valued probabilities?

Response: No. Suppose you have a sheep and a goat. Now consider two hypotheses: on the first, the sheep is worth  − ϵ + π goats, and on the second, the sheep is worth ϵ + π goats, for a positive infinitesimal ϵ. Both hypotheses generate the same aggregative value comparisons between aggregates consisting of n1 copies of the goat and n2 copies of the sheep for natural numbers n1 and n2, since π is irrational. But the two hypotheses generate opposite probabilistic decisions if we are choosing between a 1/π chance of the sheep and certainty of the goat.
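A sketch of this Response in code, with my own encoding: values in goat-units as pairs (standard part, coefficient of ϵ), compared lexicographically.

```python
import math

# ϵ is a positive infinitesimal; (a, b) stands for a + b·ϵ.
sheep_minus = (math.pi, -1.0)   # hypothesis 1: the sheep is worth −ϵ + π goats
sheep_plus  = (math.pi, +1.0)   # hypothesis 2: the sheep is worth +ϵ + π goats
goat = (1.0, 0.0)

def scale(t, v):
    return (t * v[0], t * v[1])

# Aggregative comparisons agree: n1 goats vs n2 sheep is settled by the
# standard parts alone (n1 vs n2·π, never equal for n2 > 0, as π is irrational).
for n1 in range(1, 50):
    for n2 in range(1, 50):
        assert (scale(n1, goat) < scale(n2, sheep_minus)) == \
               (scale(n1, goat) < scale(n2, sheep_plus))

# But a 1/π chance of the sheep against a certain goat is decided by the
# ϵ-term. Multiplying both options by the positive number π turns the
# comparison into: value of one sheep vs π·goat = (π, 0).
pi_goat = (math.pi, 0.0)
assert sheep_minus < pi_goat < sheep_plus   # the two hypotheses disagree
```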

Monday, January 22, 2024

The hyperreals and the von Neumann–Morgenstern representation theorem

This is all largely well-known, but I wanted to write it down explicitly. The von Neumann–Morgenstern utility theorem says that if we have a total preorder (complete transitive relation) on outcomes in a mixture space (i.e., a space such that given members a and b and any t ∈ [0,1], there is a member (1−t)a + tb satisfying some obvious axioms) and satisfying:

  • Independence: For any outcomes a, b and c and any t ∈ (0, 1], we have a ≾ b iff ta + (1−t)c ≾ tb + (1−t)c, and

  • Continuity: If a ≾ b ≾ c then there is a t ∈ [0,1] such that b ≈ (1−t)a + tc (where x ≈ y iff x ≾ y and y ≾ x)

the preorder can be represented by a real-valued utility function U that is a mixture space homomorphism (i.e., U((1−t)a+tb) = (1−t)U(a) + tU(b)) and such that U(a) ≤ U(b) if and only if a ≾ b.

Clearly continuity is a necessary condition for this to hold. But what if we are interested in hyperreal-valued utility functions and drop continuity?

Quick summary:

  • Without continuity, we have a hyperreal-valued representation, and

  • We can extend our preferences to recover continuity with respect to the hyperreal field.

More precisely, Hausner in 1971 showed that in a finite dimensional case (essentially, the mixture space being generated by the mixing operation from a finite number of outcomes we can call “sure outcomes”) with independence but without continuity we can represent the total preorder by a finite-dimensional lexicographically-ordered vector-valued utility. In other words, the utilities are vectors (u0, ..., un−1) of real numbers where earlier entries trump later ones in comparison. Now, given an infinitesimal ϵ, any such vector can be represented as u0 + u1ϵ + ... + un−1ϵ^(n−1). So in the finite dimensional case, we can have a hyperreal-valued utility representation.

What if we drop the finite-dimensionality requirement? Easy. Take an ultrafilter on the space of finitely generated mixture subspaces of our mixture space ordered by inclusion, and take an ultraproduct of the hyperreal-valued representations on each of these, and the result will be a hyperreal-valued utility representing our preorder on the full space.

(All this stuff may have been explicitly proved by Richter, but I don’t have easy access to his paper.)

Now, on to the claim that we can sort of recover continuity. More precisely, if we allow for probabilistic mixtures of our outcomes with weights in the hyperreal field F that U takes values in, then we can embed our mixing space M in an F-mixing space MF (which satisfies the axioms of a mixing space with respect to members of the larger field F), and extend our preference ordering ≾ to MF such that we have:

  • F-continuity: If a ≾ b ≾ c then there is a t ∈ F with 0 ≤ t ≤ 1 such that b ≈ (1−t)a + tc (where x ≈ y iff x ≾ y and y ≾ x).

In other words, if we allow for sufficiently fine-grained probabilistic mixtures, with hyperreal probabilities, we get back the intuitive content of continuity.

To see this, embed M as a convex subset of a real vector space V using an embedding theorem of Stone from the middle of the last century. Without loss of generality, suppose 0 ∈ M and U(0) = 0.

Extend U to the cone CM = {ta : t ∈ [0, ∞), a ∈ M} generated by M by letting U(ta) = tU(a). Note that this is well-defined since U(0) = 0 and if ta = ub with 0 ≤ t < u, then b = (1−s) ⋅ 0 + s ⋅ a, where s = t/u, and so U(b) = sU(a). It is easy to see that the extension will be additive. Next extend U to the linear subspace VM generated by CM (and hence by M) by letting U(a − b) = U(a) − U(b) for a and b in CM. This is well-defined because if a − b = c − d, then a + d = b + c and so U(a) + U(d) = U(b) + U(c) and hence U(a) − U(b) = U(c) − U(d). Moreover, U is now a linear functional on VM.

If B is a basis of VM, then let VMF be an F-vector space with basis B, and extend U to an F-linear functional from VMF to F by letting U(t1a1 + ... + tnan) = t1U(a1) + ... + tnU(an), where the ai are in B and the ti are in F. Now let MF be the F-convex subset of VMF generated by M. This will be an F-mixing space (i.e., it will satisfy the axioms of a mixing space with the field F in place of the reals). Let a ≾ b iff U(a) ≤ U(b) for a and b in MF. Then if a ≾ b ≾ c, we have U(a) ≤ U(b) ≤ U(c). Let t = (U(b)−U(a))/(U(c)−U(a)) in F if U(a) < U(c), and t = 0 otherwise; then 0 ≤ t ≤ 1 and (1−t)U(a) + tU(c) = U(b). By F-linearity of U, we will then have U((1−t)a+tc) = U(b).

Monday, November 28, 2022

Precise lengths

As usual, write [a,b] for the interval of the real line from a to b including both a and b, (a,b) for the interval of the real line from a to b excluding a and b, and [a, b) and (a, b] respectively for the intervals that include a and exclude b and vice versa.

Suppose that you want to measure the size m(I) of an interval I, but you have the conviction that single points matter, so [a,b] is bigger than (a,b), and you want to use infinitesimals to model that difference. Thus, m([a,b]) will be infinitesimally bigger than m((a,b)).

Thus at least some intervals will have lengths that aren’t real numbers: their length will be a real number plus or minus a (non-zero) infinitesimal.

At the same time, intuitively, some intervals from a to b should have length exactly b − a, which is a real number (assuming a and b are real). Which ones? The choices are [a,b], (a,b), [a, b) and (a, b].

Let α be the non-zero infinitesimal length of a single point. Then [a,a] is a single point. Its length thus will be α, and not a − a = 0. So [a,b] can’t always have real-number length b − a. But maybe at least it can in the case where a < b? No. For suppose that m([a,b]) = b − a whenever a < b. Then m((a,b]) = b − a − α whenever a < b, since (a, b] is missing exactly one point of [a,b]. But then let c = (a+b)/2 be the midpoint of [a,b]. Then:

  1. m([a,b]) = m([a,c]) + m((c,b]) = (c − a) + (b − c − α) = b − a − α,

rather than b − a as was claimed.

What about (a,b)? Can that always have real number length b − a if a < b? No. For if we had that, then we would absurdly have:

  2. m((a,b)) = m((a,c)) + α + m((c,b)) = (c − a) + α + (b − c) = b − a + α,

since (a,b) is equal to the disjoint union of (a,c), the point c and (c,b).

That leaves [a, b) and (a, b]. By symmetry, if one has length b − a, surely so does the other. And in fact Milovich gave me a proof that there is no contradiction in supposing that m([a,b)) = m((a,b]) = b − a.
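The consistency claim can at least be sanity-checked in a toy model where a length is a pair (r, k) representing r + kα (my own encoding, not Milovich's proof):

```python
from fractions import Fraction as F

def add(u, v):
    return (u[0] + v[0], u[1] + v[1])

def m(kind, a, b):
    """Length of an interval from a to b; kind in {'[]', '()', '[)', '(]'}.

    Take m([a,b)) = b − a; the α-coefficient then just counts endpoints
    relative to the half-open case."""
    k = {'[)': 0, '(]': 0, '[]': 1, '()': -1}[kind]
    return (b - a, k)

a, c, b = F(0), F(1), F(2)
point = (F(0), 1)   # a single point has length α

# [a,b] = [a,c] ⊔ (c,b]: the lengths now agree, unlike with m([a,b]) = b − a:
assert m('[]', a, b) == add(m('[]', a, c), m('(]', c, b))
# (a,b) = (a,c) ⊔ {c} ⊔ (c,b) also comes out additive:
assert m('()', a, b) == add(add(m('()', a, c), point), m('()', c, b))
# And [a,a] gets length α, not 0:
assert m('[]', a, a) == (F(0), 1)
```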

Tuesday, November 22, 2022

Hyperreal expected value

I think I have a hyperreal solution, not entirely satisfactory, to three problems.

  1. The problem of how to value the St Petersburg game. The particular version that interests me is one from Russell and Isaacs which says that any finite value is too small, but any infinite value violates strict dominance (since, no matter what, the payoff will be less than infinity).

  2. How to value gambles on a countably infinite fair lottery where the gamble is positive and asymptotically approaches zero at infinity. The problem is that any positive non-infinitesimal value is too big and any infinitesimal value violates strict dominance.

  3. How to evaluate expected utilities of gambles whose values are hyperreal, where the probabilities may be real or hyperreal, which I raise in Section 4.2 of my paper on accuracy in infinite domains.

The apparent solution works as follows. For any gamble with values in some real or hyperreal field V and any finitely-additive probability p with values in V, we generate a hyperreal expected value Ep, which satisfies these plausible axioms:

  4. Linearity: Ep(af+bg) = aEpf + bEpg for a and b in V

  5. Probability-match: Ep1A = p(A) for any event A, where 1A is 1 on A and 0 elsewhere

  6. Dominance: if f ≤ g everywhere, then Epf ≤ Epg, and if f < g everywhere, then Epf < Epg.

How does this get around the arguments I link to in (1) and (2) that seem to say that this can’t be done? The trick is this: the expected value has values in a hyperreal field W which will be larger than V, while (4)–(6) only hold for gambles with values in V. The idea is that we distinguish between what one might call primary values, which are particular goods in the world, and what one might call distribution values, which specify how much a random distribution of primary values is worth. We do not allow the distribution values themselves to be the values of a gamble. This has some downsides, but at least we can have (4)–(6) on all gambles.
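Before the construction, the three axioms can at least be exhibited in a toy finite-space model (my own simplification, with hyperreals truncated to polynomials in ϵ; the real construction needs the Hahn-Banach machinery below):

```python
# Hyperreal values as polynomials in ϵ truncated at ϵ³, stored as coefficient
# triples (a0, a1, a2) for a0 + a1·ϵ + a2·ϵ²; lexicographic order on triples
# is the hyperreal order. Truncation means only three coefficients are
# tracked, so this is an illustration, not the construction itself.
N = 3

def hr(*c):
    return tuple(c) + (0,) * (N - len(c))

def add(u, v):
    return tuple(x + y for x, y in zip(u, v))

def mul(u, v):
    w = [0] * N
    for i, x in enumerate(u):
        for j, y in enumerate(v):
            if i + j < N:
                w[i + j] += x * y
    return tuple(w)

# A regular probability on a three-point space: every point gets positive
# (possibly infinitesimal) mass, and the masses sum to 1.
omega = ['a', 'b', 'c']
p = {'a': hr(0.5), 'b': hr(0.5, -1), 'c': hr(0, 1)}
assert add(add(p['a'], p['b']), p['c']) == hr(1)

def E(f):
    """Expected value of f, a dict from points to hyperreal values."""
    t = hr(0)
    for w in omega:
        t = add(t, mul(p[w], f[w]))
    return t

# Probability-match: the expectation of an indicator is the event's probability.
ind_c = {'a': hr(0), 'b': hr(0), 'c': hr(1)}
assert E(ind_c) == p['c']

# Dominance: a gamble that is everywhere positive, even only infinitesimally,
# gets strictly positive expected value.
g = {'a': hr(0, 0, 1), 'b': hr(0, 0, 1), 'c': hr(0, 0, 1)}
assert E(g) > hr(0)
```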

How is this trick done?

I think like this. First, it looks like the Hahn-Banach dominated extension theorem holds for V2-valued V1-linear functionals on V1-vector spaces, where V1 ⊆ V2 are real or hyperreal fields, except that our extending functional may need to take values in a field of hyperreals even larger than V2. The crucial thing to note is that any subset of a real or hyperreal field has a supremum in some larger hyperreal field. Then where the proof of the Hahn-Banach theorem uses infima and suprema, you move to a larger hyperreal field to get them.

Now, embed V in a hyperreal field V2 that contains a supremum for every subset of V, and embed V2 in V3 which has a supremum for every subset of V2. Let Ω be our probability space.

Let X be the space of bounded V2-valued functions on Ω and let M ⊆ X be the subspace of simple functions (with respect to the algebra of sets that Ω is defined on). For f ∈ M, let ϕ(f) be the integral of f with respect to p, defined in the obvious way. The supremum seminorm on X (which takes values in V3) dominates ϕ. Extend ϕ to a V-linear function ϕ on X dominated by this seminorm. Note that if f > 0 everywhere for f with values in V, then f > α > 0 everywhere for some α ∈ V2, and hence ϕ(−f) ≤  − α by seminorm domination, hence 0 < α ≤ ϕ(f). Letting Ep be ϕ restricted to the V-valued functions, our construction is complete.

I should check all the details at some point, but not today.

Saturday, August 20, 2022

A weird space for non-classical probability values

Consider the proper class V of formal expressions of the form xϵ^y where x is a non-negative real number that is permitted to be zero only if y = 0, y is a non-negative surreal number, and ϵ is a formal symbol to be thought of as “something very small”. (If we want to be rigorous, we let V be the class of ordered pairs (y,x).) Stipulate:

  1. x = xϵ^0 for real x

  2. xϵ^y ≤ x′ϵ^y′ iff either (a) y > y′ or (b) y = y′ and x ≤ x′

  3. xϵ^y + x′ϵ^y′ equals (x+x′)ϵ^y if y = y′ and otherwise equals the greater of xϵ^y and x′ϵ^y′

  4. if xϵ^y ≤ x′ϵ^y′ and they’re not both zero, then (xϵ^y)/(x′ϵ^y′) = (x/x′)ϵ^(y − y′)

  5. Std xϵ^y equals x if y = 0 and equals 0 otherwise.

We can then define finitely-additive probabilities with values in V in the same way that we do so for reals, and we can then define conditional probabilities using the standard formula P(A∣B) = P(A∩B)/P(B).
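A minimal sketch of V's arithmetic in code, with rational exponents standing in for the surreals and a hypothetical lottery as the example (my own toy, chosen to show the absorption behavior discussed below):

```python
from fractions import Fraction as F

# An element x·ϵ^y of V is a pair (x, y).
def le(u, v):
    (x, y), (xp, yp) = u, v
    return y > yp or (y == yp and x <= xp)

def plus(u, v):
    if u[1] == v[1]:
        return (u[0] + v[0], u[1])
    return u if le(v, u) else v          # the greater term absorbs the lesser

def div(u, v):
    """u/v per stipulation (4); assumes le(u, v) and v nonzero."""
    return (u[0] / v[0], u[1] - v[1])

def std(u):
    return u[0] if u[1] == 0 else F(0)

# A hypothetical lottery in which each single ticket has probability
# ϵ = 1·ϵ^1 while the even tickets jointly have probability 1/2 = (1/2)·ϵ^0:
P_single = (F(1), F(1))
P_evens = (F(1, 2), F(0))
assert plus(P_single, P_evens) == P_evens        # 1/2 + ϵ = 1/2: absorption
assert div(P_single, P_evens) == (F(2), F(1))    # P({2} ∣ evens) = 2ϵ
assert std(div(P_single, P_single)) == 1         # conditioning on one ticket
```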

Say that a V-valued probability P is regular iff 0 < P(A) whenever A is non-empty.

Now here is a fun fact. Given a V-valued probability P, we can define a real-valued full conditional probability as the standard part (Std) of P. Conversely, and less trivially, any real-valued full conditional probability can be obtained this way (this follows from the fact that any linear order can be embedded in the surreals).

So far this doesn’t mark any advantage of using V instead of hyperreals as the values of our probabilities. But there is an advantage. Specifically, if our probability space Ω is acted on by a supramenable group G of symmetries (any Abelian group is supramenable)—for instance, Ω might be a circle acted on by the group of rotations—then there is a V-valued regular G-invariant probability defined for all subsets of Ω. But if we have hyperreal (or surreal, for that matter) values, then the existence of a regular probability invariant under G requires significantly stricter conditions, ones that won’t be met in the case where Ω is the circle and G is rotations.

However, the advantage comes from the fact that V allows one to have a + b = a even though b > 0, so that one can have weak regularity (the condition that 0 < P(A) whenever A is nonempty) without strong regularity (the condition that P(A) < P(B) whenever A ⊂ B). If one wants strong regularity, using V instead of the hyperreals doesn’t have the same advantage.

Monday, December 13, 2021

Truth directed scoring rules on an infinite space

A credence assignment c on a space Ω of situations is a function from the powerset of Ω to [0, 1], with c(E) representing one’s degree of belief in E ⊆ Ω.

An accuracy scoring rule s assigns to a credence assignment c on a space Ω and situation ω the epistemic utility s(c)(ω) of having credence assignment c when in truth we are in ω. Epistemic utilities are extended real numbers.

The scoring rule is strictly truth directed provided that if credence assignment c2 is strictly truer than c1 at ω, then s(c2)(ω)>s(c1)(ω). We say that c2 is strictly truer than c1 if and only if for every event E that happens at ω, c2(E)≥c1(E) and for every event E that does not happen at ω, c2(E)≤c1(E), and in at least one case there is strict inequality.

A credence assignment c is extreme provided that c(E) is 0 or 1 for every E.

Proposition. If the probability space Ω is infinite, then there is no strictly truth directed scoring rule defined for all credences, or even for all extreme credences.

In fact, there is not even a scoring rule that is strictly truth directed when restricted to extreme credences.

This proposition uses the following result that my colleague Daniel Herden essentially gave me a proof of:

Lemma. If PX is the power set of X, then there is no function f : PX → X such that f(A)≠f(B) whenever A ⊂ B.

Now, we prove the Proposition. Fix ω ∈ Ω. Let s be a strictly truth directed scoring rule defined for all extreme credences. For any subset A of PΩ, define cA to be the extreme credence function that is correct at ω at all and only the events in A, i.e., cA(E)=1 if and only if ω ∈ E and E ∈ A or ω ∉ E and E ∉ A, and otherwise cA(E)=0. Note that cB is strictly truer than cA if and only if A ⊂ B. For any subset A of PΩ, let f(A)=s(cA)(ω).

Then f(A)<f(B) whenever A ⊂ B. Hence f is a strictly monotonic function from PPΩ to the reals. Now, if Ω is infinite, then the reals can be embedded in PΩ (by the axiom of countable choice, Ω contains a countably infinite subset, and hence PΩ has cardinality at least that of the continuum). Hence we have a function like the one the Lemma denies the existence of, a contradiction.

Note: This suggests that if we want strict truth directedness of a scoring rule, the scoring rule had better take values in a set whose cardinality is greater than that of the continuum, e.g., the hyperreals.

Proof of Lemma (essentially due to Daniel Herden): Suppose we have f as in the statement of the Lemma. Let ON be the class of ordinals. Define a function F : ON → X by transfinite induction:

  • F(0)=f(⌀)

  • F(α)=f({F(β):β < α}) whenever α is a successor or limit ordinal.

I claim that this function is one-to-one.

Let Hα = {F(δ):δ < α}.

Suppose F is one-to-one on β for all β < α. If α is a limit ordinal, then it follows that F is one-to-one on α. Suppose instead that α is a successor of β. I claim that F is one-to-one on α, too. The only possible failure of injectivity on α could be if F(β)=F(γ) for some γ < β. Now, F(β)=f(Hβ) and F(γ)=f(Hγ). Note that Hγ ⊂ Hβ since F is one-to-one on β. Hence f(Hβ)≠f(Hγ) by the assumption of the Lemma. So, F is one-to-one on ON by transfinite induction.

But of course we can’t embed ON in a set (Burali-Forti).

Saturday, April 17, 2021

Regular Hyperreal and Qualitative Probabilities Invariant Under Symmetries

I just noticed that my talk "Regular Hyperreal and Qualitative Probabilities Invariant Under Symmetries" is up on YouTube. And the paper that this is based on (preprint here) has just been accepted by Synthese.



Thursday, October 22, 2020

Preprint: Conditional, Regular Hyperreal and Regular Qualitative Probabilities Invariant Under Symmetries

Abstract: Classical countably additive real-valued probabilities come at a philosophical cost: in many infinite situations, they assign the same probability value---namely, zero---to cases that are impossible as well as to cases that are possible. There are three non-classical approaches to probability that can avoid this drawback: full conditional probabilities, qualitative probabilities and hyperreal probabilities. These approaches have been criticized for failing to preserve intuitive symmetries that can easily be preserved by the classical probability framework, but there has not been a systematic study of the conditions under which these symmetries can and cannot be preserved. This paper fills that gap by giving complete characterizations under which symmetries understood in a certain "strong" way can be preserved by these non-classical probabilities, as well as by offering some results to make it plausible that the strong notion of symmetry here may be the right one. Philosophical implications are briefly discussed, but the main purpose of the paper is to offer technical results to inform more sophisticated further philosophical discussion.

Preprint here.

Monday, September 7, 2020

Half tickets in an infinite lottery

Consider a fair infinite lottery with tickets numbered ..., −3, −2, −1, 0, 1, 2, 3, ..... Consider these events:

  • E: winner is even

  • O: winner is odd

  • E*: winner is even but not zero

  • E+: winner is even and positive

  • O+: winner is odd and positive

  • E−: winner is even and negative

  • O−: winner is odd and negative.

Plausibly:

  1. O+ is equally likely as O−

  2. E+ is equally likely as E−

  3. E is equally likely as O

  4. all tickets are equally likely to win.

Now, E* is less likely than E by one ticket, and hence also less likely than O by one ticket according to (3). And E* is the same event as the disjunction E+ or E−, while O is the same event as the disjunction O+ or O−. Therefore, the disjunction O+ or O− is one ticket more likely than the disjunction E+ or E−. Since O+ and O− are equally likely and E+ and E− are equally likely, it follows that:

  5. E+ is half a ticket less likely than O+.

But how could one lottery outcome be less likely than another by half a ticket in a lottery where all tickets are equally likely to win? The only option seems to be that the probability of any particular ticket winning is zero. And that seems paradoxical, too.
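For what it's worth, the pairings behind (1)-(3) can be checked mechanically on a finite window of ticket numbers (a toy check only; a finite window is of course not a model of the infinite lottery):

```python
# Toy check of the bijections behind premises (1)-(3), on a finite
# window of ticket numbers. A finite window is not a model of the
# infinite lottery; this only exhibits the pairings.

window = range(-1000, 1001)

O_plus = {n for n in window if n % 2 != 0 and n > 0}
O_minus = {n for n in window if n % 2 != 0 and n < 0}
E_plus = {n for n in window if n % 2 == 0 and n > 0}
E_minus = {n for n in window if n % 2 == 0 and n < 0}

# (1) and (2): negation pairs positive tickets with negative ones.
assert {-n for n in O_plus} == O_minus
assert {-n for n in E_plus} == E_minus

# (3): n -> n + 1 pairs the evens with the odds (shifting the window by one).
E = {n for n in window if n % 2 == 0}
O_shifted = {n for n in range(-999, 1002) if n % 2 != 0}
assert {n + 1 for n in E} == O_shifted
```

The bijections are what make (1)-(3) plausible for a fair lottery: each maps one event onto the other ticket-by-ticket.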

Tuesday, August 25, 2020

When can we have exact symmetries of hyperreal probabilities?

In many interesting cases, there is no way to define a regular hyperreal-valued probability that is invariant under symmetries, where “regular” means that every non-empty set has non-zero probability. For instance, there is no such measure for all subsets of the circle with respect to rotations: the best we can do is approximate invariance, where P(A)−P(rA) is infinitesimal for every rotation. On the other hand, I have recently shown that there is such a measure for infinite sequences of fair coin tosses where the symmetries are reversals at a set of locations.

So, here’s an interesting question: Given a space Ω and a group G of symmetries acting on Ω, under what exact conditions is there a hyperreal finitely-additive probability measure P defined for all subsets of Ω that satisfies the regularity condition P(A)>0 for all non-empty A and yet is fully (and not merely approximately) invariant under G, so that P(gA)=P(A) for all g ∈ G and A ⊆ Ω?

Theorem: Such a measure exists if and only if the action of G on Ω is locally finite. (Assuming the Axiom of Choice.)

The action of G on Ω is locally finite iff for every x ∈ Ω and every finitely-generated subgroup H of G, the orbit Hx = {hx : h ∈ H} of x under H is finite. In other words, we have such a measure provided that applying the symmetries to any point of the space only generates finitely many points.
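As a quick illustration of the local finiteness condition, here is a sketch in Python that computes the closure of a point under some generating symmetries; the generators (negation and shift on the integers) are my own toy examples, not anything from the theorem:

```python
# Sketch: check local finiteness at a point by closing {x} under the
# listed generators (include inverses in the list if they are needed;
# negation below is its own inverse). Toy examples only.

def orbit(x, generators, max_size=10_000):
    """Closure of {x} under the generators; gives up if it blows up."""
    seen = {x}
    frontier = [x]
    while frontier:
        y = frontier.pop()
        for g in generators:
            z = g(y)
            if z not in seen:
                if len(seen) >= max_size:
                    raise RuntimeError("orbit appears infinite")
                seen.add(z)
                frontier.append(z)
    return seen

# Locally finite example: the group generated by negation acting on Z.
assert orbit(5, [lambda x: -x]) == {5, -5}

# Not locally finite: add the shift x -> x + 1 and the orbit blows up.
try:
    orbit(5, [lambda x: -x, lambda x: x + 1])
    blew_up = False
except RuntimeError:
    blew_up = True
assert blew_up
```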

This mathematical fact leads to a philosophical question: Is there anything philosophically interesting about those symmetries whose action is locally finite? But I’ve spent so much of the day thinking about the mathematical question that I am too tired to think very hard about the philosophical question.

Sketch of Proof of Theorem: If some subset A of Ω is equidecomposable with a proper subset A′, then a G-invariant measure P will assign equal measure to both A and A′, and hence will assign zero measure to the non-empty set A − A′, violating the regularity condition. So, if the requisite measure exists, no subset is equidecomposable with a proper subset of itself, which by a theorem of Scarparo implies that the action of G is locally finite.

Now for the converse. If we can show the result for every finitely generated group, then by taking an ultraproduct along a suitable ultrafilter on the partially ordered set of all finitely generated subgroups of G, we can show it for a general G.

So, suppose that G is finitely generated and the orbit of x under G is finite for all x ∈ Ω. A subset A of Ω is said to be G-invariant provided that gA = A for all g ∈ G. The orbit of x under G is always G-invariant, and hence every finite subset A of Ω is contained in a finite G-invariant subset, namely the union of the orbits of the points of A.

Consider the set F of all finite G-invariant subsets of Ω. For A ∈ F, let PA be the uniform measure on A. Let F* = {{B ∈ F : A ⊆ B}:A ∈ F}. This is a non-empty set with the finite intersection property. Let U be an ultrafilter extending F*. Let *R be the ultraproduct of the reals over F with respect to U, and let P(C) be the equivalence class of the function A ↦ PA(A ∩ C) on F. Note that C ↦ PA(A ∩ C) is G-invariant for any G-invariant set A, so P is G-invariant. Moreover, P(C)>0 if C ≠ ∅. For let C′ be the orbit of some element of C. Then {B ∈ F : C′⊆B} is in F*, and PA(A ∩ C′) > 0 for all A such that C′⊆A, so the set of all A such that PA(A ∩ C′) > 0 is in U. It follows that P(C′) > 0. But C′ is the orbit of some element x of C, so every singleton subset of C′ has the same P-measure as {x} by the G-invariance of P. So P({x}) = P(C′)/|C′| > 0, and hence P(C)≥P({x}) > 0.
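The measures PA at the heart of the proof are easy to exhibit concretely. Here is a sketch, with the two-element group generated by negation on the integers as a toy stand-in for G:

```python
from fractions import Fraction

# Sketch of the measures P_A in the proof: for a finite G-invariant
# set A, P_A is uniform on A, and C -> P_A(A ∩ C) is then G-invariant.
# The acting group here is a toy example: the two-element group
# generated by negation on the integers.

def g(n):                           # the non-identity element: negation
    return -n

A = {1, -1, 2, -2, 7, -7}           # finite and G-invariant: g(A) = A
assert {g(n) for n in A} == A

def P_A(C):
    """Uniform measure of A ∩ C relative to A."""
    return Fraction(len(A & C), len(A))

# Invariance: P_A(A ∩ gC) = P_A(A ∩ C) for any C.
C = {1, 2, 3, 7}
gC = {g(n) for n in C}
assert P_A(gC) == P_A(C)
assert P_A(C) == Fraction(3, 6)
```

The full construction then glues these finite uniform measures together along the ultrafilter U.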

Monday, August 24, 2020

Invariance under independently chosen random transformations

Often, a probabilistic situation is invariant under some set of transformations, in the sense that the complete probabilistic facts about the situation are unchanged by the transformation. For instance, in my previous post I suggested that a sequence of fair coin flips should be invariant under the transformation of giving a pre-specified subset of the coins an extra turn-over at the end and I proved that we can have this invariance in a hyperreal model of the situation.

Now, a very plausible thesis is this:

Randomized Invariance: If a probabilistic situation S is invariant under each member of some set T of transformations, then it is also invariant under the process where one chooses a random member of T independently of S and applies that member to S.

For instance, in the coin flip case, I could choose a random reversing transformation as follows: I line up (physically or mentally) the infinite set of coins with an independent second infinite set of coins, flip the second set of coins, and wherever that flip results in heads, I reverse the corresponding coin in the first set.

By Randomized Invariance, doing this should not change any of the probabilities. But insisting on this case of Randomized Invariance forces us to abandon the idea that we should assign such things as an infinite sequence of heads a non-zero but infinitesimal probability. Here is why. Consider a countably infinite sequence of fair coins arranged equidistantly in a line going to the left and to the right. Fix a point r midway between two successive coins. Now, use the coins to the left of r to define the random reversing transformation for the coins to the right of r: if after all the coins are flipped, the nth coin to the left of r is heads, then I give an extra turn-over to the nth coin to the right of r.

According to Randomized Invariance, the probability that all the coins to the right of r will be tails after the random reversing transformations will be the same as the probability that they were all tails before it. Let p be that probability. Observe that after the transformations, the coins to the right of r are all tails if and only if before the transformations the nth coin to the right and the nth coin to the left showed the same thing (for we only get tails on the nth coin on the right at the end if we had tails there at the beginning and the nth coin on the left was tails, or if we had heads there at the beginning, but the heads on the nth coin to the left forced us to reverse it). Hence, p is also the probability that the corresponding coins to the left and right of r showed the same thing before the transformation.

Thus, we have shown that the probability that all the paired coins on the left and right equidistant to r are the same (i.e., we have a palindrome centered at r) is the same as the probability that we have only tails to the right of r. Now, apply the exact same argument with “right” and “left” reversed. We conclude that the probability that the coins on the right and left equidistant to r are always the same is the same as the probability that we have only tails to the left of r. Hence, the probability of all-tails to the left of r is the same as the probability of all-tails to the right of r.

And this argument does not depend on the choice of the midpoint r between two coins. But as we move r one coin to the right, the probability of all-tails to the right of r is multiplied by two (there is one less coin that needs to be tails) and the probability of all-tails to the left of r is multiplied by a half. And yet these numbers have to be equal as well by the above argument. Thus, 2p = p/2. The only way this can be true is if p = 0.
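The combinatorial equivalence that drives the argument (right side all-tails after the reversals if and only if we started with a palindrome centered at r) can be verified exhaustively on a finite window of coins; this checks only the equivalence, not the probabilistic conclusion:

```python
from itertools import product

# Finite check of the equivalence used above: flip the nth coin to the
# right of r exactly when the nth coin to the left of r shows heads
# (heads = 1, tails = 0). The right side ends up all-tails iff the
# original configuration is a palindrome centered at r.

n = 6
for left in product([0, 1], repeat=n):          # left[k] = (k+1)th coin left of r
    for right in product([0, 1], repeat=n):     # right[k] = (k+1)th coin right of r
        transformed = tuple(r ^ l for r, l in zip(right, left))
        all_tails_after = all(c == 0 for c in transformed)
        palindrome = left == right              # nth-left matches nth-right
        assert all_tails_after == palindrome
print("equivalence verified for all", 4**n, "configurations")
```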

Therefore, Randomized Invariance, plus the thesis that all the non-random reversing transformations leave the probabilistic situation unchanged (a thesis made plausible by the fact that even with infinitesimal probabilities, we provably can have a model of the probabilities that is invariant under these transformations), shows that we must assign probability zero to all-tails, and infinitesimal probabilities are mistaken.

This is, of course, a highly convoluted version of Timothy Williamson’s coin toss argument. The reason for the added complexity is to avoid any use of shift-based transformations that may be thought to beg the question against advocates of non-Archimedean probabilities. Instead, we simply use randomized reversal symmetry.

Hyperreal modeling of infinitely many coin flips

A lot of my work in philosophy of probability theory has been devoted to showing that one cannot use technical means to get rid of certain paradoxes of infinite situations. As such, most of the work has been negative. But here is a positive result. (Though admittedly it was arrived at in the service of a negative result which I hope to give in a future post.)

Consider the case of a (finite or infinite, countable or not) sequence of independent fair coin flips. Here is an invariance feature we would like to have for our coin flips. Suppose that ahead of time, I designate a (finite or infinite) set of locations in the infinite sequence. You then generate the sequence of independent fair coin flips, and I go through my pre-designated set of locations, and turn over each of the coins corresponding to that location. (For instance, if you will make a sequence of four coin flips, and I predesignate the locations 1 and 3, and you get HTTH, then after my extra turn-overs the sequence of coin flips becomes TTHH: I turned over the first and third coins.) The invariance feature we want is that no matter what set of locations I predesignate, it won’t affect the probabilistic facts about the sequence of independent fair coin flips.
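The turn-over operation from the example can be sketched in a few lines (the function name and the 1-indexed locations are just my own conventions):

```python
# The predesignated turn-over operation: reverse the coins at a fixed
# set of locations, 1-indexed as in the example above. A toy sketch.

def turn_over(seq, locations):
    flip = {"H": "T", "T": "H"}
    return "".join(flip[c] if i + 1 in locations else c
                   for i, c in enumerate(seq))

assert turn_over("HTTH", {1, 3}) == "TTHH"   # the example above
# Applying the same reversal twice restores the original sequence.
assert turn_over(turn_over("HTTH", {1, 3}), {1, 3}) == "HTTH"
```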

This invariance feature is clearly present in finite cases. It is also present if “probabilistic facts” are understood according to classical countably-additive real-valued probability theory. But what if we have infinitely many coins, and we want to be able to do things like comparing the probability of all the coins being heads to all the even-numbered coins being heads, and say that the latter is more likely than the former, with both probabilities being infinitesimal? Can we still have our reversal-invariance property for all predesignated sets of locations?

There are analogous questions for other probabilistic situations. For instance, for a spinner, the analogous property is adding an extra predesignated rotation to the spinner once the spinner stops, and it is well-known that one cannot have such invariance in a context that gives us “enough” infinitesimal probabilities (e.g., see here for a strong and simple result).

But the answer is positive for the coin flip case: there is a hyperreal-valued probability defined for all subsets of the set of sequences (with fixed index set) of heads and tails that has the reversal-invariance property for every set of locations.

This follows from the following theorem.

Theorem: Assume the Axiom of Choice. Let G be a locally finite group (i.e., every finite subset generates a finite subgroup) and suppose that G acts on some set X. Then there is a hyperreal finitely additive probability measure P defined for all subsets of X such that P(gA)=P(A) for every A ⊆ X and g ∈ G and P(A)>0 for all non-empty A.

To apply this theorem to the coin-flip case, let G be the abelian group whose elements are sets of locations with the exclusive-or operation (i.e., A ⊕ B = (A − B)∪(B − A) is the set of all locations that are in exactly one of A and B). The identity is the empty set, and every element has order two (i.e., A ⊕ A = ∅). But for abelian groups, the condition that every finite subset generates a finite subgroup is equivalent to the condition that every element has finite order (i.e., some finite multiple of it is zero).
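Here is a sketch of this group in code, with toy location sets of my own choosing; it also illustrates why finitely many generators only ever produce finitely many elements (every element of the generated subgroup is an XOR of a subcollection of the generators):

```python
from itertools import combinations
from functools import reduce

# The group used to apply the Theorem: sets of locations under
# exclusive-or (symmetric difference). Every element is its own
# inverse, so every element has order two. Toy location sets.

A = frozenset({1, 3})
B = frozenset({3, 5, 8})
xor = lambda X, Y: (X - Y) | (Y - X)

assert xor(A, B) == frozenset({1, 5, 8})   # locations in exactly one of A, B
assert xor(A, A) == frozenset()            # order two: A ⊕ A = ∅
assert xor(A, frozenset()) == A            # the empty set is the identity

# The subgroup generated by finitely many sets is finite: its elements
# are XORs of subcollections of the generators, so at most 2^n of them.
gens = [A, B, frozenset({2})]
subgroup = set()
for k in range(len(gens) + 1):
    for combo in combinations(gens, k):
        subgroup.add(reduce(xor, combo, frozenset()))
assert len(subgroup) <= 2 ** len(gens)
```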

Mathematical notes: The subgroup condition on G in the Theorem entails that every element of G has finite order, but is stronger than that in the non-abelian case (due to the non-trivial fact that there are infinite finitely generated torsion groups). In the special case where X = G, the condition that every element of G have finite order is necessary for the theorem. For if g has infinite order, let A = {gn : n ≥ 0}, and note that gA is a proper subset of A, so the condition that non-empty sets get non-zero measure and finite additivity would imply that P(gA)<P(A), which would violate invariance. It is an interesting question whether the condition that every finite subset generates a finite subgroup is also necessary for the Theorem if X = G.

Proof of Theorem: Let F be the partially ordered set whose elements are pairs (H, V) where H is a finite subgroup of G and V is a finite algebra of subsets of X closed under the action of H, with the partial ordering (H1, V1)≼(H2, V2) if and only if H1 ⊆ H2 and V1 ⊆ V2.

Given (H, V) in F, let BV be the basis of V, i.e., a subset of pairwise disjoint non-empty elements of V such that every element of V is a union of (finitely many) elements of BV. For A ∈ BV and g ∈ H, note that gA is a member of V since V is closed under the action of H. Thus, gA = B1 ∪ ... ∪ Bn for distinct elements B1, ..., Bn in BV. I claim that n = 1. For suppose n ≥ 2. Then g−1B1 ⊆ A and g−1B2 ⊆ A, and yet both g−1B1 and g−1B2 are members of V by H-closure. But since A is a basis element it follows that g−1B1 = A = g−1B2, and hence B1 = B2, a contradiction. Thus, n = 1 and hence gA ∈ BV. Moreover, if gA = gB then A = B, so each member g of H induces a bijection of BV onto itself.

Now let P(H, V) be the probability measure on V that assigns equal probability to each member of BV. Since each member of H induces a bijection of BV onto itself, it’s easy to see that P(H, V) is an H-invariant probability measure on V. And, for convenience, if A ∉ V, write P(H, V)(A)=0.

Let F* = {{B ∈ F : A ≼ B}:A ∈ F}. This is a nonempty set with the finite intersection property (it is here that we will use the fact that every finite subset of G generates a finite subgroup). Hence it can be extended to an ultrafilter U. This ultrafilter will be fine: {B ∈ F : A ≼ B}∈U for every A ∈ F. Let *R be the ultraproduct of the reals R over F with respect to U, i.e., the set of functions from F to R modulo U-equivalence. Given a subset A of X, let P(A) be the equivalence class of (H, V)↦P(H, V)(A).

It is now easy to verify that P has all the requisite properties of a finitely-additive hyperreal probability that is invariant under G and assigns non-zero probability to every non-empty set.

Friday, August 21, 2020

Complete Probabilistic Characterizations

Consider the concept of a complete probabilistic characterization (CPC) of an experiment. It’s a bit of a fuzzy concept, but we can get some idea about it. For instance, if I have a coin loaded in favor of heads, then saying that heads is more likely than tails is not a CPC. Minimally, the CPC will give exact numbers where the probabilities have exact numbers. But the CPC may go beyond giving numerical probabilities. For instance, if you toss infinitely many fair coins, the numerical probability that they are all heads is zero, as is the probability that all the even numbered ones are heads. But intuitively it is more likely that the even numbered ones are heads than that all of them are heads. If there is something to this intuition, the CPC will include the relevant information: it may do that by assigning different infinitesimal probabilities to the two events, or by giving conditional probabilities conditioned on various zero-probability events.

A deep question that has sometimes been discussed by philosophers of probability is what CPCs are like. Here are three prominent candidates:

  1. classical real-valued probabilities

  2. hyperreal probabilities assigning non-zero (but perhaps infinitesimal) probability to every possible event

  3. primitive conditional probabilities allowing conditioning on every possible event.

The argument against (1) and for (2) and (3) is that (1) doesn’t distinguish things that should be distinguished—like the heads case above. I want to offer an argument against (2) and (3), however.

Here is a plausible principle:

  4. If X and Y are measurements of two causally independent experiments, then the CPC of the pair (X, Y) is determined by the CPCs of X and Y together with the fact of independence.

If (4) is true, then a challenge for a defender of a particular candidate for CPC is to explain how the CPC of the pair is determined by the individual CPCs of the independent experiments.

In the case of (1), the challenge is easily met: the pair (X, Y) has as its probability measure the product of the probability measures for X and Y.

In the cases of (2) and (3), the challenge has yet to be met, and there is some reason to think it cannot be met. In this post, I will argue for this in the case of (2): the case of (3) follows from the details of the argument in the case of (2) plus the correspondence between Popper functions and hyperreal probabilities.

Consider the case where X and Y are uniformly distributed over the interval [0, 1]. By independence, we want the pair (X, Y) to have a hyperreal finitely additive probability measure P such that P(X ∈ A, Y ∈ B)=P(X ∈ A)P(Y ∈ B) for all events A and B. But it turns out that this requirement on P highly underdetermines P. In particular, it seems that for any positive real number r, we can find a hyperreal measure P such that P(X ∈ A, Y ∈ B)=P(X ∈ A)P(Y ∈ B) for all A and B, and such that P(X = Y)=rP(Y = 0). Hence, independence highly underdetermines what value P assigns to the diagonal X = Y as compared to the value it assigns to Y = 0.

Maybe some other conditions can be added that would determine the CPC of the pair. But I think we don’t know what these would be. As it stands, we don’t know how to determine the CPC of the pair in light of the CPC of the members of the pair, if CPCs are of type (2).

Wednesday, August 19, 2020

Product spaces for hyperreal and full conditional probabilities

I think the following is a consequence of a hyperreal variant of the Horn-Tarski extension theorem for measures on boolean algebras:

Claim: Suppose that <Ωi, Fi, Pi> for i ∈ I are finitely additive probability spaces with values in some field R* of hyperreals. Then, assuming the Axiom of Choice, there is a hyperreal-valued finitely additive probability space <Ω, 2Ω, P> where Ω = ∏i ∈ IΩi and where the Ωi-valued random variables πi given by the natural projections of Ω to Ωi are independent and have the distributions given by the Pi.

Note that the values of P might be in a hyperreal field larger than R*.

Given the Claim, and given the well-known correspondences between hyperreal-valued probabilities and full conditional real-valued probabilities, it follows that we can define meaningful product-space conditional real-valued probabilities.

It would be really nice if the product-space conditional probabilities were unique in the special case where Fi is the power set of Ωi, or at least if they were close enough to uniqueness to define the same real-valued conditional probabilities.

For a particularly interesting case, consider the case where X and Y are generated by uniform throws of a dart at the interval [0, 1], and we have a regular finitely additive hyperreal-valued probability on [0, 1] (regular meaning that all non-empty sets have positive measure). Let Z be the point (X, Y) in the unit square.

Looking at how the proof of the Horn-Tarski extension theorem works, it seems to me that for any positive real number r, and any non-trivial line segment L along the x = y diagonal in the square [0, 1]2, there is a product measure P satisfying the conditions of the Claim (where P1 and P2 are the uniform measures on [0, 1]) such that P(L)=rP(H), where H is the horizontal line segment {(x, 0):x ∈ [0, 1]}. For instance, if L is the full diagonal, we would intuitively expect P(L)=√2·P(H), but in fact we can make P(L)=100000P(H) or P(L)=P(H)/100000 if we like. It is clear that such a discrepancy will generate different conditional probabilities.

I haven’t checked all the details yet, so this could be all wrong.

But if it is right, here is a philosophical upshot. We would expect there to be a unique canonical product probability for independent random variables. However, if we insist on probabilities that are so fine-grained as to tell infinitesimal differences apart, then we do not at present have any such unique canonical product probability. If we are to have one, we need some condition going beyond independence.

This is part of a larger set of claims, namely that we do not at present have a clear notion of what “uniform probability” means once we make our probabilities more fine-grained than classical real-valued probability.

Putative Sketch of Proof of Claim: Embedding R* in a larger field if necessary, we may assume that R* is |2Ω|-saturated. Define a product measure on the cylinder subsets of Ω as usual. The proof of the Horn-Tarski extension theorem for measures on boolean algebras looks to me like it works for |B|-saturated hyperreal-valued probability measures, where B is the boolean algebra, and this completes the proof of our claim.

Tuesday, July 21, 2020

Do Popper functions carry enough information?

Let P be a Popper function on some algebra F of events on Ω. There is a natural way to define a probability comparison given P, namely A ⪅ B iff P(A|A ∪ B)≤P(B|A ∪ B), which I think I’ve used before. Unfortunately, this can violate a common axiom of probability comparisons, namely that A ⪅ B iff Ω − B ⪅ Ω − A.

For instance, consider the Popper function generated by a regular hyperreal probability on [0, 1] whose standard part agrees with Lebesgue measure on intervals. Then [0, 1]⪅[0, 1), since both have conditional probability 1 on [0, 1]∪[0, 1)=[0, 1]. But their complements are ∅ and {1}, respectively, and {1}⪅∅ is false.
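The counterexample can be made concrete in a toy model where hyperreal values are pairs (a, b) standing for a + b·ε; the representation and the helper st_ratio are my own conventions, and st_ratio only handles the cases this example needs:

```python
from fractions import Fraction as F

# Toy model of the example: hyperreal values are pairs (a, b) for
# a + b·ε, with ε a positive infinitesimal. Pstar is a regular
# hyperreal probability on a few subsets of [0, 1]; the Popper
# function takes the standard part of the conditional ratio.

Pstar = {
    "[0,1]": (F(1), F(0)),    # 1
    "[0,1)": (F(1), F(-1)),   # 1 - ε  (regularity: {1} gets ε > 0)
    "{1}":   (F(0), F(1)),    # ε
    "empty": (F(0), F(0)),    # 0
}

def st_ratio(num, den):
    """Standard part of (a1 + b1·ε)/(a2 + b2·ε), for the cases we need."""
    a1, b1 = num
    a2, b2 = den
    if a2 != 0:
        return a1 / a2
    return b1 / b2            # both parts infinitesimal: ratio of ε-parts

# A ⪅ B iff P(A | A∪B) ≤ P(B | A∪B).
# [0,1] ⪅ [0,1): both conditionals on their union [0,1] equal 1.
assert st_ratio(Pstar["[0,1]"], Pstar["[0,1]"]) == 1
assert st_ratio(Pstar["[0,1)"], Pstar["[0,1]"]) == 1
# But for the complements {1} and ∅: conditioning on {1} ∪ ∅ = {1},
# P({1}|{1}) = 1 > 0 = P(∅|{1}), so {1} ⪅ ∅ fails.
assert st_ratio(Pstar["{1}"], Pstar["{1}"]) == 1
assert st_ratio(Pstar["empty"], Pstar["{1}"]) == 0
```

Taking standard parts is exactly where the infinitesimal distinction between [0, 1] and [0, 1) is kept (via conditioning on {1}) while the distinction between their complements is lost.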

This little fact may have some significance. There is a theorem in the literature that shows a correspondence between Popper functions and hyperreal probabilities: every normal hyperreal probability generates, via the conditional probability formula, a Popper function in which all non-empty sets are regular, and, conversely, for every Popper function with all non-empty sets regular there is a normal hyperreal probability from which it can be generated. This has led to a debate whether it’s better to work with Popper functions or hyperreal probabilities. One argument for working with Popper functions—an argument I’ve approved of in print—is that the hyperreal probabilities carry more information than reality does. I still think that this problem is there for hyperreal probabilities. But the above argument suggests that going to a Popper function may have discarded too much information. Popper functions allow for fine-grained comparisons of “small” events, such as singletons, but not for the complements of these events.

Wednesday, May 13, 2020

Epistemicism and physicalism

  1. There is a precise boundary for the application of “bald”.

  2. If there is a precise boundary for the application of “bald”, that boundary is defined by a linguistic rule of infinite complexity.

  3. If physicalism is true, then no linguistic rules have infinite complexity.

  4. So, physicalism is not true.

The argument for (1) is classical logic. The argument for (2) depends on the many-species considerations at the end of my last post. And the argument for (3) is that if physicalism is true, then linguistic rules are defined by our practices, and our practices are finitary in nature.

Objection: We are analog beings, and every analog system has possible states of infinite complexity.

Response 1: Our computational states ignore small differences, so in practice we have only finite complexity.

Response 2: There is a cardinality limit on the complexity of states of analog systems (analog systems can only encode continuum-many states). But there is no cardinality limit on the number of humanoid species with hair, as there are possible such species in worlds whose spacetime is based on systems of hyperreals whose cardinality goes arbitrarily far beyond that of the reals.

Wednesday, January 17, 2018

Arbitrariness, probability and infinitesimals

A well-known objection to replacing the zero probability of some events—such as getting heads infinitely many times in a row—with an infinitesimal is arbitrariness. Infinitesimals are usually taken to be hyperreals and there are infinitely many hyperreal extensions of the reals.

This version of the arbitrariness objection has a reply: there are extensions of the reals that one can unambiguously define. Three examples: (1) the surreals, (2) formal Laurent series and (3) the Kanovei-Shelah model.

But it turns out that there is still an arbitrariness objection in these contexts. Instead of saying that the choice of extension of the reals is arbitrary, we can say that the choice of particular infinitesimals within the system to be assigned to events is arbitrary.

Here is a fun fact. Let R be the reals and let R* be any extension of R that is a totally ordered vector space over the reals, with the order agreeing with that on R. (This is a weaker assumption than taking R* to be an ordered field extension of the reals.) Say that an infinitesimal is an x in R* such that −y < x < y for any real y > 0.

Theorem: Suppose that P is an R*-valued finitely additive probability on some algebra of sets, and suppose that P assigns a non-real number to some set. Then there are uncountably many different R*-valued finitely additive probability assignments Q on the same algebra of sets such that:

  i. P(A) is real if and only if Q(A) is real, and in that case P(A)=Q(A).

  ii. All corresponding linear combinations of P and Q are ordinally equivalent to each other, i.e., for any sets A1, ..., An, B1, ..., Bm in the algebra and any real a1, ..., an, b1, ..., bm, we have ∑aiP(Ai)<∑biP(Bi) if and only if ∑aiQ(Ai)<∑biQ(Bi).

  iii. P(A) and Q(A) differ by a non-zero infinitesimal whenever P(A) is non-real.

Condition (ii) has some important consequences. First, it follows that ordinal comparisons of probabilities will be equally preserved by P and by Q. Second, it follows that both probabilities will assign the same results to decision problems with real-number utilities. Third, it follows that P(A)=P(B) if and only if Q(A)=Q(B), so any symmetries preserved by P will be preserved by Q. These remarks show that it is difficult indeed to hold that the choice of P over Q (or any of the other uncountably many options) is non-arbitrary, since it seems that any epistemic, decision-theoretic and symmetry constraints satisfied by P will be satisfied by Q.

Sketch of proof: For any finite member x of R* (x is finite if and only if there is a real y such that −y < x < y), let s(x) be the unique real number such that x − s(x) is infinitesimal. Let i(x)=x − s(x). Then for any real number r > 0, let Qr(A)=s(P(A)) + ri(P(A)). Note that s and i are linear transformations, from which it follows that Qr is a finitely additive probability assignment. It is not difficult to show that (i) and (ii) hold, and that (iii) holds if r ≠ 1.
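Here is a sketch of the construction on finite hyperreals modeled as pairs (a, b) for a + b·ε (my own toy representation; in the theorem R* is a general ordered vector space over the reals):

```python
from fractions import Fraction as F

# Sketch of the s + r·i construction on finite hyperreals modeled as
# pairs (a, b) standing for a + b·ε: s picks off the standard part,
# i the infinitesimal part, and Q_r rescales the infinitesimal part.

def s(x):            # standard part
    return (x[0], F(0))

def i(x):            # infinitesimal part: x - s(x)
    return (F(0), x[1])

def Q_r(x, r):       # s(x) + r·i(x)
    return (x[0], r * x[1])

def add(x, y):
    return (x[0] + y[0], x[1] + y[1])

# Linearity of s and i makes Q_r additive:
x, y = (F(1, 2), F(3)), (F(1, 3), F(-1))
r = F(5)
assert Q_r(add(x, y), r) == add(Q_r(x, r), Q_r(y, r))
# Q_r fixes real values (so clause (i) holds) ...
assert Q_r((F(1, 2), F(0)), r) == (F(1, 2), F(0))
# ... and perturbs non-real values by a non-zero infinitesimal when r ≠ 1.
assert Q_r((F(0), F(1)), r) != (F(0), F(1))
```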

Remark 1: I remember seeing the s + ri construction, but I can’t remember where. Maybe it was in my own work, maybe in something by someone else (Adam Elga?).

Remark 2: What if we want to preserve facts about conditional probabilities? This is a bit trickier. We’ll need to assume that R* is a totally ordered field rather than a totally ordered vector space. I haven’t yet checked what properties will be preserved by the construction above then.

Saturday, November 17, 2012

Why infinitesimals are too small to help with infinite lotteries: Part III

In two preceding parts (I and II), I argued that assigning the same infinitesimal probability to every outcome of a lottery with countably many tickets assigns too small a probability to those outcomes, no matter which infinitesimal was chosen.

Here I want to note that if we're dealing with hyperreal infinitesimals, then one can't get out of those arguments by assigning a different infinitesimal probability to each outcome. In fact, my second argument worked whether or not the same infinitesimal is assigned. The first did need the same infinitesimal to be assigned, but one can generalize. Suppose I assign infinitesimal probability un to the nth ticket. Now it turns out that given any countable set of hyperreal infinitesimals, there is an infinitesimal bigger than them all. So, suppose that u is an infinitesimal bigger than all the un. Since u would be too small to be the probability of a ticket, a fortiori, the un are too small, too.
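Under the ultrapower picture of the hyperreals (infinitesimals as classes of sequences of reals tending to zero, modulo a suitable ultrafilter), the dominating infinitesimal can be sketched by a diagonal construction; the particular sequences un below are toy examples of my own:

```python
# Sketch under the ultrapower picture: represent the infinitesimal u_n
# by the sequence k -> 1/(n*k) (a toy choice). The diagonal sequence
# u(k) = 2 * max_{n <= k} u_n(k) dominates every u_n from index n on,
# yet still tends to 0, so it represents an infinitesimal at least as
# large as all of them.

u = lambda n, k: 1 / (n * k)

def diagonal(k):
    return 2 * max(u(n, k) for n in range(1, k + 1))

# Domination: for each n, diagonal(k) > u_n(k) once k >= n.
for n in range(1, 20):
    assert all(diagonal(k) > u(n, k) for k in range(n, 200))

# And diagonal(k) -> 0, so it represents an infinitesimal.
assert diagonal(10**4) < 10**-3
```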