Categories
Economics

[2922] Malaysia’s Covid-19 type II error crisis

When faced with the unknown, we form assumptions based on what we know from previous experience. In science, it is fancier to call those assumptions hypotheses. And hypotheses are meant to be tested. Those who have studied enough statistics will quickly recognize this as hypothesis testing, and at the most basic level, this is the philosophical foundation of whatever Covid-19 testing exists out there.

The logical set-up is simple. There is a null hypothesis that a test seeks to reject. A failure to reject based on some benchmark means the hypothesis may have some truth to it, while a rejection means the alternative hypothesis is likely true. In the case of a Covid-19 test, the null hypothesis would be “the person is healthy” and the alternative hypothesis would be “the person is unhealthy.”

Notice the use of ‘may’ and ‘likely.’ They express possibility, and they reflect an element behind any statistical testing method: confidence. Confidence is an important factor because all testing is prone to error. We try to reduce it, but there is a minimum error level we have to tolerate. The errors come in two forms: it is possible to test a healthy person as unhealthy, and as we have witnessed in the past several weeks in Malaysia, it is also possible to test an unhealthy person as healthy.

When we test a healthy person as unhealthy, that is known as a Type I error. Here, we reject the null hypothesis when we should not. It is a false positive. As far as Covid-19 is concerned, this is an inconvenience to the person falsely tested. There will be a cost involved, but the person will very likely be fine.

When we test an unhealthy person as healthy, that is a Type II error. Here, we fail to reject the null hypothesis when we should. It is a false negative. In our Covid-19 context, this has life-threatening consequences.
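To make the two error types concrete, here is a minimal sketch in Python. Every number in it (the number of people tested, the prevalence, the test's sensitivity and specificity) is a made-up assumption for illustration only, not a figure for any actual Covid-19 test.

```python
def expected_errors(n_tested, prevalence, sensitivity, specificity):
    """Return the expected (false_positives, false_negatives) counts.

    sensitivity: share of truly infected people the test flags as positive.
    specificity: share of truly healthy people the test clears as negative.
    """
    infected = n_tested * prevalence
    healthy = n_tested - infected
    false_positives = healthy * (1 - specificity)   # Type I errors
    false_negatives = infected * (1 - sensitivity)  # Type II errors
    return false_positives, false_negatives


# Hypothetical inputs, chosen only to show the mechanics.
fp, fn = expected_errors(n_tested=10_000, prevalence=0.02,
                         sensitivity=0.80, specificity=0.99)
print(f"Expected false positives (Type I):  {fp:.0f}")
print(f"Expected false negatives (Type II): {fn:.0f}")
```

Under these assumed inputs, a few dozen infected people would walk away with a negative result. Those are the people a test alone cannot catch.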

Between the two errors, a false negative is clearly the worse mistake to commit.

This is why adhering to a strict and full-time quarantine is important. Based on what we know from public health professionals, 14 days is a reasonable period for a quarantine. The Centers for Disease Control and Prevention in the United States, for instance, stated that any symptom would manifest itself within 2 to 14 days. If we are truly sick, regardless of test results, there is a very high likelihood the truth will be discovered within that period.

In Malaysia, we have ignored the risk of Type II error so that the ruling class could get their convenience. After violating safety and health procedures regarding social interactions during a time of pandemic, too many Malaysians—the politician class generally, the ruling class particularly—were just too happy to rely on testing to determine whether we are free of Covid-19, without understanding the underlying risk.

Worse, the authorities were just too happy to short-circuit the process as if there is no error in testing. Whether the local health authorities were strong-armed into it, we do not know. What we know is that quarantine time for those coming from high-risk areas in Sabah was 3 days, and not 14 days. Unlike the 14-day period, there is no scientific explanation why a 3-day period was appropriate. In fact, a 3-day quarantine period is inconsistent with what we have been told by the health authorities about the nature of Covid-19.

Because of the completely ignorant trust in testing methods and the failure to understand the risk of Type II error by a group of people (ministers, no less), we Malaysians now have to suffer a pandemic wave bigger than the one we had earlier.

We all have sacrificed to fight Covid-19. We went through a severe lockdown. We worked from home. We stopped going out. We wore masks however uncomfortable the experience was. We were successful in flattening the curve, until the selfish men and women undid our success.

These ignorant, arrogant men and women have triggered a type II error crisis in Malaysia. They all should resign to atone for their sins.

Categories
Economics Society

[2807] Break up the Bumiputra category into finer details

Race politics dominates Malaysia, and our deplorable politics divides us Malaysians into Bumiputras, Chinese, Indians and others.

At the center of it all is Malay politics. Yet, public statistics on Malay welfare are imprecise. This is true for household income and expenditure surveys conducted and published by the Department of Statistics. The surveys are the most comprehensive snapshots we have on the welfare of Malaysian households.

It is imprecise because the best we have to describe Malay welfare are not Malay statistics, but Bumiputra statistics.

The way the statistics are presented (or even measured) strengthens the flawed notion that the Bumiputras are Malays. Yet, we know the Bumiputras comprise not just the Malays but also the Orang Asli in the Peninsula, and the Borneo natives.

Foreigners in particular are guilty of this but, more unforgivably, so are the locals. When ethno-nationalist Malays want to back their point with hard data, for instance, they would go to the household surveys and cite the Bumiputra figures as proof, casually suggesting all Bumiputras are Malays with no hesitation, as if there is nothing wrong with the statistics.

Our contemporary politics also means recognizing this distinction is not merely a pedantic concern. Sarawak parties especially are becoming increasingly important nationally, possibly convincing the federal government to spend more money there.

How can this be relevant? For example, I would like to know the change in the welfare of Borneo native households as federal spending increases. It is not enough to claim they would do better because of the spending. We need data, and the Bumiputra net is cast too widely to provide it.

So, as far as the category Bumiputra is concerned, I think it should be broken into its finer components to allow us to see exactly the state of various groups’ welfare.

After all, is it not ironic that for all the centrality of Malay politics, statistics on Malay welfare are not available on their own? We can know the income of the median Chinese and Indian households but we cannot know the median for Malay families. To belabor the point, what we know instead are Bumiputra statistics, which are at best a proxy for the state of the Malays. And we know it is a proxy because we know the Malays make up the majority within the group. How big a majority? Interesting question, is it not?

And we also know how the mean and the median behave mathematically. A change in the composition of the population being measured will easily shift both.

I have a lingering suspicion that the Malays are doing better than the reported Bumiputra average/median. My suspicion is based on the fact that most Malays live in the Peninsula, while the statistics show the Peninsula as a geographic group does better than Malaysian Borneo (even when certain states such as Kelantan can do worse than Sarawak). The only way to conclusively address the suspicion is to look at the Bumiputra components more carefully than we have been doing so far.
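The mechanics behind that suspicion can be sketched with a toy example in Python. The two income lists below are entirely invented and stand for no actual group. They only show how pooling a larger, better-off subgroup with a smaller, worse-off one drags the pooled mean and median below the larger subgroup's own figures.

```python
from statistics import mean, median

# Hypothetical monthly household incomes, invented for illustration only.
group_a = [3000, 3500, 4000, 4500, 5000, 5500, 6000]  # larger subgroup
group_b = [1500, 2000, 2500]                          # smaller subgroup

# The published category reports only the pooled figures.
pooled = group_a + group_b

for label, incomes in (("group A", group_a),
                       ("group B", group_b),
                       ("pooled", pooled)):
    print(f"{label:8s} mean={mean(incomes):7.1f} median={median(incomes):7.1f}")
```

In this toy data, the pooled median sits well below group A's own median even though group A is the majority. The pooled figure is a proxy, and how far off it is depends on the unknown mix.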

At the very least, regardless of my suspicion, improvement in reported welfare statistics with the Bumiputra category split into its constituents can lead to better public debates and better policies. Without the split, we are forever condemned to debate from imprecise premises.

Categories
Economics

[2752] Dude, where is my standard error?

When the economy grew 6.4% from a year ago, does it really mean it grew exactly at that rate?

Those kinds of statistics are supposed to give us the hard figures that we can all fall back on as the one and only truth, like physical measurements where a meter ruler is a meter long. But those with statistical training would understand that these macro numbers, from GDP to industrial production to prices, are not free from errors (even the ruler has an error, but I would think that error is considerably smaller than that suffered by macro numbers, unless it is an astonishingly bad ruler). The GDP, for instance, is not like the accounts of a small company, which record all of the company’s expenditure. That macro figure is at best an estimate of what is happening in the economy. The fact that we keep restating (not rebasing) the GDP figures every now and then tells you just as much.

Yet, after working in the financial sector, I am quite surprised to learn how minuscule a role the standard error/standard deviation plays in most analyses. I think this is a problem because without reference to errors, data providers in various government agencies as well as analysts and economists in the financial market give the illusion that their data and their analyses (strictly non-normative commentary on the data) come with absolute confidence (academic economists have a better record on this). When the economy grew at 6.4% in a period, it is 6.4%.

But that confidence is overblown. It is not really 6.4% exactly. The truth is that it is possible the GDP had grown around that figure. What exactly, nobody knows. Maybe that is a technological question that would be solved some time in the future. In the meantime, there are some errors in the data.

Before we go on further, for the benefit of those without basic statistical training, I want to emphasize that these errors are not mistakes. They are simply the uncertainty that comes along with the data. Uncertainties are there because we cannot know everything about the world. But we can know enough to know about the general situation. Hence the usefulness of these inexact macro figures. I call them inexact because the true figures fall within a range, and it is a stretch to suggest a point figure is the true figure with certainty.

I have no doubt that these economists, analysts and statisticians understand the meaning of errors and their importance. I am not overly worried about this group, whose work revolves around data. They know there are revisions and they know the numbers can change. They know there are errors. They know these macro numbers provide a useful guide to the happenings in the economy and that these figures are not exact numbers. Each is more of a sample (a good sample from which to generalize to the population) rather than the actual universe. Whenever they refer to a number, they have the statistical caveat at the back of their minds.

I worry about the non-expert consumers of these data and analyses. These users do not understand this and they take the published figures as the truth. Consider for example the discontent with the inflation rate in Malaysia, where there are critics who claim it is too low. I think the publication of standard errors of consumer prices would partly help address their concerns by telling them that there is uncertainty in the recorded prices. Still, this will not address the criticism of the CPI by much, because the critics also appear to fail to understand that the weights behind the final CPI number are set so that it measures the middle Malaysian. We have enough microdata that those weights can be reconstructed to fit more than the middle Malaysian. But I think this is a different issue, which I have addressed in the past.

I am bringing up the non-reporting of the standard error/standard deviation because I am a bit peeved when I see news reports that go something like “Malaysian industrial production growth grew slower at 4.3% from 4.4% last month.” Or the GDP grew faster at 6.4% in 2Q versus 6.2% in 1Q. I mean really, is it truly a deceleration/acceleration? Are we not just sensationalizing it? I am particularly annoyed when economically illiterate politicians start to sensationalize these figures, spreading uninformed views to the wider public.

(Another example is the idea that China is the largest economy in the world in PPP terms. But how about including the standard error inside too before making that pronouncement? I bet Chinese GDP has an outrageously big interval.)

Is it not enough just to say, “hey, the economy is doing okay”?

When I see those kinds of changes, I am more inclined to say it is stable. In fact, we can get more scientific about it. Calculate the index’s standard deviation and do hypothesis testing to see if the change is significant or not. It is very easy to do such testing these days.
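As a rough sketch of what such a test could look like, here is a simple two-sample z-test in Python for the 6.4% versus 6.2% example. The standard errors are not published, which is the whole complaint, so the 0.3 percentage point figures below are purely hypothetical assumptions, as is treating the two estimates as independent and normally distributed.

```python
import math


def normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def compare_growth(rate_new, rate_old, se_new, se_old):
    """Two-sided z-test for a difference between two growth estimates.

    Assumes the two estimates are independent and roughly normal.
    Returns (z, two_sided_p_value, approx_probability_new_is_faster).
    """
    diff = rate_new - rate_old
    se_diff = math.sqrt(se_new ** 2 + se_old ** 2)
    z = diff / se_diff
    p_two_sided = 2.0 * (1.0 - normal_cdf(abs(z)))
    # Rough reading under the same normal approximation (assumption).
    prob_faster = normal_cdf(z)
    return z, p_two_sided, prob_faster


# Growth rates from the text; the standard errors are HYPOTHETICAL.
z, p, prob = compare_growth(6.4, 6.2, se_new=0.3, se_old=0.3)
print(f"z = {z:.2f}, two-sided p-value = {p:.2f}")
print(f"approximate probability growth was faster = {prob:.0%}")
```

With standard errors of that assumed size, the p-value is far above the usual 0.05 threshold, so the headline acceleration could not be told apart from noise. Smaller true standard errors would of course change the verdict.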

I admit, it is less sexy and more of a mouthful to say “there is X% probability that the economy grew faster compared to the rate in the previous period” than to say “the economy grew faster today versus yesterday.” But are we sacrificing truth for sexy, short, punchy headlines?

I think yes.

I am guilty of not providing the standard deviation too, but I think we (can I use the pronoun we?) need to change our ways. Yes, I think we mainly write for each other, but we have to realize these writings go out to the public as well. Our statistical caveat might not exist in others’ minds. We need to put those caveats explicitly in the open.

By sharing the standard deviation, I also think it shows others that we are being humble about our data. It says, “these are my best bets” instead of “this is it and there is no other way about it.”

Ideologically, from a libertarian point of view, the humbleness is important. Libertarians believe in the superiority of the market over state actions. My belief (before I get banged up for being a blind market apostle, there are instances of market failure where the government needs to come in) in the superiority comes partly from the fact that we do not know everything about the world. I think the idea of standard error is part of that philosophy: the idea that we do not know everything. Again, there might be a time when technology will solve that and bring about a libertarian nightmare, but right now, there are enough cases out there to tell us to be humble.

Categories
Politics & government Sci-fi

[2098] Of one data point

I am unsure if I am recalling this accurately but at the back of my mind, amid cobwebs of vague memories, I somehow remember reading an Asimov short story in a stuffy old library at the Malay College in Kuala Kangsar. You will forgive me if it is not even Asimov’s writing. It may well be a work of some other science fiction author. What I do have a vivid recollection of is the subplot of the story, however. Through the retelling of it, I hope that it may cause others to refrain from committing hasty generalization.

The story is set some time in the far future, maybe on Earth, maybe on Trantor or at some other place, I do not know. What is important is that the realm of human knowledge has expanded greatly. This includes in the field of statistics and in particular, sampling methods used to ascertain public opinion.

Sampling methods used today in real life suffer from certain errors arising from randomness and uncertainty. Notice how each time a respectable polling agency reports the results of a survey, it includes the margins of error of the findings, or more accurately, the standard errors, along with the averages. In the science fiction, statisticians of the future have developed a way to eliminate, fully, the errors associated with sampling.

In fact, the field of statistics in that fiction has reached a stage so advanced that the opinion of the public can be gauged accurately by simply sampling a person, who is a member of the public. In other words, all that is required to make general inference about the society is just one data point.

A sample size of one and that is it.

One.

Only one.

1Malaysia!

Oh my, I do not know how that gets in there.

Anyway, unfortunately in real life, reliability of a sample and therefore, the ability to generalize its statistics for inferential purposes decrease as the sample size decreases, more so at some range closer to zero. We are still finding ourselves a long way from living a statistician’s wet dream.
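How quickly reliability falls apart can be sketched in a few lines of Python. This uses the textbook margin of error for a sample proportion under simple random sampling, with an assumed 50% level of support and a 95% confidence multiplier of 1.96. The normal approximation behind it is not really valid for tiny samples, so treat the small-sample rows as indicative only.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a sample proportion.

    p: assumed true proportion (0.5 is the worst case).
    n: sample size, assuming simple random sampling.
    z: normal multiplier (1.96 for roughly 95% confidence).
    """
    return z * math.sqrt(p * (1 - p) / n)


for n in (1, 10, 100, 1_000, 10_000):
    moe = margin_of_error(0.5, n)
    print(f"n = {n:6d}  margin of error = +/- {100 * moe:.1f} percentage points")
```

The margin shrinks only with the square root of the sample size, which is why a sample of one tells us next to nothing about the public.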

Yet, all too often in Malaysia today, individuals are quick to generalize the result of a by-election to describe national mood. It is perhaps acceptable to make an inference out of a series of by-elections held within a certain timeframe but it is dangerous to make a claim that a by-election signals a countrywide trend. It is dangerous because it is misleading.

A by-election only gauges the opinion of a certain type of individual, and these individuals are certainly not representative of the whole country. The voters in Bagan Pinang, for instance, are quite different from the voters of Manek Urai, Datok Keramat, Damansara Utama or Likas. Although the national issues that they care about may coincide, their attitudes toward the same issues are not the same due to their worldviews. And then, there are local issues. It is definitely safe to say that the local issues they face are different enough that a one-size-fits-all approach is doomed to failure.

These voters, taken as a whole, may provide some concrete statistics on the direction of national politics but individually, in isolation, they are not so helpful.

With respect to Bagan Pinang, there are many other differentiating factors that further make the result of its by-election unique to itself. As an example, not many areas have an army camp within their boundaries. Another is its status as a resort town, or rather, a resort town full of abandoned projects. Suffice it to say, Bagan Pinang is not Malaysia.

Therefore, I have to disagree with the sweeping statements made by multiple persons after the election. In The Star, Isa Samad was quoted as saying “The people of all races have spoken and this is an endorsement of the Prime Minister’s 1Malaysia concept.”[1] Deputy UMNO President Muhyiddin Yassin meanwhile said, “This is a significant victory and more importantly the people’s endorsement of the Prime Minister’s policies.”[2]

Perhaps the people they are referring to are restricted to the voters of Bagan Pinang only. If it refers to Malaysians as a whole, then these two politicians and others who share a similar tendency to generalize in so grand a manner will have a hard time rationalizing trends in other areas.

This is not to say information from Bagan Pinang is worthless. It is not to say information that Bagan Pinang provides with national politics in mind is worthless. Rather, information from this by-election should be contextualized by taking into account several past and future by-elections held at different places if it is to make national sense. Without such contextualization, the one data point of Bagan Pinang might as well be noise, or an outlier.

In the meantime, save a national election itself, the best barometers of national mood are countrywide surveys done properly. Unless, of course, we are living in a world created by that science fiction.

Mohd Hafiz Noor Shams. Some rights reserved

[1] — Isa thanked the people of Bagan Pinang for the victory, saying it was a win for Prime Minister Datuk Seri Najib Tun Razak’s 1Malaysia concept.

“The people of all races have spoken and this is an endorsement of the Prime Minister’s 1Malaysia concept,” he told reporters.

Isa also thanked the Barisan machinery for working tirelessly during the by-election.

“I’m also happy that the Malays, Chinese and Indians are now with Barisan. I hope this will have a domino effect for Barisan in the future,” he said. [Polling Day Live Coverage: Isa wins with thumping majority. Sarban Singh. Zulkifli Abd Rahman. The Star. October 11 2009]

[2] — A beaming Deputy Prime Minister Tan Sri Muhyiddin Yassin, who was present when the official results were announced just after 8pm, said the people had endorsed Prime Minister Datuk Seri Najib Razak’s 1Malaysia concept.

“This is a significant victory and more importantly the people’s endorsement of the Prime Minister’s policies. I congratulate the people of Bagan Pinang, including the Indians and Chinese, who came out in full support of Barisan,” he said at the tallying centre at the Port Dickson Municipal Council hall. [Thumping win for Isa. Wong Sai Wan. Sarban Singh. Zulkifli Abd Rahman. A. Lechutmanan. The Star. October 12 2009]

First published in The Malaysian Insider on October 12 2009.

Categories
Politics & government

[2004] Of Ms Fui does it again…

This is the kind of use of statistics that I absolutely abhor.

In the 1990 general elections, PAS’ support base stood at 375,867 votes. Last year, it reached 1.14 million, an almost threefold increase in 18 years. The huge increase in PAS’ support in last year’s general elections came mainly from its new supporters — the non-Malays.

By comparison, BN’s votes increased from 2.98 million in 1990 to 4.1 million last year, an improvement of only one-third. [BN vs Pakatan: Chinese reaction to PAS is the key. Fui K. Soong. The Straits Times via The Malaysian Insider. June 8 2009]

It is so bad, I think it is self-evident. The logical gap is too wide to hide.

Spot the problem. Or problems.

Mind you, this is a CEO of an MCA think tank…

Mohd Hafiz Noor Shams. Some rights reserved

p/s — hints.

What was the percentage of Chinese who voted for PAS in 1990? In 2008?

What is the growth rate of total voters?

What about 1999?

Ong Kian Ming more or less raised this question in my Facebook account: how many seats did PAS contest in 1990? In 2008?
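For anyone who wants to work through the hints, here is a small sketch in Python. The vote totals are the ones quoted in the article above. The seat counts are NOT real data: they are placeholder relative units (1 and 2) standing for a hypothetical party that contested twice as many seats in the later election, used only to show how the normalization would work once the actual numbers are looked up.

```python
# Vote totals as quoted in the article above.
pas_1990, pas_2008 = 375_867, 1_140_000
bn_1990, bn_2008 = 2_980_000, 4_100_000


def growth_ratio(old, new):
    """How many times larger the new figure is than the old one."""
    return new / old


def votes_per_seat(votes, seats_contested):
    """Average votes per seat contested, a crude normalization."""
    return votes / seats_contested


print(f"PAS raw vote ratio: {growth_ratio(pas_1990, pas_2008):.2f}x")
print(f"BN raw vote ratio:  {growth_ratio(bn_1990, bn_2008):.2f}x")

# HYPOTHETICAL seat counts in relative units, not actual figures.
seats_then, seats_now = 1, 2
per_seat_ratio = growth_ratio(votes_per_seat(pas_1990, seats_then),
                              votes_per_seat(pas_2008, seats_now))
print(f"PAS per-seat ratio under the hypothetical: {per_seat_ratio:.2f}x")
```

The raw totals say nothing about who the new voters were, and a change in the number of seats contested (or in the size of the electorate) alone can shrink an apparent threefold surge considerably.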