How to Breach Genetic Privacy

Massively parallel sequencing technology has opened up endless possibilities in areas such as diagnosing clinical conditions, finding new drug targets, predicting disease risk and fighting crime. A room with twenty modern sequencing machines can sequence around a thousand human genomes per day. Most practical applications require knowledge of only a tiny section of the genome, which means that the rate at which genetic information can be acquired is truly astonishing. With it come serious ethical considerations. What happens if your genetic information leaks and can be accessed by employers, insurance companies or adversaries with an axe to grind?

Erlich and Narayanan (2014) describe some of the techniques that can be used to breach the genetic privacy of individuals (with real-world examples of exploits) and discuss some of the methods that can be used to safeguard it from intruders.

How adversaries can breach genetic privacy

There are three broad categories of attack: identity tracing, attribute disclosure using DNA, and completion attacks. Identity tracing is based on metadata from scientific research, such as genotypic sex, date of birth, zip code and surname. Attribute disclosure attacks are based on accessing the genetic information of a person and then matching it against an anonymous sample linked to sensitive information. Finally, completion attacks allow the inference of target genotypic information based on other areas of the target genome or the genomes of relatives.

Identity tracing attacks

Identity tracing attacks start with genomic information from an unknown individual. However, this is usually associated with metadata in the form of quasi-identifiers, such as genotypic sex, age, date of birth, zip code, surname and so on. Armed with this information, the adversary can drastically narrow down the range of possible targets to a small group, and then pinpoint the individual with the help of information found on social media websites such as Facebook. This is done with a wide range of techniques, such as surname inference, DNA phenotyping, demographic identifiers, pedigree structure and side-channel leaks.
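The narrowing-down step can be illustrated with a toy example. The sketch below is not taken from Erlich and Narayanan: the records, the field names and the helpers `anonymity_set_sizes` and `unique_fraction` are all hypothetical. It counts how many records share each combination of quasi-identifiers. A combination that only one record has points to a single individual, which is why releasing several quasi-identifiers together is far riskier than releasing any one of them.

```python
from collections import Counter

# Entirely made-up records; no real people or data sets are represented.
RECORDS = [
    {"sex": "F", "birth_year": 1980, "zip": "10001"},
    {"sex": "F", "birth_year": 1980, "zip": "10001"},
    {"sex": "M", "birth_year": 1975, "zip": "10001"},
    {"sex": "F", "birth_year": 1991, "zip": "94110"},
]


def anonymity_set_sizes(records, fields):
    """Count how many records share each combination of the given fields."""
    return Counter(tuple(r[f] for f in fields) for r in records)


def unique_fraction(records, fields):
    """Fraction of records whose combination of fields is shared by nobody else."""
    sizes = anonymity_set_sizes(records, fields)
    unique = sum(1 for r in records if sizes[tuple(r[f] for f in fields)] == 1)
    return unique / len(records)


if __name__ == "__main__":
    # One field alone singles out few records; several together single out more.
    print(unique_fraction(RECORDS, ["sex"]))
    print(unique_fraction(RECORDS, ["sex", "birth_year", "zip"]))
```

The same counting can be used defensively: a data custodian can check that no combination of released fields has an anonymity set of size one before publishing.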

How Modern Genomics Crushed Bigfoot Pseudoscience

Thousands of people around the world believe in the existence of a large primate that roams the mountain forests. It is known by many names, such as Bigfoot, Yeti and Sasquatch. Many of these enthusiasts even claim to have genuine biological samples from these creatures. Skeptics have so far remained unconvinced. No authentic photographs or video material have been produced (the one on the right is a man in a suit) and no bodies have been found. Meanwhile, cryptozoologists complain that scientists are not taking them seriously.

To remedy this problem, Sykes et al. (2014) requested samples from all over the world, subjected them to rigorous decontamination protocols, amplified the DNA and then sequenced it in order to find out the identity of each sample. Guess what they found?

Risk Factors: Misunderstandings and Abuses

Although risk factors occupy a central place in medical and epidemiological research, they are also among the most misunderstood concepts in all of medicine.

The World Health Organization (2009) defines a risk factor as: “A risk factor is any attribute, characteristic or exposure of an individual that increases the likelihood of developing a disease or injury. Some examples of the more important risk factors are underweight, unsafe sex, high blood pressure, tobacco and alcohol consumption, and unsafe water, sanitation and hygiene.” The CDC (2007) offers a similar definition: “an aspect of personal behavior or lifestyle, an environmental exposure, or a hereditary characteristic that is associated with an increase in the occurrence of a particular disease, injury, or other health condition.” However, the CDC also uses the term risk factor when it comes to sexual violence. For instance, they consider alcohol and drug use, antisocial tendencies, hostility towards women, and community-level tolerance of sexual violence to be risk factors.

Based on these sources, we can develop a simplified definition of a risk factor: if A is a risk factor for B, then the presence of A increases (but not necessarily in a causal sense) the probability of B occurring.
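This simplified definition can be written as a comparison of two proportions. The sketch below is only an illustration: the function names and the counts are hypothetical and do not come from the WHO or the CDC. It computes the risk of B among those with A and among those without A, and reports their ratio (the relative risk). A ratio above 1 means that A is a risk factor in the sense used here, and by itself says nothing about causation.

```python
def risk(cases, total):
    """Proportion of a group in which outcome B occurred."""
    if total <= 0:
        raise ValueError("group size must be positive")
    return cases / total


def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk of B among those with A divided by risk of B among those without A."""
    unexposed_risk = risk(unexposed_cases, unexposed_total)
    if unexposed_risk == 0:
        raise ValueError("relative risk is undefined when the unexposed risk is zero")
    return risk(exposed_cases, exposed_total) / unexposed_risk


def is_risk_factor(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """True if B is more common among those with A than among those without A."""
    return risk(exposed_cases, exposed_total) > risk(unexposed_cases, unexposed_total)


if __name__ == "__main__":
    # Hypothetical counts: 20 of 100 exposed and 10 of 100 unexposed develop B.
    print(relative_risk(20, 100, 10, 100))
    print(is_risk_factor(20, 100, 10, 100))
```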

“A is a risk factor for B” does not necessarily mean that A causes B. It might be the case that A causes B only indirectly via some third factor, that B causes A, or that some third factor causes both A and B. In other words, correlation does not on its own imply causation. However, it is possible to disentangle these possibilities by measuring B at the start of the study. If physical punishment of children is a risk factor for aggressiveness, we can find out what comes first by measuring baseline child aggressiveness.

“A is a risk factor for B” does not mean that A will cause B in every instance of A. Smoking causes lung cancer, but some smokers can smoke all their lives without developing lung cancer. This does not mean that smoking is not a cause of lung cancer. It just means that there are other factors that also play a role. It is common for pseudoscientific cranks to bring up exceptions of this kind to argue against a correlational or causal association in an effort to spread uncertainty and doubt.

Half of Americans Believe in Medical Conspiracy Theories

An interesting study was recently published in JAMA Internal Medicine by Oliver and Wood (2014). They report the results of a YouGov survey that looked at the acceptance of medical conspiracy theories in the United States and what, if any, effect the belief in medical conspiracy theories had on health-related behavior, such as taking herbal supplements, getting a flu shot and preference for organic foods. The results were chilling as almost half of the U. S. population believed in at least one medical conspiracy. Those who held three or more were less likely to go to the doctor or dentist and fewer got vaccinated against seasonal influenza. They were also more likely to take herbal supplements.

The selection of medical conspiracy theories

Oliver and Wood selected six different medical conspiracy theories to include in their research. Although the researchers did not justify their selection, it seems representative and wide-ranging, spanning from the FDA and alternative medicine to discredited beliefs about the origin of HIV.

The Pitfalls of fMRI-Based Lie Detection

A while ago, an interesting paper on the promise and pitfalls of fMRI-based lie detection was published by Farah, Hutchinson, Phelps and Wagner (2014) in Nature Reviews Neuroscience. It is part of an ongoing article series by the journal examining the interplay between neuroscience and law. This installment discussed the reliability of observed associations between certain brain areas and deception, current limitations of fMRI-based lie detectors, how U. S. courts have treated appeals to fMRI data put forward as evidence, as well as ethical and legal issues with the procedure. This post will also discuss ways of beating an fMRI-based lie detector.

Another article in that series that deals with common misconceptions about memory, memory distortions and the consequence of ignorance was covered here.

How does fMRI work?

An fMRI scan indirectly measures brain activity by measuring blood-oxygen-level dependent (BOLD) activity. This typically involves a lot of controls to make sure that researchers capture the neural correlates of what they want to study instead of irrelevant confounders. Typically, researchers compare BOLD activity during deception and truth-telling in an attempt to find the BOLD signature of deception, which would give clues about the neural correlates of deception (i.e. patterns of brain activation associated with deception).

The theoretical rationale for fMRI-based lie detection is that there is probably a relationship between deception and cognition, because deception is more demanding on memory and various executive functions than truth-telling.

What are the neural correlates of deception?

The authors performed a meta-analysis with the activation likelihood estimation (ALE) method. This is a way to measure overlap in neuroimaging data based on so-called “peak-voxel coordinate information” and thereby find out how reliable the association between deception and certain brain regions is. After applying their specific inclusion criteria, they identified 23 relevant studies. Their meta-analysis identified several areas as being associated with deception, e. g. parts of the prefrontal cortex, the anterior insula and the inferior parietal lobule. However, the between-study variation was enormous and no region was identified in every study.

Limitations

Despite the apparently high identification rate of deception, fMRI-based lie detection has a long list of very important limitations that effectively undermine any confidence in this technique for legal purposes.

Butchering Scientific Studies

Sometimes, people who promote pseudoscience online try to reference the scientific literature. In one sense, this is progress. They are going from just making arbitrary assertions to trying to justify them. In another sense, it is a turn for the worse. That is because the papers they reference are often of incredibly low scientific quality or do not support what is being claimed. However, the behavior gives the illusion of evidential support for some readers. A lot of the time, they damage their own position by spamming long lists of links to videos and blog articles, but some promoters of pseudoscience are more sophisticated.

Previously, I wrote a short introduction on how to counter cranks that reference the scientific literature. Consider this to be the intermediate to advanced version. It will attempt to provide scientific skeptics with additional tools to counter pseudoscience online. The focus will be on research articles, specifically clinical trials. However, the general arguments can often be extended to other forms of research articles. Some of the tools are evidential or methodological in nature and directly related to the meat of the article, such as whether or not there was a control group or control for confounders, the appropriateness of the statistical analysis and whether the conclusion accurately reflected the results. Others are more sociological in nature, looking at the journal itself, the presence or absence of peer review, impact factor, who the authors are etc. These do not necessarily count against the research in the article directly and should not be used alone, but provide useful external arguments if combined with criticisms of the study itself. There is of course some overlap between and within these broad categories.

First, a word of warning. Knowledge can be used for good or evil, and this is no exception. It is very dangerous to find oneself in a situation where the studies that run counter to one's position are subjected to merciless criticism while the research that supports it is accepted with little or no critical thought. This is known as pseudoskepticism and is something to avoid at all costs. It can even undermine the rationality of some of the giants in science, seemingly without difficulty.

How HIV/AIDS Denialists Abuse Bayes’ Theorem

Image by Matt Buck, under Attribution-ShareAlike 2.0 Generic.

Note: Snout (Reckless Endangerment) has made some good arguments in the comments to this post. The gist is that HIV/AIDS denialists overestimate the false positive rate by assuming that the initial test is all there is, when in fact, it is just the beginning of the diagnostic process. Snout also points out that it is probably wrong to say that most people who get tested have been involved in some high-risk behavior, as a lot of screening goes on among, e. g., blood donors. I have made some changes (indicated by del or ins tags) in this post because I find myself convinced by the arguments Snout made.

There have already been several intuitive introductions to Bayes’ theorem posted online, so there is little point in writing another one. Instead, let us apply elementary medical statistics and Bayes’ theorem to HIV tests and explode some of the flawed myths that HIV/AIDS denialists spread in this area.

The article will be separated into three parts: (1) introductory medical statistics (e. g. specificity, sensitivity, Bayes’ theorem etc.), (2) applying Bayes’ theorem to HIV tests to find the posterior probability of HIV infection given a positive test result in certain scenarios and (3) debunking HIV/AIDS denialist myths about HIV tests by exposing their faulty assumptions about medical statistics. Those who already grasp the basics of medical statistics can jump to the second section.

(1) Introductory medical statistics

A medical test usually returns a positive or a negative result (or sometimes an inconclusive one). Among the positive results, there are true positives and false positives. Among the negative results, there are true negatives and false negatives.

True positive: positive test result and have the disease.
False positive: positive test result and do not have the disease.

True negative: negative test result and do not have the disease.
False negative: negative test result and have the disease.
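These four outcomes are the raw material for sensitivity and specificity. The sketch below assumes a hypothetical evaluation in which the true disease status of everyone tested is known; the function names and the counts are made up for illustration. Sensitivity is the share of people with the disease who test positive, and specificity is the share of people without the disease who test negative.

```python
def sensitivity(true_positives, false_negatives):
    """Share of people with the disease who test positive."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """Share of people without the disease who test negative."""
    return true_negatives / (true_negatives + false_positives)


if __name__ == "__main__":
    # Hypothetical counts: 100 people with the disease and 1000 without.
    print(sensitivity(true_positives=99, false_negatives=1))
    print(specificity(true_negatives=990, false_positives=10))
```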

For the purpose of this discussion, $+$ will indicate a positive test, $-$ will indicate a negative test, $HIV$ will indicate having HIV and $\neg HIV$ will indicate not having HIV.

$P(A)$ is the probability of an event A, say, the probability that a fair die will land on three. Conditional probabilities, such as $P(A \mid B)$, represent the probability of event A given that event B has occurred. If A and B are statistically independent events, then $P(A \mid B) = P(A)$, provided that $P(B) \neq 0$ (because the definition of $P(A \mid B)$ has $P(B)$ in the denominator).

Let us define some conditional probabilities that are relevant for HIV tests and Bayes’ theorem:
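In the notation above, the standard quantities are the sensitivity $P(+ \mid HIV)$, the specificity $P(- \mid \neg HIV)$, the false positive rate $P(+ \mid \neg HIV) = 1 - P(- \mid \neg HIV)$ and the prior probability (prevalence) $P(HIV)$. Bayes’ theorem then gives $P(HIV \mid +) = \frac{P(+ \mid HIV) P(HIV)}{P(+ \mid HIV) P(HIV) + P(+ \mid \neg HIV) P(\neg HIV)}$. The sketch below evaluates this formula; the function name and all numbers are hypothetical and are not the characteristics of any real HIV test.

```python
def posterior_given_positive(prevalence, sensitivity, specificity):
    """P(HIV | +) from Bayes' theorem, for a single positive test result."""
    for name, value in (("prevalence", prevalence),
                        ("sensitivity", sensitivity),
                        ("specificity", specificity)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    true_positive_mass = sensitivity * prevalence                    # P(+ | HIV) P(HIV)
    false_positive_mass = (1.0 - specificity) * (1.0 - prevalence)   # P(+ | not HIV) P(not HIV)
    return true_positive_mass / (true_positive_mass + false_positive_mass)


if __name__ == "__main__":
    # Hypothetical test with 99% sensitivity and 99% specificity.
    for prevalence in (0.001, 0.01, 0.5):
        print(prevalence, posterior_given_positive(prevalence, 0.99, 0.99))
```

The same test gives very different posterior probabilities depending on the prior, which is why a single screening result cannot be interpreted without knowing who is being tested and what confirmatory testing follows.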

Together with evolution, heritability is perhaps one of the most misunderstood and abused concepts in biology.

Some white supremacists appeal to moderate to high estimates of heritability for phenotypic traits to justify genetic determinism, the claim that genes explain between-group differences, discrimination against ethnic groups or other malignant and pseudoscientific beliefs that are incompatible with science.

Some egalitarians dislike scientific results showing moderate to high heritability estimates because they believe that these indicate that the environment is unimportant in explaining the phenotype of individuals, and they latch onto single studies showing low heritability as if that meant that genes are less important.

As we shall see, both of these groups believe things that are flawed from a scientific standpoint. But before we discuss why and how this is, it might be beneficial to know something about what heritability actually is. Our definition of heritability will be unpacked and improved in several stages to facilitate understanding.

Some Falsehoods about the Y chromosome and Male Brains

Note: Greg Laden has made a comment on this post saying that I misrepresented his position. I am open to the possibility and have therefore asked some follow-up questions, but at the time of writing this note (2012-07-26 22:23 GMT +1 DST), Laden has not clarified the situation for me. Keep this in mind while reading this post. Will update this again when he does.

Note: I just noticed (2012-07-28 22:08 GMT +1 DST) that Heina Dadabhoy did not mean what she actually said, but said it as a joke in response to a tweet by Zvan. There is an alternative explanation, namely that it is a post hoc rationalization made when Heina discovered that she had been called on it, but it seems less likely. In essence, this means that we can probably consider the claims made by both Greg and Heina to be jokes or awkwardly expressed science. The only thing left now is for Greg to finish writing up his follow-up and/or set me straight by explaining in more detail in what way I misrepresented him.

Note: As a clarification (2012-07-28 23:06 GMT +1 DST) for Kelseigh Nieforth (@Nezchan), I reject this alternative explanation. It is possible, but relatively implausible. I did not intend to sound “mean-spirited & insulting”, quite the opposite. My intent was to rebuke what I felt was going to be the standard misogynist reply (i.e. claiming that Heina only said it was a joke when she noticed it had gotten a lot of attention and reflected badly upon her).

Note: Greg Laden has clarified his position over at his Scienceblogs blog. The general idea is that testosterone alters the male brain during different stages of development and “damaged” referred to the fact that androgens and other biosocial factors influence certain men to be statistically more likely to exhibit socially noxious and harmful behaviors that are incompatible with a progressive, egalitarian and peaceful world. I have no general problems with this position (note added 2012-08-03 20:16 GMT +1 DST).

Note: This blog post has been linked by a men’s rights activist blog. All forms of discrimination are morally wrong, but most men’s rights activism I have come across seems to be equal parts pseudoscience and blanket anti-feminism. I therefore, in general, reject men’s rights activism. This post should not, and cannot, be interpreted as giving men’s rights activism any support whatsoever (clarification added 2012-08-04 14:14 GMT +1 DST).

The background to this story is that Heina Dadabhoy and Greg Laden, at a panel discussion on gender differences at SkepchickCon/CONvergence, claimed that the Y chromosome was “broken” and that the male brain is a female brain damaged by testosterone. Amidst substantial criticism of these claims, the FtB blogger Stephanie Zvan decided to take it upon herself to defend these flawed notions. As we shall see, her attempt is filled with incorrect characterizations and selective use of the scientific literature.

But first, let us make sure we have understood the claims being put forward in the video, so that we do not incorrectly characterize them as something they are not. A video of the panel discussion can be found here. I will post enough of the discussion for context, but readers are encouraged to check if I have gotten everything right. Laden was especially hard to take a transcript of, because he talks very fast and often changes mid-sentence, but hopefully I got the gist. It starts with a question from the audience at 35:41 about the gender differences in autism diagnosis and how males are supposedly more often autistic than females:

Heina Dadabhoy: That is an underdiagnosis issue, actually. They have been doing more and more research on women and autism. A lot of us women who fall on the spectrum only find out when we are adults, because a lot of the behaviors that manifest…the ways that girl tend to manifest it is slightly different and you know a girl who gets obsessed with something they are like “oh, well she is a girl and she has her little obsessions, how cute” and when it is a boy it is like “oh, why isn’t he out beating up his peers?” so that is a big issue with autism.

Member in the audience: …inaudible… [probably something to do with differential disease susceptibility between genders e. g. red-green color blindness or hemophilia – E. K.]

Heina Dadabhoy: That is the Y chromosome. It’s broken [Dadabhoy smiles and laughs – E. K.]

Greg Laden: There is… there is … One thing that psychology does…There is some reasonable evidence that certain….There are gender differences.. [inaudible]. But there are gender differences. One of the most important gender differences.. in other words males versus females do not overlap that much at all… in certain areas and one of…one place they do not overlap at all, and you can’t change this… with culture… much..like you can change spatial orientation by giving everyone Tetris when they are born and will be the same. What we can’t change is that, for example, is the number of kids that cannot read until much later…the age at which you start to read and how you have dyslexia and so on that are boys is an order of magnitude higher in girls and you can do everything you want to fix that and you can only fix them a little bit. Most of those differences disappear and are not necessarily that significant, but is real. You know, the male brain is a female brain damaged by testosterone in various stages in it’s life. I think probably there are some very interesting adult difference…you cannot look at at a person and say that, but population differences between males and females that has to do with brain development because hormonal differences and…most of them are probably kind of trivial but there probably are some…yeah autism…I don’t think that is an example of one, but there probably are some things but if we where that different, it would be a hard time communicating…[inaudible].

So, right off the bat we can see that Zvan has incorrectly characterized what both Dadabhoy and Laden stated. Dadabhoy stated that the Y chromosome was broken, not, as Zvan would have it, that the Y chromosome is a broken X chromosome. Laden stated that the male brain is a female brain damaged by testosterone in various stages in its life and did not use the term development. As we shall see, it is on these false characterizations that Zvan bases her arguments, but the bigger problem is that Zvan has no scientific foundation for her argument, leading the entire tortuous justification of the notion that men are genetically and neurologically “broken” to collapse in on itself.

The Y chromosome is not broken, but contains 86 unique and functioning genes

In her attempt to justify the absurd notion that men are genetically broken, Zvan appeals to the fact that the Y chromosome cannot recombine with the X chromosome to the same degree that the X chromosome can with another X chromosome. While this is true, it does not justify the original claim that the Y chromosome is a broken X chromosome, or the stronger claim that the Y chromosome is broken. In fact, the Y chromosome contains 86 fully functioning genes and this does not even count the genes that exist on both the X and Y chromosomes. For the vast majority of individuals, the Y chromosome is fully functional and does not produce genetic defects or pathology. So nothing is actually “broken”.

X-linked recessive disorders signify a problem with the X chromosome, not the Y one

Zvan points out that males are more at risk for certain heritable diseases because the related gene only occurs once, while in females it occurs twice (since they have two X chromosomes). This is also true, but the causative factor is the disabling mutation in the X chromosome that causes the disease, not something to do with the Y chromosome. In other words, what is “broken” in these cases is the X chromosome, not the Y.

Lack of large-scale recombination is sometimes a good thing

The loss of the ability for large-scale recombination is not something uniformly bad. In fact, if large-scale recombination between the Y chromosome and X chromosome were possible, it could result in males without the necessary sex-determining or sex-influencing regions in their Y chromosomes and females with harmful genes only found on the Y chromosome, so the lack of large-scale recombination between X and Y is clearly adaptive. A loss does not need to be evolutionarily or physiologically detrimental.

Genetic Risk Factors and Parental Responsibility

The interaction of nature and circumstance is very close, and it is impossible to separate them with precision. Nurture acts before birth, during every stage of embryonic and pre-embryonic existence, causing the potential faculties at the time of birth to be in some degree the effect of nurture. We need not, however, be hypercritical about distinctions; we know that the bulk of the respective provinces of nature and nurture are totally different, although the frontier between them may be uncertain, and we are perfectly justified in attempting to appraise their relative importance.

– Sir Francis Galton, Inquiries into human faculty and its development (1883).

The nature versus nurture (or biology versus the environment) controversy has raged on for thousands of years. Modern science, however, has rejected this dichotomy as trivially false. It is not nature versus nurture, but nature through nurture. Both play essential roles in shaping organisms such as ourselves and they often interact with each other. However, as Galton remarked above, one could still discuss the relative merits of partial biological and environmental explanations. When people reduce the complex interaction of biology, psychology, biological and social environment to “mostly biology” or “mostly environment”, they are perpetually restraining humanity in the black-and-white cage that is nature versus nurture, despite paying lip service to modern science. Worse, “mostly biology” is incorrectly interpreted as some form of genetic determinism, whereas “mostly environment” is erroneously conceived as the notion of the blank slate, and the hail of vitriolic straw man arguments begins. The fact that some Internet commentators, journalists and other interested parties do not have sufficient scientific understanding, especially with regard to biology and psychology, makes it even more troubling. This, in turn, leads to a lot of misunderstandings about the science.

Clearly not the best setup for an intellectually productive discussion.