Til Death Do Us Chart

A Study on the Effect of Life Events on Marital Status






Scott Mcalister
Thomas Oldfield
Heather Rodney
Brian Rotolo
Elizabeth Yim






Background

I get knocked down, but I get up again, and you’re never gonna keep me down.

Chumbawamba, 1997

A lot can change in twenty years. The 1997 National Longitudinal Survey of Youth follows over 8000 participants from their teens through their thirties, asking hundreds of questions and recording how the responses change over time. When we found this dataset, we knew we wanted to use this project to explore this extraordinary window into lives at their most interesting and chaotic. And what else could we do but dive straight into the dirt: relationship, cohabitation, and marital status.

We utilized Pandas, Plotly, and Seaborn to explore the data and determine the needs and focus of our model. This allowed us to dig through the massive amount of data, visualize results, and look for patterns and relationships between certain variables. Out of this exploration, we engineered our own composite field, lovingly titled the Longitudinal Chaos Index.

The Longitudinal Chaos Index (LCI) is a measure of the objective chaos of a subject’s love life throughout the study. The more changes in your cohabitation/marital status, the higher your LCI. To create the LCI feature, we looked at reported marital status over the span of the study. The specific interview question used in the study was “Respondent's marital status in this month in [1994-2016],” which was recorded every month after the subject’s fourteenth birthday. There were six possible responses:

  • Never Married, Not Cohabitating
  • Never Married, Cohabitating
  • Married
  • Legally Separated
  • Divorced
  • Widowed

The LCI was designed to start at zero for each subject and increase with any change to the research question response. All but two changes result in an increase by one point. The first exception is the change from “Never Married, Cohabitating” to “Married,” which we determined would not increase a subject’s chaos and therefore does not change the LCI. The second exception is the change from “Married” to “Divorced,” which increases the LCI by two points. After the survey subjects’ LCIs were calculated, we created scoring divisions to categorize LCIs into final, qualitative measurements:

  • Sad and Alone (SA)
  • Happy Together (HT)
  • It’s About the Journey (AJ)
  • Train Wreck (TW)
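
The scoring rules described above can be sketched in a few lines of Python. This is a minimal illustration rather than our project code: the function name, the constant names, and the choice to skip months with no recorded response (represented as None) are assumptions made for the example.

```python
from typing import Iterable, Optional

# The six possible monthly responses, spelled as in the list above.
NM_NOT_COHAB = "Never Married, Not Cohabitating"
NM_COHAB = "Never Married, Cohabitating"
MARRIED = "Married"
SEPARATED = "Legally Separated"
DIVORCED = "Divorced"
WIDOWED = "Widowed"

# The two exceptions to the default one-point increase.
SPECIAL_TRANSITIONS = {
    (NM_COHAB, MARRIED): 0,   # cohabiting couple marries: no added chaos
    (MARRIED, DIVORCED): 2,   # divorce counts double
}


def longitudinal_chaos_index(monthly_statuses: Iterable[Optional[str]]) -> int:
    """Sum the chaos points over a subject's month-by-month status history."""
    lci = 0
    previous = None
    for status in monthly_statuses:
        if status is None:
            # Assumption: a month with no recorded response is skipped.
            continue
        if previous is not None and status != previous:
            # Any change scores 1 unless it is one of the two exceptions.
            lci += SPECIAL_TRANSITIONS.get((previous, status), 1)
        previous = status
    return lci


# Hypothetical history: single, then cohabiting, then married, then divorced.
history = [NM_NOT_COHAB] * 3 + [NM_COHAB] * 2 + [MARRIED] * 4 + [DIVORCED] * 2
example_lci = longitudinal_chaos_index(history)  # 1 + 0 + 2
```

In this made-up history, moving in together adds one point, marrying the cohabiting partner adds nothing, and the divorce adds two.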

Exploration



Machine Learning

After defining the LCI, we created a simple model to evaluate each variable individually and determine which variables most affected LCI. We then created a multiple linear regression model to predict LCI based on life events.
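
One plausible way to evaluate each variable individually is to fit a one-variable linear regression per candidate and rank the candidates by R². The sketch below does that in plain Python. It is not our original code: the function names are hypothetical, and the numbers are made-up toy values, not NLSY97 data.

```python
def r_squared(x, y):
    """R² of a simple linear regression of y on x (squared Pearson correlation)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A constant column explains nothing.
        return 0.0
    return (sxy * sxy) / (sxx * syy)


def rank_predictors(features, target):
    """Score every feature on its own and sort from most to least predictive."""
    scores = {name: r_squared(values, target) for name, values in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy values for illustration only (not survey data).
toy_lci = [0, 1, 1, 3, 4, 6]
toy_features = {
    "delinquency_score": [0, 1, 2, 4, 5, 8],
    "interview_seconds": [3000, 2500, 4100, 2800, 3900, 3100],
}
ranked = rank_predictors(toy_features, toy_lci)
```

The highest-ranked variables from a screen like this would then feed the multiple linear regression, which in practice is usually fitted with a library rather than by hand.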

The eleven most predictive interview questions, and the adjustments made to fit our predictive model, are listed below.

Rank: 9%
Year: 1999
Interview Question: Have you had sexual intercourse since the last interview on [date of last interview], that is, made love, had sex, or gone all the way with a person of the opposite sex?
Possible Responses:
  • Yes
  • No
Model Adjustments: Approximate equivalent to age 18

Rank: 8%
Year: 2014
Interview Question: Respondent's marital status in this month [July] in 2014?
Possible Responses:
  • Never Married, Not Cohabitating
  • Never Married, Cohabiting
  • Married
  • Legally Separated
  • Divorced
  • Widowed
Model Adjustments: Approximate equivalent to age 33 (current age for participants under age 33)

Rank: 7%
Year: 2002
Interview Question: Relationship of the parent figure(s)/guardian(s) in household to the youth as of the survey date?
Possible Responses:
  • Both biological parents
  • Two parents, biological mother
  • Two parents, biological father
  • Biological mother only
  • Biological father only
  • Adoptive parent(s)
  • Foster parent(s)
  • No parents, grandparents
  • No parents, other relatives
  • Anything else
Model Adjustments: Approximate equivalent to age 21

Rank: 7%
Year: 2014
Interview Question: Respondent's monthly arrest status in March 2014? (Calculated for each month beginning with the month that R turned 12.)
Possible Responses:
  • R not arrested in this month and not arrested in a previous month
  • Number of times R arrested in this month
  • R arrested previously but not in this month
Model Adjustments: Approximate equivalent to monthly average at age 33 (current age for participants under age 33)

Rank: 6%
Year: 1997
Interview Question: Delinquency Score Index? (Scores range from 0 to 10; higher scores indicate more incidents of delinquency.)
Possible Responses:
  • 0
  • 1 to 2
  • 3 to 4
  • 5 to 6
  • 7 to 8
  • 9 to 10
Model Adjustments: Approximate equivalent to age 16

Rank: 5%
Year: 2011
Interview Question: Total respondent timing from round 15 interview (measured in seconds). This is the total interview time excluding the interviewer remarks, the locator section, and the locator questions in the household information section.
Possible Responses:
  • .1 to 1000.0
  • 1000.1 to 2000.0
  • 2000.1 to 3000.0
  • 3000.1 to 4000.0
  • 4000.1 to 5000.0
  • 5000.1 to 6000.0
  • 6000.1 to 7000.0
  • 7000.1 to 8000.0
  • 8000.1 to 9000.0
  • 9000.1 to 10000.0
  • 10000.1 to 20000.0
  • 20000.1 to 30000.0
Model Adjustments: Unmeasurable for predictive model, therefore excluded

Rank: 5%
Year: 1998
Interview Question: Substance Use Index? (Scores range from 0 to 3; higher scores indicate more instances of substance use since the date of the last interview.)
Possible Responses:
  • 0
  • 1
  • 2
  • 3
Model Adjustments: Approximate equivalent to age 17; will be determined by frequency

Rank: 2%
Year: 2002
Interview Question: Collapsed distance in miles between the respondent's reported address and the father's reported address?
Possible Responses:
  • Lived in the same household
  • 1 to 5 Miles
  • 6 to 10 Miles
  • 11 to 30 Miles
  • 31 to 60 Miles
  • 61 to 100 Miles
  • 101 to 200 Miles
  • 201 to 400 Miles
  • 401 to 700 Miles
  • 700 Miles
Model Adjustments: Approximate equivalent to age 22

Rank: 2%
Year: 2010
Interview Question: Where 1 means disagree strongly and 7 means agree strongly, how much do you agree or disagree that the following statements describe who you are and how you act? "When I was in school, I used to break rules quite regularly."
Possible Responses:
  • Disagree strongly
  • Disagree moderately
  • Disagree a little
  • Neither agree nor disagree
  • Agree a little
  • Agree moderately
  • Agree strongly
Model Adjustments: Age equivalent is irrelevant because question refers to past behavior in grade school

Rank: 2%
Year: 2002
Interview Question: How much of the time during the last month have you felt so down in the dumps that nothing could cheer you up?
Possible Responses:
  • All of the time
  • Most of the time
  • Some of the time
  • None of the time
Model Adjustments: Approximate equivalent to age 21

Rank: 2%
Year: 1997
Interview Question: What percent of your peers belong to a gang that does illegal activities?
Possible Responses:
  • Almost none (less than 10%)
  • About 25%
  • About half (50%)
  • About 75%
  • Almost all (more than 90%)
Model Adjustments: Approximate equivalent to age 16

    Calculate your LCI

    Calculate your LCI based on your life events, using our predictive model below.


    Sad and Alone (<0.8)

    Not necessarily a life of cats and single-serving freezer pizzas, but we do predict a very low-chaos future for you.

    Happy Together (0.8-2)

    Marriage? Stable living situation? One messy cohabitation? All of these fit into the low-to-mid chaos range, and so do you!

    It’s About the Journey (2-6)

    As chaos increases, life is taking some interesting turns. Divorce, tragedy, and repeated breakups are the hallmarks of this range. Good luck!

    Train Wreck (>=6)

    A chaos value of 6 or more takes effort. Skill. Commitment (or... well, you get it). Buckle up, the road ahead may be bumpy.
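
The four ranges above can be turned into a small lookup function. This is a sketch, not the calculator's actual code: the function name is hypothetical, and treating each range as including its lower bound and excluding its upper bound (so that exactly 2 falls under "It's About the Journey") is an assumption, since the ranges as printed share their endpoints.

```python
def lci_category(lci: float) -> str:
    """Map a predicted (possibly fractional) LCI to its qualitative label."""
    if lci < 0.8:
        return "Sad and Alone"
    if lci < 2:
        # Assumption: 0.8 is included here, 2 is not.
        return "Happy Together"
    if lci < 6:
        # Assumption: 2 is included here, 6 is not.
        return "It's About the Journey"
    return "Train Wreck"


# A regression output is rarely a whole number, hence the 0.8 cutoff.
labels = [lci_category(value) for value in (0.3, 0.8, 2, 5.9, 6)]
```

Because the model predicts a continuous value, the cutoffs apply to fractional scores as well as to the whole-number LCIs counted from the survey.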

    Conclusions

    People are hard to predict. Even using the comprehensive research of the NLS, we were unable to find single variables that strongly correlated with our LCI.

    However, combining variables improves both the accuracy of the predictions and the utility of the model. Speaking broadly, rule-breaking behavior in the teenage years and more distant relationships with biological parents increase the likelihood of a higher LCI. Still, the model comes with built-in weaknesses. Many of the variables we used were hard to encode numerically: assigning number values to “foster parents” or “single father” is somewhat arbitrary, and does not lend itself well to this type of model. Taken as a whole, our model demonstrates interesting profiles and trends within populations, but its predictive value remains unreliable.




    Implications

    With the rise of big data, with every move we make and click we take leaving behind digital trails across countless databases, any strong predictive link between individual actions and life outcomes could be seized upon to dramatic effect. Facebook and Google, digital marketers, not to mention entities like the NSA, all have access to the kinds of personal information requested in the NLS, and will soon be rivaling its 20+ year span of collection. Working with the NLS, we can gain some insight into what a deep data portfolio on an individual might contain, and the kinds of conclusions that might be drawn from it.

    Taken in that light, it becomes somewhat reassuring that no strong trends or ironclad predictions arose from our model. The statistical chaos of human behavior wards off any possibility of “Minority Report” style dystopias, at least for now.