Valentine’s Day is around the corner and love is in the air… but, shock horror, nearly every second marriage ends in a divorce! Unfortunately, I can tell you first-hand that this is an experience you’d rather not have. In this post, we will see how data science, in the form of the OneR package and an interesting new data set, might potentially help you to avoid that tragedy… so read on!
In a scientific study last year in Turkey, 170 participants (married as well as divorced) were asked to rate how important they find the following statements (some of them seem to have got a little lost in translation from the original Turkish version):
1. When one of our apologies apologizes when our discussions go in a bad direction, the issue does not extend.
2. I know we can ignore our differences, even if things get hard sometimes.
3. When we need it, we can take our discussions with my wife from the beginning and correct it.
4. When I argue with my wife, it will eventually work for me to contact him.
5. The time I spent with my wife is special for us.
6. We don’t have time at home as partners.
7. We are like two strangers who share the same environment at home rather than family.
8. I enjoy our holidays with my wife.
9. I enjoy traveling with my wife.
10. My wife and most of our goals are common.
11. I think that one day in the future, when I look back, I see that my wife and I are in harmony with each other.
12. My wife and I have similar values in terms of personal freedom.
13. My husband and I have similar entertainment.
14. Most of our goals for people (children, friends, etc.) are the same.
15. Our dreams of living with my wife are similar and harmonious
16. We’re compatible with my wife about what love should be
17. We share the same views with my wife about being happy in your life
18. My wife and I have similar ideas about how marriage should be
19. My wife and I have similar ideas about how roles should be in marriage
20. My wife and I have similar values in trust
21. I know exactly what my wife likes.
22. I know how my wife wants to be taken care of when she’s sick.
23. I know my wife’s favorite food.
24. I can tell you what kind of stress my wife is facing in her life.
25. I have knowledge of my wife’s inner world.
26. I know my wife’s basic concerns.
27. I know what my wife’s current sources of stress are.
28. I know my wife’s hopes and wishes.
29. I know my wife very well.
30. I know my wife’s friends and their social relationships.
31. I feel aggressive when I argue with my wife.
32. When discussing with my wife, I usually use expressions such as “you always” or “you never”.
33. I can use negative statements about my wife’s personality during our discussions.
34. I can use offensive expressions during our discussions.
35. I can insult our discussions.
36. I can be humiliating when we argue.
37. My argument with my wife is not calm.
38. I hate my wife’s way of bringing it up.
39. Fights often occur suddenly.
40. We’re just starting a fight before I know what’s going on.
41. When I talk to my wife about something, my calm suddenly breaks.
42. When I argue with my wife, it only snaps in and I don’t say a word.
43. I’m mostly thirsty to calm the environment a little bit.
44. Sometimes I think it’s good for me to leave home for a while.
45. I’d rather stay silent than argue with my wife.
46. Even if I’m right in the argument, I’m thirsty not to upset the other side.
47. When I argue with my wife, I remain silent because I am afraid of not being able to control my anger.
48. I feel right in our discussions.
49. I have nothing to do with what I’ve been accused of.
50. I’m not actually the one who’s guilty about what I’m accused of.
51. I’m not the one who’s wrong about problems at home.
52. I wouldn’t hesitate to tell her about my wife’s inadequacy.
53. When I discuss it, I remind her of my wife’s inadequate issues.
54. I’m not afraid to tell her about my wife’s incompetence.
Now, the question is whether one can decide – on the basis of their ratings alone – whether a person will actually get divorced. Let us see if data science can help us in this love-related matter!
The data and a link to the corresponding article can be found here: Divorce Predictors data set. I unpacked the data for your convenience; you can download it here: divorce.csv. Let us now use the OneR package (on CRAN) to analyse it:
library(OneR)
divorce <- read.csv("data/divorce.csv", sep = ";")
divorce$Class <- factor(ifelse(divorce$Class == 0, "married", "divorced"))
data <- optbin(divorce)
model <- OneR(data, verbose = TRUE) # 18. My wife and I have similar ideas about how marriage should be
##
##     Attribute Accuracy
## 1 * Atr18     98.24%
## 2   Atr11     97.65%
## 2   Atr17     97.65%
## 2   Atr19     97.65%
## 5   Atr9      97.06%
## 5   Atr16     97.06%
## 5   Atr20     97.06%
## 5   Atr40     97.06%
## 9   Atr26     96.47%
## 10  Atr12     95.88%
## 10  Atr14     95.88%
## 10  Atr15     95.88%
## 10  Atr25     95.88%
## 10  Atr30     95.88%
## 15  Atr29     95.29%
## 15  Atr36     95.29%
## 15  Atr39     95.29%
## 18  Atr4      94.71%
## 18  Atr8      94.71%
## 18  Atr21     94.71%
## 18  Atr27     94.71%
## 22  Atr5      94.12%
## 22  Atr37     94.12%
## 22  Atr38     94.12%
## 25  Atr41     93.53%
## 25  Atr44     93.53%
## 27  Atr1      92.94%
## 27  Atr2      92.94%
## 27  Atr10     92.94%
## 27  Atr24     92.94%
## 31  Atr22     92.35%
## 31  Atr28     92.35%
## 31  Atr31     92.35%
## 31  Atr33     92.35%
## 35  Atr13     91.76%
## 35  Atr32     91.76%
## 35  Atr35     91.76%
## 38  Atr23     91.18%
## 38  Atr34     91.18%
## 40  Atr54     90.59%
## 41  Atr50     89.41%
## 42  Atr3      88.82%
## 43  Atr42     87.65%
## 44  Atr51     87.06%
## 45  Atr49     84.71%
## 45  Atr53     84.71%
## 47  Atr7      82.35%
## 48  Atr47     81.76%
## 49  Atr48     80.59%
## 50  Atr52     80%
## 51  Atr43     78.82%
## 52  Atr45     77.06%
## 53  Atr6      74.12%
## 54  Atr46     68.24%
## ---
## Chosen attribute due to accuracy
## and ties method (if applicable): '*'

summary(model)
##
## Call:
## OneR.data.frame(x = data, verbose = TRUE)
##
## Rules:
## If Atr18 = (-0.004,1.19] then Class = married
## If Atr18 = (1.19,4]      then Class = divorced
##
## Accuracy:
## 167 of 170 instances classified correctly (98.24%)
##
## Contingency table:
##           Atr18
## Class      (-0.004,1.19] (1.19,4] Sum
##   divorced             3     * 81  84
##   married           * 86        0  86
##   Sum                 89       81 170
## ---
## Maximum in each column: '*'
##
## Pearson's Chi-squared test:
## X-squared = 154.56, df = 1, p-value < 2.2e-16

plot(model)
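To get a feeling for what happens behind the ranking above, here is a minimal base-R sketch of the one-rule idea: for every attribute, predict the majority class within each of its levels, count the hits, and keep the attribute with the highest accuracy. This is only an illustration and not the actual implementation in the OneR package; the toy data frame and the helper `one_rule_accuracy()` are made up for this example.

```r
# Toy data, invented for illustration: two already-binned attributes and a class
toy <- data.frame(
  A     = c("low", "low", "low", "high", "high", "high"),
  B     = c("low", "high", "low", "high", "low", "high"),
  Class = c("married", "married", "married", "divorced", "divorced", "married"),
  stringsAsFactors = TRUE
)

# Hypothetical helper: accuracy of the rule "predict the majority class
# within each level of the attribute"
one_rule_accuracy <- function(attr, class) {
  tab <- table(attr, class)               # levels of attr in rows, classes in columns
  sum(apply(tab, 1, max)) / length(class) # majority-class hits summed over all levels
}

predictors <- setdiff(names(toy), "Class")
acc  <- sapply(predictors, function(p) one_rule_accuracy(toy[[p]], toy$Class))
best <- names(acc)[which.max(acc)]

print(round(acc, 4))
print(best)
```

On the toy data, attribute `A` gets five of six rows right and `B` only four, so `A` would be the chosen attribute, just as `Atr18` is in the real output.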
prediction <- predict(model, data)
eval_model(prediction, data)
##
## Confusion matrix (absolute):
##           Actual
## Prediction divorced married Sum
##   divorced       81       0  81
##   married         3      86  89
##   Sum            84      86 170
##
## Confusion matrix (relative):
##           Actual
## Prediction divorced married  Sum
##   divorced     0.48    0.00 0.48
##   married      0.02    0.51 0.52
##   Sum          0.49    0.51 1.00
##
## Accuracy:
## 0.9824 (167/170)
##
## Error rate:
## 0.0176 (3/170)
##
## Error rate reduction (vs. base rate):
## 0.9643 (p-value < 2.2e-16)
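The three summary figures can be recomputed by hand from the absolute confusion matrix. The following base-R sketch copies the four cell counts from the output above. For the error rate reduction, it assumes that the base rate is the share of the larger actual class, i.e. always guessing "married". That assumption reproduces the printed 0.9643, but it is not taken from the package source.

```r
# Cell counts copied from the confusion matrix above
# (rows: prediction, columns: actual; matrix() fills column by column)
conf <- matrix(c(81, 3, 0, 86), nrow = 2,
               dimnames = list(Prediction = c("divorced", "married"),
                               Actual     = c("divorced", "married")))

n          <- sum(conf)             # 170 instances
accuracy   <- sum(diag(conf)) / n   # correct predictions sit on the diagonal
error_rate <- 1 - accuracy

# Assumption: the base rate is the accuracy of always predicting the
# most frequent actual class
base_rate     <- max(colSums(conf)) / n
base_error    <- 1 - base_rate
err_reduction <- (base_error - error_rate) / base_error

round(c(accuracy = accuracy, error_rate = error_rate,
        err_reduction = err_reduction), 4)
# accuracy 0.9824, error_rate 0.0176, err_reduction 0.9643
```

Put differently, the naive guess gets 84 people wrong and the single rule only 3, so the error shrinks by 81/84.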
So, the best predictor is the rating on statement 18. The question you should ask your partner before marrying him or her is, therefore, the following:
What is a good marriage for you?
A simple question but one that might reveal some major differences between your conceptions of what a good marriage is. In that case, the outlook is not good. The accuracy of the prediction is a whopping 98.24%! By the way, this is even slightly better than the 98.23% given in the paper (which is achieved by an artificial neural network).
Had I only known this 20 years ago…
Happy Valentine’s Day and stay tuned: we will take a little break and hopefully see you back on March 17th!
Beautiful!!!
As always, highly appreciated!
Interesting!
Please note that the original paper conducting the study uses the word “spouse” instead of “wife”. In fact, the use of “wife” suggests that the questions were only posed to men (except for one where the word used is “husband”) which would imply a strong bias towards men’s point of view…
For your reference, I am referring to the following paper:
https://dergipark.org.tr/en/pub/nevsosbilen/issue/46568/549416
cited at the data source link you give in the article, namely:
https://archive.ics.uci.edu/ml/datasets/Divorce+Predictors+data+set
Thank you… yes, obviously those aspects got lost in translation! Please note that I used the neutral “partner” throughout the post (and “him or her”).
What a lovely article! Could I request to add you on LinkedIn?
I see you only have a follow…
Thank you for your feedback, I really appreciate it! Yes, just look under “More…” and send me an invite.
Awesome dataset. Thanks for sharing! I got slightly different, though similar, results:
– Most important questions would be #17 (We share the same views about being happy in our life with my spouse) and #40 (We're just starting a discussion before I know what's going on) followed by yours, #18.
– Results of model using XGBoost: accuracy 98%.
And, similarly, running correlations, you’ll get highest values (~92%) for questions 40, 17, 19, and 18:
lares::corr_var(df, Class_Married)
Thank you, Bernardo… different methods lead to different results! Yet it is reassuring that #18 is always among the top results.
Interesting that XGBoost performs (slightly) worse than OneR…