Risky behavior trial data

Source: RiskyBehavior/risky.Rmd

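Data from a randomized HIV-prevention trial for high-risk couples. Couples were assigned to one of three arms: control, counseling for the woman alone, or counseling for the couple together. In the file, the arms are coded by the 0/1 indicators couples and women_alone; bupacts and fupacts are the numbers of unprotected sex acts reported at baseline and at the three-month follow-up, and bs_hiv is baseline HIV status. The follow-up count fupacts is one outcome of interest.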

Code
from pathlib import Path
import pandas as pd

# Path to a local checkout of the ROS-Examples repository; adjust as needed.
root = Path("../../ROS-Examples")
risky = pd.read_csv(root / "RiskyBehavior/data/risky.csv")
risky.head()
     sex  couples  women_alone    bs_hiv  bupacts  fupacts
0  woman        0            1  negative        7     32.0
1  woman        0            0  negative        2      5.0
2  woman        0            0  positive        0     15.0
3  woman        0            0  negative       24      9.0
4  woman        1            0  negative        2      2.0
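
The three arms are stored as two 0/1 indicators rather than one column. Below is a minimal sketch of collapsing them into a single labelled arm column. It assumes the indicators are never both 1 and that control means both are 0; the head rows above fit that coding, but the source does not state it. The function name add_arm and the arm labels are illustrative, not part of the original page.

```python
import pandas as pd

# Treatment indicators from the five rows printed by risky.head() above.
risky_head = pd.DataFrame({
    "couples": [0, 0, 0, 0, 1],
    "women_alone": [1, 0, 0, 0, 0],
})


def add_arm(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with one labelled 'arm' column.

    Assumption: the two indicators are mutually exclusive and
    control is coded as couples == 0 and women_alone == 0.
    """
    both = (df["couples"] == 1) & (df["women_alone"] == 1)
    if both.any():
        raise ValueError("a row is flagged as both couples and women_alone")
    out = df.copy()
    out["arm"] = "control"
    out.loc[out["women_alone"] == 1, "arm"] = "women_alone"
    out.loc[out["couples"] == 1, "arm"] = "couples"
    return out


print(add_arm(risky_head)["arm"].tolist())
```

On the full data, add_arm(risky)["arm"].value_counts() would give row counts per arm; if the indicators really are exclusive, the control count is 434 minus the couples and women_alone rows.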

Quick structure check

Code
risky.describe(include="all")
          sex     couples  women_alone    bs_hiv     bupacts     fupacts
count     434  434.000000   434.000000       434  434.000000  434.000000
unique      2         NaN          NaN         2         NaN         NaN
top     woman         NaN          NaN  negative         NaN         NaN
freq      217         NaN          NaN       337         NaN         NaN
mean      NaN    0.373272     0.336406       NaN   25.910138   16.489579
std       NaN    0.484232     0.473025       NaN   31.917963   26.825769
min       NaN    0.000000     0.000000       NaN    0.000000    0.000000
25%       NaN    0.000000     0.000000       NaN    5.000000    0.000000
50%       NaN    0.000000     0.000000       NaN   15.000000    5.000000
75%       NaN    1.000000     1.000000       NaN   36.000000   20.925600
max       NaN    1.000000     1.000000       NaN  300.000000  200.000000
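
The fupacts column is stored as floats, and its 75% quantile is 20.9256. With 434 rows, pandas' linear interpolation for that quantile falls three quarters of the way between two sorted values, which cannot produce 20.9256 from two whole numbers. So at least some follow-up counts are fractional. Count models such as Poisson regression expect whole numbers. The sketch below flags and rounds such values; rounding is one simple option, not necessarily what later chapters do. The helper names and the toy values are hypothetical, not trial data.

```python
import numpy as np
import pandas as pd


def non_integer_share(s: pd.Series) -> float:
    """Fraction of non-missing values that are not whole numbers."""
    s = s.dropna()
    return float((s != np.floor(s)).mean())


def round_counts(s: pd.Series) -> pd.Series:
    """Round to the nearest whole number and store as nullable integers.

    Note: pandas/NumPy round exact halves to the nearest even number.
    """
    return s.round().astype("Int64")


# Hypothetical follow-up counts for illustration only (not from risky.csv).
toy = pd.Series([32.0, 5.0, 20.9, 0.0, 7.25])
print(non_integer_share(toy))
print(round_counts(toy).tolist())
```

Applied to the real column, non_integer_share(risky["fupacts"]) would show how many rows are affected before you decide how to handle them.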
Code
# Count rows at each level of every low-cardinality column (5 or fewer distinct values).
for col in risky.columns:
    if risky[col].nunique(dropna=True) <= 5:
        print("\n", col)
        print(risky.groupby(col).size())

 sex
sex
man      217
woman    217
dtype: int64

 couples
couples
0    272
1    162
dtype: int64

 women_alone
women_alone
0    288
1    146
dtype: int64

 bs_hiv
bs_hiv
negative    337
positive     97
dtype: int64
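
The group sizes above count people, not couples. The even 217/217 split of sex is consistent with each couple contributing one man's row and one woman's row, though the counts alone do not prove it. Under that assumption, and assuming the two indicators are mutually exclusive, the number of couples per arm is half the row count. The sketch below does that arithmetic with the printed numbers.

```python
# Row counts copied from the group sizes printed above.
total_rows = 434
rows_per_arm = {"couples": 162, "women_alone": 146}

# Control rows are flagged by neither indicator
# (assumes couples and women_alone are never both 1).
rows_per_arm["control"] = total_rows - sum(rows_per_arm.values())

# Assumption: every couple contributes exactly two rows (one man, one woman).
for arm, n in rows_per_arm.items():
    if n % 2 != 0:
        raise ValueError(f"{arm} has an odd row count ({n}); pairing assumption fails")
couples_per_arm = {arm: n // 2 for arm, n in rows_per_arm.items()}

print(rows_per_arm)
print(couples_per_arm, sum(couples_per_arm.values()))
```

The couple totals add up to 217, matching the number of rows for each sex, which is what the pairing assumption predicts.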

The original R Markdown page only loads and displays the data. Later chapters on causal inference use randomized-treatment data like this to estimate treatment effects.
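
The simplest such estimate is an unadjusted difference in mean fupacts between a treated arm and control. The sketch below computes it; this is not the regression approach of the later chapters. It assumes control is coded as both indicators equal to 0. The toy frame reuses the five rows printed by risky.head() only to exercise the code, and five rows say nothing about the real effect. The function name diff_in_means is illustrative.

```python
import pandas as pd


def diff_in_means(df: pd.DataFrame, treat_col: str, outcome: str = "fupacts") -> float:
    """Mean outcome in the arm flagged by treat_col minus the control mean.

    Assumption: control rows have couples == 0 and women_alone == 0.
    """
    control = df[(df["couples"] == 0) & (df["women_alone"] == 0)]
    treated = df[df[treat_col] == 1]
    return float(treated[outcome].mean() - control[outcome].mean())


# The five rows printed by risky.head(); far too few for a real estimate.
toy = pd.DataFrame({
    "couples": [0, 0, 0, 0, 1],
    "women_alone": [1, 0, 0, 0, 0],
    "fupacts": [32.0, 5.0, 15.0, 9.0, 2.0],
})

for arm in ["couples", "women_alone"]:
    print(arm, round(diff_in_means(toy, arm), 3))
```

On the full data this gives a point estimate only. If each couple contributes a man's and a woman's row, the rows are not independent, so any standard error would need to account for that pairing.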