Student Alcohol Consumption Dataset

Secondary school student alcohol consumption data with social, gender and study information. There are a few columns which we think could be further clarified or changed. Two categorical columns, “Dalc” and “Walc”, show alcohol consumption on workdays and weekends. People who contributed to this were Aaron Patrick Nathaniel, Lim Yue Hng (Neil) and … We chose workday alcohol consumption because drinking on workdays is more unusual than drinking on weekends. impressionable generation. When lambda = 0, the log transform is used. This Student Alcohol Consumption dataset is based on data collected in two secondary schools in Portugal. We could take into consideration the … “Using Data Mining to Predict Secondary School Student Alcohol Consumption.” Department of Computer Science, University of Camerino. This would help the classification model to more accurately predict the class GStatus. Section 2a. We would think that if the value for health is lower, the value for their … activities (column 19), romantic (column 23), famrel (column 24), goout (column 26), Dalc (column 27), Walc (column 28), column 33 (final grade).
to 1 hour, or 4 – >1 hour)
studytime – weekly study time (numeric: 1 – <2 hours, 2 – 2 to 5 hours, 3 – 5 to 10 hours, or 4 – >10 hours)
failures – number of past class failures (numeric: n if 1<=n<3, else 4)
schoolsup – extra educational support (binary: yes or no)
famsup – family educational support (binary: yes or no)
paid – extra paid classes within the course subject (Math or Portuguese) (binary: yes or no)
activities – extra-curricular activities (binary: yes or no)
nursery – attended nursery school (binary: yes or no)
higher – wants to take higher education (binary: yes or no)
internet – Internet access at home (binary: yes or no)
romantic – with a romantic relationship (binary: yes or no)
famrel – quality of family relationships (numeric: from 1 – very bad to 5 – excellent)
freetime – free time after school (numeric: from 1 – very low to 5 – very high)
goout – going out with friends (numeric: from 1 – very low to 5 – very high)
Dalc – workday alcohol consumption (numeric: from 1 – very low to 5 – very high)
Walc – weekend alcohol consumption (numeric: from 1 – very low to 5 – very high)
health – current health status (numeric: from 1 – very bad to 5 – very good)
absences – number of school absences (numeric: from 0 to 93)
G1 – first period grade (numeric: from 0 to 20)
G2 – second period grade (numeric: from 0 to 20)
G3 – final grade (numeric: from 0 to 20, output target)

Joining information from existing features (PCA is a common example, or some knowledge about how features are correlated). Depending on the model, remove features that are not important to the model.

Section 2d. In the input, workday and weekend alcohol consumption is given in a range from 1 (very low) to 5 (very high). This information can give you a hint of the skewness and of possible outliers.

https://archive.ics.uci.edu/ml/datasets/STUDENT%20ALCOHOL%20CONSUMPTION
because it would be less accurate for the classification model to predict a numeric value ranging from 0-20. The passing mark for a student in Portugal would be 10 out of 20. result as pass/fail rather than a discrete numeric number. For the data exploratory exercise, we choose to examine four columns: workday alcohol consumption, first period grade, second period grade and final grade. This may not hold true because it is a possibility that the … In this case, we see that the grades are highly correlated, meaning the higher the grades in one session, the higher the grades in another session. At an alcohol consumption level of 1, the median and 25th percentile are the same value of 2 hours of study. For example, if there were a high correlation, say 0.9, between two numeric features, then the information provided to the model would be redundant and, depending on the model, could make the model more complex than it needs to be. EuroEducation.net. more serious towards their final grade rather than the first period grade and second period grade. Five columns play a major role in this, which are: column 27 (workday alcohol consumption) … Retrieved from http://www.euroeducation.net/prof/porco.htm. (2016), studied the relationship between married couples with their single counterparts and found out that if partners are more emotion. However, the data reveals that there was a total of 382 students that were in both datasets; this was evident in the exact … If one is very high, you may want to take a closer look at the data and see if there is leakage into the target variable.
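The pass/fail recoding discussed above can be sketched in a few lines of Python. This is only an illustration: the function name `grade_status`, the sample grades, and the choice to count a grade of exactly 10 as a pass are assumptions, not details taken from the original analysis.

```python
PASS_MARK = 10  # passing mark out of 20, as stated in the text

def grade_status(final_grade: int) -> str:
    """Map a final grade (0-20) to 'pass' or 'fail'.

    Assumption: a grade equal to the passing mark counts as a pass.
    """
    if not 0 <= final_grade <= 20:
        raise ValueError("final grade must be between 0 and 20")
    return "pass" if final_grade >= PASS_MARK else "fail"

# Made-up G3 values, for illustration only.
final_grades = [0, 9, 10, 15, 20]
statuses = [grade_status(g) for g in final_grades]
print(statuses)
```

Turning the 0-20 grade into two classes gives the classifier a much simpler target than 21 distinct numeric values.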
The data set consists of two files: one contains information about students in a math class, and the other about students in a Portuguese class. information about the students from the mathematics course only. The traditional … in section E as part of the preprocessing before plotting the data for our exploratory data analysis. Its value for the week is normalized as (workday_alcohol_consumption * 5 + weekend_alcohol_consumption * 2) / 7. If the value is greater than 3.0, then alcohol consumption is considered too high. The following plot shows the prominence of the target: This shows that the target is imbalanced, so we may benefit from oversampling or under-sampling when building our model. It is a usual train of thought that those who have a bad relationship with their family members will be stressed and unhappy, which results in them … We shall see which consensus holds true. The following results show the skewness for the numeric features: As we suspected, the feature ‘absences’ contains the most skew. Our explanation would be more focused on the final grade because we think that students will be … Alcohol is an often abused substance that troubles many individuals in their adulthood as they struggle to cope with emotional and physical stress that … The data collected, in locations such as Gabriel Pereira and Mousinho da Silveira, includes several values of pertinence. Excessive alcohol use, either in the form of binge drinking (drinking 5 or more drinks on an occasion for men or 4 or more drinks on an occasion for women) or heavy drinking (drinking 15 or more drinks per week for men or 8 or more drinks per week for women), is associated with an increased risk of many health problems, such as liver disease and unintentional injuries. Generally, many models prefer using features that are independent of each other and have low correlations.
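To make the skewness check on ‘absences’ concrete, here is a minimal sketch that computes skewness before and after a log transform. The absence counts are made up, the skewness used is the simple population (moment-based) version, and log(x + 1) is used instead of a plain log because absences can be zero; all three are assumptions for illustration.

```python
import math
from statistics import mean

def skewness(values):
    """Population skewness: third central moment divided by variance ** 1.5."""
    n = len(values)
    mu = mean(values)
    m2 = sum((v - mu) ** 2 for v in values) / n
    m3 = sum((v - mu) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Made-up absence counts with a long right tail, for illustration only.
absences = [0, 0, 2, 4, 6, 8, 10, 16, 30, 75]

# log1p(x) = log(x + 1), so zero absences do not break the transform.
log_absences = [math.log1p(v) for v in absences]

skew_before = skewness(absences)
skew_after = skewness(log_absences)
print(round(skew_before, 2), round(skew_after, 2))
```

A skewness close to zero after the transform suggests the transformed feature is closer to symmetric.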
The dataset was originally designed to estimate high school students’ performance, with alcohol consumption used as one of the parameters. administrative or police), ‘at_home’ or ‘other’), reason – reason to choose this school (nominal: close to ‘home’, school ‘reputation’, ‘course’ preference or ‘other’), guardian – student’s guardian (nominal: ‘mother’, ‘father’ or ‘other’), traveltime – home to school travel time (numeric: 1 – <15 min., 2 – 15 to 30 min., 3 – 30 min. Since the distribution is log normal, applying the log transformation would be the most appropriate. by Dinescu et al. The target is the weekday drinking level, 1 to 5, and the weekend drinking level, 1 to 5. The primary reason for this data was to see the effects of drinking on grades. Examples of … We test the null hypothesis (h0) that the numeric variable has the same mean values across the different levels of the categorical variable. A lot of time is lost to alcohol consumption, so the students spend less time on their academic work. However, the assumption is that the alcohol consumption is high because the student's … This will be explained in the next section (Section C). avoid drinking in order to prevent their health from further deterioration. that particular student's success. Poonam Kumari and Aditya Pratap, “Using Python to Analyze Secondary School Student Alcohol Consumption and Their Academic Performance”, Department of Computer Science, IITM Janakpuri, New Delhi, India. … reductions of GPA. It can develop a plethora of emotions in oneself, may it be a positive or negative … Modify the numeric features by: Depending on the algorithms used in the model building, the following features may produce better results as numeric and normalized. This helps you to understand the top dependent variables (grouped by numerical and categorical).
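The test of h0 above asks whether a numeric variable has the same mean at every level of a categorical variable. One common way to summarise how far the group means differ is eta-squared, the share of total variance explained by the grouping. The sketch below is a hand-rolled illustration; the grades per drinking level are made up, and the function name `eta_squared` is hypothetical.

```python
from statistics import mean

def eta_squared(groups):
    """Eta-squared = between-group sum of squares / total sum of squares.

    `groups` maps each category level to a list of numeric values.
    """
    all_values = [v for values in groups.values() for v in values]
    grand_mean = mean(all_values)
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    ss_between = sum(
        len(values) * (mean(values) - grand_mean) ** 2
        for values in groups.values()
    )
    return ss_between / ss_total

# Made-up final grades grouped by workday drinking level (Dalc).
grades_by_dalc = {
    1: [12, 14, 13],
    2: [10, 11, 12],
    3: [7, 8, 9],
}
eta = eta_squared(grades_by_dalc)
print(round(eta, 3))
```

Values near 0 mean the group means barely differ; values near 1 mean the categorical variable accounts for most of the variation in the numeric one.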
For numeric data, correlations are important to help determine if we should join information of highly correlated features. We think that classification is the best data mining technique to be employed because we can build a classification model to … Economics of Education Review, 30(1), 1-15. The original data comes from a survey conducted by a professor in Portugal. This dataset was collected in order to study alcohol consumption in young people and its effects on students’ academic performance. In our data set, many of the categorical features are numeric, but for this illustration, we will continue with treating them as categorical. Correlation does not imply causation. Section 3b. 2014. We will take a closer look at the distribution of this feature. These short-term effects of alcohol could lead to poor academic performance, poor health and disruptive social behavior. Dinescu, D., Turkheimer, E., Beam, C.R., Horn, E.E., Duncan, G., Emery, R.E. Fedu and Medu correlate more than some others, so we might want to combine the information. consensus is that students who consume alcohol at high levels tend to skip more classes and perform worse in their studies, thus resulting in lower … As you will see in the data, on average, our campus sends at least one student to the emergency room per week who is in some kind of trouble connected with alcohol. Remove the skewness from the numeric data. courses of mathematics and Portuguese. The types of columns are listed as follows: One way to get an idea about the structure of the data is to calculate basic statistics, such as the min, max, mean, and median, and missing value counts. I will be utilizing the student alcohol consumption dataset provided by UCI Machine Learning, which is available in their machine learning repository.
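As a rough illustration of the correlation check described above, the following sketch computes a Pearson correlation between two numeric columns and flags the pair as potentially redundant. The grade values are made up, and the 0.9 cut-off is only the example figure mentioned in the text, not a fixed rule.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Made-up first and second period grades, for illustration only.
g1 = [5, 8, 10, 12, 15, 18]
g2 = [6, 7, 11, 12, 14, 19]

r = pearson(g1, g2)
# Treat the pair as carrying redundant information above the example cut-off.
redundant = abs(r) >= 0.9
print(round(r, 3), redundant)
```

When a pair is flagged, one option is to keep only one of the features or to combine them, as suggested for Fedu and Medu.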
student's relationship with his/her family is low because of their high level of alcohol consumption. The box plot portion of the graph also helps us identify outliers. comes with the mantle of adulthood. For categorical values, we use Cramér’s V. For numeric values, we use the eta-squared value. Section 2c. We only do this for illustration. This modification coincides with the original report where the authors modified the target with the formula acl = (Dalc * 5 + Walc * 2) / 7 and then assumed values of 3 or more were heavy drinkers. Fabio Pagnotta and Hossain Amran, University of Camerino, February 2016. DOI: 10.13140/RG.2.1.1465.8328. … at Kaggle. The original data contains the following attributes for both student-mat.csv (Math course) and student-por.csv (Portuguese language course) datasets: The following grades are related with the course subject, Math or Portuguese: Before exploration, we combine the rows of the two data sets and mark each instance with the class in which the survey was taken. such data are records of demographic information, grades, and alcohol consumption. You can see the level of correlation by the degree of the ellipse. With the Student Alcohol Consumption data set, we predict high or low alcohol consumption of students. While … This analysis was done as part of fulfilling the Data Mining course in Multimedia University. intimate, they will drink less.
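The weekly consumption formula above, acl = (Dalc * 5 + Walc * 2) / 7, weights workday drinking by five days and weekend drinking by two. A minimal sketch follows; the function names are hypothetical, and the cut-off is written as "3 or more", which is one of the two readings of the threshold that appear in the text.

```python
def weekly_alcohol(dalc: int, walc: int) -> float:
    """Weighted weekly consumption: 5 workdays and 2 weekend days."""
    return (dalc * 5 + walc * 2) / 7

def is_heavy_drinker(dalc: int, walc: int, threshold: float = 3.0) -> bool:
    """Assumption: values of 3 or more count as heavy drinking."""
    return weekly_alcohol(dalc, walc) >= threshold

# Both inputs are on the 1 (very low) to 5 (very high) scale.
print(weekly_alcohol(1, 1))    # lowest possible score
print(weekly_alcohol(2, 5))    # low workday, very high weekend
print(is_heavy_drinker(2, 5))
print(is_heavy_drinker(3, 3))
```

Because workdays carry more weight, a student with very high weekend drinking but low workday drinking can still fall below the threshold.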
We could perform this merge differently later by performing a full join and then dealing with the NA values, by performing the analysis on the individual sets, or by inner joining the two sets and just working with that data. (Pullen, 1994). Be sure to change the type of field delimiter (“;”), line delimiter (“\n”), and check the Extract Field Names checkbox, as specified on the image below: We don’t need the G1 and G2 columns, so let’s drop them. Section 3a. Exploratory Data Analysis on the Student Alcohol Consumption dataset (Code), December 31, 2016. This post is an execution of the explanations from this blog post. Although student achievement is highly influenced by past evaluations, an explanatory analysis has shown that there are also other relevant features (e.g. However, research conducted in the United States by Balsa (2011) showed that increases in levels of alcohol consumption only resulted in small … Assuming the romantic relationship in our dataset is of an intimate level, we can find out if this statement holds true. We would oversample since we have limited data.
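The row-wise combination of the two files, with each row marked by its course and the G1 and G2 columns dropped, might look like the sketch below. It uses small made-up ";"-delimited strings in place of student-mat.csv and student-por.csv, and the column name `course` and the helper `load_rows` are hypothetical.

```python
import csv
import io

# Made-up stand-ins for student-mat.csv and student-por.csv (";"-delimited).
MATH_CSV = "school;sex;age;G1;G2;G3\nGP;F;18;5;6;6\nMS;M;17;11;12;12\n"
POR_CSV = "school;sex;age;G1;G2;G3\nGP;F;18;10;11;11\n"

def load_rows(text, course):
    """Parse ';'-delimited text, drop G1/G2, and tag each row with its course."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    rows = []
    for row in reader:
        row = dict(row)
        row.pop("G1", None)  # period grades are not needed here
        row.pop("G2", None)
        row["course"] = course
        rows.append(row)
    return rows

# Stack the rows of both files instead of joining them on student attributes.
combined = load_rows(MATH_CSV, "math") + load_rows(POR_CSV, "portuguese")
for row in combined:
    print(row)
```

Stacking keeps every survey response, at the cost of counting students who appear in both courses twice; the join-based alternatives mentioned above trade that off differently.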
For the data exploratory exercise, we choose to examine three columns. They are:

address - U/R for urban or rural respectively
famsize - LE3/GT3 for less than or greater than three family members
Pstatus - T/A for living together or apart from parents, respectively
Medu - 0 (none) / 1 (primary-4th grade) / 2 (5th - 9th grade) / 3 (secondary) / 4 (higher) for mother's education
Fedu - 0 (none) / 1 (primary-4th grade) / 2 (5th - 9th grade) / 3 (secondary) / 4 (higher) for father's education
Mjob - 'teacher', 'health' care related, civil 'services', 'at_home' or 'other' for the student's mother's job
Fjob - 'teacher', 'health' care related, civil 'services', 'at_home' or 'other' for the student's father's job
reason - close to 'home', school 'reputation', 'course' preference or 'other' for the choice of school
guardian - mother/father/other as the student's guardian
traveltime - 1 (<15 mins) / 2 (15 - 30 mins) / 3 (30 mins - 1 hr) / 4 (>1 hr) for time from home to school
studytime - 1 (<2 hrs) / 2 (2 - 5 hrs) / 3 (5 - 10 hrs) / 4 (>10 hrs) for weekly study time
failures - 1-3/4 for number of class failures (if more than 3, then record 4)
schoolsup - yes/no for extra educational support
famsup - yes/no for family educational support
paid - yes/no for extra paid classes for Math or Portuguese
activities - yes/no for extra-curricular activities
nursery - yes/no for whether attended nursery school
higher - yes/no for desire to continue studies
internet - yes/no for internet access at home
romantic - yes/no for relationship status
famrel - 1-5 scale on quality of family relationships
freetime - 1-5 scale on how much free time after school
goout - 1-5 scale on how much student goes out with friends
Dalc - 1-5 scale on how much alcohol consumed on weekdays
Walc - 1-5 scale on how much alcohol consumed on weekend
absences - 0-93 amount of absences from school

the amount of time a student studies (studytime, column 14), whether the student joins any extra paid classes (paid, column 18), whether the student participates in any extra co-curricular activities (activities, column 19), whether the student is involved in a romantic relationship (romantic, column 23), the quality of the student's family relationships (famrel, column 24), the tendency of the student to go out with friends (goout, column 26), weekday alcohol consumption (Dalc, column 27), and weekend alcohol consumption (Walc, column 28).