STA303 Winter 2021: A note from the Prof

Information about the final project for STA303 to help prospective employers/clients, interested 2nd cousins and whoever else better understand these student achievements and skills.

If you’re reading this, you may have been directed to this link by a former student of mine in order to provide you with more information about the large project they completed in my course. This page doesn’t have information about specific student projects, but should give you a sense of the kinds of skills students who completed this project will have. You may wish to skip directly to the skills description.

One sentence description: Students merged, wrangled, visualized, summarized and modelled data on hiring, salary and promotion to meet a client brief, and reported on their methods and findings appropriately for a general executive audience and, separately, for a technical audience.

But first, some context

STA303: Methods of Data Analysis II is a course delivered by the Department of Statistical Sciences at the University of Toronto.

In Winter 2021 (January to April), STA303 was run completely online due to COVID-19. Students rose to this challenge remarkably well, but it was nonetheless a significant strain on wellbeing and focus.

STA303 is a communication- and application-focused course where students learn:

The models covered include: linear mixed models, generalized linear models, generalized linear mixed models and generalized additive models.

Project task summary

I’ve included full information on this assessment at the end of this page, but you don’t need it to follow this summary. The project was worth 30% of students’ final grades, and students could choose to complete it individually or as a team. Teamwork was recommended, but the task was the same either way. Teams were not required because some students had good reasons for working individually, such as being located in a challenging time zone, lacking internet access suitable for calls, or having caring or work obligations that made scheduling meetings untenable.

Students acted as consultants for Black Saber Software: they analyzed the company’s hiring and employee data and were tasked with creating a report for the Board of Directors on gender parity in hiring, wages and promotion. Note: Black Saber isn’t a real company. It would be massively reckless to provide 600 people with employee data at this level of detail, as it would be very easy to identify individuals from data like this. That said, the dataset was simulated based on real research and employment trends, and draws on my own consulting experiences.

Each team or individual created a consulting company for the purposes of this activity. To register their group or individual status, they completed a pseudo-NDA. The only real part of it reminded them that they had already agreed to several codes of conduct as part of their enrolment at U of T and made clear my expectations of their professionalism. It also gave them a chance to familiarize themselves with a common requirement of consulting work.

The deliverable

The final submission was a report that included:

Students were tasked with answering the research questions posed by the client, communicating their findings in ways appropriate to the audience for each section of the report, choosing appropriate methods, and creating professional visualizations and tables to explain their results.

Reports were written in a reproducible R Markdown file (a code and text file type popular for use with the programming language R). Students were provided with a basic template that they could choose to use.

Skills demonstrated

Students who completed this project to a reasonable standard can do the following (organized under broad headings):

Statistical reasoning and knowledge

Ethical professional practice

Writing

Programming

General

Full assessment instructions and rubric

Note: If you’re a fellow instructor and interested in using any of this, reach out! I am happy to share originals, etc.

“Client” emails

Email 1 (text for screen readers/easier reading follows)

Hi,

Thanks again for the great work on the Pax Aurora project last year. I’m reaching out with another piece of work I’m hoping you can take on.

Internally, people have been raising concerns about potential bias in our hiring and remuneration processes. I don’t think there is much to it, but I want to be able to report to the Board that we’ve had external statisticians take a look and that everything is totally above board. That said, several companies in our area have had bad press about this recently, nasty complaints from staff to reporters, etc. and I want to ensure we’re out ahead of any potential issues. Obviously, I value your discretion and don’t want this mentioned more widely. NDAs will be required from anyone working on the project.

I’ll be able to provide:

  • Hiring data for our new grad program (we have a new selection pipeline that is AI-automated up to the final interviews), and

  • Data about promotion and salary for our entire staff.

Let me know if you have any questions, and if you’re ready to get the ball rolling, please complete this NDA (docx or pdf) and submit them through our NDA portal. After that, I’ll reach out with the data. Would be great if the final deliverable could have a summary targeted for the Board of Directors, as well as a more technical piece for our data team to look over as well.

I assume same fee structure as last time applies? That will be fine on our end. Will pay hourly with a cap at 40 hours unless otherwise discussed.

Best,

Gideon Blake

Chief People Officer | Black Saber Software

Email 2 (text for screen readers/easier reading follows)

Email from Gideon:

Hi again, great to know that you’re on board for this project.

The data team set something up for you to get the data. I’ve forwarded the details they sent, see below, all the descriptions of the data are there.

According to the legal team, it was better to only give you data for our current employees, so those are the only people in that data. Generally, we have a pretty good retention rate, though.

One of my People and Talent guys mentioned that we don’t collect data on ethnicity/race but that the team is considering it for EDI initiatives after a conference they went to, and he said it might also be related to salary. Once again, I don’t think anything like that will be an issue for us.

The board wants to hear that our hiring, promotion and salary processes are all fair, and based on talent and value to the company. They’re especially interested in the hiring pipeline as we’ve been trialling an AI service that screens applications and then invites candidates to submit a pre-recorded video that the system rates for relevant features. Candidates are also invited to do a timed technical coding task and submit a writing sample and these are also assessed by the system. More on that in the docs, too.

Let me know if you have any questions, looking forward to the report on April 21.

Best,

Gideon Blake
Chief People Officer | Black Saber Software 

Forwarded email:

Hi G,

The team has prepped the documentation you asked for, the consultants can access the data by running `devtools::install_github("sta303-bolton/sta303project")`. Data dictionary attached.

Best,
V

Valin Hess
VP Data | Black Saber Software 

R package

The project package can be installed by running `devtools::install_github("sta303-bolton/sta303project")`. It is also viewable at https://github.com/sta303-bolton/sta303project.
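For readers who want to look at the package themselves, a minimal sketch of an install session follows. It assumes that devtools may not yet be installed and that the GitHub repository is still publicly available. The final `data()` call simply lists whatever datasets the package ships, since their names are not given here.

```r
# Install devtools first if it is not already available
# (assumption: a fresh R setup; this step is skipped otherwise).
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}

# Install the project package from GitHub, as given in the instructions.
devtools::install_github("sta303-bolton/sta303project")

# List any datasets bundled with the package (no dataset names are assumed).
data(package = "sta303project")
```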

The package contains:

Additional files

Submission information

Name: Final project
Type: Type 1
Value: 30%
Due: Wednesday, Apr 21 at 6:00 p.m. ET
Instructions and submission link

Instructions: https://q.utoronto.ca/courses/204826/assignments/506357

Submission: On Quercus.

  • Submission must include a PDF of the report AND the Rmd that created the PDF, as well as any additional files (e.g. images or additional R scripts). You do not need to include the original data.

  • One team member will submit on behalf of the group but all members should be able to see the submission status.

Late submission policy

For assessments in Type 1, late assessments will still be accepted through Crowdmark, but only if they are your first submission. You will lose 10 percentage points on the assessment per day, with submissions accepted for up to 3 days (i.e., 72 hours) after the initial due date.

Grace period: Projects submitted before 8:00 p.m. ET on April 21 will have no late penalty applied.

Accommodations and extension policy

Please note that as this assessment is due near the very end of the assessment period, there is limited flexibility for accommodations and extensions. While I would love to be able to do otherwise, a busy final assessment period will not be sufficient reason to grant an extension.

If you miss a Type 1 assessment due to illness or a serious personal emergency, please complete this form within ONE week of the due date of the assignment. Upon receipt of your form, we will contact you via email within 3 business days to arrange an accommodation. The sooner you can reach out, the better.

Regrade requests

Each member of your company must complete a declaration (download the DOCX or PDF version here) and submit these, along with a detailed justification in this form. See the form and declaration for full instructions.