Extract schedule and instructor data from a section XML node

Extracts meeting times, locations, and instructor details from a Stanford course section XML node. When a schedule lists multiple instructors, their data are combined into semicolon-separated lists.

Usage

extract_schedule_data(section)

Arguments

section

An xml2 node object representing a course section. It is expected to contain schedule nodes, each containing:
- days
- startTime
- endTime
- location
- Optional instructor nodes, each containing:
  - name
  - sunet
  - role
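
The expected input shape can be illustrated with a small hand-written section. This is only a sketch: the `schedules` and `instructors` wrapper elements, the position of `classId`, and every value shown are assumptions made for illustration, not real course data.

```r
library(xml2)

# Hypothetical section node; nesting and values are illustrative only.
section <- read_xml('<section>
  <classId>12345</classId>
  <schedules>
    <schedule>
      <days>MonWedFri</days>
      <startTime>9:30:00 AM</startTime>
      <endTime>10:20:00 AM</endTime>
      <location>Example Hall 101</location>
      <instructors>
        <instructor>
          <name>Doe, J.</name>
          <sunet>jdoe</sunet>
          <role>PI</role>
        </instructor>
      </instructors>
    </schedule>
  </schedules>
</section>')

# Descendant XPath queries find the nodes whatever wrappers sit in between.
schedules <- xml_find_all(section, ".//schedule")
first_days <- xml_text(xml_find_first(schedules[[1]], "./days"))
instructor_count <- length(xml_find_all(schedules[[1]], ".//instructor"))
```

A node built this way can be passed to extract_schedule_data() in place of one read from a live course listing.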

Value

A tibble containing schedule information, or NULL if no schedules are found. The tibble includes:

Basic schedule information:

section_id: Character. Section identifier (from classId)
days: Character. Days of the week (e.g., "MonWedFri")
start_time: Character. Start time
end_time: Character. End time
location: Character. Meeting location

Instructor information (NA if no instructors):

instructors: Character. Semicolon-separated list of instructors, each in "name (role)" format
instructor_names: Character. Semicolon-separated list of names
instructor_sunets: Character. Semicolon-separated list of SUNet IDs
instructor_roles: Character. Semicolon-separated list of roles

Details

The function performs the following steps:

Locates all schedule nodes within the section
For each schedule:
- Extracts basic schedule information (days, times, location)
- Processes instructor information if present
- Combines multiple instructors' data into consolidated strings
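
The first two steps might look roughly like the sketch below. It is not the package's implementation: `sketch_basic_schedules` is a hypothetical helper, the XPath expressions assume the element names listed under Arguments, and it assumes one output row per schedule. The xml2 and tibble packages must be installed.

```r
library(xml2)

# Hypothetical helper (not the package's code): reads the basic fields of
# every schedule under a section into a tibble, or returns NULL if none exist.
sketch_basic_schedules <- function(section) {
  schedules <- xml_find_all(section, ".//schedule")
  if (length(schedules) == 0) return(NULL)

  # Text of the first matching child of each schedule; NA where it is missing.
  field <- function(name) {
    xml_text(xml_find_first(schedules, paste0("./", name)))
  }

  tibble::tibble(
    section_id = xml_text(xml_find_first(section, ".//classId")),
    days       = field("days"),
    start_time = field("startTime"),
    end_time   = field("endTime"),
    location   = field("location")
  )
}

# Illustrative input; all values are made up.
demo_section <- read_xml('<section>
  <classId>12345</classId>
  <schedule>
    <days>MonWedFri</days>
    <startTime>9:30:00 AM</startTime>
    <endTime>10:20:00 AM</endTime>
  </schedule>
</section>')

basic <- sketch_basic_schedules(demo_section)
empty <- sketch_basic_schedules(read_xml("<section/>"))
```

Because `xml_find_first()` returns a missing node when a child is absent, the omitted `location` in the demo input comes back as NA rather than raising an error.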

Instructor information is provided in two forms:

Combined format: "name (role)"
Separate fields: names, SUNet IDs, and roles in semicolon-separated lists

If no instructors are found, instructor fields are set to NA.
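
The formatting rules above can be sketched in base R. `combine_instructors` is a hypothetical helper, and the "; " delimiter (semicolon plus space) is an assumption, since the exact separator is not stated here.

```r
# Hypothetical helper: collapses per-instructor vectors into single strings.
# The "; " delimiter is an assumption; the real separator may differ.
combine_instructors <- function(names, sunets, roles, sep = "; ") {
  # No instructors: every instructor field is NA.
  if (length(names) == 0) {
    return(list(
      instructors       = NA_character_,
      instructor_names  = NA_character_,
      instructor_sunets = NA_character_,
      instructor_roles  = NA_character_
    ))
  }

  list(
    # Combined "name (role)" entries joined into one string
    instructors       = paste(sprintf("%s (%s)", names, roles), collapse = sep),
    instructor_names  = paste(names, collapse = sep),
    instructor_sunets = paste(sunets, collapse = sep),
    instructor_roles  = paste(roles, collapse = sep)
  )
}

# Illustrative values only.
two  <- combine_instructors(c("Doe, J.", "Roe, R."), c("jdoe", "rroe"), c("PI", "TA"))
none <- combine_instructors(character(0), character(0), character(0))
```

Keeping the names, SUNet IDs, and roles in the same order means the separate fields can be split again and matched position by position.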

Error handling

If schedule data extraction fails, the function throws an error whose message describes the failure.
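
When processing many sections, a caller may prefer to skip a failing section rather than stop. The sketch below shows one way to do that with `tryCatch()`. `safe_extract` and `failing_extractor` are hypothetical names, and the error message is made up; in practice `extract_schedule_data` would be passed as the extractor.

```r
# Hypothetical wrapper: converts an extraction error into a warning and NULL.
safe_extract <- function(section, extractor) {
  tryCatch(
    extractor(section),
    error = function(e) {
      warning("Skipping section: ", conditionMessage(e), call. = FALSE)
      NULL
    }
  )
}

# Stand-in for extract_schedule_data() that always fails, for demonstration.
failing_extractor <- function(section) {
  stop("demo failure while reading schedule data")
}

# The warning is suppressed here only to keep the demonstration quiet.
result <- suppressWarnings(safe_extract(NULL, failing_extractor))
```

Returning NULL on failure matches the value the function already uses for sections with no schedules, so downstream code needs only one check.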

Examples

if (FALSE) { # \dontrun{
# `course_node` is assumed to be an existing xml2 node for a single course
section_node <- xml2::xml_find_first(course_node, ".//section")
schedule_data <- extract_schedule_data(section_node)
} # }