Parses an XML document of Stanford University course information into a tibble. The function extracts basic course fields together with section, schedule, and instructor details.

Usage

process_courses_xml(xml_doc, department)

Arguments

xml_doc

An xml2 document object containing Stanford course data. Expected to have a structure with course nodes containing section and schedule information.

department

Character string. Department code (e.g., "CS") used to identify the department for all courses in the XML.

Value

A tibble containing course information with columns:

  • objectID: Character. Unique course identifier

  • year: Character. Academic year

  • subject: Character. Subject code

  • code: Character. Course number

  • title: Character. Course title

  • description: Character. Course description

  • units_min: Numeric. Minimum units

  • units_max: Numeric. Maximum units

  • Additional columns for section, schedule, and instructor information when available

  • department: Character. Department code

Returns NULL, with a warning, if no courses are found.

Details

The function processes course data in several stages:

  1. Locates all course nodes in the XML using XPath

  2. For each course:

    • Extracts basic course information (ID, title, units, etc.)

    • Extracts section data including schedules and instructors

    • Joins section data with basic course information

  3. Adds department code to all courses

Course sections may include:

  • Term information

  • Class components (e.g., lecture, discussion)

  • Schedule details (days, times, locations)

  • Instructor information

  • Enrollment data
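
The stages above can be illustrated with a minimal sketch. It is not the package's implementation: the function name sketch_process_courses, the element names (course, courseId, unitsMin, sections/section, and so on), and the sample values are all assumptions made for illustration, and the real feed may be structured differently. Only term and component are extracted per section to keep the example short.

```r
library(xml2)
library(dplyr)  # also provides tibble()

# Hypothetical XML layout with placeholder values; real element names may differ.
sample_xml <- read_xml('
<courses>
  <course>
    <courseId>000001</courseId>
    <year>2024-2025</year>
    <subject>CS</subject>
    <code>000</code>
    <title>Example Course</title>
    <description>Placeholder description.</description>
    <unitsMin>3</unitsMin>
    <unitsMax>5</unitsMax>
    <sections>
      <section><term>Autumn</term><component>LEC</component></section>
      <section><term>Winter</term><component>DIS</component></section>
    </sections>
  </course>
</courses>')

sketch_process_courses <- function(xml_doc, department) {
  # Stage 1: locate all course nodes with XPath
  course_nodes <- xml_find_all(xml_doc, ".//course")
  if (length(course_nodes) == 0) {
    warning("No courses found")
    return(NULL)
  }

  # Text of the first matching child element; NA when the element is absent
  child_text <- function(node, name) {
    xml_text(xml_find_first(node, paste0("./", name)))
  }

  # Stage 2: build one small table per course
  rows <- lapply(seq_along(course_nodes), function(i) {
    course <- course_nodes[[i]]
    basic <- tibble(
      objectID    = child_text(course, "courseId"),
      year        = child_text(course, "year"),
      subject     = child_text(course, "subject"),
      code        = child_text(course, "code"),
      title       = child_text(course, "title"),
      description = child_text(course, "description"),
      units_min   = as.numeric(child_text(course, "unitsMin")),
      units_max   = as.numeric(child_text(course, "unitsMax"))
    )

    section_nodes <- xml_find_all(course, "./sections/section")
    if (length(section_nodes) == 0) {
      return(basic)  # course without sections keeps only the basic columns
    }

    sections <- tibble(
      objectID  = basic$objectID,
      term      = xml_text(xml_find_first(section_nodes, "./term")),
      component = xml_text(xml_find_first(section_nodes, "./component"))
    )

    # Join section data onto the basic course information
    left_join(basic, sections, by = "objectID")
  })

  result <- bind_rows(rows)

  # Stage 3: add the department code to every row
  result$department <- department
  result
}

courses <- sketch_process_courses(sample_xml, "CS")
```

Under these assumptions the result has one row per section, with the course-level columns repeated on each row. A course without sections would contribute a single row with NA in the section columns after binding.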

Examples

if (FALSE) { # \dontrun{
xml_data <- xml2::read_xml("cs_courses.xml")
cs_courses <- process_courses_xml(xml_data, "CS")
} # }