Extract basic section information from XML node

Extracts fundamental section-level information from a Stanford course section XML node into a structured tibble. This function processes the core attributes of a course section, such as term information, component type, and enrollment data.

Usage

extract_section_info(section, course_id)

Arguments

section

An xml2 node object representing a single course section. Expected to contain child nodes for:

term
termId
sectionNumber
component
classId
currentClassSize
maxClassSize

course_id

Character string. The parent course identifier used to link section data back to the course.

Value

A tibble with one row containing:

objectID: Character. Course identifier (from course_id)
term: Character. Academic term (e.g., "Autumn", "Winter")
term_id: Character. Unique term identifier
section_number: Character. Section number within the course
component: Character. Section type (e.g., "LEC", "DIS", "LAB")
class_id: Character. Unique identifier for this section
current_class_size: Numeric. Current number of enrolled students
max_class_size: Numeric. Maximum enrollment capacity

Details

The function extracts the following section attributes using XPath:

Term details (term name and ID)
Section identification (section number, class ID)
Component type (e.g., lecture, discussion)
Enrollment information (current and maximum class sizes)

Each field is located with xml_find_first(), which returns the first node matching the XPath expression. The two enrollment values (current and maximum class size) are converted to numeric.
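The extraction pattern described above can be sketched roughly as follows. This is a minimal illustration, not the package's actual implementation: the names `first_text` and `sketch_section_info`, the sample XML, the term ID and class ID values, and the `.//` XPath expressions are all assumptions made for the example.

```r
library(xml2)
library(tibble)

# Hypothetical sample section; the real course XML may be structured differently.
section <- read_xml(
  "<section>
     <term>Autumn</term>
     <termId>1242</termId>
     <sectionNumber>01</sectionNumber>
     <component>LEC</component>
     <classId>12345</classId>
     <currentClassSize>87</currentClassSize>
     <maxClassSize>120</maxClassSize>
   </section>"
)

# Hypothetical helper: text of the first node matching an XPath expression.
first_text <- function(node, xpath) {
  xml_text(xml_find_first(node, xpath))
}

# Hypothetical stand-in for extract_section_info(), built from the
# column list in the Value section.
sketch_section_info <- function(section, course_id) {
  tibble(
    objectID = course_id,
    term = first_text(section, ".//term"),
    term_id = first_text(section, ".//termId"),
    section_number = first_text(section, ".//sectionNumber"),
    component = first_text(section, ".//component"),
    class_id = first_text(section, ".//classId"),
    # Enrollment counts are read as text, then converted to numeric.
    current_class_size = as.numeric(first_text(section, ".//currentClassSize")),
    max_class_size = as.numeric(first_text(section, ".//maxClassSize"))
  )
}

info <- sketch_section_info(section, "CS106A-2023-2024")
```

Keeping identifiers such as the section number as character preserves leading zeros ("01"), which is presumably why only the enrollment columns are numeric.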

Error Handling

The function assumes that all required nodes are present in the XML. A missing node results in an error, which is raised through the function's tryCatch block.
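How the function detects a missing node is not shown on this page. As a sketch under assumptions, one way to turn a missing node into an error inside tryCatch is to check the result of xml_find_first(), which returns an object of class `xml_missing` when nothing matches. The helper name `require_text`, the sample XML, and the error message are hypothetical.

```r
library(xml2)

# Hypothetical section that lacks a maxClassSize child.
incomplete <- read_xml("<section><term>Autumn</term></section>")

# Hypothetical helper: stop with a clear message when a node is absent.
require_text <- function(node, xpath) {
  found <- xml_find_first(node, xpath)
  if (inherits(found, "xml_missing")) {
    stop("Missing required node: ", xpath, call. = FALSE)
  }
  xml_text(found)
}

result <- tryCatch(
  require_text(incomplete, ".//maxClassSize"),
  error = function(e) {
    # This handler just returns the message; a real one might log or rethrow.
    conditionMessage(e)
  }
)
```

Without an explicit check like this, xml_text() on a missing node returns NA rather than raising an error, so silent NA values are a possible outcome if the input is incomplete.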

Examples

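if (FALSE) { # \dontrun{
# course_node is assumed to be an xml2 node for a single course,
# e.g. one taken from a parsed course XML document
section_node <- xml2::xml_find_first(course_node, ".//section")
course_id <- "CS106A-2023-2024"
section_info <- extract_section_info(section_node, course_id)
} # }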