The function extracts speeches from the Riksdagen Records, where a speech is defined as the utterances (<u>) that follow a speaker introduction (<note type="speaker">). The function returns the segments of each speech.

For multiple files, extract_speeches_from_records() can process the records in parallel.

extract_speeches_from_record(record_path)

extract_speeches_from_records(
  record_paths,
  mc.cores = getOption("mc.cores", detectCores() - 1L),
  ...
)

assert_and_complement_paths(record_paths)

Arguments

record_path

a file path to a record XML file

record_paths

a vector of file paths to record XML files

mc.cores

the number of cores to use in mclapply (Linux and Mac only). Defaults to the mc.cores option if it is set, otherwise to the number of available cores minus one.

...

further arguments supplied to mclapply.
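
As an illustration of how mc.cores and mclapply fit together, here is a minimal sketch that is not the package's own code. The worker process_one and the file names are hypothetical stand-ins; only parallel::mclapply is a real function.

```r
library(parallel)

# Hypothetical stand-in for a per-record worker. The real function parses
# XML; this one only builds a one-row data frame from the path.
process_one <- function(path) {
  data.frame(
    record_id = basename(path),
    n_chars = nchar(path),
    stringsAsFactors = FALSE
  )
}

# Made-up paths, used only as input values (no files are read).
paths <- c("a/toy-record-1.xml", "b/toy-record-2.xml", "c/toy-record-3.xml")

# mclapply relies on forking, which is unavailable on Windows,
# so fall back to a single core there.
cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Results come back as a list in the same order as the input.
results <- mclapply(paths, process_one, mc.cores = cores)
combined <- do.call(rbind, results)
```

Because mclapply returns its results in input order, the rows of combined line up with paths regardless of which core finished first.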

Value

The function returns a tibble data frame with the following variables:

record_id

The id of the record.

speech_no

The speech number in the record.

speech_id

The id of the XML node of the speaker introduction.

who

The id of the person giving the speech.

id

The id of the XML node for the segment of the speech.

text

The speech segment as plain text.
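
The relationship between speech_no, speech_id and id can be sketched on toy data. This is an assumption-laden illustration, not the package's implementation: the nodes table is made up, it stands in for XML nodes in document order, and it assumes that who is attached to each utterance and that every utterance after a speaker introduction belongs to that speech until the next introduction.

```r
# Toy stand-in for the nodes of one record, in document order (made-up data).
nodes <- data.frame(
  tag  = c("note", "note", "u", "u", "note", "u"),
  type = c("date", "speaker", NA, NA, "speaker", NA),
  id   = c("n0", "n1", "u1", "u2", "n2", "u3"),
  who  = c(NA, NA, "person_a", "person_a", NA, "person_b"),
  text = c("1 January", "Mr A:", "Mr Speaker!", "I move that ...",
           "Ms B:", "I agree."),
  stringsAsFactors = FALSE
)

# A speaker introduction is a note whose type is "speaker".
is_intro <- nodes$tag == "note" & !is.na(nodes$type) & nodes$type == "speaker"

# Each introduction starts a new speech, so a running count numbers them.
speech_no <- cumsum(is_intro)
intro_ids <- nodes$id[is_intro]

# Keep only utterances that come after at least one introduction.
keep <- nodes$tag == "u" & speech_no > 0

speeches <- data.frame(
  speech_no = speech_no[keep],
  speech_id = intro_ids[speech_no[keep]],  # id of the introducing note
  who       = nodes$who[keep],
  id        = nodes$id[keep],
  text      = nodes$text[keep],
  stringsAsFactors = FALSE
)
```

In this toy table the first two utterances share speech_no 1 and the speech_id of the first introduction, while the last utterance starts speech 2.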

Details

The function checks whether there is a file at record_path. If there is not, it tries to complement the path with the corpora path.
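
The path check could look roughly like the following sketch. The helper complement_path and its corpus_dir argument are hypothetical names chosen for illustration; how the package actually locates the corpora path is not stated here, so that part is an assumption.

```r
# Hypothetical helper, not the package's function: return record_path if it
# points to an existing file, otherwise try it relative to corpus_dir.
complement_path <- function(record_path, corpus_dir) {
  if (file.exists(record_path) && !dir.exists(record_path)) {
    return(record_path)
  }
  candidate <- file.path(corpus_dir, record_path)
  if (file.exists(candidate)) {
    return(candidate)
  }
  stop("No record file found at: ", record_path)
}

# Set up a temporary directory that plays the role of the corpora path.
corpus_dir <- tempfile("corpus")
dir.create(corpus_dir)
writeLines("<TEI/>", file.path(corpus_dir, "toy-record-example.xml"))

# The bare file name does not exist in the working directory,
# so it is resolved against corpus_dir.
resolved <- complement_path("toy-record-example.xml", corpus_dir)
```

Failing with an error when neither location holds a file is a design choice of this sketch; it surfaces a bad path before any XML parsing starts.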