extract_speeches_from_record.Rd
The function extracts speeches from the Riksdagen records, where a speech is defined as the utterances (<u>) that follow a speaker introduction (<note type="speaker">). The function returns the segments of each speech.
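The speech definition can be illustrated with a minimal base R sketch. This is not the package's implementation. It assumes the record body has already been flattened into a data frame with one row per element in document order. The columns tag, type, and id and the function assign_speech_numbers() are hypothetical. The sketch works at the level of <u> elements, while the function described here also splits utterances into segments.

```r
# Hypothetical flattened view of a record: one row per element, in document order.
nodes <- data.frame(
  tag  = c("note", "u", "u", "note", "note", "u"),
  type = c("speaker", NA, NA, "other", "speaker", NA),
  id   = c("n1", "u1", "u2", "n2", "n3", "u3"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: a new speech starts at every <note type="speaker">,
# and each following <u> belongs to the most recent speaker introduction.
assign_speech_numbers <- function(nodes) {
  # TRUE where a row is a speaker introduction
  is_intro <- nodes$tag == "note" & !is.na(nodes$type) & nodes$type == "speaker"
  # Running count of introductions seen so far = speech number of each row
  speech_no <- cumsum(is_intro)
  intro_rows <- which(is_intro)
  # Keep only utterances that come after at least one speaker introduction
  keep <- nodes$tag == "u" & speech_no > 0
  data.frame(
    speech_no = speech_no[keep],
    intro_id  = nodes$id[intro_rows[speech_no[keep]]],
    u_id      = nodes$id[keep],
    stringsAsFactors = FALSE
  )
}

res <- assign_speech_numbers(nodes)
```

Here the non-speaker note n2 does not start a new speech, so u1 and u2 belong to speech 1 (introduced by n1) and u3 belongs to speech 2 (introduced by n3).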
For multiple files, extract_speeches_from_records() can process the records in parallel (controlled by mc.cores).
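One plausible way to parallelise over record files is to use parallel::mclapply() and then stack the per-record results. The sketch below assumes this approach. extract_one() and extract_many() are hypothetical stand-ins and not the package's functions. It reuses the mc.cores default from the usage section. It also guards against detectCores() - 1L evaluating to 0 or NA, which happens on single-core machines or when the core count is unknown, because mclapply() requires at least one core.

```r
library(parallel)

# Hypothetical stand-in for a per-record extractor: returns a small data frame.
extract_one <- function(path) {
  data.frame(record_id = basename(path), n_chars = nchar(path),
             stringsAsFactors = FALSE)
}

# Hypothetical wrapper: apply an extractor to many paths and row-bind the results.
extract_many <- function(paths, f = extract_one,
                         mc.cores = getOption("mc.cores", detectCores() - 1L), ...) {
  # Ensure at least one core, even if the default evaluates to 0 or NA.
  mc.cores <- max(1L, mc.cores, na.rm = TRUE)
  # mclapply() forks on Unix-alikes; on Windows only mc.cores = 1 is supported.
  results <- mclapply(paths, f, ..., mc.cores = mc.cores)
  do.call(rbind, results)
}

res <- extract_many(c("a/x.xml", "b/y.xml"), mc.cores = 0L)
```

mclapply() returns results in the same order as its input, so the stacked rows follow the order of the paths.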
extract_speeches_from_record(record_path)
extract_speeches_from_records(
record_paths,
mc.cores = getOption("mc.cores", detectCores() - 1L),
...
)
assert_and_complement_paths(record_paths)
The function returns a tibble (data frame) with the following variables:
The id of the record.
The speech number in the record.
The id of the XML node of the speaker introduction.
The id of the person giving the speech.
The id of the XML node for the segment of the speech.
The speech segment as plain text.
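The result has one row per segment, so whole speeches can be recovered by grouping on the record and the speech number. The following is a minimal base R sketch. It assumes the hypothetical column names record_id, speech_no, person_id, and text. These are placeholders, and the actual variable names of the returned tibble may differ.

```r
# Hypothetical segment-level table; column names are placeholders,
# not the package's actual variable names.
segments <- data.frame(
  record_id = c("r1", "r1", "r1"),
  speech_no = c(1L, 1L, 2L),
  person_id = c("p1", "p1", "p2"),
  text      = c("Segment A.", "Segment B.", "Segment C."),
  stringsAsFactors = FALSE
)

# Collapse the segments of each speech into one string; within a group,
# segments keep their original row order.
speeches <- aggregate(text ~ record_id + speech_no + person_id,
                      data = segments, FUN = paste, collapse = " ")
```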
The function checks whether there is a file at each record path. If there is not, it tries to complete the path by combining it with the corpus path.
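The path check can be sketched in base R as follows. This illustration rests on assumptions and is not the package's code. complement_paths() and its corpus_path argument are hypothetical. The sketch assumes that "complementing" a path means joining it to the corpus path with file.path(), and that an error is signalled if neither location holds a regular file.

```r
# Hypothetical re-creation of the path check; corpus_path is an assumed argument.
complement_paths <- function(record_paths, corpus_path) {
  vapply(record_paths, function(p) {
    # Use the path as given if it points to a regular file
    if (utils::file_test("-f", p)) return(p)
    # Otherwise try the path relative to the corpus directory
    candidate <- file.path(corpus_path, p)
    if (utils::file_test("-f", candidate)) return(candidate)
    stop("No record file found at '", p, "' or '", candidate, "'.", call. = FALSE)
  }, character(1), USE.NAMES = FALSE)
}

corpus <- tempfile("corpus")
dir.create(corpus)
f <- file.path(corpus, "prot-1.xml")
writeLines("<TEI/>", f)
```

With this setup, complement_paths("prot-1.xml", corpus) and complement_paths(f, corpus) both resolve to the same file. A name that exists in neither place raises an error.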