In this post we delve into the get_work() function and the parameters you can specify to query for more specific information.
Example 1 - Basic functionalities
First, we will query for all debates on current affairs (one possible type of parliamentary work - and the default option) organized in January-August 2022. We use the get_work() function. Indeed, in contrast to the get_mp function, you always need to specify a date range using the following arguments:
As an enormous amount of data can be retrieved through the get_work() function, it is highly recommended to test your query on a limited date range before expanding the date range to comprise multiple months to years.
Note also that we include a chunk of code that lets you inspect the first 10 observations in the data frame created. This allows you to get a feel of the data obtained before diving in deeper. This is how the resulting data frame looks like:
Inspecting this data frame, shows you what the default parameters deliver. Namely, the “details” about “debates” in current affairs in “plenary” sessions for the “dates” specified. Though, much more is possible!
The get_work() function has the following arguments:
date_range_from: the start date YYYY-MM-DD
date_range_to: the end date YYYY-MM-DD
fact: options include “written_questions”, “debates”, “oral_questions_and_interpellations”, “parliamentary_initiatives”, and “council_hearings”
type: options include “details”, “speech” and “documents”
plen_comm: choose between plenary “plen” and commission “comm” sessions
use_parallel: Boolean of which the default value is set to TRUE, select FALSE in case you do not have multiple workers to speed up the calls made
raw: Boolean of which the default value is set to FALSE, select TRUE in case you wish to retrieve the unprocessed data
extra_via_fact: Boolean of which the default value is set to FALSE, select TRUE in case you wish to retrieve the unique identifiers of documents outside the date range specified but linked to the “fact”
two_columns_pdf: Boolean of which the default value is set to FALSE, select TRUE in case you encounter PDFs divided into two text columns. A standard method cannot deal with this, this method ensures that this is read correctly
So let’s, for instance, query for some specific type of parliamentary work conducted in 2021. Here, we opt to retrieve data about parliamentary initiatives (fact="parliamentary_initiatives") discussed in the plenary (plen_comm="plen") sessions (date_range_from="2021-01-01"and date_range_to="2021-12-31"). We also specify that we are interested in the type="details" as this will deliver us a data frame containing the essentials about each parliamentary initiative.
Now we can inspect the data a bit more. Something you can check, for example, is which topics are covered in the parliamentary initiatives at that time. We use some dplyr code, in this case count() to quickly count the unique values in the column result_thema_1.
Example 2 - Retrieving documents related to parliamentary work
Here, we opt to retrieve data about written questions (fact="written questions") issued in March 2022 (date_range_from="2022-03-01"and date_range_to="2022-03-31"). Importantly, we specify that we are interested in type="document" as this will deliver us a data frame containing the documents related to each written question in the selection.
Note that to get a good feel of the data, it is often recommended to sort the data frame by id_fact (e.g. when retrieving documents about fact="parliamentary_initiatives, multiple documents are associated to one initiative). This thus gives you a clearer overview of all documents related to one unique ‘activity’.
Example 3 - Retrieving speech related to parliamentary work
For this example, we opt to retrieve data about oral questions and interpellations (fact="oral_questions_and_interpellations") discussed in the committee meetings (plen_comm="comm") of March 2022 (date_range_from="2022-03-01"and date_range_to="2022-03-31"). Next, we specify that we are interested in the type="speech" as this will deliver us a data frame containing the ‘speech’ or spoken word by MPs and ministers related to each oral question in the selection.
Now we are ready to join the oq_speechdata frame with the oq_details data frame containing the ‘speech’ per oral question. We do this based on a combination of two unique identifiers: id_fact and journaallijn_id.
Finally, we can also search these spoken words for specific key words occurring. We again use the search_terms() function. We opt to retrieve data about the PFOS/PFAS debate in Belgium (search_terms = c("PFOS", "PFAS")).