Handle votes with joint responsible committees #1122

Merged
tillprochaska merged 1 commit into main from multiple-responsible-committees on May 11, 2025

Conversation

tillprochaska
Collaborator

@tillprochaska tillprochaska commented Mar 16, 2025

A procedure can have multiple responsible committees (see Rules of Procedure and an example procedure).

  • I’ve updated the procedure scraper to properly handle procedures with joint responsible committees and added tests.
  • I’ve updated the CSV export and removed the responsible_committee_code column from the votes.csv table. Instead, there are now separate tables committees.csv and responsible_committee_votes.csv.
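
For consumers of the export, here is a minimal sketch of how the per-vote committees could be recovered from the new link table. The column names `vote_id` and `committee_code` and the sample rows are assumptions for illustration, not taken from the actual export.

```python
import csv
import io
from collections import defaultdict

# Hypothetical excerpt of responsible_committee_votes.csv.
# The column names are assumptions, not confirmed by this PR.
RESPONSIBLE_COMMITTEE_VOTES_CSV = """vote_id,committee_code
1001,LIBE
1001,IMCO
1002,ENVI
"""


def committees_by_vote(link_csv: str) -> dict[str, list[str]]:
    """Group committee codes by vote ID, keeping the row order."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(link_csv)):
        grouped[row["vote_id"]].append(row["committee_code"])
    return dict(grouped)


mapping = committees_by_vote(RESPONSIBLE_COMMITTEE_VOTES_CSV)
print(mapping)
```

A vote with joint responsible committees simply appears in several rows of the link table, which is what the single `responsible_committee_code` column could not express.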

After deployment, we need to re-scrape the procedure files and then re-aggregate the votes.

Fixes #1121

@tillprochaska tillprochaska force-pushed the multiple-responsible-committees branch 2 times, most recently from 6f66b36 to 57f9698 Compare March 16, 2025 18:05
@tillprochaska tillprochaska marked this pull request as ready for review March 16, 2025 18:05
@tillprochaska tillprochaska requested a review from linusha March 16, 2025 18:05
Collaborator

@linusha linusha left a comment

More of a conceptual objection: I think extracting the responsible committees from the votes.csv severely limits the usefulness of that file and makes finding stuff on specific topics (or maybe better: limiting the list of votes to sensible candidates) much harder.
I know that one can just join the two tables, but I am thinking about a quick Excel-thing here.

There are multiple ways to handle this imo:

  1. One can say it does not matter, just do it anyway, and rethink this once negative feedback comes in. I don't think relying on that is a good solution in this case.
  2. I would need to spend some more time reading the procedural rules, but as long as there is a hard cap on the number of responsible committees, we could just include several columns in the votes.csv, most of them empty.
  3. I would even be okay with a solution that has just one column and string-concatenates them ("LIBE, IMCO", as an example).
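
Options 2 and 3 could be sketched roughly as follows. The helper names, the `responsible_committee_N` column naming, and the cap of 3 are hypothetical. Whether a hard cap exists at all would have to come from the Rules of Procedure.

```python
def concat_codes(codes: list[str]) -> str:
    """Option 3: a single column holding all codes, comma-separated."""
    return ", ".join(codes)


def spread_codes(codes: list[str], cap: int) -> dict[str, str]:
    """Option 2: a fixed number of columns, padded with empty strings.

    `cap` is an assumed upper bound on responsible committees per vote.
    """
    if len(codes) > cap:
        raise ValueError(f"{len(codes)} committees exceed the cap of {cap}")
    return {
        f"responsible_committee_{i + 1}": codes[i] if i < len(codes) else ""
        for i in range(cap)
    }


print(concat_codes(["LIBE", "IMCO"]))  # LIBE, IMCO
print(spread_codes(["LIBE", "IMCO"], cap=3))
```

The concatenated form stays filterable with a plain "contains" filter in a spreadsheet, while the fixed-column form breaks as soon as a vote exceeds the assumed cap.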

Side note: As it's cheap, should we also collect the committees for opinion? That probably should not even be in this PR, and I would also be totally fine with extracting it into a separate table in any case, possibly duplicating information if we decide to leave the responsible committees in the votes.csv. We have duplicated information in the export anyway. I just wanted to raise the matter here as it seemed fitting.

@tillprochaska tillprochaska force-pushed the multiple-responsible-committees branch 5 times, most recently from faa2cf2 to 7eae4cf Compare April 23, 2025 15:15
@tillprochaska tillprochaska force-pushed the multiple-responsible-committees branch from 7eae4cf to e1faf83 Compare April 23, 2025 15:18
Collaborator

@linusha linusha left a comment

I still think the way we separate the tables here now at least points towards an inconsistency in our mental model of possible target audiences for the export.

I don't think spending time or energy on resolving this makes sense. Can be merged.

Comment on lines +601 to +616
with self.committees.open() as committees:
    exp = func.json_each(Vote.responsible_committees).table_valued("value")
    query = (
        select(func.distinct(exp.c.value)).select_from(Vote, exp).order_by(exp.c.value)
    )
    committee_codes = Session.execute(query).scalars()

    for committee_code in committee_codes:
        committee = Committee[committee_code]
        committees.write_row(
            {
                "code": committee.code,
                "label": committee.label,
                "abbreviation": committee.abbreviation,
            }
        )
Collaborator

Just for my understanding: this means that the committee table is not guaranteed to contain all committees, only those that were responsible for a vote that we collected, correct?

Collaborator Author

Yes, exactly. The list of committees we obtain from the Publications Office contains every committee that has ever existed (some of which existed only for a very short time decades ago). This is similar to what we do for e.g. EuroVoc concepts: we only export what’s actually relevant to the dataset.
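
To illustrate the effect, here is a rough standalone sketch of the same idea in plain SQLite rather than SQLAlchemy. The `votes` table layout and the sample rows are assumptions, and it needs an SQLite build that ships the JSON functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed layout: one JSON array of committee codes per vote.
conn.execute(
    "CREATE TABLE votes (id INTEGER PRIMARY KEY, responsible_committees TEXT)"
)
conn.executemany(
    "INSERT INTO votes (id, responsible_committees) VALUES (?, ?)",
    [
        (1, '["LIBE", "IMCO"]'),  # joint responsible committees
        (2, '["ENVI"]'),
        (3, '["LIBE"]'),
        (4, "[]"),  # vote without a responsible committee
    ],
)

# json_each expands every array into one row per element, so DISTINCT
# yields only the codes that are referenced by at least one vote.
rows = conn.execute(
    """
    SELECT DISTINCT je.value
    FROM votes, json_each(votes.responsible_committees) AS je
    ORDER BY je.value
    """
).fetchall()

codes = [row[0] for row in rows]
print(codes)  # ['ENVI', 'IMCO', 'LIBE']
```

Committees that never appear in any vote's array are absent from the result, which matches the behaviour described above.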

@tillprochaska tillprochaska merged commit a9b8057 into main May 11, 2025
4 checks passed
@tillprochaska tillprochaska deleted the multiple-responsible-committees branch May 11, 2025 20:16
Development

Successfully merging this pull request may close these issues.

There can be more than one responsible committee
2 participants