Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More light-weight LMScores proto. #98

Open
agutkin opened this issue May 12, 2021 · 6 comments
Open

More light-weight LMScores proto. #98

agutkin opened this issue May 12, 2021 · 6 comments
Assignees
Labels
enhancement New feature or request

Comments

@agutkin
Copy link
Collaborator

agutkin commented May 12, 2021

In a simple scenario there is no need to return symbols every time LMScores proto is requested since these are generally immutable if the scores don't need to be sorted.

@agutkin agutkin added the enhancement New feature or request label May 12, 2021
@agutkin agutkin self-assigned this May 12, 2021
@agutkin
Copy link
Collaborator Author

agutkin commented May 12, 2021

I suppose we can introduce the flags controlling this behavior in the request protos.

@roark-google

@aadhamm
Copy link

aadhamm commented Apr 2, 2023

Dear @agutkin,

if no one worked on your suggestion yet, I can propose working on it by adding a new optional field to the LMScores message that controls whether or not to include the symbols and implement test cases using the Google Test framework to validate the solution.

if that suits you, it will be a great opportunity to contribute to one of google's research repos and I will be happy to assign me the task.

@agutkin
Copy link
Collaborator Author

agutkin commented Apr 3, 2023

Hi Adham,

Thanks for volunteering! Unfortunately we cannot accept external pull requests -- this is because it is impossible to re-integrate the external changes back into our internal repository automatically -- this is the way MozoLM was set up.

One way around this is to directly (but manually) integrate the requested changes into our internal system and then export them to Github. In this scenario you send a PR, I review it and integrate it manually internally and re-export. Not great.

@aadhamm
Copy link

aadhamm commented Apr 4, 2023

Still sounds great, have no issue with this approach, as I am a fresh graduate and I am very motivated to learn and contribute to open source and real cases, to help increase my skills and find a job.

I will begin with implementing my proposal, test it and send a PR and then we evaluate the situation, if that suits you.

Thank you.

@aadhamm
Copy link

aadhamm commented Apr 18, 2023

@agutkin

I have made the required changes and made test cases using the Google test framework; all have passed while the message can be serialized and deserialized correctly using the modified code., I will make a pull request to review it.

Test Cases

  • f the symbols field is not needed, it can be omitted and the message will still work correctly.
  • If the symbols field is needed, it can be included and the message will work correctly.
  • The probabilities field should always be present and have the correct number of elements.
  • The normalization field should be present and have the correct value.

Screenshot from 2023-04-18 06-22-53

#include <gtest/gtest.h>
#include "mozolm/models/lm_scores.pb.h"

TEST(LMScoresTest, NoSymbolsField) {
  mozolm::LMScores scores;
  scores.add_probabilities(0.1);
  scores.set_normalization(1.0);

  // Serialize and deserialize the message to ensure it works without the symbols field.
  std::string serialized;
  ASSERT_TRUE(scores.SerializeToString(&serialized));
  mozolm::LMScores deserialized_scores;
  ASSERT_TRUE(deserialized_scores.ParseFromString(serialized));
  ASSERT_EQ(1, deserialized_scores.probabilities_size());
  ASSERT_EQ(0.1, deserialized_scores.probabilities(0));
  ASSERT_EQ(1.0, deserialized_scores.normalization());
  ASSERT_EQ(0, deserialized_scores.symbols_size());
}

TEST(LMScoresTest, WithSymbolsField) {
  mozolm::LMScores scores;
  scores.add_symbols("a");
  scores.add_probabilities(0.1);
  scores.set_normalization(1.0);

  // Serialize and deserialize the message to ensure it works with the symbols field.
  std::string serialized;
  ASSERT_TRUE(scores.SerializeToString(&serialized));
  mozolm::LMScores deserialized_scores;
  ASSERT_TRUE(deserialized_scores.ParseFromString(serialized));
  ASSERT_EQ(1, deserialized_scores.probabilities_size());
  ASSERT_EQ(0.1, deserialized_scores.probabilities(0));
  ASSERT_EQ(1.0, deserialized_scores.normalization());
  ASSERT_EQ(1, deserialized_scores.symbols_size());
  ASSERT_EQ("a", deserialized_scores.symbols(0));
}

TEST(LMScoresTest, ProbabilitiesField) {
  mozolm::LMScores scores;
  scores.add_probabilities(0.1);
  scores.add_probabilities(0.2);
  scores.set_normalization(1.0);

  // Ensure the probabilities field has the correct number of elements.
  ASSERT_EQ(2, scores.probabilities_size());
  ASSERT_EQ(0.1, scores.probabilities(0));
  ASSERT_EQ(0.2, scores.probabilities(1));
}

TEST(LMScoresTest, NormalizationField) {
  mozolm::LMScores scores;
  scores.add_probabilities(0.1);
  scores.set_normalization(2.0);

  // Ensure the normalization field has the correct value.
  ASSERT_EQ(2.0, scores.normalization());
}

int main(int argc, char **argv) {
  testing::InitGoogleTest(&argc, argv);
  return RUN_ALL_TESTS();
}

aadhamm added a commit to aadhamm/mozolm that referenced this issue Apr 18, 2023
making changes according to issue google-research#98
@agutkin
Copy link
Collaborator Author

agutkin commented Apr 18, 2023

Thank you @aadhamm!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants