Skip to content

Wikipedia url#56

Open
zachjesus wants to merge 11 commits into
masterfrom
wikipedia-url
Open

Wikipedia url#56
zachjesus wants to merge 11 commits into
masterfrom
wikipedia-url

Conversation

@zachjesus

Copy link
Copy Markdown
Contributor
  • Added two functions to DublinCoreMapping.py named get_wikipedia_urls() and add_wikipedia_url. The former returns a list of wikipedia urls for a given author, and the latter adds a wikipedia url to a books attributes in the db.

…ikipedia_urls. There is still a function to add urls and there is now one to remove them as well. Add url doesnt need a prefix, it will auto put one, but a custom one may be included and defualt will not be added (if wanted forever whatever reason). The remove function acccepts a inputs with prefix+url or just url.
@eshellman

Copy link
Copy Markdown
Contributor

You've probably heard that there are only two really difficult problems in programming: cache invalidation, naming things, and off-by-one errors.

Will look through this tomorrow, but in https://github.com/gutenbergtools/pglaf-workflow which is the code that Robert has adapted to carry wikipedia urls, the name of the field is just WIKIPEDIA_URL. So in this code, we don't want to introduce new names. Adding "book_" to "wikipedia_url" seems redundant. The DublinCore object is already about a book. If we introduce other types of wikipedia url it would likely modify something else, like author or subject.

@zachjesus

Copy link
Copy Markdown
Contributor Author

Yeah I added it because we have authror wikipedia urls, a bit semantic but I wanted to distinguish so no one passed author wikipedia URLS via the dispatcher to the section thats for books, I can easily change it you want.

@eshellman

Copy link
Copy Markdown
Contributor

yes, change it

@eshellman eshellman left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just reviewed changes in DublinCore

Comment thread libgutenberg/DublinCore.py
Comment thread libgutenberg/DublinCore.py Outdated
Comment thread libgutenberg/DublinCore.py
Comment thread libgutenberg/DublinCore.py Outdated
Comment thread libgutenberg/DublinCore.py Outdated
Comment thread libgutenberg/DublinCore.py Outdated
@zachjesus

Copy link
Copy Markdown
Contributor Author

Thanks Ill check them out and make the edits.

@eshellman eshellman left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can talk at 2

Comment thread libgutenberg/Models.py
Comment thread libgutenberg/Models.py
Comment thread libgutenberg/DublinCoreMapping.py Outdated
Comment thread libgutenberg/DublinCoreMapping.py Outdated

@eshellman eshellman left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our objective is to be able to deal with bare wiki urls on the Dublin core object and have them be stored as prefixed urls in the database text field. I think you've coded it so the prefixes are in both.

return wikipedia_url(*checked) if checked else None


def format_wikipedia_url(url_or_text):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to make clear what this does, maybe name this "format_wikipedia_url _marc" or add a comment that this is formatting the text for a wikipedia url marc attribute.

self.wikipedia_urls = []


def add_wikipedia_url(self, url_or_text):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't this just repeating the code above in format_wikipedia_url ?

self._project_gutenberg_id = None
self.request_key = ''
self.scan_urls = set()
self.wikipedia_urls = []

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
self.wikipedia_urls = []
self.wikipedia_urls = set()

using a set here would prevent having duplicates and make deletions simpler

"""Sync MARC 500 wiki rows to wikipedia_urls (matched by lang and title)."""
if not self.book:
return
wanted = {check_wikipedia_url(text): text

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this code is very confusing. it looks to me like the created attributes for added wiki urls do not have the wikipedia prefix added.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants