Wikipedia url #56
…ikipedia_urls. There is still a function to add URLs, and there is now one to remove them as well. The add function doesn't need a prefix; it adds the default one automatically, but a custom prefix may be included, in which case the default is not added (if that is wanted for whatever reason). The remove function accepts input as either prefix+url or just the url.
You've probably heard that there are only two really difficult problems in programming: cache invalidation, naming things, and off-by-one errors. Will look through this tomorrow, but in https://github.com/gutenbergtools/pglaf-workflow, which is the code that Robert has adapted to carry wikipedia urls, the name of the field is just WIKIPEDIA_URL. So in this code, we don't want to introduce new names. Adding "book_" to "wikipedia_url" seems redundant. The DublinCore object is already about a book. If we introduce other types of wikipedia url, it would likely modify something else, like author or subject.
Yeah, I added it because we have author Wikipedia URLs. It's a bit semantic, but I wanted to distinguish them so no one passes author Wikipedia URLs via the dispatcher to the section that's for books. I can easily change it if you want.
yes, change it
eshellman left a comment
just reviewed changes in DublinCore
Thanks, I'll check them out and make the edits.
eshellman left a comment
Our objective is to be able to deal with bare wiki urls on the Dublin core object and have them be stored as prefixed urls in the database text field. I think you've coded it so the prefixes are in both.
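The objective stated in this comment (bare URLs on the DublinCore object, prefixed text in the database field) could be sketched roughly as below. This is only an illustration under assumptions: `WIKI_PREFIX`, `to_bare`, and `to_stored` are hypothetical names, and the prefix text itself is a guess, not taken from the PR.

```python
# Hypothetical prefix text; the wording actually used by the project may differ.
WIKI_PREFIX = "Wikipedia page for this book: "


def to_bare(url_or_text):
    """Return the bare URL, stripping the prefix if it is present."""
    text = url_or_text.strip()
    if text.startswith(WIKI_PREFIX):
        text = text[len(WIKI_PREFIX):]
    return text.strip()


def to_stored(url_or_text):
    """Return the prefixed text for the database field, adding the prefix exactly once."""
    return WIKI_PREFIX + to_bare(url_or_text)


# The object side keeps only the bare form; the prefix is applied when writing to the db.
bare = to_bare("Wikipedia page for this book: https://en.wikipedia.org/wiki/Moby-Dick")
stored = to_stored(bare)
```

With this split, the prefix lives in one place only, so passing already-prefixed text through `to_stored` does not double it.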
return wikipedia_url(*checked) if checked else None

def format_wikipedia_url(url_or_text):
To make clear what this does, maybe name this "format_wikipedia_url_marc", or add a comment saying that this formats the text for a Wikipedia URL MARC attribute.
self.wikipedia_urls = []

def add_wikipedia_url(self, url_or_text):
Isn't this just repeating the code above in format_wikipedia_url?
self._project_gutenberg_id = None
self.request_key = ''
self.scan_urls = set()
self.wikipedia_urls = []
- self.wikipedia_urls = []
+ self.wikipedia_urls = set()
Using a set here would prevent duplicates and make deletions simpler.
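A minimal sketch of what the set-based version might look like, assuming the object stores bare URLs. `DublinCoreSketch`, `_bare`, `remove_wikipedia_url`, and the prefix text are hypothetical stand-ins, not the PR's actual code, and custom prefixes are not handled here.

```python
WIKI_PREFIX = "Wikipedia page for this book: "  # hypothetical prefix text


class DublinCoreSketch:
    """Stand-in for the real DublinCore class; only the wiki URL handling is shown."""

    def __init__(self):
        self.wikipedia_urls = set()

    @staticmethod
    def _bare(url_or_text):
        # Accept either "prefix + url" or just the url, and normalize to the bare url.
        text = url_or_text.strip()
        if text.startswith(WIKI_PREFIX):
            text = text[len(WIKI_PREFIX):].strip()
        return text

    def add_wikipedia_url(self, url_or_text):
        # Adding the same URL twice is a no-op with a set.
        self.wikipedia_urls.add(self._bare(url_or_text))

    def remove_wikipedia_url(self, url_or_text):
        # discard() does not raise if the URL is absent.
        self.wikipedia_urls.discard(self._bare(url_or_text))
```

Because `set.add` ignores repeats and `set.discard` ignores missing items, neither method needs a membership check first.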
"""Sync MARC 500 wiki rows to wikipedia_urls (matched by lang and title)."""
if not self.book:
    return
wanted = {check_wikipedia_url(text): text
This code is very confusing. It looks to me like the attributes created for added wiki URLs do not have the Wikipedia prefix added.
DublinCoreMapping.py named get_wikipedia_urls() and add_wikipedia_url. The former returns a list of Wikipedia URLs for a given author, and the latter adds a Wikipedia URL to a book's attributes in the db.