-
Notifications
You must be signed in to change notification settings - Fork 512
Allow plain text .qmd files as source notebooks
#1521
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…not key: vals from lists
- Adopt Quarto for documentation and notebooks making use of [this nbdev PR](AnswerDotAI/nbdev#1521) that allows full `.qmd` driven packages - Convert all `ipynb` files to `.qmd` format - Use nbdev_docs to generate the documentation website - Adopt logger that solves #3 (#3)
|
What are the next steps here? |
|
I may be having challenges with this, but just wanted to check to see if you've seen this before or if it's something external to your fork: nbdev_proc_nbs: Any thoughts? What else would you like to see to help debug? |
|
I got this PR to work for my personal use cases and didn't see much initial interest on this PR to bring it into the main branch. Seems like there's gotten to be a bit more traction since I first made the PR, and I'm happy to push this forward.
From my side, it has been awhile since I've rebased with the main. I will do that and see what bugs/clashes have come up since then and try to resolve those. Beyond that it's up to the maintainers to see if this is worth incorporating into the main branch (I think it definitely is, but I am biased. The @TinasheMTapera I am not positive, but this bug looks a lot like the weird edge cases I encountered when trying to parse .qmd files as valid nbdev source. Could you share a minimal .qmd file that reproduces this bug? I'm a bit new at contributing to larger OSS projects on github, but I feel that this bug doesn't need its own issue since it is pertinent only to this PR. |
|
@bhoov you haven't requested Jeremy Howard or any of the maintainers to review the PR. I recently asked in discord, why this PR was not reviewed and the answer I received from Jeremy was:
|
|
@bhoov I was able to figure out the problem, and it was not related to your PR, but rather to an edge case of |
Github actually does not allow me to request reviewers or assign people to this repository, otherwise I would have. @jph00 can you review this PR? :)
|
|
Will do! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this. Tbh it's a long way from being something that we could merge at this stage. Rather than try to get this PR into shape, my suggestion would be to gradually add a series of PRs which have the minimal code necessary for each piece. Start with something that adds the most useful bit you need with the least amount of code.
| if stdin: return _write(f_in=sys.stdin, f_out=sys.stdout) | ||
| if fname is None: fname = get_config().nbs_path | ||
| for f in globtastic(fname, file_glob='*.ipynb', skip_folder_re='^[_.]'): _write(f_in=f, disp=disp) | ||
| for f in globtastic(fname, file_re=r'.*\.ipynb$', skip_folder_re='^[_.]'): _write(f_in=f, disp=disp) # Don't clean .qmd files |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we leave the glob as it was then?
| def cell(self, cell): | ||
| if cell.cell_type=='raw': self._update(_fm2dict, cell) | ||
| elif cell.cell_type=='markdown' and 'title' not in self.fm: self._update(_md2dict, cell) | ||
| elif (cell.cell_type=='markdown' and 'title' not in self.fm): self._update(_md2dict, cell) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure this needs changing?
| _re_fm_nb = re.compile(_RE_FM_BASE+'$', flags=re.DOTALL) | ||
| _re_fm_md = re.compile(_RE_FM_BASE, flags=re.DOTALL) | ||
|
|
||
| _re_fm_title_desc = re.compile(r'^#\s+(\S.*?)(?:\n|$)(?:\s*\n)*(?:>\s+(\S.*?)(?:\n|$)(?:\s*\n)*)?', flags=re.MULTILINE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has gotten far too complicated. Perhaps reduce the scope to only support frontmatter titles etc in qmd? Try to get to a PR that adds as little code as possible, and keeps the code no more complex than what we had before.
| check_fname = path/".last_checked" | ||
| last_checked = os.path.getmtime(check_fname) if check_fname.exists() else None | ||
| nbs = globtastic(fname, file_glob='*.ipynb', skip_folder_re='^[_.]') if fname.is_dir() else [fname] | ||
| nbs = globtastic(fname, file_re=r'.*\.ipynb$|.*\.qmd$', skip_folder_re='^[_.]') if fname.is_dir() else [fname] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could simplify this in the multiple places it occurs, eg:
| nbs = globtastic(fname, file_re=r'.*\.ipynb$|.*\.qmd$', skip_folder_re='^[_.]') if fname.is_dir() else [fname] | |
| nbs = globtastic(fname, file_re=r'.*\.(ipynb|qmd)$', skip_folder_re='^[_.]') if fname.is_dir() else [fname] |
Although it feels like a glob would be nicer. What if we added a file_exts param to globtastic which took a comma separated list of file extensions? Then we could just have file_exts='ipynb,qmd'.
| "Process cells and nbdev comments in a notebook" | ||
| def __init__(self, path=None, procs=None, nb=None, debug=False, rm_directives=True, process=False): | ||
| self.nb = read_nb(path) if nb is None else nb | ||
| self.nb = read_nb_or_qmd(path) if nb is None else nb |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably not worth breaking backwards incompatibility to change the name of this function. It's OK if read_nb is slightly misleadingly named :)
| import re | ||
|
|
||
| import sys,os,inspect,shutil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| import re | |
| import sys,os,inspect,shutil | |
| import sys,os,inspect,shutil,re |
| if intermediate_md_source: raw_cells.append(_qmd_to_raw_cell(intermediate_md_source, 'markdown')) | ||
|
|
||
| # Construct the final notebook dictionary | ||
| notebook_dict = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't hard code dicts like this. Use the functions we have.
|
|
||
|
|
||
| # %% ../nbs/api/15_qmd.ipynb | ||
| def _get_fence_ticks(source:str): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use a proper well-tested parser - don't do this kind of thing manually.
|
|
||
| # %% ../nbs/api/15_qmd.ipynb | ||
| @call_parse | ||
| def ipynb_to_qmd( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Too much code here - and it shouldn't be printing on success. Keep things much simpler! :)
|
Thanks for the review Jeremy. I definitely prioritized feature completeness at the expense of tasteful code and minimal changes 🙃. Still new to contributing to larger existing projects, I'll get there Do you suggest closing this mammoth PR and instead introduce bite-sized PRs to the main branch? Or should I make smaller PRs to a dedicated "qmd_support" branch? |
|
I'd suggest closing this and do a single small PR, and work through that
together first.
Thanks for your patience! :)
… Message ID: ***@***.***>
|

Addresses #1461
Using quarto and its VS code extension, I find that writing
.qmdfiles to be a smoother interactive alternative to.ipynbfiles. That.qmdfiles are plain text comes with several advantages:.qmdseamlessly integrates with Cursor AI/other AI copilots..qmdis fully compatible with standard git tooling.qmdworks better with VIM keybindings.qmdfiles don't need a specialnbdev_cleanstep to remove cell metadata and outputs, meaning your source files are not altered in any way by nbdev's transpilation process (something that bothers me immensely when developing in.ipynb)Turns out,
nbdevdoesn't need many changes to implement this feature..qmdin addition to .ipynbread_qmd/write_qmdfunction for converting the.qmdto/from nbdev'sAttrDictformat. This means two-way sync (vianbdev_update) also works for.qmdand its corresponding.pyfiles.execnb'srun_allto generate outputs for the docs inside_proc/-cached .ipynb files.It looks like there have been other attempts to allow .qmd support for nbdev (see this quarto issue) or allow plain-text support (see #1499). However, .qmd support is still missing in the current version of nbdev, and the latter seems to introduce
jupytextas an additional dependency which uses the slowquarto convertcommand to pair a .ipynb and .qmd (this PR introduces a faster .qmd <-> .ipynb parser). Now you can seamlessly develop using a mix of.qmdand.ipynb, whichever you prefer, with no additional dependencies.I've written up a small tutorial for setting good VSCode defaults in
nbs/tutorials/develop_in_plain_text.qmdA few notes of caution and room for improvement:
00_core.ipynband00_core.qmd, as both of these will create the intermediate_proc/00_core.ipynbnbdev_preparewill run executable cells in.qmddocuments twice: 1x when testing and, because outputs aren't saved, 1x when generating the docs.The PR is in a pretty stable position already (see this fork of nbdev rewritten entirely using
.qmdfiles). There may be edge cases that I haven't considered, but in all I hope this is nearing a good shape to distribute.